Better At Oracle

A collection of tutorials, code and tools to help you get better using Oracle


15 May 2013

The Performance Cost Of Clobs


A requirement that comes up a lot is the need to store strings longer than 4000 characters in the database, and this presents a problem. Up to and including Oracle 11gR2, the longest VARCHAR2 that can be stored in the database is 4000 characters. I have heard this will change to 32K characters in Oracle 12c, but 12c hasn't been released yet, and when it is, it will be some time before it is widely adopted.

For this post, I am not concerned with storing strings that are many megabytes or gigabytes long - I am more interested in strings that range from a few thousand characters up to about 32KB in size.

Options

There are really two options if you need to store strings longer than 4000 characters. You can either chop the string into 4000-character pieces and store it across many rows, or make use of the CLOB data type, which is designed for storing very large strings.

Chopping the string into pieces requires a little more code in the application. While this is an area where bugs could creep in, the code generally isn't too complex. CLOBs, on the other hand, just work, and realistically have no size limitations.
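As a sketch of the chunking approach, splitting a long string into pieces small enough for a VARCHAR2 column might look like this (plain Java; the class name and 4000-character limit constant are mine, not from any Oracle API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: split a long string into pieces of at most
// 4000 characters, ready to be stored as multiple VARCHAR2 rows.
public class ClobChunker {
    static final int CHUNK_SIZE = 4000;

    public static List<String> split(String value) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < value.length(); i += CHUNK_SIZE) {
            chunks.add(value.substring(i, Math.min(value.length(), i + CHUNK_SIZE)));
        }
        return chunks;
    }
}
```

In practice you would also store the chunk index alongside each piece, so the original string can be reassembled in order when reading it back.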

How CLOBs work

I don't want to get into too much detail about how CLOBs work, as it has been covered well in many other places, but there are a few things to consider.

Inline or Out of Line

When you create a table with a CLOB column, there is an option called 'Enable Storage In Row' for the CLOB. This dictates whether the CLOB is stored in the same segment as the table, and hence alongside the rest of the table row, or in a separate segment by itself. In the second case, the table row would then contain a pointer to the location of the CLOB data.

Generally it is more efficient to store the CLOB with the table row, but if the CLOB is longer than about 4000 characters it can no longer be stored in the row, and is stored in the CLOB segment regardless of whether storage in row is enabled.

Chunksize

When a CLOB is not stored with the table row, it is stored in a series of blocks in its own segment. Only one CLOB can use a given block, so if the database is running an 8KB block size and a CLOB is 5000 bytes, it will occupy an entire database block. The remaining 3KB cannot be used for anything else. Similarly, a 9KB CLOB will require two database blocks, wasting about 7KB of space. This is less of a problem with very large CLOBs, but in applications dealing with many CLOBs under 32KB, a lot of space can be wasted. Note that if the CLOB is under 4KB and stored with the table row, this 'one CLOB per block' limitation does not apply.
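The space arithmetic above can be sketched as follows (assuming an 8KB block size and ignoring per-block overhead, which in reality reduces the usable space slightly):

```java
// Sketch of out-of-line CLOB space usage: each CLOB occupies whole blocks,
// and no other CLOB can share the last, partially-filled block.
public class ClobSpace {
    static final int BLOCK_SIZE = 8192; // 8KB database block

    // Number of whole blocks a CLOB of this many bytes occupies out of line
    public static int blocksUsed(int clobBytes) {
        return (clobBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Bytes left unusable in the final block
    public static int wastedBytes(int clobBytes) {
        return blocksUsed(clobBytes) * BLOCK_SIZE - clobBytes;
    }
}
```

For a 5000-byte CLOB this gives one block with about 3KB wasted, and for a 9KB CLOB two blocks with about 7KB wasted, matching the figures above.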

Write Performance over JDBC

I decided to run a test inserting 2000 rows of CLOBs of different sizes into a table with a single CLOB column, with storage in row enabled. I did this using three different techniques: single row inserts, Oracle batching, and passing the values to PL/SQL as an array and inserting with FORALL (the 'Array Inserts' column below):

CLOB Size | Single Row | Oracle Batching | Array Inserts
2000        11.40s       9.95s             4.71s
4000        13.40s       9.99s             3.95s
6000        18.59s       12.08s            error

There are a couple of interesting observations.

  • Passing the CLOBs to PL/SQL and inserting using the FORALL syntax is much faster.
  • You don't seem to be able to use PL/SQL arrays when the CLOB is over 4000 characters. This may be a limitation, or a bug in my code - I am not sure.
  • Oracle batching is faster than the single row approach, but not by much.
  • 10+ seconds to insert 2000 rows isn't exactly fast.

To get a further set of numbers, I ran a test to insert 20K rows of 4000 character CLOBs:

CLOB Size | Single Row | Oracle Batching | Array Inserts
4000        121.90s      90.68s            43.5s

These numbers line up pretty well with the observations above.

Comparing With Varchar2 Inserts

We know that it is not possible to insert a VARCHAR2 over 4000 characters, but it is interesting to compare the insert speeds of VARCHAR2s using the same insert methods. The results of inserting 20K VARCHAR2 rows:

VARCHAR2 Size | Single Row | Oracle Batching | Array Inserts
4000            27.09s       12.96s            13.5s

In this case, Oracle batching is about 3.5 times faster than the most efficient way of inserting CLOBs (the array insert, at 43.5s). But remember that the array insert code does not work for CLOBs over 4000 characters, so in a real world test batched VARCHAR2 inserts are about 7 times faster than the best available CLOB method.

For one final test, I inserted 20K rows of 10K-character CLOBs - this took 147 seconds. Compare that to inserting 60K rows of VARCHAR2s (three times as many rows) in about 39 seconds.

Write Performance Conclusion

Writing small CLOBs is slow compared to VARCHAR2s. If write performance is a major concern, then coding up something to split long strings into 4000-character chunks is probably a good investment.

Read Performance

The problems with CLOBs don't stop at write performance - you also pay a heavy price when reading them back. The reason is that when Oracle returns a row containing a CLOB, the result doesn't actually contain any of the CLOB data - the client must make two additional server round trips per row to fetch it. This is easy to see in SQL*Plus with autotrace enabled:

SQL> create table round_trips (c1 clob);

Table created.

SQL> insert into round_trips
     select 'abcdef' from dual connect by level <= 10;

10 rows created.

SQL> set autot on

SQL> select c1 from round_trips;

C1
--------------------------------------------------------------------------------
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef

10 rows selected.


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       3627  bytes sent via SQL*Net to client
       2575  bytes received via SQL*Net from client
===>     22  SQL*Net roundtrips to/from client         
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

Compare this to a similar test with a varchar2 column:

SQL> create table less_roundtrips (c1 varchar2(4000));

Table created.

SQL> insert into less_roundtrips
     select 'abcdef' from dual connect by level <= 10;

10 rows created.

SQL> select * from less_roundtrips;

C1
--------------------------------------------------------------------------------
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef

10 rows selected.

Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         16  consistent gets
          0  physical reads
          0  redo size
        353  bytes sent via SQL*Net to client
        365  bytes received via SQL*Net from client
===>      2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

In the first test 10 rows containing a CLOB were returned, and in the second, 10 rows containing a VARCHAR2. The difference is 20 round trips - 2 extra per CLOB - even though each CLOB contains only 6 characters.

If you want to test this stuff out yourself, I have uploaded the code I used.
