A collection of tutorials, code and tools to help you get better using Oracle
15 May 2013
A requirement that comes up a lot is the need to store strings longer than 4000 characters in the database, and this presents a problem. Up to and including Oracle 11gR2 the longest varchar2 that can be stored in the database is 4000 characters. I have heard that will change to 32K characters in Oracle 12c, but that hasn't been released yet, and when it is, it will be some time before it is widely adopted.
For this post, I am not concerned about storing strings that are many mega or gigabytes long - I am more interested in strings that are generally a few thousand characters up to about 32kb in size.
There are really two options if you need to store strings longer than 4000 characters. You can either chop the string into 4000 long pieces and store it in many rows, or make use of the CLOB data type, which facilitates storing very large strings.
Chopping the string up into pieces requires a little more code in the application. While this is an area bugs could creep in, the code generally isn't too complex. CLOBs on the other hand just work, and realistically have no size limitations.
I don't want to get into too much detail about how CLOBs work, as it has been covered well in many other places, but there are a few things to consider.
When you create a table with a CLOB column, there is an option called 'Enable Storage In Row' for the CLOB. This dictates whether the CLOB is stored in the same segment as the table, and hence alongside the rest of the table row, or in a separate segment by itself. In the second case, the table row would then contain a pointer to the location of the CLOB data.
Generally it is more efficient to store the CLOB with the table row, but if the CLOB is longer than about 4000 characters, it can no longer be stored in the row, and is stored in the CLOB segment regardless of whether storage in row in enabled or not.
When a CLOB is not stored with the table row it is stored in a series of blocks in its own segment. Only 1 CLOB can use a given block, so if the database is running an 8kb blocksize, and a CLOB is 5000 bytes, it will require an entire database block. The remaining 3kb cannot be used for anything else. Similarly a 9kb CLOB will require 2 database blocks, wasting about 7kb of space. This is less of a problem with very large CLOBs, but with applications dealing with many CLOBs under 32kb a lot of space could get wasted. Note that if the CLOB is under 4kb, and stored with the table row this '1 CLOB per block' limitation does not apply.
I decided to run a test inserting 2000 rows of CLOBs of different sizes into a table with a single CLOB column, storage in row enabled. I did this using 3 different techniques:
Clob Size | Single Row | Oracle Batching | Array Inserts
2000 11.40s 9.95s 4.71s
4000 13.40s 9.99s 3.95
6000 18.59s 12.08s error
There are a couple of interesting observations.
To get a further set of numbers, I ran a test to insert 20K rows of 4000 character CLOBs:
CLOB Size | Single Row | Oracle Batching | Array Inserts
4000 121.90s 90.68s 43.5s
These numbers line up pretty well with the observations above.
We know that it is not possible to insert a Varchar2 over 4000 characters, but it is interesting to compare the insert speeds of VARCAHRs for the same insert methods. The results of inserting 20K varchar2's:
VARCHAR Size | Single Row | Oracle Batching | Array Inserts
4000 27.09s 12.96 13.5s
In this case, Oracle batching is about 4 times faster than the most efficient way of inserting CLOBs, but remember that the Array Insert code does not work for CLOBs over 4000 characters, so it is about 8 times faster in a real world test.
For one final test, I created 20K 10K long CLOBs - this took 147 seconds. Compare that to inserting 60K rows of varchar2s (three times as many rows) at about 39 seconds.
Writing small CLOBs is slow when compared to VARCHAR2s. If write performance is of great concern, then coding up something to split long strings into 4000 character chunks is probably a good investment.
The problems with CLOBs don't stop with write performance - you also pay a heavy price when reading them back. The reason is that when Oracle returns a row containing a CLOB, it doesn't actually contain any of the CLOB data - the client must make 2 additional server round trips to get the data. This is easy to see in sqlplus with autotrace enabled:
SQL> create table round_trips (c1 clob);
Table created.
SQL> insert into round_trips
select 'abcdef' from dual connect by level <= 10;
10 rows created.
SQL> set autot on
SQL> select c1 from round_trips;
C1
--------------------------------------------------------------------------------
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
10 rows selected.
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
24 consistent gets
0 physical reads
0 redo size
3627 bytes sent via SQL*Net to client
2575 bytes received via SQL*Net from client
===> 22 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
Compare this to a similar test with a varchar2 column:
SQL> create table less_roundtrips (c1 varchar2(4000));
Table created.
SQL> insert into less_roundtrips
select 'abcdef' from dual connect by level <= 10;
10 rows created.
SQL> select * from less_roundtrips;
C1
--------------------------------------------------------------------------------
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
10 rows selected.
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
16 consistent gets
0 physical reads
0 redo size
353 bytes sent via SQL*Net to client
365 bytes received via SQL*Net from client
===> 2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
In the first test 10 rows containing a CLOB were returned, but in the second, 10 rows containing a varchar2 were returned. The difference in round trips is 20 - 2 per CLOB, even though each CLOB only contains 5 characters.
If you want to test this stuff out yourself, I uploaded the code code I used.