A collection of tutorials, code and tools to help you get better using Oracle
14 July 2011
I am running a forum, and have a table called Posts. I would like to display the 10 newest posts on the front page. I have tried limiting the result set using ROWNUM and order by, but it doesn't give me the 10 newest posts. What is wrong?
This sort of question seems to come up time and time again, and as far as I can tell, just about everyone implements it wrong when they first start out.
Queries that limit the number of rows in a result set are known as window queries or top n queries, and there are two kinds:
In other database engines (like Mysql for example) there is a key word called LIMIT that can be applied to a query to implement top n queries easily. Oracle does not have this feature.
Every query which Oracle executes has a pseudo column called rownum. As Oracle produces the results for a query, each row to be returned is assigned a 'row number', starting a 1 and increasing by 1 for each row.
You can always select the rownum for a query using the rownum keyword, eg:
SQL11G> select rownum, d.*
from dual d;
ROWNUM DUMMY
---------------------- -----
1 X
Aside from knowing about the presence of rownum, the key is to know when it is applied to the result set.
A row is assigned a row number after all filters and joins have been applied, but before any sorts (order by) have taken place.
Lets create a simple Posts table to implement the forum example to test things out, and fill it with some posts:
SQL11G> create table posts (id integer,
post_date date,
post_title varchar2(255));
SQL11G> begin
for i in 1 .. 50 loop
insert into posts values (i, sysdate + i, 'Title for post number '||i);
end loop;
end;
/
SQL11G> select max(post_date) min(post_date)
from posts;
MAX(POST_DATE) MIN(POST_DATE)
------------------------- -------------------------
02-SEP-11 15-JUL-11
Now we have a table with 50 rows. The earliest post is from 15th July and there is one new post each day until the 2nd September.
To demonstrate that rownum is assigned before any sorts have been applied, we can select from the Posts table, and order it randomly:
SQL11G> select rownum, p.*
from posts p
order by dbms_random.random;
ROWNUM ID POST_DATE POST_TITLE
---------------------- ---------------------- ------------------------- -------------------------
31 31 14-AUG-11 Title for post number 31
12 12 26-JUL-11 Title for post number 12
10 10 24-JUL-11 Title for post number 10
28 28 11-AUG-11 Title for post number 28
41 41 24-AUG-11 Title for post number 41
43 43 26-AUG-11 Title for post number 43
7 7 21-JUL-11 Title for post number 7
36 36 19-AUG-11 Title for post number 36
23 23 06-AUG-11 Title for post number 23
13 13 27-JUL-11 Title for post number 13
49 49 01-SEP-11 Title for post number 49
Notice how the rownum assigned to each row is in a random order. If the rownum was assigned after sorting, then we would expect the first to have 1, the second 2 etc.
After all this discussion of rownum, it will come as no surprise that we can use it to limit the number of results returned by a query. Consider the requirement at the start of this post - show the 10 newest posts. Most Oracle beginners will implement the following wrong code:
SQL11G> select id, post_date, post_title
from posts
where rownum <= 10
order by post_date desc;
We already know that rownum is assigned to the rows before the order by is applied, and we can prove it by looking at the explain plan:
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 360 | 4 (25)| 00:00:01 |
| 1 | SORT ORDER BY | | 10 | 360 | 4 (25)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| POSTS | 50 | 1800 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Oracle executes this plan from the bottom up. Each row from the 'Table Access Full' step is passed to the 'COUNT STOPKEY' operation which is there because we limited the results using rownum. When it receives 10 rows, Oracle stops looking for more rows, sorts and returns the results:
ID POST_DATE POST_TITLE
---------------------- ------------------------- -------------------------
10 24-JUL-11 Title for post number 10
9 23-JUL-11 Title for post number 9
8 22-JUL-11 Title for post number 8
7 21-JUL-11 Title for post number 7
6 20-JUL-11 Title for post number 6
5 19-JUL-11 Title for post number 5
4 18-JUL-11 Title for post number 4
3 17-JUL-11 Title for post number 3
2 16-JUL-11 Title for post number 2
1 15-JUL-11 Title for post number 1
Remember the newest post was made on 2nd September, so these results are not correct. In the general case, this query will give you 10 random rows from the table, and sort that result. The fact the results here are the 10 oldest rows is really down to how the table was loaded and cannot be relied upon.
To get the top n results, you need to sort the data, then apply rownum, and you can do that using an inline view:
SQL11G> select id, post_date, post_title
from
(
select id, post_date, post_title
from posts
order by post_date desc
) a
where rownum <= 10;
Notice how the entire query including order by is in the inner query, and the rownum filter is applied to that result set. The explain plan will also highlight the difference:
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 360 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 50 | 1800 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 50 | 1800 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | POSTS | 50 | 1800 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Notice how the COUNT STOPKEY operation has moved up the plan, and it is now applied after the SORT ORDER BY STOPKEY operation on line 3 (more on it later). Looking at the results, we can see this query gives the correct answer:
ID POST_DATE POST_TITLE
---------------------- ------------------------- -------------------------
50 02-SEP-11 Title for post number 50
49 01-SEP-11 Title for post number 49
48 31-AUG-11 Title for post number 48
47 30-AUG-11 Title for post number 47
46 29-AUG-11 Title for post number 46
45 28-AUG-11 Title for post number 45
44 27-AUG-11 Title for post number 44
43 26-AUG-11 Title for post number 43
42 25-AUG-11 Title for post number 42
41 24-AUG-11 Title for post number 41
If this query was used in a forum to display the 10 most recent posts on page 1, then it is likely another query is require to display posts 11 - 20 on page 2. Doing that is an extension of the query above.
To get results 11 - 20, first sort the data and keep the first 20 posts numbering each row with rownum:
SQL11G> select rownum rnum, id, post_date, post_title
from
(
select id, post_date, post_title
from posts
order by post_date desc
) a
where rownum <= 20;
Now we have the top 20 rows, and created a new column in the result set called rnum. Using another inline view, you can use rnum to filter out the first 10 results, giving you 11 - 20:
SQL11G> select id, post_date, post_title
from
(
select rownum rnum, id, post_date, post_title
from
(
select id, post_date, post_title
from posts
order by post_date desc
) a
where rownum <= 20
) b where rnum > 10
ID POST_DATE POST_TITLE
---------------------- ------------------------- -------------------------
40 23-AUG-11 Title for post number 40
39 22-AUG-11 Title for post number 39
38 21-AUG-11 Title for post number 38
37 20-AUG-11 Title for post number 37
36 19-AUG-11 Title for post number 36
35 18-AUG-11 Title for post number 35
34 17-AUG-11 Title for post number 34
33 16-AUG-11 Title for post number 33
32 15-AUG-11 Title for post number 32
31 14-AUG-11 Title for post number 31
The syntax is a bit more clunky than as simple LIMIT keyword, but once you know how rownum works, it is easy to apply it to any case.
select *
from
(
<your query here
including all filters
and order by>
) where rownum <= :N
(Substitute numperpage and page_num accordingly)
select
from
(
select rownum rnum a.*
from
(
<your query here
including all filters
and order by>
) a where rownum <= :num_per_page * :page_num
) b where rnum > :num_per_page * (:page_num - 1)
In the top N query explain plan, I highlighted the sort operation changing from 'SORT ORDER BY' to 'SORT ORDER BY STOPKEY'. This is a special sort operation that Oracle uses to make top N queries more efficient.
When you add a rownum filter, Oracle knows the query will return at most N rows, it doesn't need to do a full sort of the data, it only needs to remember the top (or bottom) N results of the sort.
Conceptually, Oracle will allocate N slots for the results.
For each row read, it will check if it sorts higher than the N results it is holding, if it does the record in the lowest position gets pushed out, and the new record is slotted into the correct space. This is much more efficient than sorting potentially millions of rows when only 10 are needed.
In the forum example above, over time many posts will be made, and scanning and sorting the entire table to find only the 10 latest posts is going to be slow. The solution is to create an index on the column used to sort the results (postdate in this case) and ensure that postdate has a not null constraint on it. Unlike a table, records are stored sorted in an index, and Oracle can do a very efficient index range scan to find the top N or window of results.
The queries to get the results remain the same as above whether an index is involved or not.