Better At Oracle

A collection of tutorials, code and tools to help you get better using Oracle


14 July 2011

Limiting query results - Top n and Window Queries

I am running a forum, and have a table called Posts. I would like to display the 10 newest posts on the front page. I have tried limiting the result set using ROWNUM and order by, but it doesn't give me the 10 newest posts. What is wrong?

This sort of question seems to come up time and time again, and as far as I can tell, just about everyone implements it wrong when they first start out.

Queries that limit the number of rows in a result set are known as window queries or top n queries, and there are two kinds:

  • Obtain the first or last n rows of a result set
  • Obtain page 1, page 2, page 3 of a result set

In other database engines (like Mysql for example) there is a key word called LIMIT that can be applied to a query to implement top n queries easily. Oracle does not have this feature.

Rownum

Every query which Oracle executes has a pseudo column called rownum. As Oracle produces the results for a query, each row to be returned is assigned a 'row number', starting a 1 and increasing by 1 for each row.

You can always select the rownum for a query using the rownum keyword, eg:

SQL11G> select rownum, d.*
        from dual d;

ROWNUM                 DUMMY 
---------------------- ----- 
1                      X 

Aside from knowing about the presence of rownum, the key is to know when it is applied to the result set.

A row is assigned a row number after all filters and joins have been applied, but before any sorts (order by) have taken place.

Lets create a simple Posts table to implement the forum example to test things out, and fill it with some posts:

SQL11G> create table posts (id integer, 
                            post_date date,
                            post_title varchar2(255));

SQL11G> begin
          for i in 1 .. 50 loop
            insert into posts values (i, sysdate + i, 'Title for post number '||i); 
          end loop;
        end;
        /

SQL11G> select max(post_date) min(post_date)
        from posts;

MAX(POST_DATE)            MIN(POST_DATE)            
------------------------- ------------------------- 
02-SEP-11                 15-JUL-11   

Now we have a table with 50 rows. The earliest post is from 15th July and there is one new post each day until the 2nd September.

To demonstrate that rownum is assigned before any sorts have been applied, we can select from the Posts table, and order it randomly:

SQL11G> select rownum, p.* 
        from posts p
        order by dbms_random.random;

ROWNUM                 ID                     POST_DATE                 POST_TITLE                
---------------------- ---------------------- ------------------------- ------------------------- 
31                     31                     14-AUG-11                 Title for post number 31  
12                     12                     26-JUL-11                 Title for post number 12  
10                     10                     24-JUL-11                 Title for post number 10  
28                     28                     11-AUG-11                 Title for post number 28  
41                     41                     24-AUG-11                 Title for post number 41  
43                     43                     26-AUG-11                 Title for post number 43  
7                      7                      21-JUL-11                 Title for post number 7   
36                     36                     19-AUG-11                 Title for post number 36  
23                     23                     06-AUG-11                 Title for post number 23  
13                     13                     27-JUL-11                 Title for post number 13  
49                     49                     01-SEP-11                 Title for post number 49

Notice how the rownum assigned to each row is in a random order. If the rownum was assigned after sorting, then we would expect the first to have 1, the second 2 etc.

Rownum to limit results

After all this discussion of rownum, it will come as no surprise that we can use it to limit the number of results returned by a query. Consider the requirement at the start of this post - show the 10 newest posts. Most Oracle beginners will implement the following wrong code:

SQL11G> select id, post_date, post_title
        from posts
        where rownum <= 10
        order by post_date desc;

We already know that rownum is assigned to the rows before the order by is applied, and we can prove it by looking at the explain plan:

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |    10 |   360 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY      |       |    10 |   360 |     4  (25)| 00:00:01 |
|*  2 |   COUNT STOPKEY     |       |       |       |            |          |
|   3 |    TABLE ACCESS FULL| POSTS |    50 |  1800 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Oracle executes this plan from the bottom up. Each row from the 'Table Access Full' step is passed to the 'COUNT STOPKEY' operation which is there because we limited the results using rownum. When it receives 10 rows, Oracle stops looking for more rows, sorts and returns the results:

ID                     POST_DATE                 POST_TITLE                
---------------------- ------------------------- ------------------------- 
10                     24-JUL-11                 Title for post number 10  
9                      23-JUL-11                 Title for post number 9   
8                      22-JUL-11                 Title for post number 8   
7                      21-JUL-11                 Title for post number 7   
6                      20-JUL-11                 Title for post number 6   
5                      19-JUL-11                 Title for post number 5   
4                      18-JUL-11                 Title for post number 4   
3                      17-JUL-11                 Title for post number 3   
2                      16-JUL-11                 Title for post number 2   
1                      15-JUL-11                 Title for post number 1  

Remember the newest post was made on 2nd September, so these results are not correct. In the general case, this query will give you 10 random rows from the table, and sort that result. The fact the results here are the 10 oldest rows is really down to how the table was loaded and cannot be relied upon.

Getting the top n

To get the top n results, you need to sort the data, then apply rownum, and you can do that using an inline view:

SQL11G> select id, post_date, post_title
        from
        ( 
          select id, post_date, post_title
          from posts
          order by post_date desc
        ) a
        where rownum <= 10;

Notice how the entire query including order by is in the inner query, and the rownum filter is applied to that result set. The explain plan will also highlight the difference:

---------------------------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |    10 |   360 |     4  (25)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |       |       |       |            |          |
|   2 |   VIEW                  |       |    50 |  1800 |     4  (25)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|       |    50 |  1800 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | POSTS |    50 |  1800 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Notice how the COUNT STOPKEY operation has moved up the plan, and it is now applied after the SORT ORDER BY STOPKEY operation on line 3 (more on it later). Looking at the results, we can see this query gives the correct answer:

ID                     POST_DATE                 POST_TITLE                
---------------------- ------------------------- ------------------------- 
50                     02-SEP-11                 Title for post number 50  
49                     01-SEP-11                 Title for post number 49  
48                     31-AUG-11                 Title for post number 48  
47                     30-AUG-11                 Title for post number 47  
46                     29-AUG-11                 Title for post number 46  
45                     28-AUG-11                 Title for post number 45  
44                     27-AUG-11                 Title for post number 44  
43                     26-AUG-11                 Title for post number 43  
42                     25-AUG-11                 Title for post number 42  
41                     24-AUG-11                 Title for post number 41 

Paging Data

If this query was used in a forum to display the 10 most recent posts on page 1, then it is likely another query is require to display posts 11 - 20 on page 2. Doing that is an extension of the query above.

To get results 11 - 20, first sort the data and keep the first 20 posts numbering each row with rownum:

SQL11G> select rownum rnum, id, post_date, post_title
        from
        (
          select id, post_date, post_title
          from posts
          order by post_date desc
        ) a
        where rownum <= 20;

Now we have the top 20 rows, and created a new column in the result set called rnum. Using another inline view, you can use rnum to filter out the first 10 results, giving you 11 - 20:

SQL11G> select id, post_date, post_title
        from
        (
          select rownum rnum, id, post_date, post_title
          from
          (
            select id, post_date, post_title
            from posts
            order by post_date desc
          ) a
          where rownum <= 20
        ) b where rnum > 10

ID                     POST_DATE                 POST_TITLE                
---------------------- ------------------------- ------------------------- 
40                     23-AUG-11                 Title for post number 40  
39                     22-AUG-11                 Title for post number 39  
38                     21-AUG-11                 Title for post number 38  
37                     20-AUG-11                 Title for post number 37  
36                     19-AUG-11                 Title for post number 36  
35                     18-AUG-11                 Title for post number 35  
34                     17-AUG-11                 Title for post number 34  
33                     16-AUG-11                 Title for post number 33  
32                     15-AUG-11                 Title for post number 32  
31                     14-AUG-11                 Title for post number 31

The syntax is a bit more clunky than as simple LIMIT keyword, but once you know how rownum works, it is easy to apply it to any case.

Generic Top N

select *
from 
(
   <your query here
    including all filters
    and order by>
) where rownum <= :N

Generic Paging Query

(Substitute numperpage and page_num accordingly)

select
from
(
  select rownum rnum a.*
  from 
  (
     <your query here
      including all filters
      and order by>
  ) a where rownum <= :num_per_page * :page_num
) b where rnum > :num_per_page * (:page_num - 1)

SORT ORDER BY STOPKEY

In the top N query explain plan, I highlighted the sort operation changing from 'SORT ORDER BY' to 'SORT ORDER BY STOPKEY'. This is a special sort operation that Oracle uses to make top N queries more efficient.

When you add a rownum filter, Oracle knows the query will return at most N rows, it doesn't need to do a full sort of the data, it only needs to remember the top (or bottom) N results of the sort.

Conceptually, Oracle will allocate N slots for the results.

For each row read, it will check if it sorts higher than the N results it is holding, if it does the record in the lowest position gets pushed out, and the new record is slotted into the correct space. This is much more efficient than sorting potentially millions of rows when only 10 are needed.

Making it more efficient

In the forum example above, over time many posts will be made, and scanning and sorting the entire table to find only the 10 latest posts is going to be slow. The solution is to create an index on the column used to sort the results (postdate in this case) and ensure that postdate has a not null constraint on it. Unlike a table, records are stored sorted in an index, and Oracle can do a very efficient index range scan to find the top N or window of results.

The queries to get the results remain the same as above whether an index is involved or not.

blog comments powered by Disqus