Better At Oracle

A collection of tutorials, code and tools to help you get better using Oracle

07 March 2014

Using Collect For One To Many Result Sets

In a reporting context, a problem that comes up fairly often is having a master table, such as Orders than joins to a detail table, such as Order_Line_Items in a one to many relationship. If you want to produce a report that contains all orders and their Line Items, there are two obvious choices:

  1. Join the two tables, accepting that the information from the Orders table will be duplicated in every Line Item row, and deal with that when producing the report.
  2. Query the Order data, and for each row returned issue another query to get the line items.

If you have only two tables involved, the first method will work pretty well, but if you have many detail tables, eg Order, Order_Line_Items, Order_Addresses, then the rows will keep on multiplying due to multiple one to many joins.

The second approach will be hopelessly inefficient (ie slow) for any large dataset.

The Collect Function

I think the collect function was introduced in 10g, but I only came across it recently when trying to solve a problem like the one outlined above. It is a group function, which means it can be used in a SQL statement that contains a group by statement, similar to max, min, sum etc.

The 10g implementation was fairly limited - this article has a pretty good overview. The features I am interested in came along in 11gR2 - the ability to use a complex type in the collect function and also sort the results inside the collection array. Have a look here for a good overview.

In simple terms, collect allows you to create an array of objects as a column in your result set. An example is probably better than more words:

create table orders (order_id integer,
                     order_date date,
                     total_cost number);

insert into orders
select level, sysdate - level, 2.99*level
from dual
connect by level <= 100;

create table order_line_items(order_id integer,
                              line_item_number integer,
                              description varchar2(255));

insert into order_line_items
select mod(level, 100) + 1, mod(level, 5) + 1, 'Description for line item '||level
from dual
connect by level <= 500;

Now we have 100 orders, and for each order we have 5 line items.

Next we need a type that models the order_line_items table, so in this case it needs to contain line_item_number and description:

create or replace type t_order_line_items as object
  line_item_number integer,
  description      varchar2(255)

create or replace type tab_order_line_items
  as table of t_order_line_items;

Now we can use the collect function in a query. First, create a query that groups order_line_items by order_id:

select order_id, 
         collect(t_order_line_items(line_item_number, description) order by line_item_number
       ) as tab_order_line_items) as line_items
from order_line_items
group by order_id;

This produces a series of results like the following:

97    TAB_ORDER_LINE_ITEMS(T_ORDER_LINE_ITEMS(2, 'Description for line item 96'),
                           T_ORDER_LINE_ITEMS(2, 'Description for line item 496'), 
                           T_ORDER_LINE_ITEMS(2, 'Description for line item 396'), 
                           T_ORDER_LINE_ITEMS(2, 'Description for line item 296'), 
                           T_ORDER_LINE_ITEMS(2, 'Description for line item 196'))

What we have done here is compress the many rows per order_id into a single row that contains the order_id and an array of line items. Its pretty simple to nest this as a view in a query, and join it to the Orders table:

with line_items as
  select order_id, 
           collect(t_order_line_items(line_item_number, description) order by line_item_number
         ) as tab_order_line_items) as line_items
  from order_line_items
  group by order_id
select o.order_id,
from orders o, 
     line_items li
where o.order_id = li.order_id;

100 27-NOV-13        299  TAB_ORDER_LINE_ITEMS(
                            T_ORDER_LINE_ITEMS(5, 'Description for line item 99'), 
                            T_ORDER_LINE_ITEMS(5, 'Description for line item 499'), 
                            T_ORDER_LINE_ITEMS(5, 'Description for line item 399'), 
                            T_ORDER_LINE_ITEMS(5, 'Description for line item 299'), 
                            T_ORDER_LINE_ITEMS(5, 'Description for line item 199'))

I think this is a pretty power technique in a reporting context. I doubt (and have not tested) it would be wise to try and put 1000's of rows into a collection, but for one to many relationships where there are not too many records on the 'many' side of the relationship it should work well.

The application that is reading a result set like this need to know how to read an array of record types from a result set, which is easy when you know how!

blog comments powered by Disqus