Better At Oracle

A collection of tutorials, code and tools to help you get better using Oracle


17 May 2012

Collapsing continuous ranges into single rows

Every now and then a problem appears that seems impossible to solve in a single SQL query, but with Oracle Analytic functions, SQL can be more powerful that you think.

Say we have a table containing tariff start and end dates for a customer, eg:

create table customer_tariff (
  customer_id integer,
  tariff_id   integer,
  start_date  date,
  end_date    date
);

Lets pretend we want to implement a loyalty scheme, and the rules say that we want to find all customers who have had continuous service for the past year. However, if a customer changes tariff, the old tariff is ended on a day, and the new tariff starts the next day. A break in service is consider as a gap bigger than 1 day between two tariffs.

We need some test data to play with, so create a customer that has had two tariffs - note how the second tariff starts the day after the first one ends.

insert into customer_tariff values (
  1, 1, to_date('20120101', 'YYYYMMDD'), to_date('20120201', 'YYYYMMDD'));
insert into customer_tariff values (
  1, 2, to_date('20120202', 'YYYYMMDD'), null); 

Now create a second customer who has a break between the tariff changes. Maybe they canceled their account and then reopened it a month later:

insert into customer_tariff values (
  2, 1, to_date('20120101', 'YYYYMMDD'), to_date('20120201', 'YYYYMMDD'));
insert into customer_tariff values (
  2, 2, to_date('20120301', 'YYYYMMDD'), null); 

In our example, customer 1 has had continuous service since 1st January, but customer 2 has a break in service - the question is, how can we identify all the customers that fall into the continuous category in a single SQL query?

In the general case, there can be lots of tariff changes, to joining the table to itself is not going to help. To solve this problem, we need to be able to look backwards in the result set to see how far behind the previous enddate is from the current rows startdate. The lag function lets us do this:

select customer_id, 
       tariff_id, 
       lag(end_date) over (partition by customer_id order by start_date asc) prev_end, 
       start_date, 
       end_date  
from customer_tariff;

CUSTOMER_ID  TARIFF_ID PREV_END  START_DAT END_DATE
----------- ---------- --------- --------- ---------
          1          1           01-JAN-12 01-FEB-12
          1          2 01-FEB-12 02-FEB-12
          2          1           01-JAN-12 01-FEB-12
          2          2 01-FEB-12 01-MAR-12

Notice how the previous rows end date (or null if there is no previous row) is copied into the current row. We can use data in this form to group the rows together if the gap between the tariffs is 1 day or less. First we can use the case statement to assign a group to the start of each service period:

select customer_id,
       tariff_id,
       case 
         when start_date - lag(end_date) over (partition by customer_id order by start_date asc) <= 1 then
           null
         else
           rownum
         end grp,
         start_date,
         nvl(end_date, sysdate)
from customer_tariff
order by customer_id, start_date


CUSTOMER_ID  TARIFF_ID        GRP START_DAT END_DATE
----------- ---------- ---------- --------- ---------
          1          1          1 01-JAN-12 01-FEB-12
          1          2            02-FEB-12
          2          1          3 01-JAN-12 01-FEB-12
          2          2          4 01-MAR-12

Then we can use the max and min analytic functions to copy the group down through the results set:

select customer_id,
       start_date,
       end_date,
       max(grp) over (order by customer_id, start_date) grp
from (
  select customer_id,
         tariff_id,
         case 
           when start_date - lag(end_date) over (partition by customer_id order by start_date asc) <= 1 then
             null
           else
             rownum
           end grp,
           start_date,
           nvl(end_date, sysdate) end_date
  from customer_tariff
  order by customer_id, start_date
);


CUSTOMER_ID START_DAT END_DATE         GRP
----------- --------- --------- ----------
          1 01-JAN-12 01-FEB-12          1
          1 02-FEB-12 17-MAY-12          1
          2 01-JAN-12 01-FEB-12          3
          2 01-MAR-12 17-MAY-12          4

The last query assigned an arbitary group_id to rows that are considered continuous, and we can use that with some standard SQL to collapse continuous rows into a single row in our final results set, giving the start and end dates of continuous service:

select customer_id,
       min(start_date) start_date,
       max(end_date) end_date
from
(
  select customer_id,
         start_date,
         end_date,
         max(grp) over (order by customer_id, start_date) grp
  from (
    select customer_id,
           tariff_id,
           case 
             when start_date - lag(end_date) over (partition by customer_id order by start_date asc) <= 1 then
               null
             else
               rownum
             end grp,
             start_date,
             nvl(end_date, sysdate) end_date
    from customer_tariff
  )
)
group by customer_id, grp
order by customer_id, start_date;

CUSTOMER_ID START_DAT END_DATE
----------- --------- ---------
          1 01-JAN-12 17-MAY-12
          2 01-JAN-12 01-FEB-12
          2 01-MAR-12 17-MAY-12

From here, it is pretty trivial to figure out how long a customer has had continuous service for, and hence be given a loyalty award.

This example is a little contrived, but it is surprising how often it comes up in various forms. Next time you think about needing to collapse multiple rows into one, have a thing about how analytic functions might be able to help.

blog comments powered by Disqus