A collection of tutorials, code and tools to help you get better using Oracle
17 May 2012
Every now and then a problem appears that seems impossible to solve in a single SQL query, but with Oracle Analytic functions, SQL can be more powerful that you think.
Say we have a table containing tariff start and end dates for a customer, eg:
create table customer_tariff (
customer_id integer,
tariff_id integer,
start_date date,
end_date date
);
Lets pretend we want to implement a loyalty scheme, and the rules say that we want to find all customers who have had continuous service for the past year. However, if a customer changes tariff, the old tariff is ended on a day, and the new tariff starts the next day. A break in service is consider as a gap bigger than 1 day between two tariffs.
We need some test data to play with, so create a customer that has had two tariffs - note how the second tariff starts the day after the first one ends.
insert into customer_tariff values (
1, 1, to_date('20120101', 'YYYYMMDD'), to_date('20120201', 'YYYYMMDD'));
insert into customer_tariff values (
1, 2, to_date('20120202', 'YYYYMMDD'), null);
Now create a second customer who has a break between the tariff changes. Maybe they canceled their account and then reopened it a month later:
insert into customer_tariff values (
2, 1, to_date('20120101', 'YYYYMMDD'), to_date('20120201', 'YYYYMMDD'));
insert into customer_tariff values (
2, 2, to_date('20120301', 'YYYYMMDD'), null);
In our example, customer 1 has had continuous service since 1st January, but customer 2 has a break in service - the question is, how can we identify all the customers that fall into the continuous category in a single SQL query?
In the general case, there can be lots of tariff changes, to joining the table to itself is not going to help. To solve this problem, we need to be able to look backwards in the result set to see how far behind the previous enddate is from the current rows startdate. The lag function lets us do this:
select customer_id,
tariff_id,
lag(end_date) over (partition by customer_id order by start_date asc) prev_end,
start_date,
end_date
from customer_tariff;
CUSTOMER_ID TARIFF_ID PREV_END START_DAT END_DATE
----------- ---------- --------- --------- ---------
1 1 01-JAN-12 01-FEB-12
1 2 01-FEB-12 02-FEB-12
2 1 01-JAN-12 01-FEB-12
2 2 01-FEB-12 01-MAR-12
Notice how the previous rows end date (or null if there is no previous row) is copied into the current row. We can use data in this form to group the rows together if the gap between the tariffs is 1 day or less. First we can use the case statement to assign a group to the start of each service period:
select customer_id,
tariff_id,
case
when start_date - lag(end_date) over (partition by customer_id order by start_date asc) <= 1 then
null
else
rownum
end grp,
start_date,
nvl(end_date, sysdate)
from customer_tariff
order by customer_id, start_date
CUSTOMER_ID TARIFF_ID GRP START_DAT END_DATE
----------- ---------- ---------- --------- ---------
1 1 1 01-JAN-12 01-FEB-12
1 2 02-FEB-12
2 1 3 01-JAN-12 01-FEB-12
2 2 4 01-MAR-12
Then we can use the max and min analytic functions to copy the group down through the results set:
select customer_id,
start_date,
end_date,
max(grp) over (order by customer_id, start_date) grp
from (
select customer_id,
tariff_id,
case
when start_date - lag(end_date) over (partition by customer_id order by start_date asc) <= 1 then
null
else
rownum
end grp,
start_date,
nvl(end_date, sysdate) end_date
from customer_tariff
order by customer_id, start_date
);
CUSTOMER_ID START_DAT END_DATE GRP
----------- --------- --------- ----------
1 01-JAN-12 01-FEB-12 1
1 02-FEB-12 17-MAY-12 1
2 01-JAN-12 01-FEB-12 3
2 01-MAR-12 17-MAY-12 4
The last query assigned an arbitary group_id to rows that are considered continuous, and we can use that with some standard SQL to collapse continuous rows into a single row in our final results set, giving the start and end dates of continuous service:
select customer_id,
min(start_date) start_date,
max(end_date) end_date
from
(
select customer_id,
start_date,
end_date,
max(grp) over (order by customer_id, start_date) grp
from (
select customer_id,
tariff_id,
case
when start_date - lag(end_date) over (partition by customer_id order by start_date asc) <= 1 then
null
else
rownum
end grp,
start_date,
nvl(end_date, sysdate) end_date
from customer_tariff
)
)
group by customer_id, grp
order by customer_id, start_date;
CUSTOMER_ID START_DAT END_DATE
----------- --------- ---------
1 01-JAN-12 17-MAY-12
2 01-JAN-12 01-FEB-12
2 01-MAR-12 17-MAY-12
From here, it is pretty trivial to figure out how long a customer has had continuous service for, and hence be given a loyalty award.
This example is a little contrived, but it is surprising how often it comes up in various forms. Next time you think about needing to collapse multiple rows into one, have a thing about how analytic functions might be able to help.