SQL – Combining Overlapping Date Rows

8 12 2009

December 7, 2009

Some time ago the following question appeared on the comp.databases.oracle.server Usenet group:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/aff40e7ccf09d9cc

On Oracle 10G, lets say I have a table that looks like this:

create table t1 (key varchar2(10), start_dtm date, end_dtm date);

and I have data some of which over laps:

insert into t1 values ('1234', to_date('01/01/2001', 'DD/MM/YYYY'), to_date('30/03/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/02/2001', 'DD/MM/YYYY'), to_date('30/03/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('15/02/2001', 'DD/MM/YYYY'), to_date('15/04/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/07/2001', 'DD/MM/YYYY'), to_date('30/12/2001', 'DD/MM/YYYY'));
Ie:

1st Jan - 30th March
    2nd Feb - 30th March
       15th Feb - 15th April

1st July - 30th Dec

I want to return only two rows from the 4 that look like

1st jan - 15th April
1st July - 30th Dec

ie – if the dates are continuous, squash all the continuous rows into 1 with the min start and max end date but if there is a gap it has to become a second row.  There can be any number of rows and over laps etc. I reckon this can be done with some analytics trickery, but I just cannot seem to figure out how.

Any ideas?

I provided the following solution:

First, we experiment with LAG so that we are able to examine the previous row when sorted by START_DTM and END_DTM:

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM,
  LEAD(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
  LEAD(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
  T1
ORDER BY
  T1.START_DTM,
  T1.END_DTM;

KEY        START_DTM END_DTM   M_START_D M_END_DTM
---------- --------- --------- --------- ---------
1234       01-JAN-01 30-MAR-01 01-FEB-01 30-MAR-01
1234       01-FEB-01 30-MAR-01 15-FEB-01 15-APR-01
1234       15-FEB-01 15-APR-01 01-JUL-01 30-DEC-01
1234       01-JUL-01 30-DEC-01

If we now slide the above into an inline view and see if the START_DTM is greater than M_END_DTM (the previous row’s END_DTM), we may have found a new series of dates (note, I just noticed that this may have problems if there are two rows with discrete date ranges that fall entirely within a third, larger date range), we will output 1 if true, or 0 if false:

SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM);

KEY        START_DTM END_DTM   M_START_D M_END_DTM          C
---------- --------- --------- --------- --------- ----------
1234       01-JAN-01 30-MAR-01                              0
1234       01-FEB-01 30-MAR-01 01-JAN-01 30-MAR-01          0
1234       15-FEB-01 15-APR-01 01-FEB-01 30-MAR-01          0
1234       01-JUL-01 30-DEC-01 15-FEB-01 15-APR-01          1

In the above, a 1 is output whenever there is a jump in the date range – if we use the COUNT analytical function to create a running count of the 1s and slide the above into an inline view, we actually create a column that may be used for grouping:

SELECT
  KEY,
  START_DTM START_DTM,
  END_DTM END_DTM,
  SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM));

KEY        START_DTM END_DTM            G
---------- --------- --------- ----------
1234       01-JAN-01 30-MAR-01          0
1234       01-FEB-01 30-MAR-01          0
1234       15-FEB-01 15-APR-01          0
1234       01-JUL-01 30-DEC-01          1

We are now able to group on the column G by again sliding the above into an inline view:

SELECT
  KEY,
  MIN(START_DTM) START_DTM,
  MAX(END_DTM) END_DTM
FROM
(SELECT
  KEY,
  START_DTM START_DTM,
  END_DTM END_DTM,
  SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM)))
GROUP BY
  KEY,
  G;

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 15-APR-01
1234       01-JUL-01 30-DEC-01

Let’s add a little more data to see what happens:

insert into t1 values ('1234', to_date('10/10/2001', 'DD/MM/YYYY'), to_date('29/12/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('31/12/2001', 'DD/MM/YYYY'), to_date('15/01/2002', 'DD/MM/YYYY'));

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 15-APR-01
1234       01-JUL-01 30-DEC-01
1234       31-DEC-01 15-JAN-02

Note the addition of the last row in the output – you may want to determine if that should be part of the previous group of date.  You should be able to fix this by adding or subtracting a date in the SIGN( ) function.

—-

I originally thought that you were trying to eliminate this row from the output:

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-FEB-01 30-MAR-01

For the above, there are several methods:

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1
WHERE
  (T1.KEY,
  T1.START_DTM,
  T1.END_DTM) NOT IN (
    SELECT
      T1.KEY,
      T1.START_DTM,
      T1.END_DTM
    FROM
      T1,
      T1 T2
    WHERE
      T1.KEY=T2.KEY
      AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
      AND T1.END_DTM BETWEEN  T2.START_DTM AND T2.END_DTM
      AND (
        T1.START_DTM<>T2.START_DTM
        OR T1.END_DTM<>T2.END_DTM));

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

 

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1,
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM
  FROM
    T1,
    T1 T2
  WHERE
    T1.KEY=T2.KEY
    AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
    AND T1.END_DTM BETWEEN  T2.START_DTM AND T2.END_DTM
    AND (
      T1.START_DTM<>T2.START_DTM
      OR T1.END_DTM<>T2.END_DTM)) T3
WHERE
  T1.KEY=T3.KEY(+)
  AND T1.START_DTM=T3.START_DTM(+)
  AND T1.END_DTM=T3.END_DTM(+)
  AND T3.KEY IS NULL;

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

 

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1
MINUS
SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1,
  T1 T2
WHERE
  T1.KEY=T2.KEY
  AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
  AND T1.END_DTM BETWEEN  T2.START_DTM AND T2.END_DTM
  AND (
    T1.START_DTM<>T2.START_DTM
    OR T1.END_DTM<>T2.END_DTM);

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

 

SELECT
  KEY,
  START_DTM,
  END_DTM
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    MIN(T1.START_DTM) OVER (PARTITION BY T1.KEY, T1.END_DTM) M_START_DTM,
    MAX(T1.END_DTM) OVER  (PARTITION BY T1.KEY, T1.START_DTM) M_END_DTM
  FROM
    T1)
WHERE
  START_DTM=M_START_DTM
  AND END_DTM=M_END_DTM;

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko provided the following solution:

Maybe, something like this

SQL> select key, min(start_dtm) start_dtm, max(end_dtm) end_dtm
   2    from (select key,
   3                 start_dtm,
   4                 end_dtm,
   5                 sum(interval_change) over(partition by key order by end_dtm, start_dtm) interval_no
   6            from (select key,
   7                         start_dtm,
   8                         end_dtm,
   9                         case
  10                           when (lag(end_dtm)
  11                                 over(partition by key order by end_dtm,
  12                                      start_dtm) - start_dtm) > 0 then
  13                            0
  14                           else
  15                            1
  16                         end interval_change
  17                    from t1))
  18   group by key, interval_no
  19   order by key, interval_no
  20  ;

KEY        START_DTM            END_DTM
---------- -------------------- --------------------
1234       01-Jan-2001          15-Apr-2001
1234       01-Jul-2001          30-Dec-2001

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The original poster followed up with the following solution that he developed:

Using the same table as above and this data:

create table t1 (key varchar2(10), start_dtm date, end_dtm date);

insert into t1 values( 9999, to_date('01/01/2008', 'DD/MM/YYYY'), to_date('01/06/2008', 'DD/MM/YYYY'));
insert into t1 values( 9999, to_date('01/02/2008', 'DD/MM/YYYY'), to_date('01/06/2008', 'DD/MM/YYYY'));
insert into t1 values( 9999, to_date('01/02/2008', 'DD/MM/YYYY'), to_date('01/05/2008', 'DD/MM/YYYY'));
insert into t1 values( 9999, to_date('01/02/2008', 'DD/MM/YYYY'), to_date('01/07/2008', 'DD/MM/YYYY'));

This gives a set of rows like:

|------------------------------------------|
   |---------------------------------------|
   |---------------------|
      |------------------------------------------|

And we expect this to turn into a single row like

01/01/08 |----------------------------------------------| 01/07/2008

 

insert into t1 values( 9999, to_date('01/08/2008', 'DD/MM/YYYY'), null);
insert into t1 values( 9999, to_date('01/09/2008', 'DD/MM/YYYY'), to_date('01/10/2008', 'DD/MM/YYYY'));

This gives a set of rows like:

|--------------------------------------->
   |------------|

And we want an row that looks like

01/08/2008 |----------------------------> (null end date)

I think this query does it:

select key, min(start_dtm), max(end_dtm)
from
(
  select key, start_dtm, end_dtm, max (grp) over (partition by key order by start_dtm asc) grp2
  from
  (
    SELECT key,
           start_dtm,
           NVL (end_dtm, '31-dec-4712') end_dtm,
           case
             when
               (start_dtm not between lag(start_dtm) over (partition by key order by start_dtm asc)
                   and nvl( lag(end_dtm) over (partition by key order by start_dtm asc), '31-dec-4712') )
                or lag(start_dtm) over (partition by key order by start_dtm asc) is null
                then
                  row_number() over (partition by key order by start_dtm asc)
             end grp
    FROM t1
    order by key, start_dtm asc
  ) a
) b
group by key, grp2

GRP                              MIN(START MAX(END_D
--------------------------------- --------- ---------
09999                             01-JAN-08 01-JUL-08
09999                             01-AUG-08 31-DEC-12

Basically I said, if you order the rows by increasing start date, then if rows overlap, the current start_dtm must be between the previous start_dtm and end_dtm – if there is no previous start_dtm or they don’t overlap, then its a new group.

The innermost select outputs something like:

KEY                               START_DTM END_DTM          GRP
--------------------------------- --------- --------- ----------
09999                             01-JAN-08 01-JUN-08          1
09999                             01-FEB-08 01-JUL-08
09999                             01-FEB-08 01-JUN-08
09999                             01-FEB-08 01-MAY-08
09999                             01-AUG-08 31-DEC-12          5
09999                             01-SEP-08 01-OCT-08

Then we use analytics again to fill in the blanks in the enclosing query (b):

KEY                               START_DTM END_DTM         GRP2
--------------------------------- --------- --------- ----------
09999                             01-JAN-08 01-JUN-08          1
09999                             01-FEB-08 01-JUL-08          1
09999                             01-FEB-08 01-JUN-08          1
09999                             01-FEB-08 01-MAY-08          1
09999                             01-AUG-08 31-DEC-12          5
09999                             01-SEP-08 01-OCT-08          5

and then simply group by KEY, GRP in the outer query to get the result.

I am glad I went through the pain of figuring this out (took me a good 90 minutes) as I really feel like I get analytics now.  It also allowed me to replace a temporary table and several 100 lines of PLSQL that was performing rather poorly and written by someone who doesn’t get analytics!


Actions

Information

10 responses

16 07 2014
celauritz

Would you need to change anything if the dates didn’t overlap, but were sequential?
Like Start Date 1/1/14, end date 1/31/14, Start Date 2/1/14 End Date 2/11/14.
I would like this bounced to one record: Start Date 1/1/14 End Date 2/11/14.

16 07 2014
Charles Hooper

celauritz,
It has been several years since I created this example. My first thought is that you need to create the overlap – you could do that by adding one day to the ending date in the formula, so the 1/31/14 ending date would appear as 2/1/14 in the formula, thus causing an overlap situation.

Let’s set up a test table using the SQL statements at the start of this article (note that the dates in those SQL statements are in the format of dd/mm/yyyy, rather than mm/dd/yyyy):

create table t1 (key varchar2(10), start_dtm date, end_dtm date);
 
insert into t1 values ('1234', to_date('01/01/2014', 'DD/MM/YYYY'), to_date('31/01/2014', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/02/2014', 'DD/MM/YYYY'), to_date('11/02/2014', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('15/02/2014', 'DD/MM/YYYY'), to_date('15/04/2014', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/07/2014', 'DD/MM/YYYY'), to_date('30/12/2014', 'DD/MM/YYYY'));
 
COMMIT;

Starting SQL statement, let’s see what dates are in the table:

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM,
  LEAD(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
  LEAD(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
  T1
ORDER BY
  T1.START_DTM,
  T1.END_DTM;
 
KEY        START_DTM END_DTM   M_START_D M_END_DTM
---------- --------- --------- --------- ---------
1234       01-JAN-14 31-JAN-14 01-FEB-14 11-FEB-14
1234       01-FEB-14 11-FEB-14 15-FEB-14 15-APR-14
1234       15-FEB-14 15-APR-14 01-JUL-14 30-DEC-14
1234       01-JUL-14 30-DEC-14

You want the first two rows to be combined. Let’s see what happens when the next SQL statement in the blog article is executed:

SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM);
 
KEY        START_DTM END_DTM   M_START_D M_END_DTM          C
---------- --------- --------- --------- --------- ----------
1234       01-JAN-14 31-JAN-14                              0
1234       01-FEB-14 11-FEB-14 01-JAN-14 31-JAN-14          1
1234       15-FEB-14 15-APR-14 01-FEB-14 11-FEB-14          1
1234       01-JUL-14 30-DEC-14 15-FEB-14 15-APR-14          1

The notes in the article state: “In the above, a 1 is output whenever there is a jump in the date range – if we use the COUNT analytical function to create a running count of the 1s and slide the above into an inline view, we actually create a column that may be used for grouping.” Based on that note, we need to fix the SQL statement above so that a 0 is output on the second row of the output. 1 must be added to the previous end date (M_END_DTM) to allow it to overlap the start date:

SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM + 1,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM);
 
KEY        START_DTM END_DTM   M_START_D M_END_DTM          C
---------- --------- --------- --------- --------- ----------
1234       01-JAN-14 31-JAN-14                              0
1234       01-FEB-14 11-FEB-14 01-JAN-14 31-JAN-14          0
1234       15-FEB-14 15-APR-14 01-FEB-14 11-FEB-14          1
1234       01-JUL-14 30-DEC-14 15-FEB-14 15-APR-14          1

Next stage, carrying that one change forward:

SELECT
  KEY,
  START_DTM START_DTM,
  END_DTM END_DTM,
  SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM + 1,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM));
 
KEY        START_DTM END_DTM            G
---------- --------- --------- ----------
1234       01-JAN-14 31-JAN-14          0
1234       01-FEB-14 11-FEB-14          0
1234       15-FEB-14 15-APR-14          1
1234       01-JUL-14 30-DEC-14          2

Next stage, again with the one change carried forward:

SELECT
  KEY,
  MIN(START_DTM) START_DTM,
  MAX(END_DTM) END_DTM
FROM
(SELECT
  KEY,
  START_DTM START_DTM,
  END_DTM END_DTM,
  SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM + 1,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM)))
GROUP BY
  KEY,
  G;
 
KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-14 11-FEB-14
1234       15-FEB-14 15-APR-14
1234       01-JUL-14 30-DEC-14

I believe the above output is what you requested.

20 08 2014
Surbhi

Hello Everyone,

Could you please help me in implementing below thing :

Input :

Key Date
48 8/25/2013
48 9/30/2013
48 11/1/2013
48 12/4/2013
48 1/4/2014
48 3/4/2014

Output :

Key Month Range
48 Aug 13-Sep 13
48 Nov 13-Jan 14
48 Mar 14

Need the month range for which the client has record in the table.

1000270748 12/4/2013

20 08 2014
Charles Hooper

Let’s set up a test table:

CREATE TABLE T1 (
  KEY NUMBER,
  D DATE);
 
INSERT INTO T1 VALUES (48, TO_DATE('8/25/2013','MM/DD/YYYY'));
INSERT INTO T1 VALUES (48, TO_DATE('9/30/2013','MM/DD/YYYY'));
INSERT INTO T1 VALUES (48, TO_DATE('11/1/2013','MM/DD/YYYY'));
INSERT INTO T1 VALUES (48, TO_DATE('12/4/2013','MM/DD/YYYY'));
INSERT INTO T1 VALUES (48, TO_DATE('1/4/2014','MM/DD/YYYY'));
INSERT INTO T1 VALUES (48, TO_DATE('3/4/2014','MM/DD/YYYY'));
INSERT INTO T1 VALUES (49, TO_DATE('4/25/2014','MM/DD/YYYY'));
INSERT INTO T1 VALUES (49, TO_DATE('5/25/2014','MM/DD/YYYY'));

The first thought that I have is to used the TRUNC function to find the first of the month for the D column (column TD below). We will then add one month to that date (column LTD) which may be compared to the first of the month in the next of the sorted rows. The LEAD function is used to retrieve the date from the next row (column ND), and the TRUNC function is applied to the date of the next row to obtain the first date of that month (column TND):

SELECT
  KEY,
  D,
  TRUNC(D,'MM') TD,
  ADD_MONTHS(TRUNC(D,'MM'),1) LTD,
  LEAD(D,1) OVER (PARTITION BY KEY ORDER BY D) ND,
  LEAD(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TND
FROM
  T1
ORDER BY
  KEY,
  D;
 
KEY D         TD        LTD       ND        TND
--- --------- --------- --------- --------- ---------
 48 25-AUG-13 01-AUG-13 01-SEP-13 30-SEP-13 01-SEP-13
 48 30-SEP-13 01-SEP-13 01-OCT-13 01-NOV-13 01-NOV-13
 48 01-NOV-13 01-NOV-13 01-DEC-13 04-DEC-13 01-DEC-13
 48 04-DEC-13 01-DEC-13 01-JAN-14 04-JAN-14 01-JAN-14
 48 04-JAN-14 01-JAN-14 01-FEB-14 04-MAR-14 01-MAR-14
 48 04-MAR-14 01-MAR-14 01-APR-14
 49 25-APR-14 01-APR-14 01-MAY-14 25-MAY-14 01-MAY-14
 49 25-MAY-14 01-MAY-14 01-JUN-14

Now, we are able to slide the above into an inline view and determine if the first of the month of the date on the next row is after the value in the LTD column (this would mean that there is a gap in the months):

SELECT
  KEY,
  D,
  TD,
  TND,
  DECODE(SIGN(TND - LTD),1,1,0) C
FROM
  (SELECT
    KEY,
    D,
    TRUNC(D,'MM') TD,
    ADD_MONTHS(TRUNC(D,'MM'),1) LTD,
    LEAD(D,1) OVER (PARTITION BY KEY ORDER BY D) ND,
    LEAD(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TND
  FROM
    T1)
ORDER BY
  KEY,
  D;
 
KEY D         TD        TND                C
--- --------- --------- --------- ----------
 48 25-AUG-13 01-AUG-13 01-SEP-13          0
 48 30-SEP-13 01-SEP-13 01-NOV-13          1
 48 01-NOV-13 01-NOV-13 01-DEC-13          0
 48 04-DEC-13 01-DEC-13 01-JAN-14          0
 48 04-JAN-14 01-JAN-14 01-MAR-14          1
 48 04-MAR-14 01-MAR-14                    0
 49 25-APR-14 01-APR-14 01-MAY-14          0
 49 25-MAY-14 01-MAY-14                    0

The above worked, but we have a problem if we want to use the approach that is demonstrated in this blog article. We need a 1 to appear in column C at the start of every sequence of dates, not at the end of the sequence of dates. We have hit a dead end with this approach.

Trying again, this time using LAG to look at the date from the previous row, rather than LEAD to look at the date from the next row in the ordered set of rows:

SELECT
  KEY,
  D,
  TRUNC(D,'MM') TD,
  ADD_MONTHS(TRUNC(D,'MM'),-1) LTD,
  LAG(D,1) OVER (PARTITION BY KEY ORDER BY D) PD,
  LAG(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TPD
FROM
  T1
ORDER BY
  KEY,
  D;
 
KEY D         TD        LTD       PD        TPD
--- --------- --------- --------- --------- ---------
 48 25-AUG-13 01-AUG-13 01-JUL-13
 48 30-SEP-13 01-SEP-13 01-AUG-13 25-AUG-13 01-AUG-13
 48 01-NOV-13 01-NOV-13 01-OCT-13 30-SEP-13 01-SEP-13
 48 04-DEC-13 01-DEC-13 01-NOV-13 01-NOV-13 01-NOV-13
 48 04-JAN-14 01-JAN-14 01-DEC-13 04-DEC-13 01-DEC-13
 48 04-MAR-14 01-MAR-14 01-FEB-14 04-JAN-14 01-JAN-14
 49 25-APR-14 01-APR-14 01-MAR-14
 49 25-MAY-14 01-MAY-14 01-APR-14 25-APR-14 01-APR-14

We should now be able to generate a column C that has the value 1 at the start of a new sequence of TD dates. Sliding the above into an inline view:

SELECT
  KEY,
  D,
  TD,
  TPD,
  DECODE(SIGN(LTD-TPD),1,1,0) C
FROM
  (SELECT
    KEY,
    D,
    TRUNC(D,'MM') TD,
    ADD_MONTHS(TRUNC(D,'MM'),-1) LTD,
    LAG(D,1) OVER (PARTITION BY KEY ORDER BY D) PD,
    LAG(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TPD
  FROM
    T1)
ORDER BY
  KEY,
  D;
 
KEY D         TD        TPD                C
--- --------- --------- --------- ----------
 48 25-AUG-13 01-AUG-13                    0
 48 30-SEP-13 01-SEP-13 01-AUG-13          0
 48 01-NOV-13 01-NOV-13 01-SEP-13          1
 48 04-DEC-13 01-DEC-13 01-NOV-13          0
 48 04-JAN-14 01-JAN-14 01-DEC-13          0
 48 04-MAR-14 01-MAR-14 01-JAN-14          1
 49 25-APR-14 01-APR-14                    0
 49 25-MAY-14 01-MAY-14 01-APR-14          0

Next, we are able to use the SUM analytic function with an ORDER BY clause to assign the rows that are in the same month date sequence, the same grouping number (column SC):

SELECT
  KEY,
  TD,
  SUM(C) OVER (PARTITION BY KEY ORDER BY TD) SC
FROM
  (SELECT
    KEY,
    D,
    TD,
    TPD,
    DECODE(SIGN(LTD-TPD),1,1,0) C
  FROM
    (SELECT
      KEY,
      D,
      TRUNC(D,'MM') TD,
      ADD_MONTHS(TRUNC(D,'MM'),-1) LTD,
      LAG(D,1) OVER (PARTITION BY KEY ORDER BY D) PD,
      LAG(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TPD
    FROM
      T1))
ORDER BY
  KEY,
  TD;
  
KEY TD                SC
--- --------- ----------
 48 01-AUG-13          0
 48 01-SEP-13          0
 48 01-NOV-13          1
 48 01-DEC-13          1
 48 01-JAN-14          1
 48 01-MAR-14          2
 49 01-APR-14          0
 49 01-MAY-14          0

Sliding the above into an inline view, we are able to group on columns KEY and SC to find the start and end month for each month date sequence:

SELECT
  KEY,
  MIN(TD) MINTD,
  MAX(TD) MAXTD
FROM
  (SELECT
    KEY,
    TD,
    SUM(C) OVER (PARTITION BY KEY ORDER BY TD) SC
  FROM
    (SELECT
      KEY,
      D,
      TD,
      TPD,
      DECODE(SIGN(LTD-TPD),1,1,0) C
    FROM
      (SELECT
        KEY,
        D,
        TRUNC(D,'MM') TD,
        ADD_MONTHS(TRUNC(D,'MM'),-1) LTD,
        LAG(D,1) OVER (PARTITION BY KEY ORDER BY D) PD,
        LAG(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TPD
      FROM
        T1)))
GROUP BY
  KEY,
  SC
ORDER BY
  KEY,
  SC;
 
KEY MINTD     MAXTD
--- --------- ---------
 48 01-AUG-13 01-SEP-13
 48 01-NOV-13 01-JAN-14
 48 01-MAR-14 01-MAR-14
 49 01-APR-14 01-MAY-14

The above example still needs a bit of cleanup to remove unnecessary columns from the inline views and a bit of formatting to clean up the final output.

21 08 2014
Surbhi

Perfec.. Thank you so much for the solution.

9 09 2014
jamespee2014

Great implementation. You mention above an interesting case: “note, I just noticed that this may have problems if there are two rows with discrete date ranges that fall entirely within a third, larger date range”. Any ideas on how to handle that case?

10 09 2014
jamespee2014

Came up with an implementation using MAX(start_dtm) OVER (PARTITION BY key ORDER BY start_dtm ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) to see if the row would intersect any of the preceding range rather than LAG.

10 09 2014
Charles Hooper

James,
I have been a bit preoccupied with a couple of unrelated items, so I have not had a chance to put together a solution for the potential issue that you highlighted. Did the MAX(START_DTM) OVER (PARTITION BY … method that you mentioned solve the potential issue that I mentioned?

10 09 2014
jamespee2014

Charles,

No problem! I know this is an older post that has gotten some recent interest, so I definitely understand.

I do think the solution I put together with the PARTITION … ROWS BETWEEN UNBOUNDED … works. I reversed the logic from your original, in which you check to see if the current row’s start is not between the previous row’s start and end to create a new group, and instead, I check to see if the current row’s start date is greater than the MAX(start date) OVER (…) the preceding rows in the partition. If so, I don’t create a new group and have an else condition that creates the group.

I haven’t profiled it, so I don’t know if the plan would be different if I had to do that for the start and end, but I at least saved myself some typing. When I get a minute, I’ll try to put an implementation together that uses the schema in the article and put it up here and see if you agree that it solves the problem.

Thanks again.

11 09 2014
jamespee2014

Worked this up quickly:

If you create a period that is completely intersected by one period and another that intersects the first but not the second, this potentially causes an issue in using LAG() and evaluating only the previous period. For example (forgive the slight syntax differences, as I am developing on SQL Server, and am representing date in MM/DD/YYYY format):

insert into t1 values (’1234′, ’01/01/2001′, ’03/30/2001′);
insert into t1 values (’1234′, ’01/02/2001′, ’03/30/2001′);
insert into t1 values (’1234′, ’01/03/2001′, ’01/31/2001′);
insert into t1 values (’1234′, ’02/15/2001′, ’04/15/2001′);
insert into t1 values (’1234′, ’07/01/2001′, ’12/30/2001′);

The period Jan-3 to Jan 31 is entirely contained within the prior period but does not intersect the Feb-15 to Apr-15 period, which does intersect the Jan-1 to Mar-30 and the Jan-2 to Mar-30 period.

Using the implementation using LAG yields the following periods:

1234 2001-01-01 2001-03-30
1234 2001-02-15 2001-04-15
1234 2001-07-01 2001-12-30

Clearly, the Feb-15 period should be included in the prior period. So, changing this implementation to this:

select id, min(start_dtm), max(end_dtm)
from
(
select id, start_dtm, end_dtm, max (grp) over (partition by id order by start_dtm asc) grp2
from
(
SELECT id,
start_dtm,
isnull (end_dtm, ’31-dec-4712′) end_dtm,
case
when
(start_dtm not between min(start_dtm) over (partition by id order by start_dtm asc rows between unbounded preceding and 1 preceding)
and isnull( max(end_dtm) over (partition by id order by start_dtm asc rows between unbounded preceding and 1 preceding), ’31-dec-4712′) )
or lag(start_dtm) over (partition by id order by start_dtm asc) is null
then
row_number() over (partition by id order by start_dtm asc)
end grp
FROM t1
) a
) b
group by id, grp2

Yields the following expected result:

1234 2001-01-01 2001-04-15
1234 2001-07-01 2001-12-30

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 148 other followers

%d bloggers like this: