SQL – Filling in Gaps in the Source Data 2

8 12 2009

December 8, 2009

The following question recently appeared in the comp.databases.oracle.server group:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/d9c7420e0e36231d

Could anyone show me how to write a analytical query for my requirement.

Query:

SELECT distinct bug_when,
       login_name,
       SUM(incoming_count - outgoing_count)
        OVER (PARTITION BY login_name ORDER BY bug_when) AS OPEN
FROM RM_COMP_INCOM_OUTGO_BUGIDS_GT G,
     bz_components c,
     bz_profiles p
where  G.component_id= c.ID and
c.manager = p.userid
order by 2,1 desc;

Query returns data like this:

12/04/2009 ssmit 52
12/02/2009 ssmit 48
11/30/2009 ssmit 45
11/29/2009 ssmit 42

I want the data to be like this:

12/04/2009 ssmit 52
12/03/2009 ssmit 48 -- fill the gap with previous value
12/02/2009 ssmit 48
12/01/2009 ssmit 45
11/30/2009 ssmit 45
11/30/2009 ssmit 45
11/29/2009 ssmit 42

Table Desc:

CREATE GLOBAL TEMPORARY TABLE RM_COMP_INCOM_OUTGO_BUGIDS_GT
(
  BUG_WHEN        DATE,
  COMPONENT_ID    NUMBER,
  INCOMING_COUNT  NUMBER,
  OUTGOING_COUNT  NUMBER
);

Set up a test table for experimentation (note, I changed the value for test from 42 to 12 for extra clarity and added additional rows):

CREATE TABLE T1
(
  BUG_WHEN        DATE,
  COMPONENT_ID    VARCHAR2(10),
  INCOMING_COUNT  NUMBER,
  OUTGOING_COUNT  NUMBER
);

INSERT INTO T1 VALUES (TO_DATE('12/07/2009','MM/DD/YYYY'),'ssmit',60,60);
INSERT INTO T1 VALUES (TO_DATE('12/04/2009','MM/DD/YYYY'),'ssmit',52,52);
INSERT INTO T1 VALUES (TO_DATE('12/02/2009','MM/DD/YYYY'),'ssmit',48,48);
INSERT INTO T1 VALUES (TO_DATE('11/30/2009','MM/DD/YYYY'),'ssmit',45,45);
INSERT INTO T1 VALUES (TO_DATE('11/29/2009','MM/DD/YYYY'),'ssmit',42,42);
INSERT INTO T1 VALUES (TO_DATE('01/29/2009','MM/DD/YYYY'),'test',12,12);

COMMIT;

SELECT
  *
FROM
  T1;

BUG_WHEN  COMPONENT_ INCOMING_COUNT OUTGOING_COUNT
--------- ---------- -------------- --------------
07-DEC-09 ssmit                  60             60
04-DEC-09 ssmit                  52             52
02-DEC-09 ssmit                  48             48
30-NOV-09 ssmit                  45             45
29-NOV-09 ssmit                  42             42
29-JAN-09 test                   12             12

Let’s start with using the LEAD function to peek at the next row from the T1 table.

SELECT
  COMPONENT_ID,
  BUG_WHEN,
  LEAD(BUG_WHEN,1) OVER (PARTITION BY COMPONENT_ID ORDER BY BUG_WHEN) NEXT_BUG_WHEN,
  INCOMING_COUNT
FROM
  T1;

COMPONENT_ BUG_WHEN  NEXT_BUG_ INCOMING_COUNT
---------- --------- --------- --------------
ssmit      29-NOV-09 30-NOV-09             42
ssmit      30-NOV-09 02-DEC-09             45
ssmit      02-DEC-09 04-DEC-09             48
ssmit      04-DEC-09 07-DEC-09             52
ssmit      07-DEC-09                       60
test       29-JAN-09                       12

Above needs to be modified to permit ranges by subtracting 1 from the next date.  Also we need to fix the NULL values on the last row of each COMPONENT_ID grouping.

SELECT
  COMPONENT_ID,
  BUG_WHEN,
  NVL(LEAD(BUG_WHEN,1) OVER (PARTITION BY COMPONENT_ID ORDER BY BUG_WHEN)-1,BUG_WHEN) THROUGH_BUG_WHEN,
  NVL(LEAD(BUG_WHEN,1) OVER (PARTITION BY COMPONENT_ID ORDER BY BUG_WHEN)-1,BUG_WHEN)-BUG_WHEN RANGE_DAYS,
  INCOMING_COUNT
FROM
  T1;

COMPONENT_ BUG_WHEN  THROUGH_B RANGE_DAYS INCOMING_COUNT
---------- --------- --------- ---------- --------------
ssmit      29-NOV-09 29-NOV-09          0             42
ssmit      30-NOV-09 01-DEC-09          1             45
ssmit      02-DEC-09 03-DEC-09          1             48
ssmit      04-DEC-09 06-DEC-09          2             52
ssmit      07-DEC-09 07-DEC-09          0             60
test       29-JAN-09 29-JAN-09          0             12

Now, all we need to do is find a way to generate the missing dates between the BUG_WHEN date and the THROUGH_BUG_WHEN date.  If we slide the above into an inline view and join to an inline view with a counter generator, we end up with the following:

SELECT
  T1.COMPONENT_ID,
  T1.BUG_WHEN+V1.C BUG_WHEN,
  T1.INCOMING_COUNT
FROM
  (SELECT
    COMPONENT_ID,
    BUG_WHEN,
    NVL(LEAD(BUG_WHEN,1) OVER (PARTITION BY COMPONENT_ID ORDER BY BUG_WHEN)-1,BUG_WHEN) THROUGH_BUG_WHEN,
    NVL(LEAD(BUG_WHEN,1) OVER (PARTITION BY COMPONENT_ID ORDER BY BUG_WHEN)-1,BUG_WHEN)-BUG_WHEN RANGE_DAYS,
    INCOMING_COUNT
  FROM
    T1) T1,
  (SELECT
    ROWNUM-1 C
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) V1
WHERE
  V1.C<=T1.RANGE_DAYS
ORDER BY
  T1.COMPONENT_ID,
  T1.BUG_WHEN+V1.C DESC;

COMPONENT_ BUG_WHEN  INCOMING_COUNT
---------- --------- --------------
ssmit      07-DEC-09             60
ssmit      06-DEC-09             52
ssmit      05-DEC-09             52
ssmit      04-DEC-09             52
ssmit      03-DEC-09             48
ssmit      02-DEC-09             48
ssmit      01-DEC-09             45
ssmit      30-NOV-09             45
ssmit      29-NOV-09             42
test       29-JAN-09             12




SQL – Outer Joins, Inline Views, and DENSE_RANK

8 12 2009

December 7, 2009

Some time ago the following question appeared in the comp.databases.oracle.server Usenet group:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/a2112d7bcae62df5

 Learning SQL – will appreciate any help.
 Here is the case:

Two tables with 0..n relationship

TableA                  TableB
-----------             --------------
ColA                    ColB
                        ColA_FK
                        Col_C

Data

TableA.ColA
====================
 1
 2

Table B

ColB   ColA_FK    Col_C
 11      2              12345
 12      2              99999

 Resultset:

 --------------------------------------
 ColA   ColB      Col_C
 --------------------------------------
 1       -        -
 2       12       99999

In case no data exists in TableB for ColA=11,  1 from TableA shows up in result without any data from TableB
 However, in case of 2 from ColA,  we want to record from TableB with Col_C = 99999

I can do the outer join – however, how do I limit it so it picks only the row with 9999?

Here is the test case:

create table tableA (colA number not null);

create table tableB (colB number not null, colA_fk number_not null, colC number not null); 

insert into tableA values (1);
insert into tableA values (2); 

commit;

insert into tableB values (11,2,12345);
insert into tableB values (12,2,99999); 

commit;

 

select tableA.colA, tableB.colA_fk, tableB.colC from tableA left join
tableB on tableA.colA = tableB.colA_fk where tableB.colC = 99999;

--------------------------------------
  ColA   ColA_FK      ColC
  --------------------------------------
  2       12       99999

 I would also like to display another row:

 1       -    -

 in there.

Using the suggestion offered by Pat, with a small modification:

SELECT
  TABLEA.COLA,
  TABLEB.COLA_FK,
  TABLEB.COLC
FROM
  TABLEA
LEFT JOIN
  TABLEB
ON
  TABLEA.COLA=TABLEB.COLA_FK
  AND TABLEB.COLC=99999;

      COLA    COLA_FK       COLC
---------- ---------- ----------
         2          2      99999
         1

This is the way I would commonly set up a SQL statement to meet a similar requirements:

SELECT
  TABLEA.COLA,
  TABLEB.COLA_FK,
  TABLEB.COLC
FROM
  TABLEA,
  TABLEB
WHERE
  TABLEA.COLA=TABLEB.COLA_FK(+)
  AND TABLEB.COLC(+)=99999;

      COLA    COLA_FK       COLC
---------- ---------- ----------
         1
         2          2      99999

Will the value of interest always be 99999, or will it be the highest value with a matching COLA_FK?  If you are looking for the highest value, please supply the four digit version of Oracle that you are using (10.2.0.2, 11.1.0.6, 8.1.7.3, etc.).

The original poster replied:

Aha – that is very cool.  I had figured out the

TABLEA.COLA=TABLEB.COLA_FK(+)

but did not know I could do this:

AND TABLEB.COLC(+)=99999;

As a matter of fact you are right – how did you guess – in my cases, the interest of value would be max of whatever is in TABLEB.COLC – if the row with 99999 does not exist, then the sql returns the fow with data in COLC=12345.
 Also, working with ORACLE version 9.2.0.8.

It looks like the DENSE_RANK analytical function, an inline view, and an outer join are required.

First, let’s introduce a little more data to make certain that we cannot query for a specific value of COLC and return the expected
results:

INSERT INTO TABLEA VALUES (3);
INSERT INTO TABLEB VALUES (13,2,111111);
INSERT INTO TABLEB VALUES (13,3,11);

Next, we try an experiment with the DENSE_RANK function to separate the rows by the value of COLA_FK (caused by the PARTITION BY directive) and rank the values sorted from highest to lowest (caused by the DESC directive):

SELECT
  COLB,
  COLA_FK,
  COLC,
  DENSE_RANK() OVER (PARTITION BY COLA_FK ORDER BY COLC DESC) DR
FROM
  TABLEB;

      COLB    COLA_FK       COLC         DR
---------- ---------- ---------- ----------
        13          2     111111          1
        12          2      99999          2
        11          2      12345          3
        13          3         11          1

We are only interested in the rows with DR = 1, so we need a way to eliminate the unnecessary rows.  If we slide the above SQL statement into an inline view, we are able to add a WHERE clause that restricts the results to the rows containing the highest COLC value per COLA_FK value.  We can then alias the inline view (as B), and join it to TABLEA as before:

SELECT
  TABLEA.COLA,
  B.COLA_FK,
  B.COLC
FROM
  TABLEA,
  (SELECT
    COLB,
    COLA_FK,
    COLC,
    DENSE_RANK() OVER (PARTITION BY COLA_FK ORDER BY COLC DESC) DR
  FROM
    TABLEB) B
WHERE
  TABLEA.COLA=B.COLA_FK(+)
  AND B.DR(+)=1;

      COLA    COLA_FK       COLC
---------- ---------- ----------
         2          2     111111
         3          3         11
         1 




Inner and Outer Join Examples

8 12 2009

December 7, 2009

Some time ago someone asked the following in a forum:
http://groups.google.com/group/oracle_dba_experts/browse_thread/thread/41863b746cbef8d2

Can someone please help me in understanding inner and outer joins?  I seem to have a block when it comes to understanding this concept.

 Short answer:
* Inner join – the value must be in BOTH tables
* Outer join – the value must be in at least ONE table

Assume that the following table and columns in those tables, and data in those columns exist:

TABLE_1.ANIMAL 
  COW 
  PIG 
  ZEBRA 
  SHARK 
  ROOSTER 
  LION

 

TABLE_2.ANIMAL 
  COW 
  PIG 
  DOG

 

TABLE_3.ANIMAL 
  ZEBRA 
  LION 
  TIGER

Inner join TABLE_2 and TABLE_3

SELECT 
  T2.ANIMAL, 
  T3.ANIMAL 
FROM 
  TABLE_2 T2, 
  TABLE_3 T3 
WHERE 
  T2.ANIMAL=T3.ANIMAL; 
 
(no results) 
-------------------  

(Left) Outer join TABLE_2 and TABLE_3, include all rows from TABLE_2

SELECT 
  T2.ANIMAL, 
  T3.ANIMAL 
FROM 
  TABLE_2 T2, 
  TABLE_3 T3 
WHERE 
  T2.ANIMAL=T3.ANIMAL(+); 
 
T2.ANIMAL      T3.ANIMAL 
COW            (null) 
PIG            (null) 
DOG            (null) 
-------------------

Inner join TABLE_1 and TABLE_2

SELECT 
  T1.ANIMAL, 
  T2.ANIMAL 
FROM 
  TABLE_1 T1, 
  TABLE_2 T2 
WHERE 
  T1.ANIMAL=T2.ANIMAL; 
 
T1.ANIMAL         T2.ANIMAL 
COW               COW 
PIG               PIG 
-------------------

(Right) Outer join TABLE_1 and TABLE_2, include all rows from TABLE_2

SELECT 
  T1.ANIMAL, 
  T2.ANIMAL 
FROM 
  TABLE_1 T1, 
  TABLE_2 T2 
WHERE 
  T1.ANIMAL(+)=T2.ANIMAL; 
 
T1.ANIMAL         T2.ANIMAL 
COW               COW 
PIG               PIG 
(null)            DOG 
-------------------

(One Method, Full) Outer join TABLE_1 and TABLE_2, include all rows

SELECT 
  T1.ANIMAL, 
  T2.ANIMAL 
FROM 
  TABLE_1 T1, 
  TABLE_2 T2 
WHERE 
  T1.ANIMAL(+)=T2.ANIMAL 
UNION 
SELECT 
  T1.ANIMAL, 
  T2.ANIMAL 
FROM 
  TABLE_1 T1, 
  TABLE_2 T2 
WHERE 
  T1.ANIMAL=T2.ANIMAL(+); 
 
T1.ANIMAL         T2.ANIMAL 
COW               COW 
PIG               PIG 
(null)            DOG 
ZEBRA             (null) 
SHARK             (null) 
ROOSTER           (null) 
LION              (null)




SQL – Running Sum, Skipping Weekends

8 12 2009

December 7, 2009

Some time ago the following question appeared on the comp.databases.oracle.misc Usenet group: http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/cad8621182ae7d1b

This sounds like it should be super simple, but I can’t think how to do it using plain SQL.

I am going to display some rows, and I want to number them in the display, like using rownum, except that the number only goes up when the row has some property (I don’t care if it displays or not when it doesn’t go up).

To explain, imagine

select rownum  , the_date , to_char(dates.the_date,'DY') DAY 
from my_table 
order by the_date

shows 
        1       1-Jan-2008      TUE 
        2       2-Jan-2008      WED 
        3       3-Jan-2008      THU 
        4       4-Jan-2008      FRI 
        5       5-Jan-2008      SAT 
        6       6-Jan-2008      SUN 
        7       7-Jan-2008      MON 
        8       8-Jan-2008      TUE

but I don’t want to count the weekend.  what I want to show would be the following instead

        1       1-Jan-2008      TUE 
        2       2-Jan-2008      WED 
        3       3-Jan-2008      THU 
        4       4-Jan-2008      FRI 
                5-Jan-2008      SAT 
                6-Jan-2008      SUN 
        5       7-Jan-2008      MON 
        6       8-Jan-2008      TUE

Looks simple, but how to do this in plain SQL?

I provided the following solution:

Do it with the SUM analytical function and DECODE:

SELECT 
  TO_DATE('01/01/2008','MM/DD/YYYY')+(ROWNUM-1) THE_DATE, 
  TO_CHAR(TO_DATE('01/01/2008','MM/DD/YYYY')+(ROWNUM-1),'DY') DAY 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=20; 
 
THE_DATE  DAY 
--------- --- 
01-JAN-08 TUE 
02-JAN-08 WED 
03-JAN-08 THU 
04-JAN-08 FRI 
... 
19-JAN-08 SAT 
20-JAN-08 SUN

Step 1:

SELECT 
  SUM(DECODE(DAY,'SAT',0,'SUN',0,1)) OVER (ORDER BY THE_DATE) MY_COUNT, 
  THE_DATE, 
  DAY 
FROM 
  (SELECT 
    TO_DATE('01/01/2008','MM/DD/YYYY')+(ROWNUM-1) THE_DATE, 
    TO_CHAR(TO_DATE('01/01/2008','MM/DD/YYYY')+(ROWNUM-1),'DY') DAY 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=20); 
 
  MY_COUNT THE_DATE  DAY 
---------- --------- --- 
         1 01-JAN-08 TUE 
         2 02-JAN-08 WED 
         3 03-JAN-08 THU 
         4 04-JAN-08 FRI 
         4 05-JAN-08 SAT 
         4 06-JAN-08 SUN 
         5 07-JAN-08 MON 
         6 08-JAN-08 TUE 
         7 09-JAN-08 WED 
         8 10-JAN-08 THU 
         9 11-JAN-08 FRI 
         9 12-JAN-08 SAT 
         9 13-JAN-08 SUN 
        10 14-JAN-08 MON 
        11 15-JAN-08 TUE 
        12 16-JAN-08 WED 
        13 17-JAN-08 THU 
        14 18-JAN-08 FRI 
        14 19-JAN-08 SAT 
        14 20-JAN-08 SUN

Now to remove the numbers that should not print, using a second DECODE:

SELECT 
  DECODE(DAY,'SAT',NULL,'SUN',NULL,SUM(DECODE(DAY,'SAT',0,'SUN',0,1)) OVER (ORDER BY THE_DATE)) MY_COUNT, 
  THE_DATE, 
  DAY 
FROM 
  (SELECT 
    TO_DATE('01/01/2008','MM/DD/YYYY')+(ROWNUM-1) THE_DATE, 
    TO_CHAR(TO_DATE('01/01/2008','MM/DD/YYYY')+(ROWNUM-1),'DY') DAY 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=20); 
 
MY THE_DATE  DAY 
-- --------- --- 
1  01-JAN-08 TUE 
2  02-JAN-08 WED 
3  03-JAN-08 THU 
4  04-JAN-08 FRI 
   05-JAN-08 SAT 
   06-JAN-08 SUN 
5  07-JAN-08 MON 
6  08-JAN-08 TUE 
7  09-JAN-08 WED 
8  10-JAN-08 THU 
9  11-JAN-08 FRI 
   12-JAN-08 SAT 
   13-JAN-08 SUN 
10 14-JAN-08 MON 
11 15-JAN-08 TUE 
12 16-JAN-08 WED 
13 17-JAN-08 THU 
14 18-JAN-08 FRI 
   19-JAN-08 SAT 
   20-JAN-08 SUN

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko provided the following solution:

SQL> alter session set nls_territory='AMERICA'; 
 
Session altered. 
 
SQL> alter session set nls_date_language='AMERICAN'; 
 
Session altered. 
 
SQL> 
SQL> with my_table as ( 
   2    select date '2008-01-01' + rownum -1 the_date 
   3    from dual connect by level <=10 
   4  ) 
   5  select 
   6  case 
   7  when 
   8  to_char(the_date,'d') not in (1,7) 
   9  then row_number() 
  10  over(partition by case when to_char(the_date,'d') not in (1,7) then 1 
  11  end order by the_date) 
  12  end row_num, 
  13  the_date, 
  14  to_char(the_date,'DY') DAY 
  15  from my_table 
  16  order by the_date; 
 
    ROW_NUM THE_DATE           DAY 
---------- ------------------ ------------ 
          1 01-JAN-08          TUE 
          2 02-JAN-08          WED 
          3 03-JAN-08          THU 
          4 04-JAN-08          FRI 
            05-JAN-08          SAT 
            06-JAN-08          SUN 
          5 07-JAN-08          MON 
          6 08-JAN-08          TUE 
          7 09-JAN-08          WED 
          8 10-JAN-08          THU

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
Ken Denny provided the following solution:

select r_num, the_date, to_char(dates.the_date,'DY') DAY 
  from (select rownum r_num, the_date 
          from (select the_date from my_table 
                  where to_char(dates.the_date,'DY') IN 
                     ('MON','TUE','WED','THU','FRI') 
                  order by the_date) 
        union 
        select null r_num, the_date from my_table 
          where to_char(dates.the_date,'DY') IN ('SAT','SUN')) 
   order by the_date;




SQL – Combining Overlapping Date Rows

8 12 2009

December 7, 2009

Some time ago the following question appeared on the comp.databases.oracle.server Usenet group:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/aff40e7ccf09d9cc

On Oracle 10G, lets say I have a table that looks like this:

create table t1 (key varchar2(10), start_dtm date, end_dtm date);

and I have data some of which over laps:

insert into t1 values ('1234', to_date('01/01/2001', 'DD/MM/YYYY'), to_date('30/03/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/02/2001', 'DD/MM/YYYY'), to_date('30/03/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('15/02/2001', 'DD/MM/YYYY'), to_date('15/04/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/07/2001', 'DD/MM/YYYY'), to_date('30/12/2001', 'DD/MM/YYYY'));
Ie:

1st Jan - 30th March
    2nd Feb - 30th March
       15th Feb - 15th April

1st July - 30th Dec

I want to return only two rows from the 4 that look like

1st jan - 15th April
1st July - 30th Dec

ie – if the dates are continuous, squash all the continuous rows into 1 with the min start and max end date but if there is a gap it has to become a second row.  There can be any number of rows and over laps etc. I reckon this can be done with some analytics trickery, but I just cannot seem to figure out how.

Any ideas?

I provided the following solution:

First, we experiment with LAG so that we are able to examine the previous row when sorted by START_DTM and END_DTM:

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM,
  LEAD(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
  LEAD(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
  T1
ORDER BY
  T1.START_DTM,
  T1.END_DTM;

KEY        START_DTM END_DTM   M_START_D M_END_DTM
---------- --------- --------- --------- ---------
1234       01-JAN-01 30-MAR-01 01-FEB-01 30-MAR-01
1234       01-FEB-01 30-MAR-01 15-FEB-01 15-APR-01
1234       15-FEB-01 15-APR-01 01-JUL-01 30-DEC-01
1234       01-JUL-01 30-DEC-01

If we now slide the above into an inline view and see if the START_DTM is greater than M_END_DTM (the previous row’s END_DTM), we may have found a new series of dates (note, I just noticed that this may have problems if there are two rows with discrete date ranges that fall entirely within a third, larger date range), we will output 1 if true, or 0 if false:

SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM);

KEY        START_DTM END_DTM   M_START_D M_END_DTM          C
---------- --------- --------- --------- --------- ----------
1234       01-JAN-01 30-MAR-01                              0
1234       01-FEB-01 30-MAR-01 01-JAN-01 30-MAR-01          0
1234       15-FEB-01 15-APR-01 01-FEB-01 30-MAR-01          0
1234       01-JUL-01 30-DEC-01 15-FEB-01 15-APR-01          1

In the above, a 1 is output whenever there is a jump in the date range – if we use the COUNT analytical function to create a running count of the 1s and slide the above into an inline view, we actually create a column that may be used for grouping:

SELECT
  KEY,
  START_DTM START_DTM,
  END_DTM END_DTM,
  SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM));

KEY        START_DTM END_DTM            G
---------- --------- --------- ----------
1234       01-JAN-01 30-MAR-01          0
1234       01-FEB-01 30-MAR-01          0
1234       15-FEB-01 15-APR-01          0
1234       01-JUL-01 30-DEC-01          1

We are now able to group on the column G by again sliding the above into an inline view:

SELECT
  KEY,
  MIN(START_DTM) START_DTM,
  MAX(END_DTM) END_DTM
FROM
(SELECT
  KEY,
  START_DTM START_DTM,
  END_DTM END_DTM,
  SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
  KEY,
  START_DTM,
  END_DTM,
  M_START_DTM,
  M_END_DTM,
  DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_START_DTM,
    LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY T1.START_DTM,T1.END_DTM) M_END_DTM
  FROM
    T1
  ORDER BY
    T1.START_DTM,
    T1.END_DTM)))
GROUP BY
  KEY,
  G;

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 15-APR-01
1234       01-JUL-01 30-DEC-01

Let’s add a little more data to see what happens:

insert into t1 values ('1234', to_date('10/10/2001', 'DD/MM/YYYY'), to_date('29/12/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('31/12/2001', 'DD/MM/YYYY'), to_date('15/01/2002', 'DD/MM/YYYY'));

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 15-APR-01
1234       01-JUL-01 30-DEC-01
1234       31-DEC-01 15-JAN-02

Note the addition of the last row in the output – you may want to determine if that should be part of the previous group of date.  You should be able to fix this by adding or subtracting a date in the SIGN( ) function.

—-

I originally thought that you were trying to eliminate this row from the output:

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-FEB-01 30-MAR-01

For the above, there are several methods:

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1
WHERE
  (T1.KEY,
  T1.START_DTM,
  T1.END_DTM) NOT IN (
    SELECT
      T1.KEY,
      T1.START_DTM,
      T1.END_DTM
    FROM
      T1,
      T1 T2
    WHERE
      T1.KEY=T2.KEY
      AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
      AND T1.END_DTM BETWEEN  T2.START_DTM AND T2.END_DTM
      AND (
        T1.START_DTM<>T2.START_DTM
        OR T1.END_DTM<>T2.END_DTM));

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

 

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1,
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM
  FROM
    T1,
    T1 T2
  WHERE
    T1.KEY=T2.KEY
    AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
    AND T1.END_DTM BETWEEN  T2.START_DTM AND T2.END_DTM
    AND (
      T1.START_DTM<>T2.START_DTM
      OR T1.END_DTM<>T2.END_DTM)) T3
WHERE
  T1.KEY=T3.KEY(+)
  AND T1.START_DTM=T3.START_DTM(+)
  AND T1.END_DTM=T3.END_DTM(+)
  AND T3.KEY IS NULL;

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

 

SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1
MINUS
SELECT
  T1.KEY,
  T1.START_DTM,
  T1.END_DTM
FROM
  T1,
  T1 T2
WHERE
  T1.KEY=T2.KEY
  AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
  AND T1.END_DTM BETWEEN  T2.START_DTM AND T2.END_DTM
  AND (
    T1.START_DTM<>T2.START_DTM
    OR T1.END_DTM<>T2.END_DTM);

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

 

SELECT
  KEY,
  START_DTM,
  END_DTM
FROM
  (SELECT
    T1.KEY,
    T1.START_DTM,
    T1.END_DTM,
    MIN(T1.START_DTM) OVER (PARTITION BY T1.KEY, T1.END_DTM) M_START_DTM,
    MAX(T1.END_DTM) OVER  (PARTITION BY T1.KEY, T1.START_DTM) M_END_DTM
  FROM
    T1)
WHERE
  START_DTM=M_START_DTM
  AND END_DTM=M_END_DTM;

KEY        START_DTM END_DTM
---------- --------- ---------
1234       01-JAN-01 30-MAR-01
1234       15-FEB-01 15-APR-01
1234       01-JUL-01 30-DEC-01

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko provided the following solution:

Maybe, something like this

SQL> select key, min(start_dtm) start_dtm, max(end_dtm) end_dtm
   2    from (select key,
   3                 start_dtm,
   4                 end_dtm,
   5                 sum(interval_change) over(partition by key order by end_dtm, start_dtm) interval_no
   6            from (select key,
   7                         start_dtm,
   8                         end_dtm,
   9                         case
  10                           when (lag(end_dtm)
  11                                 over(partition by key order by end_dtm,
  12                                      start_dtm) - start_dtm) > 0 then
  13                            0
  14                           else
  15                            1
  16                         end interval_change
  17                    from t1))
  18   group by key, interval_no
  19   order by key, interval_no
  20  ;

KEY        START_DTM            END_DTM
---------- -------------------- --------------------
1234       01-Jan-2001          15-Apr-2001
1234       01-Jul-2001          30-Dec-2001

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The original poster followed up with the following solution that he developed:

Using the same table as above and this data:

create table t1 (key varchar2(10), start_dtm date, end_dtm date);

insert into t1 values( 9999, to_date('01/01/2008', 'DD/MM/YYYY'), to_date('01/06/2008', 'DD/MM/YYYY'));
insert into t1 values( 9999, to_date('01/02/2008', 'DD/MM/YYYY'), to_date('01/06/2008', 'DD/MM/YYYY'));
insert into t1 values( 9999, to_date('01/02/2008', 'DD/MM/YYYY'), to_date('01/05/2008', 'DD/MM/YYYY'));
insert into t1 values( 9999, to_date('01/02/2008', 'DD/MM/YYYY'), to_date('01/07/2008', 'DD/MM/YYYY'));

This gives a set of rows like:

|------------------------------------------|
   |---------------------------------------|
   |---------------------|
      |------------------------------------------|

And we expect this to turn into a single row like

01/01/08 |----------------------------------------------| 01/07/2008

 

insert into t1 values( 9999, to_date('01/08/2008', 'DD/MM/YYYY'), null);
insert into t1 values( 9999, to_date('01/09/2008', 'DD/MM/YYYY'), to_date('01/10/2008', 'DD/MM/YYYY'));

This gives a set of rows like:

|--------------------------------------->
   |------------|

And we want an row that looks like

01/08/2008 |----------------------------> (null end date)

I think this query does it:

select key, min(start_dtm), max(end_dtm)
from
(
  select key, start_dtm, end_dtm, max (grp) over (partition by key order by start_dtm asc) grp2
  from
  (
    SELECT key,
           start_dtm,
           NVL (end_dtm, '31-dec-4712') end_dtm,
           case
             when
               (start_dtm not between lag(start_dtm) over (partition by key order by start_dtm asc)
                   and nvl( lag(end_dtm) over (partition by key order by start_dtm asc), '31-dec-4712') )
                or lag(start_dtm) over (partition by key order by start_dtm asc) is null
                then
                  row_number() over (partition by key order by start_dtm asc)
             end grp
    FROM t1
    order by key, start_dtm asc
  ) a
) b
group by key, grp2

GRP                              MIN(START MAX(END_D
--------------------------------- --------- ---------
09999                             01-JAN-08 01-JUL-08
09999                             01-AUG-08 31-DEC-12

Basically I said, if you order the rows by increasing start date, then if rows overlap, the current start_dtm must be between the previous start_dtm and end_dtm – if there is no previous start_dtm or they don’t overlap, then its a new group.

The innermost select outputs something like:

KEY                               START_DTM END_DTM          GRP
--------------------------------- --------- --------- ----------
09999                             01-JAN-08 01-JUN-08          1
09999                             01-FEB-08 01-JUL-08
09999                             01-FEB-08 01-JUN-08
09999                             01-FEB-08 01-MAY-08
09999                             01-AUG-08 31-DEC-12          5
09999                             01-SEP-08 01-OCT-08

Then we use analytics again to fill in the blanks in the enclosing query (b):

KEY                               START_DTM END_DTM         GRP2
--------------------------------- --------- --------- ----------
09999                             01-JAN-08 01-JUN-08          1
09999                             01-FEB-08 01-JUL-08          1
09999                             01-FEB-08 01-JUN-08          1
09999                             01-FEB-08 01-MAY-08          1
09999                             01-AUG-08 31-DEC-12          5
09999                             01-SEP-08 01-OCT-08          5

and then simply group by KEY, GRP in the outer query to get the result.

I am glad I went through the pain of figuring this out (took me a good 90 minutes) as I really feel like I get analytics now.  It also allowed me to replace a temporary table and several 100 lines of PLSQL that was performing rather poorly and written by someone who doesn’t get analytics!








Follow

Get every new post delivered to your Inbox.

Join 137 other followers