Matching the Expected Output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different?

16 11 2011

November 16, 2011

I noticed an interesting request on the comp.databases.oracle.misc usenet group, and I thought that I would make an attempt at solving the request.  The original poster (OP) stated that he had a table with data similar to the following:

TX   ID   DEPT   LOCATION   LOAD
1    99    A        NY       12
2    99    A        LA       10
3    99    B        LA       05
4    77    B        LA       15
5    77    C        NY       12
6    77    D        LA       11  

He would like to obtain the following output:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11  

The rankings are to be determined as follows:

DEPT_RANK for ID 99 is 1 for A because sum(LOAD) = 22 is the max
LOC_RANK for ID 99 is 1 for LA because sum(LOAD) = 15 is the max

At first glance, that request seems to be reasonably easy to accomplish.  Let’s start by creating a table with the sample data (ideally, the OP should have provided the DDL and DML to create and populate this table):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  TX NUMBER,
  ID NUMBER,
  DEPT VARCHAR2(1),
  LOCATION VARCHAR2(2),
  LOAD NUMBER);

INSERT INTO T1 VALUES (1,99,'A','NY',12);
INSERT INTO T1 VALUES (2,99,'A','LA',10);
INSERT INTO T1 VALUES (3,99,'B','LA',05);
INSERT INTO T1 VALUES (4,77,'B','LA',15);
INSERT INTO T1 VALUES (5,77,'C','NY',12);
INSERT INTO T1 VALUES (6,77,'D','LA',11); 

The first step, if we were to think about creating the solution in logical steps, is to find a way to calculate the SUM values that were mentioned by the OP.  So, as a starting point, we might try this:

SELECT
  TX,
  ID,
  DEPT,
  SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
  LOCATION,
  SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
  LOAD
FROM
  T1
ORDER BY
  TX;

 TX  ID D SUM_LOAD_ID LO SUM_LOAD_LOCATION  LOAD
--- --- - ----------- -- ----------------- -----
  1  99 A          22 NY                12    12
  2  99 A          22 LA                15    10
  3  99 B           5 LA                15     5
  4  77 B          15 LA                26    15
  5  77 C          12 NY                12    12
  6  77 D          11 LA                26    11 

If I am understanding the OP’s request correctly, the above is a good starting point (even though the alias for the first analytic function could have been better selected).

We are then able to take the above SQL statement and push it into an inline view to hopefully produce the output that is expected by the OP  (note that the PARTITION clause differs for the LOC_RANK column from what is specified in the inline view for the function that is used to create that column):

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          1 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          1 NY          1    12
  6  77 D          1 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Almost for the LOC_RANK column (not even close for the DEPT_RANK column), but not quite right.  The problem is that when attempting to calculate the RANK columns in the above output, we should only PARTITION on the ID column, not the ID column and some other column, as was the case when we used the SUM analytic function.

Let’s fix the PARTITION clause and try again:

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          1 LA          1    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Notice in the above that when two rows have the same SUM_LOAD_ value, the displayed rank is correct, but that repeated rank value then causes a rank value to be skipped (compare the DEPT_RANK column value on row 3, the LOC_RANK column value on row 1, and the LOC_RANK column value on row 5).

Now what?  The ROW_NUMBER function could be used to produce sequential rank numbers without gaps, for example:

SELECT
  TX,
  ID,
  DEPT,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          2 LA          2    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          2    11 

The above output, as mentioned, does not match the output requested by the OP, since the OP’s requested output specifes that equal values for different rows should show the same rank value.

One more try using the DENSE_RANK analytic function:

SELECT
  TX,
  ID,
  DEPT,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          2 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          2    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

As best as I am able to determine, the above SQL statement will satisfy the OP’s request.

—–

Part 2 of the Challenge

If the OP has the following SQL statement:

SELECT
  LOAD_YEAR,
  ORG_UNIT_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, CLASSIF_CODE ORDER BY SUM (FTE_DAYS) DESC) ORG_RANK,
  CLASSIF_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, ORG_UNIT_CODE ORDER BY SUM (FTE_DAYS) DESC) CLASSIF_RANK,
  SUM (FTE_DAYS) FTE
FROM
  STAFF_LOAD
GROUP BY
  LOAD_YEAR,
  ORG_UNIT_CODE,
  CLASSIF_CODE;  

And the above SQL statement produces the following output:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  1 HEW4                    1  13
2010 A46                  2 HEW4                    2  12

And the OP wants the output to look like this:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  2 HEW4                    2  13
2010 A46                  1 HEW4                    2  12  

Write the DDL and DML statements to create the source table and populate it with the non-aggregated original data, and then produce the output requested by the OP (without looking at the updated usenet thread).


Actions

Information

2 responses

18 11 2011
Log Buffer #246, A Carnival of the Vanities for DBAs | The Pythian Blog

[…] Charles Hooper blogs about matching the expected output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different? […]

18 11 2011
Sid

hi Charles,

Nice example!
for the part 2, to make it easy, 4 rows in the staff_load is enough to show the recursive analytic function usage. Here comes my example(not sure how to format the code in the comment)
1

oe@CS10G> drop table STAFF_LOAD purge;

Table dropped.

oe@CS10G> create table STAFF_LOAD
  2  (
  3  load_year number,
  4  org_unit_code varchar2(10),
  5  classif_code varchar2(10),
  6  fte_days number
  7  );

Table created.

oe@CS10G> 
oe@CS10G> pause;
oe@CS10G> insert into staff_load values(2010,'A46','HEW3',59);

1 row created.

oe@CS10G> insert into staff_load values(2010,'A42','HEW3',13);

1 row created.

oe@CS10G> insert into staff_load values(2010,'A42','HEW4',13);

1 row created.

oe@CS10G> insert into staff_load values(2010,'A46','HEW4',12);

1 row created.

oe@CS10G> commit ;

Commit complete.

oe@CS10G> 
oe@CS10G> pause;
 
oe@CS10G> SELECT
  2    LOAD_YEAR,
  3    ORG_UNIT_CODE,
  4    --ORG_SUM,
  5    DENSE_RANK() OVER (PARTITION BY LOAD_YEAR ORDER BY ORG_SUM DESC) ORG_RANK,
  6    CLASSIF_CODE,
  7    --CLASSIF_SUM,
  8    DENSE_RANK() OVER (PARTITION BY LOAD_YEAR ORDER BY CLASSIF_SUM DESC) CLASSIF_RANK,
  9    FTE
 10  FROM
 11  (
 12  SELECT
 13    LOAD_YEAR,
 14    ORG_UNIT_CODE,
 15    SUM(SUM(FTE_DAYS)) OVER (PARTITION BY LOAD_YEAR, ORG_UNIT_CODE) ORG_SUM,
 16    CLASSIF_CODE,
 17    SUM(SUM(FTE_DAYS)) OVER (PARTITION BY LOAD_YEAR, CLASSIF_CODE) CLASSIF_SUM,
 18    SUM (FTE_DAYS) FTE
 19  FROM
 20    STAFF_LOAD
 21  GROUP BY
 22    LOAD_YEAR,
 23    ORG_UNIT_CODE,
 24    CLASSIF_CODE)
 25  order by FTE desc, org_unit_code asc
 26  ;

 LOAD_YEAR ORG_UNIT_C	ORG_RANK CLASSIF_CO CLASSIF_RANK	FTE
---------- ---------- ---------- ---------- ------------ ----------
      2010 A46		       1 HEW3		       1	 59
      2010 A42		       2 HEW3		       1	 13
      2010 A42		       2 HEW4		       2	 13
      2010 A46		       1 HEW4		       2	 12

Leave a comment