November 16, 2011
I noticed an interesting request on the comp.databases.oracle.misc usenet group, and I thought that I would make an attempt at solving the request. The original poster (OP) stated that he had a table with data similar to the following:
TX ID DEPT LOCATION LOAD 1 99 A NY 12 2 99 A LA 10 3 99 B LA 05 4 77 B LA 15 5 77 C NY 12 6 77 D LA 11
He would like to obtain the following output:
TX ID DEPT DEPT_RANK LOCATION LOC_RANK LOAD 1 99 A 1 NY 2 12 2 99 A 1 LA 1 10 3 99 B 2 LA 1 05 4 77 B 1 LA 1 15 5 77 C 2 NY 2 12 6 77 D 3 LA 1 11
The rankings are to be determined as follows:
DEPT_RANK for ID 99 is 1 for A because sum(LOAD) = 22 is the max
LOC_RANK for ID 99 is 1 for LA because sum(LOAD) = 15 is the max
At first glance, that request seems to be reasonably easy to accomplish. Let’s start by creating a table with the sample data (ideally, the OP should have provided the DDL and DML to create and populate this table):
DROP TABLE T1 PURGE; CREATE TABLE T1 ( TX NUMBER, ID NUMBER, DEPT VARCHAR2(1), LOCATION VARCHAR2(2), LOAD NUMBER); INSERT INTO T1 VALUES (1,99,'A','NY',12); INSERT INTO T1 VALUES (2,99,'A','LA',10); INSERT INTO T1 VALUES (3,99,'B','LA',05); INSERT INTO T1 VALUES (4,77,'B','LA',15); INSERT INTO T1 VALUES (5,77,'C','NY',12); INSERT INTO T1 VALUES (6,77,'D','LA',11);
The first step, if we were to think about creating the solution in logical steps, is to find a way to calculate the SUM values that were mentioned by the OP. So, as a starting point, we might try this:
SELECT TX, ID, DEPT, SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID, LOCATION, SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION, LOAD FROM T1 ORDER BY TX; TX ID D SUM_LOAD_ID LO SUM_LOAD_LOCATION LOAD --- --- - ----------- -- ----------------- ----- 1 99 A 22 NY 12 12 2 99 A 22 LA 15 10 3 99 B 5 LA 15 5 4 77 B 15 LA 26 15 5 77 C 12 NY 12 12 6 77 D 11 LA 26 11
If I am understanding the OP’s request correctly, the above is a good starting point (even though the alias for the first analytic function could have been better selected).
We are then able to take the above SQL statement and push it into an inline view to hopefully produce the output that is expected by the OP (note that the PARTITION clause differs for the LOC_RANK column from what is specified in the inline view for the function that is used to create that column):
SELECT TX, ID, DEPT, RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_ID DESC) DEPT_RANK, LOCATION, RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK, LOAD FROM (SELECT TX, ID, DEPT, SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID, LOCATION, SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION, LOAD FROM T1) ORDER BY TX; TX ID D DEPT_RANK LO LOC_RANK LOAD --- --- - ---------- -- ---------- ----- 1 99 A 1 NY 2 12 2 99 A 1 LA 1 10 3 99 B 1 LA 1 5 4 77 B 1 LA 1 15 5 77 C 1 NY 1 12 6 77 D 1 LA 1 11
Let’s compare the above output with what the OP requested:
TX ID DEPT DEPT_RANK LOCATION LOC_RANK LOAD 1 99 A 1 NY 2 12 2 99 A 1 LA 1 10 3 99 B 2 LA 1 05 4 77 B 1 LA 1 15 5 77 C 2 NY 2 12 6 77 D 3 LA 1 11
Almost for the LOC_RANK column (not even close for the DEPT_RANK column), but not quite right. The problem is that when attempting to calculate the RANK columns in the above output, we should only PARTITION on the ID column, not the ID column and some other column, as was the case when we used the SUM analytic function.
Let’s fix the PARTITION clause and try again:
SELECT TX, ID, DEPT, RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK, LOCATION, RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK, LOAD FROM (SELECT TX, ID, DEPT, SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID, LOCATION, SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION, LOAD FROM T1) ORDER BY TX; TX ID D DEPT_RANK LO LOC_RANK LOAD --- --- - ---------- -- ---------- ----- 1 99 A 1 NY 3 12 2 99 A 1 LA 1 10 3 99 B 3 LA 1 5 4 77 B 1 LA 1 15 5 77 C 2 NY 3 12 6 77 D 3 LA 1 11
Let’s compare the above output with what the OP requested:
TX ID DEPT DEPT_RANK LOCATION LOC_RANK LOAD 1 99 A 1 NY 2 12 2 99 A 1 LA 1 10 3 99 B 2 LA 1 05 4 77 B 1 LA 1 15 5 77 C 2 NY 2 12 6 77 D 3 LA 1 11
Notice in the above that when two rows have the same SUM_LOAD_ value, the displayed rank is correct, but that repeated rank value then causes a rank value to be skipped (compare the DEPT_RANK column value on row 3, the LOC_RANK column value on row 1, and the LOC_RANK column value on row 5).
Now what? The ROW_NUMBER function could be used to produce sequential rank numbers without gaps, for example:
SELECT TX, ID, DEPT, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK, LOCATION, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK, LOAD FROM (SELECT TX, ID, DEPT, SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID, LOCATION, SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION, LOAD FROM T1) ORDER BY TX; TX ID D DEPT_RANK LO LOC_RANK LOAD --- --- - ---------- -- ---------- ----- 1 99 A 1 NY 3 12 2 99 A 2 LA 2 10 3 99 B 3 LA 1 5 4 77 B 1 LA 1 15 5 77 C 2 NY 3 12 6 77 D 3 LA 2 11
The above output, as mentioned, does not match the output requested by the OP, since the OP’s requested output specifes that equal values for different rows should show the same rank value.
One more try using the DENSE_RANK analytic function:
SELECT TX, ID, DEPT, DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK, LOCATION, DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK, LOAD FROM (SELECT TX, ID, DEPT, SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID, LOCATION, SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION, LOAD FROM T1) ORDER BY TX; TX ID D DEPT_RANK LO LOC_RANK LOAD --- --- - ---------- -- ---------- ----- 1 99 A 1 NY 2 12 2 99 A 1 LA 1 10 3 99 B 2 LA 1 5 4 77 B 1 LA 1 15 5 77 C 2 NY 2 12 6 77 D 3 LA 1 11
Let’s compare the above output with what the OP requested:
TX ID DEPT DEPT_RANK LOCATION LOC_RANK LOAD 1 99 A 1 NY 2 12 2 99 A 1 LA 1 10 3 99 B 2 LA 1 05 4 77 B 1 LA 1 15 5 77 C 2 NY 2 12 6 77 D 3 LA 1 11
As best as I am able to determine, the above SQL statement will satisfy the OP’s request.
—–
Part 2 of the Challenge
If the OP has the following SQL statement:
SELECT LOAD_YEAR, ORG_UNIT_CODE, RANK () OVER (PARTITION BY LOAD_YEAR, CLASSIF_CODE ORDER BY SUM (FTE_DAYS) DESC) ORG_RANK, CLASSIF_CODE, RANK () OVER (PARTITION BY LOAD_YEAR, ORG_UNIT_CODE ORDER BY SUM (FTE_DAYS) DESC) CLASSIF_RANK, SUM (FTE_DAYS) FTE FROM STAFF_LOAD GROUP BY LOAD_YEAR, ORG_UNIT_CODE, CLASSIF_CODE;
And the above SQL statement produces the following output:
YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE 2010 A46 1 HEW3 1 59 2010 A42 2 HEW3 1 13 2010 A42 1 HEW4 1 13 2010 A46 2 HEW4 2 12
And the OP wants the output to look like this:
YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE 2010 A46 1 HEW3 1 59 2010 A42 2 HEW3 1 13 2010 A42 2 HEW4 2 13 2010 A46 1 HEW4 2 12
Write the DDL and DML statements to create the source table and populate it with the non-aggregated original data, and then produce the output requested by the OP (without looking at the updated usenet thread).
[…] Charles Hooper blogs about matching the expected output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different? […]
hi Charles,
Nice example!
for the part 2, to make it easy, 4 rows in the staff_load is enough to show the recursive analytic function usage. Here comes my example(not sure how to format the code in the comment)
1