## SQL – RANK, MAX Analytical Functions, DECODE, SIGN

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group:

Table Structure:

```Table1
symbol  orders  ttime```

Requirement: Want to arrange all records, symbolwise, based on orders (asc order).
Among that, if a particular symbol have records in the range TTIME BETWEEN 9300 AND 1530, then I want to extract MIN(TTIME) within that range else MIN(TTIME) of whatever available records.

I want to achieve this using a single query.

Example:

```Table1
symbol  orders  ttime
A       2       9300
A       2       9450
A       2       1030
A       2       9451
A       2       1530
A       2       1600
A       2       1700
B       3       9300
B       4       1600
C       3       1600```

I want to get all records with maximum orders (in desc order) for each symbol.

```Output:
Symbol  Orders  ttime
A       2       9300
A       2       9450
A       2       9451
A       2       1030
A       2       1530
A       2       1600
A       1       9300
A       2       1700
B       4       9300
B       4       1600
C       3       1600```

Out of this subset I want to get all records with ttime falling between 9450 to 1530 to appear first in asc. if there is no record within this range then I want to go for normal asc order on ttime.

```Ouput:
Symbol  Orders  ttime
A       2       9450
A       2       1030
A       2       1530
A       2       1600
A       2       9300
B       4       9450
B       4       1030
B       4       1600
C       3       1600```

Finally I want to extract only first record

```Final output:
A       2       9450
B       4       9300
C       3       1600```

Are we dealing with time here?  When is 9300 less than 1530?  Why is 1 included in the above?

The set up:

```CREATE TABLE T1 (
SYMBOL CHAR(1),
ORDERS NUMBER(10),
TTIME NUMBER(10));

INSERT INTO T1 VALUES('A',2,9300);
INSERT INTO T1 VALUES('A',2,9450);
INSERT INTO T1 VALUES('A',2,10300);
INSERT INTO T1 VALUES('A',2,9451);
INSERT INTO T1 VALUES('A',2,15300);
INSERT INTO T1 VALUES('A',2,16000);
INSERT INTO T1 VALUES('A',2,17000);
INSERT INTO T1 VALUES('B',3,9300);
INSERT INTO T1 VALUES('B',4,16000);
INSERT INTO T1 VALUES('C',3,16000);```

First, let’s find the maximum value for ORDERS for each SYMBOL:

```SELECT
SYMBOL,
MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
TTIME TTIME
FROM
T1;

SYMBOL ORDERS  TTIME
A           2   9300
A           2   9450
A           2  10300
A           2   9451
A           2  15300
A           2  17000
A           2  16000
B           4  16000
B           4   9300
C           3  16000```

You stated that if TTIME is between 9450 and 1530 (should it be 15300?), that it should take priority over other values of TTIME.  The DECODE below determines if TTIME is between 9450 and 15300, if so it assigns a value of 10 to EXTRA_SORT, otherwise it assigns a value of 1 to EXTRA_SORT:

```SELECT
SYMBOL,
MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
TTIME TTIME,
DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
FROM
T1;

SYMBOL ORDERS  TTIME  EXTRA_SORT
A           2   9300           1
A           2   9450          10
A           2  10300          10
A           2   9451          10
A           2  15300          10
A           2  17000           1
A           2  16000           1
B           4  16000           1
B           4   9300           1
C           3  16000           1```

If we slide the above into an inline view, we can then rank the rows when sorted first on EXTRA_SORT and then on TTIME:

```SELECT
SYMBOL,
ORDERS,
TTIME,
RANK() OVER (PARTITION BY SYMBOL ORDER BY EXTRA_SORT DESC,TTIME) POSITION
FROM
(SELECT
SYMBOL,
MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
TTIME TTIME,
DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
FROM
T1);

SYMBOL ORDERS  TTIME  POSITION
A           2   9450         1
A           2   9451         2
A           2  10300         3
A           2  15300         4
A           2   9300         5
A           2  16000         6
A           2  17000         7
B           4   9300         1
B           4  16000         2
C           3  16000         1```

We can again slide the above into an inline view and extract only those with a POSITION value of 1:

```SELECT
SYMBOL,
ORDERS,
TTIME
FROM
(SELECT
SYMBOL,
ORDERS,
TTIME,
RANK() OVER (PARTITION BY SYMBOL ORDER BY EXTRA_SORT DESC,TTIME)
POSITION
FROM
(SELECT
SYMBOL,
MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
TTIME TTIME,
DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
FROM
T1)
)
WHERE
POSITION=1;

SYMBOL ORDERS  TTIME
A           2   9450
B           4   9300
C           3  16000```

## SQL – DENSE_RANK, PERCENT_RANK, and COUNT Analytical Functions

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group:

I have to following problem:

I get X rows from a statement, these are sorted by a certain column, let’s say a numerical value.
Now I want to calculate the average of this numerical value, but the 10% with the lowest and the 10% with the highest value shall not be included in this calculation. So for example, if I get 20 rows, I need the average of the value in rows 3 to 18.

Currently I solved this with a very complicated statement, but I don’t know the built-in Oracle mathematical functions so I hope that there could be a way to do this with a better performance.

Let’s set up a short experiment:

```CREATE TABLE T1 (C1 NUMBER(4));

INSERT INTO T1
SELECT
ROWNUM
FROM
DUAL
CONNECT BY
LEVEL<=20;```

We now have a table with 20 rows with numbers between 1 and 20.

Assuming that you are running a version of Oracle that supports analytical functions, the following returns the twenty rows with the relative ranking of each row, if the rows are sorted by C1 in descending order:

```SELECT
C1,
DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
COUNT(C1) OVER (PARTITION BY 1) R
FROM
T1;

C1         DR          R
---------- ---------- ----------
20          1         20
19          2         20
18          3         20
17          4         20
16          5         20
15          6         20
14          7         20
13          8         20
12          9         20
11         10         20
10         11         20
9         12         20
8         13         20
7         14         20
6         15         20
5         16         20
4         17         20
3         18         20
2         19         20
1         20         20```

A slight modification of the above, dividing the value of DENSE_RANK by the value of COUNT, and also including a PERCENT_RANK for comparison:

```SELECT
C1,
DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
(DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
FROM
T1;

C1         DR        DRP         PR
---------- ---------- ---------- ----------
20          1        .05          0
19          2         .1 .052631579
18          3        .15 .105263158
17          4         .2 .157894737
16          5        .25 .210526316
15          6         .3 .263157895
14          7        .35 .315789474
13          8         .4 .368421053
12          9        .45 .421052632
11         10         .5 .473684211
10         11        .55 .526315789
9         12         .6 .578947368
8         13        .65 .631578947
7         14         .7 .684210526
6         15        .75 .736842105
5         16         .8 .789473684
4         17        .85 .842105263
3         18         .9 .894736842
2         19        .95 .947368421
1         20          1          1```

The final cleanup is performed when the above is slid into an inline view, by using a WHERE clause:

```SELECT
SUM(T.C1) S
FROM
(SELECT
C1,
DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
(DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
FROM
T1) T
WHERE
T.DRP>0.1
AND T.DRP<=0.9;```

S
———-
168

A version that uses the PERCENT_RANK value:

```SELECT
SUM(T.C1) S
FROM
(SELECT
C1,
DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
(DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
FROM
T1) T
WHERE
T.PR BETWEEN 0.1 AND 0.9;```

S
———-
168

## SQL – COUNT Analytical Function, GROUP BY, HAVING

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group: http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/93bf8d1e75033d4c

I have a book table and in that table it has the book tile, publisher, and type of book it is. example mystery, scifi, etc…

I am trying to write a query that brings back a list of every pair of books that have the same publisher and same book type.  I have been able to get the following code to work:

```select publisher_code, type
from book
group by publisher_code, type
having count(*) > 1;```

which returns the following results:

```PU TYP
-- ---
JP MYS
LB FIC
PE FIC
PL FIC
ST SFI
VB FIC```

I can not figure out how to get the book title and book code for the books that this result list represents, everything I have tried throws out an error.

My initial response follows:

I see two possible methods:

1.  Slide the SQL statement that you have written into an inline view, join the inline view to your book table, and then use the publisher_code, type columns to drive back into your book table.  The join syntax may look like one of the following: (publisher_code, type) IN (SELECT…)   or   b.publisher_code=ib.publisher_code and b.type=ib.type
2.  Use analytical functions (COUNT() OVER…) to determine the number of matches for the same publisher_code, type columns.  Then slide this SQL statement into an inline view to retrieve only those records with the aliased COUNT() OVER greater than 1.  This has the benefit of retrieving the matching rows in a single pass.

The original poster then attempted to create a query to meet the requirements, but the query generated an error:

```SQL> select title
2  from book
3  where publisher_code, type in
4  (select publisher_code, type
5  from book
6  group by publisher_code, type
7  having count(*) > 1);
where publisher_code, type in
*
ERROR at line 3:
ORA-00920: invalid relational operator```

My reponse continues:

Very close to what you need.  However, Oracle expects the column names to be wrapped in () … like this:  where (publisher_code, type) in

The above uses a subquery, which may perform slow on some Oracle releases compared to the use of an inline view.  Assume that I have a table named PART, which has columns ID, DESCRIPITION, PRODUCT_CODE, and COMMODITY_CODE, with ID as the primary key.  I want to find ID, DESCRIPTION, and COMMODITY_CODE for all parts with the same DESCRIPTION and PRODUCT_CODE, where there are at least 3 matching parts in the group:

The starting point, which looks similar to your initial query:

```SELECT
DESCRIPTION,
PRODUCT_CODE,
COUNT(*) NUM_MATCHES
FROM
PART
GROUP BY
DESCRIPTION,
PRODUCT_CODE
HAVING
COUNT(*)>=3;```

When the original query is slid into an inline view and joined to the original table, it looks like this:

```SELECT
P.ID,
P.DESCRIPTION,
P.COMMODITY_CODE
FROM
(SELECT
DESCRIPTION,
PRODUCT_CODE,
COUNT(*) NUM_MATCHES
FROM
PART
GROUP BY
DESCRIPTION,
PRODUCT_CODE
HAVING
COUNT(*)>=3) IP,
PART P
WHERE
IP.DESCRIPTION=P.DESCRIPTION
AND IP.PRODUCT_CODE=P.PRODUCT_CODE;```

Here is the DBMS_XPLAN:

```-------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN            |      |      1 |   1768 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1293K (0)|
|   2 |   VIEW                |      |      1 |   1768 |   1156 |00:00:00.11 |    1319 |       |       |          |
|*  3 |    FILTER             |      |      1 |        |   1156 |00:00:00.11 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |      |      1 |   1768 |  23276 |00:00:00.08 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART |      1 |  35344 |  35344 |00:00:00.04 |    1429 |       |       |          |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("IP"."DESCRIPTION"="P"."DESCRIPTION" AND "IP"."PRODUCT_CODE"="P"."PRODUCT_CODE")
3 - filter(COUNT(*)>=3)```

The query format using the subquery looks like this:

```SELECT
P.ID,
P.DESCRIPTION,
P.COMMODITY_CODE
FROM
PART P
WHERE
(DESCRIPTION,PRODUCT_CODE) IN
(SELECT
DESCRIPTION,
PRODUCT_CODE
FROM
PART
GROUP BY
DESCRIPTION,
PRODUCT_CODE
HAVING
COUNT(*)>=3);```

The DBMS_XPLAN, note that Oracle 10.2.0.2 transformed the query above:

```-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN RIGHT SEMI |          |      1 |      1 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1214K (0)|
|   2 |   VIEW                | VW_NSO_1 |      1 |   1768 |   1156 |00:00:00.12 |    1319 |       |       |          |
|*  3 |    FILTER             |          |      1 |        |   1156 |00:00:00.12 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |   1768 |  23276 |00:00:00.09 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART     |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART     |      1 |  35344 |  35344 |00:00:00.01 |    1429 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DESCRIPTION"="\$nso_col_1" AND "PRODUCT_CODE"="\$nso_col_2")
3 - filter(COUNT(*)>=3)```

Without allowing the automatic transformations in Oracle 10.2.0.2, the query takes _much_ longer than 0.21 seconds to complete.

The method using analytical functions starts like this:

```SELECT
P.ID,
P.DESCRIPTION,
P.COMMODITY_CODE,
COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
FROM
PART P;```

Then, sliding the above into an inline view:

```SELECT
ID,
DESCRIPTION,
COMMODITY_CODE
FROM
(SELECT
P.ID,
P.DESCRIPTION,
P.COMMODITY_CODE,
COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
FROM
PART P)
WHERE
NUM_MATCHES>=3;```

The DBMS_XPLAN for the above looks like this:

```-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|*  1 |  VIEW               |      |      1 |  35344 |  11525 |00:00:00.31 |    1319 |       |       |          |
|   2 |   WINDOW SORT       |      |      1 |  35344 |  35344 |00:00:00.27 |    1319 |  2533K|   726K| 2251K (0)|
|   3 |    TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NUM_MATCHES">=3)

Note that there is only one TABLE ACCESS FULL of the PART table in the above.  The execution time required 0.31 seconds to complete, which is greater than the first two approaches, but that is because the database server is concurrently still trying to resolve the query method using the subquery with no permitted transformations (5+ minutes later).```

Subquery method with no transformations permitted:

```---------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|*  1 |  FILTER              |      |      1 |        |  11525 |00:46:21.46 |      38M|
|   2 |   TABLE ACCESS FULL  | PART |      1 |  35344 |  35344 |00:00:00.25 |    1429 |
|*  3 |   FILTER             |      |  29474 |        |   6143 |00:46:06.52 |      38M|
|   4 |    HASH GROUP BY     |      |  29474 |      1 |    613M|00:33:24.30 |      38M|
|   5 |     TABLE ACCESS FULL| PART |  29474 |  35344 |   1041M|00:00:02.54 |      38M|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( IS NOT NULL)
3 - filter(("DESCRIPTION"=:B1 AND "PRODUCT_CODE"=:B2 AND COUNT(*)>=3))```

Maxim Demenko provided another possible solution for the problem experienced by the original poster.