SQL – RANK, MAX Analytical Functions, DECODE, SIGN

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/22b36e3ca18490db

Table Structure:

Table1
symbol  orders  ttime

Requirement: Want to arrange all records, symbolwise, based on orders (asc order).
Among that, if a particular symbol have records in the range TTIME BETWEEN 9300 AND 1530, then I want to extract MIN(TTIME) within that range else MIN(TTIME) of whatever available records.

I want to achieve this using a single query.

Example:

Table1
symbol  orders  ttime
A       2       9300
A       2       9450
A       2       1030
A       2       9451
A       2       1530
A       2       1600
A       2       1700
B       3       9300
B       4       1600
C       3       1600

I want to get all records with maximum orders (in desc order) for each symbol.

Output:
Symbol  Orders  ttime
A       2       9300
A       2       9450
A       2       9451
A       2       1030
A       2       1530
A       2       1600
A       1       9300
A       2       1700
B       4       9300
B       4       1600
C       3       1600

Out of this subset I want to get all records with ttime falling between 9450 to 1530 to appear first in asc. if there is no record within this range then I want to go for normal asc order on ttime.

Ouput:
Symbol  Orders  ttime
A       2       9450
A       2       1030
A       2       1530
A       2       1600
A       2       9300
B       4       9450
B       4       1030
B       4       1600
C       3       1600

Finally I want to extract only first record

Final output:
A       2       9450
B       4       9300
C       3       1600

Are we dealing with time here?  When is 9300 less than 1530?  Why is 1 included in the above?

The set up:

CREATE TABLE T1 (
  SYMBOL CHAR(1),
  ORDERS NUMBER(10),
  TTIME NUMBER(10));

INSERT INTO T1 VALUES('A',2,9300);
INSERT INTO T1 VALUES('A',2,9450);
INSERT INTO T1 VALUES('A',2,10300);
INSERT INTO T1 VALUES('A',2,9451);
INSERT INTO T1 VALUES('A',2,15300);
INSERT INTO T1 VALUES('A',2,16000);
INSERT INTO T1 VALUES('A',2,17000);
INSERT INTO T1 VALUES('B',3,9300);
INSERT INTO T1 VALUES('B',4,16000);
INSERT INTO T1 VALUES('C',3,16000);

First, let’s find the maximum value for ORDERS for each SYMBOL:

SELECT
  SYMBOL,
  MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
  TTIME TTIME
FROM
  T1;

SYMBOL ORDERS  TTIME
A           2   9300
A           2   9450
A           2  10300
A           2   9451
A           2  15300
A           2  17000
A           2  16000
B           4  16000
B           4   9300
C           3  16000

You stated that if TTIME is between 9450 and 1530 (should it be 15300?), that it should take priority over other values of TTIME.  The DECODE below determines if TTIME is between 9450 and 15300, if so it assigns a value of 10 to EXTRA_SORT, otherwise it assigns a value of 1 to EXTRA_SORT:

SELECT
  SYMBOL,
  MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
  TTIME TTIME,
  DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
FROM
  T1;

SYMBOL ORDERS  TTIME  EXTRA_SORT
A           2   9300           1
A           2   9450          10
A           2  10300          10
A           2   9451          10
A           2  15300          10
A           2  17000           1
A           2  16000           1
B           4  16000           1
B           4   9300           1
C           3  16000           1

If we slide the above into an inline view, we can then rank the rows when sorted first on EXTRA_SORT and then on TTIME:

SELECT
  SYMBOL,
  ORDERS,
  TTIME,
  RANK() OVER (PARTITION BY SYMBOL ORDER BY EXTRA_SORT DESC,TTIME) POSITION
FROM
  (SELECT
    SYMBOL,
    MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
    TTIME TTIME,
    DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
  FROM
    T1);

SYMBOL ORDERS  TTIME  POSITION
A           2   9450         1
A           2   9451         2
A           2  10300         3
A           2  15300         4
A           2   9300         5
A           2  16000         6
A           2  17000         7
B           4   9300         1
B           4  16000         2
C           3  16000         1

We can again slide the above into an inline view and extract only those with a POSITION value of 1:

SELECT
  SYMBOL,
  ORDERS,
  TTIME
FROM
  (SELECT
    SYMBOL,
    ORDERS,
    TTIME,
    RANK() OVER (PARTITION BY SYMBOL ORDER BY EXTRA_SORT DESC,TTIME)
POSITION
  FROM
    (SELECT
      SYMBOL,
      MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
      TTIME TTIME,
      DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
    FROM
      T1)
  )
WHERE
  POSITION=1;

SYMBOL ORDERS  TTIME
A           2   9450
B           4   9300
C           3  16000




SQL – DENSE_RANK, PERCENT_RANK, and COUNT Analytical Functions

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/9af4117466316d9a

I have to following problem:

I get X rows from a statement, these are sorted by a certain column, let’s say a numerical value.
Now I want to calculate the average of this numerical value, but the 10% with the lowest and the 10% with the highest value shall not be included in this calculation. So for example, if I get 20 rows, I need the average of the value in rows 3 to 18.

Currently I solved this with a very complicated statement, but I don’t know the built-in Oracle mathematical functions so I hope that there could be a way to do this with a better performance.

Let’s set up a short experiment:

CREATE TABLE T1 (C1 NUMBER(4));

INSERT INTO T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

We now have a table with 20 rows with numbers between 1 and 20.

Assuming that you are running a version of Oracle that supports analytical functions, the following returns the twenty rows with the relative ranking of each row, if the rows are sorted by C1 in descending order:

SELECT
  C1,
  DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
  COUNT(C1) OVER (PARTITION BY 1) R
FROM
  T1;

        C1         DR          R
---------- ---------- ----------
        20          1         20
        19          2         20
        18          3         20
        17          4         20
        16          5         20
        15          6         20
        14          7         20
        13          8         20
        12          9         20
        11         10         20
        10         11         20
         9         12         20
         8         13         20
         7         14         20
         6         15         20
         5         16         20
         4         17         20
         3         18         20
         2         19         20
         1         20         20

A slight modification of the above, dividing the value of DENSE_RANK by the value of COUNT, and also including a PERCENT_RANK for comparison:

SELECT
  C1,
  DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
  (DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
  PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
FROM
  T1;

        C1         DR        DRP         PR
---------- ---------- ---------- ----------
        20          1        .05          0
        19          2         .1 .052631579
        18          3        .15 .105263158
        17          4         .2 .157894737
        16          5        .25 .210526316
        15          6         .3 .263157895
        14          7        .35 .315789474
        13          8         .4 .368421053
        12          9        .45 .421052632
        11         10         .5 .473684211
        10         11        .55 .526315789
         9         12         .6 .578947368
         8         13        .65 .631578947
         7         14         .7 .684210526
         6         15        .75 .736842105
         5         16         .8 .789473684
         4         17        .85 .842105263
         3         18         .9 .894736842
         2         19        .95 .947368421
         1         20          1          1

The final cleanup is performed when the above is slid into an inline view, by using a WHERE clause:

SELECT
  SUM(T.C1) S
FROM
  (SELECT
    C1,
    DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
    (DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
    PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
  FROM
    T1) T
WHERE
  T.DRP>0.1
  AND T.DRP<=0.9;

         S
———-
       168

A version that uses the PERCENT_RANK value:

SELECT
  SUM(T.C1) S
FROM
  (SELECT
    C1,
    DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
    (DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
    PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
  FROM
    T1) T
WHERE
  T.PR BETWEEN 0.1 AND 0.9;

         S
———-
       168





SQL – COUNT Analytical Function, GROUP BY, HAVING

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group: http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/93bf8d1e75033d4c

I have a book table and in that table it has the book tile, publisher, and type of book it is. example mystery, scifi, etc…

I am trying to write a query that brings back a list of every pair of books that have the same publisher and same book type.  I have been able to get the following code to work:

select publisher_code, type
from book
group by publisher_code, type
having count(*) > 1;

which returns the following results:

PU TYP
-- ---
JP MYS
LB FIC
PE FIC
PL FIC
ST SFI
VB FIC

I can not figure out how to get the book title and book code for the books that this result list represents, everything I have tried throws out an error.

My initial response follows:

I see two possible methods:

  1.  Slide the SQL statement that you have written into an inline view, join the inline view to your book table, and then use the publisher_code, type columns to drive back into your book table.  The join syntax may look like one of the following: (publisher_code, type) IN (SELECT…)   or   b.publisher_code=ib.publisher_code and b.type=ib.type 
  2.  Use analytical functions (COUNT() OVER…) to determine the number of matches for the same publisher_code, type columns.  Then slide this SQL statement into an inline view to retrieve only those records with the aliased COUNT() OVER greater than 1.  This has the benefit of retrieving the matching rows in a single pass.

The original poster then attempted to create a query to meet the requirements, but the query generated an error:

SQL> select title
  2  from book
  3  where publisher_code, type in
  4  (select publisher_code, type
  5  from book
  6  group by publisher_code, type
  7  having count(*) > 1);
where publisher_code, type in
                    *
ERROR at line 3:
ORA-00920: invalid relational operator

My reponse continues:

Very close to what you need.  However, Oracle expects the column names to be wrapped in () … like this:  where (publisher_code, type) in

The above uses a subquery, which may perform slow on some Oracle releases compared to the use of an inline view.  Assume that I have a table named PART, which has columns ID, DESCRIPITION, PRODUCT_CODE, and COMMODITY_CODE, with ID as the primary key.  I want to find ID, DESCRIPTION, and COMMODITY_CODE for all parts with the same DESCRIPTION and PRODUCT_CODE, where there are at least 3 matching parts in the group:

The starting point, which looks similar to your initial query:

SELECT
  DESCRIPTION,
  PRODUCT_CODE,
  COUNT(*) NUM_MATCHES
FROM
  PART
GROUP BY
  DESCRIPTION,
  PRODUCT_CODE
HAVING
  COUNT(*)>=3;

When the original query is slid into an inline view and joined to the original table, it looks like this:

SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  (SELECT
    DESCRIPTION,
    PRODUCT_CODE,
    COUNT(*) NUM_MATCHES
  FROM
    PART
  GROUP BY
    DESCRIPTION,
    PRODUCT_CODE
  HAVING
    COUNT(*)>=3) IP,
  PART P
WHERE
  IP.DESCRIPTION=P.DESCRIPTION
  AND IP.PRODUCT_CODE=P.PRODUCT_CODE;

Here is the DBMS_XPLAN:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN            |      |      1 |   1768 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1293K (0)|
|   2 |   VIEW                |      |      1 |   1768 |   1156 |00:00:00.11 |    1319 |       |       |          |
|*  3 |    FILTER             |      |      1 |        |   1156 |00:00:00.11 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |      |      1 |   1768 |  23276 |00:00:00.08 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART |      1 |  35344 |  35344 |00:00:00.04 |    1429 |       |       |          |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("IP"."DESCRIPTION"="P"."DESCRIPTION" AND "IP"."PRODUCT_CODE"="P"."PRODUCT_CODE")
   3 - filter(COUNT(*)>=3)

The query format using the subquery looks like this:

SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  PART P
WHERE
  (DESCRIPTION,PRODUCT_CODE) IN
  (SELECT
    DESCRIPTION,
    PRODUCT_CODE
  FROM
    PART
  GROUP BY
    DESCRIPTION,
    PRODUCT_CODE
  HAVING
    COUNT(*)>=3);

The DBMS_XPLAN, note that Oracle 10.2.0.2 transformed the query above:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN RIGHT SEMI |          |      1 |      1 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1214K (0)|
|   2 |   VIEW                | VW_NSO_1 |      1 |   1768 |   1156 |00:00:00.12 |    1319 |       |       |          |
|*  3 |    FILTER             |          |      1 |        |   1156 |00:00:00.12 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |   1768 |  23276 |00:00:00.09 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART     |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART     |      1 |  35344 |  35344 |00:00:00.01 |    1429 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("DESCRIPTION"="$nso_col_1" AND "PRODUCT_CODE"="$nso_col_2")
   3 - filter(COUNT(*)>=3)

Without allowing the automatic transformations in Oracle 10.2.0.2, the query takes _much_ longer than 0.21 seconds to complete.

The method using analytical functions starts like this:

SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE,
  COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
FROM
  PART P;

Then, sliding the above into an inline view:

SELECT
  ID,
  DESCRIPTION,
  COMMODITY_CODE
FROM
  (SELECT
    P.ID,
    P.DESCRIPTION,
    P.COMMODITY_CODE,
    COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
  FROM
    PART P)
WHERE
  NUM_MATCHES>=3;

The DBMS_XPLAN for the above looks like this:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|*  1 |  VIEW               |      |      1 |  35344 |  11525 |00:00:00.31 |    1319 |       |       |          |
|   2 |   WINDOW SORT       |      |      1 |  35344 |  35344 |00:00:00.27 |    1319 |  2533K|   726K| 2251K (0)|
|   3 |    TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NUM_MATCHES">=3)

Note that there is only one TABLE ACCESS FULL of the PART table in the above.  The execution time required 0.31 seconds to complete, which is greater than the first two approaches, but that is because the database server is concurrently still trying to resolve the query method using the subquery with no permitted transformations (5+ minutes later).

Subquery method with no transformations permitted:

---------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|*  1 |  FILTER              |      |      1 |        |  11525 |00:46:21.46 |      38M|
|   2 |   TABLE ACCESS FULL  | PART |      1 |  35344 |  35344 |00:00:00.25 |    1429 |
|*  3 |   FILTER             |      |  29474 |        |   6143 |00:46:06.52 |      38M|
|   4 |    HASH GROUP BY     |      |  29474 |      1 |    613M|00:33:24.30 |      38M|
|   5 |     TABLE ACCESS FULL| PART |  29474 |  35344 |   1041M|00:00:02.54 |      38M|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter(("DESCRIPTION"=:B1 AND "PRODUCT_CODE"=:B2 AND COUNT(*)>=3))

Maxim Demenko provided another possible solution for the problem experienced by the original poster.