Matching the Expected Output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different?

16 11 2011

November 16, 2011

I noticed an interesting request on the comp.databases.oracle.misc usenet group, and I thought that I would make an attempt at solving the request.  The original poster (OP) stated that he had a table with data similar to the following:

TX   ID   DEPT   LOCATION   LOAD
1    99    A        NY       12
2    99    A        LA       10
3    99    B        LA       05
4    77    B        LA       15
5    77    C        NY       12
6    77    D        LA       11  

He would like to obtain the following output:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11  

The rankings are to be determined as follows:

DEPT_RANK for ID 99 is 1 for A because sum(LOAD) = 22 is the max
LOC_RANK for ID 99 is 1 for LA because sum(LOAD) = 15 is the max

At first glance, that request seems to be reasonably easy to accomplish.  Let’s start by creating a table with the sample data (ideally, the OP should have provided the DDL and DML to create and populate this table):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  TX NUMBER,
  ID NUMBER,
  DEPT VARCHAR2(1),
  LOCATION VARCHAR2(2),
  LOAD NUMBER);

INSERT INTO T1 VALUES (1,99,'A','NY',12);
INSERT INTO T1 VALUES (2,99,'A','LA',10);
INSERT INTO T1 VALUES (3,99,'B','LA',05);
INSERT INTO T1 VALUES (4,77,'B','LA',15);
INSERT INTO T1 VALUES (5,77,'C','NY',12);
INSERT INTO T1 VALUES (6,77,'D','LA',11); 

The first step, if we were to think about creating the solution in logical steps, is to find a way to calculate the SUM values that were mentioned by the OP.  So, as a starting point, we might try this:

SELECT
  TX,
  ID,
  DEPT,
  SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
  LOCATION,
  SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
  LOAD
FROM
  T1
ORDER BY
  TX;

 TX  ID D SUM_LOAD_ID LO SUM_LOAD_LOCATION  LOAD
--- --- - ----------- -- ----------------- -----
  1  99 A          22 NY                12    12
  2  99 A          22 LA                15    10
  3  99 B           5 LA                15     5
  4  77 B          15 LA                26    15
  5  77 C          12 NY                12    12
  6  77 D          11 LA                26    11 

If I am understanding the OP’s request correctly, the above is a good starting point (even though the alias for the first analytic function could have been better selected).

We are then able to take the above SQL statement and push it into an inline view to hopefully produce the output that is expected by the OP  (note that the PARTITION clause differs for the LOC_RANK column from what is specified in the inline view for the function that is used to create that column):

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          1 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          1 NY          1    12
  6  77 D          1 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Almost for the LOC_RANK column (not even close for the DEPT_RANK column), but not quite right.  The problem is that when attempting to calculate the RANK columns in the above output, we should only PARTITION on the ID column, not the ID column and some other column, as was the case when we used the SUM analytic function.

Let’s fix the PARTITION clause and try again:

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          1 LA          1    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Notice in the above that when two rows have the same SUM_LOAD_ value, the displayed rank is correct, but that repeated rank value then causes a rank value to be skipped (compare the DEPT_RANK column value on row 3, the LOC_RANK column value on row 1, and the LOC_RANK column value on row 5).

Now what?  The ROW_NUMBER function could be used to produce sequential rank numbers without gaps, for example:

SELECT
  TX,
  ID,
  DEPT,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          2 LA          2    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          2    11 

The above output, as mentioned, does not match the output requested by the OP, since the OP’s requested output specifes that equal values for different rows should show the same rank value.

One more try using the DENSE_RANK analytic function:

SELECT
  TX,
  ID,
  DEPT,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          2 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          2    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

As best as I am able to determine, the above SQL statement will satisfy the OP’s request.

—–

Part 2 of the Challenge

If the OP has the following SQL statement:

SELECT
  LOAD_YEAR,
  ORG_UNIT_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, CLASSIF_CODE ORDER BY SUM (FTE_DAYS) DESC) ORG_RANK,
  CLASSIF_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, ORG_UNIT_CODE ORDER BY SUM (FTE_DAYS) DESC) CLASSIF_RANK,
  SUM (FTE_DAYS) FTE
FROM
  STAFF_LOAD
GROUP BY
  LOAD_YEAR,
  ORG_UNIT_CODE,
  CLASSIF_CODE;  

And the above SQL statement produces the following output:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  1 HEW4                    1  13
2010 A46                  2 HEW4                    2  12

And the OP wants the output to look like this:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  2 HEW4                    2  13
2010 A46                  1 HEW4                    2  12  

Write the DDL and DML statements to create the source table and populate it with the non-aggregated original data, and then produce the output requested by the OP (without looking at the updated usenet thread).





Simple SQL – Finding the Next Operation

3 11 2011

November 3, 2011

An interesting request came in from an ERP mailing list – how would you write a SQL statement that indicates the next operation in a manufacturing process.  Sounds like an easy requirement.  Let’s take a look at a graphical view of one example (the graphical view is created using a program that I wrote probably 7 years ago):

The cyan colored rectangles in the above picture are the various operations (found in the OPERATION table) in the manufacturing process.  The light green colored rectangles are the material requirements (found in the REQUIREMENT table) that are used by the operation that is immediately above the sequence of light green rectangles.  The white rectangles are the sub-assembly headers for the manufacturing process.  The white rectangle at the far left is considered the main sub-assembly 0 (WORKORDER_SUB_ID = ’0′), the middle white rectangle in this case is sub-assembly 1 (WORKORDER_SUB_ID = ’1′), and the right-most white rectangle in this case is sub-assembly 200 (WORKORDER_SUB_ID = ’200′).  All sub-assemblies except the main sub-assembly 0 are tied to another “parent” operation that consumes the sub-assembly just as if it were another material requirement; therefore a dummy row is added to the material requirements table (REQUIREMENT) with the SUBORD_WO_SUB_ID column set to the sub-assembly number of the connected operation.

In the above picture, on the main sub-assembly 0, operation 888 (at resource ID INSPECT) is the first operation, and operation 999 (at resource ID SHIP) is the second operation.  On sub-assembly 1, operation 10 (at resource ID 92) is the first operation, operation 20 (at resource ID 62) is the second operation, and operation 541 (at resource ID KRW) is the third operation.  On sub-assembly 200, operation 10 (at resource ID WSD70-TACK) is the first operation, operation 20 (at resource ID WELD-CBA) is the second operation, operation 40 (at resource ID BLAST) is the third operation, and operation 50 (at resource ID PAINT) is the fourth operation.

Since I am working with Oracle Database, I can construct a SQL statement using the LEAD analytic function to find the next operation number and the next resource ID:

SELECT
  *
FROM
  (SELECT
    O.WORKORDER_BASE_ID,
    O.WORKORDER_LOT_ID,
    O.WORKORDER_SPLIT_ID,
    O.WORKORDER_SUB_ID,
    O.SEQUENCE_NO,
    O.RESOURCE_ID,
    LEAD(O.SEQUENCE_NO,1) OVER (PARTITION BY O.WORKORDER_BASE_ID,
                                             O.WORKORDER_LOT_ID,
                                             O.WORKORDER_SPLIT_ID,
                                             O.WORKORDER_SUB_ID
                                ORDER BY O.SEQUENCE_NO) AS NEXT_SEQUENCE_NO,
    LEAD(O.RESOURCE_ID,1) OVER (PARTITION BY O.WORKORDER_BASE_ID,
                                             O.WORKORDER_LOT_ID,
                                             O.WORKORDER_SPLIT_ID,
                                             O.WORKORDER_SUB_ID
                                ORDER BY O.SEQUENCE_NO) AS NEXT_RESOURCE_ID
  FROM
    OPERATION O
  WHERE
    O.WORKORDER_TYPE='W'
    AND O.WORKORDER_BASE_ID='13000'
    AND O.WORKORDER_LOT_ID='50'
    AND O.WORKORDER_SPLIT_ID='0') O
ORDER BY
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO;

WORKORDER_BASE_ID    WOR WOR WOR SEQUENCE_NO RESOURCE_ID     NEXT_SEQUENCE_NO NEXT_RESOURCE_I
-------------------- --- --- --- ----------- --------------- ---------------- ---------------
13000                50  0   0           888 INSPECT                      999 SHIP
13000                50  0   0           999 SHIP
13000                50  0   1            10 92                            20 62
13000                50  0   1            20 62                           541 KRW
13000                50  0   1           541 KRW
13000                50  0   200          10 WSD70-TACK FIX                20 WELD-CBA
13000                50  0   200          20 WELD-CBA                      40 BLAST
13000                50  0   200          40 BLAST                         50 PAINT
13000                50  0   200          50 PAINT

Looks like a pretty easy solution… unless we recognize that after the last operation on sub-assembly 200 the next operation is really the parent operation of that sub-assembly (operation 10 on sub-assembly 1).  Likewise, after the last operation on sub-assembly 1, the next operation is really the parent operation of that sub-assembly (operation 888 on sub-assembly 0).  There is no next operation after operation 999 on the main sub-assembly 0.  How can we fix this problem with the NULL next operations in the previous output?  We just need an outer join to the dummy row in the REQUIREMENT table to pick up the parent’s sub-assembly number, and then join that dummy row back to a second reference of the OPERATION table.  The NVL2 and COALESCE functions are used to handle the cases where the original next operation and next resource ID would have been output as NULL values:

SELECT
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO,
  O.RESOURCE_ID,
  NVL2(O.NEXT_SEQUENCE_NO,O.WORKORDER_SUB_ID,O2.WORKORDER_SUB_ID) NEXT_SUB_ID,
  COALESCE(O.NEXT_SEQUENCE_NO,O2.SEQUENCE_NO) NEXT_SEQUENCE_NO,
  COALESCE(O.NEXT_RESOURCE_ID,O2.RESOURCE_ID) NEXT_RESOURCE_ID
FROM
  (SELECT
    O.WORKORDER_TYPE,
    O.WORKORDER_BASE_ID,
    O.WORKORDER_LOT_ID,
    O.WORKORDER_SPLIT_ID,
    O.WORKORDER_SUB_ID,
    O.SEQUENCE_NO,
    O.RESOURCE_ID,
    LEAD(O.SEQUENCE_NO,1) OVER (PARTITION BY O.WORKORDER_BASE_ID,
                                             O.WORKORDER_LOT_ID,
                                             O.WORKORDER_SPLIT_ID,
                                             O.WORKORDER_SUB_ID
                                ORDER BY O.SEQUENCE_NO) AS NEXT_SEQUENCE_NO,
    LEAD(O.RESOURCE_ID,1) OVER (PARTITION BY O.WORKORDER_BASE_ID,
                                             O.WORKORDER_LOT_ID,
                                             O.WORKORDER_SPLIT_ID,
                                             O.WORKORDER_SUB_ID
                                ORDER BY O.SEQUENCE_NO) AS NEXT_RESOURCE_ID
  FROM
    OPERATION O
  WHERE
    O.WORKORDER_TYPE='W'
    AND O.WORKORDER_BASE_ID='13000'
    AND O.WORKORDER_LOT_ID='50'
    AND O.WORKORDER_SPLIT_ID='0') O,
  REQUIREMENT R,
  OPERATION O2
WHERE
  O.WORKORDER_TYPE=R.WORKORDER_TYPE(+)
  AND O.WORKORDER_BASE_ID=R.WORKORDER_BASE_ID(+)
  AND O.WORKORDER_LOT_ID=R.WORKORDER_LOT_ID(+)
  AND O.WORKORDER_SPLIT_ID=R.WORKORDER_SPLIT_ID(+)
  AND O.WORKORDER_SUB_ID=R.SUBORD_WO_SUB_ID(+)
  AND R.WORKORDER_TYPE=O2.WORKORDER_TYPE(+)
  AND R.WORKORDER_BASE_ID=O2.WORKORDER_BASE_ID(+)
  AND R.WORKORDER_LOT_ID=O2.WORKORDER_LOT_ID(+)
  AND R.WORKORDER_SPLIT_ID=O2.WORKORDER_SPLIT_ID(+)
  AND R.WORKORDER_SUB_ID=O2.WORKORDER_SUB_ID(+)
  AND R.OPERATION_SEQ_NO=O2.SEQUENCE_NO(+)
ORDER BY
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO;

WORKORDER_BASE_ID    WOR WOR WOR SEQUENCE_NO RESOURCE_ID     NEX NEXT_SEQUENCE_NO NEXT_RESOURCE_I
-------------------- --- --- --- ----------- --------------- --- ---------------- ---------------
13000                50  0   0           888 INSPECT         0                999 SHIP
13000                50  0   0           999 SHIP
13000                50  0   1            10 92              1                 20 62
13000                50  0   1            20 62              1                541 KRW
13000                50  0   1           541 KRW             0                888 INSPECT
13000                50  0   200          10 WSD70-TACK FIX  200               20 WELD-CBA
13000                50  0   200          20 WELD-CBA        200               40 BLAST
13000                50  0   200          40 BLAST           200               50 PAINT
13000                50  0   200          50 PAINT           1                 10 92
 

In case you are wondering, the execution plan for the above SQL statement looks like this:

Plan hash value: 1263351280

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                 |      1 |        |      9 |00:00:00.01 |      43 |       |       |          |
|   1 |  SORT ORDER BY                   |                 |      1 |      9 |      9 |00:00:00.01 |      43 |  2048 |  2048 | 2048  (0)|
|   2 |   NESTED LOOPS OUTER             |                 |      1 |      9 |      9 |00:00:00.01 |      43 |       |       |          |
|   3 |    NESTED LOOPS OUTER            |                 |      1 |      9 |      9 |00:00:00.01 |      23 |       |       |          |
|   4 |     VIEW                         |                 |      1 |      9 |      9 |00:00:00.01 |       9 |       |       |          |
|   5 |      WINDOW BUFFER               |                 |      1 |      9 |      9 |00:00:00.01 |       9 |  2048 |  2048 | 2048  (0)|
|   6 |       TABLE ACCESS BY INDEX ROWID| OPERATION       |      1 |      9 |      9 |00:00:00.01 |       9 |       |       |          |
|*  7 |        INDEX RANGE SCAN          | SYS_C0021734    |      1 |      9 |      9 |00:00:00.01 |       3 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID  | REQUIREMENT     |      9 |      1 |      7 |00:00:00.01 |      14 |       |       |          |
|*  9 |      INDEX RANGE SCAN            | X_REQUIREMENT_5 |      9 |      1 |      7 |00:00:00.01 |      13 |       |       |          |
|  10 |    TABLE ACCESS BY INDEX ROWID   | OPERATION       |      9 |      1 |      7 |00:00:00.01 |      20 |       |       |          |
|* 11 |     INDEX UNIQUE SCAN            | SYS_C0021734    |      9 |      1 |      7 |00:00:00.01 |      13 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("O"."WORKORDER_TYPE"='W' AND "O"."WORKORDER_BASE_ID"='13000' AND "O"."WORKORDER_LOT_ID"='50' AND
              "O"."WORKORDER_SPLIT_ID"='0')
   9 - access("O"."WORKORDER_TYPE"="R"."WORKORDER_TYPE" AND "O"."WORKORDER_BASE_ID"="R"."WORKORDER_BASE_ID" AND
              "O"."WORKORDER_LOT_ID"="R"."WORKORDER_LOT_ID" AND "O"."WORKORDER_SPLIT_ID"="R"."WORKORDER_SPLIT_ID" AND
              "O"."WORKORDER_SUB_ID"="R"."SUBORD_WO_SUB_ID")
       filter("R"."SUBORD_WO_SUB_ID" IS NOT NULL)
  11 - access("R"."WORKORDER_TYPE"="O2"."WORKORDER_TYPE" AND "R"."WORKORDER_BASE_ID"="O2"."WORKORDER_BASE_ID" AND
              "R"."WORKORDER_LOT_ID"="O2"."WORKORDER_LOT_ID" AND "R"."WORKORDER_SPLIT_ID"="O2"."WORKORDER_SPLIT_ID" AND
              "R"."WORKORDER_SUB_ID"="O2"."WORKORDER_SUB_ID" AND "R"."OPERATION_SEQ_NO"="O2"."SEQUENCE_NO")

Note
-----
   - cardinality feedback used for this statement 

The above takes less that 0.01 seconds to execute, in part because I was careful to preserve the leading columns in the primary key indexes’ columns when building the inline view.

But, what if the OP (as he mentioned) was running with an older version of SQL Server that did not support the LEAD analytic function?  Is the OP out of luck (it could be argued that is a trick question)?

Let’s start again, this time we will self-join the OPERATION table rather than using an analytic function to determine the next operation, and just for fun we will use ANSI join syntax:

SELECT
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO,
  MIN(O2.SEQUENCE_NO) AS NEXT_SEQUENCE_NO
FROM
  OPERATION O
LEFT OUTER JOIN
  OPERATION O2
ON
  O.WORKORDER_TYPE=O2.WORKORDER_TYPE
  AND O.WORKORDER_BASE_ID=O2.WORKORDER_BASE_ID
  AND O.WORKORDER_LOT_ID=O2.WORKORDER_LOT_ID
  AND O.WORKORDER_SPLIT_ID=O2.WORKORDER_SPLIT_ID
  AND O.WORKORDER_SUB_ID=O2.WORKORDER_SUB_ID
  AND O.SEQUENCE_NO < O2.SEQUENCE_NO
WHERE
  O.WORKORDER_TYPE='W'
  AND O.WORKORDER_BASE_ID='13000'
  AND O.WORKORDER_LOT_ID='50'
  AND O.WORKORDER_SPLIT_ID='0'
GROUP BY
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO
ORDER BY
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO;

WORKORDER_BASE_ID    WOR WOR WOR SEQUENCE_NO NEXT_SEQUENCE_NO
-------------------- --- --- --- ----------- ----------------
13000                50  0   0           888              999
13000                50  0   0           999
13000                50  0   1            10               20
13000                50  0   1            20              541
13000                50  0   1           541
13000                50  0   200          10               20
13000                50  0   200          20               40
13000                50  0   200          40               50
13000                50  0   200          50 

Now, we will slide the above SQL statement into an inline view and use the COALESCE function to return the first non-NULL value in a list of columns, and the Oracle only NVL2 function to return one of two column values depending on whether or not a third column contains a NULL value:

SELECT
  O.WORKORDER_TYPE,
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO,
  O.RESOURCE_ID,
  NVL2(O.NEXT_SEQUENCE_NO,O.WORKORDER_SUB_ID,R.WORKORDER_SUB_ID) AS NEXT_SUB_ID,
  COALESCE(O.NEXT_SEQUENCE_NO,R.OPERATION_SEQ_NO) AS NEXT_SEQUENCE_NO
FROM
  (SELECT
    O.WORKORDER_TYPE,
    O.WORKORDER_BASE_ID,
    O.WORKORDER_LOT_ID,
    O.WORKORDER_SPLIT_ID,
    O.WORKORDER_SUB_ID,
    O.SEQUENCE_NO,
    O.RESOURCE_ID,
    MIN(O2.SEQUENCE_NO) AS NEXT_SEQUENCE_NO
  FROM
    OPERATION O
  LEFT OUTER JOIN
    OPERATION O2
  ON
    O.WORKORDER_TYPE=O2.WORKORDER_TYPE
    AND O.WORKORDER_BASE_ID=O2.WORKORDER_BASE_ID
    AND O.WORKORDER_LOT_ID=O2.WORKORDER_LOT_ID
    AND O.WORKORDER_SPLIT_ID=O2.WORKORDER_SPLIT_ID
    AND O.WORKORDER_SUB_ID=O2.WORKORDER_SUB_ID
    AND O.SEQUENCE_NO < O2.SEQUENCE_NO
  WHERE
    O.WORKORDER_TYPE='W'
    AND O.WORKORDER_BASE_ID='13000'
    AND O.WORKORDER_LOT_ID='50'
    AND O.WORKORDER_SPLIT_ID='0'
  GROUP BY
    O.WORKORDER_TYPE,
    O.WORKORDER_BASE_ID,
    O.WORKORDER_LOT_ID,
    O.WORKORDER_SPLIT_ID,
    O.WORKORDER_SUB_ID,
    O.SEQUENCE_NO,
    O.RESOURCE_ID) O
LEFT OUTER JOIN
  REQUIREMENT R
ON
  O.WORKORDER_TYPE=R.WORKORDER_TYPE
  AND O.WORKORDER_BASE_ID=R.WORKORDER_BASE_ID
  AND O.WORKORDER_LOT_ID=R.WORKORDER_LOT_ID
  AND O.WORKORDER_SPLIT_ID=R.WORKORDER_SPLIT_ID
  AND O.WORKORDER_SUB_ID=R.SUBORD_WO_SUB_ID
ORDER BY
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO;

W WORKORDER_BASE_ID    WOR WOR WOR SEQUENCE_NO RESOURCE_ID     NEX NEXT_SEQUENCE_NO
- -------------------- --- --- --- ----------- --------------- --- ----------------
W 13000                50  0   0           888 INSPECT         0                999
W 13000                50  0   0           999 SHIP
W 13000                50  0   1            10 92              1                 20
W 13000                50  0   1            20 62              1                541
W 13000                50  0   1           541 KRW             0                888
W 13000                50  0   200          10 WSD70-TACK FIX  200               20
W 13000                50  0   200          20 WELD-CBA        200               40
W 13000                50  0   200          40 BLAST           200               50
W 13000                50  0   200          50 PAINT           1                 10 

Almost there – we still need the resource ID of the next operation.  Sliding the above into another level of inline view, we end up with the following:

SELECT
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.RESOURCE_ID,
  O.NEXT_SUB_ID,
  O.NEXT_SEQUENCE_NO,
  O2.RESOURCE_ID NEXT_RESOURCE_ID
FROM
  (SELECT
    O.WORKORDER_TYPE,
    O.WORKORDER_BASE_ID,
    O.WORKORDER_LOT_ID,
    O.WORKORDER_SPLIT_ID,
    O.WORKORDER_SUB_ID,
    O.SEQUENCE_NO,
    O.RESOURCE_ID,
    NVL2(O.NEXT_SEQUENCE_NO,O.WORKORDER_SUB_ID,R.WORKORDER_SUB_ID) AS NEXT_SUB_ID,
    COALESCE(O.NEXT_SEQUENCE_NO,R.OPERATION_SEQ_NO) AS NEXT_SEQUENCE_NO
  FROM
    (SELECT
      O.WORKORDER_TYPE,
      O.WORKORDER_BASE_ID,
      O.WORKORDER_LOT_ID,
      O.WORKORDER_SPLIT_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO,
      O.RESOURCE_ID,
      MIN(O2.SEQUENCE_NO) AS NEXT_SEQUENCE_NO
    FROM
      OPERATION O
    LEFT OUTER JOIN
      OPERATION O2
    ON
      O.WORKORDER_TYPE=O2.WORKORDER_TYPE
      AND O.WORKORDER_BASE_ID=O2.WORKORDER_BASE_ID
      AND O.WORKORDER_LOT_ID=O2.WORKORDER_LOT_ID
      AND O.WORKORDER_SPLIT_ID=O2.WORKORDER_SPLIT_ID
      AND O.WORKORDER_SUB_ID=O2.WORKORDER_SUB_ID
      AND O.SEQUENCE_NO < O2.SEQUENCE_NO
    WHERE
      O.WORKORDER_TYPE='W'
      AND O.WORKORDER_BASE_ID='13000'
      AND O.WORKORDER_LOT_ID='50'
      AND O.WORKORDER_SPLIT_ID='0'
    GROUP BY
      O.WORKORDER_TYPE,
      O.WORKORDER_BASE_ID,
      O.WORKORDER_LOT_ID,
      O.WORKORDER_SPLIT_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO,
      O.RESOURCE_ID) O
  LEFT OUTER JOIN
    REQUIREMENT R
  ON
    O.WORKORDER_TYPE=R.WORKORDER_TYPE
    AND O.WORKORDER_BASE_ID=R.WORKORDER_BASE_ID
    AND O.WORKORDER_LOT_ID=R.WORKORDER_LOT_ID
    AND O.WORKORDER_SPLIT_ID=R.WORKORDER_SPLIT_ID
    AND O.WORKORDER_SUB_ID=R.SUBORD_WO_SUB_ID) O
LEFT OUTER JOIN
  OPERATION O2
ON
  O.WORKORDER_TYPE=O2.WORKORDER_TYPE
  AND O.WORKORDER_BASE_ID=O2.WORKORDER_BASE_ID
  AND O.WORKORDER_LOT_ID=O2.WORKORDER_LOT_ID
  AND O.WORKORDER_SPLIT_ID=O2.WORKORDER_SPLIT_ID
  AND O.NEXT_SUB_ID=O2.WORKORDER_SUB_ID
  AND O.NEXT_SEQUENCE_NO=O2.SEQUENCE_NO
ORDER BY
  O.WORKORDER_BASE_ID,
  O.WORKORDER_LOT_ID,
  O.WORKORDER_SPLIT_ID,
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO;

WORKORDER_BASE_ID    WOR WOR WOR RESOURCE_ID     NEX NEXT_SEQUENCE_NO NEXT_RESOURCE_I
-------------------- --- --- --- --------------- --- ---------------- ---------------
13000                50  0   0   INSPECT         0                999 SHIP
13000                50  0   0   SHIP
13000                50  0   1   92              1                 20 62
13000                50  0   1   62              1                541 KRW
13000                50  0   1   KRW             0                888 INSPECT
13000                50  0   200 WSD70-TACK FIX  200               20 WELD-CBA
13000                50  0   200 WELD-CBA        200               40 BLAST
13000                50  0   200 BLAST           200               50 PAINT
13000                50  0   200 PAINT           1                 10 92 

The above SQL statement has the following execution plan:

Plan hash value: 3374377097

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |              |      1 |        |      9 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT ORDER BY                    |              |      1 |      9 |      9 |00:00:00.01 |      41 |  2048 |  2048 | 2048  (0)|
|   2 |   NESTED LOOPS OUTER              |              |      1 |      9 |      9 |00:00:00.01 |      41 |       |       |          |
|   3 |    VIEW                           |              |      1 |      9 |      9 |00:00:00.01 |      19 |       |       |          |
|   4 |     HASH GROUP BY                 |              |      1 |      9 |      9 |00:00:00.01 |      19 |   781K|   781K| 1103K (0)|
|*  5 |      HASH JOIN OUTER              |              |      1 |      9 |     13 |00:00:00.01 |      19 |   851K|   851K|  653K (0)|
|*  6 |       HASH JOIN OUTER             |              |      1 |      9 |      9 |00:00:00.01 |      16 |   927K|   927K|  651K (0)|
|   7 |        TABLE ACCESS BY INDEX ROWID| OPERATION    |      1 |      9 |      9 |00:00:00.01 |       9 |       |       |          |
|*  8 |         INDEX RANGE SCAN          | SYS_C0021734 |      1 |      9 |      9 |00:00:00.01 |       3 |       |       |          |
|*  9 |        TABLE ACCESS BY INDEX ROWID| REQUIREMENT  |      1 |      1 |      2 |00:00:00.01 |       7 |       |       |          |
|* 10 |         INDEX RANGE SCAN          | SYS_C0021842 |      1 |      1 |     18 |00:00:00.01 |       3 |       |       |          |
|* 11 |       INDEX RANGE SCAN            | SYS_C0021734 |      1 |      1 |      9 |00:00:00.01 |       3 |       |       |          |
|  12 |    TABLE ACCESS BY INDEX ROWID    | OPERATION    |      9 |      1 |      8 |00:00:00.01 |      22 |       |       |          |
|* 13 |     INDEX UNIQUE SCAN             | SYS_C0021734 |      9 |      1 |      8 |00:00:00.01 |      14 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("O"."WORKORDER_SUB_ID"="O2"."WORKORDER_SUB_ID" AND "O"."WORKORDER_SPLIT_ID"="O2"."WORKORDER_SPLIT_ID" AND
              "O"."WORKORDER_LOT_ID"="O2"."WORKORDER_LOT_ID" AND "O"."WORKORDER_BASE_ID"="O2"."WORKORDER_BASE_ID" AND
              "O"."WORKORDER_TYPE"="O2"."WORKORDER_TYPE")
       filter("O"."SEQUENCE_NO"<"O2"."SEQUENCE_NO")
   6 - access("O"."WORKORDER_SUB_ID"="R"."SUBORD_WO_SUB_ID" AND "O"."WORKORDER_SPLIT_ID"="R"."WORKORDER_SPLIT_ID" AND
              "O"."WORKORDER_LOT_ID"="R"."WORKORDER_LOT_ID" AND "O"."WORKORDER_BASE_ID"="R"."WORKORDER_BASE_ID" AND
              "O"."WORKORDER_TYPE"="R"."WORKORDER_TYPE")
   8 - access("O"."WORKORDER_TYPE"='W' AND "O"."WORKORDER_BASE_ID"='13000' AND "O"."WORKORDER_LOT_ID"='50' AND
              "O"."WORKORDER_SPLIT_ID"='0')
   9 - filter("R"."SUBORD_WO_SUB_ID" IS NOT NULL)
  10 - access("R"."WORKORDER_TYPE"='W' AND "R"."WORKORDER_BASE_ID"='13000' AND "R"."WORKORDER_LOT_ID"='50' AND
              "R"."WORKORDER_SPLIT_ID"='0')
  11 - access("O2"."WORKORDER_TYPE"='W' AND "O2"."WORKORDER_BASE_ID"='13000' AND "O2"."WORKORDER_LOT_ID"='50' AND
              "O2"."WORKORDER_SPLIT_ID"='0')
  13 - access("O"."WORKORDER_TYPE"="O2"."WORKORDER_TYPE" AND "O"."WORKORDER_BASE_ID"="O2"."WORKORDER_BASE_ID" AND
              "O"."WORKORDER_LOT_ID"="O2"."WORKORDER_LOT_ID" AND "O"."WORKORDER_SPLIT_ID"="O2"."WORKORDER_SPLIT_ID" AND
              "O"."NEXT_SUB_ID"="O2"."WORKORDER_SUB_ID" AND "O"."NEXT_SEQUENCE_NO"="O2"."SEQUENCE_NO")

Note
-----
   - cardinality feedback used for this statement 

So, now that we have two different SQL statements that both solve the problem that was presented by the OP, which version of the SQL statement is the most efficient, and which would you use in production?





Simple SQL with and without Inline Views 2

26 10 2011

October 26, 2011

(Back to the Previous Post in the Series)

In the previous article of this series we examined a couple of moderately simple data retrieval requirements, and used inline views and/or analytic functions to solve those data retrieval requirements.  I recently saw another opportunity to help a person on the ERP mailing list with a SQL statement.  The original poster (OP) provided a screen capture that contained information similar to the following:

CUST_ORDER_ID   PART_ID         TYPE CLASS WAREHOUSE_ID LOCATION_ID  QTY_ON_HAND TRANSACTI
--------------- --------------- ---- ----- ------------ ------------ ----------- ---------
ZQHZG           2J-2908         O    I     FIB          AB7A10                30 21-JUN-10
ZQHZZ           2J-2909         O    I     FIB          AB7A10                 1 21-JUN-10
ZIHKR           2J0836          O    I     HIBB         A77Y3               1092 30-NOV-07
ZQKZH           2J0836          O    I     HIBB         A77Y3               1092 10-JUN-08
ZQIHZ           2J0836          O    I     HIBB         A77Y3               1092 23-NOV-09
ZKOHI           2J4120          O    I     PIPABTONA    AB7A10                 2 16-NOV-95
ZKOHQ           2J4129          O    I     PIPABTONA    AB7A10                 2 16-NOV-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 25-SEP-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 25-SEP-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 26-OCT-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 26-OCT-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 12-DEC-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 12-DEC-95
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 27-FEB-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 04-APR-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 04-APR-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 31-MAY-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 31-MAY-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 06-SEP-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 30-SEP-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 30-SEP-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 30-SEP-96
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 12-JAN-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 12-FEB-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 04-MAR-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 05-MAY-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 05-MAY-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 18-AUG-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 18-AUG-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 29-SEP-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785          O    I     HIBB         AB7A10                 2 02-DEC-98
ZIRZQ           3J8547          O    I     HIBB         A77Y2                252 18-OCT-07
ZZUQG           4G0007          O    I     HIBB         AB7A10                 2 08-NOV-99
ZZUQG           4G0007          O    I     HIBB         AB7A10                 2 30-NOV-99 
...

Initially, the OP requested that the most recent TRANSACTION_DATE (the last column in the above output) for the PART_ID value to be shown on each row of the output, rather than the TRANSACTION_DATE that is directly associated with a specific row.  However, there are two problems:

  • The OP did not mention what tables were used as the data source for the included screen capture.
  • The OP did not mention whether a recent release of Oracle Database or some other brand of database was in use.

For times like these, it helps a lot to be familiar with the data model used in the database.  The INVENTORY_TRANS table in this particular ERP system has columns that match several of the columns that were included in the OP’s screen capture: CUST_ORDER_ID, PART_ID, TYPE, CLASS, WAREHOUSE_ID, LOCATION_ID, and TRANSACTION_DATE.  The question remains, where did the QTY_ON_HAND column come from – that column is not in the INVENTORY_TRANS table.  We could find out, at least on Oracle Database, with a SQL statement similar to the following:

SELECT
  DTC.TABLE_NAME,
  DT.TABLE_NAME
FROM
  DBA_TAB_COLUMNS DTC,
  DBA_TABLES DT
WHERE
  DTC.COLUMN_NAME='QTY_ON_HAND'
  AND DTC.OWNER='SCHEMA_OWNER_HERE'
  AND DTC.OWNER=DT.OWNER(+)
  AND DTC.TABLE_NAME=DT.TABLE_NAME(+)
ORDER BY
  DTC.TABLE_NAME;

TABLE_NAME                     TABLE_NAME
------------------------------ ----------
CR_EC_LINE_PART
CR_OM_CO_LINE_PART
CR_PART_PO_LINE
CR_WO_PART
PART                           PART
PART_USER_DEFINED_VIEW 

Note that in the above, I had to join the DBA_TAB_COLUMNS dictionary view with the DBA_TABLES dictionary view (I could have used DBA_OBJECTS instead) to determine that 5 of the 6 rows returned from the DBA_TAB_COLUMNS view are in fact not tables, but views.  Thus, the QTY_ON_HAND column in the OP’s screen capture must have originated from the PART table (unless, of course, the column in the screen capture is actually an aliased calculation).

Now that the data sources are known for the query, we can take a best guess approach at reproducing the output that the OP provided in the screen capture:

SELECT
  IT.CUST_ORDER_ID,
  IT.PART_ID,
  TYPE,
  CLASS,
  IT.WAREHOUSE_ID,
  IT.LOCATION_ID,
  P.QTY_ON_HAND,
  IT.TRANSACTION_DATE
FROM
  INVENTORY_TRANS IT,
  PART P
WHERE
  IT.CUST_ORDER_ID IS NOT NULL
  AND IT.TYPE='O'
  AND IT.CLASS='I'
  AND IT.PART_ID=P.ID
  AND P.QTY_ON_HAND>0
ORDER BY
  IT.PART_ID,
  IT.TRANSACTION_DATE; 

Next, there are a couple of ways to have the most recent transaction date appear on each row of the output (partially dependent on the database version).  First, let’s build a query that returns the most recent transaction date where the TYPE is O and the CLASS is I and the CUST_ORDER_ID column does not contain a NULL:

SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  TYPE='O'
  AND CLASS='I'
  AND CUST_ORDER_ID IS NOT NULL
GROUP BY
  PART_ID
ORDER BY
  PART_ID; 

PART_ID         LAST_TRAN
--------------- ---------
0000009         29-DEC-00
005035008005    29-MAY-98
005035008006    29-MAY-98
00576649-0080   20-OCT-11
00576649-0088   08-JUN-11
00576649-8005   14-JAN-11
007580398       17-JAN-03
0099229940      01-NOV-95
0099229990      01-NOV-95
0108556         28-APR-00
01550867KM      15-NOV-05
01552316KM      02-OCT-00
01552346KM      30-APR-03
01552369KM      30-APR-01
01558943M       07-JAN-10
01561230KM      30-OCT-01
01563001M       08-MAR-10
01563882M       01-APR-10
01566790KM      08-SEP-08
01569945M       01-APR-10
01583508TO      11-FEB-05
01780151TO      19-JUL-07
...

Next, we will just plug the above inline view into the FROM clause of our original SQL statement and add an additional predicate into the WHERE clause to tell the query optimizer how to associate the inline view with the other tables in the SQL statement:

SELECT
  IT.CUST_ORDER_ID,
  IT.PART_ID,
  IT.QTY,
  IT.WAREHOUSE_ID,
  IT.LOCATION_ID,
  P.QTY_ON_HAND,
  MIT.LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS IT,
  PART P,
  (SELECT
     PART_ID,
     MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
   FROM
     INVENTORY_TRANS
   WHERE
     TYPE='O'
     AND CLASS='I'
     AND CUST_ORDER_ID IS NOT NULL
   GROUP BY
     PART_ID) MIT
WHERE
  IT.CUST_ORDER_ID IS NOT NULL
  AND IT.TYPE='O'
  AND IT.CLASS='I'
  AND IT.PART_ID=P.ID
  AND P.QTY_ON_HAND>0
  AND IT.PART_ID=MIT.PART_ID
ORDER BY
  IT.PART_ID,
  IT.CUST_ORDER_ID,
  IT.TRANSACTION_DATE; 

CUST_ORDER_ID   PART_ID             QTY WAREHOUSE_ID LOCATION_ID  QTY_ON_HAND LAST_TRAN
--------------- ------------ ---------- ------------ ------------ ----------- ---------
ZQHZG           2J-2908               1 FIB          AB7A10                30 21-JUN-10
ZQHZZ           2J-2909               1 FIB          AB7A10                 1 21-JUN-10
ZIHKR           2J0836               48 HIBB         A77Y3               1092 23-NOV-09
ZQIHZ           2J0836               72 HIBB         A77Y3               1092 23-NOV-09
ZQKZH           2J0836              111 HIBB         A77Y3               1092 23-NOV-09
ZKOHI           2J4120                2 PIPABTONA    AB7A10                 2 16-NOV-95
ZKOHQ           2J4129                2 PIPABTONA    AB7A10                 2 16-NOV-95
ZKGQG           2J9785                9 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                1 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                7 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                2 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                6 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                4 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               10 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                8 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                4 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               10 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                4 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               12 HIBB         AB7A10                 2 02-DEC-98
...

Since I am using Oracle Database 11.2.0.2, I can take advantage of analytic functions to further simplify the SQL statement:

SELECT
  IT.CUST_ORDER_ID,
  IT.PART_ID,
  IT.QTY,
  IT.WAREHOUSE_ID,
  IT.LOCATION_ID,
  P.QTY_ON_HAND,
  MAX(IT.TRANSACTION_DATE) OVER (PARTITION BY IT.PART_ID) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS IT,
  PART P
WHERE
  IT.CUST_ORDER_ID IS NOT NULL
  AND IT.TYPE='O'
  AND IT.CLASS='I'
  AND IT.PART_ID=P.ID
  AND P.QTY_ON_HAND>0
ORDER BY
  IT.PART_ID,
  IT.CUST_ORDER_ID,
  IT.TRANSACTION_DATE; 

CUST_ORDER_ID   PART_ID             QTY WAREHOUSE_ID LOCATION_ID  QTY_ON_HAND LAST_TRAN
--------------- ------------ ---------- ------------ ------------ ----------- ---------
ZQHZG           2J-2908               1 FIB          AB7A10                30 21-JUN-10
ZQHZZ           2J-2909               1 FIB          AB7A10                 1 21-JUN-10
ZIHKR           2J0836               48 HIBB         A77Y3               1092 23-NOV-09
ZQKZH           2J0836              111 HIBB         A77Y3               1092 23-NOV-09
ZQIHZ           2J0836               72 HIBB         A77Y3               1092 23-NOV-09
ZKOHI           2J4120                2 PIPABTONA    AB7A10                 2 16-NOV-95
ZKOHQ           2J4129                2 PIPABTONA    AB7A10                 2 16-NOV-95
ZKGQG           2J9785                9 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                1 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                7 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                2 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                6 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                4 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               10 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                8 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                4 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               10 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                4 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               12 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785                2 HIBB         AB7A10                 2 02-DEC-98
ZKGQG           2J9785               11 HIBB         AB7A10                 2 02-DEC-98
...

—–

Let’s assume that the OP’s intended result changes – the OP just wants to know which PART_IDs have not had any transactions in the last 89 day.  So, the OP is interested in those PART_IDs with a last transaction that was either 90 or more days ago, or there has never been a transaction for the PART_ID.  Oh, scrap… start over.

First, we will identify the list of PART_IDs that have had transactions in the last 89 days (CLASS values of ‘A’ are adjustments, not true inventory transactions, so rows with those CLASS values will be excluded).  Once we have that list, we can simply subtract the list from the entire list of PART_IDs:

SELECT DISTINCT
  PART_ID
FROM
  INVENTORY_TRANS
WHERE
  TRANSACTION_DATE > TRUNC(SYSDATE-90)
  AND PART_ID IS NOT NULL
  AND CLASS<>'A'; 

PART_ID
---------
2J-2908
9F4810
867-8649
90772357L
90772341L
90772297L
8475047
8145432
90772326L
9340730
90772357
90772347
782-9484
...

The OP is interested in seeing the actual warehouse location containing the PART_IDs, so we need to use the PART_LOCATION table, not the PART table as we did before.  We then need to create a left outer join between the above list and the PART_LOCATION table – essentially, we will find cases where there is a row in the PART_LOCATION table without a corresponding part ID in the above list.  ANSI syntax will be used here for the outer join, because it is still unknown whether the OP is running an Oracle Database backend or a SQL Server backend:

SELECT
  PL.PART_ID,
  PL.WAREHOUSE_ID,
  PL.LOCATION_ID,
  IT.PART_ID,
  PL.QTY
FROM
  PART_LOCATION PL
LEFT OUTER JOIN
  (SELECT DISTINCT
     PART_ID
   FROM
     INVENTORY_TRANS
   WHERE
     TRANSACTION_DATE > TRUNC(SYSDATE-90)
     AND PART_ID IS NOT NULL
     AND CLASS<>'A') IT
ON
  PL.PART_ID=IT.PART_ID
  AND IT.PART_ID IS NULL
WHERE
  PL.QTY<>0
ORDER BY
  PL.PART_ID,
  PL.WAREHOUSE_ID,
  PL.LOCATION_ID; 

PART_ID                WAREHOUSE_ID LOCATION_ID  PART_ID             QTY
---------------------- ------------ ------------ ------------ ----------
#2                     HIBB         AB7A10                             4
#46954                 HIBB         TOOL CRAB                         15
#4HWG LHRK GROUND WIRK HIBB         TOOL CRAB                        250
#4THHN                 HIBB         AB7A10                          2460
#5                     HIBB         AB7A10                             3
#6                     HIBB         TOOL CRAB                          4
#649                   HIBB         TOOL CRAB                          2
#655                   HIBB         TOOL CRAB                          4
#709                   HIBB         TOOL CRAB                          2
#889                   HIBB         TOOL CRAB                          1
#89 KJTKNSION          HIBB         TOOL CRAB                          3
#90                    HIBB         AB7A10                             4
#92 KJTKNSION          HIBB         TOOL CRAB                          2
*G-WM370H-MT           HIBB         AB7A10                             2
.235-H522              BURN         AB7A10                            .5
.235RD-9K0045          BURN         AB7A10                            .7
.260RDJ.963-HSTMH57    BURN         AB7A10                        2.5625
.35-H596-845           PIPABTONA    AB7A10                         4.039
...

In the above, notice that there is a seeminly odd ON clause: “PL.PART_ID=IT.PART_ID  AND IT.PART_ID IS NULL” – that simply means that when resolving the left outer join, a value must be found on the left side, but a corresponding value should not be found on the right side of the join.  This is the “minus” syntax that I selected to use here (the SQL MINUS syntax could not be used because the WAREHOUSE_ID, LOCATION_ID, and QTY column values returned by the query of the INVENTORY_TRANS table would have to exactly match the values found in the PART_LOCATION table… and that is more difficult to accomplish).

If I was certain that the OP was running with an Oracle Database backend, I would have suggested the following syntax, rather than the ANSI syntax as shown above:

SELECT
  PL.PART_ID,
  PL.WAREHOUSE_ID,
  PL.LOCATION_ID,
  IT.PART_ID,
  PL.QTY
FROM
  PART_LOCATION PL,
  (SELECT DISTINCT
     PART_ID
   FROM
     INVENTORY_TRANS
   WHERE
     TRANSACTION_DATE > TRUNC(SYSDATE-90)
     AND PART_ID IS NOT NULL
     AND CLASS<>'A') IT
WHERE
  PL.PART_ID=IT.PART_ID(+)
  AND IT.PART_ID IS NULL
  AND PL.QTY<>0
ORDER BY
  PL.PART_ID,
  PL.WAREHOUSE_ID,
  PL.LOCATION_ID; 

Why the NULL values in the fourth column (the second PART_ID column) of the output?  That NULL simply means that the query is working exactly as intended.  That fourth column would be showing only those PART_IDs that have had transactions in the last 89 days (and NULL if no transactions in the last 89 days) if the restriction “AND IT.PART_ID IS NULL” was NOT included in the ON clause – but such rows should not be included in the output if the output is to be used as intended.





Simple SQL with and without Inline Views

21 10 2011

October 21, 2011

(Forward to the Next Post in the Series)

Sometimes it is interesting to take a step back from some of the more difficult to construct SQL statements, and just put together something simple.  The following request recently arrived through an ERP mailing list:

“I want to find the last ship date for all the open part numbers in Customer Order Entry. The last ship date is not necessarily from the Open order in Customer Order Entry…”

The original poster (OP) provided more information, and even provided a description of what was already tried as a solution.  It is not exceptionally clear what the OP wanted, but because I have several years of experience working with this particular ERP system, I had a feeling that the OP might be headed in the wrong direction.  This ERP system uses tables that are well normalized, which would make it a bit more difficult to locate the most recent ship date for a part number that is recorded with one customer order’s tables’ rows when another customer order’s tables’ rows are reported.  Unless, of course, we remember that there is also a transaction table that records every inventory movement of part numbers through the system – we just need to know what those rows in the transaction table look like.  In this case, the rows have the following characteristics:

  • The CUST_ORDER_ID column value must not be NULL
  • The TYPE column must have a value of ‘O’
  • The CLASS column must have a value of ‘I’

With that knowledge, we could then find the most recent date that any part number shipped with a SQL statement similar to the following:

SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  CUST_ORDER_ID IS NOT NULL
  AND TYPE='O'
  AND CLASS='I'
GROUP BY
  PART_ID;

In the above, the AS keyword is optional on an Oracle Database platform, but it is not optional on some other database platforms.

With an Oracle Database backend, inline views may be used to permit the above SQL statement to be used to retrieve additional information for a parent SQL statement.  If we wrap the above SQL statement in ( ) and place it in the FROM clause of the SQL statement, Oracle will treat the data returned by the above SQL statement quite similar to how it would handle a regular table (the Oracle query optimizer might decide to rewrite the combined SQL statement into an equivalent form that no longer contains an inline view).  There is always a chance that a part number that is on order may have never shipped to a customer, so we will need to outer join (in this case using a left outer join) to the newly created inline view:

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO,
  CUST_ORDER_LINE COL,
  (SELECT
    PART_ID,
    MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
  FROM
    INVENTORY_TRANS
  WHERE
    CUST_ORDER_ID IS NOT NULL
    AND TYPE='O'
    AND CLASS='I'
  GROUP BY
    PART_ID) LS
WHERE
  CO.ID=COL.CUST_ORDER_ID
  AND CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY
  AND COL.PART_ID=LS.PART_ID(+);

In the above, I joined the CUSTOMER_ORDER and CUST_ORDER_LINE tables to the inline view that I aliased as LS.  A third table, CUST_LINE_DEL, that optionally contains the delivery schedule for some of the rows in the CUST_ORDER_LINE table, could have also been outer joined to the CUST_ORDER_LINE table.

Running SQL Server, or just feeling ANSI?  If so, the above may be rewritten as follows:

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO
JOIN
  CUST_ORDER_LINE COL
ON
  CO.ID=COL.CUST_ORDER_ID
LEFT OUTER JOIN
  (SELECT
    PART_ID,
    MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
  FROM
    INVENTORY_TRANS
  WHERE
    CUST_ORDER_ID IS NOT NULL
    AND TYPE='O'
    AND CLASS='I'
  GROUP BY
    PART_ID) LS
ON
  COL.PART_ID=LS.PART_ID
WHERE
  CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY;

Unfortunately, the OP is actually using a SQLBase database backend that does not support inline views.  I remember the feeling before I discovered that Oracle Database supported inline views… in that case I would do something like the following:

  1. Create a statically defined view.
  2. Join to that statically defined view just as if the view were a table.
CREATE VIEW CUST_ORDER_PART_LS AS
SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  CUST_ORDER_ID IS NOT NULL
  AND TYPE='O'
  AND CLASS='I'
GROUP BY
  PART_ID;

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO,
  CUST_ORDER_LINE COL,
  CUST_ORDER_PART_LS LS
WHERE
  CO.ID=COL.CUST_ORDER_ID
  AND CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY
  AND COL.PART_ID=LS.PART_ID(+);

I guess that it is a good exercise once in a while to practice simple SQL.

A second example from the ERP mailing list showed a SQL Server solution for a particular problem.  The particular problem that the SQL Server solution set out to solve is as follows:

“Someone posed an interesting question to me – How can you do a mass insert of Operations? Well most of us know how to manage that without too much trouble but this one came with a couple of caveats – The Master’s first operation (lowest numbered) has to have a particular ResourceID and then a different but specific ResourceID must be on a subsequent Operation. It is after the second operation where the insert must be placed.

Sounds like fun – So I figured that it could be done in a single SQL statement – well after a 4-way join of the Operation table to itself I got it to work how I expected…”

Interesting, I think that there are times in the past that I have used multiple self-joins to a table in order to solve similar SQL problems.  However, there is an easier way using analytic functions.  The following SQL statement attempts to indicate: the sequence of the operation within the work order sub ID, the sequence number that follows sequentially, the second sequence number that follows sequentially, and the resource ID of the next operation in sequential order by the operation number:

SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  RESOURCE_ID,
  ROW_NUMBER() OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                     ORDER BY RESOURCE_ID) AS RN,
  LEAD(RESOURCE_ID,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_RESOURCE_ID,
  LEAD(SEQUENCE_NO,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_SEQ_NO,
  LEAD(SEQUENCE_NO,2) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_NEXT_SEQ_NO
FROM
  OPERATION
WHERE
  WORKORDER_TYPE='M'; 

Next, we only care about those rows when our resource ID of interest (69 in my example) is specified in the first operation on the work order sub ID and the second operation on that work order sub ID specifies the other resource ID that is of interest (FW in my example).  If we are to avoid primary key violations, we should select a new sequence number that is half way between the next-next operation and the next operation:

SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  NEXT_SEQ_NO,
  NEXT_NEXT_SEQ_NO,
  NEXT_SEQ_NO + ROUND((COALESCE(NEXT_NEXT_SEQ_NO,NEXT_SEQ_NO+10) - NEXT_SEQ_NO)/2) NEW_SEQUENCE_NO,
  'NEW RESOURCE' RESOURCE_ID
FROM
  (SELECT
    WORKORDER_BASE_ID,
    WORKORDER_LOT_ID,
    WORKORDER_SPLIT_ID,
    WORKORDER_SUB_ID,
    SEQUENCE_NO,
    RESOURCE_ID,
    ROW_NUMBER() OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                       ORDER BY RESOURCE_ID) AS RN,
    LEAD(RESOURCE_ID,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_RESOURCE_ID,
    LEAD(SEQUENCE_NO,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_SEQ_NO,
    LEAD(SEQUENCE_NO,2) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_NEXT_SEQ_NO
  FROM
    OPERATION
  WHERE
    WORKORDER_TYPE='M')
WHERE
  RN=1
  AND RESOURCE_ID='69'
  AND NEXT_RESOURCE_ID='FW';

WORKORDER_ WOR WOR WOR SEQUENCE_NO NEXT_SEQ_NO NEXT_NEXT_SEQ_NO NEW_SEQUENCE_NO RESOURCE_ID
---------- --- --- --- ----------- ----------- ---------------- --------------- ------------
231610     0   0   0            10         777              888             833 NEW RESOURCE
237680     0   0   0            10         777              888             833 NEW RESOURCE
32018917X  0   0   0            10         777              888             833 NEW RESOURCE
3201B8920  0   0   0            10         777              888             833 NEW RESOURCE
3201C8765  0   0   0            10         777              888             833 NEW RESOURCE
3201G8639  0   0   0            10         777              888             833 NEW RESOURCE
3201G9003  0   0   0            10         777              888             833 NEW RESOURCE
3201J8772  0   0   0            10         777              888             833 NEW RESOURCE
3201J8850  0   0   0            10         777              888             833 NEW RESOURCE
3201K8669  0   0   0            10         777              888             833 NEW RESOURCE
3201M8281  0   0   0            10         777              888             833 NEW RESOURCE
3201M8489  0   0   0            10         777              888             833 NEW RESOURCE
3201N8516  0   0   0            10         777              888             833 NEW RESOURCE
3201N8776  0   0   0            10         777              888             833 NEW RESOURCE
3201Q8545  0   0   0            10         777              888             833 NEW RESOURCE
3201T8964  0   0   0            10         777              888             833 NEW RESOURCE
3201T8964X 0   0   0            10          20               30              25 NEW RESOURCE
3201V8524  0   0   0            10         777              888             833 NEW RESOURCE
3201V8966  0   0   0            10         777              888             833 NEW RESOURCE
3201V8966X 0   0   0            10          20               30              25 NEW RESOURCE
3201W8967  0   0   0            10         777              888             833 NEW RESOURCE
3201W8967X 0   0   0            10          20               30              25 NEW RESOURCE
3201Y8423  0   0   0            10         777              888             833 NEW RESOURCE
3201Z8996  0   0   0            10         777              888             833 NEW RESOURCE

24 rows selected. 

With the above, we are well on our way to perfoming a mass insert into this table.  If I recall correctly, the above must first be inserted into another table (a temp table, preferrably) and then the rows may be inserted into the OPERATION table (the table that is the row source for the analytic functions used in the SQL statement).





Brain Teaser: Why is this Query Performing a Full Table Scan

14 09 2011

September 14, 2011

While taking a look at the OTN threads this morning, I found an interesting test case, where the OP was asking why a query used a full table scan and not an index range scan.  The OP would like for the query to use the index without using a hint in the query.  So, why doesn’t the OP’s test case use an index range scan?  Just for fun I will state that my first two initial guesses were not quite on target.

A slightly modified table creation script of the setup for the OP’s test case:

DROP TABLE T1 PURGE;

CREATE TABLE T1 AS
SELECT
  *
FROM
  DBA_OBJECTS
WHERE
  STATUS='VALID';

UPDATE T1 SET STATUS='INVALID' WHERE ROWNUM=1;
COMMIT;

CREATE INDEX IND_T1_STATUS ON T1(STATUS); 

Let’s take a look at the data distribution in the table:

SELECT
  STATUS,
  CNT,
  ROUND((RATIO_TO_REPORT(CNT) OVER ())*100,6) PERCENT
FROM
  (SELECT
    STATUS,
    COUNT(*) CNT
  FROM
    T1
  GROUP BY
    STATUS);

STATUS         CNT    PERCENT
------- ---------- ----------
INVALID          1    .001513
VALID        66095  99.998487 

99.998% of the table’s rows have a STATUS of VALID with just a single row having a STATUS of invalid.

Now let’s collect the statistics for the table and index and check the execution plan:

ANALYZE INDEX IND_T1_STATUS COMPUTE STATISTICS;
ANALYZE TABLE T1 COMPUTE STATISTICS;

SET AUTOTRACE TRACEONLY EXPLAIN
SET PAGESIZE 1000
SET LINESIZE 140

SELECT
  *
FROM
  T1
WHERE
  STATUS='INVALID';

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 33048 |  3227K|   265   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   | 33048 |  3227K|   265   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("STATUS"='INVALID') 

If you were assigned to help the OP achieve his task with the test case, what would you do?





The New Order Oracle Coding Challenge 4 – Tic Tac Toe

12 08 2011

(Back to the Previous Post in the Series)

Tic Tac Toe, the game of X’s and O’s, was an oddly popular game in elementary school.  When playing the game you quickly learn a couple of rules:

  • Because X always places his mark first (alternating between X and O), there is an unfair advantage for the player placing the X marks.
  • The player placing his mark in the center square typically has an advantage in the game.
  • Once a player scores three marks in a row horizontally, vertically, or diagonally, there is no point in continuing the game.

-

-

-

-

-

-

-

Consider the above game matches.  In the first two matches both X and O score three in a row, but which player won the match – it depends on the order in which the marks were placed.  In the third match, X won with a diagnal three in a row.  The final match resulted in a tie (neither player won).  You can use a SQL statement similar to the following to output all of the roughly 362,000 combinations (note that not all combinations are unique – rotating the board 90 or 180 degrees to generate a unique combination probably is not fair, and the order in which the marks are placed could matter):

WITH
 N AS
  (SELECT
    ROWNUM N
  FROM
    DUAL
  CONNECT BY
    LEVEL<=9),
 C AS
  (SELECT
    'XOX'||'OXO'||'XOX' C
  FROM
    DUAL)
SELECT
  SUBSTR(C.C,N1.N,1) ||
  SUBSTR(C.C,N2.N,1) ||
  SUBSTR(C.C,N3.N,1) || CHR(10) ||
  SUBSTR(C.C,N4.N,1) ||
  SUBSTR(C.C,N5.N,1) ||
  SUBSTR(C.C,N6.N,1) || CHR(10) ||
  SUBSTR(C.C,N7.N,1) ||
  SUBSTR(C.C,N8.N,1) ||
  SUBSTR(C.C,N9.N,1) GAME
FROM
  N N1,
  N N2,
  N N3,
  N N4,
  N N5,
  N N6,
  N N7,
  N N8,
  N N9,
  C
WHERE
  N1.N<>N2.N
  AND N1.N<>N3.N
  AND N1.N<>N4.N
  AND N1.N<>N5.N
  AND N1.N<>N6.N
  AND N1.N<>N7.N
  AND N1.N<>N8.N
  AND N1.N<>N9.N
  AND N2.N<>N3.N
  AND N2.N<>N4.N
  AND N2.N<>N5.N
  AND N2.N<>N6.N
  AND N2.N<>N7.N
  AND N2.N<>N8.N
  AND N2.N<>N9.N
  AND N3.N<>N4.N
  AND N3.N<>N5.N
  AND N3.N<>N6.N
  AND N3.N<>N7.N
  AND N3.N<>N8.N
  AND N3.N<>N9.N
  AND N4.N<>N5.N
  AND N4.N<>N6.N
  AND N4.N<>N7.N
  AND N4.N<>N8.N
  AND N4.N<>N9.N
  AND N5.N<>N6.N
  AND N5.N<>N7.N
  AND N5.N<>N8.N
  AND N5.N<>N9.N
  AND N6.N<>N7.N
  AND N6.N<>N8.N
  AND N6.N<>N9.N
  AND N7.N<>N8.N
  AND N7.N<>N9.N
  AND N8.N<>N9.N; 

The output of the above SQL statement should appear similar to the following:

GAME
----
...

XXX
OOO
OXX

XXX
OOO
OXX

...

XOX
XOO
OXX

OXX
XOO
OXX

XXO
XOO
XOX

XXO
XOO
XOX

XOX
XOO
XOX 

...

The big problem with the above SQL statement is that it is not clear which player won in all cases.  Ideally, the N1.N value would be used to output an X mark in the specified position, the N2.N value would be used to output an O mark in a specified position, the N3.N value would be used to output an X in the specified position, etc. until one of the players places three marks in a row.  For example, if N1.N is 5, an X would be placed in the center square, if N2.N is 9, an O would be placed in the bottom right square.

Now that we know that the order in which the marks are placed is important, how would be know when a player wins?  You could experiment with the following:

Vertical win, with three positions having the same resulting values:

SELECT
  MOD(position - 1, 3) + 1 V
FROM
  DUAL; 

Horizontal win, with three positions having the same resulting values:

SELECT
  TRUNC((position - 1) / 3) + 1 V
FROM
  DUAL;

Diagnal win \ when V=0 in three positions:

SELECT
  (MOD(position - 1, 3) + 1) - (TRUNC((position - 1) / 3) + 1) V
FROM
  DUAL;

Diagnal win / when V=0 (watch out, could end up in a V pattern) or V=2:

SELECT
  ABS((MOD(position - 1, 3) + 1) - (TRUNC((position - 1) / 3) + 1)) V
FROM
  DUAL; 

———–

OK, now that I have explained the game, given you a couple of SQL statements to possibly help you with the solution… on to the challenge.  With the help of Oracle Database build a tic tac toe solver that will help a player win at tic tac toe.  Provided a current description of the board, for example ‘_X_’||’O_X’||’OX_’, display all board solutions that allows player O (or player X if his turn is next) to win.  (Side note: I have not yet completed the solution to this challenge – it might be possible to accomplish this challenge with just a SQL statement.)





The New Order Oracle Coding Challenge 3 – Mind Boggle

5 08 2011

August 5, 2011 (Modified August 7, 2011)

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

In part 1 of this series the challenge was to simply reverse the order of digits in the numbers from 1 to 1,000,000 to find that cases where the numbers formed by the reverse ordered digits evenly divided into the original number.  In part 2 of this series the challenge required examining all of the numbers between 1000 and 9999, where arranging the digits of the original number into any of 23 valid combinations resulted in a new number that cleanly divided into the original four digit number.  There were several different solutions provided to the two challenges, so now it is time to move on to part three of the series.

In part 1 of this blog article series I mentioned playing a game years ago that used letters on the face of dice – the dice were rolled, and then the challenge was to find all words that could be completely spelled using the letters on the top of the dice.  I was not very good at the game, so I enlisted the help of a computer.  One such dice game is called Boggle, and that game’s name is probably fitting for today’s challenge.  Imagine that you played this game and the following letters appeared on the top of the dice:

One of the rules of the game requires that words must be at least 3 letters in length, for example: you say melee eye (I) see elfs file some mail (OK, the word I is too short, but we can have some fun with the words that are found).  As you might be able to guess, there are a lot of possible combinations of the 16 letters found on the dice, some of which are valid words.  If we just consider the 5 letter, 4 letter, and 3 letter combinations of the dice, there are more than a half million possible combinations (in the following table, multiply the numbers across and add the results for each row) - no wonder I needed the computer’s help with these puzzles.

16 15 14 13 12   = 16! / 11!
16 15 14 13     = 16! / 12!
16 15 14       = 16! / 13!
            = 571,200

To make the full challenge of finding words a little easier, let’s break the challenge into a couple of parts:

 Part 1: Consider the 2 x 2 letter arrangement at the left.  With the help of Oracle Database, list all of the three letter combinations of those four letters.  There will be 4 * 3 * 2 = 24 possible combinations of the letters.

 

 

 Part 2: Consider the 4 x 4 letter arrangement at the left.  With the help of Oracle Database, list all of the four letter combinations of those 16 letters.  There will be 16 * 15 * 14 * 13 = 43,680 possible combinations of the letters.

-

-

-

-

-

Part 3: Consider the 4 x 4 letter arrangement above.  With the help of Oracle Database, list all of the three, four, five, and six letter combinations of those 16 letters.  If you see any seven letter words in the above set of letters, you might as well retrieve those letter combinations also.  How many letter combinations do you have in total for part 3?

Part 4: Extra Credit: How many of the letter combinations generated in part 3 above are valid U.S. or U.K. English words?  List the words.

Part 5: Extra, Extra Credit: List any words found in the letters at the left that have any connection to Oracle Corporation.  Remember that a letter can only be used as many times in a single word as it appears at the left (if you can form a word with three letter A’s that have a connection to Oracle Corp., go for it.).

-

-

-

-

-

-

-

-

-

-

-

-

-

-

Added August 7, 2011:

When I put together this challenge I did not think that it was possible to complete Part 4 Extra Credit using just SQL.  I was fairly certain that there were some interesting techniques to retrieve HTML content with the help of PL/SQL, but I had not worked out a solution that utilized that technique.  As I write this, Radoslav Golian in the comments section appears to have both a PL/SQL and a SQL solution that uses the dictionary.reference.com website to validate the words (only 6 words to avoid a denial of service type attack on the dictionary.reference.com website).  One of the approaches that I considered, but did not develop, is something similar to how Radoslav verified the words, but I would use a VBS script to submit the request and check the result as is demonstrated in these two articles: Submit Input to an ASP Web Page and Retrieve the Result using VBS and Use VBS to Search for Oracle Books using Google’s Book Library.

The solution that I put together for Part 4 Extra Credit started with an Excel macro that I posted in another blog article, which was then converted to PL/SQL.  I then transformed the PL/SQL for use in this article, and generated a new Excel macro from the PL/SQL code.  The Excel macro (along with the calling code looks like this:

Sub StartBoggle()
    Call Boggle("ESOIMEFOALEUSAYE", 6, 3)
End Sub

Sub Boggle(strCharacters As String, intMaxWordLength As Integer, intMinWordLength As Integer)
    Dim i As Integer
    Dim strCharacter(20) As String
    Dim intCharacterIndex(20) As Integer
    Dim intCharacters As Integer
    Dim intCharactersMax As Integer
    Dim intCharactersMin As Integer
    Dim intNumberOfSuppliedCharacters As Integer
    Dim intAdjustmentPosition As Integer
    Dim intFlag As Integer
    Dim strOutput As String
    Dim strWords(10000) As String
    Dim intWordCount As Integer
    Dim intFilenum As Integer

    intFilenum = FreeFile
    Open "C:\Words " & strCharacters & ".txt" For Output As #intFilenum

    If intMaxWordLength = 0 Then
        intCharactersMax = Len(strCharacters)
    Else
        If intMaxWordLength <= Len(strCharacters) Then
            intCharactersMax = intMaxWordLength
        Else
            intCharactersMax = Len(strCharacters)
        End If
    End If

    If intMinWordLength = 0 Then
        intCharactersMin = 3
    Else
        If intMaxWordLength < intMinWordLength Then
            intCharactersMin = intCharactersMax
        Else
            intCharactersMin = intMinWordLength
        End If
    End If

    intNumberOfSuppliedCharacters = Len(strCharacters)

    For i = 1 To intNumberOfSuppliedCharacters
        strCharacter(i) = Mid(strCharacters, i, 1)
    Next i

    intCharacters = intCharactersMin - 1
    intWordCount = 0

    Do While intCharacters < intCharactersMax
        intCharacters = intCharacters + 1
        intAdjustmentPosition = 1
        For i = 1 To intCharacters
            intCharacterIndex(i) = i
        Next i

        Do While intAdjustmentPosition > 0
            intFlag = 0
            For i = 1 To intAdjustmentPosition - 1
                If intCharacterIndex(i) = intCharacterIndex(intAdjustmentPosition) Then
                    ' Found a duplicate index position in the other values to the left
                    intFlag = 1
                    Exit For
                End If
            Next i

            If intFlag = 1 Then
                ' Try the next index position in this element
                intCharacterIndex(intAdjustmentPosition) = intCharacterIndex(intAdjustmentPosition) + 1
            Else
                If intAdjustmentPosition = intCharacters Then
                    ' Output
                    strOutput = ""
                    For i = 1 To intCharacters
                        strOutput = strOutput & strCharacter(intCharacterIndex(i))
                    Next i

                    intFlag = 0
                    For i = intWordCount To 1 Step -1
                        If strOutput = strWords(i) Then
                            intFlag = 1
                            Exit For
                        End If
                    Next i
                    If intFlag = 0 Then
                        If Application.CheckSpelling(Word:=UCase(strOutput)) <> 0 Then
                            intWordCount = intWordCount + 1
                            strWords(intWordCount) = strOutput

                            Print #intFilenum, strOutput
                            Debug.Print strOutput
                        End If
                    End If

                    If intCharacterIndex(intAdjustmentPosition) = intNumberOfSuppliedCharacters Then
                        ' No more available values in the last position
                        intCharacterIndex(intAdjustmentPosition) = 1
                        intAdjustmentPosition = intAdjustmentPosition - 1
                        If intAdjustmentPosition > 0 Then
                            intCharacterIndex(intAdjustmentPosition) = intCharacterIndex(intAdjustmentPosition) + 1
                        End If
                    Else
                        intCharacterIndex(intAdjustmentPosition) = intCharacterIndex(intAdjustmentPosition) + 1
                    End If
                Else
                    ' No duplicate so prepare to check the next position
                    intAdjustmentPosition = intAdjustmentPosition + 1
                End If
            End If

            Do While (intAdjustmentPosition > 0) And (intCharacterIndex(intAdjustmentPosition) > intNumberOfSuppliedCharacters)
                ' Roll back one index position as many times as necessary
                intCharacterIndex(intAdjustmentPosition) = 1
                intAdjustmentPosition = intAdjustmentPosition - 1
                If intAdjustmentPosition > 0 Then
                    intCharacterIndex(intAdjustmentPosition) = intCharacterIndex(intAdjustmentPosition) + 1
                End If
            Loop ' (intAdjustmentPosition > 0) And
        Loop 'intAdjustmentPosition > 0
    Loop 'intCharacters < intCharactersMax

    Close #intFilenum
End Sub 

The Excel macro builds letter combinations that are between the minimum and maximum length, and then tests those letter combinations using the built-in dictionary that is in Excel.  I had a little bit of difficulty coming up with a way to generate the letter combinations of variable length, so I settled on a custom developed technique – I would simply keep track of the original character positions, manipulate those original character positions, and then output the corresponding characters.  The challenge is then how does one verify that the same character position is not used more than once in a single word? 

 The method that I came up with is as follows, which assumes that we are trying to build four letter words from the supplied 16 letters.  We can start with the seed combination 1,2,3,4.  The idea is to work from left to right, and then back to the left.  Every time to make it to the right, we output a word, when we make it all the way back to the left (just before the number 1 in the above), we are done.  The rules are simple:

  • Increment the number in a position, and if that number does not appear in a position to the left, move one position to the right.
  • When the maximum character number (16 in this example) is exceeded in a position, reset the number to 1, move one position to the left, and increment the value in the new position by 1.
  • In the last position the character number should be incremented as many times as necessary to reach the maximum character number – each time a potential new combination will be generated.

 But there is a problem with this approach – it does not use Oracle Database!

-

Let’s go back to the PL/SQL function from which I created the Excel function (I have not worked much with pipelined functions - so there may be one or two errors):

CREATE OR REPLACE FUNCTION BOGGLE_VAR_LENGTH(strCHARACTERS IN VARCHAR2, intMaxWordLength IN NUMBER, intMinWordLength IN NUMBER) RETURN SYS.AQ$_MIDARRAY PIPELINED
AS
  TYPE NUMBER_ARRAY IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  TYPE CHARACTER_ARRAY IS TABLE OF VARCHAR(1) INDEX BY PLS_INTEGER;
  strCharacter CHARACTER_ARRAY;
  intCharacterIndex NUMBER_ARRAY;
  intCharacters NUMBER;
  intCharactersMax NUMBER;
  intCharactersMin NUMBER;
  intNumberOfSuppliedCharacters NUMBER;
  intAdjustmentPosition NUMBER;
  intFlag NUMBER;
  intI NUMBER;
  strOutput VARCHAR2(100);
BEGIN
  IF intMaxWordLength IS NULL THEN
    intCharactersMax := LENGTH(strCHARACTERS);
  ELSE
    IF intMaxWordLength <= LENGTH(strCHARACTERS) THEN
      intCharactersMax := intMaxWordLength;
    ELSE
      intCharactersMax := LENGTH(strCHARACTERS);
    END IF;
  END IF;

  IF intMinWordLength IS NULL THEN
    intCharactersMin := 3;
  ELSE
    IF intMaxWordLength < intMinWordLength THEN
      intCharactersMin := intCharactersMax;
    ELSE
      intCharactersMin := intMinWordLength;
    END IF;
  END IF;

  intNumberOfSuppliedCharacters := LENGTH(strCHARACTERS);

  FOR I IN 1.. intNumberOfSuppliedCharacters LOOP
    strCharacter(I) := SUBSTR(strCHARACTERS, I, 1);
  END LOOP;

  intCharacters := intCharactersMin - 1;
  WHILE intCharacters < intCharactersMax LOOP
    intCharacters := intCharacters + 1;
    intAdjustmentPosition := 1;
    FOR I IN 1 .. intCharacters LOOP
      intCharacterIndex(I) := I;
    END LOOP;

    WHILE intAdjustmentPosition > 0 LOOP
      intFlag := 0;
      FOR I IN 1 .. intAdjustmentPosition - 1 LOOP
        IF intCharacterIndex(I) = intCharacterIndex(intAdjustmentPosition) Then
          -- Found a duplicate index position in the other values to the left
          intFlag := 1;
        END IF;
      END LOOP;
      IF intFlag = 1 Then
        -- Try the next index position in this element
        intCharacterIndex(intAdjustmentPosition) := intCharacterIndex(intAdjustmentPosition) + 1;
      ELSE
        IF intAdjustmentPosition = intCharacters Then
          -- Output
          strOutput := '';
          FOR i IN 1 .. intCharacters LOOP
            strOutput := strOutput || strCharacter(intCharacterIndex(i));
          END LOOP;

          PIPE ROW (strOutput);

          IF intCharacterIndex(intAdjustmentPosition) = intNumberOfSuppliedCharacters THEN
            -- No more available values in the last position
            intCharacterIndex(intAdjustmentPosition) := 1;
            intAdjustmentPosition := intAdjustmentPosition - 1;
            IF intAdjustmentPosition > 0 THEN
              intCharacterIndex(intAdjustmentPosition) := intCharacterIndex(intAdjustmentPosition) + 1;
            END IF;
          ELSE
            intCharacterIndex(intAdjustmentPosition) := intCharacterIndex(intAdjustmentPosition) + 1;
          END IF;
        ELSE
          -- No duplicate so prepare to check the next position
          intAdjustmentPosition := intAdjustmentPosition + 1;
        END IF;
      END IF;

      WHILE (intAdjustmentPosition > 0) And (intCharacterIndex(intAdjustmentPosition) > intNumberOfSuppliedCharacters) LOOP
        -- Roll back one index position as many times as necessary
        intCharacterIndex(intAdjustmentPosition) := 1;
        intAdjustmentPosition := intAdjustmentPosition - 1;
        IF intAdjustmentPosition > 0 THEN
          intCharacterIndex(intAdjustmentPosition) := intCharacterIndex(intAdjustmentPosition) + 1;
        END IF;
      END LOOP;
    END LOOP;
  END LOOP;
END;
/ 

 We are able to call the function from a SQL statement like this:

SELECT
  *
FROM
  TABLE(BOGGLE_VAR_LENGTH('ESOIMEFOALEUSAYE', 6, 3)); 

Remember that there are more than a half million character combinations for just the 3, 4, and 5 letter combinations – the above will as for 6,336,960 letter combinations to be generated.   But there is a problem with this approach – it does not verify that the letter combinations are actual words!

For fun, let’s see how many possible combinations will result if we allow 3, 4, 5, 6, 7, and 8 letter combinations:

Len                 Combinations  
8 16 15 14 13 12 11 10 9 518,918,400 = 16! / 8!
7 16 15 14 13 12 11 10   57,657,600 = 16! / 9!
6 16 15 14 13 12 11     5,765,760 = 16! / 10!
5 16 15 14 13 12       524,160 = 16! / 11!
4 16 15 14 13         43,680 = 16! / 12!
3 16 15 14           3,360 = 16! / 13!
                  582,912,960 582,912,960

That is more than a half billion combinations!  Warning, significant database server CPU consumption will result when generating all combinations.

Let’s take a look at the final solution that I created for Part 4 Extra, Extra Credit.  The solution is an Excel macro that calls the PL/SQL function through a SQL statement:

Sub StartBoggleOracle()
    Call BoggleOracle("ESOIMEFOALEUSAYE", 8, 3)
End Sub

Sub BoggleOracle(strCharacters As String, intMaxWordLength As Integer, intMinWordLength As Integer)
    Dim strSQL As String
    Dim strUsername As String
    Dim strPassword As String
    Dim strDatabase As String
    Dim intFilenum As Integer

    Dim intCharacters As Integer
    Dim intCharactersMax As Integer
    Dim intCharactersMin As Integer
    Dim strOutput As String

    Dim dbDatabase As ADODB.Connection
    Dim snpData As ADODB.Recordset

    Set dbDatabase = New ADODB.Connection
    Set snpData = New ADODB.Recordset

    strUsername = "MyUsername"
    strPassword = "MyPassword"
    strDatabase = "MyDatabase"

    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";FetchSize=5000;"
    dbDatabase.Open

    intFilenum = FreeFile
    Open "C:\WordsOracle " & strCharacters & ".txt" For Output As #intFilenum

    If intMaxWordLength = 0 Then
        intCharactersMax = Len(strCharacters)
    Else
        If intMaxWordLength <= Len(strCharacters) Then
            intCharactersMax = intMaxWordLength
        Else
            intCharactersMax = Len(strCharacters)
        End If
    End If

    If intMinWordLength = 0 Then
        intCharactersMin = 3
    Else
        If intMaxWordLength < intMinWordLength Then
            intCharactersMin = intCharactersMax
        Else
            intCharactersMin = intMinWordLength
        End If
    End If

    strSQL = "SELECT DISTINCT" & vbCrLf
    strSQL = strSQL & "  *" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  (SELECT" & vbCrLf
    strSQL = strSQL & "    *" & vbCrLf
    strSQL = strSQL & "  FROM" & vbCrLf
    strSQL = strSQL & "    TABLE(BOGGLE_VAR_LENGTH('" & strCharacters & "', " & Format(intCharactersMax) & ", " & Format(intCharactersMin) & ")))" & vbCrLf
    strSQL = strSQL & "ORDER BY" & vbCrLf
    strSQL = strSQL & "  1"
    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        Do While Not snpData.EOF
            strOutput = snpData(0)
            If Application.CheckSpelling(Word:=UCase(strOutput)) <> 0 Then
                Print #intFilenum, strOutput
                Debug.Print strOutput
            End If

            snpData.MoveNext
        Loop

        snpData.Close
    End If

    Close #intFilenum
    dbDatabase.Close
    Set snpData = Nothing
    Set dbDatabase = Nothing
End Sub

The words found appear to depend on the version of Excel – Excel 2010 seems to find more words than Excel 2007.

  • The 799 word list from Excel 2007 for word lengths between 3 and 8 characters, including the timing information to show when the SQL statement was submitted, when the first 5,000 combinations were retrieved from the database, and when the Excel spell check finished.  Words Oracle_ESOIMEFOALEUSAYE.txt
  • The 2,179 word list from Excel 2007 for word lengths between 3 and 8 characters, including the timing information to show when the SQL statement was submitted, when the first 5,000 combinations were retrieved from the database, and when the Excel spell check finished. Words Oracle_OSERIEFAARLNCAYL.txt

Excel found Ellison in the second word list.  For Part 5 Extra, Extra Credit, what other words connected to Oracle Corporation were found?





The New Order Oracle Coding Challenge 2

2 08 2011

August 2, 2011

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

In the previous article in this series we tried to develop different methods for identifying special numbers between 1 and 1,000,000, such that the number formed by reversing the order of the digits will evenly divide into the original number.  I initially predicted that there would be at least three unique solutions provided before the undocumented REVERSE function was mentioned.  A fellow Oak Table member quickly provided a solution using the REVERSE function, which happens to be the fastest of the various approaches.  Another reader noticed a pattern that helps find those special numbers – if that pattern identifies all of the special numbers, it is easily the most efficient approach to finding these special numbers.

Today’s blog article also is on the topic of finding special numbers, determined as special after changing the order of the number’s digits.  I used my fancy, on-point introduction for this article in the earlier article (letters on the dice, trying to form words from those letters), so I thought that I would jump the little pony over the edge of the cliff in search of another introduction (there are probably five readers with an odd image stuck in their head at this moment).   ;-)

Have you ever noticed the disparity between the Environment Protection Agency’s (EPA or the equivalent in other countries) estimates for fuel efficiency and the actual fuel efficiency calculated with the hand held calculator?  Take for instance a car that I own – that car has never achieved the EPA’s estimated fuel efficiency rating.  A 412 horsepower (at the flywheel as claimed by the manufacturer, other tests have concluded that it might be higher) V8 achieving 25 miles per U.S. gallon (MPG) of fuel?

Time for the Oracle Database test case.  On a frigid 32 degree day (in Celsius, that is a hot 90 degrees Fahrenheit), I lowered the windows, topped off the fuel tank with premium grade fuel, reset the car’s fuel efficiency gauge (the MPG gauge… oddly, this gauge always reads 6% to 8% lower than the actual fuel efficiency), and then set out on a roughly 117 mile (188 KM) drive down the highway with the radar detector on the dash board.  While I have heard that the car’s speed limit is set at the factory to roughly 160 miles per hour (258 KM/H), the posted speed limit was typically 55 MPH (miles per hour) except in the roughly half-dozen or so cities along the way.  The car never did hit the EPA’s estimated fuel efficiency rating.  Somebody was playing with the numbers a bit, I think. :-)

-

-

-

-

-

-

-

-

Along the route selected for the 117 mile test loop, I captured cell phone pictures showing the car’s fuel efficiency gauge (MPG gauge).  Below are the cell phone pictures captured at 15 miles into the drive, roughly half way through the drive, and near the end of the 117 mile drive. 

So, how does Oracle Database play a part in the introduction to this blog article?  Let’s take a guess at the car’s actual fuel efficiency using a SQL statement:

SELECT
  29.5 CAR,
  ROUND(29.5 * 1.06, 2) LOW_ACTUAL,
  ROUND(29.5 * 1.08, 2) HIGH_ACTUAL
FROM
  DUAL;

       CAR LOW_ACTUAL HIGH_ACTUAL
---------- ---------- -----------
      29.5      31.27       31.86

So much for EPA estimates – between 25.08% and 27.44% lower than the actual fuel efficiency.  Does this test case then suggest that if we do not use all of the features that are available in a tool, that we are then more efficient than expected?  I think that I need another test case to decide.  ;-)

Side note: For the tank of fuel when the car rolled over the 1,000 mile mark, the car achieved 28.75 miles per gallon while touring some of the sites around Michigan, specifically the hills and curves along the north-west shoreline of the lower peninsula (several of those hills were steep – for one hill in particular the car downshifted three gears to maintain speed going down the hill).  The car’s overall fuel efficiency since it left the factory is 25.58 MPG (excluding the 117 mile test case), so I guess that one could argue that if you TRUNC the fuel efficiency number, at the root the EPA is right.  Make like a tree and leave (that was a movie quote from “Back to the Future”).

After returning from the test case I set out on my other 412 HP vehicle… cough, that would be a 4 cylinder 12 horsepower 1946 Farmall A tractor.  Time to mow the lawn in second gear.  Cutting a 6 foot (1.83 meter) wide path through grass that is up to 12 inches (30.48 cm) high, and capable of mowing down 2 inch diameter trees as necessary.  Oh, the power of the horse…

OK, now that we have drifted far from the topic of this article and have had a chance to play with numbers in the long introduction, on to the challenge.  Consider the four digit integer numbers between 1,000 and 9,999.  There are 4! (1 * 2 * 3 * 4 = 24) possible combinations of the four digits of each of these numbers.  For example, consider the number 8712 - the 24 possible combinations are:

1278
1287
1728
1782
1827
1872
2178
2187
2718
2781
2817
2871
7128
7182
7218
7281
7812
7821
8127
8172
8217
8271
8712 8721

The objective is to find all combinations of the four digit numbers between 1,000 and 9,999 that evenly divide into the original number.  For the above example with the original number 8712, the combination 2178 evenly divides into 8712.  Thus, the output would include:

ORIGINAL COMBINATION
-------- -----------
    8712        2178

The restrictions:

  • The four digit original numbers between 1,000 and 9,999 should not be included in the output if the last digit of those numbers is 0 – the combination number may end with 0 if the original number included a 0 digit.  Thus, do not include the original numbers 1000, 1010, 1020, 1030, 1040, etc.
  • The original value cannot be equal to the combination value (in the above example, it is not valid to output the combination value 8712).
  • Each original number and combination pair should be output at most one time.
  • You may not reuse a digit position twice; all digit positions must be used once.  For example, the number 7141 includes the digit 1 twice, therefore, all generated combinations of that number must include two number 1 digits.

This challenge is a bit more difficult than was the challenge in part 1 of this series.  The greatest difficulty likely involves creating the 23 valid digit combinations (there are 24, but one of those combinations will always be invalid) of the four digit numbers. 

The problem is solvable – will your solution be the same as mine?  I will give the readers a couple of days to ponder the thoughts of why a horse entered a race between a car and a tractor before sharing my approach to solving the problem.





The New Order Oracle Coding Challenge 1

31 07 2011

July 31, 2011

(Forward to the Next Post in the Series)

Years ago I played a couple of different games with letters, rather than numbers, on dice – attempting to form words from the letters displayed on top of the dice.  I was not very good with those games, and I recall attempting to create a computer program to help me with the challenge.  The screaming fast 1 MHz CPU and 64KB of memory proved to be no match for the words listed in the paper-bound dictionary sitting on the shelf.  Computers are significantly faster now, with wide-spread access to Internet based word lists, so it probably would not be much of a challenge today to build a solution for one of those dice letter games.

Finding different combinations with number digits is a bit easier than working with letters – we are able to use math rules to determine if the specified conditions are met rather than a dictionary.  We will start with an easy problem that I found on the web, but I will keep the source of that problem a secret for now.  Consider the number 989,901.  If we write the digits in that number from right to left, we obtain the new number 109,989.  What is special about that new number?  The number is evenly divisible by 3 and 9, but more interesting is that the new number divides evenly into the original number (989,901).

With the help of Oracle Database, find all of the numbers from 1 to 1,000,000 where the digits in the number when listed from left to right are evenly divisible by those same digits listed from right to left.  The numbers that end with 0 are a special case, reversing the order of the digits in those numbers will result in the 0 digit effectively disappearing – as such, exclude any number that ends with 0 from being tested.

There are several methods to swap the order of the digits in the number.  Would you use a different method to test all of the numbers between 1 and 10,000, or to test all of the numbers up to 10,000,000?

Might it work to store the numbers in a reverse key index, and then dump the resulting index values – is that the fourth method to switch the order of the digits?  ;-)





The Unique Result Oracle Database Coding Challenge

28 07 2011

July 28, 2011

I must say that I am impressed with the number of unique solutions that were developed for the previous coding challenge (FizzBuzz).  While not all solutions were extremely efficient (a couple were intentionally designed to be as inefficient as possible), the various techniques provide views of different approaches to solving a problem that was not well defined (even though at first glance it appeared to be well defined).  While not all of the solutions presented are optimal for the FizzBuzz challenge, derivatives of those solutions might be perfect for real-world problems (for instance, side-tracking a pesky DBA or developer with a performance challenge).

Time for another coding challenge.  This challenge is adapted from one that was posed in a Usenet thread several years ago.  Remember that if a specification is not well defined, feel free to interpret the specification – one of the goals of this blog article is to see a variety of solutions, but a secondary goal is to determine what might happen when weak specifications are provided in a request for a solution.

Assume that there are two tables, T1 and T2, as designed and populated below.  The two tables contain the results from experiments performed by two different analysts.

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;

CREATE TABLE T1 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

CREATE TABLE T2 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

INSERT INTO T1 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T1 VALUES ('TEST2A','TEST2B','TEST2C','TEST2D');
INSERT INTO T1 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D');
INSERT INTO T1 VALUES ('TEST4A','TEST4B','TEST4C','TEST4D');
INSERT INTO T1 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D');

INSERT INTO T2 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T2 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D');
INSERT INTO T2 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D');
INSERT INTO T2 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D');
INSERT INTO T2 VALUES ('TEST6A','TEST5B','TEST5C','TEST6D');  

This is a three part challenge.

Part 1: Return all rows from table T1 where the C2 value is not found in table T2′s column C2 values, and return all rows from table T2 where the C2 value is not found in table T1′s column C2 values.  In short, we want the rows that have unique results found in column C2 so that we may identify the differences found by the two analysts.

Part 2: Using a single select statement, return all rows found in table T1 that are not duplicated in table T2, and all rows found in table T2 that are not duplicated in table T1.  The resulting output must identify the table from which the row was found.

Part 3: A third analyst performed experiments and entered his results in table T3, as designed and populated below.  Using a single SQL statement, retrieve the rows from tables T1, T2, and T3 that are not duplicated in the other tables – find the unique rows from the three tables.  Each row should identify its source table (T1, T2, or T3).

DROP TABLE T3 PURGE;

CREATE TABLE T3 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

INSERT INTO T3 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T3 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D');
INSERT INTO T3 VALUES ('TEST6A','TEST6B','TEST6C','TEST6D'); 

When you post your solutions, please identify Part 1, Part 2, or Part 3 next to the solution.

General tips for comments:

  • Include your code portion of the solution in a monospaced font (Courier) to retain the alignment spaces in the solution.  To do this, use a <pre> tag just before the code portion of the solution and a </pre> tag just after the solution ([s ourcecode] and [/s ourcecode] tags (without the space between the first two letters) should have the same result, just with a smaller font size and a non-scrolling code area).
  • Less than (<) and greater than (>) signs have a special meaning in HTML web pages.  As such, specify &lt; for a less than sign and &gt; for a greater than sign in code sections to avoid having portions of your code sections magically disappear.

Think about how you would build the solutions before scrolling down to the comments section.

-

-

-

-

-

-

-

-

Part 2:

SELECT  NVL(T1.C1,T2.C1) C1, NVL(T1.C2,T2.C2) C2, NVL(T1.C3,T2.C3) C3, NVL(T1.C4,T2.C4) C4, NVL2(T1.C1,'T1','T2') FROM_TABLE FROM T1 FULL OUTER JOIN T2 ON T1.C1=T2.C1 AND T1.C2=T2.C2 AND T1.C3=T2.C3 AND T1.C4=T2.C4 WHERE T1.C1 IS NULL OR T2.C1 IS NULL; 

-

-

-

-

-

-





The FizzBuzz Oracle Database Coding Challenge

26 07 2011

July 26, 2011

Through a web search I located a page titled “Coding Horror: Why Can’t Programmers.. Program?”  A simple question was asked in an interview, and apparently 199 of 200 programmers struggled to build a solution for the problem in less than ten minutes.  The problem must be that the 199 people who did not succeed did not have access to an Oracle Database.  The same question was posed to SQL Server developers in the form of a quiz.  Before looking at the articles, see if you are able to solve the following problem with the help of Oracle Database:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

How many different solutions are there for this problem?  Think about the problem before scrolling down.

-

-

-

-

-

-

-

-

-

-

-

-

-

My solution:

SELECT
  NVL(DECODE(ROWNUM/3,TRUNC(ROWNUM/3),'Fizz',NULL)||DECODE(ROWNUM/5,TRUNC(ROWNUM/5),'Buzz',NULL),TO_CHAR(ROWNUM)) FIZZBUZZ
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

-

-

-

-

-

-

-

I think that I remember solving a similar problem using an IBM PC Jr. using BASICA years ago.  How may ways can this problem be solved with the help of Oracle Database?  Before you answer, you might be thinking to yourself why would someone ask such a simple question?  Could there be an overly complex solution, something that the interviewer had never seen before, that was the intended response to the question?





How Many Ways to Solve this Problem? Generate a Calendar that Displays Average Hours Per Day

18 07 2011

July 18, 2011

I am surprised at the significant number of unique solutions to the SQL problems that I have previously posed on this blog.  For fun I thought that I would give people another chance to demonstrate their unique approaches to solving another problem.

Supposed that a transaction table exists with the following definition:

CREATE TABLE T1 (
  TRANS_ID NUMBER,
  EMPLOYEE_ID VARCHAR2(10),
  PART_ID VARCHAR2(20),
  CLOCK_IN DATE,
  CLOCK_OUT DATE,
  QTY_COMPLETE NUMBER,
  PRIMARY KEY (TRANS_ID));  

The above table simulates a transaction table that might capture production run times for people (or even robots) to produce sets of parts.  The requirement is rather simple – we want to know the average number of hours required to produce a particular part by day in a calendar type layout.  The only restriction is that the output must appear in a SQL*Plus window (or SQL*Plus may be used to spool the result to a text file).  For example, the output might look like this (feel free to be creative):

PART_ID    W     SUNDAY     MONDAY    TUESDAY  WEDNESDAY   THURSDAY     FRIDAY   SATURDAY
-------- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------
PART1      1        .98        .56        .67       1.21        .23        .98        .02
PART1      2        .57        .87        .76        .01        .45        .55        .77
PART1      3        .01        .77        .66        .54        .02        .44        .92
PART1      4        .55        .02        .88        .11        .05        .33        .89
PART1      5        .11        .43        .55        .14        .03        .90        .78
PART1      6        .87        .45        .63        .02        .82        .78        .55
PART1      7        .02                                         .01 

If there was no transaction for a particular part on a particular date, a blank (NULL) should appear; to prevent division by zero errors, the hours should be output divided by one if the total QTY_COMPLETE is zero.

Let’s create some reproducible sample data in table T1:

DROP TABLE TABLE T1_EMP PURGE;

CREATE TABLE T1_EMP AS
SELECT
  'EMP'||TO_CHAR(ROWNUM) EMPLOYEE_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

DROP TABLE T1_PART PURGE;

CREATE TABLE T1_PART AS
SELECT
  'PART'||TO_CHAR(ROWNUM) PART_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=200;

INSERT INTO
  T1
SELECT /*+ LEADING(P E) */
  ROWNUM TRANS_ID,
  E.EMPLOYEE_ID,
  P.PART_ID,
  TO_DATE('01-JAN-2000','DD-MON-YYYY')+ROWNUM/500 CLOCK_IN,
  TO_DATE('01-JAN-2000','DD-MON-YYYY')+ROWNUM/500 + ((MOD(ROWNUM,20) + 1) * 0.5) / 24 CLOCK_OUT,
  ROUND(ABS(SIN(ROWNUM/180*3.141592)*10)) QTY_COMPLETE
FROM
  (SELECT
     EMPLOYEE_ID,
     ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) RN
   FROM
     T1_EMP) E,
  (SELECT
     PART_ID,
     ROW_NUMBER() OVER (ORDER BY PART_ID) RN
   FROM
     T1_PART) P,
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) D
WHERE
  MOD(E.RN,8)=MOD(P.RN,8);

COMMIT; 

Taking a quick look at the sample data in table T1:

COLUMN PART_ID FORMAT A8
COLUMN HOURS FORMAT 90.00
SET PAGESIZE 1000
SET LINESIZE 140
SET TRIMSPOOL ON

SELECT
  PART_ID,
  EMPLOYEE_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  CLOCK_OUT-CLOCK_IN HOURS,
  QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
ORDER BY
  CLOCK_IN;

PART_ID  EMPLOYEE_I SHIFT_DAT  HOURS QTY_COMPLETE
-------- ---------- --------- ------ ------------
PART1    EMP811     31-JAN-01   0.19            9
PART1    EMP811     31-JAN-01   0.21            9
PART1    EMP811     31-JAN-01   0.23            9
PART1    EMP819     04-FEB-01   0.04           10
PART1    EMP819     04-FEB-01   0.06           10
PART1    EMP819     04-FEB-01   0.08           10
PART1    EMP819     04-FEB-01   0.10           10
PART1    EMP819     04-FEB-01   0.13           10
PART1    EMP819     04-FEB-01   0.15           10
...
PART1    EMP912     28-MAR-01   0.19            2
PART1    EMP912     28-MAR-01   0.21            2
PART1    EMP912     28-MAR-01   0.23            2
PART1    EMP92      01-APR-01   0.04            0
PART1    EMP92      01-APR-01   0.06            0
PART1    EMP92      01-APR-01   0.08            1
PART1    EMP92      01-APR-01   0.10            1
...
PART1    EMP992     11-MAY-01   0.19            5
PART1    EMP992     11-MAY-01   0.21            5
PART1    EMP992     11-MAY-01   0.23            5

1250 rows selected. 

Now just a quick example of how you might calculate the average hours per piece (note that the HOURS column actually shows the number of days, NOT hours – multiply the numbers in that column by 24 to convert to hours):

SELECT
  PART_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  SUM(CLOCK_OUT-CLOCK_IN) HOURS,
  SUM(QTY_COMPLETE) QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);

PART_ID  SHIFT_DAT  HOURS QTY_COMPLETE
-------- --------- ------ ------------
PART1    01-JAN-00   1.35           90
PART1    05-JAN-00   1.35          100
PART1    09-JAN-00   1.35           98
PART1    13-JAN-00   1.35           81
PART1    17-JAN-00   1.35           56
PART1    21-JAN-00   1.35           25
PART1    25-JAN-00   1.35           10
PART1    29-JAN-00   1.35           44
...
PART1    29-APR-01   1.35           56
PART1    03-MAY-01   1.35           25
PART1    07-MAY-01   1.35           10
PART1    11-MAY-01   1.35           44

125 rows selected. 

Interesting how the HOURS column always sums to 1.35 per day…

OK, for you self-starters, create the calendar output.  Think about any other kinds of analysis that might be done with this data – is it possible to determine which PART_ID takes the least (or the most) average time per piece on any given day.

-

-

-

For the non-self starters, scroll down.

-

-

-

-

-

-

-

-

-

-

-

-

-

One of the challenges that we face is determining the calendar row and column for any given shift date (the date associated with the CLOCK_IN date/time stamp).  Since in this case we know that the first row of data has a date of 01-JAN-2000, we need to determine the Sunday of that week.  One way to do that is to use the NEXT_DAY function, like this:

SELECT
  NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY') S
FROM
  DUAL;

S
---------
26-DEC-99 

The next step is to divide the shift dates into rows and columns based on the zero date of 26-DEC-99.  TRUNC and MOD could be used for this calculation, but to keep things interesting I will use TO_CHAR rather than MOD:

COLUMN R FORMAT 99
COLUMN C FORMAT 9

SELECT
  PART_ID,
  NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY') S,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
  TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
  SUM(CLOCK_OUT-CLOCK_IN) HOURS,
  SUM(QTY_COMPLETE) QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);

PART_ID  S         SHIFT_DAT   R  C  HOURS QTY_COMPLETE
-------- --------- --------- --- -- ------ ------------
PART1    26-DEC-99 01-JAN-00   1  7   1.35           90
PART1    26-DEC-99 05-JAN-00   2  4   1.35          100
PART1    26-DEC-99 09-JAN-00   3  1   1.35           98
PART1    26-DEC-99 13-JAN-00   3  5   1.35           81
PART1    26-DEC-99 17-JAN-00   4  2   1.35           56
PART1    26-DEC-99 21-JAN-00   4  6   1.35           25
PART1    26-DEC-99 25-JAN-00   5  3   1.35           10
...
PART1    26-DEC-99 16-NOV-00  47  5   1.35           72
PART1    26-DEC-99 20-NOV-00  48  2   1.35           90
PART1    26-DEC-99 24-NOV-00  48  6   1.35          100
PART1    26-DEC-99 28-NOV-00  49  3   1.35           98
PART1    26-DEC-99 02-DEC-00  49  7   1.35           81
...
PART1    26-DEC-99 25-APR-01  70  4   1.35           81
PART1    26-DEC-99 29-APR-01  71  1   1.35           56
PART1    26-DEC-99 03-MAY-01  71  5   1.35           25
PART1    26-DEC-99 07-MAY-01  72  2   1.35           10
PART1    26-DEC-99 11-MAY-01  72  6   1.35           44

125 rows selected. 

The next step is to collapse all of the rows with the same R value into a single row – we will do this with a combination of the MAX and DECODE functions:

SELECT
  PART_ID,
  R,
  ROUND(MAX(DECODE(C,1,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) SUNDAY,
  ROUND(MAX(DECODE(C,2,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) MONDAY,
  ROUND(MAX(DECODE(C,3,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) TUESDAY,
  ROUND(MAX(DECODE(C,4,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) WEDNESDAY,
  ROUND(MAX(DECODE(C,5,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) THURSDAY,
  ROUND(MAX(DECODE(C,6,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) FRIDAY,
  ROUND(MAX(DECODE(C,7,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) SATURDAY
FROM
  (SELECT
    PART_ID,
    TRUNC(CLOCK_IN) SHIFT_DATE,
    TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
    TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
    SUM(CLOCK_OUT-CLOCK_IN) HOURS,
    SUM(QTY_COMPLETE) QTY_COMPLETE
  FROM
    T1
  WHERE
    PART_ID='PART1'
  GROUP BY
    PART_ID,
    TRUNC(CLOCK_IN))
GROUP BY
  PART_ID,
  R
ORDER BY
  PART_ID,
  R;

PART_ID    R     SUNDAY     MONDAY    TUESDAY  WEDNESDAY   THURSDAY     FRIDAY   SATURDAY
-------- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------
PART1      1                                                                          .02
PART1      2                                         .01
PART1      3        .01                                         .02
PART1      4                   .02                                         .05
PART1      5                              .14                                         .03
PART1      6                                         .02
PART1      7        .02                                         .01
PART1      8                   .01                                         .02
...
PART1     69                              .01                                         .01
PART1     70                                         .02
PART1     71        .02                                         .05
PART1     72                   .14                                         .03

72 rows selected. 

Of course there is a problem with the above – the scale of the time is actually in days, rather than hours, but that is easy enough to fix.

-

-

-

-

-

-

-

Anyone care to share their approach to solving the original problem?

-

-

Assume that the output will be used by a production supervisor – what other kinds of analysis can be performed?  For example, could you produce an analysis like the one below that compares recent transactions with those from prior periods and color codes those prior periods based on how the current period (the Avg Hrs Pc column in the below screen capture) compares with the prior periods:

Be creative – performance, while important, is not the primary objective.





How Many Ways to Solve this Problem? Add the Sequential Numbers x Through y

13 07 2011

July 13, 2011

I am not entirely sure why, however a couple of days ago the following search keywords were used to access one or more articles on this blog:

the sum of 1+2+3+4...+98+99+100

The above request I found to be a bit interesting, and there is a 50/50 chance that the person found the right answer to the sum of the numbers between 1 and 100. 

If you had to solve this problem with the help of Oracle Database, how would you accomplish the task?  If it helps, I think that I recall that the mathematical notation representing the problem posed by the searcher is as follows:

Would your answer be any different if the person needed to know the sum of the numbers between 6 and 105:

What about generalizing the problem even further:

Think about the problem before scrolling down.  How many unique solutions are able to produce the answer?

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

————————————

If you think about the problem, in its simplist form it is really just a matter of repeatedly adding the set of the highest and lowest unmatched numbers (all of those results should be the same, otherwise your calculator needs new batteries) and then multiplying by the number of pairs (1/2 as many numbers are in the sequence to be summed):

01 + 100 = 101
02 + 99 = 101
03 + 98 = 101
04 + 97 = 101
05 + 96 = 101
...
50 + 51 = 101

So, the general formula is:

(max - min + 1) / 2 * (min + max)

And the SQL statements to produce the results:

SELECT   (100 - 1 + 1) /2 * (1 + 100) FROM   DUAL;  
SELECT   (105 - 6 + 1) /2 * (6 + 105) FROM   DUAL;

————————————

-

-

-

-

-

-

-

-

-

-

-

-

For now, ignore the above section.  How many ways can this particular problem be solved with the help of Oracle Database?  Are there any built-in functions that will help?





What Would Cause a NO_INDEX Hint to Not Work as Expected?

11 07 2011

July 11, 2011

Recently, the following search keywords were used to access an article on my site, and that search triggered an idea for another blog article:

no_index hint oracle 10g not working 

In Oracle Database, hints are directives that must be obeyed (with a couple of minor exceptions that include bugs).  I started wondering what might cause a NO_INDEX hint to not work as expected.  Let’s create a test table for a couple of experiments:

CREATE TABLE T3(
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(300));

INSERT INTO
  T3
SELECT
  ROWNUM C1,
  TRUNC(ROWNUM/10000) C2,
  MOD(ROWNUM,10000) C3,
  LPAD('A',300,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T3_C2 ON T3(C2);
CREATE INDEX IND_T3_C3 ON T3(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

SET LINESIZE 140
SET PAGESIZE 1000 

Let’s try a simple query that accesses the table, and display the execution plan for that query (note that these test SQL statements are being executed on Oracle Database 11.2.0.2):

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  0s5xrvx04309f, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1<=20

Plan hash value: 1371903174

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=20) 

The IND_T3_C1 index was automatically selected to assist data retrieval.

Let’s try a NO_INDEX hint just to verify that the hint can work as expected:

SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  d0gpwhvg7629r, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
  C3,   SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1<=20

Plan hash value: 4161002650

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |     20 |00:00:00.01 |   45583 |
|*  1 |  TABLE ACCESS FULL| T3   |      1 |     20 |     20 |00:00:00.01 |   45583 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=20) 

As can be seen above, a full table scan is performed for data retrieval, rather than the index that was used in the previous example.

Let’s try another example with the NO_INDEX hint:

SELECT /*+ NO_INDEX(IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  4p5xpu625cw5a, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
C3,   SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1<=20

Plan hash value: 1371903174

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=20) 

That did not work quite as someone might think – you must specify the table/view name in the NO_INDEX hint.

Let’s try another example:

SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3 T
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  fnjc3pc41a2mh, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
  C3,   SUBSTR(C4,1,10) C4 FROM   T3 T WHERE   C1<=20

Plan hash value: 1371903174

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=20) 

Again, the index was used despite the NO_INDEX hint – if you alias a table/view, you must specify the alias in the NO_INDEX hint.  This seems to be a common problem when people report in Internet forums that Oracle hints do not work as expected.

Another example:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(T3 IND_T3_C1) */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Plan hash value: 587667290

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |     20 |00:00:00.01 |      13 |       |       |          |
|*  1 |  HASH JOIN SEMI              |           |      1 |     20 |     20 |00:00:00.01 |      13 |   705K|   705K| 1125K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       4 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       3 |       |       |          |
|*  4 |   TABLE ACCESS FULL          | T3        |      1 |     20 |     20 |00:00:00.01 |       9 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   3 - access("C1"<=20)
   4 - filter("C1"<=20) 

As seen by the above, the IND_T3_C1 index was still used even though the hint was correctly formed.  What is wrong?  The scope of the index hint is only in the subquery found in the WHERE clause and that hint does not apply to the main portion of the SQL statement – the hint did work in the scope of the subquery.  Note that the execution plan shows that the query was transformed into a simple join.

A similar SQL statement, with the NO_INDEX hint relocated:

SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+  */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  3n76qa6km68r2, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
  C3,   SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+  */
      C1      FROM        T3      WHERE        C1<=20)

Plan hash value: 3266157401

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |     20 |00:00:00.07 |   45586 |       |       |          |
|*  1 |  HASH JOIN SEMI    |           |      1 |     20 |     20 |00:00:00.07 |   45586 |   705K|   705K| 1140K (0)|
|*  2 |   TABLE ACCESS FULL| T3        |      1 |     20 |     20 |00:00:00.01 |   45581 |       |       |          |
|*  3 |   INDEX RANGE SCAN | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |       |       |          |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C1"<=20)
   3 - access("C1"<=20) 

As can be seen above, the IND_T3_C1 index was still used because the NO_INDEX scope is only in the main body of the SQL statement, not in the subquery found in the WHERE clause.  Once again, the execution plan shows that the query was transformed into a simple join.

OK, so we saw in the previous examples that query transformations happened and Oracle’s optimizer was able to keep track of the scope of the NO_INDEX hint, even when the query was tranformed into a simple join.  Let’s try another example, this time with a NO_QUERY_TRANSFORMATION hint, a NO_INDEX hint, and an INDEX hint: 

SELECT /*+ INDEX(T3 IND_T3_C1) NO_QUERY_TRANSFORMATION GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(T3 IND_T3_C1) */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  bbs14bbywgfq2, child number 0
-------------------------------------
SELECT /*+ INDEX(T3 IND_T3_C1) NO_QUERY_TRANSFORMATION
GATHER_PLAN_STATISTICS */   C1,   C2,   C3,   SUBSTR(C4,1,10) C4 FROM
T3 WHERE   C1 IN     (SELECT /*+ NO_INDEX(T3 IND_T3_C1) */        C1
  FROM        T3      WHERE        C1<=20)

Plan hash value: 371539318

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |     20 |00:00:00.01 |   45723 |
|*  1 |  FILTER             |      |      1 |        |     20 |00:00:00.01 |   45723 |
|   2 |   TABLE ACCESS FULL | T3   |      1 |   1000K|   1000K|00:00:00.16 |   45583 |
|*  3 |   FILTER            |      |   1000K|        |     20 |00:00:00.11 |     140 |
|*  4 |    TABLE ACCESS FULL| T3   |     20 |      1 |     20 |00:00:00.01 |     140 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter(:B1<=20)
   4 - filter(("C1"=:B1 AND "C1"<=20)) 

Note in the above that no query transformation happened, but also notice that two full table scans were performed – it *appears* that the INDEX hint was ignored.  For extra credit, explain why the optimizer could not apply the INDEX hint.

What if we create a view with an embedded hint?

CREATE VIEW V3 AS
SELECT /*+ INDEX(TV3 IND_T3_C2) */
  C1,
  C2,
  C3,
  C4
FROM
  T3 TV3
WHERE
  C2 <= 20; 

Now a query with a NO_INDEX hint that uses that view:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(V3 IND_T3_C2) */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  agjapbkt2n8av, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+ NO_INDEX(V3
IND_T3_C2) */        C1      FROM        V3      WHERE        C1<=200)

Plan hash value: 1309751330

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    200 |00:00:00.01 |      54 |    |          |          |
|*  1 |  HASH JOIN RIGHT SEMI        |           |      1 |     42 |    200 |00:00:00.01 |      54 |  1452K|  1452K| 1282K (0)|
|*  2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     42 |    200 |00:00:00.01 |      13 |    |          |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |       3 |    |          |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    200 |    200 |00:00:00.01 |      41 |    |          |          |
|*  5 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |      17 |    |          |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C2"<=20)
   3 - access("C1"<=200)
   5 - access("C1"<=200) 

As seen by the above, the NO_INDEX hint was applied and the conflicting hint that was embedded in the view was not applied.

For fun, let’s reverse the location of the INDEX and NO_INDEX hints to see if the NO_INDEX hint always overrides the INDEX hint.  First, the view definition:

CREATE OR REPLACE VIEW V3 AS
SELECT /*+ NO_INDEX(TV3 IND_T3_C2) */
  C1,
  C2,
  C3,
  C4
FROM
  T3 TV3
WHERE
  C2 <= 20; 

Now the query:

 SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ INDEX(V3 IND_T3_C2) */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  5j745zr4dmqzx, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+ INDEX(V3
IND_T3_C2) */        C1      FROM        V3      WHERE        C1<=200)

Plan hash value: 3864333899

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    200 |00:00:00.06 |   10017 |       |       |          |
|*  1 |  HASH JOIN RIGHT SEMI        |           |      1 |     42 |    200 |00:00:00.06 |   10017 |  1452K|  1452K| 1269K (0)|
|*  2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     42 |    200 |00:00:00.01 |    9976 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |    209K|    209K|00:00:00.02 |     410 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    200 |    200 |00:00:00.01 |      41 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |      17 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C1"<=200)
   3 - access("C2"<=20)
   5 - access("C1"<=200)

As can be seen by the above, the NO_INDEX hint in the view was not applied due to the conflicting INDEX hint in the main query.  So, that is another case where the NO_INDEX hint could appear to not work as expected.  Just for confirmation that the NO_INDEX hint in the view works as expected, we will re-execute the query without the INDEX hint:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+  */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  fq3g6pr7ffj2f, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+  */
C1      FROM        V3      WHERE        C1<=200)

Plan hash value: 1309751330

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    200 |00:00:00.01 |      54 |    |          |          |
|*  1 |  HASH JOIN RIGHT SEMI        |           |      1 |     42 |    200 |00:00:00.01 |      54 |  1452K|  1452K| 1232K (0)|
|*  2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     42 |    200 |00:00:00.01 |      13 |    |          |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |       3 |    |          |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    200 |    200 |00:00:00.01 |      41 |    |          |          |
|*  5 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |      17 |    |          |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C2"<=20)
   3 - access("C1"<=200)
   5 - access("C1"<=200) 

  1. By executing the above test scripts on Oracle Database 11.1, 10.2, 10.1, 9.2, or 9.0.1 do you see different results?  I am trying to understand why the person performing the search might have included 10g in the search keywords.  Is it possible that the hint appeared to work correctly in 9.2 simply by coincidence, and a query transformation in 10.1 or 10.2 exposed the fact that the hint was malformed?
  2. Is it possible that a query transformation can cause a NO_INDEX hint to be ignored?  If yes, please provide a test case that demonstrates a NO_INDEX hint being ignored due to a transformation.
  3. Are there any other examples where a NO_INDEX hint will appear to not work properly?  Could an index organized table cause problems for this hint?




How Many Ways to Solve this SQL Problem?

6 07 2011

July 6, 2011

Since there were so many unique solutions to the last blog article that posed a SQL challenge, I thought that I would try another blog article that asks a similar type of question.  Assume that someone showed you the following output:

 C2   D
--- ---
100   0
150  50
200  50
201   1
300  99
350  50
400  50
500 100 

You have the following table definition, and rows in the table:

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER);

INSERT INTO T2 VALUES (1,100);
INSERT INTO T2 VALUES (4,150);
INSERT INTO T2 VALUES (7,200);
INSERT INTO T2 VALUES (8,201);
INSERT INTO T2 VALUES (10,300);
INSERT INTO T2 VALUES (14,350);
INSERT INTO T2 VALUES (18,400);
INSERT INTO T2 VALUES (24,500);

COMMIT;

Assume that you know nothing other than the fact that the C2 values are listed in ascending order when sorted by column C1.  How many different ways can this particular problem be solved.  Yes, there is an easy way, but assume that you were trying to “help educate” the person who provided the requested output.

My least-shortest-path solution follows:

SELECT
  C2,
  0 D
FROM
  T2
WHERE
  C1=(SELECT
        MIN(C1)
      FROM
        T2)
UNION ALL
SELECT
  V2.C2,
  V2.C2-MAX(T2.C2) D
FROM
  T2,
  (SELECT
    C1,
    C2
  FROM
    T2) V2
WHERE
  T2.C1<V2.C1
GROUP BY
  V2.C2
ORDER BY
  C2;

  C2    D
---- ----
 100    0
 150   50
 200   50
 201    1
 300   99
 350   50
 400   50
 500  100

8 rows selected. 

In the above, the row with the 0 in the D column was the hardest part of the solution.  Why would I use UNION ALL and not UNION – what was not in the specification?

This blog article was inspired by an old question found in a Usenet group from 1998 – if you were answering the question in 1998, would your answer be any different?  Be creative with your solution.  While you are thinking about a solution, take a look at this old Usenet thread and consider how difficult it was to find the “50 highest paid workers” in the last century.





Finding Rows with Common Attributes – Roman to Find a Solution in Something New

1 07 2011

July 1, 1011

A recent thread in the comp.databases.oracle.server Usenet group (actually two threads) asked an interesting question.  Assume that you had a detail table that contained several attributes for each of the unique key values.  How would one go about finding all of the unique key values that share the same set of attributes?  The sample set provided by the OP looks like this:

COL1   COL2
----  -----
I         a
I         b
I         c
II        a
II        b
III       a
III       b
III       c 

For the above, assume that the OP was interested in the attributes of “I”: a,b,c.  “II” lacks a “c” attribute, while “III” has the required “a”, “b”, and “c” attributes.  So, the OP would like to return C1 value “III” but not “II”.  I wonder if there is a simple solution for the OP?

First, let’s create our test data.  COL1 appears to contain Roman numbers – if we go beyond the number 3, those could be tricky to generate (unless of course you find the RN format parameter for the TO_CHAR function).  Let’s first create a temporary work table that contains the Roman numbers from 1 to 100 and a random number between 1 and 10:

CREATE TABLE T1_TEMP AS
SELECT
  TRIM(TO_CHAR(ROWNUM,'RN')) C1,
  TRUNC(DBMS_RANDOM.VALUE(1,10)+1) C2,
  ROWNUM C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

Let’s see what is in the T1_TEMP table:

COLUMN C1 FORMAT A10

SELECT
  *
FROM
  T1_TEMP
ORDER BY
  C3;

C1                 C2         C3
---------- ---------- ----------
I                  10          1
II                  4          2
III                 7          3
IV                  9          4
V                   8          5
VI                 10          6
VII                 9          7
VIII                4          8
IX                  4          9
X                  10         10
...
XCV                 5         95
XCVI                4         96
XCVII               8         97
XCVIII              7         98
XCIX               10         99
C                   4        100

100 rows selected. 

The row with the value “I” in column C1 has the number 10 in column C2, but that number might be a bit different in your temporary work table.  Column C2 will determine the number of attributes that are added for each of the values found in column C1 when we create the table T1 (note that we could have defined column C2 with the function CHR(96 + COUNTER) to place lowercase letters in that column, rather than numbers, to help reproduce the OP’s dataset):

CREATE TABLE T1 AS
SELECT
  T1_TEMP.C1,
  V1.COUNTER C2
FROM
  T1_TEMP,
  (SELECT
     ROWNUM COUNTER
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) V1
WHERE
  T1_TEMP.C2>=V1.COUNTER; 

Let’s see what is in table T1:

SELECT
  *
FROM
  T1
ORDER BY
  C1,
  C2;

C1                 C2
---------- ----------
C                   1
C                   2
C                   3
C                   4
I                   1
I                   2
I                   3
I                   4
I                   5
I                   6
I                   7
I                   8
I                   9
I                  10
...
XXXVII              1
XXXVII              2
XXXVII              3
XXXVIII             1
XXXVIII             2
XXXVIII             3
XXXVIII             4
XXXVIII             5
XXXVIII             6

634 rows selected.  

From the above output, you can see that we now have the number of rows in table T1 for each distinct value of C1 as was specified in table T1_TEMP.  An interesting side-note, the Roman number 100 (C) is less than the Roman number 1 (I) – I guess that explains why computers do not natively use Roman numbers for calculations.  :-)

For the next step, we need to collapse the different C2 values for each of the unique C1 values into a single row.  Oracle Database 11.2.0.1 introduced the LISTAGG function that makes easy work of this task, as shown in this earlier blog article.

COLUMN C2_LISTING FORMAT A22

SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1
ORDER BY
  C1;

C1         C2_LISTING
---------- --------------------
C          1,2,3,4
I          1,2,3,4,5,6,7,8,9,10
II         1,2,3,4
III        1,2,3,4,5,6,7
IV         1,2,3,4,5,6,7,8,9
IX         1,2,3,4
...
XXXV       1,2,3,4,5,6,7,8,9
XXXVI      1,2,3,4,5
XXXVII     1,2,3
XXXVIII    1,2,3,4,5,6

100 rows selected. 

The question remains, how can I find all of the unique C1 values that have all of the same attributes as the C1 value “I” – in this case 1,2,3,4,5,6,7,8,9,10?  One method slides the above query into a WITH block and then the WITH block is referenced twice in the main query:

WITH MY_VIEW AS
(SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1)
SELECT
  V2.C1,
  V2.C2_LISTING
FROM
  MY_VIEW V1,
  MY_VIEW V2
WHERE
  V1.C1='I'
  AND V1.C1<>V2.C1
  AND V1.C2_LISTING=V2.C2_LISTING
ORDER BY
  V2.C1;

C1         C2_LISTING
---------- --------------------
LVII       1,2,3,4,5,6,7,8,9,10
LXXI       1,2,3,4,5,6,7,8,9,10
LXXIII     1,2,3,4,5,6,7,8,9,10
VI         1,2,3,4,5,6,7,8,9,10
X          1,2,3,4,5,6,7,8,9,10
XCIX       1,2,3,4,5,6,7,8,9,10
XV         1,2,3,4,5,6,7,8,9,10
XXIX       1,2,3,4,5,6,7,8,9,10
XXXI       1,2,3,4,5,6,7,8,9,10

9 rows selected. 

How else might you solve the problem posted by the OP in the Usenet thread?





WHERE MOD(ROWNUM,100) = 0

30 06 2011

June 30, 2011

I saw an interesting search keyword the other day that generated several hits on my blog articles.  The search keyword was simply:

WHERE MOD(ROWNUM,100) = 0

Just what is interesting about the above?  Had someone seen that syntax and wondered how it worked, or was someone trying to use that syntax and found that it did not work quite as expected?

The MOD function returns the remainder value when dividing the number (ROWNUM in this case) by the specified denominator value (100 in this case).  The result of the MOD function where 100 is specified as the second parameter, with the sequential numbers 1-201 fed in as the first parameter, will appear as follows:

SELECT
  MOD(ROWNUM,100) C1
FROM
  DUAL
CONNECT BY
  LEVEL<=201;

C1
--
 1
 2
 3
 4
 5
 6
...
98
99
 0
 1
 2
 3
 4
...
98
99
 0
 1

As shown above, there is a sequence of 1-99, and then repeating  sequences of 0-99.  Note, however, that it appears the person performing the search is intending to place the MOD function in the WHERE clause, likely to retrieve every 100th row in the resultset.  The problem, of course, is this approach does not work with the ROWNUM pseudo-column.

SELECT
  *
FROM
  (SELECT
     ROWNUM C1
   FROM
     DUAL
   CONNECT BY
     LEVEL<=1000)
WHERE
  MOD(ROWNUM,100) = 0;

no rows selected 

However, if we create a table and populate the table with the same 1000 rows that would be produced by the inline view found in the above SQL statement:

DROP TABLE T1 PURGE;

CREATE TABLE T1 AS
SELECT
  ROWNUM C1
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;  

Then, select from that table, switching out the ROWNUM keyword with the C1 column found in the table T1:

SELECT
  *
FROM
  T1
WHERE
  MOD(C1,100) = 0;

    C1
------
   100
   200
   300
   400
   500
   600
   700
   800
   900
  1000

10 rows selected. 

We see that 10 rows were returned – every 100th row.  At this point, you might be wondering if the following will work:

SELECT
  *
FROM
  (SELECT
     ROWNUM C1
   FROM
     DUAL
   CONNECT BY
     LEVEL<=1000)
WHERE
  MOD(C1,100) = 0; 

Here is the output of the above SQL statement:

    C1
------
   100
   200
   300
   400
   500
   600
   700
   800
   900
  1000

10 rows selected. 

So, the query works as expected when we use the alias of the ROWNUM column from the inline view when the MOD function appears in the WHERE clause, but the query does not work as (probably) intended when the ROWNUM pseudo-column is used directly in the MOD function in the WHERE clause.  Why?  Think about when the ROWNUM value is assigned – not until after the WHERE clause is applied.  See the comments attached in this related blog article.





Calculate the Distance Between Two Latitude/Longitude Points using Plain SQL

13 06 2011

June 13, 2011 (Modified June 14, 2011)

A question recently appeared on the comp.databases.oracle.server Usenet group that oddly made me recall a lesson from a mathematics class that I taught in the early 1990s.  A bit strange how a question related to Oracle Database would trigger such a memory, but it happened.  The question posed in the thread initially asked how to calculate the distance between the points:

 item  latitude  longitude
---- ---------  ---------
P1   50.716667  -1.883333
P2   51.023332  -1.872231

Let’s see, the earth is flat, so…

= ((51.023332 - 50.716667)^2 + (1.883333 - 1.872231)^2)^(1/2) * 3960 * 3.141592/180 miles
= (0.094043422225 + 0.000123254404)^(1/2) * 3960 * 3.141592/180 miles
= 0.30686589355775594241245361976998 * 3960 * 3.141592/180 miles
= 1215.18894 * 3.141592/180 miles
= 21.209 miles

So, if the earth is flat, the Pythagorean Theorem seems to give a sensible answer… maybe.  My recollection of the formulas used for the correct calculation is a bit fuzzy.  Good news, I found the three page lesson plan that I put together in the early 1990s to solve a problem similar to the one posed by the OP in the Usenet thread.  The lesson plan begins by stating that part of the point of the lesson was to destroy Euclid’s Elements, which were introduced 23 centuries ago… a bit ambitious, looking back.  Even though the lesson plan is not too terribly clear given the amount of mathematics knowledge that I have forgotten, I managed to find the following formula in the lesson plan:

c0 = cos^(-1)(cos(90-alpha1)*cos(90-alpha2) + sin(90-alpha1)*sin(90-alpha2)*cos(theta1-theta2))
distance = c0 * (2pi/360) * 3960 miles

That looks complicated, even without the help of a computer… what is alpha, and what is theta – it’s Greek to me?

A Google search found another helpful page that describes how to calculate distances between pairs of latitude and longitude coordinates.  Included in that page is the following formula that is compatible with Microsoft Excel:

=ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371 KM 

Other than outputting kilometers rather than miles, and using SIN where my formula uses COS, the two formulas are close to being identical.  Based on that observation, alpha must be the latitude coordinate and theta must be the longitude coordinate in the formula that I found in my lesson plan.

The lesson plan included a couple of sample questions and answers:

Prague (14 degrees 26 minutes east, 50 degrees 5 minutes north)
Rio de Janeiro (43 degrees 12 minutes west, 22 degrees 57 minutes south)
Distance = 6152 miles

(119 19 degrees 48 minutes west, 36 degrees 44 minutes north)
(88 degrees 30 minutes west, 42 degrees south)
Distance = 5789.38 4832.09 miles

Will either of the two formulas, when plugged into an Oracle Database, output the distances that I wrote down roughly two decades ago?  That sounds like a challenge.

The setup (make certain that you do not swap the longitude and latitude numbers, the results will be slightly different):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  LOCATION VARCHAR2(40),
  LAT_RAD NUMBER,
  LON_RAD NUMBER,
  LAT_DEG NUMBER,
  LON_DEG NUMBER,
  PRIMARY KEY(LOCATION));

INSERT INTO T1 VALUES (
  'Prague',
  ROUND((-(50 + 5/60))*3.141592654/180,6),
  ROUND((14 + 26/60)*3.141592654/180,6),
  ROUND(-(50 + 5/60),6),
  ROUND(14 + 26/60,6)
  );

INSERT INTO T1 VALUES (
  'Rio de Janeiro',
  ROUND((22 + 57/60)*3.141592654/180,6),
  ROUND((-(43 + 12/60))*3.141592654/180,6),
  ROUND(22 + 57/60,6),
  ROUND(-(43 + 12/60),6)
  );

COMMIT; 

Note in the above that I had to convert the degrees and minutes notation to decimal numbers and then also to equivalent radian values.  It would probably be best just to store the radian values, because those values can be used directly in the calculations.  Let’s take a look at the contents of this table:

COLUMN LAT_RAD FORMAT 99.999999
COLUMN LON_RAD FORMAT 99.999999
COLUMN LAT_DEG FORMAT 999.999999
COLUMN LON_DEG FORMAT 999.999999
COLUMN LOCATION FORMAT A20
SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000

SELECT
  *
FROM
  T1;

LOCATION                LAT_RAD    LON_RAD     LAT_DEG     LON_DEG
-------------------- ---------- ---------- ----------- -----------
Prague                 -.874119    .251909  -50.083333   14.433333
Rio de Janeiro          .400553   -.753982   22.950000  -43.200000 

As seen in the above, west and north directional values are assigned negative numbers.

Let’s try converting the Excel version of the formula to Oracle Database SQL:

COLUMN DISTANCE FORMAT 999990.00

SELECT
  ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='Prague') LOC1,
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='Rio de Janeiro') LOC2;

  DISTANCE
----------
   6152.02 

That number seems to match the number that I wrote into the lesson plan.  Let’s try the formula from the lesson plan:

SELECT
  ACOS(COS(1.570796327-LOC1.LAT_RAD)*COS(1.570796327-LOC2.LAT_RAD) + SIN(1.570796327-LOC1.LAT_RAD)*SIN(1.570796327-LOC2.LAT_RAD)*COS(LOC1.LON_RAD-LOC2.LON_RAD)) * 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='Prague') LOC1,
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='Rio de Janeiro') LOC2;

  DISTANCE
----------
   6152.02 

If only I had access to an Oracle Database’s SQL interpretter back then rather than a fancy scientific calculator… :-)

Let’s try to calculate the distance for the latitude and longitude values provided by the OP (we need to convert the degree values to radians):

SELECT
  ACOS(COS(1.570796327-LOC1.LAT_RAD)*COS(1.570796327-LOC2.LAT_RAD) + SIN(1.570796327-LOC1.LAT_RAD)*SIN(1.570796327-LOC2.LAT_RAD)*COS(LOC1.LON_RAD-LOC2.LON_RAD)) * 3960 DISTANCE
FROM
  (SELECT
    50.716667*3.141592/180 LAT_RAD,
    -1.883333*3.141592/180 LON_RAD
  FROM
    DUAL) LOC1,
  (SELECT
    51.023332*3.141592/180 LAT_RAD,
    -1.872231*3.141592/180 LON_RAD
  FROM
    DUAL) LOC2;

  DISTANCE
----------
21.20 

You may notice that for the short distance between the latitude/longitude pairs that the “earth is flat” method produced the same answer that we see with the more complicated formula shown above.

For a double-check, let’s put the second set of demonstration points from my lesson plan into the T1 table:

INSERT INTO T1 VALUES (
  'TEST P1',
  NULL,
  NULL,
  19 + 48/60,     ---- 119 + 48/60,
  -(36 + 44/60)
  );

INSERT INTO T1 VALUES (
  'TEST P2',
  NULL,
  NULL,
  88 + 30/60,
  42
  );

UPDATE
  T1
SET
  LAT_RAD=ROUND(LAT_DEG/180*3.141592654,6),
  LON_RAD=ROUND(LON_DEG/180*3.141592654,6),
  LAT_DEG=ROUND(LAT_DEG,6),
  LON_DEG=ROUND(LON_DEG,6)
WHERE
  LAT_RAD IS NULL;

COMMIT;

Checking the distance with the two formulas:

SELECT
  ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P2') LOC2;

  DISTANCE
----------
   4832.09  ---5789.07--- 

 —

 SELECT
  ACOS(COS(1.570796327-LOC1.LAT_RAD)*COS(1.570796327-LOC2.LAT_RAD) + SIN(1.570796327-LOC1.LAT_RAD)*SIN(1.570796327-LOC2.LAT_RAD)*COS(LOC1.LON_RAD-LOC2.LON_RAD)) * 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P2') LOC2;

  DISTANCE
----------
   4832.09  ---5789.07---

The values returned are identical to the value I wrote into the lesson plan, so the formulas are probably correct.

The OP’s second question is to identify all locations that are within a given radius of a specified location.  This might be easy, or it might not.  Someone in the Usenet thread suggested using Oracle Spatial, which is an additional cost licensed item that may be added to the Enterprise Edition for a list price of $17,500 per CPU plus $3,850 (USD) annual maintenance.  As I am not familiar with the product, I do not know if Oracle Spatial could answer this question (does someone know for certain?). 

We need some more test data to see if we can find a solution for the OP.  We will use the SIN and COS functions just to insert some “random” yet repeatable data:

INSERT INTO
  T1
SELECT
  'POINT '||TO_CHAR(ROWNUM),
  NULL,
  NULL,
  COS(ROWNUM/40)*180,
  SIN(ROWNUM/33)*180
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

UPDATE
  T1
SET
  LAT_RAD=ROUND(LAT_DEG/180*3.141592654,6),
  LON_RAD=ROUND(LON_DEG/180*3.141592654,6),
  LAT_DEG=ROUND(LAT_DEG,6),
  LON_DEG=ROUND(LON_DEG,6)
WHERE
  LAT_RAD IS NULL;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1', CASCADE=> TRUE) 

Let’s give the first SQL statement a try, looking for those areas that are less than 500 miles away from “TEST P1″:

SELECT
  LOC2.LOCATION,
  ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LOCATION,
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION<>'TEST P1') LOC2
WHERE
  (ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960) < 500
ORDER BY
  LOC2.LOCATION; 

LOCATION               DISTANCE
-------------------- ----------
POINT 1561               331.91
POINT 1562                46.45
POINT 1563               426.22
POINT 200                439.78
POINT 2066               318.73
POINT 2067               387.62
POINT 4970               432.93
POINT 7989               210.35
POINT 7990               268.25
POINT 8494               443.05
POINT 9855               258.38
POINT 9856               120.22
Rio de Janeiro           469.61

13 rows selected. 

LOCATION               DISTANCE -------------------- ---------- POINT 1358               428.75 POINT 1359               487.10 POINT 402                452.35 POINT 403                124.05 POINT 404                247.85 POINT 4675               452.49 POINT 4676               168.24 POINT 4677               306.37 POINT 4847               405.92 POINT 5206               449.08 POINT 5207               160.17 POINT 5208               367.93 POINT 5378               496.50 POINT 5379               341.63 POINT 5380               453.77 POINT 6714               282.04 POINT 6715                96.74 POINT 6716               484.03 POINT 8696               386.01 POINT 8697                77.56 POINT 8698               312.19 POINT 9652               420.05 22 rows selected. 

And the second SQL statement:

SELECT
  LOC2.LOCATION,
  ACOS(COS(1.570796327-LOC1.LAT_RAD)*COS(1.570796327-LOC2.LAT_RAD) + SIN(1.570796327-LOC1.LAT_RAD)*SIN(1.570796327-LOC2.LAT_RAD)*COS(LOC1.LON_RAD-LOC2.LON_RAD)) * 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LOCATION,
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION<>'TEST P1') LOC2
WHERE
  (ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960) < 500
ORDER BY
  LOC2.LOCATION;

LOCATION               DISTANCE
-------------------- ----------
POINT 1561               331.91
POINT 1562                46.45
POINT 1563               426.22
POINT 200                439.78
POINT 2066               318.73
POINT 2067               387.62
POINT 4970               432.93
POINT 7989               210.35
POINT 7990               268.25
POINT 8494               443.05
POINT 9855               258.38
POINT 9856               120.22
Rio de Janeiro           469.61
 
13 rows selected.

LOCATION               DISTANCE -------------------- ---------- POINT 1358               428.75 POINT 1359               487.10 POINT 402                452.35 POINT 403                124.05 POINT 404                247.85 POINT 4675               452.49 POINT 4676               168.24 POINT 4677               306.37 POINT 4847               405.92 POINT 5206               449.08 POINT 5207               160.17 POINT 5208               367.93 POINT 5378               496.50 POINT 5379               341.63 POINT 5380               453.77 POINT 6714               282.04 POINT 6715                96.74 POINT 6716               484.03 POINT 8696               386.01 POINT 8697                77.56 POINT 8698               312.19 POINT 9652               420.05 22 rows selected. 

If you tried either of the above two SQL statements, you probably found that it took a couple of seconds to make it through the 10,004 or so rows in the table.  All of the calculations are likely hammering the server’s CPU (go parallel and hammer more than one CPU?).

Maybe we should try submitting bind variables in place of some of the calculations that we are requesting for the server to perform?  Let’s set up the bind variables to prepare another test:

SELECT
  SIN(LAT_RAD),
  COS(LAT_RAD),
  SIN(LON_RAD),
  COS(LON_RAD)
FROM
  T1
WHERE
  LOCATION='TEST P1'; 

SIN(LAT_RAD) COS(LAT_RAD) SIN(LON_RAD) COS(LON_RAD)
------------ ------------ ------------ ------------
   .33873774   .940880834   -.59809181   .801427592

SIN(LAT_RAD) COS(LAT_RAD) SIN(LON_RAD) COS(LON_RAD) ------------ ------------ ------------ ------------   -.59809181   .801427592   .867765674   -.49697358 VAR SIN1_LAT_RAD NUMBER

VAR COS1_LAT_RAD NUMBER
VAR SIN1_LON_RAD NUMBER
VAR COS1_LON_RAD NUMBER

EXEC :SIN1_LAT_RAD := .33873774
EXEC :COS1_LAT_RAD := .940880834
EXEC :SIN1_LON_RAD := -.59809181
EXEC :COS1_LON_RAD := .801427592

EXEC :SIN1_LAT_RAD := -.59809181 EXEC :COS1_LAT_RAD := .801427592 EXEC :SIN1_LON_RAD := .867765674 EXEC :COS1_LON_RAD := -.49697358 

To test the performance, execute the following scipt (more than once to eliminate the hard parse):

SET ARRAYSIZE 100
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SQL_10046';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT
  LOC2.LOCATION,
  ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LOCATION,
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION<>'TEST P1') LOC2
WHERE
  (ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960) < 500
ORDER BY
  LOC2.LOCATION;

SELECT
  LOC2.LOCATION,
  ACOS(COS(1.570796327-LOC1.LAT_RAD)*COS(1.570796327-LOC2.LAT_RAD) + SIN(1.570796327-LOC1.LAT_RAD)*SIN(1.570796327-LOC2.LAT_RAD)*COS(LOC1.LON_RAD-LOC2.LON_RAD)) * 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LOCATION,
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION<>'TEST P1') LOC2
WHERE
  (ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960) < 500
ORDER BY
  LOC2.LOCATION;

SELECT
  LOC2.LOCATION,
  ABS(ACOS(:SIN1_LAT_RAD*SIN(LOC2.LAT_RAD)+:COS1_LAT_RAD*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960 DISTANCE
FROM
  (SELECT
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LOCATION,
    LAT_RAD,
    LON_RAD
  FROM
    T1
  WHERE
    LOCATION<>'TEST P1') LOC2
WHERE
  (ABS(ACOS(:SIN1_LAT_RAD*SIN(LOC2.LAT_RAD)+:COS1_LAT_RAD*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960) < 500
ORDER BY
  LOC2.LOCATION;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'; 

Trace file output for the first query (1.794 CPU seconds):

FETCH #406890544:c=1794011,e=1797194,p=0,cr=94,cu=0,mis=0,r=1,dep=0,og=1,plh=2797247905,tim=2500817545497
WAIT #406890544: nam='SQL*Net message from client' ela= 357 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500817545983
WAIT #406890544: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500817546020
FETCH #406890544:c=0,e=44,p=0,cr=0,cu=0,mis=0,r=21,dep=0,og=1,plh=2797247905,tim=2500817546058
STAT #406890544 id=1 cnt=22 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=94 pr=0 pw=0 time=1797194 us cost=13 size=25000 card=500)'
STAT #406890544 id=2 cnt=22 pid=1 pos=1 obj=0 op='NESTED LOOPS  (cr=94 pr=0 pw=0 time=125707 us cost=12 size=25000 card=500)'
STAT #406890544 id=3 cnt=1 pid=2 pos=1 obj=90695 op='TABLE ACCESS BY INDEX ROWID T1 (cr=3 pr=0 pw=0 time=23 us cost=2 size=25 card=1)'
STAT #406890544 id=4 cnt=1 pid=3 pos=1 obj=90696 op='INDEX UNIQUE SCAN SYS_C0038308 (cr=2 pr=0 pw=0 time=17 us cost=1 size=0 card=1)'
STAT #406890544 id=5 cnt=22 pid=2 pos=2 obj=90695 op='TABLE ACCESS FULL T1 (cr=91 pr=0 pw=0 time=125683 us cost=10 size=12500 card=500)'
WAIT #406890544: nam='SQL*Net message from client' ela= 4359 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500817550490
CLOSE #406890544:c=0,e=14,dep=0,type=0,tim=2500817550613 

Trace file output for the second query (1.529 CPU seconds):

FETCH #406890544:c=1528810,e=1519062,p=0,cr=94,cu=0,mis=0,r=1,dep=0,og=1,plh=2797247905,tim=2500819083108
WAIT #406890544: nam='SQL*Net message from client' ela= 320 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500819083487
WAIT #406890544: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500819083524
FETCH #406890544:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=21,dep=0,og=1,plh=2797247905,tim=2500819083560
STAT #406890544 id=1 cnt=22 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=94 pr=0 pw=0 time=1519059 us cost=13 size=25000 card=500)'
STAT #406890544 id=2 cnt=22 pid=1 pos=1 obj=0 op='NESTED LOOPS  (cr=94 pr=0 pw=0 time=50452 us cost=12 size=25000 card=500)'
STAT #406890544 id=3 cnt=1 pid=2 pos=1 obj=90695 op='TABLE ACCESS BY INDEX ROWID T1 (cr=3 pr=0 pw=0 time=15 us cost=2 size=25 card=1)'
STAT #406890544 id=4 cnt=1 pid=3 pos=1 obj=90696 op='INDEX UNIQUE SCAN SYS_C0038308 (cr=2 pr=0 pw=0 time=10 us cost=1 size=0 card=1)'
STAT #406890544 id=5 cnt=22 pid=2 pos=2 obj=90695 op='TABLE ACCESS FULL T1 (cr=91 pr=0 pw=0 time=50437 us cost=10 size=12500 card=500)'
WAIT #406890544: nam='SQL*Net message from client' ela= 3294 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500819086974
CLOSE #406890544:c=0,e=13,dep=0,type=0,tim=2500819087016 

Trace file output for the third query (1.388 CPU seconds):

FETCH #406890544:c=1388409,e=1380995,p=0,cr=94,cu=0,mis=0,r=1,dep=0,og=1,plh=2797247905,tim=2500820478110
WAIT #406890544: nam='SQL*Net message from client' ela= 278 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500820478441
WAIT #406890544: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500820478480
FETCH #406890544:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=21,dep=0,og=1,plh=2797247905,tim=2500820478519
STAT #406890544 id=1 cnt=22 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=94 pr=0 pw=0 time=1380991 us cost=13 size=21500 card=500)'
STAT #406890544 id=2 cnt=22 pid=1 pos=1 obj=0 op='NESTED LOOPS  (cr=94 pr=0 pw=0 time=45079 us cost=12 size=21500 card=500)'
STAT #406890544 id=3 cnt=1 pid=2 pos=1 obj=90695 op='TABLE ACCESS BY INDEX ROWID T1 (cr=3 pr=0 pw=0 time=12 us cost=2 size=18 card=1)'
STAT #406890544 id=4 cnt=1 pid=3 pos=1 obj=90696 op='INDEX UNIQUE SCAN SYS_C0038308 (cr=2 pr=0 pw=0 time=9 us cost=1 size=0 card=1)'
STAT #406890544 id=5 cnt=22 pid=2 pos=2 obj=90695 op='TABLE ACCESS FULL T1 (cr=91 pr=0 pw=0 time=45046 us cost=10 size=12500 card=500)'
WAIT #406890544: nam='SQL*Net message from client' ela= 2713 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2500820481326
CLOSE #406890544:c=0,e=15,dep=0,type=0,tim=2500820481372 

Maybe we should just assume that the world is flat to save time (note that I am not converting the degree values to radians)?

SELECT
  LOC2.LOCATION,
  SQRT(POWER((LOC1.LAT_DEG - LOC2.LAT_DEG),2) + POWER((LOC1.LON_DEG - LOC2.LON_DEG),2)) * 3960 * 3.141592654/180 DISTANCE
FROM
  (SELECT
    LAT_DEG,
    LON_DEG
  FROM
    T1
  WHERE
    LOCATION='TEST P1') LOC1,
  (SELECT
    LOCATION,
    LAT_DEG,
    LON_DEG
  FROM
    T1
  WHERE
    LOCATION<>'TEST P1') LOC2
WHERE
  (SQRT(POWER((LOC1.LAT_DEG - LOC2.LAT_DEG),2) + POWER((LOC1.LON_DEG - LOC2.LON_DEG),2)) * 3960 * 3.141592654/180) < 500
ORDER BY
  LOC2.LOCATION;

LOCATION               DISTANCE
-------------------- ----------
POINT 1561               351.10
POINT 1562                49.17
POINT 1563               446.41
POINT 200                447.03
POINT 2066               327.96
POINT 2067               389.31
POINT 4970               436.30
POINT 7989               222.68
POINT 7990               274.47
POINT 8494               446.20
POINT 9855               272.91
POINT 9856               126.70
Rio de Janeiro           497.15

13 rows selected.
LOCATION               DISTANCE -------------------- ---------- POINT 4847           487.345998 POINT 5207           194.867475 POINT 5208           371.803354 POINT 6714           306.023772 POINT 6715            110.18648  5 rows selected.

Still 13 rows returned, but the distances are a bit off due to the curvature of the earth. What happened to the other 17 rows… did I make a mistake, or is the world not flat?

Note that the original version of this article made a critical mistake – latitude values cannot exceed 90 degrees.  While trying to implement my suggestion in a comment to use +-0.0002525254004999719 radians to speed up the query (note, still need to take into account the wrap-around for longitude values) I discovered inconsistent results.  This article is corrected to fix that mistake.

Added June 14, 2011:

The speed up version of the query:

SET ARRAYSIZE 100 ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SQL_10046-FAST'; 
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';     
SELECT 
  LOC2.LOCATION, 
  ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960 DISTANCE 
FROM 
  (SELECT 
     LAT_RAD, 
     LON_RAD 
   FROM 
     T1 
   WHERE 
     LOCATION='TEST P1') LOC1, 
  (SELECT 
     LOCATION, 
     LAT_RAD, 
     LON_RAD 
   FROM 
     T1 
   WHERE 
     LOCATION<>'TEST P1') LOC2 
WHERE 
  LOC2.LAT_RAD BETWEEN LOC1.LAT_RAD-(0.0002525254004999719 * 500) AND LOC1.LAT_RAD+(0.0002525254004999719 * 500) 
  AND LOC2.LON_RAD BETWEEN LOC1.LON_RAD-(0.0002525254004999719 * 500) AND LOC1.LON_RAD+(0.0002525254004999719 * 500) 
  AND (ABS(ACOS(SIN(LOC1.LAT_RAD)*SIN(LOC2.LAT_RAD)+COS(LOC1.LAT_RAD)*COS(LOC2.LAT_RAD)*COS(LOC2.LON_RAD-LOC1.LON_RAD)))* 3960) < 500 
ORDER BY 
  LOC2.LOCATION; 

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'; 

The fetch required almost no CPU time, from the 10046 trace file:

PARSE #406878256:c=0,e=3157,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1311020123,tim=2586341558538 
EXEC #406878256:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1311020123,tim=2586341558744 
WAIT #406878256: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2586341558793 
FETCH #406878256:c=15600,e=7387,p=0,cr=94,cu=0,mis=0,r=1,dep=0,og=1,plh=1311020123,tim=2586341566207 
WAIT #406878256: nam='SQL*Net message from client' ela= 11647 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2586341577955 
WAIT #406878256: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2586341578017 
FETCH #406878256:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=12,dep=0,og=1,plh=1311020123,tim=2586341578052 
STAT #406878256 id=1 cnt=13 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=94 pr=0 pw=0 time=7387 us cost=13 size=1950 card=39)' 
STAT #406878256 id=2 cnt=13 pid=1 pos=1 obj=0 op='NESTED LOOPS  (cr=94 pr=0 pw=0 time=3920 us cost=12 size=1950 card=39)' 
STAT #406878256 id=3 cnt=1 pid=2 pos=1 obj=90785 op='TABLE ACCESS BY INDEX ROWID T1 (cr=3 pr=0 pw=0 time=23 us cost=2 size=25 card=1)' 
STAT #406878256 id=4 cnt=1 pid=3 pos=1 obj=90786 op='INDEX UNIQUE SCAN SYS_C0038321 (cr=2 pr=0 pw=0 time=15 us cost=1 size=0 card=1)' 
STAT #406878256 id=5 cnt=13 pid=2 pos=2 obj=90785 op='TABLE ACCESS FULL T1 (cr=91 pr=0 pw=0 time=3894 us cost=10 size=975 card=39)' 
WAIT #406878256: nam='SQL*Net message from client' ela= 7604 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2586341585780 
CLOSE #406878256:c=0,e=13,dep=0,type=0,tim=2586341585850 

The above is not a finished product, it is still necessary to take care of wrap-around values for the longitude coordinates.





Row Values to Comma Separated Lists, an Overly Complicated Use Case Example

26 05 2011

May 26, 2011

In a previous article I showed a simple method to convert values found in multiple table rows into a comma delimited list.  The method works very well, until the situation in which the approach is to be used becomes a bit more complex.

Assume for a moment that the following table structure exists where the approach must be implemented:

In the above diagram, the PURCHASE_ORDER table is the parent table, the PURC_ORDER_LINE table is the child table (the line-level detail), and the PURC_LINE_DEL table is the grand-child table (the sub-line level detail).  A fairly simple setup, and I suspect not a too terribly uncommon setup.

Now the complexity.  Assume that an ERP package using the above tables generates a report using the tables’ data, and as the ERP package is flexible, it allows additional information to be added to the reports – essentially, it is possible to execute a query that retrieves a single row for every detail-level row in the report.  Great, except that in this particular report, the detail-level row level is at the PURC_ORDER_LINE table.  If there were 18 rows, for instance, in the PURC_LINE_DEL table (the schedule that indicates when the parts should be delivered) for the first order line in the PURC_ORDER_LINE table for a particular order ID that is found in the PURCHASE_ORDER table, then we have a bit of a problem.  Fortunately, this ERP package’s report was constructed such that the DESIRED_RECV_DATE and USER_ORDER_QTY columns are represented in two report variables with CRLF (carriage return – line feed) delimited lists, making it very easy to list the delivery dates that the parts are due, and the quantities that are due for each of those dates.  Fantastic, so where is the complexity?  What if the purchase order must be revised after the supplier starts sending the parts on the agreed upon delivery dates, such that the first 9 of the 18 delivery schedules for line 1 of a purchase order have been satisfied, 1 of the 18 delivery schedule lines has been partially satisfied, and the remaining 8 delivery schedule lines have dates that are in the future?  This might be helpful information to provide to the supplier when presenting the updated purchase order.  Unfortunately, the ERP package’s report does not deliver the USER_RECEIVED_QTY column as a CRLF delimited list, as it does for the DESIRED_RECV_DATE and USER_ORDER_QTY columns. 

A complex problem, or not?  Well, the ERP package does permit a SQL statement returning a single row to be executed per detail row in the report (at the PURC_ORDER_LINE table level), so we have a possible solution.  However, if the SQL statement is too complex (having inline views is one such complication), the complex portion of the SQL statement must be isolated in a statically defined view, and then that view should be referenced by the report.  A complex problem, or not?  Let’s try building a solution starting in SQL*Plus:

SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO;

PURC_ORDER_ID   PURC_ORDER_LINE_NO USER_RECEIVED_QTY         RN
--------------- ------------------ ----------------- ----------
146825                           1                 1          1
146825                           1                 1          2
146825                           1                 2          3
146825                           1                 2          4
146825                           1                 2          5
146825                           1                 3          6
146825                           1                 3          7
146825                           1                 1          8
146825                           1                 3          9
146825                           1                 2         10
146825                           1                 0         11
146825                           1                 0         12
146825                           1                 0         13
146825                           1                 0         14
146825                           1                 0         15
146825                           1                 0         16
146825                           1                 0         17
146825                           1                 0         18 

Simple so far, that line identified by RN 10 requests an order quantity of 3, but the supplier only delivered 2 of the 3 parts for that date.  Now let’s generate a comma separated list using the SYS_CONNECT_BY_PATH function:

SELECT
  SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),',') LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

LISTING
------------------------------------

,1
,1,1
,1,1,2
,1,1,2,2
,1,1,2,2,2
,1,1,2,2,2,3
,1,1,2,2,2,3,3
,1,1,2,2,2,3,3,1
,1,1,2,2,2,3,3,1,3
,1,1,2,2,2,3,3,1,3,2
,1,1,2,2,2,3,3,1,3,2,0
,1,1,2,2,2,3,3,1,3,2,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0,0,0 

We only want the last row returned from the above result, and we need to change those commas to a CRLF character combination (after eliminating the initial comma):

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

LISTING
-------

1
1
2
2
2
3
3
1
3
2
0
0
0
0
0
0
0
0

Looks like mission accomplished, all of the previously received quantities are returned in a single query row… except now we have a complicated query that must be isolated in a statically defined view (silly rules…).

Before we head down the path of isolating the complex portion of the query, I thought that it might be entertaining to examine a couple of execution plan to see what happens when we isolate the complicated portion of the query in a statically defined view.  First, we will take a look at the last of the above SQL statements:

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 929864805

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE                          |               |     1 |    26 |            |          |
|*  2 |   CONNECT BY NO FILTERING WITH START-WITH|               |       |       |            |          |
|   3 |    VIEW                                  |               |     4 |   104 |     3   (0)| 00:00:01 |
|   4 |     WINDOW NOSORT                        |               |     4 |    76 |     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID         | PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("RN"-1=PRIOR "RN")
       filter("RN"=1)
   6 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The above looks simple, and the query really does execute quickly.  For the above complex portion of the query to work in a statically defined view, we need to relocate the WHERE clause outside of the inline view, so that when the report runs, it is able to pass in the appropriate values for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns.  Here is the execution plan (generated on Oracle Database 11.2.0.2) when the WHERE clause is relocated:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

The above looks to be reasonably sane, although the predicted amount of time jumped from about 1 second to about 5 seconds.  The Predicate Information section of the execution plan indicates that the filter for the specified PURC_ORDER_ID and PURC_ORDER_LINE_NO was not pushed into the inline view as I had hoped.  The actual execution time for this simple query is much longer than 5 seconds, spending 100% of its time burning CPU time (I killed the query after waiting several minutes).  No problem, I will just use a hint to force the optimizer to push those predicates into the inline view:

SELECT /*+ PUSH_PRED(V) */
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

Well, the above is disappointing.  I could dig through a 10053 trace to try to determine why the hint was “ignored“, or I could try another query that produces an equivalent result.  I choose the latter.  Let’s try using the COUNT analytic function to replace the ROW_NUMBER analytic function:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 617003335

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE                          |               |     1 |    26 |            |          |
|*  2 |   CONNECT BY NO FILTERING WITH START-WITH|               |       |       |            |          |
|   3 |    VIEW                                  |               |     4 |   104 |     3   (0)| 00:00:01 |
|   4 |     WINDOW BUFFER                        |               |     4 |    76 |     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID         | PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("RN"-1=PRIOR "RN")
       filter("RN"=1)
   6 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The execution plan appears to be reasonable – the predicates on the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns are applied early, so the query executes quickly.  Now to move the WHERE clause:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

As had happened before, the predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were not pushed into the inline view, so the query executes extremely slow.  Let’s try the PUSH_PRED hint again:

SELECT /*+ PUSH_PRED(V) */
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

The execution plan is unchanged.  Out of luck…

I remembered reading about a new analytic function introduced in Oracle Database 11.2.0.1, what was that new function… LISTAGG?  I am not a fan of that function name, Oracle should have named the function __CREATE_A_DARN_COMMA_SEPARATED_LIST_QUICKLY (a double underscore function, making it a challenge to find the function).  The LISTAGG function is described in the Oracle documentation library for 11.2.  Let’s give the LISTAGG function a try (the SQL statement is also now formatting the USER_RECEIVED_QTY so that the column values consistently show 2 decimal places):

SELECT
  LISTING
FROM
(SELECT
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO);

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The calculated cost for the query dropped by 1 and the CPU component of that cost dropped to 0%, so that must be a good thing (the calculated cost can be misleading, just as we saw earlier that the estimated time can be misleading).  Without AUTOTRACE TRACEONLY EXPLAIN enabled, the query executes very quickly and produces output that looks similar to this:

LISTING
-----------

       1.00
       1.00
       2.00
       2.00
       2.00
       3.00
       3.00
       1.00
       3.00
       2.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
 

OK, let’s re-enable AUTOTRACE TRACEONLY EXPLAIN and try moving the WHERE clause so that we are able to package the complex portion of this query into a statically defined view:

SELECT
  LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

In the above, notice that the  predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were pushed into the inline view, or more correctly, the inline view was completely eliminated by the query optimizer (notice that there is no line in the execution plan showing VIEW as we saw in the previous examples above).

OK, so Oracle’s query optimizer worked as I had hoped with the inline view, but what about a statically defined view – would it also execute the SQL statement efficiently?  Let’s create the view:

CREATE VIEW TEST_LA AS
SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO; 

Now let’s try the query with AUTOTRACE TRACEONLY EXPLAIN enabled:

SELECT
  LISTING
FROM
  TEST_LA
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The execution plan above shows the same cost (3) that we saw earlier, and once again the view was eliminated by Oracle’s query optimizer (as indicated by the absence of a VIEW line in the execution plan), allowing the query to execute quickly since the predicates would be applied early in the query execution.

So, Oracle Database 11.2.0.1 and above provide an efficient way to retrieve the information that we need in a comma (or CRLF) delimited list for the report.  When the SQL statement is added to the report (defined as an “extended query” in the ERP package), the SQL statement looks like the following (the reporting package does not permit line breaks in the SQL statement):

SELECT (LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO)) FROM PURC_LINE_DEL WHERE PURC_ORDER_ID= : PO_ID AND PURC_ORDER_LINE_NO= :LN_LINE_NO GROUP BY PURC_ORDER_ID , PURC_ORDER_LINE_NO

Oddly enough, by enclosing the LISTAGG analytic funtion in the SQL statement with ( ), the reporting package did not require me to use a statically defined view for the complex portion of the query.  But even if the statically defined view was required, as shown above, the query would still have executed efficiently.





Using ROWNUM in the Where Clause Causes Problems

4 04 2011

April 4, 2011

A couple of years ago a very well written article appeared in Oracle Magazine that described how to use ROWNUM in the WHERE clause, and also why using ROWNUM might not work quite as expected.

Let’s assume that we want to do something a bit unusual, as described in this forum thread.  We set up the tables for the test case to see what is happening:

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  TRUNC(SYSDATE+ROWNUM) C2,
  LPAD('A',15,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=10000
ORDER BY
  DBMS_RANDOM.VALUE;

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  TRUNC(SYSDATE+ROWNUM) C2,
  LPAD('A',15,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=10000
ORDER BY
  DBMS_RANDOM.VALUE;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2') 

In the above, 10,000 rows were inserted into the T1 and T2 tables in random order, as the tables were created.  Trying a basic query, if we execute the following query, we should see six randomly selected rows:

SELECT
  *
FROM
  T1
WHERE
  ROWNUM IN (6,5,4,3,2,1);

  C1 C2        PADDING
---- --------- ---------------
9546 23-MAY-37 AAAAAAAAAAAAAAA
4894 27-AUG-24 AAAAAAAAAAAAAAA
1106 14-APR-14 AAAAAAAAAAAAAAA
1144 22-MAY-14 AAAAAAAAAAAAAAA
2281 02-JUL-17 AAAAAAAAAAAAAAA
4832 26-JUN-24 AAAAAAAAAAAAAAA

6 rows selected. 

And, if we had not read the article that is linked to at the start of this blog article, we might assume that the following would also return six rows:

SELECT
  *
FROM
  T1
WHERE
  ROWNUM IN (7,6,5,4,3,2); 

no rows selected

How about something like the following?  If there were only 10 rows in table T2, we might expect the following to return 10 rows from table T1 (a random selection of rows), or 10,000 rows if both tables T1 and T2 contain 10,000 rows:

SELECT
  *
FROM
  T1
WHERE
  ROWNUM IN (
    SELECT
      ROWNUM RN
    FROM
      T2);

no rows selected 

Interesting (and expected if you read the article that is linked to at the start of this blog article).  Let’s take a look at the execution plan:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T1
WHERE
  ROWNUM IN (
    SELECT
      ROWNUM RN
    FROM
      T2);

Execution Plan
----------------------------------------------------------
Plan hash value: 881956856

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   100M|  2574M|     8   (0)| 00:00:01 |
|   1 |  COUNT                |      |       |       |            |          |
|*  2 |   FILTER              |      |       |       |            |          |
|   3 |    TABLE ACCESS FULL  | T1   | 10000 |   263K|     6   (0)| 00:00:01 |
|*  4 |    FILTER             |      |       |       |            |          |
|   5 |     COUNT             |      |       |       |            |          |
|   6 |      TABLE ACCESS FULL| T2   |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( EXISTS (???)
   4 - filter(ROWNUM=ROWNUM) 

Interesting, after applying the EXISTS (???) filter (plan ID line 2), the optimizer expects 10,000 rows to become 100,000,000 rows.  Secondly, ff the value of ROWNUM is assigned only after the WHERE clause is evaluated, could the ROWNUM=ROWNUM filter (plan ID line 4) ever be true (and result in a row passing through)?  I suspect that is why this query returns no rows – is there a better answer?

What about this query, which should yield the originally expected results (note that AUTOTRACE was still enabled):

SELECT
  T1.*
FROM
  (SELECT
     T1.*,
     ROWNUM RN
   FROM
     T1) T1,
  (SELECT
     T2.*,
     ROWNUM RN
   FROM
     T2) T2
WHERE
  T1.RN=T2.RN;

Execution Plan
----------------------------------------------------------
Plan hash value: 2458726244

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1000K|    54M|    17  (30)| 00:00:01 |
|*  1 |  HASH JOIN           |      |  1000K|    54M|    17  (30)| 00:00:01 |
|   2 |   VIEW               |      | 10000 |   126K|     6   (0)| 00:00:01 |
|   3 |    COUNT             |      |       |       |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 10000 |       |     6   (0)| 00:00:01 |
|   5 |   VIEW               |      | 10000 |   429K|     6   (0)| 00:00:01 |
|   6 |    COUNT             |      |       |       |            |          |
|   7 |     TABLE ACCESS FULL| T1   | 10000 |   263K|     6   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."RN"="T2"."RN") 

The optimizer is predicting that 1,000,000 rows will be returned, and the aliased ROWNUM values (“T1″.”RN”=”T2″.”RN”) are being compared.  So, will this query return 0 rows, 10,000 rows, or 1,000,000 rows?  Let’s find out:

SET AUTOTRACE OFF

/

        C1 C2        PADDING                 RN
---------- --------- --------------- ----------
...
      8353 15-FEB-34 AAAAAAAAAAAAAAA       9989
      8566 16-SEP-34 AAAAAAAAAAAAAAA       9990
      8202 17-SEP-33 AAAAAAAAAAAAAAA       9991
      7980 07-FEB-33 AAAAAAAAAAAAAAA       9992
      6708 15-AUG-29 AAAAAAAAAAAAAAA       9993
      5848 08-APR-27 AAAAAAAAAAAAAAA       9994
      9006 30-NOV-35 AAAAAAAAAAAAAAA       9995
      6423 03-NOV-28 AAAAAAAAAAAAAAA       9996
      3272 19-MAR-20 AAAAAAAAAAAAAAA       9997
       921 11-OCT-13 AAAAAAAAAAAAAAA       9998
      7519 04-NOV-31 AAAAAAAAAAAAAAA       9999
      9311 30-SEP-36 AAAAAAAAAAAAAAA      10000

10000 rows selected.

So, from the above, we find that by placing both halves of the query into inline views, and joining the aliased ROWNUM values, the query worked as expected.

Let’s try just placing what had been an IN list sub-query in the original query into an inline view to see what happens:

SELECT
  *
FROM
  T1,
  (SELECT
     T2.*,
     ROWNUM RN
   FROM
     T2) T2
WHERE
  T1.ROWNUM=T2.RN;

SQL> SELECT
  2    *
  3  FROM
  4    T1,
  5    (SELECT
  6       T2.*,
  7       ROWNUM RN
  8     FROM
  9       T2) T2
 10  WHERE
 11    T1.ROWNUM=T2.RN;
  T1.ROWNUM=T2.RN
     *
ERROR at line 11:
ORA-01747: invalid user.table.column, table.column, or column specification 

Needless to say, that did not work.  One final thought: ROWNUM is not a fixed attribute of a specific row in a table – if you attempt to use it as such, you will not achieve the results that you are expecting.








Follow

Get every new post delivered to your Inbox.

Join 137 other followers