Everything or Nothing in SQL

23 05 2013

May 23, 2013

The following question recently came through an ERP mailing list (significantly rephrased):

I would like to use the Microsoft Query tool in Microsoft Excel to extract records from the ERP database.  I would like the list of parent records to be retrieved into Excel when all of the specified child records belong to the parent.

That sounds as though it would be an easy request.  The OP in the ERP mailing list later followed up with the following:

“If I may ask for another variation to pick your brain….

How would I rewrite this to show ONLY the work order(s) that have all of the 3 Parts, but instead of showing all of them, show only the ones that don’t include a specific other part?

For example, Include all the work order retrieved from your query, but then subtract those work orders that don’t contain a part or parts in another separate list.

I.E. All order containing parts (‘147555′,’227643′,’155328’) but not containing ANY of these parts (‘12345’, ‘456789’)”

In the above follow up, the term “work order(s)” refer to the “parent records” that was previously mentioned, and the term “parts” refer to the “child records” that was previously mentioned.  Below is a graphical mock up, with the light-green colored rectangles representing the “parts”, and the white colored rectangle at the bottom left representing the “work order”.  The other white colored rectangles represent subassemblies that belong to the “work order”, and the cyan colored rectangles represent the operations that utilize the “parts”.  Through normalization, the white colored rectangles represent rows in one database table (WORK_ORDER – the “parent” table), the cyan colored rectangles represent rows in another table (OPERATION – the “child” table), and the light-green colored rectangles represent rows in a third table (REQUIREMENT – the “grandchild” table).

EverythingOrNothing

The first request still seems to be easy to solve, even though there are three tables that are potentially involved.  Let’s simplify the problem further and just focus on the “grandchild” table.  Let’s create a small version of that table and populate the table with some sample rows:

CREATE TABLE T1 (
  WORKORDER_TYPE VARCHAR2(1),
  WORKORDER_BASE_ID VARCHAR2(15),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SUB_ID VARCHAR2(3),
  PIECE_NO NUMBER,
  PART_ID VARCHAR2(15),
  QTY NUMBER(12,4));

INSERT INTO
  T1
SELECT
  'W' WORKORDER_TYPE,
  TRIM(TO_CHAR(TRUNC(ROWNUM / 100 + 1)*1000,'00000')) || 'W' WORKORDER_BASE_ID,
  TO_CHAR(MOD(TRUNC(ROWNUM / 25 + 1), 999) + 1) WORKORDER_LOT_ID,
  TO_CHAR(TRUNC(MOD(ROWNUM - 1, 25) / 5) + 1) WORKORDER_SUB_ID,
  MOD(ROWNUM - 1, 25) + 1 PIECE_NO,
  CHR(65 + MOD(ROWNUM - 1, 26)) || CHR(65 + MOD(ROWNUM - 1, 15)) || MOD(ROWNUM - 1, 10)  PART_ID,
  TRUNC(DBMS_RANDOM.VALUE*100) + 1 QTY
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

In the above, the combination of the WORKORDER_BASE_ID column and the WORKORDER_LOT_ID column represent the “work order” that was mentioned by the OP.  The WORKORDER_SUB_ID column identifies the subassembly number (answering the question of to which of the white colored rectangles does the “part” belong).  The PIECE_NO column is used to uniquely identify the rows that belong to a specific subassembly number (that is the final column in the REQUIREMENT table’s primary key).  The PART_ID column is the specific “part” that is attached to a specific “work order”.  The QTY column indicates the number of the “part” that is required by the “work order” – with the sample data, this is the only column that will differ from one execution of the table creation script to the next execution.

The sample data is a bit too consistent, however, we will ignore any issues that consistency may present for now.  Let’s execute a simple SQL statement that indicates how many from the following “parts” list appear in the “work orders”: MI8, NJ9, KG6 (note the use of DISTINCT – a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, and we do not want to repeatedly count those “parts”):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
FROM
  T1 TWHERE  T.PART_ID IN ('MI8','NJ9','KG6');

W WORKORDER_BASE_ WOR WOR PART_ID                  C
- --------------- --- --- --------------- ----------
W 01000W          3   3   KG6                      3
W 01000W          3   3   MI8                      3
W 01000W          3   3   NJ9                      3
W 05000W          19  1   KG6                      3
W 05000W          19  1   MI8                      3
W 05000W          19  1   NJ9                      3
W 09000W          34  4   KG6                      3
W 09000W          34  4   MI8                      3
W 09000W          34  4   NJ9                      3

Only 9 rows returned, with three different WORKORDER_BASE_ID, WORKORDER_LOT_ID combinations.  For a moment, let’s ignore that the three “parts” appear in all three “work orders”.  If we wanted a list of those “work orders” where _all_ three of the “parts” are present, and not just _one or more_ of the three “parts”, we could simply slide the above SQL statement into an inline view and specify that our C generated column must be equal to (or greater than – the use of greater than will be explained later) 3:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9
W 09000W          34  4   KG6
W 09000W          34  4   MI8
W 09000W          34  4   NJ9

While trying to determine the OP’s business case for the requests, I wondered if the OP was attempting to replace a specific “part” with another “part” within the ERP package, and simply required a report to understand the progress of that manual “part” replacement procedure in the ERP package.  Let’s move forward with this assumption by replacing a row containing a specific PART_ID value with a new PART_ID value, but only for a single “work order”:

DELETE FROM
  T1
WHERE
  WORKORDER_TYPE='W'
  AND WORKORDER_BASE_ID='09000W'
  AND WORKORDER_LOT_ID='34'
  AND PART_ID='NJ9';

1 row deleted.

INSERT INTO T1 VALUES (
  'W',
  '09000W',
  '34',
  '4',
  10,
  'REPLACEMENT-NJ9',
  10);

1 row created.

Let’s check the result of the SQL statement that we were in the process of developing:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9

Now, there are only six rows returned from two “work orders” – the third “work order” is no longer included in the query results.

With the use of the C >= n syntax, we are able to find “work orders” with at least n number of the specified “parts”.  So, if we were interested in finding cases where at least two of the three listed “parts” are attached to a “work order”, we just need to change the 3 to 2:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=2
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   NJ9
W 05000W          19  1   MI8
W 09000W          34  4   KG6
W 09000W          34  4   MI8

The third “work order” is again included in the query resultset.

Other than putting the results into Microsoft Excel, the sample query satisfies the first request of the OP.  If we were not interested in returning the WORKORDER_SUB_ID and PART_ID columns, we could have simply used the following SQL statement:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(*)>=3;

W WORKORDER_BASE_ WOR
- --------------- ---
W 05000W          19
W 01000W          3

Now to work on the second request, only retrieving the “work orders” when all of a specified list of “parts” are attached to the “work order” and none of another list of “parts” are attached to the “work order”.  As a starting point, we will re-use the inline view from the previous solution.  The IN list will be modified to include the “parts” that must be attached to the “work order”, as well as the “parts” that must not be attached to the “work order”.  The COUNT analytic function must be modified to include either a DECODE or CASE structure to divide the “parts” into the “must” and “must not” lists so that a count of each may be obtained:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9');

W WORKORDER_BASE_ WOR WOR PART_ID                  C         NC
- --------------- --- --- --------------- ---------- ----------
W 01000W          3   3   KG6                      2          0
W 01000W          3   3   MI8                      2          0
W 05000W          19  1   KG6                      2          0
W 05000W          19  1   MI8                      2          0
W 09000W          34  4   KG6                      2          1
W 09000W          34  4   MI8                      2          1
W 09000W          34  4   REPLACEMENT-NJ9          2          1

To satisfy the OP’s second request, we are interested in those rows where the C column value is (at least) 2 and the NC column value is 0.  Just as before, we will slide the above SQL statement into an inline view and add the specifications for the C and NC generated columns:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=0
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 05000W          19  1   KG6
W 05000W          19  1   MI8

The above shows those “work orders” where the NJ9 “part” had not yet been replaced with the REPLACEMENT-NJ9 “part”.  If desired, we are also able to easily modify the OP’s second request to see the “work orders” where the “part” was already replaced just by changing NC=0 to NC=1:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=1
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- ---------------
W 09000W          34  4   KG6
W 09000W          34  4   REPLACEMENT-NJ9
W 09000W          34  4   MI8

What other related information are we able to return?  What about all “parts” (the full bill of material) for each of the work orders that match the OP’s second request.  That modification is simple, just slide the majority of the above SQL statement into a subquery (found in the first WHERE clause):

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=0)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 01000W          3   1            1 ZK5                     48
W 01000W          3   1            2 AL6                     60
W 01000W          3   1            3 BM7                     42
W 01000W          3   1            4 CN8                     26
W 01000W          3   1            5 DO9                     49
W 01000W          3   2            6 EA0                     66
W 01000W          3   2            7 FB1                     39
W 01000W          3   2            8 GC2                     57
W 01000W          3   2            9 HD3                     42
W 01000W          3   2           10 IE4                     52
W 01000W          3   3           11 JF5                     16
W 01000W          3   3           12 KG6                     61
W 01000W          3   3           13 LH7                     39
W 01000W          3   3           14 MI8                     75
W 01000W          3   3           15 NJ9                     94
W 01000W          3   4           16 OK0                     67
W 01000W          3   4           17 PL1                     12
W 01000W          3   4           18 QM2                     41
W 01000W          3   4           19 RN3                     97
W 01000W          3   4           20 SO4                     15
W 01000W          3   5           21 TA5                     54
W 01000W          3   5           22 UB6                     59
W 01000W          3   5           23 VC7                     23
W 01000W          3   5           24 WD8                      7
W 01000W          3   5           25 YJ4                     22
W 05000W          19  1            1 JF5                     31
W 05000W          19  1            2 KG6                     42
W 05000W          19  1            3 LH7                     37
W 05000W          19  1            4 MI8                     85
W 05000W          19  1            5 NJ9                    100
W 05000W          19  2            6 OK0                     43
W 05000W          19  2            7 PL1                     58
W 05000W          19  2            8 QM2                     76
W 05000W          19  2            9 RN3                     66
W 05000W          19  2           10 SO4                     75
W 05000W          19  3           11 TA5                     50
W 05000W          19  3           12 UB6                     55
W 05000W          19  3           13 VC7                     18
W 05000W          19  3           14 WD8                     33
W 05000W          19  3           15 XE9                      8
W 05000W          19  4           16 YF0                     86
W 05000W          19  4           17 ZG1                      7
W 05000W          19  4           18 AH2                     25
W 05000W          19  4           19 BI3                     38
W 05000W          19  4           20 CJ4                     34
W 05000W          19  5           21 DK5                     88
W 05000W          19  5           22 EL6                     91
W 05000W          19  5           23 FM7                     52
W 05000W          19  5           24 GN8                     71
W 05000W          19  5           25 IE4                     55

Just as before, we are able to see all “parts” for the “work orders” that already have the replacement part, just by changing the NC=0 to NC=1:

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=1)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 09000W          34  1            1 UF0                     80
W 09000W          34  1            2 VG1                     75
W 09000W          34  1            3 WH2                     75
W 09000W          34  1            4 XI3                     77
W 09000W          34  1            5 YJ4                     46
W 09000W          34  2            6 ZK5                     57
W 09000W          34  2            7 AL6                     64
W 09000W          34  2            8 BM7                     14
W 09000W          34  2            9 CN8                     30
W 09000W          34  2           10 DO9                     62
W 09000W          34  3           11 EA0                     45
W 09000W          34  3           12 FB1                     68
W 09000W          34  3           13 GC2                     25
W 09000W          34  3           14 HD3                     73
W 09000W          34  3           15 IE4                     32
W 09000W          34  4           10 REPLACEMENT-NJ9         10
W 09000W          34  4           16 JF5                     49
W 09000W          34  4           17 KG6                     23
W 09000W          34  4           18 LH7                     60
W 09000W          34  4           19 MI8                     75
W 09000W          34  5           21 OK0                     91
W 09000W          34  5           22 PL1                     66
W 09000W          34  5           23 QM2                     45
W 09000W          34  5           24 RN3                     49
W 09000W          34  5           25 TE9                      6

Using the sample table and data provided, are there any other solutions to the OP’s requests?





Analysis Challenges

25 04 2013

April 25, 2013

Roughly 12 years ago I was attempting to analyze customer order changes that were received through electronic document interchange (EDI), specifically X12 830 documents that show order forecasted demand for specific part numbers.  At the time, the EDI data was partially transformed and inserted into an Oracle 8.0.5 database, while that data is now inserted into an Oracle 11.2 database.  I recently took another glance that this particular problem to see if there is a better method for performing the analysis.

Complications in the analysis:

  • The EDI documents do not include the customer’s purchase order line numbers, just a list of outstanding due dates and quantities.  That complication means that there is no easy way to see if an outstanding quantity has changed, or the date for that outstanding quantity has changed.
  • It is possible to ship to the customer a partial order quantity, with the remaining quantity on the line handled as a back-order.  If the customer does not alter this particular forecast line, only the quantity outstanding will be provided, along with the due date – the original quantity is not provided.
  • It is possible to ship out an order line received in an 830 document up to five days before the customer’s due date.
  • Shipped parts may require hours or days to reach the customer’s site, thus delaying when those shipped parts are acknowledged in the list of outstanding due dates and quantities received through EDI.
  • EDI 830 documents are generally received the next weekday after the customer batch generates those documents on a nightly basis.  Thus, there is an opportunity for the received forecast to be out of sync with what is known to have arrived at the customer site, and what is acknowledged in the EDI data.
  • The customer may generate multiple EDI 830 documents for a particular part number/purchase order combination on any given day, and only the EDI 830 document received with the highest (sorted in ascending order) release number (which may not be a number) on a specific date is the valid release for the specific date.
  • EDI 830 documents may be generated for a specific part number by the customer nearly every weekday, once a month, or any frequency in between.

I recall trying to find a solution to this particular problem 12 years ago.  I thought that I could possibly pull all of the relevant data from the Oracle database into a custom application, and through some magical formula output what appeared to be a list of what the customer changed.  However, what about those part quantities that are in transit to the customer?  What about those part quantities that were received by the customer, but had not yet made their way through the customer’s batch processing system?  What about those cases where the customer sents two or more EDI 830 releases for a specific part number, and those EDI releases are received on the same date?

At the time, I was frustrated.  The quality of the output of my analysis would be dependent on the accuracy of humans (yes, and humans wrote the program that performed the analysis too).  I simply had to find the first matching due date and order quantity from the previous release with the due date and order quanitity from the current release, and then rely on an end user to correct the 5% to 10% of the cases where the matching algorithm was wrong due to one or more of the “complications” outlined above.

While recently re-examining the problem, I noticed a data element in the EDI 830 document that is described as “ACCUM RECEIPTS AS OF THIS DATE”, and a luck would have it, that particular data element is one of those elements that was selected more than 12 years ago to be transformed into the Oracle Database tables (in the sample data included with this blog article, this column is named PREVIOUS_ACCUM_QTY).  If only I recognized the meaning of the values of this column, and had access to Oracle analytical SQL functions 12 years ago, this particular analysis problem might not have had so many variables that could potentially throw off the quality of the analysis.

This particular blog article has a fairly large set of sample data – however, that sample data is a fairly small subset of the data that must be analyzed on a daily basis.  The sample data may be downloaded here: analysischallenges.sql sample code (save the file as “analysischallenges.sql” – without the .doc double extension).  If the file is saved in the same directory (folder) as SQL*Plus was started from, you may then execute the script in SQL*Plus to build the tables and populate the sample data by using the following command:

@analysischallenges.sql

The two sample table definitions are created with the following commands (also included in the script):

CREATE TABLE T1 (
  CUSTOMER_PO VARCHAR2(15),
  RELEASE_ID VARCHAR2(15),
  PART_ID VARCHAR2(30),
  RELEASE_DATE DATE,
  PREVIOUS_ACCUM_QTY NUMBER,
  CREATE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID));

CREATE TABLE T1_LINE (
  CUSTOMER_PO VARCHAR2(12),
  RELEASE_ID VARCHAR2(12),
  QTY NUMBER,
  DUE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID,DUE_DATE));

T1 Table Columns:

CUSTOMER_PO is the unique identifier for the customer purchase order received through the EDI 830 documents.  RELEASE_ID defines the specific serialized change number for the customer’s PO – the RELEASE_ID is unique for a specific PO.  PART_ID is the part number for which the customer is providing a forecast.  RELEASE_DATE is the date that the forecast was batch generated in the customer’s computer system.  PREVIOUS_ACCUM_QTY is the count of the number of this part number that the customer has already received for this specific customer purchase order.  CREATE_DATE is the date that the forecast was received from the customer and transformed/inserted into the Oracle Database tables.

T1_LINE Table Columns:

CUSTOMER_PO and RELEASE_ID are undeclared foreign key columns that point back to table T1.  QTY is the quantity of the part number that should (or is forecasted to) ship on the specfied DUE_DATE.

—–

If you are curious how I created the script to populate the tables, I executed the following in SQL*Plus, and then performed a little cleanup of the resulting spool files:

SET LINESIZE 200
SET TRIMSPOOL ON

SPOOL analysischallenges_t1.txt

SELECT
  'INSERT INTO T1 VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||CHR(39)||PART_ID||CHR(39)||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(RELEASE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||','
    ||PREVIOUS_ACCUM_QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(CREATE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID;

SPOOL OFF

SPOOL analysischallenges_t1_line.txt

SELECT
  'INSERT INTO T1_LINE VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(DUE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1_LINE
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID,
  DUE_DATE;

SPOOL OFF

—–

We must find a starting point for performing the analysis of what changed from one day’s forecast to a prior day’s forecast.  There is an issue with the EDI 830 releases that were received (identified by the CREATE_DATE column) on April 24, 2013 – two releases for the same customer purchase orders were received, and there were two purchase orders that demand the same part number.  Our analysis is only concerned with the highest numbered (sorted alphabetically) release number document for each customer purchase order that was received on any particular day, so we need to find a way to eliminate the other older releases for the same customer purchase order that was received on the same day.  The ROW_NUMBER analytic function is perfect for this task – we just need the output to show the same number (1) for the rows that we want to keep, and have some other number appear on the other rows.  Something similar to the following may work (note that the TRUNC function, or other approach that uses a date/time range, probably should be used when working with the CREATE_DATE column – realistically, there could be a time component included with those date values):

COLUMN PART_ID FORMAT A10
SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  T.RELEASE_ID,
  T.RELEASE_DATE,
  T.CREATE_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     REL_ON_DATE RELEASE_ID      RELEASE_D CREATE_DA PREVIOUS_ACCUM_QTY
---------- --------------- ----------- --------------- --------- --------- ------------------
DS1812+    1231234                   2 20130402-001    23-APR-13 24-APR-13                 24
DS1812+    1231234                   1 20130403-001    24-APR-13 24-APR-13                 24
DS1812+    1233290                   2 20130402-001    23-APR-13 24-APR-13                  4
DS1812+    1233290                   1 20130403-001    24-APR-13 24-APR-13                  4
DS212+     1133290                   2 20130402-001    23-APR-13 24-APR-13                  2
DS212+     1133290                   1 20130403-001    24-APR-13 24-APR-13                  2
RS812      1231280                   2 20130402-001    23-APR-13 24-APR-13                 60
RS812      1231280                   1 20130403-001    24-APR-13 24-APR-13                 60

To remove all of the rows that have something other than 1 in the REL_ON_DATE column, we are able to slide the above SQL statement into an inline view, and then specify that the REL_ON_DATE column value must be equal to 1.  Typically, there are supposed to be few/no changes made by the customer within eight weeks of the customer forecast’s release date, but that is not always the case.  While eliminating the unnecessary releases, it probably makes sense to also calculate the end date of this “firm” period by adding 55 days to the date on which the forecast was created:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T
WHERE
  T.REL_ON_DATE=1
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY
---------- --------------- --------------- --------- ------------------
DS1812+    1231234         20130403-001    18-JUN-13                 24
DS1812+    1233290         20130403-001    18-JUN-13                  4
DS212+     1133290         20130403-001    18-JUN-13                  2
RS812      1231280         20130403-001    18-JUN-13                 60

Next, we need to retrieve the forecasted delivery schedule (in the T1_LINES table) that are associated with the above rows.  If we take the value in the PREVIOUS_ACCUM_QTY column, and find a way to generate a running SUM of the QTY column from the T1_LINE table, in theory we should be able to determine if the total quantity ordered through a specific date has changed.  The analytic version of the SUM function offers the capability of producing a running SUM when an ORDER BY clause is included in the function call.  To reduce the data set size, I will only consider those forecasts with a scheduled delivery date that is within 101 days of the customer’s forecast date.  With those changes in place, the SQL statement is modified as follows:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY,
  TL.DUE_DATE,
  TL.QTY,
  T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
  T1_LINE TL
WHERE
  T.REL_ON_DATE=1
  AND T.CUSTOMER_PO=TL.CUSTOMER_PO
  AND T.RELEASE_ID=TL.RELEASE_ID
  AND (T.RELEASE_DATE+100)>=TL.DUE_DATE
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  TL.DUE_DATE;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------------- --------- ------------------ --------- ---------- ---------------
DS1812+    1231234         20130403-001    18-JUN-13                 24 24-APR-13          0              24
DS1812+    1233290         20130403-001    18-JUN-13                  4 24-APR-13          0               4
DS212+     1133290         20130403-001    18-JUN-13                  2 24-APR-13          0               2
RS812      1231280         20130403-001    18-JUN-13                 60 27-JUN-13          1              61
RS812      1231280         20130403-001    18-JUN-13                 60 03-JUL-13          1              62
RS812      1231280         20130403-001    18-JUN-13                 60 24-JUL-13          2              64

As may be seen above, only customer purchase order 1231280 has a forecast, calling for a quantity of 1 on June 27, quantity of 1 on July 3, and a quantity of 2 on July 24.  The previously received quantity for customer purchase order 1231280 is 60, so the additional forecast quantities will bring the total order quantities up to 61, 62, and 64 through the three dates included in the forecast.

-The Top Half of the Final SQL Statement is Above-

So, we now have a starting point – the data that was received on April 24, 2013.  How does that forecast data compare with previously received forecasts?  We need to start building the second half of the SQL statement, which looks at the historical data.  To limit the dataset size, I will limit the data to only those EDI documents that were received up to 56 days prior (the eight week “firm” period) to the April 24, 2013 date that was specified earlier in this article:

SELECT
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  TRUNC(T.CREATE_DATE) CREATE_DATE,
  T.RELEASE_ID,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
  AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.CUSTOMER_PO,
  T.RELEASE_ID;

CUSTOMER_PO     REL_ON_DATE CREATE_DA RELEASE_ID      PREVIOUS_ACCUM_QTY
--------------- ----------- --------- --------------- ------------------
1131111                   1 27-FEB-13 0371                           575
1131111                   1 01-MAR-13 0381                           577
1131111                   1 06-MAR-13 0431                           578
1131111                   1 07-MAR-13 0441                           581
1131111                   1 08-MAR-13 0451                           581
1131111                   1 11-MAR-13 0461                           581
...
1131111                   1 08-APR-13 0741                           593
1131111                   1 11-APR-13 0791                           593
1131111                   1 23-APR-13 0911                           595
1131245                   1 27-FEB-13 0371                          2299
1131245                   1 01-MAR-13 0381                          2300
1131245                   1 05-MAR-13 0421                          2303
...
1131745                   1 19-APR-13 0871                           435
1131745                   1 23-APR-13 0911                           435
1133290                   1 27-FEB-13 20130205-001                     2
1133290                   1 28-FEB-13 20130206-001                     2
1133290                   1 01-MAR-13 20130207-001                     2
...
1133290                   1 19-APR-13 20130328-001                     2
1133290                   1 23-APR-13 20130401-001                     2
1231234                   1 27-FEB-13 20130205-001                    24
1231234                   1 28-FEB-13 20130206-001                    24
...
1231234                   1 23-APR-13 20130401-001                    24
1231280                   1 27-FEB-13 20130205-001                    57
1231280                   1 28-FEB-13 20130206-001                    57
1231280                   1 01-MAR-13 20130207-001                    57
1231280                   1 04-MAR-13 20130208-001                    58
1231280                   1 05-MAR-13 20130211-001                    58
...
1231280                   1 04-APR-13 20130313-001                    60
1231280                   2 05-APR-13 20130314-002                    60
...
1231280                   1 19-APR-13 20130328-001                    60
1231280                   1 23-APR-13 20130401-001                    60
1233290                   1 27-FEB-13 20130205-001                     4
1233290                   1 28-FEB-13 20130206-001                     4
...

As we found earlier, there could be more than one release for a customer purchase order received in a single date, and we are only concerned with the most recent release received on any particular day.  We have a second issue, in that the dates may have shifted from one release to the next, so we have to artifically generate rows with the potentially missing dates, with an order quantity of 0 on those dates.

But, how to generate a list of dates?  We could use a CONNECT BY LEVEL trick:

SELECT
  TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

DUE_DATE
---------
24-APR-13
25-APR-13
26-APR-13
27-APR-13
28-APR-13
...
15-JUN-13
16-JUN-13
17-JUN-13
18-JUN-13
...
30-JUL-13
31-JUL-13
01-AUG-13

Now, by creating a Cartesian join between the dataset with the dates and the dataset with the EDI release header records, we will make certain that a date exists for every date that is potentially of interest (up to 100 days after April 24, 2013) when we later join the result with the T1_LINE table (the child table that lists the forecast delivery dates).

SELECT
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  T2.CREATE_DATE,
  T2.PREVIOUS_ACCUM_QTY,
  D.DUE_DATE
FROM
  (SELECT
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    TRUNC(T.CREATE_DATE) CREATE_DATE,
    T.RELEASE_ID,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
    AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
  ORDER BY
    T.CUSTOMER_PO,
    T.RELEASE_ID) T2,
  (SELECT
    TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) D
WHERE
  T2.REL_ON_DATE=1
ORDER BY
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  D.DUE_DATE;

...
CUSTOMER_PO     RELEASE_ID      CREATE_DA PREVIOUS_ACCUM_QTY DUE_DATE
--------------- --------------- --------- ------------------ ---------
1233290         20130318-001    09-APR-13                  4 25-MAY-13
1233290         20130318-001    09-APR-13                  4 26-MAY-13
1233290         20130318-001    09-APR-13                  4 27-MAY-13
1233290         20130318-001    09-APR-13                  4 28-MAY-13
...
1233290         20130322-001    15-APR-13                  4 22-JUN-13
1233290         20130322-001    15-APR-13                  4 23-JUN-13
1233290         20130322-001    15-APR-13                  4 24-JUN-13
1233290         20130322-001    15-APR-13                  4 25-JUN-13
...
1233290         20130401-001    23-APR-13                  4 29-JUL-13
1233290         20130401-001    23-APR-13                  4 30-JUL-13
1233290         20130401-001    23-APR-13                  4 31-JUL-13
1233290         20130401-001    23-APR-13                  4 01-AUG-13

Next, we will outer join to the T1_LINE table to bring in the actual forecasted delivery schedule from the previous dates (note that I also removed an unnecessary ORDER BY clause that was still embedded in the SQL statement – the Oracle query optimizer probably would have realized that the ORDER BY clause was meaningless to the query result, and automatically removed it – but I removed the clause just in case my assumption is incorrect):

SELECT
  T3.CUSTOMER_PO,
  T3.CREATE_DATE,
  T3.DUE_DATE,
  T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T2.CUSTOMER_PO,
    T2.RELEASE_ID,
    T2.CREATE_DATE,
    T2.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      TRUNC(T.CREATE_DATE) CREATE_DATE,
      T.RELEASE_ID,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
      AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T2.REL_ON_DATE=1) T3,
  T1_LINE TL
WHERE
  T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND T3.RELEASE_ID=TL.RELEASE_ID(+)
  AND T3.DUE_DATE=TL.DUE_DATE(+)
ORDER BY
  T3.CUSTOMER_PO,
  T3.RELEASE_ID,
  T3.DUE_DATE;

CUSTOMER_PO     CREATE_DA DUE_DATE  REL_RUNNING_SUM
--------------- --------- --------- ---------------
1131111         27-FEB-13 24-APR-13             577
1131111         27-FEB-13 25-APR-13             578
1131111         27-FEB-13 26-APR-13             579
1131111         27-FEB-13 27-APR-13             579
1131111         27-FEB-13 28-APR-13             579
1131111         27-FEB-13 29-APR-13             580
1131111         27-FEB-13 30-APR-13             581
...
1131111         23-APR-13 30-JUL-13             636
1131111         23-APR-13 31-JUL-13             638
1131111         23-APR-13 01-AUG-13             639
...
1231280         11-MAR-13 08-MAY-13              58
1231280         11-MAR-13 09-MAY-13              58
1231280         11-MAR-13 10-MAY-13              58
1231280         11-MAR-13 11-MAY-13              58
...
1231280         23-APR-13 21-JUL-13              62
1231280         23-APR-13 22-JUL-13              62
1231280         23-APR-13 23-JUL-13              62
1231280         23-APR-13 24-JUL-13              64
1231280         23-APR-13 25-JUL-13              64
1231280         23-APR-13 26-JUL-13              64
...

We are getting close.  Now we just need to join the SQL created earlier in this article with the above SQL statement so that the current customer purchase order forecasts may be compared with the earlier received customer purchase order forecasts.  The LISTAGG analytic funtion, introduced with Oracle Database 11.1, makes it easy to condense multiple rows from the resultset into a much smaller number of rows, if the resultset is grouped on at least one column.  So, the final version of the analysis challenge appears as follows:

SELECT
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM,
  LISTAGG(T4.REL_RUNNING_SUM||' ('||TO_CHAR(T4.CREATE_DATE,'MM/DD/YY')||')',', ') WITHIN GROUP (ORDER BY T4.CREATE_DATE DESC) EXCEPTIONS
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    T.RELEASE_ID,
    T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
    T.PREVIOUS_ACCUM_QTY,
    TL.DUE_DATE,
    TL.QTY,
    T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.CREATE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
    T1_LINE TL
  WHERE
    T.REL_ON_DATE=1
    AND T.CUSTOMER_PO=TL.CUSTOMER_PO
    AND T.RELEASE_ID=TL.RELEASE_ID
    AND (T.RELEASE_DATE+100)>=TL.DUE_DATE) C,
  (SELECT
    T3.CUSTOMER_PO,
    T3.CREATE_DATE,
    T3.DUE_DATE,
    T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T2.CUSTOMER_PO,
      T2.RELEASE_ID,
      T2.CREATE_DATE,
      T2.PREVIOUS_ACCUM_QTY,
      D.DUE_DATE
    FROM
      (SELECT
        T.CUSTOMER_PO,
        ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
        TRUNC(T.CREATE_DATE) CREATE_DATE,
        T.RELEASE_ID,
        T.PREVIOUS_ACCUM_QTY
      FROM
        T1 T
      WHERE 
        T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
        AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
      (SELECT
        TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
      FROM
        DUAL
      CONNECT BY
        LEVEL<=100) D
    WHERE
      T2.REL_ON_DATE=1) T3,
    T1_LINE TL
  WHERE
    T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
    AND T3.RELEASE_ID=TL.RELEASE_ID(+)
    AND T3.DUE_DATE=TL.DUE_DATE(+)) T4
WHERE
  C.CUSTOMER_PO=T4.CUSTOMER_PO
  AND C.DUE_DATE=T4.DUE_DATE
  AND C.REL_RUNNING_SUM<>T4.REL_RUNNING_SUM
GROUP BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM
ORDER BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.DUE_DATE;

PART_ID    CUSTOMER_PO     POSSIBLE_ DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------- --------- ---------- ---------------
EXCEPTIONS
-----------------------------------------------------------------------------------------------------------------------
RS812      1231280         18-JUN-13 27-JUN-13          1              61
60 (03/22/13), 60 (03/21/13), 60 (03/19/13), 60 (03/18/13), 60 (03/15/13), 60 (03/14/13), 59 (03/13/13), 59 (03/12/13),
59 (03/11/13), 59 (03/08/13), 60 (03/07/13), 60 (03/06/13), 60 (03/05/13), 60 (03/04/13), 59 (03/01/13), 59 (02/28/13),
59 (02/27/13)

RS812      1231280         18-JUN-13 03-JUL-13          1              62
61 (03/22/13), 61 (03/21/13), 61 (03/19/13), 61 (03/18/13), 61 (03/15/13), 61 (03/14/13), 60 (03/13/13), 60 (03/12/13),
60 (03/11/13), 60 (03/08/13), 61 (03/07/13), 61 (03/06/13), 61 (03/05/13), 61 (03/04/13), 60 (03/01/13), 60 (02/28/13),
60 (02/27/13)

RS812      1231280         18-JUN-13 24-JUL-13          2              64
63 (03/26/13), 63 (03/25/13), 62 (03/22/13), 62 (03/21/13), 62 (03/19/13), 62 (03/18/13), 62 (03/15/13), 62 (03/14/13),
61 (03/13/13), 61 (03/12/13), 61 (03/11/13), 61 (03/08/13), 62 (03/07/13), 62 (03/06/13), 62 (03/05/13), 62 (03/04/13),
61 (03/01/13), 61 (02/28/13), 61 (02/27/13)

It is now easy to see that the requested total quantity due through June 27, 2013 increased by 2 since the first forecast which was received on February 27, 2013.  The requested total quantity increased by 1 in the EDI release that was received on March 4, 2013, and by 1 again on the next EDI release that was received after March 22, 2013.  The second and third lines of the output show that there was a little more fluctuation in the customer’s forecast, increasing in quantity, decreasing in quantity, and then increasing in quantity again.

An unfortunate problem may be present – what if the customer completely eliminates the forecast orders on their purchase order, but still sends in a header line for the forecast?  A future enhancement may be required, where the CONNECT BY LEVEL trick is used in the first half of the SQL statement (the current releases), just as it is used in the second half of the SQL statement (the historical releases), so that when the top half of the final SQL statement is joined to the bottom half, rows returned by the bottom half of the SQL statement are not unnecessarily removed from the output.

The output is a little ugly.  One approach to clean up the output involves spooling the output to an HTML file as explained here and in the documentation here.  Another approach to clean up the output involves sending the data into an Excel spreadsheet using either a macro or the Microsoft Query Wizard, accessed from within Excel (see the articles in the Excel category of this blog for an explanation of these approaches).

—–

The final solution for this analysis challenge is so simple when built up from smaller SQL statements as shown above; I wonder why I did not consider this approach 12 years ago?  🙂





On the Topic of Programming 3

3 09 2012

September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E though H in the above report (could very well extend beyond column H, depending on the number of weeks selected), are the Monday through Sunday dates for the week.
  • The second row for each employee, in columns E through H in the above report show the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report.
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, shows the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation one week, but did another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers of on the 3rd, 6th, 9th, etc. rows for each employee, shows the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be causes where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with the WORKORDER_TYPE=’W’ and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE=’M’ and the BASE_ID equal to the part ID tht is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the database RDBMS to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to W and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the the WORKORDER_TYPE=’W’ (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ name, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to ‘W’ and the SUB_ID column in the WORK_ORDER table set to ‘0’ so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.   Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number weeks included in the report?

In the words of the TV show “Name that Tune“, I can name that tune in 1 note.  Because there are many to many joins between the various table, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression.
  • Minimize as best as possible the performance impact on the database server, the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause.
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client’s dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want that same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format, should I use the PIVOT function that was introduced with Oracle Database 11.1; or maybe I should use client-side control break logic that was taught in one of my university-level structured programming classes years ago?

That sketch of a report on a sheet of paper sure looked simple, now I need a control break (must be simple one key solution for that problem – check your keyboard 🙂 ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7)

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and used the PIVOT function, and/or additional analytic functions to calculate the employee’s weighted efficiency average per week and the simple averages of the wieghted averages – doing so would have eliminated some potentially complex client-side logic that needed to be programmed a single time, but counter point is that every time the report executed, it would require a more of the database server’s resources than were absolutely required.

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting, now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks worth of data rather than just four weeks, and only a single part ID or employee.  Too late, those changes were anticipated, and included in the original specification in the client-side programming, implemented with a simple change to the query and pre-planning for a variable number of weeks in the report.





Matching the Expected Output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different?

16 11 2011

November 16, 2011

I noticed an interesting request on the comp.databases.oracle.misc usenet group, and I thought that I would make an attempt at solving the request.  The original poster (OP) stated that he had a table with data similar to the following:

TX   ID   DEPT   LOCATION   LOAD
1    99    A        NY       12
2    99    A        LA       10
3    99    B        LA       05
4    77    B        LA       15
5    77    C        NY       12
6    77    D        LA       11  

He would like to obtain the following output:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11  

The rankings are to be determined as follows:

DEPT_RANK for ID 99 is 1 for A because sum(LOAD) = 22 is the max
LOC_RANK for ID 99 is 1 for LA because sum(LOAD) = 15 is the max

At first glance, that request seems to be reasonably easy to accomplish.  Let’s start by creating a table with the sample data (ideally, the OP should have provided the DDL and DML to create and populate this table):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  TX NUMBER,
  ID NUMBER,
  DEPT VARCHAR2(1),
  LOCATION VARCHAR2(2),
  LOAD NUMBER);

INSERT INTO T1 VALUES (1,99,'A','NY',12);
INSERT INTO T1 VALUES (2,99,'A','LA',10);
INSERT INTO T1 VALUES (3,99,'B','LA',05);
INSERT INTO T1 VALUES (4,77,'B','LA',15);
INSERT INTO T1 VALUES (5,77,'C','NY',12);
INSERT INTO T1 VALUES (6,77,'D','LA',11); 

The first step, if we were to think about creating the solution in logical steps, is to find a way to calculate the SUM values that were mentioned by the OP.  So, as a starting point, we might try this:

SELECT
  TX,
  ID,
  DEPT,
  SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
  LOCATION,
  SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
  LOAD
FROM
  T1
ORDER BY
  TX;

 TX  ID D SUM_LOAD_ID LO SUM_LOAD_LOCATION  LOAD
--- --- - ----------- -- ----------------- -----
  1  99 A          22 NY                12    12
  2  99 A          22 LA                15    10
  3  99 B           5 LA                15     5
  4  77 B          15 LA                26    15
  5  77 C          12 NY                12    12
  6  77 D          11 LA                26    11 

If I am understanding the OP’s request correctly, the above is a good starting point (even though the alias for the first analytic function could have been better selected).

We are then able to take the above SQL statement and push it into an inline view to hopefully produce the output that is expected by the OP  (note that the PARTITION clause differs for the LOC_RANK column from what is specified in the inline view for the function that is used to create that column):

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          1 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          1 NY          1    12
  6  77 D          1 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Almost for the LOC_RANK column (not even close for the DEPT_RANK column), but not quite right.  The problem is that when attempting to calculate the RANK columns in the above output, we should only PARTITION on the ID column, not the ID column and some other column, as was the case when we used the SUM analytic function.

Let’s fix the PARTITION clause and try again:

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          1 LA          1    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Notice in the above that when two rows have the same SUM_LOAD_ value, the displayed rank is correct, but that repeated rank value then causes a rank value to be skipped (compare the DEPT_RANK column value on row 3, the LOC_RANK column value on row 1, and the LOC_RANK column value on row 5).

Now what?  The ROW_NUMBER function could be used to produce sequential rank numbers without gaps, for example:

SELECT
  TX,
  ID,
  DEPT,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          2 LA          2    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          2    11 

The above output, as mentioned, does not match the output requested by the OP, since the OP’s requested output specifes that equal values for different rows should show the same rank value.

One more try using the DENSE_RANK analytic function:

SELECT
  TX,
  ID,
  DEPT,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          2 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          2    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

As best as I am able to determine, the above SQL statement will satisfy the OP’s request.

—–

Part 2 of the Challenge

If the OP has the following SQL statement:

SELECT
  LOAD_YEAR,
  ORG_UNIT_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, CLASSIF_CODE ORDER BY SUM (FTE_DAYS) DESC) ORG_RANK,
  CLASSIF_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, ORG_UNIT_CODE ORDER BY SUM (FTE_DAYS) DESC) CLASSIF_RANK,
  SUM (FTE_DAYS) FTE
FROM
  STAFF_LOAD
GROUP BY
  LOAD_YEAR,
  ORG_UNIT_CODE,
  CLASSIF_CODE;  

And the above SQL statement produces the following output:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  1 HEW4                    1  13
2010 A46                  2 HEW4                    2  12

And the OP wants the output to look like this:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  2 HEW4                    2  13
2010 A46                  1 HEW4                    2  12  

Write the DDL and DML statements to create the source table and populate it with the non-aggregated original data, and then produce the output requested by the OP (without looking at the updated usenet thread).





Simple SQL with and without Inline Views

21 10 2011

October 21, 2011

(Forward to the Next Post in the Series)

Sometimes it is interesting to take a step back from some of the more difficult to construct SQL statements, and just put together something simple.  The following request recently arrived through an ERP mailing list:

“I want to find the last ship date for all the open part numbers in Customer Order Entry. The last ship date is not necessarily from the Open order in Customer Order Entry…”

The original poster (OP) provided more information, and even provided a description of what was already tried as a solution.  It is not exceptionally clear what the OP wanted, but because I have several years of experience working with this particular ERP system, I had a feeling that the OP might be headed in the wrong direction.  This ERP system uses tables that are well normalized, which would make it a bit more difficult to locate the most recent ship date for a part number that is recorded with one customer order’s tables’ rows when another customer order’s tables’ rows are reported.  Unless, of course, we remember that there is also a transaction table that records every inventory movement of part numbers through the system – we just need to know what those rows in the transaction table look like.  In this case, the rows have the following characteristics:

  • The CUST_ORDER_ID column value must not be NULL
  • The TYPE column must have a value of ‘O’
  • The CLASS column must have a value of ‘I’

With that knowledge, we could then find the most recent date that any part number shipped with a SQL statement similar to the following:

SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  CUST_ORDER_ID IS NOT NULL
  AND TYPE='O'
  AND CLASS='I'
GROUP BY
  PART_ID;

In the above, the AS keyword is optional on an Oracle Database platform, but it is not optional on some other database platforms.

With an Oracle Database backend, inline views may be used to permit the above SQL statement to be used to retrieve additional information for a parent SQL statement.  If we wrap the above SQL statement in ( ) and place it in the FROM clause of the SQL statement, Oracle will treat the data returned by the above SQL statement quite similar to how it would handle a regular table (the Oracle query optimizer might decide to rewrite the combined SQL statement into an equivalent form that no longer contains an inline view).  There is always a chance that a part number that is on order may have never shipped to a customer, so we will need to outer join (in this case using a left outer join) to the newly created inline view:

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO,
  CUST_ORDER_LINE COL,
  (SELECT
    PART_ID,
    MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
  FROM
    INVENTORY_TRANS
  WHERE
    CUST_ORDER_ID IS NOT NULL
    AND TYPE='O'
    AND CLASS='I'
  GROUP BY
    PART_ID) LS
WHERE
  CO.ID=COL.CUST_ORDER_ID
  AND CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY
  AND COL.PART_ID=LS.PART_ID(+);

In the above, I joined the CUSTOMER_ORDER and CUST_ORDER_LINE tables to the inline view that I aliased as LS.  A third table, CUST_LINE_DEL, that optionally contains the delivery schedule for some of the rows in the CUST_ORDER_LINE table, could have also been outer joined to the CUST_ORDER_LINE table.

Running SQL Server, or just feeling ANSI?  If so, the above may be rewritten as follows:

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO
JOIN
  CUST_ORDER_LINE COL
ON
  CO.ID=COL.CUST_ORDER_ID
LEFT OUTER JOIN
  (SELECT
    PART_ID,
    MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
  FROM
    INVENTORY_TRANS
  WHERE
    CUST_ORDER_ID IS NOT NULL
    AND TYPE='O'
    AND CLASS='I'
  GROUP BY
    PART_ID) LS
ON
  COL.PART_ID=LS.PART_ID
WHERE
  CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY;

Unfortunately, the OP is actually using a SQLBase database backend that does not support inline views.  I remember the feeling before I discovered that Oracle Database supported inline views… in that case I would do something like the following:

  1. Create a statically defined view.
  2. Join to that statically defined view just as if the view were a table.
CREATE VIEW CUST_ORDER_PART_LS AS
SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  CUST_ORDER_ID IS NOT NULL
  AND TYPE='O'
  AND CLASS='I'
GROUP BY
  PART_ID;

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO,
  CUST_ORDER_LINE COL,
  CUST_ORDER_PART_LS LS
WHERE
  CO.ID=COL.CUST_ORDER_ID
  AND CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY
  AND COL.PART_ID=LS.PART_ID(+);

I guess that it is a good exercise once in a while to practice simple SQL.

A second example from the ERP mailing list showed a SQL Server solution for a particular problem.  The particular problem that the SQL Server solution set out to solve is as follows:

“Someone posed an interesting question to me – How can you do a mass insert of Operations? Well most of us know how to manage that without too much trouble but this one came with a couple of caveats – The Master’s first operation (lowest numbered) has to have a particular ResourceID and then a different but specific ResourceID must be on a subsequent Operation. It is after the second operation where the insert must be placed.

Sounds like fun – So I figured that it could be done in a single SQL statement – well after a 4-way join of the Operation table to itself I got it to work how I expected…”

Interesting, I think that there are times in the past that I have used multiple self-joins to a table in order to solve similar SQL problems.  However, there is an easier way using analytic functions.  The following SQL statement attempts to indicate: the sequence of the operation within the work order sub ID, the sequence number that follows sequentially, the second sequence number that follows sequentially, and the resource ID of the next operation in sequential order by the operation number:

SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  RESOURCE_ID,
  ROW_NUMBER() OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                     ORDER BY RESOURCE_ID) AS RN,
  LEAD(RESOURCE_ID,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_RESOURCE_ID,
  LEAD(SEQUENCE_NO,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_SEQ_NO,
  LEAD(SEQUENCE_NO,2) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_NEXT_SEQ_NO
FROM
  OPERATION
WHERE
  WORKORDER_TYPE='M'; 

Next, we only care about those rows when our resource ID of interest (69 in my example) is specified in the first operation on the work order sub ID and the second operation on that work order sub ID specifies the other resource ID that is of interest (FW in my example).  If we are to avoid primary key violations, we should select a new sequence number that is half way between the next-next operation and the next operation:

SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  NEXT_SEQ_NO,
  NEXT_NEXT_SEQ_NO,
  NEXT_SEQ_NO + ROUND((COALESCE(NEXT_NEXT_SEQ_NO,NEXT_SEQ_NO+10) - NEXT_SEQ_NO)/2) NEW_SEQUENCE_NO,
  'NEW RESOURCE' RESOURCE_ID
FROM
  (SELECT
    WORKORDER_BASE_ID,
    WORKORDER_LOT_ID,
    WORKORDER_SPLIT_ID,
    WORKORDER_SUB_ID,
    SEQUENCE_NO,
    RESOURCE_ID,
    ROW_NUMBER() OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                       ORDER BY RESOURCE_ID) AS RN,
    LEAD(RESOURCE_ID,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_RESOURCE_ID,
    LEAD(SEQUENCE_NO,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_SEQ_NO,
    LEAD(SEQUENCE_NO,2) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_NEXT_SEQ_NO
  FROM
    OPERATION
  WHERE
    WORKORDER_TYPE='M')
WHERE
  RN=1
  AND RESOURCE_ID='69'
  AND NEXT_RESOURCE_ID='FW';

WORKORDER_ WOR WOR WOR SEQUENCE_NO NEXT_SEQ_NO NEXT_NEXT_SEQ_NO NEW_SEQUENCE_NO RESOURCE_ID
---------- --- --- --- ----------- ----------- ---------------- --------------- ------------
231610     0   0   0            10         777              888             833 NEW RESOURCE
237680     0   0   0            10         777              888             833 NEW RESOURCE
32018917X  0   0   0            10         777              888             833 NEW RESOURCE
3201B8920  0   0   0            10         777              888             833 NEW RESOURCE
3201C8765  0   0   0            10         777              888             833 NEW RESOURCE
3201G8639  0   0   0            10         777              888             833 NEW RESOURCE
3201G9003  0   0   0            10         777              888             833 NEW RESOURCE
3201J8772  0   0   0            10         777              888             833 NEW RESOURCE
3201J8850  0   0   0            10         777              888             833 NEW RESOURCE
3201K8669  0   0   0            10         777              888             833 NEW RESOURCE
3201M8281  0   0   0            10         777              888             833 NEW RESOURCE
3201M8489  0   0   0            10         777              888             833 NEW RESOURCE
3201N8516  0   0   0            10         777              888             833 NEW RESOURCE
3201N8776  0   0   0            10         777              888             833 NEW RESOURCE
3201Q8545  0   0   0            10         777              888             833 NEW RESOURCE
3201T8964  0   0   0            10         777              888             833 NEW RESOURCE
3201T8964X 0   0   0            10          20               30              25 NEW RESOURCE
3201V8524  0   0   0            10         777              888             833 NEW RESOURCE
3201V8966  0   0   0            10         777              888             833 NEW RESOURCE
3201V8966X 0   0   0            10          20               30              25 NEW RESOURCE
3201W8967  0   0   0            10         777              888             833 NEW RESOURCE
3201W8967X 0   0   0            10          20               30              25 NEW RESOURCE
3201Y8423  0   0   0            10         777              888             833 NEW RESOURCE
3201Z8996  0   0   0            10         777              888             833 NEW RESOURCE

24 rows selected. 

With the above, we are well on our way to perfoming a mass insert into this table.  If I recall correctly, the above must first be inserted into another table (a temp table, preferrably) and then the rows may be inserted into the OPERATION table (the table that is the row source for the analytic functions used in the SQL statement).





Finding Rows with Common Attributes – Roman to Find a Solution in Something New

1 07 2011

July 1, 1011

A recent thread in the comp.databases.oracle.server Usenet group (actually two threads) asked an interesting question.  Assume that you had a detail table that contained several attributes for each of the unique key values.  How would one go about finding all of the unique key values that share the same set of attributes?  The sample set provided by the OP looks like this:

COL1   COL2
----  -----
I         a
I         b
I         c
II        a
II        b
III       a
III       b
III       c 

For the above, assume that the OP was interested in the attributes of “I”: a,b,c.  “II” lacks a “c” attribute, while “III” has the required “a”, “b”, and “c” attributes.  So, the OP would like to return C1 value “III” but not “II”.  I wonder if there is a simple solution for the OP?

First, let’s create our test data.  COL1 appears to contain Roman numbers – if we go beyond the number 3, those could be tricky to generate (unless of course you find the RN format parameter for the TO_CHAR function).  Let’s first create a temporary work table that contains the Roman numbers from 1 to 100 and a random number between 1 and 10:

CREATE TABLE T1_TEMP AS
SELECT
  TRIM(TO_CHAR(ROWNUM,'RN')) C1,
  TRUNC(DBMS_RANDOM.VALUE(1,10)+1) C2,
  ROWNUM C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

Let’s see what is in the T1_TEMP table:

COLUMN C1 FORMAT A10

SELECT
  *
FROM
  T1_TEMP
ORDER BY
  C3;

C1                 C2         C3
---------- ---------- ----------
I                  10          1
II                  4          2
III                 7          3
IV                  9          4
V                   8          5
VI                 10          6
VII                 9          7
VIII                4          8
IX                  4          9
X                  10         10
...
XCV                 5         95
XCVI                4         96
XCVII               8         97
XCVIII              7         98
XCIX               10         99
C                   4        100

100 rows selected. 

The row with the value “I” in column C1 has the number 10 in column C2, but that number might be a bit different in your temporary work table.  Column C2 will determine the number of attributes that are added for each of the values found in column C1 when we create the table T1 (note that we could have defined column C2 with the function CHR(96 + COUNTER) to place lowercase letters in that column, rather than numbers, to help reproduce the OP’s dataset):

CREATE TABLE T1 AS
SELECT
  T1_TEMP.C1,
  V1.COUNTER C2
FROM
  T1_TEMP,
  (SELECT
     ROWNUM COUNTER
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) V1
WHERE
  T1_TEMP.C2>=V1.COUNTER; 

Let’s see what is in table T1:

SELECT
  *
FROM
  T1
ORDER BY
  C1,
  C2;

C1                 C2
---------- ----------
C                   1
C                   2
C                   3
C                   4
I                   1
I                   2
I                   3
I                   4
I                   5
I                   6
I                   7
I                   8
I                   9
I                  10
...
XXXVII              1
XXXVII              2
XXXVII              3
XXXVIII             1
XXXVIII             2
XXXVIII             3
XXXVIII             4
XXXVIII             5
XXXVIII             6

634 rows selected.  

From the above output, you can see that we now have the number of rows in table T1 for each distinct value of C1 as was specified in table T1_TEMP.  An interesting side-note, the Roman number 100 (C) is less than the Roman number 1 (I) – I guess that explains why computers do not natively use Roman numbers for calculations.  🙂

For the next step, we need to collapse the different C2 values for each of the unique C1 values into a single row.  Oracle Database 11.2.0.1 introduced the LISTAGG function that makes easy work of this task, as shown in this earlier blog article.

COLUMN C2_LISTING FORMAT A22

SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1
ORDER BY
  C1;

C1         C2_LISTING
---------- --------------------
C          1,2,3,4
I          1,2,3,4,5,6,7,8,9,10
II         1,2,3,4
III        1,2,3,4,5,6,7
IV         1,2,3,4,5,6,7,8,9
IX         1,2,3,4
...
XXXV       1,2,3,4,5,6,7,8,9
XXXVI      1,2,3,4,5
XXXVII     1,2,3
XXXVIII    1,2,3,4,5,6

100 rows selected. 

The question remains, how can I find all of the unique C1 values that have all of the same attributes as the C1 value “I” – in this case 1,2,3,4,5,6,7,8,9,10?  One method slides the above query into a WITH block and then the WITH block is referenced twice in the main query:

WITH MY_VIEW AS
(SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1)
SELECT
  V2.C1,
  V2.C2_LISTING
FROM
  MY_VIEW V1,
  MY_VIEW V2
WHERE
  V1.C1='I'
  AND V1.C1<>V2.C1
  AND V1.C2_LISTING=V2.C2_LISTING
ORDER BY
  V2.C1;

C1         C2_LISTING
---------- --------------------
LVII       1,2,3,4,5,6,7,8,9,10
LXXI       1,2,3,4,5,6,7,8,9,10
LXXIII     1,2,3,4,5,6,7,8,9,10
VI         1,2,3,4,5,6,7,8,9,10
X          1,2,3,4,5,6,7,8,9,10
XCIX       1,2,3,4,5,6,7,8,9,10
XV         1,2,3,4,5,6,7,8,9,10
XXIX       1,2,3,4,5,6,7,8,9,10
XXXI       1,2,3,4,5,6,7,8,9,10

9 rows selected. 

How else might you solve the problem posted by the OP in the Usenet thread?





Row Values to Comma Separated Lists, an Overly Complicated Use Case Example

26 05 2011

May 26, 2011

In a previous article I showed a simple method to convert values found in multiple table rows into a comma delimited list.  The method works very well, until the situation in which the approach is to be used becomes a bit more complex.

Assume for a moment that the following table structure exists where the approach must be implemented:

In the above diagram, the PURCHASE_ORDER table is the parent table, the PURC_ORDER_LINE table is the child table (the line-level detail), and the PURC_LINE_DEL table is the grand-child table (the sub-line level detail).  A fairly simple setup, and I suspect not a too terribly uncommon setup.

Now the complexity.  Assume that an ERP package using the above tables generates a report using the tables’ data, and as the ERP package is flexible, it allows additional information to be added to the reports – essentially, it is possible to execute a query that retrieves a single row for every detail-level row in the report.  Great, except that in this particular report, the detail-level row level is at the PURC_ORDER_LINE table.  If there were 18 rows, for instance, in the PURC_LINE_DEL table (the schedule that indicates when the parts should be delivered) for the first order line in the PURC_ORDER_LINE table for a particular order ID that is found in the PURCHASE_ORDER table, then we have a bit of a problem.  Fortunately, this ERP package’s report was constructed such that the DESIRED_RECV_DATE and USER_ORDER_QTY columns are represented in two report variables with CRLF (carriage return – line feed) delimited lists, making it very easy to list the delivery dates that the parts are due, and the quantities that are due for each of those dates.  Fantastic, so where is the complexity?  What if the purchase order must be revised after the supplier starts sending the parts on the agreed upon delivery dates, such that the first 9 of the 18 delivery schedules for line 1 of a purchase order have been satisfied, 1 of the 18 delivery schedule lines has been partially satisfied, and the remaining 8 delivery schedule lines have dates that are in the future?  This might be helpful information to provide to the supplier when presenting the updated purchase order.  Unfortunately, the ERP package’s report does not deliver the USER_RECEIVED_QTY column as a CRLF delimited list, as it does for the DESIRED_RECV_DATE and USER_ORDER_QTY columns. 

A complex problem, or not?  Well, the ERP package does permit a SQL statement returning a single row to be executed per detail row in the report (at the PURC_ORDER_LINE table level), so we have a possible solution.  However, if the SQL statement is too complex (having inline views is one such complication), the complex portion of the SQL statement must be isolated in a statically defined view, and then that view should be referenced by the report.  A complex problem, or not?  Let’s try building a solution starting in SQL*Plus:

SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO;

PURC_ORDER_ID   PURC_ORDER_LINE_NO USER_RECEIVED_QTY         RN
--------------- ------------------ ----------------- ----------
146825                           1                 1          1
146825                           1                 1          2
146825                           1                 2          3
146825                           1                 2          4
146825                           1                 2          5
146825                           1                 3          6
146825                           1                 3          7
146825                           1                 1          8
146825                           1                 3          9
146825                           1                 2         10
146825                           1                 0         11
146825                           1                 0         12
146825                           1                 0         13
146825                           1                 0         14
146825                           1                 0         15
146825                           1                 0         16
146825                           1                 0         17
146825                           1                 0         18 

Simple so far, that line identified by RN 10 requests an order quantity of 3, but the supplier only delivered 2 of the 3 parts for that date.  Now let’s generate a comma separated list using the SYS_CONNECT_BY_PATH function:

SELECT
  SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),',') LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

LISTING
------------------------------------

,1
,1,1
,1,1,2
,1,1,2,2
,1,1,2,2,2
,1,1,2,2,2,3
,1,1,2,2,2,3,3
,1,1,2,2,2,3,3,1
,1,1,2,2,2,3,3,1,3
,1,1,2,2,2,3,3,1,3,2
,1,1,2,2,2,3,3,1,3,2,0
,1,1,2,2,2,3,3,1,3,2,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0,0,0 

We only want the last row returned from the above result, and we need to change those commas to a CRLF character combination (after eliminating the initial comma):

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

LISTING
-------

1
1
2
2
2
3
3
1
3
2
0
0
0
0
0
0
0
0

Looks like mission accomplished, all of the previously received quantities are returned in a single query row… except now we have a complicated query that must be isolated in a statically defined view (silly rules…).

Before we head down the path of isolating the complex portion of the query, I thought that it might be entertaining to examine a couple of execution plan to see what happens when we isolate the complicated portion of the query in a statically defined view.  First, we will take a look at the last of the above SQL statements:

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 929864805

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE                          |               |     1 |    26 |            |          |
|*  2 |   CONNECT BY NO FILTERING WITH START-WITH|               |       |       |            |          |
|   3 |    VIEW                                  |               |     4 |   104 |     3   (0)| 00:00:01 |
|   4 |     WINDOW NOSORT                        |               |     4 |    76 |     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID         | PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("RN"-1=PRIOR "RN")
       filter("RN"=1)
   6 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The above looks simple, and the query really does execute quickly.  For the above complex portion of the query to work in a statically defined view, we need to relocate the WHERE clause outside of the inline view, so that when the report runs, it is able to pass in the appropriate values for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns.  Here is the execution plan (generated on Oracle Database 11.2.0.2) when the WHERE clause is relocated:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

The above looks to be reasonably sane, although the predicted amount of time jumped from about 1 second to about 5 seconds.  The Predicate Information section of the execution plan indicates that the filter for the specified PURC_ORDER_ID and PURC_ORDER_LINE_NO was not pushed into the inline view as I had hoped.  The actual execution time for this simple query is much longer than 5 seconds, spending 100% of its time burning CPU time (I killed the query after waiting several minutes).  No problem, I will just use a hint to force the optimizer to push those predicates into the inline view:

SELECT /*+ PUSH_PRED(V) */
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

Well, the above is disappointing.  I could dig through a 10053 trace to try to determine why the hint was “ignored“, or I could try another query that produces an equivalent result.  I choose the latter.  Let’s try using the COUNT analytic function to replace the ROW_NUMBER analytic function:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 617003335

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE                          |               |     1 |    26 |            |          |
|*  2 |   CONNECT BY NO FILTERING WITH START-WITH|               |       |       |            |          |
|   3 |    VIEW                                  |               |     4 |   104 |     3   (0)| 00:00:01 |
|   4 |     WINDOW BUFFER                        |               |     4 |    76 |     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID         | PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("RN"-1=PRIOR "RN")
       filter("RN"=1)
   6 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The execution plan appears to be reasonable – the predicates on the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns are applied early, so the query executes quickly.  Now to move the WHERE clause:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

As had happened before, the predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were not pushed into the inline view, so the query executes extremely slow.  Let’s try the PUSH_PRED hint again:

SELECT /*+ PUSH_PRED(V) */
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

The execution plan is unchanged.  Out of luck…

I remembered reading about a new analytic function introduced in Oracle Database 11.2.0.1, what was that new function… LISTAGG?  I am not a fan of that function name, Oracle should have named the function __CREATE_A_DARN_COMMA_SEPARATED_LIST_QUICKLY (a double underscore function, making it a challenge to find the function).  The LISTAGG function is described in the Oracle documentation library for 11.2.  Let’s give the LISTAGG function a try (the SQL statement is also now formatting the USER_RECEIVED_QTY so that the column values consistently show 2 decimal places):

SELECT
  LISTING
FROM
(SELECT
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO);

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The calculated cost for the query dropped by 1 and the CPU component of that cost dropped to 0%, so that must be a good thing (the calculated cost can be misleading, just as we saw earlier that the estimated time can be misleading).  Without AUTOTRACE TRACEONLY EXPLAIN enabled, the query executes very quickly and produces output that looks similar to this:

LISTING
-----------

       1.00
       1.00
       2.00
       2.00
       2.00
       3.00
       3.00
       1.00
       3.00
       2.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
 

OK, let’s re-enable AUTOTRACE TRACEONLY EXPLAIN and try moving the WHERE clause so that we are able to package the complex portion of this query into a statically defined view:

SELECT
  LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

In the above, notice that the  predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were pushed into the inline view, or more correctly, the inline view was completely eliminated by the query optimizer (notice that there is no line in the execution plan showing VIEW as we saw in the previous examples above).

OK, so Oracle’s query optimizer worked as I had hoped with the inline view, but what about a statically defined view – would it also execute the SQL statement efficiently?  Let’s create the view:

CREATE VIEW TEST_LA AS
SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO; 

Now let’s try the query with AUTOTRACE TRACEONLY EXPLAIN enabled:

SELECT
  LISTING
FROM
  TEST_LA
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The execution plan above shows the same cost (3) that we saw earlier, and once again the view was eliminated by Oracle’s query optimizer (as indicated by the absence of a VIEW line in the execution plan), allowing the query to execute quickly since the predicates would be applied early in the query execution.

So, Oracle Database 11.2.0.1 and above provide an efficient way to retrieve the information that we need in a comma (or CRLF) delimited list for the report.  When the SQL statement is added to the report (defined as an “extended query” in the ERP package), the SQL statement looks like the following (the reporting package does not permit line breaks in the SQL statement):

SELECT (LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO)) FROM PURC_LINE_DEL WHERE PURC_ORDER_ID= : PO_ID AND PURC_ORDER_LINE_NO= :LN_LINE_NO GROUP BY PURC_ORDER_ID , PURC_ORDER_LINE_NO

Oddly enough, by enclosing the LISTAGG analytic funtion in the SQL statement with ( ), the reporting package did not require me to use a statically defined view for the complex portion of the query.  But even if the statically defined view was required, as shown above, the query would still have executed efficiently.





SELECT Statement is Fast, INSERT INTO Using the SELECT Statement is Brutally Slow 2

18 12 2010

December 18, 2010 (Modified December 19, 2010)

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

The previous blog article in this series resulted in several interesting comments with very useful advice.  But was the performance issue just an isolated case, one that only happens in one database, with one Oracle Database release version?  We need a test case to determine how frequently this problem may occur.  First, we will create two tables, table T1 will emulate table EDI830 from the previous blog article, and table T1_LINES will emulate the EDI830_LINES table.  A PADDING column will be used to make the average row length roughly equivalent to that of the original tables:

CREATE TABLE T1(
  CUSTPO VARCHAR2(12) NOT NULL,
  RELID VARCHAR2(12) NOT NULL,
  PADDING VARCHAR2(40),
  CONSTRAINT T1_PK PRIMARY KEY (CUSTPO,RELID));

CREATE TABLE T1_LINES(
  CUSTPO VARCHAR2(12) NOT NULL,
  RELID VARCHAR2(12) NOT NULL,
  FCSTQTY NUMBER NOT NULL,
  FC VARCHAR2(1),
  FCDUEDATE DATE NOT NULL,
  CONSTRAINT T1_LINES_PK PRIMARY KEY (CUSTPO,RELID,FCDUEDATE,FCSTQTY)); 

Next, the table T1 is populated in a repeatable way that as best as possible simulates how the original table was actually populated in production:

INSERT INTO T1
SELECT
  RPAD(CHR(MOD(ROWNUM-1,26)+65)||TO_CHAR(ROWNUM),12,'A') CUSTPO,
  LPAD('1',12,'0') RELID,
  LPAD('A',40,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=700;

INSERT INTO
  T1
SELECT
  CUSTPO,
  LPAD(TO_CHAR(V2.RN+1),12,'0') RELID,
  LPAD('A',40,'A') PADDING
FROM
  (SELECT /*+ NO_MERGE */
    CUSTPO,
    PADDING,
    MOD(ROWNUM,10)*30+19 RN
  FROM
    (SELECT /*+ NO_MERGE */
      CUSTPO,
      PADDING
    FROM
      T1
    ORDER BY
      CUSTPO)) V1,
  (SELECT /*+ NO_MERGE */
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=319) V2
WHERE
  V2.RN<=V1.RN; 

If we select from table T1, we are able to see how the row counts are distributed in an unequal, yet repeatable way, with 700 distinct customer purchase order numbers: 

SELECT
  CUSTPO,
  MIN(RELID),
  MAX(RELID)
FROM
  T1
GROUP BY
  CUSTPO
ORDER BY
  CUSTPO;

CUSTPO       MIN(RELID)   MAX(RELID)
------------ ------------ ------------
A105AAAAAAAA 000000000001 000000000050
A131AAAAAAAA 000000000001 000000000080
A157AAAAAAAA 000000000001 000000000110
A183AAAAAAAA 000000000001 000000000140
A1AAAAAAAAAA 000000000001 000000000170
A209AAAAAAAA 000000000001 000000000200
A235AAAAAAAA 000000000001 000000000230
A261AAAAAAAA 000000000001 000000000260
...

Z624AAAAAAAA 000000000001 000000000230
Z650AAAAAAAA 000000000001 000000000260
Z676AAAAAAAA 000000000001 000000000290
Z78AAAAAAAAA 000000000001 000000000020

700 rows selected. 

Just as a quick experiment, let’s take a look at the output of the SQL statement that will be used to populate the line detail table (T1_LINES) for the customer POs:

SELECT
  CUSTPO,
  RELID,
  10 FCSTQTY,
  'A' FC,
  TO_DATE('01-JAN-2011','DD-MON-YYYY') + (V2.RN*3 - V1.REL*2) FCDUEDATE
FROM
  (SELECT /*+ NO_MERGE */
    CUSTPO,
    RELID,
    ROW_NUMBER() OVER (PARTITION BY RELID ORDER BY CUSTPO) RN,
    ROW_NUMBER() OVER (PARTITION BY CUSTPO ORDER BY RELID DESC) REL
  FROM
    T1
  WHERE
    CUSTPO IN ('Z78AAAAAAAAA','Y155AAAAAAAA')) V1,
  (SELECT /*+ NO_MERGE */
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) V2
WHERE
  V2.RN < (MOD(V1.RN,100)+10)
ORDER BY
  CUSTPO,
  RELID,
  5;

CUSTPO       RELID           FCSTQTY F FCDUEDATE
------------ ------------ ---------- - ---------
Y155AAAAAAAA 000000000001         10 A 26-SEP-10
Y155AAAAAAAA 000000000001         10 A 29-SEP-10
Y155AAAAAAAA 000000000001         10 A 02-OCT-10
Y155AAAAAAAA 000000000001         10 A 05-OCT-10
Y155AAAAAAAA 000000000001         10 A 08-OCT-10
Y155AAAAAAAA 000000000001         10 A 11-OCT-10
Y155AAAAAAAA 000000000001         10 A 14-OCT-10
Y155AAAAAAAA 000000000001         10 A 17-OCT-10
Y155AAAAAAAA 000000000001         10 A 20-OCT-10
Y155AAAAAAAA 000000000001         10 A 23-OCT-10
Y155AAAAAAAA 000000000002         10 A 28-SEP-10
Y155AAAAAAAA 000000000002         10 A 01-OCT-10
...

Z78AAAAAAAAA 000000000020         10 A 17-JAN-11
Z78AAAAAAAAA 000000000020         10 A 20-JAN-11
Z78AAAAAAAAA 000000000020         10 A 23-JAN-11
Z78AAAAAAAAA 000000000020         10 A 26-JAN-11
Z78AAAAAAAAA 000000000020         10 A 29-JAN-11
Z78AAAAAAAAA 000000000020         10 A 01-FEB-11

720 rows selected.

In the above, you can see how the dates will automatically shift from one release of a customer PO to the next, as well as the order in which the rows will be inserted.  The actual dates and number of rows for these two customer POs will actually be a bit different when the rows are inserted into the table, so use the above as just an illustration.

Next, let’s insert 6,079,050 rows into the line detail table in a repeatable way:

INSERT INTO
  T1_LINES
SELECT
  CUSTPO,
  RELID,
  10 FCSTQTY,
  'A' FC,
  TO_DATE('01-JAN-2011','DD-MON-YYYY') + (V2.RN*3 - V1.REL*2) FCDUEDATE
FROM
  (SELECT /*+ NO_MERGE */
    CUSTPO,
    RELID,
    ROW_NUMBER() OVER (PARTITION BY RELID ORDER BY CUSTPO) RN,
    ROW_NUMBER() OVER (PARTITION BY CUSTPO ORDER BY RELID DESC) REL
  FROM
    T1) V1,
  (SELECT /*+ NO_MERGE */
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) V2
WHERE
  V2.RN < (MOD(V1.RN,100)+10)
ORDER BY
  RELID,
  CUSTPO,
  5; 

COMMIT;

Now let’s create table T2 (simulates table CUSTOMER_ORDER) and table T2_LINES (simulates table CUST_ORDER_LINE).  Table T2_LINES will be populated with 224,700 rows:

CREATE TABLE T2(
  ID VARCHAR2(15),
  CUSTOMER_ID VARCHAR2(15),
  CUSTOMER_PO_REF VARCHAR2(40),
  PADDING VARCHAR2(150),
  PRIMARY KEY(ID));

CREATE TABLE T2_LINES(
  CUST_ORDER_ID VARCHAR2(15),
  LINE_NO NUMBER,
  PART_ID VARCHAR2(30),
  DESIRED_SHIP_DATE DATE,
  PADDING VARCHAR2(100),
  PRIMARY KEY(CUST_ORDER_ID,LINE_NO)); 

INSERT INTO
  T2
SELECT DISTINCT
  CUSTPO ID,
  'AAAAAAAAAA' CUSTOMER_ID,
  CUSTPO CUSTOMER_PO_REF,
  RPAD('A',150,'A') PADDING
FROM
  T1
ORDER BY
  DBMS_RANDOM.VALUE;

INSERT INTO
  T2_LINES
SELECT
  CUST_ORDER_ID,
  ROW_NUMBER() OVER (PARTITION BY CUST_ORDER_ID ORDER BY DESIRED_SHIP_DATE) LINE_NO,
  RPAD(TO_CHAR((MOD(ROWNUM,3000)+1) * 33)||'PART',10,'A') PART_ID,
  DESIRED_SHIP_DATE,
  RPAD('A',100,'A') PADDING
FROM 
  (SELECT DISTINCT
    CUSTPO CUST_ORDER_ID,
    FCDUEDATE DESIRED_SHIP_DATE   
  FROM
    T1_LINES
  WHERE
    FCDUEDATE<= TO_DATE('15-JAN-2011','DD-MON-YYYY')) V1;

COMMIT;

Finally, we will gather statistics for the tables and the primary key indexes:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1_LINES',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2_LINES',CASCADE=>TRUE) 

We are now able to start transforming the original query sections, these sections will be listed one at a time, as they were in the earlier article:

    (SELECT
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E 

 The inline view that uses the above inline view: 

  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN TO_DATE('01-JAN-2011','DD-MON-YYYY')-365 AND TO_DATE('01-JAN-2011','DD-MON-YYYY')+1200
    AND EL.FCSTQTY>0) E 

The inline view for the customer orders:

  (SELECT
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO 

Finally, the full SQL statement with all of the inline views joined:

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE; 

Next, we need a simple table to hold the contents of the INSERT INTO… SELECT statement:

CREATE TABLE T1_DEST (
  PART_ID VARCHAR2(30),
  REQUIRED_DATE DATE,
  QTY NUMBER,
  PLAN_LEVEL NUMBER); 

Now the script that will actually perform the test with different values for OPTIMIZER_FEATURES_ENABLE, OPTIMIZER_INDEX_COST_ADJ, and OPTIMIZER_INDEX_CACHING (Edit Dec 19, 2010: the original version of the script set the TRACEFILE_IDENTIFIER value identically for the SELECT and the INSERT INTO versions of the SQL statement when the OPTIMIZER_FEATURES_ENABLED parameter was greater than 10.1.0.4, and that error resulted in the SELECT and the INSERT INTO execution plans appearing in the same trace file, rather than in separate trace files – the script below has been corrected):


SET TIMING ON
SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

SPOOL Capture_Results.txt

ALTER SESSION SET OPTIMIZER_INDEX_CACHING=0;
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='10.1.0.4';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_10.1.0.4';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_10.1.0.4';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='10.2.0.2';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_10.2.0.2';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_10.2.0.2';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.1.0.7';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_11.1.0.7';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_11.1.0.7';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.1';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_11.2.0.1';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_11.2.0.1';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+  */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

SPOOL Capture_Results80.txt

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_INDEX_CACHING=100;
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=80;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='10.1.0.4';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_10.1.0.4_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_10.1.0.4_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='10.2.0.2';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_10.2.0.2_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_10.2.0.2_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.1.0.7';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_11.1.0.7_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_11.1.0.7_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

TRUNCATE TABLE T1_DEST;

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.1';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_11.2.0.1_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'INSERT_11.2.0.1_80';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

INSERT INTO T1_DEST
SELECT /*+ LEADING(E) */
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM') PART_ID,
  E.FCDUEDATE REQUIRED_DATE,
  SUM(E.FCSTQTY) QTY,
  1 PLAN_LEVEL
FROM
  (SELECT /*+ LEADING(E) */
    E.CUSTPO,
    EL.FCSTQTY,
    EL.FCDUEDATE
  FROM
    (SELECT /*+ _80 */
      E.CUSTPO,
      MAX(RELID) RELID
    FROM
      T1 E
    GROUP BY
      E.CUSTPO) E,
    T1_LINES EL
  WHERE
    E.CUSTPO=EL.CUSTPO
    AND E.RELID=EL.RELID
    AND EL.FCDUEDATE BETWEEN SYSDATE-365 AND SYSDATE+1200
    AND EL.FCSTQTY>0) E,
  (SELECT /*+  */
    CO.CUSTOMER_PO_REF,
    MAX(COL.DESIRED_SHIP_DATE) DESIRED_SHIP_DATE,
    MAX(COL.PART_ID) PART_ID
  FROM
    T2 CO,
    T2_LINES COL
  WHERE
    CO.ID=COL.CUST_ORDER_ID
    AND CUSTOMER_PO_REF IS NOT NULL
    AND COL.DESIRED_SHIP_DATE>=SYSDATE-365
  GROUP BY
    CO.CUSTOMER_PO_REF) CO
WHERE
  E.CUSTPO=CO.CUSTOMER_PO_REF(+)
  AND E.FCDUEDATE>COALESCE(CO.DESIRED_SHIP_DATE,SYSDATE-365)
GROUP BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE
ORDER BY
  COALESCE(CO.PART_ID,'NOT-IN-SYSTEM'),
  E.FCDUEDATE;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

SPOOL OFF

For this blog article, I decided to use the Enterprise Edition of Oracle Database 11.2.0.1, with the following manually set system (CPU) statistics (as printed in the 10053 trace files):

SYSTEM STATISTICS INFORMATION
-----------------------------
  Using WORKLOAD Stats
  CPUSPEED: 2664 millions instructions/sec
  SREADTIM: 8.000000 milliseconds
  MREADTIM: 10.000000 millisecons
  MBRC: 16 blocks
  MAXTHR: 19181568 bytes/sec
  SLAVETHR: -1 bytes/sec 

The tablespace used for the test case uses ASSM with auto-allocated extent sizes, and the following instance-wide parameters specified: 

SGA_TARGET =          12000M
PGA_AGGREGATE_TARGET = 2000M
OPTIMIZER_MODE =    ALL_ROWS

Just for fun, for now I will show you only the first two execution plans from the 10053 trace files:

SELECT_10.1.0.4 (OPTIMIZER_FEATURES_ENABLE=’10.1.0.4′, OPTIMIZER_INDEX_CACHING=0, OPTIMIZER_INDEX_COST_ADJ=100), completed in 0.23 seconds when selecting 10,478 rows:

--------------------------------------------------+-----------------------------------+
| Id  | Operation                    | Name       | Rows  | Bytes | Cost  | Time      |
--------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT             |            |       |       |  1425 |           |
| 1   |  SORT GROUP BY               |            |   487 |   46K |  1425 |  00:00:12 |
| 2   |   FILTER                     |            |       |       |       |           |
| 3   |    HASH JOIN RIGHT OUTER     |            |   19K | 1884K |  1423 |  00:00:12 |
| 4   |     VIEW                     |            |   700 |   27K |   368 |  00:00:03 |
| 5   |      SORT GROUP BY           |            |   700 |   40K |   368 |  00:00:03 |
| 6   |       HASH JOIN              |            |  146K | 8475K |   362 |  00:00:03 |
| 7   |        TABLE ACCESS FULL     | T2         |   700 |   18K |     3 |  00:00:01 |
| 8   |        TABLE ACCESS FULL     | T2_LINES   |  146K | 4676K |   358 |  00:00:03 |
| 9   |     NESTED LOOPS             |            |   19K | 1126K |  1055 |  00:00:09 |
| 10  |      VIEW                    |            |   700 |   14K |    65 |  00:00:01 |
| 11  |       SORT GROUP BY          |            |   700 |   18K |    65 |  00:00:01 |
| 12  |        FILTER                |            |       |       |       |           |
| 13  |         INDEX FAST FULL SCAN | T1_PK      |  106K | 2755K |    60 |  00:00:01 |
| 14  |      INDEX RANGE SCAN        | T1_LINES_PK|    28 |  1036 |     3 |  00:00:01 |
--------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter(("EL"."FCDUEDATE">COALESCE("CO"."DESIRED_SHIP_DATE",SYSDATE@!-365) AND COALESCE("CO"."DESIRED_SHIP_DATE",SYSDATE@!-365)<SYSDATE@!+1200))
3 - access("E"."CUSTPO"="CO"."CUSTOMER_PO_REF")
6 - access("CO"."ID"="COL"."CUST_ORDER_ID")
7 - filter("CUSTOMER_PO_REF" IS NOT NULL)
8 - filter("COL"."DESIRED_SHIP_DATE">=SYSDATE@!-365)
12 - filter(SYSDATE@!-365<=SYSDATE@!+1200)
14 - access("E"."CUSTPO"="EL"."CUSTPO" AND "E"."RELID"="EL"."RELID" AND "EL"."FCDUEDATE">=SYSDATE@!-365 AND "EL"."FCSTQTY">0 AND "EL"."FCDUEDATE"<=SYSDATE@!+1200)
14 - filter("EL"."FCSTQTY">0) 

INSERT_10.1.0.4 (OPTIMIZER_FEATURES_ENABLE=’10.1.0.4′, OPTIMIZER_INDEX_CACHING=0, OPTIMIZER_INDEX_COST_ADJ=100), completed in… 30 minutes, 18.01 seconds when inserting 10,478 rows:

-------------------------------------------------+-----------------------------------+
| Id  | Operation                      | Name    | Rows  | Bytes | Cost  | Time      |
-------------------------------------------------+-----------------------------------+
| 0   | INSERT STATEMENT               |         |       |       | 4967K |           |
| 1   |  LOAD TABLE CONVENTIONAL       |         |       |       |       |           |
| 2   |   SORT GROUP BY                |         |   842 |   60K | 4967K |  11:18:11 |
| 3   |    FILTER                      |         |       |       |       |           |
| 4   |     HASH JOIN OUTER            |         |   842 |   60K | 4967K |  11:18:11 |
| 5   |      VIEW                      |         |   842 |   28K | 4967K |  11:18:08 |
| 6   |       FILTER                   |         |       |       |       |           |
| 7   |        SORT GROUP BY           |         |   842 |   53K | 4967K |  11:18:08 |
| 8   |         FILTER                 |         |       |       |       |           |
| 9   |          HASH JOIN             |         |  852M |   54G |  7035 |  00:00:57 |
| 10  |           INDEX FAST FULL SCAN | T1_PK   |  106K | 2755K |    60 |  00:00:01 |
| 11  |           TABLE ACCESS FULL    | T1_LINES| 5631K |  214M |  3388 |  00:00:28 |
| 12  |      VIEW                      |         |   700 |   27K |   368 |  00:00:03 |
| 13  |       SORT GROUP BY            |         |   700 |   40K |   368 |  00:00:03 |
| 14  |        HASH JOIN               |         |  146K | 8475K |   362 |  00:00:03 |
| 15  |         TABLE ACCESS FULL      | T2      |   700 |   18K |     3 |  00:00:01 |
| 16  |         TABLE ACCESS FULL      | T2_LINES|  146K | 4676K |   358 |  00:00:03 |
-------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
3 - filter("E"."FCDUEDATE">COALESCE("CO"."DESIRED_SHIP_DATE",SYSDATE@!-365))
4 - access("E"."CUSTPO"="CO"."CUSTOMER_PO_REF")
6 - filter("EL"."RELID"=MAX("RELID"))
8 - filter(SYSDATE@!-365<=SYSDATE@!+1200)
9 - access("E"."CUSTPO"="EL"."CUSTPO")
11 - filter(("EL"."FCDUEDATE">=SYSDATE@!-365 AND "EL"."FCDUEDATE"<=SYSDATE@!+1200 AND "EL"."FCSTQTY">0))
14 - access("CO"."ID"="COL"."CUST_ORDER_ID")
15 - filter("CUSTOMER_PO_REF" IS NOT NULL)
16 - filter("COL"."DESIRED_SHIP_DATE">=SYSDATE@!-365) 

——————-

Just one more example of the value of test cases.  By the way, make certain that the air conditioning is working well in the room where your test server is located – your test server could be swamped for a couple of hours while the table creation scripts run and the actual test script is executed.  So, what is ahead in part three of this blog article series?

(P.S. Did anyone’s test result for the first INSERT INTO statement complete in less than 30 minutes?)





Graphical Work Center Utilization – Creating the Demo Data and Active Server Page

1 09 2010

September 1, 2010

Today’s blog article provides a graphical view of production work areas on a factory floor, providing feedback to indicate when the production area is in use.  The blog article includes a classic ASP web page which uses VBScript to write the web page on the fly to the browser on the client computer.  The web page written to the client browser automatically refreshes itself every 30 seconds, automatically advancing the view time by 15 minutes, displaying the production work areas that were in use during the new view time (the time displayed at the top of the page).

The ASP web page is the simple part of today’s blog article (although enabling classic ASP support may be a little challenging), while creating the demo data is actually the challenging portion of the article.  First, we need a table that will define the production work areas and the locations of those areas on the factory floor:

CREATE TABLE RESOURCE_LOCATION_DEMO (
  RESOURCE_ID VARCHAR2(15),
  DESCRIPTION VARCHAR2(30),
  LOCATION_LEFT NUMBER(12,4),
  LOCATION_TOP NUMBER(12,4),
  LOCATION_WIDTH NUMBER(12,4),
  LOCATION_HEIGHT NUMBER(12,4),
  PRIMARY KEY (RESOURCE_ID));

To keep things interesting, I do not want to just place the first production work area next to the second, the second next to the third, etc.  Instead, I want to randomly locate the production work areas on the factory floor, making certain that no two work areas overlap.  We can accomplish this by creating a list of numbers and ordering the numbers in a random sequence, like this:

SELECT
  ROWNUM RN
FROM
  DUAL
CONNECT BY
  LEVEL<=200
ORDER BY
  DBMS_RANDOM.VALUE;

  RN
----
 191
 165
 122
  12
  48
  27
 104
...
 198
 168
 150

200 rows selected.

Now, to locate the production work areas, imagine that we permitted 10 work areas horizontally (along the X axis) across the factory floor.  We can use the above number sequence along with the MOD function to determine the horizontal location of the work areas, and the FLOOR function to determine the vertical location of the work areas (note that each time we run this SQL statement will we receive different results):

SELECT
  'MA'||TO_CHAR(ROWNUM) RESOURCE_ID,
  'Shop Floor Machine '||TO_CHAR(ROWNUM) DESCRIPTION,
  MOD(RN,10) BOX_LEFT,
  FLOOR(RN/10) BOX_TOP
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=200
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=50;

RESOURCE_ID DESCRIPTION                 BOX_LEFT    BOX_TOP
----------- ------------------------- ---------- ----------
MA1         Shop Floor Machine 1               4         14
MA2         Shop Floor Machine 2               9          6
MA3         Shop Floor Machine 3               5          2
MA4         Shop Floor Machine 4               5          9
MA5         Shop Floor Machine 5               7         18
MA6         Shop Floor Machine 6               9          4
MA7         Shop Floor Machine 7               0          8
MA8         Shop Floor Machine 8               6          6
MA9         Shop Floor Machine 9               5          5
MA10        Shop Floor Machine 10              7         15
...
MA49        Shop Floor Machine 49              2         11
MA50        Shop Floor Machine 50              8          7

It would be too boring to assume that each of the production work areas is exactly the same width and height, so we will add a little more randomization.  Additionally, I want each production area to be up to 1.5 units wide and up to 1.0 units tall, both offset 0.25 units from the top-left (we are dealing with screen coordinates here, where positive Y is the same as mathematical -Y).  While there are up to 200 locations on the factory floor for work areas, we will only define 50 work areas (controlled by the ROWNUM<=50 predicate at the end of the SQL statement):

SELECT
  'MA'||TO_CHAR(ROWNUM) RESOURCE_ID,
  'Shop Floor Machine '||TO_CHAR(ROWNUM) DESCRIPTION,
  (MOD(RN,10))*1.5 + 0.25 LOCATION_LEFT,
  (FLOOR(RN/10))*1.0 + 0.25 LOCATION_TOP,
  ROUND(1.5*DBMS_RANDOM.VALUE,4) LOCATION_WIDTH,
  ROUND(1.0*DBMS_RANDOM.VALUE,4) LOCATION_HEIGHT
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=200
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=50;

RESOURCE_ID DESCRIPTION               LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT
----------- ------------------------- ------------- ------------ -------------- ---------------
MA1         Shop Floor Machine 1               3.25        18.25         1.2386           .7948
MA2         Shop Floor Machine 2               4.75        11.25          .6078           .9578
MA3         Shop Floor Machine 3               1.75        12.25          .5318            .457
MA4         Shop Floor Machine 4               3.25        13.25         1.2908           .9813
MA5         Shop Floor Machine 5                .25        16.25          .3245           .4644
MA6         Shop Floor Machine 6              12.25        15.25           .239           .3822
MA7         Shop Floor Machine 7               1.75        18.25          .0159           .8868
MA8         Shop Floor Machine 8               1.75        16.25          .3948           .8511
MA9         Shop Floor Machine 9              12.25         6.25          .4856           .3356
MA10        Shop Floor Machine 10             13.75        11.25         1.2619           .6124
...
MA49        Shop Floor Machine 49              7.75        16.25          .6664           .6938
MA50        Shop Floor Machine 50              9.25        15.25         1.3449           .6606

Now that we have tested the results, let’s insert a new set of similar random values into the RESOURCE_LOCATION_DEMO table, and display some of the inserted rows:

INSERT INTO
  RESOURCE_LOCATION_DEMO 
SELECT
  'MA'||TO_CHAR(ROWNUM) RESOURCE_ID,
  'Shop Floor Machine '||TO_CHAR(ROWNUM) DESCRIPTION,
  (MOD(RN,10))*1.5 + 0.25 LOCATION_LEFT,
  (FLOOR(RN/10))*1.0 + 0.25 LOCATION_TOP,
  ROUND(1.5*DBMS_RANDOM.VALUE,4) LOCATION_WIDTH,
  ROUND(1.0*DBMS_RANDOM.VALUE,4) LOCATION_HEIGHT
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=200
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=50;

COMMIT;

SELECT
  *
FROM
  RESOURCE_LOCATION_DEMO;

RESOURCE_ID DESCRIPTION               LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT
----------- ------------------------- ------------- ------------ -------------- ---------------
MA1         Shop Floor Machine 1              10.75        13.25           .104           .2165
MA2         Shop Floor Machine 2               7.75        18.25         1.2291            .478
MA3         Shop Floor Machine 3               9.25        16.25          .3431           .4364
MA4         Shop Floor Machine 4               1.75        15.25          .3665           .7278
MA5         Shop Floor Machine 5               4.75        15.25          .6842           .4507
MA6         Shop Floor Machine 6               4.75        18.25          .1384           .6434
MA7         Shop Floor Machine 7               4.75         7.25          .7448           .2178
MA8         Shop Floor Machine 8               7.75          .25          .3756            .499
MA9         Shop Floor Machine 9               1.75        18.25         1.0155           .0769
MA10        Shop Floor Machine 10              7.75        13.25         1.1892           .7518
...
MA49        Shop Floor Machine 49              6.25         3.25           .278           .6513
MA50        Shop Floor Machine 50               .25        15.25          .5216           .9607

To translate the above storage units (maybe in inch scale) into screen units we will multiply the storage units by 96 (96 dots per inch) and then multiply by the zoom percent (75% = 0.75).

SELECT
  RESOURCE_ID,
  ROUND(LOCATION_LEFT*96 *0.75) LOCATION_LEFT,
  ROUND(LOCATION_TOP*96 *0.75) LOCATION_TOP,
  ROUND(LOCATION_WIDTH*96 *0.75) LOCATION_WIDTH,
  ROUND(LOCATION_HEIGHT*96 *0.75) LOCATION_HEIGHT
FROM
  RESOURCE_LOCATION_DEMO;

RESOURCE_ID LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT
----------- ------------- ------------ -------------- ---------------
MA1                   774          954              7              16
MA2                   558         1314             88              34
MA3                   666         1170             25              31
MA4                   126         1098             26              52
MA5                   342         1098             49              32
MA6                   342         1314             10              46
MA7                   342          522             54              16
MA8                   558           18             27              36
MA9                   126         1314             73               6
MA10                  558          954             86              54
...

Next, we need a table to maintain the time periods in which each of the production work areas was in use, and by whom the work areas were used:

CREATE TABLE LABOR_TICKET_DEMO (
  TRANSACTION_ID NUMBER,
  RESOURCE_ID VARCHAR2(15),
  EMPLOYEE_ID VARCHAR2(15),
  CLOCK_IN DATE,
  CLOCK_OUT DATE,
  PRIMARY KEY (TRANSACTION_ID));

Let’s see if we are able to generate some random data for the table:

ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI';

SELECT
  ROWNUM TRANSACTION_ID,
  'MA'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,50))) RESOURCE_ID,
  'EMP'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,300))) EMPLOYEE_ID,
  TRUNC(SYSDATE)+DBMS_RANDOM.VALUE CLOCK_IN
FROM
  DUAL
CONNECT BY
  LEVEL<=10;

TRANSACTION_ID RESOURCE_ID EMPLOYEE_ID CLOCK_IN
-------------- ----------- ----------- -----------------
             1 MA29        EMP50       01-SEP-2010 01:56
             2 MA35        EMP181      01-SEP-2010 10:06
             3 MA13        EMP172      01-SEP-2010 17:40
             4 MA21        EMP182      01-SEP-2010 09:00
             5 MA14        EMP80       01-SEP-2010 09:53
             6 MA4         EMP80       01-SEP-2010 19:04
             7 MA7         EMP110      01-SEP-2010 14:34
             8 MA45        EMP19       01-SEP-2010 22:05
             9 MA10        EMP207      01-SEP-2010 21:51
            10 MA46        EMP127      01-SEP-2010 16:49

That worked, but note that we did not generate a CLOCK_OUT time – we want to make certain that the CLOCK_OUT time is after the CLOCK_IN time, but we simply cannot do that with the above SQL statement as written.  We slide the above into an inline view and then set the CLOCK_OUT time to be up to 12 hours after the CLOCK_IN time (DBMS_RANDOM.VALUE by default returns a value between 0 and 1, so if we divide that value by 2, we can add up to 1/2 of a day to the CLOCK_IN date and time):

INSERT INTO
  LABOR_TICKET_DEMO
SELECT
  TRANSACTION_ID,
  RESOURCE_ID,
  EMPLOYEE_ID,
  CLOCK_IN,
  CLOCK_IN + (DBMS_RANDOM.VALUE/2) CLOCK_OUT
FROM
  (SELECT
    ROWNUM TRANSACTION_ID,
    'MA'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,50))) RESOURCE_ID,
    'EMP'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,300))) EMPLOYEE_ID,
    TRUNC(SYSDATE)+DBMS_RANDOM.VALUE CLOCK_IN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000);

COMMIT;

Let’s take a look at what made it into the table (your results will be different):

SELECT
  *
FROM
  LABOR_TICKET_DEMO
ORDER BY
  TRANSACTION_ID; 

TRANSACTION_ID RESOURCE_ID     EMPLOYEE_ID     CLOCK_IN          CLOCK_OUT
-------------- --------------- --------------- ----------------- -----------------
             1 MA34            EMP49           01-SEP-2010 20:32 02-SEP-2010 08:18
             2 MA47            EMP230          01-SEP-2010 20:08 02-SEP-2010 03:06
             3 MA20            EMP257          01-SEP-2010 02:17 01-SEP-2010 05:44
             4 MA21            EMP129          01-SEP-2010 09:37 01-SEP-2010 15:41
             5 MA18            EMP214          01-SEP-2010 18:57 01-SEP-2010 20:57
             6 MA46            EMP173          01-SEP-2010 05:51 01-SEP-2010 15:32
             7 MA34            EMP224          01-SEP-2010 20:23 02-SEP-2010 08:17
             8 MA31            EMP8            01-SEP-2010 02:56 01-SEP-2010 14:02
             9 MA37            EMP178          01-SEP-2010 09:28 01-SEP-2010 16:03
            10 MA8             EMP31           01-SEP-2010 20:17 02-SEP-2010 05:51
...
           999 MA43            EMP138          01-SEP-2010 05:07 01-SEP-2010 05:23
          1000 MA2             EMP235          01-SEP-2010 05:29 01-SEP-2010 13:28

We now have 1000 transactions scattered across the 50 work areas (RESOURCE_ID column).  What we next want to determine is which of the work areas was in use at a specific time of the day.  Because we eventually will want only one row per unique RESOURCE_ID value, we will use the ROW_NUMBER analytic function to number each of the rows within each unique RESOURCE_ID value:

SELECT
  RESOURCE_ID,
  EMPLOYEE_ID,
  TRUNC(SYSDATE) + (1.25)/24 CHECK_TIME,
  CLOCK_IN,
  CLOCK_OUT,
  ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN
FROM
  LABOR_TICKET_DEMO
WHERE
  CLOCK_IN<=TRUNC(SYSDATE) + (1.25)/24
  AND CLOCK_OUT>TRUNC(SYSDATE) + (1.25)/24
ORDER BY
  RESOURCE_ID,
  CLOCK_IN;

RESOURCE_ID  EMPLOYEE_ID  CHECK_TIME        CLOCK_IN          CLOCK_OUT         RN
------------ ------------ ----------------- ----------------- ----------------- --
MA10         EMP192       01-SEP-2010 01:15 01-SEP-2010 00:21 01-SEP-2010 05:44  1
MA10         EMP233       01-SEP-2010 01:15 01-SEP-2010 00:23 01-SEP-2010 02:42  2
MA16         EMP114       01-SEP-2010 01:15 01-SEP-2010 00:21 01-SEP-2010 04:48  1
MA18         EMP261       01-SEP-2010 01:15 01-SEP-2010 00:18 01-SEP-2010 07:02  1
MA18         EMP133       01-SEP-2010 01:15 01-SEP-2010 00:32 01-SEP-2010 04:35  2
MA2          EMP62        01-SEP-2010 01:15 01-SEP-2010 01:14 01-SEP-2010 12:03  1
MA21         EMP235       01-SEP-2010 01:15 01-SEP-2010 00:05 01-SEP-2010 10:42  1
MA22         EMP4         01-SEP-2010 01:15 01-SEP-2010 00:01 01-SEP-2010 06:27  1
MA22         EMP300       01-SEP-2010 01:15 01-SEP-2010 01:12 01-SEP-2010 11:50  2
MA23         EMP135       01-SEP-2010 01:15 01-SEP-2010 00:35 01-SEP-2010 05:19  1
MA25         EMP35        01-SEP-2010 01:15 01-SEP-2010 00:20 01-SEP-2010 06:58  1
MA28         EMP298       01-SEP-2010 01:15 01-SEP-2010 00:52 01-SEP-2010 06:27  1
MA30         EMP72        01-SEP-2010 01:15 01-SEP-2010 00:56 01-SEP-2010 07:28  1
MA32         EMP84        01-SEP-2010 01:15 01-SEP-2010 01:00 01-SEP-2010 05:25  1
MA34         EMP299       01-SEP-2010 01:15 01-SEP-2010 00:31 01-SEP-2010 12:04  1
MA38         EMP268       01-SEP-2010 01:15 01-SEP-2010 00:31 01-SEP-2010 04:15  1
MA38         EMP278       01-SEP-2010 01:15 01-SEP-2010 00:32 01-SEP-2010 04:50  2
MA38         EMP176       01-SEP-2010 01:15 01-SEP-2010 01:01 01-SEP-2010 04:01  3
MA4          EMP257       01-SEP-2010 01:15 01-SEP-2010 00:10 01-SEP-2010 10:45  1
MA40         EMP231       01-SEP-2010 01:15 01-SEP-2010 00:58 01-SEP-2010 11:01  1
MA43         EMP65        01-SEP-2010 01:15 01-SEP-2010 00:54 01-SEP-2010 09:29  1
MA44         EMP18        01-SEP-2010 01:15 01-SEP-2010 00:07 01-SEP-2010 03:30  1
MA46         EMP36        01-SEP-2010 01:15 01-SEP-2010 00:40 01-SEP-2010 04:57  1
MA48         EMP61        01-SEP-2010 01:15 01-SEP-2010 00:27 01-SEP-2010 10:20  1
MA48         EMP169       01-SEP-2010 01:15 01-SEP-2010 00:44 01-SEP-2010 01:27  2
MA5          EMP147       01-SEP-2010 01:15 01-SEP-2010 00:02 01-SEP-2010 04:48  1
MA6          EMP132       01-SEP-2010 01:15 01-SEP-2010 00:34 01-SEP-2010 09:42  1

27 rows selected.

In some cases we have up to three employees working in a work area at 1:15AM (the time of day is indicated by the 1.25 value in the SQL statement).  Now, lets eliminate the duplicate rows, leaving just the rows where the calculated RN column is equal to 1.  We will join the above SQL statement in an inline view to the RESOURCE_LOCATION_DEMO table and convert the production work area coordinates to screen coordinates, in this case 96 pixels per unit (inches) at a 75% zoom percent (we made this same screen coordinate conversion in a previous SQL statement above):

SELECT
  RL.RESOURCE_ID,
  RL.DESCRIPTION,
  ROUND(RL.LOCATION_LEFT*96 *0.75) LOCATION_LEFT,
  ROUND(RL.LOCATION_TOP*96 *0.75) LOCATION_TOP,
  ROUND(RL.LOCATION_WIDTH*96 *0.75) LOCATION_WIDTH,
  ROUND(RL.LOCATION_HEIGHT*96 *0.75) LOCATION_HEIGHT,
  LT.EMPLOYEE_ID,
  LT.CLOCK_IN,
  LT.CLOCK_OUT
FROM
  RESOURCE_LOCATION_DEMO RL,
  (SELECT
    RESOURCE_ID,
    EMPLOYEE_ID,
    CLOCK_IN,
    CLOCK_OUT,
    ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN
  FROM
    LABOR_TICKET_DEMO
  WHERE
    CLOCK_IN<=TRUNC(SYSDATE) + (1.25)/24
    AND CLOCK_OUT>TRUNC(SYSDATE) + (1.25)/24) LT
WHERE
  RL.RESOURCE_ID=LT.RESOURCE_ID(+)
  AND NVL(LT.RN,1)=1
ORDER BY
  RL.RESOURCE_ID;

RESOURCE_ID  DESCRIPTION                    LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT EMPLOYEE_ID  CLOCK_IN          CLOCK_OUT
------------ ------------------------------ ------------- ------------ -------------- --------------- ------------ ----------------- -----------------
MA1          Shop Floor Machine 1                     774          954              7              16
MA10         Shop Floor Machine 10                    558          954             86              54 EMP192       01-SEP-2010 00:21 01-SEP-2010 05:44
MA11         Shop Floor Machine 11                    882         1098             29               1
MA12         Shop Floor Machine 12                    234          378             51              51
MA13         Shop Floor Machine 13                    882         1314             83              62
MA14         Shop Floor Machine 14                    558          378             38              61
MA15         Shop Floor Machine 15                    774          522             63              64
MA16         Shop Floor Machine 16                    126          666            103              55 EMP114       01-SEP-2010 00:21 01-SEP-2010 04:48
MA17         Shop Floor Machine 17                    558          234             94              30
MA18         Shop Floor Machine 18                    342          450             85              21 EMP261       01-SEP-2010 00:18 01-SEP-2010 07:02
MA19         Shop Floor Machine 19                    342          666             94              20
MA2          Shop Floor Machine 2                     558         1314             88              34 EMP62        01-SEP-2010 01:14 01-SEP-2010 12:03
MA20         Shop Floor Machine 20                    666          162             33              33
MA21         Shop Floor Machine 21                    774          306             66              22 EMP235       01-SEP-2010 00:05 01-SEP-2010 10:42
MA22         Shop Floor Machine 22                    990          378             78              71 EMP4         01-SEP-2010 00:01 01-SEP-2010 06:27
MA23         Shop Floor Machine 23                    666          666             50              37 EMP135       01-SEP-2010 00:35 01-SEP-2010 05:19
MA24         Shop Floor Machine 24                    990          810            107              45
...

From the above output, we know the screen coordinates of each production work area (RESOURCE_ID) and the first employee to use the production work area (and the employee was still using it) at 1:15AM.

For the next portion of this blog article, the portion that requires an ASP enabled web server, we need a couple of pictures (these were created using Microsoft Power Point):

Representing a production work center that is in use:

Representing a production work center that is idle:

The factory floor – the background area:

——

Now we need the classic ASP page code – note that the code syntax is very similar to that of the previous VBScript examples.  The script uses Response.Write to write information to the client computer’s web browser, and an embedded Java script to call the post event of the embedded HTML form to update the display as of time every 30 seconds (yes, I should have used bind variables in the SQL statement, but that would have required an extra 120 seconds to code and would have left you with nothing to improve):

<html>

<head>
<meta http-equiv="refresh" content="600">
<title>Graphical Work Center Utilization Animated</title>
</head>

<body>
    <%
    Dim strSQL

    Dim sglOriginLeft
    Dim sglOriginTop

    Dim sglZoom
    Dim strZoom

    Dim i
    Dim intWidth
    Dim intHeight
    Dim strOffset
    Dim sglOffset
    Dim varDateTime

    Dim snpData
    Dim dbDatabase

    Set dbDatabase = Server.CreateObject("ADODB.Connection")

    'Database configuration
    strUsername = "MyUsername"
    strPassword = "MyPassword"
    strDatabase = "MyDB"

    On Error Resume Next

    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
    dbDatabase.Open

    'Should verify that the connection attempt was successful, but I will leave that for someone else to code

    Set snpData = Server.CreateObject("ADODB.Recordset")

    'Retrieve the last value for the time offset and the selected zoom percent
    strOffset = Request.Form("txtOffset")
    strZoom = Request.Form("cboZoom")

    'Convert back to a number
    sglOffset = CSng(strOffset)
    sglZoom = CSng(strZoom) / 100

    'Advance to the next 0.25 hour
    if (sglOffset = 0) or (sglOffset = 24) then
        sglOffset = 0.25
    else
        sglOffset = sglOffset + 0.25
    end if

    'Set the zoom percent, if not already set
    If sglZoom = 0 Then
        sglZoom = 0.5 '50% zoom
    End If

    varDateTime = DateAdd("n", sglOffset*60, Date) 'Create a printable version of the view date and time

    Response.Write "<form name=" & chr(34) & "reportform" & chr(34) & " method=" & chr(34) & "POST" & chr(34) & " action=" & chr(34) & "GraphicalWorkCenterUtilization.asp" & chr(34) & ">" & vbcrlf
    Response.Write varDateTime & "  " & vbCrLf
    Response.Write "&nbsp;&nbsp;&nbsp;Zoom Percent <select size=""1"" id=""cboZoom"" name=""cboZoom"" style=""width:50"">" & vbCrLf
    For i = 10 to 200 Step 10
        If sglZoom = i / 100 Then
            Response.Write "<option selected value=""" & cStr(i) & """>" & cStr(i) & "</option>"
        Else
            Response.Write "<option value=""" & cStr(i) & """>" & cStr(i) & "</option>"
        End If
    Next
    Response.Write "</select><br>"
    Response.Write "  <input type=" & chr(34) & "submit" & chr(34) & " value=" & chr(34) & "Update View" & chr(34) & " name=" & chr(34) & "cmdViewReport" & chr(34) & ">" & vbcrlf
    Response.Write "  <input type=" & chr(34) & "hidden" & chr(34) & " name=" & chr(34) & "txtOffset" & chr(34) & " size=" & chr(34) & "5" & chr(34) & " value=" & chr(34) & cStr(sglOffset) & chr(34) & ">" & vbcrlf
    Response.Write "</form>" & vbcrlf

    'The background image
    intWidth = Round(16 * 96 * sglZoom)
    intHeight = Round(20 * 96 * sglZoom)
    Response.Write "<img src=" & chr(34) & "https://hoopercharles.files.wordpress.com/2010/09/graphicalworkcenterutilizationbackground.jpg" & chr(34) & " width=" & chr(34) & cstr(intWidth) & chr(34) & " height=" & chr(34) & cstr(intheight) & chr(34) & " style=" & chr(34) & "position:absolute;top:50px;left:0px;z-index:-1" & chr(34) & "/>" & vbcrlf

    'The SQL statement developed earlier
    strSQL = "SELECT" & VBCrLf
    strSQL = strSQL & "  RL.RESOURCE_ID," & VBCrLf
    strSQL = strSQL & "  RL.DESCRIPTION," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_LEFT*96 *" & cStr(sglZoom) & ") LOCATION_LEFT," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_TOP*96 *" & cStr(sglZoom) & ") LOCATION_TOP," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_WIDTH*96 *" & cStr(sglZoom) & ") LOCATION_WIDTH," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_HEIGHT*96 *" & cStr(sglZoom) & ") LOCATION_HEIGHT," & VBCrLf
    strSQL = strSQL & "  LT.EMPLOYEE_ID," & VBCrLf
    strSQL = strSQL & "  LT.CLOCK_IN," & VBCrLf
    strSQL = strSQL & "  LT.CLOCK_OUT" & VBCrLf
    strSQL = strSQL & "FROM" & VBCrLf
    strSQL = strSQL & "  RESOURCE_LOCATION_DEMO RL," & VBCrLf
    strSQL = strSQL & "  (SELECT" & VBCrLf
    strSQL = strSQL & "    RESOURCE_ID," & VBCrLf
    strSQL = strSQL & "    EMPLOYEE_ID," & VBCrLf
    strSQL = strSQL & "    CLOCK_IN," & VBCrLf
    strSQL = strSQL & "    CLOCK_OUT," & VBCrLf
    strSQL = strSQL & "    ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN" & VBCrLf
    strSQL = strSQL & "  FROM" & VBCrLf
    strSQL = strSQL & "    LABOR_TICKET_DEMO" & VBCrLf
    strSQL = strSQL & "  WHERE" & VBCrLf
    strSQL = strSQL & "    CLOCK_IN<=TRUNC(SYSDATE) + (" & cStr(sglOffset) & ")/24" & VBCrLf
    strSQL = strSQL & "    AND CLOCK_OUT>TRUNC(SYSDATE) + (" & cStr(sglOffset) & ")/24) LT" & VBCrLf
    strSQL = strSQL & "WHERE" & VBCrLf
    strSQL = strSQL & "  RL.RESOURCE_ID=LT.RESOURCE_ID(+)" & VBCrLf
    strSQL = strSQL & "  AND NVL(LT.RN,1)=1" & VBCrLf
    strSQL = strSQL & "ORDER BY" & VBCrLf
    strSQL = strSQL & "  RL.RESOURCE_ID"

    snpData.open strSQL, dbDatabase

    If snpData.State = 1 then
        Response.Write "<B><font color=" & chr(34) & "#0000FF" & chr(34) & "><p style=" & chr(34) & "position:absolute;top:15px;left:500px" & chr(34) & ">" & FormatDateTime(cdate(snpData("START_TIME")),2) & " " & FormatDateTime(cdate(snpData("START_TIME")),4) & " - " & FormatDateTime(cdate(snpData("END_TIME")),4) & "</p></font></b>" & vbcrlf

        Do While Not snpData.EOF
            If Not(IsNull(snpData("employee_id"))) Then
                'A labor ticket was in process during this time period
                Response.Write "<img alt=" & Chr(34) & snpData("resource_id") & "  " & snpData("description") & vbCrlf & snpData("employee_id") & "(" & snpData("clock_in") & " - " & snpData("clock_out") & ")" & Chr(34) & " src=" & chr(34) & "https://hoopercharles.files.wordpress.com/2010/09/graphicalworkcenterutilizationrunning.jpg" & chr(34) & " width=" & chr(34) & cStr(snpData("location_width")) & chr(34) & " height=" & chr(34) & cStr(snpData("location_height")) & chr(34) & " style=" & chr(34) & "position:absolute;top:" & cStr(cLng(snpData("location_top")) + 40) & "px;left:" & cStr(snpData("location_left")) & "px" & chr(34) & "/>" & vbcrlf
                'Write the title down 1 pixel
                Response.Write "<B><font color=" & chr(34) & "#00FFFF" & chr(34) & "><p style=" & chr(34) & "position:absolute;top:" & cStr(Round(41 + CSng(snpData("location_top")))) & "px;left:" & cStr(Round(CSng(snpData("location_left")))) & "px" & chr(34) & ">" & snpData("resource_id") & "</p></font></b>" & vbcrlf
            Else
                'No labor ticket was in process during this time period
                Response.Write "<img alt=" & Chr(34) & snpData("resource_id") & "  " & snpData("description") & Chr(34) & " src=" & chr(34) & "https://hoopercharles.files.wordpress.com/2010/09/graphicalworkcenterutilizationstopped.jpg" & chr(34) & " width=" & chr(34) & cStr(snpData("location_width")) & chr(34) & " height=" & chr(34) & cStr(snpData("location_height")) & chr(34) & " style=" & chr(34) & "position:absolute;top:" & cStr(cLng(snpData("location_top")) + 40) & "px;left:" & cStr(snpData("location_left")) & "px" & chr(34) & "/>" & vbcrlf
                'Write the title down 1 pixel
                Response.Write "<B><font color=" & chr(34) & "#FF0000" & chr(34) & "><p style=" & chr(34) & "position:absolute;top:" & cStr(Round(41 + CSng(snpData("location_top")))) & "px;left:" & cStr(Round(CSng(snpData("location_left")))) & "px" & chr(34) & ">" & snpData("resource_id") & "</p></font></b>" & vbcrlf
            End If
            snpData.movenext
        Loop
    End if

    snpData.Close
    dbDatabase.Close

    Set snpData = Nothing
    Set dbDatabase = Nothing
    %>

    <script language="JavaScript1.2">
    function NextInterval(){
      reportform.submit();
    }

      setTimeout("NextInterval()",30000)
    </script>
</body>

</html>

GraphicalWorkCenterUtilization.asp (save as GraphicalWorkCenterUtilization.asp on a web server that supports classic ASP pages)

Below are samples of the output as the display as of time advanced – the zoom percent was set to 50.  Notice that the work centers go online and offline as the time progresses (click a picture to see a larger version of that picture):

The final example demonstrates how the display changed when the zoom percent was changed from 50% to 130%:

As the mouse pointer is moved over the boxes representing the work centers, a pop-up tooltip appears that describes the work center, as well as employee ID, clock in time, and clock out time for the first labor ticket at that work center in the time period.

——-

Hopefully, you have found this example to be interesting.  Your assignment is to now connect proximity switches to the devices in your office and surrounding areas, recording their location in the RESOURCE_LOCATION_DEMO table.  Then log the proximity switch status to the LABOR_TICKET_DEMO table so that you are able to determine when the water cooler, coffee machine, chairs, keyboards, and computer mice are in use.  Use the data to determine which employees are the hardest working, and which employees have determined how to think smarter rather than working harder.  🙂





Create an Auto-Scaling HTML Chart using Only SQL

8 07 2010

July 8, 2010 (Modified July 9, 2010)

I thought that I would try something a little different today – build an auto-scaling HTML bar chart using nothing more than a SQL statement.  I mentioned in this book review that I was impressed with the HTML chart that was included in the book, but I felt that it might be more interesting if the example used absolute positioning, rather than an HTML table.  So, I built an example using dynamic positioning that is not based on what appears in that book.

We will use the sample table from this blog article (side note: this is an interesting article that shows how a VBS script can generate a 10046 trace file, and then transform that trace file back into a VBS script), just with the table renamed to T1.

CREATE TABLE T1 AS
SELECT
  DECODE(TRUNC(DBMS_RANDOM.VALUE(0,5)),
          0,'MIKE',
          1,'ROB',
          2,'SAM',
          3,'JOE',
          4,'ERIC') EMPLOYEE_ID,
  TRUNC(SYSDATE)-ROUND(DBMS_RANDOM.VALUE(0,1000)) SHIFT_DATE,
  DECODE(TRUNC(DBMS_RANDOM.VALUE(0,10)),
          0,'VAC',
          1,'HOL',
          2,'BEREAVE',
          3,'JURY',
          4,'ABS',
          5,'EXCUSE',
          6,'MIL',
          'OTHER') INDIRECT_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

COMMIT;

Now that we have 1000 rows in the sample table, let’s see how many entries fall into each week in the table (the week starts on a Monday) for those indirect entries that are either VAC, ABS, or EXCUSE:

SELECT
  NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
  COUNT(*) IND
FROM
  T1
WHERE
  INDIRECT_ID IN ('VAC','ABS','EXCUSE')
GROUP BY
  NEXT_DAY(SHIFT_DATE,'MONDAY')-7
ORDER BY
  1;

Your results of course will be different from what follows due to the randomization of the data, but this is what was returned from my database:

WEEK_OF   IND
--------- ---
08-OCT-07   1
15-OCT-07   2
22-OCT-07   4
29-OCT-07   3
05-NOV-07   3
03-DEC-07   2
10-DEC-07   3
24-DEC-07   2
...
05-JAN-09   1
12-JAN-09   3
19-JAN-09   7
02-FEB-09   1
...
21-JUN-10   3
28-JUN-10   2
05-JUL-10   2

The above SQL statement should work for the base query, now we need to start manipulating the data so that we are able to calculate the size and location of the bars in the chart.  We will slide the above SQL statement into an inline view:

SELECT
  WEEK_OF,
  IND,
  MAX(IND) OVER () MAX_IND,
  COUNT(WEEK_OF) OVER () COUNT_WEEK_OF,
  ROWNUM RN
FROM
  (SELECT
    NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
    COUNT(*) IND
  FROM
    T1
  WHERE
    INDIRECT_ID IN ('VAC','ABS','EXCUSE')
  GROUP BY
    NEXT_DAY(SHIFT_DATE,'MONDAY')-7
  ORDER BY
    1);

In addition to returning the original data from the SQL statement, we are now also returning the maximum data value, the total number of weeks with at least one entry, and a row counter:

WEEK_OF   IND MAX_IND COUNT_WEEK_OF    RN
--------- --- ------- ------------- -----
08-OCT-07   1       7           126     1
15-OCT-07   2       7           126     2
22-OCT-07   4       7           126     3
29-OCT-07   3       7           126     4
05-NOV-07   3       7           126     5
03-DEC-07   2       7           126     6
10-DEC-07   3       7           126     7
...
05-JAN-09   1       7           126    57
12-JAN-09   3       7           126    58
19-JAN-09   7       7           126    59
02-FEB-09   1       7           126    60
...
14-JUN-10   2       7           126   123
21-JUN-10   3       7           126   124
28-JUN-10   2       7           126   125
05-JUL-10   2       7           126   126

Next, we need to calculate the position and size of each of the bars in the chart, so we will again slide the above into an inline view:

SELECT
  WEEK_OF,
  IND,
  MAX_IND,
  COUNT_WEEK_OF,
  RN,
  TRUNC(300 * IND/MAX_IND) BAR_WIDTH,
  TRUNC(800 * 1/COUNT_WEEK_OF) BAR_HEIGHT,
  TRUNC(800 * 1/COUNT_WEEK_OF * (RN-1)) BAR_TOP,
  100 BAR_LEFT
FROM
  (SELECT
    WEEK_OF,
    IND,
    MAX(IND) OVER () MAX_IND,
    COUNT(WEEK_OF) OVER () COUNT_WEEK_OF,
    ROWNUM RN
  FROM
    (SELECT
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
      COUNT(*) IND
    FROM
      T1
    WHERE
      INDIRECT_ID IN ('VAC','ABS','EXCUSE')
    GROUP BY
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7
    ORDER BY
      1));

You might notice in the above that I specified that the maximum width of the chart will be 300 (pixels) and the maximum height will be 800 (pixels).  Here is the output:

WEEK_OF   IND MAX_IND COUNT_WEEK_OF    RN  BAR_WIDTH BAR_HEIGHT BAR_TOP BAR_LEFT
--------- --- ------- ------------- ----- ---------- ---------- ------- --------
08-OCT-07   1       7           126     1         42          6       0      100
15-OCT-07   2       7           126     2         85          6       6      100
22-OCT-07   4       7           126     3        171          6      12      100
29-OCT-07   3       7           126     4        128          6      19      100
05-NOV-07   3       7           126     5        128          6      25      100
03-DEC-07   2       7           126     6         85          6      31      100
10-DEC-07   3       7           126     7        128          6      38      100
24-DEC-07   2       7           126     8         85          6      44      100
...
05-JAN-09   1       7           126    57         42          6     355      100
12-JAN-09   3       7           126    58        128          6     361      100
19-JAN-09   7       7           126    59        300          6     368      100
02-FEB-09   1       7           126    60         42          6     374      100
...
14-JUN-10   2       7           126   123         85          6     774      100
21-JUN-10   3       7           126   124        128          6     780      100
28-JUN-10   2       7           126   125         85          6     787      100
05-JUL-10   2       7           126   126         85          6     793      100

Now what?  We need to convert the above into HTML using DIV tags to position the bars as calculated.  Prior to the first row we need to write a couple of HTML tags to set the page title, and after the last row we need to write a couple more HTML tags to close the BODY and HTML section of the document.  The transformed SQL statement looks like this:

SET TRIMSPOOL ON
SET LINESIZE 400
SET ECHO OFF
SET FEEDBACK OFF
SET VERIFY OFF
SET PAGESIZE 0
SET SQLPROMPT ''

SPOOL C:\CUSTOM_CHART.HTM

SELECT
  DECODE(RN,1,'<html><head><title>Custom Chart</title></head><body>' || CHR(13) || CHR(10),' ') ||
  '<div style="position:absolute;' ||
    'top:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF * (RN-1))) || 'px;' ||
    'left:' || TO_CHAR(5) || 'px;' ||
    'width:' || TO_CHAR(100) || 'px;' ||
    'height:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF)) || 'px;' ||
    '"><font size="1px" color="#0000FF">' || TO_CHAR(WEEK_OF,'MM/DD/YY') ||
      REPLACE('     ',' ',CHR(38) || 'nbsp;') || TO_CHAR(IND) || '</font></div>' ||
  '<div style="background:#444466;position:absolute;' ||
    'top:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF * (RN-1))) || 'px;' ||
    'left:' || TO_CHAR(100) || 'px;' ||
    'width:' || TO_CHAR(TRUNC(300 * IND/MAX_IND)) || 'px;' ||
    'height:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF)) || 'px;' ||
    '"><font size="1px" color="#FFFFFF"></font></div>' ||
  DECODE(RN,COUNT_WEEK_OF, CHR(13) || CHR(10) || '</body></html>',' ') HTML_LINE
FROM
  (SELECT
    WEEK_OF,
    IND,
    MAX(IND) OVER () MAX_IND,
    COUNT(WEEK_OF) OVER () COUNT_WEEK_OF,
    ROWNUM RN
  FROM
    (SELECT
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
      COUNT(*) IND
    FROM
      T1
    WHERE
      INDIRECT_ID IN ('VAC','ABS','EXCUSE')
    GROUP BY
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7
    ORDER BY
      1));

SPOOL OFF

There is a slight problem with the above, the SQL statement and SPOOL OFF are printed in the resulting HTML file – if someone knows how to avoid that behavior (without placing the above into another script file), I would like to see how it is done (Oracle’s documentation did not help).

This is what the resulting HTML file looks like:

The number of result rows from the query was a bit high (126) so the bars are significantly compressed in height.  Just to see what happens, let’s add the following to the WHERE clause in the inner-most inline view:

AND SHIFT_DATE >= TO_DATE('01-JAN-2010','DD-MON-YYYY')

The resulting chart now looks like this:

Of course it is possible to adjust the colors of the font (#0000FF) and the bars (#444466), which are specified in hex in the format of RRGGBB (red green blue).  It is also possible to adjust the color of the bars to reflect the value represented by the bar, but that is an exercise for the reader.  For those who need to feel creative, it is also possible to display pictures in the bars, but that is also an exercise left for the reader.

—-

Edit: The sample output from the SQL statement displays correctly on Red Hat Enterprise Linux 3 using Firefox 0.8:





CPU Wait? LAG to the Rescue

26 02 2010

February 26, 2010

A question in a recent OTN thread appeared as follows:

I’m in RAC database (10gR2/Redhat4). I need to store the real CPU wait every 1 minute in a table for a month. For that I thing to query the GV$SYS_TIME_MODEL [for the “DB CPU” statistic].

This is a very difficult question to answer.  Why (additional reference)?  Technically, in Oracle when a session is “on the CPU”, that session is not in a wait event, and therefore not officially waiting for anything. The “DB CPU” statistic captures the accumulated time spent on the CPU by foreground processes – the user sessions. This CPU time does not include the time that the background processes (DBWR, LGWR, PMON, etc.) spend consuming CPU time. Additionally, the “DB CPU” statistic does not consider/accumulate CPU time consumed by processes that are not related to the database instance.  It could also be said that the “DB CPU” time does not account for time that the session spends waiting for its turn to execute on the CPUs.

With the above in mind, let’s see if we are able to calculate the amount of CPU time consumed by the sessions and the background processes in one minute intervals.  First, we need a logging table.  The following SQL statement builds the SYS_TIME_MODEL_CPU table using a couple of the column definitions from the GV$SYS_TIME_MODEL view so that I do not need to explicitly state the column data types (notice that the SQL statement is collapsing data from two source rows into a single row):

CREATE TABLE SYS_TIME_MODEL_CPU AS
SELECT
  SYSDATE CHK_ID,
  INST_ID,
  SUM(DECODE(STAT_NAME,'DB CPU',VALUE,NULL)) DB_CPU,
  SUM(DECODE(STAT_NAME,'background cpu time',VALUE,NULL)) BACKGROUND_CPU
FROM
  GV$SYS_TIME_MODEL
WHERE
  0=1
GROUP BY
  INST_ID;

If we are able to find a way to schedule the following SQL statement to execute once a minute, we will be able to store the current values of the “DB CPU” and “background cpu time” statistics with the following SQL statement (note that executing this SQL statement will also consume CPU time, the very thing we are trying to measure):

INSERT INTO SYS_TIME_MODEL_CPU
SELECT
  SYSDATE CHK_ID,
  INST_ID,
  SUM(DECODE(STAT_NAME,'DB CPU',VALUE,NULL)) DB_CPU,
  SUM(DECODE(STAT_NAME,'background cpu time',VALUE,NULL)) BACKGROUND_CPU
FROM
  GV$SYS_TIME_MODEL
WHERE
  STAT_NAME IN ('DB CPU','background cpu time')
GROUP BY
  INST_ID;

One way to schedule the SQL statement to execute once a minute is to use the DBMS_LOCK.SLEEP function in a loop.  Unfortunely, on some platforms the function may not wait exactly the specified number of seconds (it may wait slightly longer), and may cause the “PL/SQL lock timer” wait event to steal a position in the top 5 wait events list in a Statspack or AWR report.  For testing purposes, the following anonymous PL/SQL script might be used:

DECLARE
  STime DATE := SYSDATE;
BEGIN
  WHILE (SYSDATE - STime) < 32 LOOP
    INSERT INTO SYS_TIME_MODEL_CPU
      SELECT
        SYSDATE CHK_ID,
        INST_ID,
        SUM(DECODE(STAT_NAME,'DB CPU',VALUE,NULL)) DB_CPU,
        SUM(DECODE(STAT_NAME,'background cpu time',VALUE,NULL)) BACKGROUND_CPU
      FROM
        GV$SYS_TIME_MODEL
      WHERE
        STAT_NAME IN ('DB CPU','background cpu time')
      GROUP BY
        INST_ID;

      COMMIT;
      DBMS_LOCK.SLEEP(60);
  End Loop;
End;
/

If we allow the script to run for a couple of minutes (rather than 31 days), we are able to determine how much CPU time was consumed every minute by using the LAG analytic function, as shown below:

SELECT
  TO_CHAR(CHK_ID,'YYYY-MM-DD HH24:MI') CHK_ID,
  INST_ID,
  DB_CPU-LAG(DB_CPU,1) OVER (PARTITION BY INST_ID ORDER BY CHK_ID) DB_CPU,
  BACKGROUND_CPU-LAG(BACKGROUND_CPU,1) OVER (PARTITION BY INST_ID ORDER BY CHK_ID) BACKGROUND_CPU
FROM
  SYS_TIME_MODEL_CPU
ORDER BY
  CHK_ID;

CHK_ID              INST_ID     DB_CPU BACKGROUND_CPU
---------------- ---------- ---------- --------------
2010-02-24 07:18          1
2010-02-24 07:19          1   59990544          66070
2010-02-24 07:20          1   59951475          66724
2010-02-24 07:21          1   59985268          71768
2010-02-24 07:22          1   60000569          63694
2010-02-24 07:23          1   60002938          71639
2010-02-24 07:24          1   59978651          63770
2010-02-24 07:25          1   61487141          62785
2010-02-24 07:26          1      24194          76990

To determine the number of seconds of CPU time consumed, the values shown in the DB_CPU and BACKGROUND_CPU columns should be divided by 1,000,000.

Why not just use AWR data to obtain this information?  Just because AWR is built-in does not mean that it is free to use the features of AWR (as I attempted to argue in this OTN thread) – this is a fact that is often glossed over by various books, blog articles, “news” articles, and even the Oracle Database documentation when it states that AWR reports are the replacement for Statspack reports.





Excel – Charting the Results of Oracle Analytic Functions

6 02 2010

February 6, 2010

This is a somewhat complicated example that builds a couple of sample tables, uses a SQL statement with the Oracle analytic function LEAD submitted through ADO in an Excel macro, and then presents the information on an Excel worksheet.  When the user clicks one of three buttons on the Excel worksheet, an Excel macro executes that then build charts using disconnected row sources – a disconnected ADO recordset is used to sort the data categories before pushing that data into the charts that are built on the fly.

To start, we need to build the sample tables.  The first two tables follow, a part list table and a vendor list table with random data:

CREATE TABLE PART_LIST (
  PART_ID VARCHAR2(30),
  PRODUCT_CODE VARCHAR2(30),
  COMMODITY_CODE VARCHAR2(30),
  PURCHASED CHAR(1),
  PRIMARY KEY (PART_ID));

INSERT INTO
  PART_LIST
SELECT
  DBMS_RANDOM.STRING('Z',10),
  DBMS_RANDOM.STRING('Z',1),
  DBMS_RANDOM.STRING('Z',1),
  DECODE(ROUND(DBMS_RANDOM.VALUE(1,2)),1,'Y','N')
FROM
  DUAL
CONNECT BY
  LEVEL<=50000;

COMMIT;

CREATE TABLE VENDOR_LIST (
  VENDOR_ID VARCHAR2(30),
  PRIMARY KEY (VENDOR_ID));

INSERT INTO
  VENDOR_LIST
SELECT
  DBMS_RANDOM.STRING('Z',10)
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

COMMIT;

Next, we need to build a purchase transaction history table, allowing a single part to be purchased from 10 randomly selected vendors of the 100 vendors.  This is actually a Cartesian join, but we need to force it to handled as a nested loop join so that we will have a different set of 10 vendors for each PART_ID:

CREATE TABLE PURCHASE_HISTORY (
  TRANSACTION_ID NUMBER,
  VENDOR_ID VARCHAR2(30),
  PART_ID VARCHAR2(30),
  UNIT_PRICE NUMBER(12,2),
  PURCHASE_DATE DATE,
  PRIMARY KEY (TRANSACTION_ID));

INSERT INTO
  PURCHASE_HISTORY
SELECT /*+ ORDERED USE_NL(PL VL) */
  ROWNUM,
  VL.VENDOR_ID,
  PL.PART_ID,
  VL.UNIT_PRICE,
  VL.PURCHASE_DATE
FROM
  PART_LIST PL,
  (SELECT
     'A' MIN_PART,
     'ZZZZZZZZZZZ' MAX_PART,
     VENDOR_ID,
     UNIT_PRICE,
     PURCHASE_DATE,
     ROWNUM RN
  FROM
    (SELECT
      VENDOR_ID,
      ROUND(DBMS_RANDOM.VALUE(0,10000),2) UNIT_PRICE,
      TRUNC(SYSDATE) - ROUND(DBMS_RANDOM.VALUE(0,5000)) PURCHASE_DATE
    FROM
      VENDOR_LIST
    ORDER BY
      DBMS_RANDOM.VALUE)) VL
WHERE
  PL.PURCHASED='Y'
  AND VL.RN<=10
  AND PL.PART_ID BETWEEN VL.MIN_PART AND VL.MAX_PART;

COMMIT;

Before we start working in Excel, we need to put together a SQL statement so that we are able to determine by how much the price of a part fluctuates over time.  We will use the LEAD analytic function to allow us to compare the current row values with the next row values, and only output the row when either the VENDOR_ID changes or the UNIT_PRICE changes.  While the sample data potentially includes dates up to 5,000 days ago, we only want to consider dates up to 720 days ago for this example:

SELECT /*+ ORDERED */
  PH.PART_ID,
  PH.VENDOR_ID,
  PH.UNIT_PRICE,
  PH.LAST_VENDOR_ID,
  PH.LAST_UNIT_PRICE,
  PL.PRODUCT_CODE,
  PL.COMMODITY_CODE
FROM
  (SELECT
    PH.PART_ID,
    PH.VENDOR_ID,
    PH.UNIT_PRICE,
    PH.PURCHASE_DATE,
    LEAD(PH.VENDOR_ID,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_VENDOR_ID,
    LEAD(PH.UNIT_PRICE,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_UNIT_PRICE
  FROM
    PURCHASE_HISTORY PH
  WHERE
    PH.PURCHASE_DATE>=TRUNC(SYSDATE-720)) PH,
  PART_LIST PL
WHERE
  PH.PART_ID=PL.PART_ID
  AND (PH.VENDOR_ID<>NVL(PH.LAST_VENDOR_ID,'-')
    OR PH.UNIT_PRICE<>NVL(PH.LAST_UNIT_PRICE,-1))
ORDER BY
  PH.PART_ID,
  PH.PURCHASE_DATE DESC;

The output of the above SQL statement might look something like this:

PART_ID    VENDOR_ID  UNIT_PRICE LAST_VENDO LAST_UNIT_PRICE P C
---------- ---------- ---------- ---------- --------------- - -
AAAFWXDGOR HHJAWQCYIV    1773.67 RPKWXSTFDS         5841.37 I T
AAAFWXDGOR RPKWXSTFDS    5841.37                            I T
AABDVNQJBS BBOSDBKYBR    4034.07                            D J
AABNDOOTTV HQBZXICKQM    2932.36                            C G
AABPRKFTLG NKYJQJXGJN     242.18 HHJAWQCYIV         1997.01 F I
AABPRKFTLG HHJAWQCYIV    1997.01                            F I
AACHFXHCDC SZWNZCRUWZ    3562.43                            P G
AACNAAOZWE JEYKZFIKJU    4290.12                            L N
AAEAYOLWMN DNDYVXUZVZ    4431.63                            K T
AAFLKRJTCO QPXIDOEDTI    8613.52                            Q G
AAGDNYXQGW BZFMNYJVBP     911.06 RPKWXSTFDS         2813.39 B L
AAGDNYXQGW RPKWXSTFDS    2813.39                            B L
AAGMKTQITK RAGVQSBHKW    9221.90 BCIRRDLHAN         8541.34 S W
AAGMKTQITK BCIRRDLHAN    8541.34 CWQNPITMBE         5611.73 S W
AAGMKTQITK CWQNPITMBE    5611.73                            S W
AAINVDSSWC CQXRSIWOIL    2690.31 BBOSDBKYBR         1707.15 K R
AAINVDSSWC BBOSDBKYBR    1707.15 QFPGRYTYUM         9158.98 K R
AAINVDSSWC QFPGRYTYUM    9158.98                            K R
AALCTODILL NKYJQJXGJN    2116.94                            K M
AAMAUJIWLF LPMSAUJGHR    6294.19 CNHZFDEWIH         4666.58 L P
AAMAUJIWLF CNHZFDEWIH    4666.58 SZWNZCRUWZ         2096.59 L P
AAMAUJIWLF SZWNZCRUWZ    2096.59                            L P
AAMYBVKFQC GLVKOCSHSF     265.63 PNGVEEYGKA         5869.67 X Z
AAMYBVKFQC PNGVEEYGKA    5869.67                            X Z
AANVGRNFEX NFHOKCKLDN    3961.42                            Q O
...

Now we need to switch over to Excel.  Create four ActiveX command buttons named cmdInitialize, cmdComparePC, cmdCompareCC, cmdCompareVendorID.  Name the worksheet OracleAnalyticTest, as shown below:

Right-click the OracleAnalyticTest worksheet and select View Code.  See this blog article to determine how to enable macros in Excel 2007 (if you have not already turned on this feature) and add a reference to the Microsoft ActiveX Data Objects 2.8 (or 6.0) Library.  We will also need to add a reference to the Microsoft ActiveX Data Objects Recordset 2.8 (or 6.0) Library.  Next, we add the code to the cmdInitialize button:

Option Explicit 'Forces all variables to be declared

Dim dbDatabase As New ADODB.Connection
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String

Private Function ConnectDatabase() As Integer
    Dim intResult As Integer

    On Error Resume Next

    If dbDatabase.State <> 1 Then
        'Connection to the database if closed, specify the database name, a username, and password
        strDatabase = "MyDB"
        strUserName = "MyUser"
        strPassword = "MyPassword"

        'Connect to the database
        'Oracle connection string
        dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"

        dbDatabase.ConnectionTimeout = 40
        dbDatabase.CursorLocation = adUseClient
        dbDatabase.Open

        If (dbDatabase.State <> 1) Or (Err <> 0) Then
            intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Excel Demo")

            ConnectDatabase = False
        Else
            ConnectDatabase = True
        End If
    Else
        ConnectDatabase = True
    End If
End Function

Private Sub cmdInitialize_Click()
    Dim i As Integer
    Dim intResult As Integer
    Dim lngRow As Long
    Dim strSQL As String
    Dim snpData As ADODB.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Rows("4:10000").Delete Shift:=xlUp

    intResult = ConnectDatabase

    If intResult = True Then
        Set snpData = New ADODB.Recordset

        strSQL = "SELECT /*+ ORDERED */" & vbCrLf
        strSQL = strSQL & "  PH.PART_ID," & vbCrLf
        strSQL = strSQL & "  PH.VENDOR_ID," & vbCrLf
        strSQL = strSQL & "  PH.UNIT_PRICE," & vbCrLf
        strSQL = strSQL & "  PH.LAST_VENDOR_ID," & vbCrLf
        strSQL = strSQL & "  PH.LAST_UNIT_PRICE," & vbCrLf
        strSQL = strSQL & "  PL.PRODUCT_CODE," & vbCrLf
        strSQL = strSQL & "  PL.COMMODITY_CODE" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & "  (SELECT" & vbCrLf
        strSQL = strSQL & "    PH.PART_ID," & vbCrLf
        strSQL = strSQL & "    PH.VENDOR_ID," & vbCrLf
        strSQL = strSQL & "    PH.UNIT_PRICE," & vbCrLf
        strSQL = strSQL & "    PH.PURCHASE_DATE," & vbCrLf
        strSQL = strSQL & "    LEAD(PH.VENDOR_ID,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_VENDOR_ID," & vbCrLf
        strSQL = strSQL & "    LEAD(PH.UNIT_PRICE,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_UNIT_PRICE" & vbCrLf
        strSQL = strSQL & "  FROM" & vbCrLf
        strSQL = strSQL & "    PURCHASE_HISTORY PH" & vbCrLf
        strSQL = strSQL & "  WHERE" & vbCrLf
        strSQL = strSQL & "    PH.PURCHASE_DATE>=TRUNC(SYSDATE-270)) PH," & vbCrLf
        strSQL = strSQL & "  PART_LIST PL" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & "  PH.PART_ID=PL.PART_ID" & vbCrLf
        strSQL = strSQL & "  AND (PH.VENDOR_ID<>NVL(PH.LAST_VENDOR_ID,'-')" & vbCrLf
        strSQL = strSQL & "    OR PH.UNIT_PRICE<>NVL(PH.LAST_UNIT_PRICE,-1))" & vbCrLf
        strSQL = strSQL & "ORDER BY" & vbCrLf
        strSQL = strSQL & "  PH.PART_ID," & vbCrLf
        strSQL = strSQL & "  PH.PURCHASE_DATE DESC"
        snpData.Open strSQL, dbDatabase

        If snpData.State = 1 Then
            Application.ScreenUpdating = False

            For i = 0 To snpData.Fields.Count - 1
                ActiveSheet.Cells(3, i + 1).Value = snpData.Fields(i).Name
            Next i
            ActiveSheet.Range(ActiveSheet.Cells(3, 1), ActiveSheet.Cells(3, snpData.Fields.Count)).Font.Bold = True

            ActiveSheet.Range("A4").CopyFromRecordset snpData

            'Auto-fit up to 26 columns
            ActiveSheet.Columns("A:" & Chr(64 + snpData.Fields.Count)).AutoFit
            ActiveSheet.Range("A4").Select
            ActiveWindow.FreezePanes = True

            'Remove duplicate rows with the same PART ID
            lngRow = 4
            Do While lngRow < Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2
                If Sheets("OracleAnalyticTest").Cells(lngRow, 1).FormulaR1C1 = "" Then
                    'Past the end of the rows
                    Exit Do
                End If
                If Sheets("OracleAnalyticTest").Cells(lngRow - 1, 1).FormulaR1C1 = Sheets("OracleAnalyticTest").Cells(lngRow, 1).FormulaR1C1 Then
                    'Found a duplicate row, delete it
                    Sheets("OracleAnalyticTest").Rows(lngRow).Delete Shift:=xlUp
                Else
                    lngRow = lngRow + 1
                End If
            Loop
            snpData.Close

            Application.ScreenUpdating = True
        End If
    End If

    Set snpData = Nothing
End Sub

The cmdInitialize_Click subroutine retrieves the data from the database using the supplied SQL statement and writes that information to the worksheet.  The macro then eliminates subsequent rows if the part ID is identical to the previous part ID (this step would not have been required if we modified the SQL statement to use the ROW_NUMBER analytic function, and eliminate all rows where the ROW_NUMBER value is not 1).  Once you add the above code, you should be able to switch back to the Excel worksheet, turn off Design Mode, and click the Initialize button.

Unfortunately, this example will retrieve too many rows with too little variation in the PRODUCT_CODE and COMMODITY_CODE columns (just 26 distinct values), so it might be a good idea to delete all rows below row 1004.  Now we need to switch back to the Microsoft Visual Basic editor and add the code for the other three buttons.  Note that this code takes advantage of gradient shading in Excel 2007 charts, so some modification might be necessary on Excel 2003 and earlier.

Private Sub cmdCompareCC_Click()
    Dim i As Long
    Dim intCount As Integer
    Dim intChartNumber As Integer
    Dim lngRows As Long
    Dim dblValues() As Double
    Dim strValueNames() As String
    Dim snpDataList As ADOR.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Cells(4, 1).Select
    lngRows = Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2

    'Set up to use ADOR to automatically sort the product codes
    Set snpDataList = New ADOR.Recordset
    snpDataList.Fields.Append "commodity_code", adVarChar, 30
    snpDataList.Open

    'Pick up a distinct list of commodity codes
    For i = 4 To lngRows
        'Only include those commodity codes with price changes
        If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
            If snpDataList.RecordCount > 0 Then
                snpDataList.MoveFirst
            End If
            snpDataList.Find ("commodity_code = '" & Sheets("OracleAnalyticTest").Cells(i, 7) & "'")
            If snpDataList.EOF Then
                'Did not find a matching record
                snpDataList.AddNew
                snpDataList("commodity_code") = Sheets("OracleAnalyticTest").Cells(i, 7).Value
                snpDataList.Update
            End If
        End If
    Next i
    snpDataList.Sort = "commodity_code"

    'Find the matching rows for each product code
    snpDataList.MoveFirst
    Do While Not snpDataList.EOF
        intCount = 0
        ReDim dblValues(250)
        ReDim strValueNames(250)
        For i = 4 To lngRows
            If intCount >= 250 Then
                'Excel charts only permit about 250 data points when created with this method
                Exit For
            End If
            If Sheets("OracleAnalyticTest").Cells(i, 7).Value = snpDataList("commodity_code") Then
                'Found a row with this product code
                If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
                    'Price change was found
                    dblValues(intCount) = Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2)
                    strValueNames(intCount) = Sheets("OracleAnalyticTest").Cells(i, 1).FormulaR1C1
                    intCount = intCount + 1
                End If
            End If
        Next i

        'Set the arrays to the exact number of elements, first element at position 0
        ReDim Preserve dblValues(intCount - 1)
        ReDim Preserve strValueNames(intCount - 1)

        intChartNumber = intChartNumber + 1
        With Sheets("OracleAnalyticTest").ChartObjects.Add(10 * intChartNumber, 60 + 10 * intChartNumber, 400, 300)
            .Chart.SeriesCollection.NewSeries
            .Chart.SeriesCollection(1).Values = dblValues
            .Chart.SeriesCollection(1).XValues = strValueNames
            .Chart.Axes(1).CategoryType = 2
            .Chart.HasLegend = False

            .Chart.HasTitle = True
            .Chart.ChartTitle.Text = "Price Changes by Commodity Code: " & snpDataList("commodity_code")

            .Chart.Axes(xlCategory, xlPrimary).HasTitle = True
            .Chart.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Part ID"
            .Chart.Axes(xlValue, xlPrimary).HasTitle = True
            .Chart.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Unit Cost Change"

            .Chart.SeriesCollection(1).HasDataLabels = True
            .Chart.SeriesCollection(1).HasLeaderLines = True

            With .Chart.PlotArea.Border
                .ColorIndex = 16
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.PlotArea.Fill.OneColorGradient Style:=msoGradientHorizontal, Variant:=2, Degree:=0.756847486076142
            .Chart.PlotArea.Fill.ForeColor.SchemeColor = 23
            .Chart.PlotArea.Fill.Visible = True
            With .Chart.PlotArea.Border
                .ColorIndex = 57
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.SeriesCollection(1).Fill.OneColorGradient Style:=msoGradientVertical, Variant:=4, Degree:=0.2
            .Chart.SeriesCollection(1).Fill.Visible = True
            .Chart.SeriesCollection(1).Fill.ForeColor.SchemeColor = 4

            .Chart.Axes(xlValue).MajorGridlines.Border.ColorIndex = 2
            With .Chart.SeriesCollection(1).DataLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.Axes(xlCategory).TickLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.ChartTitle.Font
                .Name = "Arial"
                .FontStyle = "Bold"
                .Size = 16
                .Color = RGB(0, 0, 255)
            End With
        End With
        snpDataList.MoveNext
    Loop

    Set snpDataList = Nothing
End Sub

Private Sub cmdComparePC_Click()
    Dim i As Long
    Dim intCount As Integer
    Dim intChartNumber As Integer
    Dim lngRows As Long
    Dim dblValues() As Double
    Dim strValueNames() As String
    Dim snpDataList As ADOR.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Cells(4, 1).Select
    lngRows = Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2

    'Set up to use ADOR to automatically sort the product codes
    Set snpDataList = New ADOR.Recordset
    snpDataList.Fields.Append "product_code", adVarChar, 30
    snpDataList.Open

    'Pick up a distinct list of product codes
    For i = 4 To lngRows
        'Only include those product codes with price changes
        If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
            If snpDataList.RecordCount > 0 Then
                snpDataList.MoveFirst
            End If
            snpDataList.Find ("product_code = '" & Sheets("OracleAnalyticTest").Cells(i, 6) & "'")
            If snpDataList.EOF Then
                'Did not find a matching record
                snpDataList.AddNew
                snpDataList("product_code") = Sheets("OracleAnalyticTest").Cells(i, 6).Value
                snpDataList.Update
            End If
        End If
    Next i
    snpDataList.Sort = "product_code"

    'Find the matching rows for each product code
    snpDataList.MoveFirst
    Do While Not snpDataList.EOF
        intCount = 0
        ReDim dblValues(250)
        ReDim strValueNames(250)
        For i = 4 To lngRows
            If intCount >= 250 Then
                'Excel charts only permit about 250 data points when created with this method
                Exit For
            End If
            If Sheets("OracleAnalyticTest").Cells(i, 6).Value = snpDataList("product_code") Then
                'Found a row with this product code
                If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
                    'Price change was found
                    dblValues(intCount) = Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2)
                    strValueNames(intCount) = Sheets("OracleAnalyticTest").Cells(i, 1).FormulaR1C1
                    intCount = intCount + 1
                End If
            End If
        Next i

        'Set the arrays to the exact number of elements, first element at position 0
        ReDim Preserve dblValues(intCount - 1)
        ReDim Preserve strValueNames(intCount - 1)

        intChartNumber = intChartNumber + 1

        With Sheets("OracleAnalyticTest").ChartObjects.Add(10 * intChartNumber, 60 + 10 * intChartNumber, 400, 300)
            .Chart.SeriesCollection.NewSeries
            .Chart.SeriesCollection(1).Values = dblValues
            .Chart.SeriesCollection(1).XValues = strValueNames
            .Chart.Axes(1).CategoryType = 2
            .Chart.HasLegend = False

            .Chart.HasTitle = True
            .Chart.ChartTitle.Text = "Price Changes by Product Code: " & snpDataList("product_code")

            .Chart.Axes(xlCategory, xlPrimary).HasTitle = True
            .Chart.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Part ID"
            .Chart.Axes(xlValue, xlPrimary).HasTitle = True
            .Chart.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Unit Cost Change"

            .Chart.SeriesCollection(1).HasDataLabels = True
            .Chart.SeriesCollection(1).HasLeaderLines = True

            With .Chart.PlotArea.Border
                .ColorIndex = 16
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.PlotArea.Fill.OneColorGradient Style:=msoGradientHorizontal, Variant:=2, Degree:=0.756847486076142
            .Chart.PlotArea.Fill.ForeColor.SchemeColor = 23
            .Chart.PlotArea.Fill.Visible = True
            With .Chart.PlotArea.Border
                .ColorIndex = 57
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.SeriesCollection(1).Fill.OneColorGradient Style:=msoGradientVertical, Variant:=4, Degree:=0.2
            .Chart.SeriesCollection(1).Fill.Visible = True
            .Chart.SeriesCollection(1).Fill.ForeColor.SchemeColor = 5

            .Chart.Axes(xlValue).MajorGridlines.Border.ColorIndex = 2
            With .Chart.SeriesCollection(1).DataLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.Axes(xlCategory).TickLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.ChartTitle.Font
                .Name = "Arial"
                .FontStyle = "Bold"
                .Size = 16
                .Color = RGB(0, 0, 255)
            End With
        End With

        snpDataList.MoveNext
    Loop

    Set snpDataList = Nothing
End Sub

Private Sub cmdCompareVendorID_Click()
    Dim i As Long
    Dim intCount As Integer
    Dim intChartNumber As Integer
    Dim lngRows As Long
    Dim dblValues() As Double
    Dim strValueNames() As String
    Dim snpDataList As ADOR.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Cells(4, 1).Select
    lngRows = Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2

    'Set up to use ADOR to automatically sort the product codes
    Set snpDataList = New ADOR.Recordset
    snpDataList.Fields.Append "vendor_id", adVarChar, 30
    snpDataList.Open

    'Pick up a distinct list of vendor IDs
    For i = 4 To lngRows
        'Only include those vendor IDs with price changes
        If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
            If snpDataList.RecordCount > 0 Then
                snpDataList.MoveFirst
            End If
            snpDataList.Find ("vendor_id = '" & Sheets("OracleAnalyticTest").Cells(i, 2) & "'")
            If snpDataList.EOF Then
                'Did not find a matching record
                snpDataList.AddNew
                snpDataList("vendor_id") = Sheets("OracleAnalyticTest").Cells(i, 2).Value
                snpDataList.Update
            End If
        End If
    Next i
    snpDataList.Sort = "vendor_id"

    'Find the matching rows for each product code
    snpDataList.MoveFirst
    Do While Not snpDataList.EOF
        intCount = 0
        ReDim dblValues(250)
        ReDim strValueNames(250)
        For i = 4 To lngRows
            If intCount >= 250 Then
                'Excel charts only permit about 250 data points when created with this method
                Exit For
            End If
            If Sheets("OracleAnalyticTest").Cells(i, 2).Value = snpDataList("vendor_id") Then
                'Found a row with this product code
                If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
                    'Price change was found
                    dblValues(intCount) = Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2)
                    strValueNames(intCount) = Sheets("OracleAnalyticTest").Cells(i, 1).FormulaR1C1
                    intCount = intCount + 1
                End If
            End If
        Next i

        'Set the arrays to the exact number of elements, first element at position 0
        ReDim Preserve dblValues(intCount - 1)
        ReDim Preserve strValueNames(intCount - 1)

        intChartNumber = intChartNumber + 1

        With Sheets("OracleAnalyticTest").ChartObjects.Add(10 * intChartNumber, 60 + 10 * intChartNumber, 400, 300)
            .Chart.SeriesCollection.NewSeries
            .Chart.SeriesCollection(1).Values = dblValues
            .Chart.SeriesCollection(1).XValues = strValueNames
            .Chart.Axes(1).CategoryType = 2
            .Chart.HasLegend = False

            .Chart.HasTitle = True
            .Chart.ChartTitle.Text = "Price Changes by Vendor: " & snpDataList("vendor_id")

            .Chart.Axes(xlCategory, xlPrimary).HasTitle = True
            .Chart.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Part ID"
            .Chart.Axes(xlValue, xlPrimary).HasTitle = True
            .Chart.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Unit Cost Change"

            .Chart.SeriesCollection(1).HasDataLabels = True
            .Chart.SeriesCollection(1).HasLeaderLines = True

            With .Chart.PlotArea.Border
                .ColorIndex = 16
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.PlotArea.Fill.OneColorGradient Style:=msoGradientHorizontal, Variant:=2, Degree:=0.756847486076142
            .Chart.PlotArea.Fill.ForeColor.SchemeColor = 23
            .Chart.PlotArea.Fill.Visible = True
            With .Chart.PlotArea.Border
                .ColorIndex = 57
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.SeriesCollection(1).Fill.OneColorGradient Style:=msoGradientVertical, Variant:=4, Degree:=0.2
            .Chart.SeriesCollection(1).Fill.Visible = True
            .Chart.SeriesCollection(1).Fill.ForeColor.SchemeColor = 45

            .Chart.Axes(xlValue).MajorGridlines.Border.ColorIndex = 2
            With .Chart.SeriesCollection(1).DataLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.Axes(xlCategory).TickLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.ChartTitle.Font
                .Name = "Arial"
                .FontStyle = "Bold"
                .Size = 16
                .Color = RGB(0, 0, 255)
            End With
        End With
        snpDataList.MoveNext
    Loop

    Set snpDataList = Nothing
End Sub

If we switch back to the Excel worksheet, the remaining three buttons should now work.  Clicking each button will cause Excel to examine the data in the worksheet to locate all of the unique values for PRODUCT_CODE, COMMODITY_CODE, or VENDOR_ID, and then sort the list in alphabetical order, and build a chart for each of the part IDs that fall into those categories.  The results for my test run of each button looks like the following three pictures.

You can, of course, adapt the code to work with other SQL statements and modify the chart generating code to alter the chart type, colors, and fonts.





SQL – RANK, MAX Analytical Functions, DECODE, SIGN

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/22b36e3ca18490db

Table Structure:

Table1
symbol  orders  ttime

Requirement: Want to arrange all records, symbolwise, based on orders (asc order).
Among that, if a particular symbol have records in the range TTIME BETWEEN 9300 AND 1530, then I want to extract MIN(TTIME) within that range else MIN(TTIME) of whatever available records.

I want to achieve this using a single query.

Example:

Table1
symbol  orders  ttime
A       2       9300
A       2       9450
A       2       1030
A       2       9451
A       2       1530
A       2       1600
A       2       1700
B       3       9300
B       4       1600
C       3       1600

I want to get all records with maximum orders (in desc order) for each symbol.

Output:
Symbol  Orders  ttime
A       2       9300
A       2       9450
A       2       9451
A       2       1030
A       2       1530
A       2       1600
A       1       9300
A       2       1700
B       4       9300
B       4       1600
C       3       1600

Out of this subset I want to get all records with ttime falling between 9450 to 1530 to appear first in asc. if there is no record within this range then I want to go for normal asc order on ttime.

Ouput:
Symbol  Orders  ttime
A       2       9450
A       2       1030
A       2       1530
A       2       1600
A       2       9300
B       4       9450
B       4       1030
B       4       1600
C       3       1600

Finally I want to extract only first record

Final output:
A       2       9450
B       4       9300
C       3       1600

Are we dealing with time here?  When is 9300 less than 1530?  Why is 1 included in the above?

The set up:

CREATE TABLE T1 (
  SYMBOL CHAR(1),
  ORDERS NUMBER(10),
  TTIME NUMBER(10));

INSERT INTO T1 VALUES('A',2,9300);
INSERT INTO T1 VALUES('A',2,9450);
INSERT INTO T1 VALUES('A',2,10300);
INSERT INTO T1 VALUES('A',2,9451);
INSERT INTO T1 VALUES('A',2,15300);
INSERT INTO T1 VALUES('A',2,16000);
INSERT INTO T1 VALUES('A',2,17000);
INSERT INTO T1 VALUES('B',3,9300);
INSERT INTO T1 VALUES('B',4,16000);
INSERT INTO T1 VALUES('C',3,16000);

First, let’s find the maximum value for ORDERS for each SYMBOL:

SELECT
  SYMBOL,
  MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
  TTIME TTIME
FROM
  T1;

SYMBOL ORDERS  TTIME
A           2   9300
A           2   9450
A           2  10300
A           2   9451
A           2  15300
A           2  17000
A           2  16000
B           4  16000
B           4   9300
C           3  16000

You stated that if TTIME is between 9450 and 1530 (should it be 15300?), that it should take priority over other values of TTIME.  The DECODE below determines if TTIME is between 9450 and 15300, if so it assigns a value of 10 to EXTRA_SORT, otherwise it assigns a value of 1 to EXTRA_SORT:

SELECT
  SYMBOL,
  MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
  TTIME TTIME,
  DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
FROM
  T1;

SYMBOL ORDERS  TTIME  EXTRA_SORT
A           2   9300           1
A           2   9450          10
A           2  10300          10
A           2   9451          10
A           2  15300          10
A           2  17000           1
A           2  16000           1
B           4  16000           1
B           4   9300           1
C           3  16000           1

If we slide the above into an inline view, we can then rank the rows when sorted first on EXTRA_SORT and then on TTIME:

SELECT
  SYMBOL,
  ORDERS,
  TTIME,
  RANK() OVER (PARTITION BY SYMBOL ORDER BY EXTRA_SORT DESC,TTIME) POSITION
FROM
  (SELECT
    SYMBOL,
    MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
    TTIME TTIME,
    DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
  FROM
    T1);

SYMBOL ORDERS  TTIME  POSITION
A           2   9450         1
A           2   9451         2
A           2  10300         3
A           2  15300         4
A           2   9300         5
A           2  16000         6
A           2  17000         7
B           4   9300         1
B           4  16000         2
C           3  16000         1

We can again slide the above into an inline view and extract only those with a POSITION value of 1:

SELECT
  SYMBOL,
  ORDERS,
  TTIME
FROM
  (SELECT
    SYMBOL,
    ORDERS,
    TTIME,
    RANK() OVER (PARTITION BY SYMBOL ORDER BY EXTRA_SORT DESC,TTIME)
POSITION
  FROM
    (SELECT
      SYMBOL,
      MAX(ORDERS) OVER (PARTITION BY SYMBOL) ORDERS,
      TTIME TTIME,
      DECODE(SIGN(TTIME-9449.999),1,DECODE(SIGN(TTIME-15300.0001),-1,10,1),1) EXTRA_SORT
    FROM
      T1)
  )
WHERE
  POSITION=1;

SYMBOL ORDERS  TTIME
A           2   9450
B           4   9300
C           3  16000




SQL – DENSE_RANK, PERCENT_RANK, and COUNT Analytical Functions

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/9af4117466316d9a

I have to following problem:

I get X rows from a statement, these are sorted by a certain column, let’s say a numerical value.
Now I want to calculate the average of this numerical value, but the 10% with the lowest and the 10% with the highest value shall not be included in this calculation. So for example, if I get 20 rows, I need the average of the value in rows 3 to 18.

Currently I solved this with a very complicated statement, but I don’t know the built-in Oracle mathematical functions so I hope that there could be a way to do this with a better performance.

Let’s set up a short experiment:

CREATE TABLE T1 (C1 NUMBER(4));

INSERT INTO T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

We now have a table with 20 rows with numbers between 1 and 20.

Assuming that you are running a version of Oracle that supports analytical functions, the following returns the twenty rows with the relative ranking of each row, if the rows are sorted by C1 in descending order:

SELECT
  C1,
  DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
  COUNT(C1) OVER (PARTITION BY 1) R
FROM
  T1;

        C1         DR          R
---------- ---------- ----------
        20          1         20
        19          2         20
        18          3         20
        17          4         20
        16          5         20
        15          6         20
        14          7         20
        13          8         20
        12          9         20
        11         10         20
        10         11         20
         9         12         20
         8         13         20
         7         14         20
         6         15         20
         5         16         20
         4         17         20
         3         18         20
         2         19         20
         1         20         20

A slight modification of the above, dividing the value of DENSE_RANK by the value of COUNT, and also including a PERCENT_RANK for comparison:

SELECT
  C1,
  DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
  (DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
  PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
FROM
  T1;

        C1         DR        DRP         PR
---------- ---------- ---------- ----------
        20          1        .05          0
        19          2         .1 .052631579
        18          3        .15 .105263158
        17          4         .2 .157894737
        16          5        .25 .210526316
        15          6         .3 .263157895
        14          7        .35 .315789474
        13          8         .4 .368421053
        12          9        .45 .421052632
        11         10         .5 .473684211
        10         11        .55 .526315789
         9         12         .6 .578947368
         8         13        .65 .631578947
         7         14         .7 .684210526
         6         15        .75 .736842105
         5         16         .8 .789473684
         4         17        .85 .842105263
         3         18         .9 .894736842
         2         19        .95 .947368421
         1         20          1          1

The final cleanup is performed when the above is slid into an inline view, by using a WHERE clause:

SELECT
  SUM(T.C1) S
FROM
  (SELECT
    C1,
    DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
    (DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
    PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
  FROM
    T1) T
WHERE
  T.DRP>0.1
  AND T.DRP<=0.9;

         S
———-
       168

A version that uses the PERCENT_RANK value:

SELECT
  SUM(T.C1) S
FROM
  (SELECT
    C1,
    DENSE_RANK() OVER (ORDER BY C1 DESC) DR,
    (DENSE_RANK() OVER (ORDER BY C1 DESC))/(COUNT(C1) OVER (PARTITION BY 1)) DRP,
    PERCENT_RANK() OVER (ORDER BY C1 DESC) PR
  FROM
    T1) T
WHERE
  T.PR BETWEEN 0.1 AND 0.9;

         S
———-
       168





SQL – COUNT Analytical Function, GROUP BY, HAVING

26 12 2009

December 26, 2009

A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group: http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/93bf8d1e75033d4c

I have a book table and in that table it has the book tile, publisher, and type of book it is. example mystery, scifi, etc…

I am trying to write a query that brings back a list of every pair of books that have the same publisher and same book type.  I have been able to get the following code to work:

select publisher_code, type
from book
group by publisher_code, type
having count(*) > 1;

which returns the following results:

PU TYP
-- ---
JP MYS
LB FIC
PE FIC
PL FIC
ST SFI
VB FIC

I can not figure out how to get the book title and book code for the books that this result list represents, everything I have tried throws out an error.

My initial response follows:

I see two possible methods:

  1.  Slide the SQL statement that you have written into an inline view, join the inline view to your book table, and then use the publisher_code, type columns to drive back into your book table.  The join syntax may look like one of the following: (publisher_code, type) IN (SELECT…)   or   b.publisher_code=ib.publisher_code and b.type=ib.type 
  2.  Use analytical functions (COUNT() OVER…) to determine the number of matches for the same publisher_code, type columns.  Then slide this SQL statement into an inline view to retrieve only those records with the aliased COUNT() OVER greater than 1.  This has the benefit of retrieving the matching rows in a single pass.

The original poster then attempted to create a query to meet the requirements, but the query generated an error:

SQL> select title
  2  from book
  3  where publisher_code, type in
  4  (select publisher_code, type
  5  from book
  6  group by publisher_code, type
  7  having count(*) > 1);
where publisher_code, type in
                    *
ERROR at line 3:
ORA-00920: invalid relational operator

My reponse continues:

Very close to what you need.  However, Oracle expects the column names to be wrapped in () … like this:  where (publisher_code, type) in

The above uses a subquery, which may perform slow on some Oracle releases compared to the use of an inline view.  Assume that I have a table named PART, which has columns ID, DESCRIPITION, PRODUCT_CODE, and COMMODITY_CODE, with ID as the primary key.  I want to find ID, DESCRIPTION, and COMMODITY_CODE for all parts with the same DESCRIPTION and PRODUCT_CODE, where there are at least 3 matching parts in the group:

The starting point, which looks similar to your initial query:

SELECT
  DESCRIPTION,
  PRODUCT_CODE,
  COUNT(*) NUM_MATCHES
FROM
  PART
GROUP BY
  DESCRIPTION,
  PRODUCT_CODE
HAVING
  COUNT(*)>=3;

When the original query is slid into an inline view and joined to the original table, it looks like this:

SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  (SELECT
    DESCRIPTION,
    PRODUCT_CODE,
    COUNT(*) NUM_MATCHES
  FROM
    PART
  GROUP BY
    DESCRIPTION,
    PRODUCT_CODE
  HAVING
    COUNT(*)>=3) IP,
  PART P
WHERE
  IP.DESCRIPTION=P.DESCRIPTION
  AND IP.PRODUCT_CODE=P.PRODUCT_CODE;

Here is the DBMS_XPLAN:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN            |      |      1 |   1768 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1293K (0)|
|   2 |   VIEW                |      |      1 |   1768 |   1156 |00:00:00.11 |    1319 |       |       |          |
|*  3 |    FILTER             |      |      1 |        |   1156 |00:00:00.11 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |      |      1 |   1768 |  23276 |00:00:00.08 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART |      1 |  35344 |  35344 |00:00:00.04 |    1429 |       |       |          |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("IP"."DESCRIPTION"="P"."DESCRIPTION" AND "IP"."PRODUCT_CODE"="P"."PRODUCT_CODE")
   3 - filter(COUNT(*)>=3)

The query format using the subquery looks like this:

SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  PART P
WHERE
  (DESCRIPTION,PRODUCT_CODE) IN
  (SELECT
    DESCRIPTION,
    PRODUCT_CODE
  FROM
    PART
  GROUP BY
    DESCRIPTION,
    PRODUCT_CODE
  HAVING
    COUNT(*)>=3);

The DBMS_XPLAN, note that Oracle 10.2.0.2 transformed the query above:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN RIGHT SEMI |          |      1 |      1 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1214K (0)|
|   2 |   VIEW                | VW_NSO_1 |      1 |   1768 |   1156 |00:00:00.12 |    1319 |       |       |          |
|*  3 |    FILTER             |          |      1 |        |   1156 |00:00:00.12 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |   1768 |  23276 |00:00:00.09 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART     |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART     |      1 |  35344 |  35344 |00:00:00.01 |    1429 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("DESCRIPTION"="$nso_col_1" AND "PRODUCT_CODE"="$nso_col_2")
   3 - filter(COUNT(*)>=3)

Without allowing the automatic transformations in Oracle 10.2.0.2, the query takes _much_ longer than 0.21 seconds to complete.

The method using analytical functions starts like this:

SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE,
  COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
FROM
  PART P;

Then, sliding the above into an inline view:

SELECT
  ID,
  DESCRIPTION,
  COMMODITY_CODE
FROM
  (SELECT
    P.ID,
    P.DESCRIPTION,
    P.COMMODITY_CODE,
    COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
  FROM
    PART P)
WHERE
  NUM_MATCHES>=3;

The DBMS_XPLAN for the above looks like this:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|*  1 |  VIEW               |      |      1 |  35344 |  11525 |00:00:00.31 |    1319 |       |       |          |
|   2 |   WINDOW SORT       |      |      1 |  35344 |  35344 |00:00:00.27 |    1319 |  2533K|   726K| 2251K (0)|
|   3 |    TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NUM_MATCHES">=3)

Note that there is only one TABLE ACCESS FULL of the PART table in the above.  The execution time required 0.31 seconds to complete, which is greater than the first two approaches, but that is because the database server is concurrently still trying to resolve the query method using the subquery with no permitted transformations (5+ minutes later).

Subquery method with no transformations permitted:

---------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|*  1 |  FILTER              |      |      1 |        |  11525 |00:46:21.46 |      38M|
|   2 |   TABLE ACCESS FULL  | PART |      1 |  35344 |  35344 |00:00:00.25 |    1429 |
|*  3 |   FILTER             |      |  29474 |        |   6143 |00:46:06.52 |      38M|
|   4 |    HASH GROUP BY     |      |  29474 |      1 |    613M|00:33:24.30 |      38M|
|   5 |     TABLE ACCESS FULL| PART |  29474 |  35344 |   1041M|00:00:02.54 |      38M|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter(("DESCRIPTION"=:B1 AND "PRODUCT_CODE"=:B2 AND COUNT(*)>=3))

Maxim Demenko provided another possible solution for the problem experienced by the original poster.





Update Rows in Another Table with the Help of Analytic Functions

17 12 2009

December 17, 2009

In a recent comp.databases.oracle.misc Usenet thread:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/42c0d1550278250b

The following question was asked:

I need help with a query which involves the 2 tables defined below.  What I need to do is choose the record with the max “Eff Date” from “Table A” for a particular “Emp No.” and update the “Desc” from that record in the field “Desc” of “Table B” for the same “Emp No.”. I am able to choose the max “Eff Date” record for each employee from Table A but somehow not able to updated the same “Desc” in “Table B”.

Request you to please help the query. Any help would be appreciated.
Thanks!

Table A
Emp No. Group   Eff Date        Desc
1234    CI      01/01/1989      X
1234    CI      01/02/2000      X
1234    CI      01/02/2006      A
2345    AF      01/01/1990      X
2345    AF      01/02/2005      A

 

Table B
Emp No. Group   Desc
1234    CI      X
2345    AF      A
3456    CI      A

I provided the following suggestion:

Watch the query and results closely as one possible solution is built (there are other methods):

CREATE TABLE T1 (
  EMP_NO NUMBER,
  GROUPING VARCHAR2(5),
  EFF_DATE DATE,
  DESCR VARCHAR2(5));

CREATE TABLE T2 (
  EMP_NO NUMBER,
  GROUPING VARCHAR2(5),
  DESCR VARCHAR2(5));

INSERT INTO T1 VALUES (1234,'CI',TO_DATE('01/01/1989','MM/DD/YYYY'),'X');
INSERT INTO T1 VALUES (1234,'CI',TO_DATE('01/02/2000','MM/DD/YYYY'),'X');
INSERT INTO T1 VALUES (1234,'CI',TO_DATE('01/02/2006','MM/DD/YYYY'),'A');
INSERT INTO T1 VALUES (2345,'AF',TO_DATE('01/01/1990','MM/DD/YYYY'),'X');
INSERT INTO T1 VALUES (2345,'AF',TO_DATE('01/02/2005','MM/DD/YYYY'),'A');

INSERT INTO T2 VALUES (1234,'CI','XNN');
INSERT INTO T2 VALUES (2345,'AF','ANN');
INSERT INTO T2 VALUES (3456,'CI','ANN');

COMMIT;

 

SELECT
  EMP_NO,
  GROUPING,
  DESCR
FROM
  T2;

EMP_NO GROUP DESCR
------ ----- -----
  1234 CI    XNN
  2345 AF    ANN
  3456 CI    ANN

 

SELECT
  EMP_NO,
  GROUPING,
  EFF_DATE,
  DESCR
FROM
  T1;

EMP_NO GROUP EFF_DATE  DESCR
------ ----- --------- -----
  1234 CI    01-JAN-89 X
  1234 CI    02-JAN-00 X
  1234 CI    02-JAN-06 A
  2345 AF    01-JAN-90 X
  2345 AF    02-JAN-05 A

 

SELECT
  EMP_NO,
  GROUPING,
  EFF_DATE,
  ROW_NUMBER() OVER (PARTITION BY EMP_NO, GROUPING ORDER BY EFF_DATE
DESC) RN,
  DESCR
FROM
  T1;

EMP_NO GROUP EFF_DATE          RN DESCR
------ ----- --------- ---------- -----
  1234 CI    02-JAN-06          1 A
  1234 CI    02-JAN-00          2 X
  1234 CI    01-JAN-89          3 X
  2345 AF    02-JAN-05          1 A
  2345 AF    01-JAN-90          2 X

 

SELECT
  EMP_NO,
  GROUPING,
  EFF_DATE,
  DESCR
FROM
  (SELECT
    EMP_NO,
    GROUPING,
    EFF_DATE,
    ROW_NUMBER() OVER (PARTITION BY EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,
    DESCR
  FROM
    T1)
WHERE
  RN=1;

EMP_NO GROUP EFF_DATE  DESCR
------ ----- --------- -----
  1234 CI    02-JAN-06 A
  2345 AF    02-JAN-05 A

 

UPDATE
  T2
SET
  DESCR=(
    SELECT
      DESCR
    FROM
      (SELECT
        EMP_NO,
        GROUPING,
        ROW_NUMBER() OVER (PARTITION BY EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,
        DESCR
      FROM
        T1) T1
    WHERE
      RN=1
      AND T1.EMP_NO=T2.EMP_NO
      AND T1.GROUPING=T2.GROUPING)
WHERE
  (T2.EMP_NO,T2.GROUPING) IN (
    SELECT
      EMP_NO,
      GROUPING
    FROM
      (SELECT
        EMP_NO,
        GROUPING,
        ROW_NUMBER() OVER (PARTITION BY EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,
        DESCR
      FROM
        T1)
    WHERE
      RN=1);

2 rows updated.

 

SELECT
  EMP_NO,
  GROUPING,
  DESCR
FROM
  T2;

EMP_NO GROUP DESCR
------ ----- -----
  1234 CI    A
  2345 AF    A
  3456 CI    ANN

 

Note that in the above, I assumed that the combination of EMP_NO and GROUPING had to be the same.

~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko provided a very different approach to solving the problem that is both compact and impressive:

SQL> merge into t2 t2
   2  using (
   3    select emp_no,grouping,
   4      max(descr) keep(dense_rank last order by eff_date) descr
   5      from t1 group by emp_no,grouping) t1
   6  on (t1.emp_no=t2.emp_no
   7  and t1.grouping=t2.grouping)
   8  when matched then update set t2.descr=t1.descr
   9  ;

2 rows merged.

 

Execution Plan
----------------------------------------------------------
Plan hash value: 3235844370

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | MERGE STATEMENT        |      |     2 |    16 |     8  (25)| 00:00:01 |
|   1 |  MERGE                 | T2   |       |       |            |          |
|   2 |   VIEW                 |      |       |       |            |          |
|*  3 |    HASH JOIN           |      |     2 |   108 |     8  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL  | T2   |     3 |    99 |     3   (0)| 00:00:01 |
|   5 |     VIEW               |      |     5 |   105 |     4  (25)| 00:00:01 |
|   6 |      SORT GROUP BY     |      |     5 |   150 |     4  (25)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| T1   |     5 |   150 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------­---

Predicate Information (identified by operation id):
---------------------------------------------------
    3 - access("T1"."EMP_NO"="T2"."EMP_NO" AND
               "T1"."GROUPING"="T2"."GROUPING")

~~~~~~~~~~~~~~~~~~~~~~~

Since Maxim provided an execution plan, let’s compare the efficiency of the two methods with a larger test case that uses the same table definitions:

TRUNCATE TABLE T1;
TRUNCATE TABLE T2;

INSERT INTO T1
SELECT
  DECODE(MOD(ROWNUM,10),
         0,0000,
         1,1111,
         2,2222,
         3,3333,
         4,4444,
         5,5555,
         6,6666,
         7,7777,
         8,8888,
         9,9999),
  DECODE(MOD(ROWNUM,6),
         0,'AA',
         1,'BB',
         2,'CC',
         3,'DD',
         4,'EE',
         5,'FF'),
  TRUNC(SYSDATE+SIN(ROWNUM/180*3.141592)*1000),
  UPPER(DBMS_RANDOM.STRING('A',1))
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

INSERT INTO T2
SELECT
  DECODE(MOD(ROWNUM,10),
         0,0000,
         1,1111,
         2,2222,
         3,3333,
         4,4444,
         5,5555,
         6,6,
         7,7,
         8,8,
         9,9),
  DECODE(MOD(ROWNUM,11),
         0,'AA',
         1,'BB',
         2,'CC',
         3,'DD',
         4,'EE',
         5,'FF',
         6,'GG',
         7,'HH',
         8,'II',
         9,'JJ',
         10,'KK'),
  UPPER(DBMS_RANDOM.STRING('A',3))
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

 

SET PAGESIZE 1000
SET LINESIZE 150
ALTER SESSION SET STATISTICS_LEVEL='ALL';
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SPOOL C:\CHECKTIMING.TXT

UPDATE
  T2
SET
  DESCR=(
    SELECT
      DESCR
    FROM
      (SELECT
        EMP_NO,
        GROUPING,
        ROW_NUMBER() OVER (PARTITION BY EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,
        DESCR
      FROM
        T1) T1
    WHERE
      RN=1
      AND T1.EMP_NO=T2.EMP_NO
      AND T1.GROUPING=T2.GROUPING)
WHERE
  (T2.EMP_NO,T2.GROUPING) IN (
    SELECT
      EMP_NO,
      GROUPING
    FROM
      (SELECT
        EMP_NO,
        GROUPING,
        ROW_NUMBER() OVER (PARTITION BY EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,
        DESCR
      FROM
        T1)
    WHERE
      RN=1);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT
  *
FROM
  T2
WHERE
  LENGTH(DESCR)=1
ORDER BY
  EMP_NO;

ROLLBACK;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

merge into t2 t2
 using (
   select emp_no,grouping,
     max(descr) keep(dense_rank last order by eff_date) descr
     from t1 group by emp_no,grouping) t1
 on (t1.emp_no=t2.emp_no
 and t1.grouping=t2.grouping)
 when matched then update set t2.descr=t1.descr
 ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT
  *
FROM
  T2
WHERE
  LENGTH(DESCR)=1
ORDER BY
  EMP_NO;

ROLLBACK;
SPOOL OFF

What is the output of the above?

SQL_ID  8w16pv37zxuh5, child number 0
-------------------------------------
UPDATE   T2 SET   DESCR=(     SELECT       DESCR     FROM       (SELECT
        EMP_NO,         GROUPING,         ROW_NUMBER() OVER (PARTITION
BY EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,         DESCR
FROM         T1) T1     WHERE       RN=1       AND T1.EMP_NO=T2.EMP_NO
     AND T1.GROUPING=T2.GROUPING) WHERE   (T2.EMP_NO,T2.GROUPING) IN (
   SELECT       EMP_NO,       GROUPING     FROM       (SELECT
EMP_NO,         GROUPING,         ROW_NUMBER() OVER (PARTITION BY
EMP_NO, GROUPING ORDER BY EFF_DATE DESC) RN,         DESCR       FROM
      T1)     WHERE       RN=1) 

Plan hash value: 2277482977                                                                                                                          

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT            |          |      1 |        |      0 |00:00:06.12 |   59000 |   3282 |      5 |       |       |          |
|   1 |  UPDATE                     | T2       |      1 |        |      0 |00:00:06.12 |   59000 |   3282 |      5 |       |       |          |
|*  2 |   HASH JOIN SEMI            |          |      1 |      1 |     17 |00:00:03.06 |    3289 |   3280 |      5 |   816K|   816K| 1167K (0)|
|   3 |    TABLE ACCESS FULL        | T2       |      1 |    100 |    100 |00:00:00.03 |       7 |      6 |      0 |       |       |          |
|   4 |    VIEW                     | VW_NSO_1 |      1 |   1000K|     30 |00:00:03.03 |    3282 |   3274 |      5 |       |       |          |
|*  5 |     VIEW                    |          |      1 |   1000K|     30 |00:00:03.03 |    3282 |   3274 |      5 |       |       |          |
|*  6 |      WINDOW SORT PUSHED RANK|          |      1 |   1000K|     67 |00:00:03.03 |    3282 |   3274 |      5 | 36864 | 36864 |   25M (1)|
|   7 |       TABLE ACCESS FULL     | T1       |      1 |   1000K|   1000K|00:00:00.01 |    3275 |   3269 |      0 |       |       |          |
|*  8 |   VIEW                      |          |     17 |  16667 |     17 |00:00:03.06 |   55675 |      0 |      0 |       |       |          |
|*  9 |    WINDOW SORT PUSHED RANK  |          |     17 |  16667 |    566K|00:00:02.53 |   55675 |      0 |      0 |  1541K|   615K| 1369K (0)|
|* 10 |     TABLE ACCESS FULL       | T1       |     17 |  16667 |    566K|00:00:01.13 |   55675 |      0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):                                                                                                  
---------------------------------------------------                                                                                                  
   2 - access("T2"."EMP_NO"="EMP_NO" AND "T2"."GROUPING"="GROUPING")
   5 - filter("RN"=1)
   6 - filter(ROW_NUMBER() OVER ( PARTITION BY "EMP_NO","GROUPING" ORDER BY INTERNAL_FUNCTION("EFF_DATE") DESC )<=1)
   8 - filter("RN"=1)
   9 - filter(ROW_NUMBER() OVER ( PARTITION BY "EMP_NO","GROUPING" ORDER BY INTERNAL_FUNCTION("EFF_DATE") DESC )<=1)
  10 - filter(("EMP_NO"=:B1 AND "GROUPING"=:B2))

    EMP_NO GROUP DESCR
---------- ----- -----
         0 CC    W   
         0 EE    F   
      1111 BB    F   
      1111 DD    U   
      1111 FF    Y   
      2222 AA    W   
      2222 CC    T   
      2222 EE    K   
      3333 BB    Z   
      3333 FF    I   
      3333 DD    W   
      4444 AA    Z   
      4444 EE    G   
      4444 CC    S   
      5555 DD    L   
      5555 BB    Y   
      5555 FF    X   

 

SQL_ID  93cj2ck69n4kg, child number 0
-------------------------------------
merge into t2 t2  using (    select emp_no,grouping,      max(descr)
keep(dense_rank last order by eff_date) descr      from t1 group by
emp_no,grouping) t1  on (t1.emp_no=t2.emp_no  and
t1.grouping=t2.grouping)  when matched then update set t2.descr=t1.descr

Plan hash value: 4231777338                                                                                                                          

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT        |      |      1 |        |      1 |00:00:01.53 |    3301 |   3277 |       |       |          |
|   1 |  MERGE                 | T2   |      1 |        |      1 |00:00:01.53 |    3301 |   3277 |       |       |          |
|   2 |   VIEW                 |      |      1 |        |     17 |00:00:01.50 |    3282 |   3275 |       |       |          |
|*  3 |    HASH JOIN           |      |      1 |     43 |     17 |00:00:01.50 |    3282 |   3275 |   921K|   921K| 1181K (0)|
|   4 |     VIEW               |      |      1 |     43 |     30 |00:00:01.47 |    3275 |   3269 |       |       |          |
|   5 |      SORT GROUP BY     |      |      1 |     43 |     30 |00:00:01.47 |    3275 |   3269 | 73728 | 73728 |          |
|   6 |       TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:00.01 |    3275 |   3269 |       |       |          |
|   7 |     TABLE ACCESS FULL  | T2   |      1 |    100 |    100 |00:00:00.03 |       7 |      6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):                                                                                                  
---------------------------------------------------                                                                                                  
   3 - access("T1"."EMP_NO"="T2"."EMP_NO" AND "T1"."GROUPING"="T2"."GROUPING") 

    EMP_NO GROUP DESCR
---------- ----- -----
         0 CC    Z   
         0 EE    Z   
      1111 BB    Z   
      1111 DD    X   
      1111 FF    Z   
      2222 AA    Z   
      2222 CC    Z   
      2222 EE    Z   
      3333 BB    Z   
      3333 FF    Z   
      3333 DD    Z   
      4444 AA    Z   
      4444 EE    Z   
      4444 CC    Z   
      5555 DD    Z   
      5555 BB    Z   
      5555 FF    Z   

From the point of view of performance, Maxim’s solution is a clear winner.  It is interesting to note that the value of the DESCR column in table T2 differs for the two approaches.





SQL – Bad Execution Plan caused by ROWNUM, ROW_NUMBER is Possible Fix

9 12 2009

December 9, 2009

A recent discussion in the OTN forums demonstrated a cardinality calculation problem caused by using ROWNUM as a predicate in the WHERE clause (this specific issue was corrected in 11.2.0.1):
http://forums.oracle.com/forums/thread.jspa?threadID=934895&start=0

The person who started the OTN thread obviously knew how to ask a question, provide relevant information, and analyze the problem (essentially a model of a good request for assistance):

I’m having a couple of issues with a query, and I can’t figure out the best way to reach a solution.

Platform Information
Windows Server 2003 R2
Oracle 10.2.0.4

Optimizer Settings

SQL > show parameter optimizer

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string      10.2.0.4
optimizer_index_caching              integer     90
optimizer_index_cost_adj             integer     30
optimizer_mode                       string      ALL_ROWS
optimizer_secure_view_merging        boolean     TRUE

The query below, is a simple “Top N” query, where the top result is returned. Here it is, with bind variables in the same location as the application code:

SELECT PRODUCT_DESC
FROM
(
 SELECT PRODUCT_DESC
 , COUNT(*) AS CNT
 FROM USER_VISITS 
 JOIN PRODUCT ON PRODUCT.PRODUCT_OID = USER_VISITS.PRODUCT_OID
 WHERE PRODUCT.PRODUCT_DESC != 'Home' 
 AND VISIT_DATE
  BETWEEN
   ADD_MONTHS    
   (
    TRUNC    
    (
     TO_DATE   
     (
      :vCurrentYear
     , 'YYYY'
     )
    , 'YEAR'
    )
   , 3*(:vCurrentQuarter-1)
   )
  AND
   ADD_MONTHS    
   (
    TRUNC    
    (
     TO_DATE   
     (
      :vCurrentYear
     , 'YYYY'
     )
    , 'YEAR'
    )
   , 3*:vCurrentQuarter
   ) - INTERVAL '1' DAY   
 GROUP BY PRODUCT_DESC
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1;

The explain plan I receive when running the query above.

| Id  | Operation                         | Name                          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                               |      1 |        |      1 |00:00:34.92 |   66343 |       |       |          |
|   2 |   VIEW                            |                               |      1 |      1 |      1 |00:00:34.92 |   66343 |       |       |          |
|*  3 |    FILTER                         |                               |      1 |        |      1 |00:00:34.92 |   66343 |       |       |          |
|   4 |     SORT ORDER BY                 |                               |      1 |      1 |      1 |00:00:34.92 |   66343 |  2048 |  2048 | 2048  (0)|
|   5 |      SORT GROUP BY NOSORT         |                               |      1 |      1 |     27 |00:00:34.92 |   66343 |       |       |          |
|   6 |       NESTED LOOPS                |                               |      1 |      2 |  12711 |00:00:34.90 |   66343 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| PRODUCT                       |      1 |     74 |     77 |00:00:00.01 |      44 |       |       |          |
|*  8 |         INDEX FULL SCAN           | PRODUCT_PRODDESCHAND_UNQ      |      1 |      1 |     77 |00:00:00.01 |       1 |       |       |          |
|*  9 |        INDEX FULL SCAN            | USER_VISITS#PK                |     77 |      2 |  12711 |00:00:34.88 |   66299 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(ADD_MONTHS(TRUNC(TO_DATE(TO_CHAR(:VCURRENTYEAR),'YYYY'),'fmyear'),3*(:VCURRENTQUARTER-1))<=ADD_MONTHS(TRUNC(TO_DATE(TO_CHAR(:VCURR
              ENTYEAR),'YYYY'),'fmyear'),3*:VCURRENTQUARTER)-INTERVAL'+01 00:00:00' DAY(2) TO SECOND(0))
   8 - filter("PRODUCT"."PRODUCT_DESC"<>'Home')
   9 - access("USER_VISITS"."VISIT_DATE">=ADD_MONTHS(TRUNC(TO_DATE(TO_CHAR(:VCURRENTYEAR),'YYYY'),'fmyear'),3*(:VCURRENTQUARTER-1)) AND
              "USER_VISITS"."PRODUCT_OID"="PRODUCT"."PRODUCT_OID" AND "USER_VISITS"."VISIT_DATE"<=ADD_MONTHS(TRUNC(TO_DATE(TO_CHAR(:VCURRENTYEAR),'YYYY')
              ,'fmyear'),3*:VCURRENTQUARTER)-INTERVAL'+01 00:00:00' DAY(2) TO SECOND(0))
       filter(("USER_VISITS"."VISIT_DATE">=ADD_MONTHS(TRUNC(TO_DATE(TO_CHAR(:VCURRENTYEAR),'YYYY'),'fmyear'),3*(:VCURRENTQUARTER-1)) AND
              "USER_VISITS"."VISIT_DATE"<=ADD_MONTHS(TRUNC(TO_DATE(TO_CHAR(:VCURRENTYEAR),'YYYY'),'fmyear'),3*:VCURRENTQUARTER)-INTERVAL'+01 00:00:00' DAY(2)
              TO SECOND(0) AND "USER_VISITS"."PRODUCT_OID"="PRODUCT"."PRODUCT_OID"))

TKPROF Row Source Generation

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.01          0          0          0           0
Fetch        2     35.10      35.13          0      66343          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     35.10      35.14          0      66343          0           1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62 

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY (cr=66343 pr=0 pw=0 time=35132008 us)
      1   VIEW  (cr=66343 pr=0 pw=0 time=35131996 us)
      1    FILTER  (cr=66343 pr=0 pw=0 time=35131991 us)
      1     SORT ORDER BY (cr=66343 pr=0 pw=0 time=35131936 us)
     27      SORT GROUP BY NOSORT (cr=66343 pr=0 pw=0 time=14476309 us)
  12711       NESTED LOOPS  (cr=66343 pr=0 pw=0 time=22921810 us)
     77        TABLE ACCESS BY INDEX ROWID PRODUCT (cr=44 pr=0 pw=0 time=3674 us)
     77         INDEX FULL SCAN PRODUCT_PRODDESCHAND_UNQ (cr=1 pr=0 pw=0 time=827 us)(object id 52355)
  12711        INDEX FULL SCAN USER_VISITS#PK (cr=66299 pr=0 pw=0 time=44083746 us)(object id 52949)

However when I run the query with an ALL_ROWS hint I receive this explain plan (reasoning for this can be found here Jonathan’s Lewis’ response: http://www.freelists.org/post/oracle-l/ORDER-BY-and-first-rows-10-madness,4):

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |     1 |    39 |   223  (25)| 00:00:03 |
|*  1 |  COUNT STOPKEY             |                |       |       |            |          |
|   2 |   VIEW                     |                |     1 |    39 |   223  (25)| 00:00:03 |
|*  3 |    FILTER                  |                |       |       |            |          |
|   4 |     SORT ORDER BY          |                |     1 |    49 |   223  (25)| 00:00:03 |
|   5 |      HASH GROUP BY         |                |     1 |    49 |   223  (25)| 00:00:03 |
|*  6 |       HASH JOIN            |                |   490 | 24010 |   222  (24)| 00:00:03 |
|*  7 |        TABLE ACCESS FULL   | PRODUCT        |    77 |  2849 |     2   (0)| 00:00:01 |
|*  8 |        INDEX FAST FULL SCAN| USER_VISITS#PK |   490 |  5880 |   219  (24)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(ADD_MONTHS(TRUNC(TO_DATE(:VCURRENTYEAR,'YYYY'),'fmyear'),3*(TO_NUMBER(:
              VCURRENTQUARTER)-1))<=ADD_MONTHS(TRUNC(TO_DATE(:VCURRENTYEAR,'YYYY'),'fmyear'),3*TO_N
              UMBER(:VCURRENTQUARTER))-INTERVAL'+01 00:00:00' DAY(2) TO SECOND(0))
   6 - access("USER_VISITS"."PRODUCT_OID"="PRODUCT"."PRODUCT_OID")
   7 - filter("PRODUCT"."PRODUCT_DESC"<>'Home')
   8 - filter("USER_VISITS"."VISIT_DATE">=ADD_MONTHS(TRUNC(TO_DATE(:VCURRENTYEAR,'YYY
              Y'),'fmyear'),3*(TO_NUMBER(:VCURRENTQUARTER)-1)) AND
              "USER_VISITS"."VISIT_DATE"<=ADD_MONTHS(TRUNC(TO_DATE(:VCURRENTYEAR,'YYYY'),'fmyear'),
              3*TO_NUMBER(:VCURRENTQUARTER))-INTERVAL'+01 00:00:00' DAY(2) TO SECOND(0))

And the TKPROF Row Source Generation:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.51       0.51          0        907          0          27
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.51       0.51          0        907          0          27

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62 

Rows     Row Source Operation
-------  ---------------------------------------------------
     27  FILTER  (cr=907 pr=0 pw=0 time=513472 us)
     27   SORT ORDER BY (cr=907 pr=0 pw=0 time=513414 us)
     27    HASH GROUP BY (cr=907 pr=0 pw=0 time=512919 us)
  12711     HASH JOIN  (cr=907 pr=0 pw=0 time=641130 us)
     77      TABLE ACCESS FULL PRODUCT (cr=5 pr=0 pw=0 time=249 us)
  22844      INDEX FAST FULL SCAN USER_VISITS#PK (cr=902 pr=0 pw=0 time=300356 us)(object id 52949)

The query with the ALL_ROWS hint returns data instantly, while the other one takes about 70 times as long.

Interestingly enough BOTH queries generate plans with estimates that are WAY off. The first plan is estimating 2 rows, while the second plan is estimating 490 rows. However the real number of rows is correctly reported in the Row Source Generation as 12711 (after the join operation).

TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
USER_VISITS                        196044       1049

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR LAST_ANALYZED
------------------------------ ---------- ----------- ------------- ----------------- -------------------
USER_VISITS#PK                          2         860        196002          57761 07/24/2009 13:17:59

COLUMN_NAME                    NUM_DISTINCT LOW_VALUE            HIGH_VALUE                                 DENSITY     NUM_NULLS HISTOGRAM
------------------------------ ------------ -------------------- -------------------- -------------------------------- ---------- ---------------
VISIT_DATE                           195900 786809010E0910       786D0609111328                      .0000051046452272          0 NONE

I don’t know how the first one is estimating 2 rows, but I can compute the second’s cardinality estimates by assuming a 5% selectivity for the TO_DATE() functions:

SQL > SELECT ROUND(0.05*0.05*196044) FROM DUAL;

ROUND(0.05*0.05*196044)
-----------------------
                    490

However, removing the bind variables (and clearing the shared pool), does not change the cardinality estimates at all. I would like to avoid hinting this plan if possible and that is why I’m looking for advice. I also have a followup question.

Per one of Jonathan Lewis’ articles, adding a ROWNUM predicate in the WHERE clause causes the optimizer to switch to a FIRST ROWS n optimizer mode, as if a /*+ FIRST_ROWS(n) */ hint were provided for the query. To work around this issue, changing the query to use a ROW_NUMBER analytic function rather than the ROWNUM pseudo column could prevent the optimizer from switching to first rows optimization.

Another question you might ask is if bind peeking enabled? It is enabled by default on Oracle 10g (and 9i). Are there automatically generated histograms on any of the columns in the WHERE clause? Do any of the tables involved have out of date statistics (or cases where the statistics as of 10 PM do not match the statistics as of the current time)? Is it possible to examine 10053 trace files for the SQL statement?

Testing with either a CARDINALITY or OPT_ESTIMATE hint to see if the execution plan changes dramatically to improve performance is another possible area of investigation. The question then becomes whether this will be sufficient to over-rule the first rows optimizer so that it does not use an index access which will avoid a sort.

There may also be a problem with the accuracy of the EXPLAIN PLAN in SQL*Plus due to the bind variables in the SQL statement. If I recall correctly, explain plan does not peek at bind variables, as happens when the query is actually executed (but that may have been a 9i problem, where 10g uses DBMS_XPLAN for explain plans), and that SQL*Plus passes in numeric bind variables as if they are VARCHAR2. (I have an example in 11.1.0.7 which shows that this might not be the case).

The original poster provided a great test case to demonstrate what is happening:

DDL/DML

CREATE TABLE TEST_AGGR
(
 ID NUMBER
,  VAL DATE
,  PAD VARCHAR2(500)
,  ID2 NUMBER
);

CREATE TABLE TEST_JOIN
(
        ID2 NUMBER
,       PAD VARCHAR2(5)
,       PROD VARCHAR2(15)
);

CREATE SEQUENCE TEST_SEQUENCE;

INSERT INTO TEST_AGGR
SELECT TEST_SEQUENCE.NEXTVAL
, SYSDATE + DBMS_RANDOM.VALUE*100
, RPAD('X',499,'0')
, MOD(ROWNUM,78)
FROM DUAL
CONNECT BY ROWNUM <= 195000;

INSERT INTO TEST_JOIN
SELECT  ID2
,  
 (CASE
  WHEN ROWNUM IN (3,5) THEN 'Y'
  ELSE RPAD('X',4,'0')
 END)
, (CASE
  WHEN ROWNUM = 1 THEN 'test_prod'
  WHEN ROWNUM BETWEEN 2 AND 3 THEN 'DUPLICATE A1'
  WHEN ROWNUM BETWEEN 4 AND 5 THEN 'A DUPLICATE A2'
  ELSE UPPER(DBMS_RANDOM.STRING('A',15)) END)
FROM  (SELECT DISTINCT ID2 FROM TEST_AGGR);

CREATE INDEX TEST_AGGR_IDX ON TEST_AGGR(ID,VAL,ID2);
CREATE UNIQUE INDEX TEST_JOIN_UIDX ON TEST_JOIN(PROD,PAD);

 Statistics Gathering

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'TEST_AGGR',cascade=>true);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'TEST_JOIN',cascade=>true);

Bind Variable Setup

var a VARCHAR2(25);
var b VARCHAR2(25);

exec :a := '08/01/2009';
exec :b := '08/27/2009';

Non-hinted Query

SELECT /*repeatable_case_nohint*/ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1

 Non-hinted XPLAN Output

SQL_ID  9smb486cd31b2, child number 0
-------------------------------------
SELECT /*repeatable_case_nohint*/ PROD FROM (  SELECT TEST_JOIN.PROD  , COUNT(*) CNT  FROM TEST_AGGR  JOIN TEST_JOIN ON
TEST_JOIN.ID2 = TEST_AGGR.ID2  WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')  AND PROD <> 'TEST_PROD'
GROUP BY TEST_JOIN.PROD  ORDER BY CNT DESC ) WHERE ROWNUM <= 1

Plan hash value: 301623847

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                |      1 |        |      1 |00:00:07.33 |   57334 |       |       |          |
|   2 |   VIEW                            |                |      1 |      2 |      1 |00:00:07.33 |   57334 |       |       |          |
|*  3 |    FILTER                         |                |      1 |        |      1 |00:00:07.33 |   57334 |       |       |          |
|   4 |     SORT ORDER BY                 |                |      1 |      2 |      1 |00:00:07.33 |   57334 | 73728 | 73728 |          |
|   5 |      SORT GROUP BY NOSORT         |                |      1 |      2 |     76 |00:00:07.33 |   57334 |       |       |          |
|   6 |       NESTED LOOPS                |                |      1 |      2 |  50637 |00:00:07.24 |   57334 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| TEST_JOIN      |      1 |     77 |     78 |00:00:00.01 |    2 |          |       |          |
|*  8 |         INDEX FULL SCAN           | TEST_JOIN_UIDX |      1 |      1 |     78 |00:00:00.01 |    1 |          |       |          |
|*  9 |        INDEX FULL SCAN            | TEST_AGGR_IDX  |     78 |      2 |  50637 |00:00:07.15 |   57332 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   8 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   9 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))

Hinted Query

SELECT /*+ ALL_ROWS repeatable_case_hint*/ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1

Hinted XPLAN Output

SQL_ID  283wx8s0d04kn, child number 0
-------------------------------------
SELECT /*+ ALL_ROWS repeatable_case_hint*/ PROD FROM (  SELECT TEST_JOIN.PROD  , COUNT(*) CNT  FROM TEST_AGGR  JOIN
TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2  WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')  AND
PROD <> 'TEST_PROD'  GROUP BY TEST_JOIN.PROD  ORDER BY CNT DESC ) WHERE ROWNUM <= 1

Plan hash value: 828673535

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY             |               |      1 |        |      1 |00:00:00.34 |     772 |       |       |          |
|   2 |   VIEW                     |               |      1 |     75 |      1 |00:00:00.34 |     772 |       |       |          |
|*  3 |    FILTER                  |               |      1 |        |      1 |00:00:00.34 |     772 |       |       |          |
|   4 |     SORT ORDER BY          |               |      1 |     75 |      1 |00:00:00.34 |     772 | 73728 | 73728 |          |
|   5 |      HASH GROUP BY         |               |      1 |     75 |     76 |00:00:00.34 |     772 |   808K|   808K|     1/0/0|
|*  6 |       HASH JOIN            |               |      1 |  50069 |  50637 |00:00:00.26 |     772 |   842K|   842K|     1/0/0|
|*  7 |        TABLE ACCESS FULL   | TEST_JOIN     |      1 |     77 |     78 |00:00:00.01 |       3 |       |       |          |
|*  8 |        INDEX FAST FULL SCAN| TEST_AGGR_IDX |      1 |  50737 |  50637 |00:00:00.10 |     769 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   6 - access("TEST_JOIN"."ID2"="TEST_AGGR"."ID2")
   7 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   8 - filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY')))

Observations

The cardinality estimates in the non-hinted plan are way, way off. However, in the hinted plan the cardinalities are nearly correct. In the non-hinted plan the cardinality estimates are incorrect at steps 8 and 9.

I tried to work out the cardinality estimate for the access of TEST_JOIN_UIDX by first calculating the “Effective Index Selectivity” based on the following data:

TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS
------------------------------ ------------------------------ ------------ ---------- ----------
TEST_JOIN                      PROD                                     76 .013157895          0

Since in step 8 of the plan we are filtering based on a != condition we want ALL the rows except that one. Since the number of rows in this roughly equal the number of distinct values I calculated a rough effective index selectivity estimate of:

(1-0.13157895)*78 = 67.7368419

This is right in line with a logical understanding of the data.

Based on the following data:

TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------------------------ ------------
TEST_AGGR                      VAL                                  194588

I would expect the following cardinality out of the TEST_AGGR_IDX index:

SQL > SELECT (TO_DATE(:b,'MM/DD/YYYY')-TO_DATE(:a,'MM/DD/YYYY'))/(MAX(VAL)-MIN(VAL))+2/194588 AS SELECTIVITY FROM TEST_AGGR;

SELECTIVITY
-----------
 .260014672

SQL > SELECT .260014672*195128 AS CARDINALITY FROM DUAL;

CARDINALITY
-----------
 50736.1429

This almost exactly matches the HINTED plan (error due to rounding).

Based on the estimates in the non-hinted plan I would have expected a 5% * 5% to occur but my calculations don’t match:

SQL > select .05*.05*(1/78)*195128 AS CARDINALITY from dual;

CARDINALITY
-----------
 6.25410256

After playing with the test case for a little while, I recalled seeing this behavior in the past with ROWNUM in the WHERE clause.  What is happening is that the ROWNUM<=1 predicate is being pushed into the inline view. Take a look (at the E-Rows column):

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD <> 'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                |      1 |        |      1 |00:00:02.84 |   57178 |       |       |          |
|   2 |   VIEW                            |                |      1 |      2 |      1 |00:00:02.84 |   57178 |       |       |          |
|*  3 |    FILTER                         |                |      1 |        |      1 |00:00:02.84 |   57178 |       |       |          |
|   4 |     SORT ORDER BY                 |                |      1 |      2 |      1 |00:00:02.84 |   57178 | 73728 | 73728 |          |
|   5 |      SORT GROUP BY NOSORT         |                |      1 |      2 |     76 |00:00:02.80 |   57178 |       |       |          |
|   6 |       NESTED LOOPS                |                |      1 |      2 |  50596 |00:00:03.04 |   57178 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| TEST_JOIN      |      1 |     77 |     78 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX FULL SCAN           | TEST_JOIN_UIDX |      1 |      1 |     78 |00:00:00.01 |       1 |       |       |          |
|*  9 |        INDEX FULL SCAN            | TEST_AGGR_IDX  |     78 |      2 |  50596 |00:00:02.88 |   57176 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   8 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   9 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 2;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                |      1 |        |      2 |00:00:02.70 |   57178 |       |       |          |
|   2 |   VIEW                            |                |      1 |      3 |      2 |00:00:02.70 |   57178 |       |       |          |
|*  3 |    FILTER                         |                |      1 |        |      2 |00:00:02.70 |   57178 |       |       |          |
|   4 |     SORT ORDER BY                 |                |      1 |      3 |      2 |00:00:02.70 |   57178 |  9216 |  9216 | 8192  (0)|
|   5 |      SORT GROUP BY NOSORT         |                |      1 |      3 |     76 |00:00:02.73 |   57178 |       |       |          |
|   6 |       NESTED LOOPS                |                |      1 |      3 |  50596 |00:00:02.88 |   57178 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| TEST_JOIN      |      1 |     77 |     78 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX FULL SCAN           | TEST_JOIN_UIDX |      1 |      1 |     78 |00:00:00.01 |       1 |       |       |          |
|*  9 |        INDEX FULL SCAN            | TEST_AGGR_IDX  |     78 |      3 |  50596 |00:00:02.73 |   57176 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=2)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   8 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   9 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 3;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                |      1 |        |      3 |00:00:02.71 |   57178 |       |       |          |
|   2 |   VIEW                            |                |      1 |      4 |      3 |00:00:02.71 |   57178 |       |       |          |
|*  3 |    FILTER                         |                |      1 |        |      3 |00:00:02.71 |   57178 |       |       |          |
|   4 |     SORT ORDER BY                 |                |      1 |      4 |      3 |00:00:02.71 |   57178 |  9216 |  9216 | 8192  (0)|
|   5 |      SORT GROUP BY NOSORT         |                |      1 |      4 |     76 |00:00:02.67 |   57178 |       |       |          |
|   6 |       NESTED LOOPS                |                |      1 |      4 |  50596 |00:00:02.93 |   57178 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| TEST_JOIN      |      1 |     77 |     78 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX FULL SCAN           | TEST_JOIN_UIDX |      1 |      1 |     78 |00:00:00.01 |       1 |       |       |          |
|*  9 |        INDEX FULL SCAN            | TEST_AGGR_IDX  |     78 |      4 |  50596 |00:00:02.83 |   57176 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=3)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   8 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   9 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 50;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                |      1 |        |     50 |00:00:02.72 |   57178 |       |       |          |
|   2 |   VIEW                            |                |      1 |     51 |     50 |00:00:02.72 |   57178 |       |       |          |
|*  3 |    FILTER                         |                |      1 |        |     50 |00:00:02.72 |   57178 |       |       |          |
|   4 |     SORT ORDER BY                 |                |      1 |     51 |     50 |00:00:02.72 |   57178 |  9216 |  9216 | 8192  (0)|
|   5 |      SORT GROUP BY NOSORT         |                |      1 |     51 |     76 |00:00:02.65 |   57178 |       |       |          |
|   6 |       NESTED LOOPS                |                |      1 |     51 |  50596 |00:00:02.88 |   57178 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| TEST_JOIN      |      1 |     77 |     78 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX FULL SCAN           | TEST_JOIN_UIDX |      1 |      1 |     78 |00:00:00.01 |       1 |       |       |          |
|*  9 |        INDEX FULL SCAN            | TEST_AGGR_IDX  |     78 |     51 |  50596 |00:00:02.78 |   57176 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=50)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   8 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   9 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 74;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY                    |                |      1 |        |   74 |00:00:02.71 |   57178 |       |       |          |
|   2 |   VIEW                            |                |      1 |     75 |   74 |00:00:02.71 |   57178 |       |       |          |
|*  3 |    FILTER                         |                |      1 |        |   74 |00:00:02.71 |   57178 |       |       |          |
|   4 |     SORT ORDER BY                 |                |      1 |     75 |   74 |00:00:02.71 |   57178 |  9216 |  9216 | 8192  (0)|
|   5 |      SORT GROUP BY NOSORT         |                |      1 |     75 |   76 |00:00:02.66 |   57178 |       |       |          |
|   6 |       NESTED LOOPS                |                |      1 |     75 |50596 |00:00:02.93 |   57178 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID| TEST_JOIN      |      1 |     77 |   78 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX FULL SCAN           | TEST_JOIN_UIDX |      1 |      1 |   78 |00:00:00.01 |       1 |       |       |          |
|*  9 |        INDEX FULL SCAN            | TEST_AGGR_IDX  |     78 |     75 |50596 |00:00:02.83 |   57176 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=74)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   8 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   9 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 75;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|*  1 |  COUNT STOPKEY          |               |      1 |        |     75 |00:00:00.06 |     741 |       |       |          |
|   2 |   VIEW                  |               |      1 |     75 |     75 |00:00:00.06 |     741 |       |       |          |
|*  3 |    FILTER               |               |      1 |        |     75 |00:00:00.06 |     741 |       |       |          |
|   4 |     SORT ORDER BY       |               |      1 |     75 |     75 |00:00:00.06 |     741 |  9216 |  9216 | 8192  (0)|
|   5 |      HASH GROUP BY      |               |      1 |     75 |     76 |00:00:00.06 |     741 |   963K|   963K| 2375K (0)|
|*  6 |       HASH JOIN         |               |      1 |  50074 |  50596 |00:00:00.30 |     741 |  1035K|  1035K| 1164K (0)|
|*  7 |        TABLE ACCESS FULL| TEST_JOIN     |      1 |     77 |     78 |00:00:00.01 |       7 |       |       |          |
|*  8 |        INDEX FULL SCAN  | TEST_AGGR_IDX |      1 |  50742 |  50596 |00:00:00.10 |     734 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=75)
   3 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   6 - access("TEST_JOIN"."ID2"="TEST_AGGR"."ID2")
   7 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   8 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY')))

Compare the above output with what follows from Oracle 11.2.0.1, with the following parameters set:

ALTER SESSION SET optimizer_index_caching=0;
ALTER SESSION SET optimizer_index_cost_adj=100;
ALTER SESSION SET TRACEFILE_IDENTIFIER='optimizer_test100';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |               |      1 |        |      1 |00:00:00.03 |     747 |       |       |          |
|*  1 |  COUNT STOPKEY             |               |      1 |        |      1 |00:00:00.03 |     747 |       |       |          |
|   2 |   VIEW                     |               |      1 |     75 |      1 |00:00:00.03 |     747 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY   |               |      1 |     75 |      1 |00:00:00.03 |     747 | 73728 | 73728 |          |
|   4 |     HASH GROUP BY          |               |      1 |     75 |     76 |00:00:00.03 |     747 |   964K|   964K| 2491K (0)|
|*  5 |      FILTER                |               |      1 |        |  44832 |00:00:00.06 |     747 |       |       |          |
|*  6 |       HASH JOIN            |               |      1 |  44262 |  44832 |00:00:00.05 |     747 |  1035K|  1035K| 1265K (0)|
|*  7 |        TABLE ACCESS FULL   | TEST_JOIN     |      1 |     77 |     78 |00:00:00.01 |       3 |       |       |          |
|*  8 |        INDEX FAST FULL SCAN| TEST_AGGR_IDX |      1 |  44852 |  44832 |00:00:00.02 |     744 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   5 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   6 - access("TEST_JOIN"."ID2"="TEST_AGGR"."ID2")
   7 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   8 - filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY')))

— 

ALTER SESSION SET optimizer_index_caching=90;
ALTER SESSION SET optimizer_index_cost_adj=30;
ALTER SESSION SET TRACEFILE_IDENTIFIER='optimizer_test30';

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |               |      1 |        |      1 |00:00:00.03 |     747 |       |       |          |
|*  1 |  COUNT STOPKEY             |               |      1 |        |      1 |00:00:00.03 |     747 |       |       |          |
|   2 |   VIEW                     |               |      1 |     75 |      1 |00:00:00.03 |     747 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY   |               |      1 |     75 |      1 |00:00:00.03 |     747 | 73728 | 73728 |          |
|   4 |     HASH GROUP BY          |               |      1 |     75 |     76 |00:00:00.03 |     747 |   964K|   964K| 2501K (0)|
|*  5 |      FILTER                |               |      1 |        |  44832 |00:00:00.06 |     747 |       |       |          |
|*  6 |       HASH JOIN            |               |      1 |  44262 |  44832 |00:00:00.05 |     747 |  1035K|  1035K| 1286K (0)|
|*  7 |        TABLE ACCESS FULL   | TEST_JOIN     |      1 |     77 |     78 |00:00:00.01 |       3 |       |       |          |
|*  8 |        INDEX FAST FULL SCAN| TEST_AGGR_IDX |      1 |  44852 |  44832 |00:00:00.02 |     744 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   5 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   6 - access("TEST_JOIN"."ID2"="TEST_AGGR"."ID2")
   7 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   8 - filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY')))

 

ALTER SESSION SET optimizer_index_caching=100;
ALTER SESSION SET optimizer_index_cost_adj=1;
ALTER SESSION SET TRACEFILE_IDENTIFIER='optimizer_test1';

 

SELECT /*+ gather_plan_statistics */ PROD
FROM
(
 SELECT TEST_JOIN.PROD
 , COUNT(*) CNT
 FROM TEST_AGGR
 JOIN TEST_JOIN ON TEST_JOIN.ID2 = TEST_AGGR.ID2
 WHERE VAL BETWEEN TO_DATE(:a,'MM/DD/YYYY') AND TO_DATE(:b,'MM/DD/YYYY')
 AND PROD  'TEST_PROD'
 GROUP BY TEST_JOIN.PROD
 ORDER BY CNT DESC
)
WHERE ROWNUM <= 1;

select * from TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |               |      1 |        |      1 |00:00:01.16 |   57034 |       |       |          |
|*  1 |  COUNT STOPKEY          |               |      1 |        |      1 |00:00:01.16 |   57034 |       |       |          |
|   2 |   VIEW                  |               |      1 |     75 |      1 |00:00:01.16 |   57034 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY|               |      1 |     75 |      1 |00:00:01.16 |   57034 | 73728 | 73728 |          |
|   4 |     HASH GROUP BY       |               |      1 |     75 |     76 |00:00:01.16 |   57034 |   964K|   964K| 2506K (0)|
|*  5 |      FILTER             |               |      1 |        |  44832 |00:00:01.17 |   57034 |       |       |          |
|   6 |       NESTED LOOPS      |               |      1 |  44262 |  44832 |00:00:01.16 |   57034 |       |       |          |
|*  7 |        TABLE ACCESS FULL| TEST_JOIN     |      1 |     77 |     78 |00:00:00.01 |       3 |       |       |          |
|*  8 |        INDEX FULL SCAN  | TEST_AGGR_IDX |     78 |    575 |  44832 |00:00:01.14 |   57031 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   5 - filter(TO_DATE(:A,'MM/DD/YYYY')<=TO_DATE(:B,'MM/DD/YYYY'))
   7 - filter("TEST_JOIN"."PROD"<>'TEST_PROD')
   8 - access("TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND "TEST_JOIN"."ID2"="TEST_AGGR"."ID2" AND
              "TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY'))
       filter(("TEST_AGGR"."VAL"<=TO_DATE(:B,'MM/DD/YYYY') AND "TEST_AGGR"."VAL">=TO_DATE(:A,'MM/DD/YYYY') AND
              "TEST_JOIN"."ID2"="TEST_AGGR"."ID2"))




SQL – ROW_NUMBER, MOD, Even Distribution

9 12 2009

December 9, 2009

A question appeared in the comp.databases.oracle.server Usenet group a couple years ago:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/41ee30267e646e10

Say I have a bunch of bowling players of different skill level as indicated by his avg_score in the table below.

I need to allot them into n teams (say 8), of equivalent strength on the TEAM level so no team ends up with mostly high-scorers and vic-versa.
(let’s say players may not be evenly divided into teams because n numbers are “sick”)

Is there a way to do to this ?

10gR2> create table players (id integer primary key, avg_score number, team_no integer);

10gR2> desc players
Name      Type
--------- -------
ID        INTEGER
AVG_SCORE NUMBER
TEAM_NO   INTEGER

 

10gR2> BEGIN
  2    FOR i IN 1..120
  3    LOOP
  4        INSERT INTO players (id, avg_score) VALUES(i,round(dbms_random.value(75,295)));
  5    END LOOP;
  6  END ;
  7  /

Needs work, but may be enough to get you started:

SELECT
  ID,
  AVG_SCORE,
  ROW_NUMBER() OVER (ORDER BY AVG_SCORE) RANKING,
  COUNT(*) OVER (PARTITION BY 1) ROWS_COUNT
FROM
  PLAYERS;

        ID  AVG_SCORE    RANKING ROWS_COUNT
---------- ---------- ---------- ----------
        74         78          1        120
        91         82          2        120
        95         83          3        120
        77         86          4        120
        61         87          5        120
        23         87          6        120
         1         90          7        120
        67         91          8        120
        62         97          9        120
        33         98         10        120
...
        88        271        111        120
        41        272        112        120
       104        274        113        120
        32        275        114        120
        36        275        115        120
        99        276        116        120
        71        277        117        120
        31        285        118        120
         3        286        119        120
       113        288        120        120

If we were to take the people at rank 1 and rank 120, they would have roughly the same average as the people at rank 2 and rank 119, and they would have roughly the same average as the people at rank 3 and 118, etc.  This does not work exactly as planned as the number of people must be evenly divisible by 2 * the number of groups, and this is not the case with 120 people and 8 groups.

We can have Oracle skip from 1 to 9 to 17 to … by using the MOD function, but we must recognize the mid-point so that we can switch the formula.

By sliding the above into an inline view, we can perform the analysis that is required.  I included three additional columns to help determine whether or not the formula is close:

SELECT
  ID,
  AVG_SCORE,
  DECODE(SIGN(RANKING-(ROWS_COUNT/2)),-1,MOD(RANKING-1,8)+1,((8-1)-MOD(RANKING-(8/2),8))+1) TEAM_NO,
  RANKING,
  SUM(AVG_SCORE) OVER (PARTITION BY DECODE(SIGN(RANKING-(ROWS_COUNT/2)),-1,MOD(RANKING-1,8)+1,((8-1)-MOD(RANKING-(8/2),8))+1)) TEAM_AVG,
  COUNT(*) OVER (PARTITION BY DECODE(SIGN(RANKING-(ROWS_COUNT/2)),-1,MOD(RANKING-1,8)+1,((8-1)-MOD(RANKING-(8/2),8))+1)) NUM_TEAM_MEMBERS
FROM
  (SELECT
    ID,
    AVG_SCORE,
    ROW_NUMBER() OVER (ORDER BY AVG_SCORE) RANKING,
    COUNT(*) OVER (PARTITION BY 1) ROWS_COUNT
  FROM
    PLAYERS)
ORDER BY
  RANKING;

        ID  AVG_SCORE    TEAM_NO    RANKING   TEAM_AVG NUM_TEAM_MEMBERS
---------- ---------- ---------- ---------- ---------- ----------------
        74         78          1          1       2603 15
        91         82          2          2       2602 15
        95         83          3          3       2592 15
        77         86          4          4       2709 15
        61         87          5          5       2701 15
        23         87          6          6       2690 15
         1         90          7          7       2686 15
        67         91          8          8       2689 15
        62         97          1          9       2603 15
        33         98          2         10       2602 15
        79         98          3         11       2592 15
       120        100          4         12       2709 15
         2        101          5         13       2701 15
        39        101          6         14       2690 15
        60        102          7         15       2686 15
       101        104          8         16       2689 15
...
        14        257          8        108       2689 15
        59        259          7        109       2686 15
        29        262          6        110       2690 15
        88        271          5        111       2701 15
        41        272          4        112       2709 15
       104        274          3        113       2592 15
        32        275          2        114       2602 15
        36        275          1        115       2603 15
        99        276          8        116       2689 15
        71        277          7        117       2686 15
        31        285          6        118       2690 15
         3        286          5        119       2701 15
       113        288          4        120       2709 15




SQL – ROW_NUMBER Analytical Function, Many to One Join

9 12 2009

December 9, 2009

A question appeared in the comp.databases.oracle.server Usenet group a couple years ago:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/deeb9d899c8e6376

I have a problem filtering my results which involves a many-to-many join. Below is a brief layout of the problem, but briefly, I have a test that can have multiple results and those results may also belong multiple tests (i have done the same test under a different id and I want to inherit the previous test result). I also have an event table that is required to keep track of all changes.

The problem I have is that I cannot filter my results so where a result belongs to 2 tests I don’t want the details where it belongs to the test in the query. If anyone can suggest a way of returning 2 lines instead of 3 I would be very grateful.

SELECT tr.test_acc, e.event_id, re.resultevent_id, re.result_id,
re2.resultevent_id, re2.event_id, tr2.test_acc
FROM TestRequest tr JOIN Event e ON tr.test_acc=e.test_acc
JOIN ResultEvent re ON e.event_id=re.event_id
JOIN ResultEvent re2 ON re.result_id=re2.result_id
JOIN Event e2 ON e2.event_id=re2.event_id
JOIN TestRequest tr2 ON e2.test_acc=tr2.test_acc
WHERE tr.test_acc=3418;

+----------+----------+----------------+-----------+----------------+----------+----------+
| test_acc | event_id | resultevent_id | result_id | resultevent_id | event_id | test_acc |
+----------+----------+----------------+-----------+----------------+----------+----------+
|     3418 |    42178 |           6345 |      6321 |           6345 |    42178 |     3418 |
|     3418 |    42179 |           6346 |      4126 |           4126 |    28004 |     2248 |
|     3418 |    42179 |           6346 |      4126 |           6346 |    42179 |     3418 |
+----------+----------+----------------+-----------+----------------+----------+----------+
 
3 rows in set (0.00 sec)

 

+-------------+
| TestRequest |
+-------------+
| test_acc    |
+-------------+
     1 |
       |
     * |
+-------------+
| Event       |
+-------------+
| test_acc    |
| event_id    |
+-------------+
     1 |
       |
     * |
+----------------+
| ResultEvent    |
+----------------+
| resultevent_id |
| event_id       |
| result_id      |
+----------------+

I think that I understand what you are trying to do.  I changed the table names slightly (added _) in the mock up:
First, the table creation:

CREATE TABLE TEST_REQUEST (
  TEST_ACC NUMBER(10));

CREATE TABLE EVENT (
  TEST_ACC NUMBER(10),
  EVENT_ID NUMBER(10));

CREATE TABLE RESULT_EVENT (
  RESULTEVENT_ID NUMBER(10),
  EVENT_ID NUMBER(10),
  RESULT_ID NUMBER(10));

As best I can tell, the data that is in the tables:

INSERT INTO TEST_REQUEST VALUES (3418);
INSERT INTO TEST_REQUEST VALUES (2248);

INSERT INTO EVENT VALUES (3418,42178);
INSERT INTO EVENT VALUES (3418,42179);
INSERT INTO EVENT VALUES (2248,28004);

INSERT INTO RESULT_EVENT VALUES (6345,42178,6321);
INSERT INTO RESULT_EVENT VALUES (4126,28004,4126);
INSERT INTO RESULT_EVENT VALUES (6346,42179,4126);

COMMIT;

I reformatted your query so that I could more easily see what is happening:

SELECT
  TR.TEST_ACC,
  E.EVENT_ID,
  RE.RESULTEVENT_ID,
  RE.RESULT_ID,
  RE2.RESULTEVENT_ID,
  RE2.EVENT_ID,
  TR2.TEST_ACC
FROM
  TEST_REQUEST TR,
  EVENT E,
  RESULT_EVENT RE,
  RESULT_EVENT RE2,
  EVENT E2,
  TEST_REQUEST TR2
WHERE
  TR.TEST_ACC=3418
  AND TR.TEST_ACC=E.TEST_ACC
  AND E.EVENT_ID=RE.EVENT_ID
  AND RE.RESULT_ID=RE2.RESULT_ID
  AND E2.EVENT_ID=RE2.EVENT_ID
  AND E2.TEST_ACC=TR2.TEST_ACC;

  TEST_ACC   EVENT_ID RESULTEVENT_ID  RESULT_ID RESULTEVENT_ID   EVENT_ID  TEST_ACC
---------- ---------- -------------- ---------- -------------- ---------- ----------
      3418      42179           6346       4126           4126      28004      2248
      3418      42179           6346       4126           6346      42179      3418
      3418      42178           6345       6321           6345      42178      3418

It appears that if you have 2 rows with the same RESULT_ID, you only want the first EVENT_ID, so you need to have some way of numbering the rows.  The ROW_NUMBER analytical function might be able to help:

SELECT
  TR.TEST_ACC,
  E.EVENT_ID,
  RE.RESULTEVENT_ID,
  RE.RESULT_ID,
  RE2.RESULTEVENT_ID,
  RE2.EVENT_ID,
  TR2.TEST_ACC,
  ROW_NUMBER() OVER (PARTITION BY RE.RESULT_ID ORDER BY RE2.EVENT_ID) RN
FROM
  TEST_REQUEST TR,
  EVENT E,
  RESULT_EVENT RE,
  RESULT_EVENT RE2,
  EVENT E2,
  TEST_REQUEST TR2
WHERE
  TR.TEST_ACC=3418
  AND TR.TEST_ACC=E.TEST_ACC
  AND E.EVENT_ID=RE.EVENT_ID
  AND RE.RESULT_ID=RE2.RESULT_ID
  AND E2.EVENT_ID=RE2.EVENT_ID
  AND E2.TEST_ACC=TR2.TEST_ACC;

  TEST_ACC   EVENT_ID RESULTEVENT_ID  RESULT_ID RESULTEVENT_ID   EVENT_ID TEST_ACC   RN
---------- ---------- -------------- ---------- -------------- ---------- ---------- --
      3418      42179           6346       4126           4126      28004      2248   1
      3418      42179           6346       4126           6346      42179      3418   2
      3418      42178           6345       6321           6345      42178      3418   1

Now, if we can filter out any of the rows that do not have RN=1, we may have a usable solution.  By sliding the above into an inline view (and adding column aliases as necessary in the inline view), we can add a WHERE clause to return only those rows with RN=1:

SELECT
  TEST_ACC,
  EVENT_ID,
  RESULTEVENT_ID,
  RESULT_ID,
  RESULTEVENT_ID2,
  EVENT_ID2,
  TEST_ACC2
FROM
  (SELECT
    TR.TEST_ACC,
    E.EVENT_ID,
    RE.RESULTEVENT_ID,
    RE.RESULT_ID,
    RE2.RESULTEVENT_ID RESULTEVENT_ID2,
    RE2.EVENT_ID EVENT_ID2,
    TR2.TEST_ACC TEST_ACC2,
    ROW_NUMBER() OVER (PARTITION BY RE.RESULT_ID ORDER BY RE2.EVENT_ID) RN
  FROM
    TEST_REQUEST TR,
    EVENT E,
    RESULT_EVENT RE,
    RESULT_EVENT RE2,
    EVENT E2,
    TEST_REQUEST TR2
  WHERE
    TR.TEST_ACC=3418
    AND TR.TEST_ACC=E.TEST_ACC
    AND E.EVENT_ID=RE.EVENT_ID
    AND RE.RESULT_ID=RE2.RESULT_ID
    AND E2.EVENT_ID=RE2.EVENT_ID
    AND E2.TEST_ACC=TR2.TEST_ACC)
WHERE
  RN=1;

  TEST_ACC   EVENT_ID RESULTEVENT_ID  RESULT_ID RESULTEVENT_ID2  EVENT_ID2 TEST_ACC2
---------- ---------- -------------- ---------- --------------- ---------- ----------
      3418      42179           6346       4126            4126      28004      2248
      3418      42178           6345       6321            6345      42178      3418




SQL – ROW_NUMBER – Retrieving Rows from the Midpoint

9 12 2009

December 9, 2009

A question appeared in the comp.databases.oracle.server Usenet group a couple of years ago:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/a8314487f8c414ac

I have a question about rownum which is not mentioned.

Lets say that I want to extract a 3 records out of a result where the middle one is the query.
For example, I have records:

ID      Value 
--      ------
1       Hello 
2       Hi 
3       Wow 
4       Shrek 
5       Shus 
6       What?

And I need a query that by receiving ID, it will extract the record with the id and the previous one to that and the next one. For example for selecting ID=4, it will return records: 3,4,5.

Is there a way to do it in Oracle?

But the ids are not seqeuntial – it can verify from each record, for example:

ID      Value 
225     Hello 
228     Hi 
258     Wow 
240     Shrek 
259     Shus 
230     What?

 I offered the following:

Note: LAG and LEAD will collapse the result into a single row.  If that is not desired, you will need a different approach:
The set up:

CREATE TABLE T1 ( 
  C1 NUMBER(12), 
  C2 VARCHAR2(20)); 
 
INSERT INTO T1 VALUES (1,'Hello'); 
INSERT INTO T1 VALUES (2,'Hi'); 
INSERT INTO T1 VALUES (3,'Wow'); 
INSERT INTO T1 VALUES (4,'Shrek'); 
INSERT INTO T1 VALUES (5,'Shus'); 
INSERT INTO T1 VALUES (6,'What?');

A simple query using the ROW_NUMBER analytical function:

SELECT 
  C1, 
  C2, 
  ROW_NUMBER() OVER (ORDER BY C1) RN 
FROM 
  T1; 
 
        C1 C2                           RN 
---------- -------------------- ---------- 
         1 Hello                         1 
         2 Hi                            2 
         3 Wow                           3 
         4 Shrek                         4 
         5 Shus                          5 
         6 What?                         6

Sliding the above into an inline view to retrieve only those on either side of RN=4:

SELECT 
  T.C1, 
  T.C2 
FROM 
  (SELECT 
    C1, 
    C2, 
    ROW_NUMBER() OVER (ORDER BY C1) RN 
  FROM 
    T1) T 
WHERE 
  T.RN BETWEEN (4 -1) AND (4 +1);
        C1 C2 
---------- ------ 
         3 Wow 
         4 Shrek 
         5 Shus

But, the above is not exactly what you need, unless C1 (your ID column) always starts at 1 and incements by 1.  Essentially listing the inline view twice with a join fixes the problem:

SELECT 
  T.C1, 
  T.C2 
FROM 
  (SELECT 
    C1, 
    C2, 
    ROW_NUMBER() OVER (ORDER BY C1) RN 
  FROM 
    T1) T, 
  (SELECT 
    C1, 
    ROW_NUMBER() OVER (ORDER BY C1) RN 
  FROM 
    T1) T2 
WHERE 
  T2.C1=4 
  AND T.RN BETWEEN (T2.RN -1) AND (T2.RN +1); 
 
        C1 C2 
---------- ----- 
         5 Shus 
         4 Shrek 
         3 Wow

A quick test:

DELETE FROM 
  T1 
WHERE 
  C1=3;

The first method results in:

        C1 C2 
---------- ----- 
         4 Shrek 
         5 Shus 
         6 What?

The second method results in:

        C1 C2 
---------- ----- 
         5 Shus 
         4 Shrek 
         2 Hi

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A couple other people in the thread offered suggestions, including this one by Martin T.:
Maybe Analytic Functions can help:

select * from ( 
SELECT some_table.*,
 lead(ID) over (order by date) as LEAD_ID,
 lag(id) over (order by date) as LAG_ID 
from some_table 
) some_table_plus 
where ID      = :search_id 
    or LEAD_ID = :search_id 
    or LAG_ID  = :search_id 
order by date