Everything or Nothing in SQL

May 23, 2013

The following question recently came through an ERP mailing list (significantly rephrased):

I would like to use the Microsoft Query tool in Microsoft Excel to extract records from the ERP database.  I would like the list of parent records to be retrieved into Excel when all of the specified child records belong to the parent.

That sounds as though it would be an easy request.  The OP in the ERP mailing list later followed up with the following:

“If I may ask for another variation to pick your brain….

How would I rewrite this to show ONLY the work order(s) that have all of the 3 Parts, but instead of showing all of them, show only the ones that don’t include a specific other part?

For example, Include all the work order retrieved from your query, but then subtract those work orders that don’t contain a part or parts in another separate list.

I.E. All order containing parts ('147555','227643','155328') but not containing ANY of these parts ('12345', '456789')”

In the above follow up, the term “work order(s)” refers to the “parent records” that were previously mentioned, and the term “parts” refers to the “child records” that were previously mentioned.  Below is a graphical mock-up, with the light-green colored rectangles representing the “parts”, and the white colored rectangle at the bottom left representing the “work order”.  The other white colored rectangles represent subassemblies that belong to the “work order”, and the cyan colored rectangles represent the operations that utilize the “parts”.  Through normalization, the white colored rectangles represent rows in one database table (WORK_ORDER – the “parent” table), the cyan colored rectangles represent rows in another table (OPERATION – the “child” table), and the light-green colored rectangles represent rows in a third table (REQUIREMENT – the “grandchild” table).

(Figure: EverythingOrNothing – mock-up of the “work order”, subassemblies, operations, and “parts” described above)

The first request still seems to be easy to solve, even though there are three tables that are potentially involved.  Let’s simplify the problem further and just focus on the “grandchild” table.  Let’s create a small version of that table and populate the table with some sample rows:

CREATE TABLE T1 (
  WORKORDER_TYPE VARCHAR2(1),
  WORKORDER_BASE_ID VARCHAR2(15),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SUB_ID VARCHAR2(3),
  PIECE_NO NUMBER,
  PART_ID VARCHAR2(15),
  QTY NUMBER(12,4));

INSERT INTO
  T1
SELECT
  'W' WORKORDER_TYPE,
  TRIM(TO_CHAR(TRUNC(ROWNUM / 100 + 1)*1000,'00000')) || 'W' WORKORDER_BASE_ID,
  TO_CHAR(MOD(TRUNC(ROWNUM / 25 + 1), 999) + 1) WORKORDER_LOT_ID,
  TO_CHAR(TRUNC(MOD(ROWNUM - 1, 25) / 5) + 1) WORKORDER_SUB_ID,
  MOD(ROWNUM - 1, 25) + 1 PIECE_NO,
  CHR(65 + MOD(ROWNUM - 1, 26)) || CHR(65 + MOD(ROWNUM - 1, 15)) || MOD(ROWNUM - 1, 10)  PART_ID,
  TRUNC(DBMS_RANDOM.VALUE*100) + 1 QTY
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

In the above, the combination of the WORKORDER_BASE_ID column and the WORKORDER_LOT_ID column represent the “work order” that was mentioned by the OP.  The WORKORDER_SUB_ID column identifies the subassembly number (answering the question of to which of the white colored rectangles does the “part” belong).  The PIECE_NO column is used to uniquely identify the rows that belong to a specific subassembly number (that is the final column in the REQUIREMENT table’s primary key).  The PART_ID column is the specific “part” that is attached to a specific “work order”.  The QTY column indicates the number of the “part” that is required by the “work order” – with the sample data, this is the only column that will differ from one execution of the table creation script to the next execution.
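
Before continuing, we might take a quick peek at a handful of the generated rows (remember that the QTY values are random, so those values will differ from one execution of the script to the next):

SELECT
  *
FROM
  T1
WHERE
  ROWNUM<=5;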

The sample data is a bit too consistent; however, we will ignore any issues that consistency may present for now.  Let’s execute a simple SQL statement that indicates how many “parts” from the following list appear in the “work orders”: MI8, NJ9, KG6 (note the use of DISTINCT – a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, and we do not want to repeatedly count those “parts”):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6');

W WORKORDER_BASE_ WOR WOR PART_ID                  C
- --------------- --- --- --------------- ----------
W 01000W          3   3   KG6                      3
W 01000W          3   3   MI8                      3
W 01000W          3   3   NJ9                      3
W 05000W          19  1   KG6                      3
W 05000W          19  1   MI8                      3
W 05000W          19  1   NJ9                      3
W 09000W          34  4   KG6                      3
W 09000W          34  4   MI8                      3
W 09000W          34  4   NJ9                      3

Only 9 rows returned, with three different WORKORDER_BASE_ID, WORKORDER_LOT_ID combinations.  For a moment, let’s ignore that the three “parts” appear in all three “work orders”.  If we wanted a list of those “work orders” where _all_ three of the “parts” are present, and not just _one or more_ of the three “parts”, we could simply slide the above SQL statement into an inline view and specify that our C generated column must be equal to (or greater than – the use of greater than will be explained later) 3:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9
W 09000W          34  4   KG6
W 09000W          34  4   MI8
W 09000W          34  4   NJ9

While trying to determine the OP’s business case for the requests, I wondered if the OP was attempting to replace a specific “part” with another “part” within the ERP package, and simply required a report to understand the progress of that manual “part” replacement procedure in the ERP package.  Let’s move forward with this assumption by replacing a row containing a specific PART_ID value with a new PART_ID value, but only for a single “work order”:

DELETE FROM
  T1
WHERE
  WORKORDER_TYPE='W'
  AND WORKORDER_BASE_ID='09000W'
  AND WORKORDER_LOT_ID='34'
  AND PART_ID='NJ9';

1 row deleted.

INSERT INTO T1 VALUES (
  'W',
  '09000W',
  '34',
  '4',
  10,
  'REPLACEMENT-NJ9',
  10);

1 row created.

Let’s check the result of the SQL statement that we were in the process of developing:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9

Now, there are only six rows returned from two “work orders” – the third “work order” is no longer included in the query results.

With the use of the C >= n syntax, we are able to find “work orders” with at least n of the specified “parts”.  So, if we were interested in finding cases where at least two of the three listed “parts” are attached to a “work order”, we just need to change the 3 to 2:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=2
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   NJ9
W 05000W          19  1   MI8
W 09000W          34  4   KG6
W 09000W          34  4   MI8

The third “work order” is again included in the query resultset.

Other than putting the results into Microsoft Excel, the sample query satisfies the first request of the OP.  If we were not interested in returning the WORKORDER_SUB_ID and PART_ID columns, we could have simply used the following SQL statement:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(*)>=3;

W WORKORDER_BASE_ WOR
- --------------- ---
W 05000W          19
W 01000W          3
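
One caution regarding the COUNT(*) version: as noted earlier, a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, in which case COUNT(*) could reach 3 without all three distinct “parts” being present.  A variant using COUNT(DISTINCT T.PART_ID) guards against that possibility (with this sample data the output is identical):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(DISTINCT T.PART_ID)>=3;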

Now to work on the second request, only retrieving the “work orders” when all of a specified list of “parts” are attached to the “work order” and none of another list of “parts” are attached to the “work order”.  As a starting point, we will re-use the inline view from the previous solution.  The IN list will be modified to include the “parts” that must be attached to the “work order”, as well as the “parts” that must not be attached to the “work order”.  The COUNT analytic function must be modified to include either a DECODE or CASE structure to divide the “parts” into the “must” and “must not” lists so that a count of each may be obtained:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9');

W WORKORDER_BASE_ WOR WOR PART_ID                  C         NC
- --------------- --- --- --------------- ---------- ----------
W 01000W          3   3   KG6                      2          0
W 01000W          3   3   MI8                      2          0
W 05000W          19  1   KG6                      2          0
W 05000W          19  1   MI8                      2          0
W 09000W          34  4   KG6                      2          1
W 09000W          34  4   MI8                      2          1
W 09000W          34  4   REPLACEMENT-NJ9          2          1

To satisfy the OP’s second request, we are interested in those rows where the C column value is (at least) 2 and the NC column value is 0.  Just as before, we will slide the above SQL statement into an inline view and add the specifications for the C and NC generated columns:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=0
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 05000W          19  1   KG6
W 05000W          19  1   MI8

The above shows those “work orders” where the NJ9 “part” had not yet been replaced with the REPLACEMENT-NJ9 “part”.  If desired, we are also able to easily modify the OP’s second request to see the “work orders” where the “part” was already replaced just by changing NC=0 to NC=1:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=1
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- ---------------
W 09000W          34  4   KG6
W 09000W          34  4   REPLACEMENT-NJ9
W 09000W          34  4   MI8

What other related information are we able to return?  What about all “parts” (the full bill of material) for each of the work orders that match the OP’s second request?  That modification is simple: just slide the majority of the above SQL statement into a subquery (found in the first WHERE clause):

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=0)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 01000W          3   1            1 ZK5                     48
W 01000W          3   1            2 AL6                     60
W 01000W          3   1            3 BM7                     42
W 01000W          3   1            4 CN8                     26
W 01000W          3   1            5 DO9                     49
W 01000W          3   2            6 EA0                     66
W 01000W          3   2            7 FB1                     39
W 01000W          3   2            8 GC2                     57
W 01000W          3   2            9 HD3                     42
W 01000W          3   2           10 IE4                     52
W 01000W          3   3           11 JF5                     16
W 01000W          3   3           12 KG6                     61
W 01000W          3   3           13 LH7                     39
W 01000W          3   3           14 MI8                     75
W 01000W          3   3           15 NJ9                     94
W 01000W          3   4           16 OK0                     67
W 01000W          3   4           17 PL1                     12
W 01000W          3   4           18 QM2                     41
W 01000W          3   4           19 RN3                     97
W 01000W          3   4           20 SO4                     15
W 01000W          3   5           21 TA5                     54
W 01000W          3   5           22 UB6                     59
W 01000W          3   5           23 VC7                     23
W 01000W          3   5           24 WD8                      7
W 01000W          3   5           25 YJ4                     22
W 05000W          19  1            1 JF5                     31
W 05000W          19  1            2 KG6                     42
W 05000W          19  1            3 LH7                     37
W 05000W          19  1            4 MI8                     85
W 05000W          19  1            5 NJ9                    100
W 05000W          19  2            6 OK0                     43
W 05000W          19  2            7 PL1                     58
W 05000W          19  2            8 QM2                     76
W 05000W          19  2            9 RN3                     66
W 05000W          19  2           10 SO4                     75
W 05000W          19  3           11 TA5                     50
W 05000W          19  3           12 UB6                     55
W 05000W          19  3           13 VC7                     18
W 05000W          19  3           14 WD8                     33
W 05000W          19  3           15 XE9                      8
W 05000W          19  4           16 YF0                     86
W 05000W          19  4           17 ZG1                      7
W 05000W          19  4           18 AH2                     25
W 05000W          19  4           19 BI3                     38
W 05000W          19  4           20 CJ4                     34
W 05000W          19  5           21 DK5                     88
W 05000W          19  5           22 EL6                     91
W 05000W          19  5           23 FM7                     52
W 05000W          19  5           24 GN8                     71
W 05000W          19  5           25 IE4                     55

Just as before, we are able to see all “parts” for the “work orders” that already have the replacement part, just by changing the NC=0 to NC=1:

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=1)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 09000W          34  1            1 UF0                     80
W 09000W          34  1            2 VG1                     75
W 09000W          34  1            3 WH2                     75
W 09000W          34  1            4 XI3                     77
W 09000W          34  1            5 YJ4                     46
W 09000W          34  2            6 ZK5                     57
W 09000W          34  2            7 AL6                     64
W 09000W          34  2            8 BM7                     14
W 09000W          34  2            9 CN8                     30
W 09000W          34  2           10 DO9                     62
W 09000W          34  3           11 EA0                     45
W 09000W          34  3           12 FB1                     68
W 09000W          34  3           13 GC2                     25
W 09000W          34  3           14 HD3                     73
W 09000W          34  3           15 IE4                     32
W 09000W          34  4           10 REPLACEMENT-NJ9         10
W 09000W          34  4           16 JF5                     49
W 09000W          34  4           17 KG6                     23
W 09000W          34  4           18 LH7                     60
W 09000W          34  4           19 MI8                     75
W 09000W          34  5           21 OK0                     91
W 09000W          34  5           22 PL1                     66
W 09000W          34  5           23 QM2                     45
W 09000W          34  5           24 RN3                     49
W 09000W          34  5           25 TE9                      6

Using the sample table and data provided, are there any other solutions to the OP’s requests?
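
One possibility (a sketch, using the same “must” and “must not” part lists as above) moves the everything/nothing tests into conditional aggregates in a HAVING clause, avoiding the analytic functions entirely when only the “work order” identifiers are needed:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6','REPLACEMENT-NJ9')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(DISTINCT CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID END)=3
  AND COUNT(CASE WHEN T.PART_ID='REPLACEMENT-NJ9' THEN 1 END)=0;

A CASE expression without an ELSE clause returns NULL, and NULL values are not counted, so the first aggregate counts only the distinct “must” parts while the second counts only the “must not” parts.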





Analysis Challenges

April 25, 2013

Roughly 12 years ago I was attempting to analyze customer order changes that were received through electronic data interchange (EDI), specifically X12 830 documents that show forecasted order demand for specific part numbers.  At the time, the EDI data was partially transformed and inserted into an Oracle 8.0.5 database, while that data is now inserted into an Oracle 11.2 database.  I recently took another glance at this particular problem to see if there is a better method for performing the analysis.

Complications in the analysis:

  • The EDI documents do not include the customer’s purchase order line numbers, just a list of outstanding due dates and quantities.  That complication means that there is no easy way to see if an outstanding quantity has changed, or the date for that outstanding quantity has changed.
  • It is possible to ship to the customer a partial order quantity, with the remaining quantity on the line handled as a back-order.  If the customer does not alter this particular forecast line, only the quantity outstanding will be provided, along with the due date – the original quantity is not provided.
  • It is possible to ship out an order line received in an 830 document up to five days before the customer’s due date.
  • Shipped parts may require hours or days to reach the customer’s site, thus delaying when those shipped parts are acknowledged in the list of outstanding due dates and quantities received through EDI.
  • EDI 830 documents are generally received the next weekday after the customer batch generates those documents on a nightly basis.  Thus, there is an opportunity for the received forecast to be out of sync with what is known to have arrived at the customer site, and what is acknowledged in the EDI data.
  • The customer may generate multiple EDI 830 documents for a particular part number/purchase order combination on any given day, and only the EDI 830 document received with the highest (sorted in ascending order) release number (which may not be a number) on a specific date is the valid release for the specific date.
  • EDI 830 documents may be generated for a specific part number by the customer nearly every weekday, once a month, or any frequency in between.

I recall trying to find a solution to this particular problem 12 years ago.  I thought that I could possibly pull all of the relevant data from the Oracle database into a custom application, and through some magical formula output what appeared to be a list of what the customer changed.  However, what about those part quantities that are in transit to the customer?  What about those part quantities that were received by the customer, but had not yet made their way through the customer’s batch processing system?  What about those cases where the customer sends two or more EDI 830 releases for a specific part number, and those EDI releases are received on the same date?

At the time, I was frustrated.  The quality of the output of my analysis would be dependent on the accuracy of humans (yes, and humans wrote the program that performed the analysis too).  I simply had to find the first matching due date and order quantity from the previous release with the due date and order quantity from the current release, and then rely on an end user to correct the 5% to 10% of the cases where the matching algorithm was wrong due to one or more of the “complications” outlined above.

While recently re-examining the problem, I noticed a data element in the EDI 830 document that is described as “ACCUM RECEIPTS AS OF THIS DATE”, and as luck would have it, that particular data element is one of those elements that was selected more than 12 years ago to be transformed into the Oracle Database tables (in the sample data included with this blog article, this column is named PREVIOUS_ACCUM_QTY).  If only I had recognized the meaning of the values of this column, and had access to Oracle analytic SQL functions 12 years ago, this particular analysis problem might not have had so many variables that could potentially throw off the quality of the analysis.

This particular blog article has a fairly large set of sample data – however, that sample data is a fairly small subset of the data that must be analyzed on a daily basis.  The sample data may be downloaded here: analysischallenges.sql sample code (save the file as “analysischallenges.sql” – without the .doc double extension).  If the file is saved in the same directory (folder) as SQL*Plus was started from, you may then execute the script in SQL*Plus to build the tables and populate the sample data by using the following command:

@analysischallenges.sql

The two sample table definitions are created with the following commands (also included in the script):

CREATE TABLE T1 (
  CUSTOMER_PO VARCHAR2(15),
  RELEASE_ID VARCHAR2(15),
  PART_ID VARCHAR2(30),
  RELEASE_DATE DATE,
  PREVIOUS_ACCUM_QTY NUMBER,
  CREATE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID));

CREATE TABLE T1_LINE (
  CUSTOMER_PO VARCHAR2(12),
  RELEASE_ID VARCHAR2(12),
  QTY NUMBER,
  DUE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID,DUE_DATE));

T1 Table Columns:

  • CUSTOMER_PO – the unique identifier for the customer purchase order received through the EDI 830 documents.
  • RELEASE_ID – the specific serialized change number for the customer’s PO; the RELEASE_ID is unique for a specific PO.
  • PART_ID – the part number for which the customer is providing a forecast.
  • RELEASE_DATE – the date that the forecast was batch generated in the customer’s computer system.
  • PREVIOUS_ACCUM_QTY – the count of the number of this part number that the customer has already received for this specific customer purchase order.
  • CREATE_DATE – the date that the forecast was received from the customer and transformed/inserted into the Oracle Database tables.

T1_LINE Table Columns:

  • CUSTOMER_PO and RELEASE_ID – undeclared foreign key columns that point back to table T1.
  • QTY – the quantity of the part number that should (or is forecasted to) ship on the specified DUE_DATE.

—–

If you are curious how I created the script to populate the tables, I executed the following in SQL*Plus, and then performed a little cleanup of the resulting spool files:

SET LINESIZE 200
SET TRIMSPOOL ON

SPOOL analysischallenges_t1.txt

SELECT
  'INSERT INTO T1 VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||CHR(39)||PART_ID||CHR(39)||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(RELEASE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||','
    ||PREVIOUS_ACCUM_QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(CREATE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID;

SPOOL OFF

SPOOL analysischallenges_t1_line.txt

SELECT
  'INSERT INTO T1_LINE VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(DUE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1_LINE
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID,
  DUE_DATE;

SPOOL OFF

—–

We must find a starting point for performing the analysis of what changed from one day’s forecast to a prior day’s forecast.  There is an issue with the EDI 830 releases that were received (identified by the CREATE_DATE column) on April 24, 2013 – two releases for the same customer purchase orders were received, and there were two purchase orders that demand the same part number.  Our analysis is only concerned with the highest numbered (sorted alphabetically) release number document for each customer purchase order that was received on any particular day, so we need to find a way to eliminate the other older releases for the same customer purchase order that was received on the same day.  The ROW_NUMBER analytic function is perfect for this task – we just need the output to show the same number (1) for the rows that we want to keep, and have some other number appear on the other rows.  Something similar to the following may work (note that the TRUNC function, or other approach that uses a date/time range, probably should be used when working with the CREATE_DATE column – realistically, there could be a time component included with those date values):

COLUMN PART_ID FORMAT A10
SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  T.RELEASE_ID,
  T.RELEASE_DATE,
  T.CREATE_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     REL_ON_DATE RELEASE_ID      RELEASE_D CREATE_DA PREVIOUS_ACCUM_QTY
---------- --------------- ----------- --------------- --------- --------- ------------------
DS1812+    1231234                   2 20130402-001    23-APR-13 24-APR-13                 24
DS1812+    1231234                   1 20130403-001    24-APR-13 24-APR-13                 24
DS1812+    1233290                   2 20130402-001    23-APR-13 24-APR-13                  4
DS1812+    1233290                   1 20130403-001    24-APR-13 24-APR-13                  4
DS212+     1133290                   2 20130402-001    23-APR-13 24-APR-13                  2
DS212+     1133290                   1 20130403-001    24-APR-13 24-APR-13                  2
RS812      1231280                   2 20130402-001    23-APR-13 24-APR-13                 60
RS812      1231280                   1 20130403-001    24-APR-13 24-APR-13                 60

To remove all of the rows that have something other than 1 in the REL_ON_DATE column, we are able to slide the above SQL statement into an inline view, and then specify that the REL_ON_DATE column value must be equal to 1.  Typically, there are supposed to be few/no changes made by the customer within eight weeks of the customer forecast’s release date, but that is not always the case.  While eliminating the unnecessary releases, it probably makes sense to also calculate the end date of this “firm” period by adding 55 days to the forecast’s release date:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T
WHERE
  T.REL_ON_DATE=1
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY
---------- --------------- --------------- --------- ------------------
DS1812+    1231234         20130403-001    18-JUN-13                 24
DS1812+    1233290         20130403-001    18-JUN-13                  4
DS212+     1133290         20130403-001    18-JUN-13                  2
RS812      1231280         20130403-001    18-JUN-13                 60

Next, we need to retrieve the forecasted delivery schedule (in the T1_LINE table) that is associated with the above rows.  If we take the value in the PREVIOUS_ACCUM_QTY column, and find a way to generate a running SUM of the QTY column from the T1_LINE table, in theory we should be able to determine if the total quantity ordered through a specific date has changed.  The analytic version of the SUM function offers the capability of producing a running SUM when an ORDER BY clause is included in the function call.  To reduce the data set size, I will only consider those forecasts with a scheduled delivery date that is within 101 days of the customer’s forecast date.  With those changes in place, the SQL statement is modified as follows:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY,
  TL.DUE_DATE,
  TL.QTY,
  T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
  T1_LINE TL
WHERE
  T.REL_ON_DATE=1
  AND T.CUSTOMER_PO=TL.CUSTOMER_PO
  AND T.RELEASE_ID=TL.RELEASE_ID
  AND (T.RELEASE_DATE+100)>=TL.DUE_DATE
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  TL.DUE_DATE;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------------- --------- ------------------ --------- ---------- ---------------
DS1812+    1231234         20130403-001    18-JUN-13                 24 24-APR-13          0              24
DS1812+    1233290         20130403-001    18-JUN-13                  4 24-APR-13          0               4
DS212+     1133290         20130403-001    18-JUN-13                  2 24-APR-13          0               2
RS812      1231280         20130403-001    18-JUN-13                 60 27-JUN-13          1              61
RS812      1231280         20130403-001    18-JUN-13                 60 03-JUL-13          1              62
RS812      1231280         20130403-001    18-JUN-13                 60 24-JUL-13          2              64

As may be seen above, only customer purchase order 1231280 has a forecast, calling for a quantity of 1 on June 27, quantity of 1 on July 3, and a quantity of 2 on July 24.  The previously received quantity for customer purchase order 1231280 is 60, so the additional forecast quantities will bring the total order quantities up to 61, 62, and 64 through the three dates included in the forecast.

-The Top Half of the Final SQL Statement is Above-

So, we now have a starting point – the data that was received on April 24, 2013.  How does that forecast data compare with previously received forecasts?  We need to start building the second half of the SQL statement, which looks at the historical data.  To limit the dataset size, I will limit the data to only those EDI documents that were received up to 56 days prior (the eight week “firm” period) to the April 24, 2013 date that was specified earlier in this article:

SELECT
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  TRUNC(T.CREATE_DATE) CREATE_DATE,
  T.RELEASE_ID,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
  AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.CUSTOMER_PO,
  T.RELEASE_ID;

CUSTOMER_PO     REL_ON_DATE CREATE_DA RELEASE_ID      PREVIOUS_ACCUM_QTY
--------------- ----------- --------- --------------- ------------------
1131111                   1 27-FEB-13 0371                           575
1131111                   1 01-MAR-13 0381                           577
1131111                   1 06-MAR-13 0431                           578
1131111                   1 07-MAR-13 0441                           581
1131111                   1 08-MAR-13 0451                           581
1131111                   1 11-MAR-13 0461                           581
...
1131111                   1 08-APR-13 0741                           593
1131111                   1 11-APR-13 0791                           593
1131111                   1 23-APR-13 0911                           595
1131245                   1 27-FEB-13 0371                          2299
1131245                   1 01-MAR-13 0381                          2300
1131245                   1 05-MAR-13 0421                          2303
...
1131745                   1 19-APR-13 0871                           435
1131745                   1 23-APR-13 0911                           435
1133290                   1 27-FEB-13 20130205-001                     2
1133290                   1 28-FEB-13 20130206-001                     2
1133290                   1 01-MAR-13 20130207-001                     2
...
1133290                   1 19-APR-13 20130328-001                     2
1133290                   1 23-APR-13 20130401-001                     2
1231234                   1 27-FEB-13 20130205-001                    24
1231234                   1 28-FEB-13 20130206-001                    24
...
1231234                   1 23-APR-13 20130401-001                    24
1231280                   1 27-FEB-13 20130205-001                    57
1231280                   1 28-FEB-13 20130206-001                    57
1231280                   1 01-MAR-13 20130207-001                    57
1231280                   1 04-MAR-13 20130208-001                    58
1231280                   1 05-MAR-13 20130211-001                    58
...
1231280                   1 04-APR-13 20130313-001                    60
1231280                   2 05-APR-13 20130314-002                    60
...
1231280                   1 19-APR-13 20130328-001                    60
1231280                   1 23-APR-13 20130401-001                    60
1233290                   1 27-FEB-13 20130205-001                     4
1233290                   1 28-FEB-13 20130206-001                     4
...

As we found earlier, there could be more than one release for a customer purchase order received on a single date, and we are only concerned with the most recent release received on any particular day.  We have a second issue, in that the dates may have shifted from one release to the next, so we have to artificially generate rows with the potentially missing dates, with an order quantity of 0 on those dates.

But, how to generate a list of dates?  We could use a CONNECT BY LEVEL trick:

SELECT
  TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

DUE_DATE
---------
24-APR-13
25-APR-13
26-APR-13
27-APR-13
28-APR-13
...
15-JUN-13
16-JUN-13
17-JUN-13
18-JUN-13
...
30-JUL-13
31-JUL-13
01-AUG-13

Now, by creating a Cartesian join between the dataset with the dates and the dataset with the EDI release header records, we will make certain that a date exists for every date that is potentially of interest (up to 100 days after April 24, 2013) when we later join the result with the T1_LINE table (the child table that lists the forecast delivery dates).

SELECT
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  T2.CREATE_DATE,
  T2.PREVIOUS_ACCUM_QTY,
  D.DUE_DATE
FROM
  (SELECT
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    TRUNC(T.CREATE_DATE) CREATE_DATE,
    T.RELEASE_ID,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
    AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
  ORDER BY
    T.CUSTOMER_PO,
    T.RELEASE_ID) T2,
  (SELECT
    TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) D
WHERE
  T2.REL_ON_DATE=1
ORDER BY
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  D.DUE_DATE;

...
CUSTOMER_PO     RELEASE_ID      CREATE_DA PREVIOUS_ACCUM_QTY DUE_DATE
--------------- --------------- --------- ------------------ ---------
1233290         20130318-001    09-APR-13                  4 25-MAY-13
1233290         20130318-001    09-APR-13                  4 26-MAY-13
1233290         20130318-001    09-APR-13                  4 27-MAY-13
1233290         20130318-001    09-APR-13                  4 28-MAY-13
...
1233290         20130322-001    15-APR-13                  4 22-JUN-13
1233290         20130322-001    15-APR-13                  4 23-JUN-13
1233290         20130322-001    15-APR-13                  4 24-JUN-13
1233290         20130322-001    15-APR-13                  4 25-JUN-13
...
1233290         20130401-001    23-APR-13                  4 29-JUL-13
1233290         20130401-001    23-APR-13                  4 30-JUL-13
1233290         20130401-001    23-APR-13                  4 31-JUL-13
1233290         20130401-001    23-APR-13                  4 01-AUG-13

Next, we will outer join to the T1_LINE table to bring in the actual forecasted delivery schedule from the previous dates (note that I also removed an unnecessary ORDER BY clause that was still embedded in the SQL statement – the Oracle query optimizer probably would have realized that the ORDER BY clause was meaningless to the query result, and automatically removed it – but I removed the clause just in case my assumption is incorrect):

SELECT
  T3.CUSTOMER_PO,
  T3.CREATE_DATE,
  T3.DUE_DATE,
  T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T2.CUSTOMER_PO,
    T2.RELEASE_ID,
    T2.CREATE_DATE,
    T2.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      TRUNC(T.CREATE_DATE) CREATE_DATE,
      T.RELEASE_ID,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
      AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T2.REL_ON_DATE=1) T3,
  T1_LINE TL
WHERE
  T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND T3.RELEASE_ID=TL.RELEASE_ID(+)
  AND T3.DUE_DATE=TL.DUE_DATE(+)
ORDER BY
  T3.CUSTOMER_PO,
  T3.RELEASE_ID,
  T3.DUE_DATE;

CUSTOMER_PO     CREATE_DA DUE_DATE  REL_RUNNING_SUM
--------------- --------- --------- ---------------
1131111         27-FEB-13 24-APR-13             577
1131111         27-FEB-13 25-APR-13             578
1131111         27-FEB-13 26-APR-13             579
1131111         27-FEB-13 27-APR-13             579
1131111         27-FEB-13 28-APR-13             579
1131111         27-FEB-13 29-APR-13             580
1131111         27-FEB-13 30-APR-13             581
...
1131111         23-APR-13 30-JUL-13             636
1131111         23-APR-13 31-JUL-13             638
1131111         23-APR-13 01-AUG-13             639
...
1231280         11-MAR-13 08-MAY-13              58
1231280         11-MAR-13 09-MAY-13              58
1231280         11-MAR-13 10-MAY-13              58
1231280         11-MAR-13 11-MAY-13              58
...
1231280         23-APR-13 21-JUL-13              62
1231280         23-APR-13 22-JUL-13              62
1231280         23-APR-13 23-JUL-13              62
1231280         23-APR-13 24-JUL-13              64
1231280         23-APR-13 25-JUL-13              64
1231280         23-APR-13 26-JUL-13              64
...

We are getting close.  Now we just need to join the SQL created earlier in this article with the above SQL statement so that the current customer purchase order forecasts may be compared with the earlier received customer purchase order forecasts.  The LISTAGG analytic function, introduced with Oracle Database 11.2, makes it easy to condense multiple rows from the resultset into a much smaller number of rows, if the resultset is grouped on at least one column.  So, the final version of the analysis challenge appears as follows:

SELECT
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM,
  LISTAGG(T4.REL_RUNNING_SUM||' ('||TO_CHAR(T4.CREATE_DATE,'MM/DD/YY')||')',', ') WITHIN GROUP (ORDER BY T4.CREATE_DATE DESC) EXCEPTIONS
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    T.RELEASE_ID,
    T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
    T.PREVIOUS_ACCUM_QTY,
    TL.DUE_DATE,
    TL.QTY,
    T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.CREATE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
    T1_LINE TL
  WHERE
    T.REL_ON_DATE=1
    AND T.CUSTOMER_PO=TL.CUSTOMER_PO
    AND T.RELEASE_ID=TL.RELEASE_ID
    AND (T.RELEASE_DATE+100)>=TL.DUE_DATE) C,
  (SELECT
    T3.CUSTOMER_PO,
    T3.CREATE_DATE,
    T3.DUE_DATE,
    T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T2.CUSTOMER_PO,
      T2.RELEASE_ID,
      T2.CREATE_DATE,
      T2.PREVIOUS_ACCUM_QTY,
      D.DUE_DATE
    FROM
      (SELECT
        T.CUSTOMER_PO,
        ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
        TRUNC(T.CREATE_DATE) CREATE_DATE,
        T.RELEASE_ID,
        T.PREVIOUS_ACCUM_QTY
      FROM
        T1 T
      WHERE 
        T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
        AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
      (SELECT
        TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
      FROM
        DUAL
      CONNECT BY
        LEVEL<=100) D
    WHERE
      T2.REL_ON_DATE=1) T3,
    T1_LINE TL
  WHERE
    T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
    AND T3.RELEASE_ID=TL.RELEASE_ID(+)
    AND T3.DUE_DATE=TL.DUE_DATE(+)) T4
WHERE
  C.CUSTOMER_PO=T4.CUSTOMER_PO
  AND C.DUE_DATE=T4.DUE_DATE
  AND C.REL_RUNNING_SUM<>T4.REL_RUNNING_SUM
GROUP BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM
ORDER BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.DUE_DATE;

PART_ID    CUSTOMER_PO     POSSIBLE_ DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------- --------- ---------- ---------------
EXCEPTIONS
-----------------------------------------------------------------------------------------------------------------------
RS812      1231280         18-JUN-13 27-JUN-13          1              61
60 (03/22/13), 60 (03/21/13), 60 (03/19/13), 60 (03/18/13), 60 (03/15/13), 60 (03/14/13), 59 (03/13/13), 59 (03/12/13),
59 (03/11/13), 59 (03/08/13), 60 (03/07/13), 60 (03/06/13), 60 (03/05/13), 60 (03/04/13), 59 (03/01/13), 59 (02/28/13),
59 (02/27/13)

RS812      1231280         18-JUN-13 03-JUL-13          1              62
61 (03/22/13), 61 (03/21/13), 61 (03/19/13), 61 (03/18/13), 61 (03/15/13), 61 (03/14/13), 60 (03/13/13), 60 (03/12/13),
60 (03/11/13), 60 (03/08/13), 61 (03/07/13), 61 (03/06/13), 61 (03/05/13), 61 (03/04/13), 60 (03/01/13), 60 (02/28/13),
60 (02/27/13)

RS812      1231280         18-JUN-13 24-JUL-13          2              64
63 (03/26/13), 63 (03/25/13), 62 (03/22/13), 62 (03/21/13), 62 (03/19/13), 62 (03/18/13), 62 (03/15/13), 62 (03/14/13),
61 (03/13/13), 61 (03/12/13), 61 (03/11/13), 61 (03/08/13), 62 (03/07/13), 62 (03/06/13), 62 (03/05/13), 62 (03/04/13),
61 (03/01/13), 61 (02/28/13), 61 (02/27/13)

It is now easy to see that the requested total quantity due through June 27, 2013 increased by 2 since the first forecast, which was received on February 27, 2013.  The requested total quantity increased by 1 in the EDI release that was received on March 4, 2013, and by 1 again in the next EDI release that was received after March 22, 2013.  The second and third lines of the output show that there was a little more fluctuation in the customer’s forecast, increasing in quantity, decreasing in quantity, and then increasing in quantity again.

An unfortunate problem may be present – what if the customer completely eliminates the forecast orders on their purchase order, but still sends in a header line for the forecast?  A future enhancement may be required, where the CONNECT BY LEVEL trick is used in the first half of the SQL statement (the current releases), just as it is used in the second half of the SQL statement (the historical releases), so that when the top half of the final SQL statement is joined to the bottom half, rows returned by the bottom half of the SQL statement are not unnecessarily removed from the output.
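
That enhancement might look something like the following untested sketch, which mirrors the bottom half of the SQL statement: the CONNECT BY LEVEL date generator is Cartesian joined to the April 24, 2013 release headers, and the T1_LINE table is then outer joined, so a running sum row is produced for every date of interest even when the customer sends no delivery lines (the date generator also takes over the role of the (T.RELEASE_DATE+100)>=TL.DUE_DATE restriction):

SELECT
  T3.PART_ID,
  T3.CUSTOMER_PO,
  T3.RELEASE_ID,
  T3.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T3.DUE_DATE,
  NVL(TL.QTY,0) QTY,
  T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T2.PART_ID,
    T2.CUSTOMER_PO,
    T2.RELEASE_ID,
    T2.RELEASE_DATE,
    T2.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T2.REL_ON_DATE=1) T3,
  T1_LINE TL
WHERE
  T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND T3.RELEASE_ID=TL.RELEASE_ID(+)
  AND T3.DUE_DATE=TL.DUE_DATE(+);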

The output is a little ugly.  One approach to clean up the output involves spooling the output to an HTML file as explained here and in the documentation here.  Another approach to clean up the output involves sending the data into an Excel spreadsheet using either a macro or the Microsoft Query Wizard, accessed from within Excel (see the articles in the Excel category of this blog for an explanation of these approaches).
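
As a minimal sketch of the HTML spool approach in SQL*Plus (the file name is just an example):

SET MARKUP HTML ON SPOOL ON
SPOOL analysischallenges.html

-- execute the final analysis SQL statement shown above here

SPOOL OFF
SET MARKUP HTML OFF SPOOL OFF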

—–

The final solution for this analysis challenge is so simple when built up from smaller SQL statements as shown above; I wonder why I did not consider this approach 12 years ago?  :-)





On the Topic of Programming 3

September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E through H in the above report (could very well extend beyond column H, depending on the number of weeks selected), shows the Monday through Sunday dates for the week.
  • The second row for each employee, in columns E through H in the above report show the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report.
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, show the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation one week, but did in another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers on the 3rd, 6th, 9th, etc. rows for each employee show the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be cases where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with the WORKORDER_TYPE='W' and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE='M' and the BASE_ID equal to the part ID that is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the RDBMS to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to 'W' and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type of machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the WORKORDER_TYPE='W' (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ names, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to 'W' and the SUB_ID column in the WORK_ORDER table set to '0' so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.
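
To make those join relationships a bit more concrete, below is a rough, untested skeleton of the labor side of the query.  The column names (ID, EMPLOYEE_ID, RESOURCE_ID, OPERATION_SEQ_NO, and HOURS_WORKED) are assumptions based on the description above, and the complete WORK_ORDER join key may include additional columns (a split ID, for instance) – verify everything against the actual schema:

SELECT
  E.ID EMPLOYEE_ID,
  E.LAST_NAME,
  WO.PART_ID,
  LT.WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  SUM(LT.HOURS_WORKED) HOURS_WORKED  -- assumed column containing the reported labor hours
FROM
  LABOR_TICKET LT,
  EMPLOYEE E,
  SHOP_RESOURCE SR,
  WORK_ORDER WO,
  PART P
WHERE
  LT.WORKORDER_TYPE='W'              -- eliminate spurious indirect labor
  AND LT.EMPLOYEE_ID=E.ID
  AND LT.RESOURCE_ID=SR.ID           -- SR would be further restricted to machining type resources
  AND LT.WORKORDER_TYPE=WO.TYPE
  AND LT.WORKORDER_BASE_ID=WO.BASE_ID
  AND LT.WORKORDER_LOT_ID=WO.LOT_ID
  AND WO.SUB_ID='0'                  -- main part ID for the work order
  AND WO.PART_ID=P.ID
GROUP BY
  E.ID,
  E.LAST_NAME,
  WO.PART_ID,
  LT.WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO;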

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number of weeks included in the report?

In the words of the TV show “Name that Tune”, I can name that tune in 1 note.  Because there are many to many joins between the various tables, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions, to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression – a SQL*Plus example of this setting appears after this list.
  • Minimize, as best as possible, the performance impact on the database server; the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause.
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.
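
In SQL*Plus, the array fetch size mentioned above is a one-line setting (1,000 is simply an illustrative value within the permitted 1 to 5,000 range; other client interfaces expose an equivalent fetch size property):

SET ARRAYSIZE 1000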

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client's dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want the same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format; should I use the PIVOT function that was introduced with Oracle Database 11.1, or maybe the client-side control break logic that was taught in one of my university-level structured programming classes years ago?

That sketch of a report on a sheet of paper sure looked simple, and now I need a control break (there must be a simple one-key solution for that problem – check your keyboard :-) ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7
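
For example, here is a quick sanity check against DUAL (a sketch that assumes an English NLS_DATE_LANGUAGE setting for the 'MONDAY' parameter, and the default DD-MON-RR date format for the output; 23-AUG-2012 falls on a Thursday):

SELECT
  NEXT_DAY(TRUNC(TO_DATE('23-AUG-2012','DD-MON-YYYY')),'MONDAY')-7 WEEK_START
FROM
  DUAL;

WEEK_START
----------
20-AUG-12

The expression returns the Monday (August 20, 2012 in this case) that begins the week containing the supplied date.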

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and using the PIVOT function, and/or additional analytic functions to calculate each employee's weighted efficiency average per week and the simple averages of the weighted averages.  Doing so would have eliminated some potentially complex client-side logic that needed to be programmed a single time, but the counterpoint is that every time the report executed, it would have required more of the database server's resources than were absolutely necessary.
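
For what it is worth, below is a rough, untested sketch of that PIVOT direction – the PIVOT IN list must contain hard-coded constants (one per week column), which is part of the reason that I avoided this approach when the number of weeks could vary:

WITH EFF AS
  (SELECT
    ...  /* the full query shown above */
  )
SELECT
  *
FROM
  (SELECT
    EMPLOYEE_ID,
    CLOSE_WEEK,
    EMP_EFF_WEEK
  FROM
    EFF)
PIVOT
  (MAX(EMP_EFF_WEEK) FOR CLOSE_WEEK IN (
    DATE '2012-07-30' AS WEEK_1,
    DATE '2012-08-06' AS WEEK_2,
    DATE '2012-08-13' AS WEEK_3,
    DATE '2012-08-20' AS WEEK_4));

MAX is used simply because a single EMP_EFF_WEEK value exists per employee and week; the dates in the IN list must exactly match the Monday dates produced by the NEXT_DAY expression.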

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting – now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks worth of data rather than just four weeks, and only a single part ID or employee.  Too late – those changes were anticipated and included in the original specification, implemented with a simple change to the query and pre-planning in the client-side programming for a variable number of weeks in the report.





Matching the Expected Output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different?

16 11 2011

November 16, 2011

I noticed an interesting request on the comp.databases.oracle.misc usenet group, and I thought that I would make an attempt at solving the request.  The original poster (OP) stated that he had a table with data similar to the following:

TX   ID   DEPT   LOCATION   LOAD
1    99    A        NY       12
2    99    A        LA       10
3    99    B        LA       05
4    77    B        LA       15
5    77    C        NY       12
6    77    D        LA       11  

He would like to obtain the following output:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11  

The rankings are to be determined as follows:

DEPT_RANK for ID 99 is 1 for A because sum(LOAD) = 22 is the max
LOC_RANK for ID 99 is 1 for LA because sum(LOAD) = 15 is the max

At first glance, that request seems to be reasonably easy to accomplish.  Let’s start by creating a table with the sample data (ideally, the OP should have provided the DDL and DML to create and populate this table):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  TX NUMBER,
  ID NUMBER,
  DEPT VARCHAR2(1),
  LOCATION VARCHAR2(2),
  LOAD NUMBER);

INSERT INTO T1 VALUES (1,99,'A','NY',12);
INSERT INTO T1 VALUES (2,99,'A','LA',10);
INSERT INTO T1 VALUES (3,99,'B','LA',05);
INSERT INTO T1 VALUES (4,77,'B','LA',15);
INSERT INTO T1 VALUES (5,77,'C','NY',12);
INSERT INTO T1 VALUES (6,77,'D','LA',11); 

The first step, if we were to think about creating the solution in logical steps, is to find a way to calculate the SUM values that were mentioned by the OP.  So, as a starting point, we might try this:

SELECT
  TX,
  ID,
  DEPT,
  SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
  LOCATION,
  SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
  LOAD
FROM
  T1
ORDER BY
  TX;

 TX  ID D SUM_LOAD_ID LO SUM_LOAD_LOCATION  LOAD
--- --- - ----------- -- ----------------- -----
  1  99 A          22 NY                12    12
  2  99 A          22 LA                15    10
  3  99 B           5 LA                15     5
  4  77 B          15 LA                26    15
  5  77 C          12 NY                12    12
  6  77 D          11 LA                26    11 

If I am understanding the OP’s request correctly, the above is a good starting point (even though the alias for the first analytic function could have been better selected).

We are then able to take the above SQL statement and push it into an inline view to, hopefully, produce the output that is expected by the OP (note that the PARTITION clause specified for the LOC_RANK column differs from the PARTITION clause of the analytic function in the inline view that feeds that column):

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID,DEPT ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          1 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          1 NY          1    12
  6  77 D          1 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Almost right for the LOC_RANK column (not even close for the DEPT_RANK column), but not quite.  The problem is that when calculating the RANK columns in the above output, we should PARTITION on the ID column alone, not on the ID column plus some other column, as was the case when we used the SUM analytic function.

Let’s fix the PARTITION clause and try again:

SELECT
  TX,
  ID,
  DEPT,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          1 LA          1    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

Notice in the above that when two rows have the same SUM_LOAD_ value, the displayed rank is correct, but that repeated rank value then causes a rank value to be skipped (compare the DEPT_RANK column value on row 3, the LOC_RANK column value on row 1, and the LOC_RANK column value on row 5).

Now what?  The ROW_NUMBER function could be used to produce sequential rank numbers without gaps, for example:

SELECT
  TX,
  ID,
  DEPT,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          3    12
  2  99 A          2 LA          2    10
  3  99 B          3 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          3    12
  6  77 D          3 LA          2    11 

The above output, as mentioned, does not match the output requested by the OP, since the OP's requested output specifies that equal values on different rows should show the same rank value.

One more try using the DENSE_RANK analytic function:

SELECT
  TX,
  ID,
  DEPT,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_ID DESC) DEPT_RANK,
  LOCATION,
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY SUM_LOAD_LOCATION DESC) LOC_RANK,
  LOAD
FROM
  (SELECT
    TX,
    ID,
    DEPT,
    SUM(LOAD) OVER (PARTITION BY ID,DEPT) SUM_LOAD_ID,
    LOCATION,
    SUM(LOAD) OVER (PARTITION BY ID,LOCATION) SUM_LOAD_LOCATION,
    LOAD
  FROM
    T1)
ORDER BY
  TX;

 TX  ID D  DEPT_RANK LO   LOC_RANK  LOAD
--- --- - ---------- -- ---------- -----
  1  99 A          1 NY          2    12
  2  99 A          1 LA          1    10
  3  99 B          2 LA          1     5
  4  77 B          1 LA          1    15
  5  77 C          2 NY          2    12
  6  77 D          3 LA          1    11 

Let’s compare the above output with what the OP requested:

TX   ID   DEPT   DEPT_RANK   LOCATION   LOC_RANK   LOAD
1    99    A         1          NY         2        12
2    99    A         1          LA         1        10
3    99    B         2          LA         1        05
4    77    B         1          LA         1        15
5    77    C         2          NY         2        12
6    77    D         3          LA         1        11

As best as I am able to determine, the above SQL statement will satisfy the OP’s request.

—–

Part 2 of the Challenge

If the OP has the following SQL statement:

SELECT
  LOAD_YEAR,
  ORG_UNIT_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, CLASSIF_CODE ORDER BY SUM (FTE_DAYS) DESC) ORG_RANK,
  CLASSIF_CODE,
  RANK () OVER (PARTITION BY LOAD_YEAR, ORG_UNIT_CODE ORDER BY SUM (FTE_DAYS) DESC) CLASSIF_RANK,
  SUM (FTE_DAYS) FTE
FROM
  STAFF_LOAD
GROUP BY
  LOAD_YEAR,
  ORG_UNIT_CODE,
  CLASSIF_CODE;  

And the above SQL statement produces the following output:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  1 HEW4                    1  13
2010 A46                  2 HEW4                    2  12

And the OP wants the output to look like this:

YEAR ORG_UNIT_CODE ORG_RANK CLASSIF_CODE CLASSIF_RANK FTE
2010 A46                  1 HEW3                    1  59
2010 A42                  2 HEW3                    1  13
2010 A42                  2 HEW4                    2  13
2010 A46                  1 HEW4                    2  12  

Write the DDL and DML statements to create the source table and populate it with the non-aggregated original data, and then produce the output requested by the OP (without looking at the updated usenet thread).





Simple SQL with and without Inline Views

21 10 2011

October 21, 2011

(Forward to the Next Post in the Series)

Sometimes it is interesting to take a step back from some of the more difficult to construct SQL statements, and just put together something simple.  The following request recently arrived through an ERP mailing list:

“I want to find the last ship date for all the open part numbers in Customer Order Entry. The last ship date is not necessarily from the Open order in Customer Order Entry…”

The original poster (OP) provided more information, and even provided a description of what was already tried as a solution.  It is not exceptionally clear what the OP wanted, but because I have several years of experience working with this particular ERP system, I had a feeling that the OP might be headed in the wrong direction.  This ERP system uses tables that are well normalized, which would make it a bit more difficult to locate the most recent ship date for a part number that is recorded in one customer order's rows while another customer order's rows are being reported.  Unless, of course, we remember that there is also a transaction table that records every inventory movement of part numbers through the system – we just need to know what those rows in the transaction table look like.  In this case, the rows have the following characteristics:

  • The CUST_ORDER_ID column value must not be NULL
  • The TYPE column must have a value of ‘O’
  • The CLASS column must have a value of ‘I’

With that knowledge, we could then find the most recent date that any part number shipped with a SQL statement similar to the following:

SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  CUST_ORDER_ID IS NOT NULL
  AND TYPE='O'
  AND CLASS='I'
GROUP BY
  PART_ID;

In the above, the AS keyword is optional on an Oracle Database platform, but it is not optional on some other database platforms.

With an Oracle Database backend, inline views permit the above SQL statement to supply additional information to a parent SQL statement.  If we wrap the above SQL statement in ( ) and place it in the FROM clause of the parent SQL statement, Oracle will treat the data returned by the wrapped SQL statement much as it would handle a regular table (the Oracle query optimizer might decide to rewrite the combined SQL statement into an equivalent form that no longer contains an inline view).  There is always a chance that a part number that is on order may have never shipped to a customer, so we will need to outer join (in this case using a left outer join) to the newly created inline view:

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO,
  CUST_ORDER_LINE COL,
  (SELECT
    PART_ID,
    MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
  FROM
    INVENTORY_TRANS
  WHERE
    CUST_ORDER_ID IS NOT NULL
    AND TYPE='O'
    AND CLASS='I'
  GROUP BY
    PART_ID) LS
WHERE
  CO.ID=COL.CUST_ORDER_ID
  AND CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY
  AND COL.PART_ID=LS.PART_ID(+);

In the above, I joined the CUSTOMER_ORDER and CUST_ORDER_LINE tables to the inline view that I aliased as LS.  A third table, CUST_LINE_DEL, that optionally contains the delivery schedule for some of the rows in the CUST_ORDER_LINE table, could have also been outer joined to the CUST_ORDER_LINE table.

Running SQL Server, or just feeling ANSI?  If so, the above may be rewritten as follows:

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO
JOIN
  CUST_ORDER_LINE COL
ON
  CO.ID=COL.CUST_ORDER_ID
LEFT OUTER JOIN
  (SELECT
    PART_ID,
    MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
  FROM
    INVENTORY_TRANS
  WHERE
    CUST_ORDER_ID IS NOT NULL
    AND TYPE='O'
    AND CLASS='I'
  GROUP BY
    PART_ID) LS
ON
  COL.PART_ID=LS.PART_ID
WHERE
  CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY;

Unfortunately, the OP is actually using a SQLBase database backend that does not support inline views.  I remember the feeling before I discovered that Oracle Database supported inline views… in that case I would do something like the following:

  1. Create a statically defined view.
  2. Join to that statically defined view just as if the view were a table.

CREATE VIEW CUST_ORDER_PART_LS AS
SELECT
  PART_ID,
  MAX(TRANSACTION_DATE) AS LAST_TRANSACTION_DATE
FROM
  INVENTORY_TRANS
WHERE
  CUST_ORDER_ID IS NOT NULL
  AND TYPE='O'
  AND CLASS='I'
GROUP BY
  PART_ID;

SELECT
  COL.CUST_ORDER_ID,
  COL.PART_ID,
  LS.LAST_TRANSACTION_DATE
FROM
  CUSTOMER_ORDER CO,
  CUST_ORDER_LINE COL,
  CUST_ORDER_PART_LS LS
WHERE
  CO.ID=COL.CUST_ORDER_ID
  AND CO.STATUS IN ('U','F','R')
  AND COL.LINE_STATUS='A'
  AND COL.ORDER_QTY>COL.TOTAL_SHIPPED_QTY
  AND COL.PART_ID=LS.PART_ID(+);

I guess that it is a good exercise once in a while to practice simple SQL.

A second example from the ERP mailing list showed a SQL Server solution for a particular problem.  The particular problem that the SQL Server solution set out to solve is as follows:

“Someone posed an interesting question to me – How can you do a mass insert of Operations? Well most of us know how to manage that without too much trouble but this one came with a couple of caveats – The Master’s first operation (lowest numbered) has to have a particular ResourceID and then a different but specific ResourceID must be on a subsequent Operation. It is after the second operation where the insert must be placed.

Sounds like fun – So I figured that it could be done in a single SQL statement – well after a 4-way join of the Operation table to itself I got it to work how I expected…”

Interesting – I think that there have been times in the past when I used multiple self-joins to a table to solve similar SQL problems.  However, there is an easier way using analytic functions.  The following SQL statement attempts to indicate: the sequence of the operation within the work order sub ID, the sequence number that follows sequentially, the second sequence number that follows sequentially, and the resource ID of the next operation in sequential order by the operation number:

SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  RESOURCE_ID,
  ROW_NUMBER() OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                     ORDER BY RESOURCE_ID) AS RN,
  LEAD(RESOURCE_ID,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_RESOURCE_ID,
  LEAD(SEQUENCE_NO,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_SEQ_NO,
  LEAD(SEQUENCE_NO,2) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                            ORDER BY RESOURCE_ID) AS NEXT_NEXT_SEQ_NO
FROM
  OPERATION
WHERE
  WORKORDER_TYPE='M'; 

Next, we only care about those rows where our resource ID of interest (69 in my example) is specified in the first operation on the work order sub ID, and the second operation on that work order sub ID specifies the other resource ID that is of interest (FW in my example).  If we are to avoid primary key violations, we should select a new sequence number that is halfway between the next operation and the next-next operation – for example, with a NEXT_SEQ_NO of 777 and a NEXT_NEXT_SEQ_NO of 888, the new sequence number is 777 + ROUND(111 / 2) = 833:

SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  NEXT_SEQ_NO,
  NEXT_NEXT_SEQ_NO,
  NEXT_SEQ_NO + ROUND((COALESCE(NEXT_NEXT_SEQ_NO,NEXT_SEQ_NO+10) - NEXT_SEQ_NO)/2) NEW_SEQUENCE_NO,
  'NEW RESOURCE' RESOURCE_ID
FROM
  (SELECT
    WORKORDER_BASE_ID,
    WORKORDER_LOT_ID,
    WORKORDER_SPLIT_ID,
    WORKORDER_SUB_ID,
    SEQUENCE_NO,
    RESOURCE_ID,
    ROW_NUMBER() OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                       ORDER BY RESOURCE_ID) AS RN,
    LEAD(RESOURCE_ID,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_RESOURCE_ID,
    LEAD(SEQUENCE_NO,1) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_SEQ_NO,
    LEAD(SEQUENCE_NO,2) OVER (PARTITION BY WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID
                              ORDER BY RESOURCE_ID) AS NEXT_NEXT_SEQ_NO
  FROM
    OPERATION
  WHERE
    WORKORDER_TYPE='M')
WHERE
  RN=1
  AND RESOURCE_ID='69'
  AND NEXT_RESOURCE_ID='FW';

WORKORDER_ WOR WOR WOR SEQUENCE_NO NEXT_SEQ_NO NEXT_NEXT_SEQ_NO NEW_SEQUENCE_NO RESOURCE_ID
---------- --- --- --- ----------- ----------- ---------------- --------------- ------------
231610     0   0   0            10         777              888             833 NEW RESOURCE
237680     0   0   0            10         777              888             833 NEW RESOURCE
32018917X  0   0   0            10         777              888             833 NEW RESOURCE
3201B8920  0   0   0            10         777              888             833 NEW RESOURCE
3201C8765  0   0   0            10         777              888             833 NEW RESOURCE
3201G8639  0   0   0            10         777              888             833 NEW RESOURCE
3201G9003  0   0   0            10         777              888             833 NEW RESOURCE
3201J8772  0   0   0            10         777              888             833 NEW RESOURCE
3201J8850  0   0   0            10         777              888             833 NEW RESOURCE
3201K8669  0   0   0            10         777              888             833 NEW RESOURCE
3201M8281  0   0   0            10         777              888             833 NEW RESOURCE
3201M8489  0   0   0            10         777              888             833 NEW RESOURCE
3201N8516  0   0   0            10         777              888             833 NEW RESOURCE
3201N8776  0   0   0            10         777              888             833 NEW RESOURCE
3201Q8545  0   0   0            10         777              888             833 NEW RESOURCE
3201T8964  0   0   0            10         777              888             833 NEW RESOURCE
3201T8964X 0   0   0            10          20               30              25 NEW RESOURCE
3201V8524  0   0   0            10         777              888             833 NEW RESOURCE
3201V8966  0   0   0            10         777              888             833 NEW RESOURCE
3201V8966X 0   0   0            10          20               30              25 NEW RESOURCE
3201W8967  0   0   0            10         777              888             833 NEW RESOURCE
3201W8967X 0   0   0            10          20               30              25 NEW RESOURCE
3201Y8423  0   0   0            10         777              888             833 NEW RESOURCE
3201Z8996  0   0   0            10         777              888             833 NEW RESOURCE

24 rows selected. 

With the above, we are well on our way to performing a mass insert into this table.  If I recall correctly, the above must first be inserted into another table (a temp table, preferably) and then the rows may be inserted into the OPERATION table (the table that is the row source for the analytic functions used in the SQL statement).
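
A minimal sketch of that two-step process follows.  The staging table name and the abbreviated column list are hypothetical – the real OPERATION table contains many more columns that would need sensible values before the final insert:

CREATE TABLE OPERATION_STAGE AS
SELECT
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  NEW_SEQUENCE_NO,
  RESOURCE_ID
FROM
  (...);  /* the full query shown above, as an inline view */

INSERT INTO OPERATION (
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  SEQUENCE_NO,
  RESOURCE_ID)
SELECT
  'M',
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  WORKORDER_SUB_ID,
  NEW_SEQUENCE_NO,
  RESOURCE_ID
FROM
  OPERATION_STAGE;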





Finding Rows with Common Attributes – Roman to Find a Solution in Something New

1 07 2011

July 1, 2011

A recent thread in the comp.databases.oracle.server Usenet group (actually two threads) asked an interesting question.  Assume that you had a detail table that contained several attributes for each of the unique key values.  How would one go about finding all of the unique key values that share the same set of attributes?  The sample set provided by the OP looks like this:

COL1   COL2
----  -----
I         a
I         b
I         c
II        a
II        b
III       a
III       b
III       c 

For the above, assume that the OP was interested in the attributes of “I”: a, b, c.  “II” lacks a “c” attribute, while “III” has the required “a”, “b”, and “c” attributes.  So, the OP would like to return the COL1 value “III” but not “II”.  I wonder if there is a simple solution for the OP?

First, let's create our test data.  COL1 appears to contain Roman numerals – if we go beyond the number 3, those could be tricky to generate (unless of course you find the RN format parameter for the TO_CHAR function).  Let's first create a temporary work table that contains the Roman numerals from 1 to 100 and a random number between 1 and 10:

CREATE TABLE T1_TEMP AS
SELECT
  TRIM(TO_CHAR(ROWNUM,'RN')) C1,
  TRUNC(DBMS_RANDOM.VALUE(1,10)+1) C2,
  ROWNUM C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

Let’s see what is in the T1_TEMP table:

COLUMN C1 FORMAT A10

SELECT
  *
FROM
  T1_TEMP
ORDER BY
  C3;

C1                 C2         C3
---------- ---------- ----------
I                  10          1
II                  4          2
III                 7          3
IV                  9          4
V                   8          5
VI                 10          6
VII                 9          7
VIII                4          8
IX                  4          9
X                  10         10
...
XCV                 5         95
XCVI                4         96
XCVII               8         97
XCVIII              7         98
XCIX               10         99
C                   4        100

100 rows selected. 

The row with the value “I” in column C1 has the number 10 in column C2, but that number might be a bit different in your temporary work table.  Column C2 will determine the number of attributes that are added for each of the values found in column C1 when we create the table T1 (note that we could have defined column C2 with the function CHR(96 + COUNTER) to place lowercase letters in that column, rather than numbers, to help reproduce the OP’s dataset):

CREATE TABLE T1 AS
SELECT
  T1_TEMP.C1,
  V1.COUNTER C2
FROM
  T1_TEMP,
  (SELECT
     ROWNUM COUNTER
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) V1
WHERE
  T1_TEMP.C2>=V1.COUNTER; 
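
As an aside, the CHR variant mentioned in the note above would simply swap the letter generation into the column list – a sketch that would produce the lowercase letters a through j rather than the numbers 1 through 10 (the remainder of this article uses the numeric version):

CREATE TABLE T1 AS
SELECT
  T1_TEMP.C1,
  CHR(96 + V1.COUNTER) C2
FROM
  T1_TEMP,
  (SELECT
     ROWNUM COUNTER
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) V1
WHERE
  T1_TEMP.C2>=V1.COUNTER;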

Let’s see what is in table T1:

SELECT
  *
FROM
  T1
ORDER BY
  C1,
  C2;

C1                 C2
---------- ----------
C                   1
C                   2
C                   3
C                   4
I                   1
I                   2
I                   3
I                   4
I                   5
I                   6
I                   7
I                   8
I                   9
I                  10
...
XXXVII              1
XXXVII              2
XXXVII              3
XXXVIII             1
XXXVIII             2
XXXVIII             3
XXXVIII             4
XXXVIII             5
XXXVIII             6

634 rows selected.  

From the above output, you can see that we now have the number of rows in table T1 for each distinct value of C1 as was specified in table T1_TEMP.  An interesting side-note: when sorted as a character string, the Roman numeral 100 (C) is less than the Roman numeral 1 (I) – I guess that explains why computers do not natively use Roman numerals for calculations.  :-)

For the next step, we need to collapse the different C2 values for each of the unique C1 values into a single row.  Oracle Database 11.2.0.1 introduced the LISTAGG function that makes easy work of this task, as shown in this earlier blog article.

COLUMN C2_LISTING FORMAT A22

SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1
ORDER BY
  C1;

C1         C2_LISTING
---------- --------------------
C          1,2,3,4
I          1,2,3,4,5,6,7,8,9,10
II         1,2,3,4
III        1,2,3,4,5,6,7
IV         1,2,3,4,5,6,7,8,9
IX         1,2,3,4
...
XXXV       1,2,3,4,5,6,7,8,9
XXXVI      1,2,3,4,5
XXXVII     1,2,3
XXXVIII    1,2,3,4,5,6

100 rows selected. 

The question remains, how can I find all of the unique C1 values that have all of the same attributes as the C1 value “I” – in this case 1,2,3,4,5,6,7,8,9,10?  One method slides the above query into a WITH block and then the WITH block is referenced twice in the main query:

WITH MY_VIEW AS
(SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1)
SELECT
  V2.C1,
  V2.C2_LISTING
FROM
  MY_VIEW V1,
  MY_VIEW V2
WHERE
  V1.C1='I'
  AND V1.C1<>V2.C1
  AND V1.C2_LISTING=V2.C2_LISTING
ORDER BY
  V2.C1;

C1         C2_LISTING
---------- --------------------
LVII       1,2,3,4,5,6,7,8,9,10
LXXI       1,2,3,4,5,6,7,8,9,10
LXXIII     1,2,3,4,5,6,7,8,9,10
VI         1,2,3,4,5,6,7,8,9,10
X          1,2,3,4,5,6,7,8,9,10
XCIX       1,2,3,4,5,6,7,8,9,10
XV         1,2,3,4,5,6,7,8,9,10
XXIX       1,2,3,4,5,6,7,8,9,10
XXXI       1,2,3,4,5,6,7,8,9,10

9 rows selected. 

How else might you solve the problem posted by the OP in the Usenet thread?
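
One possible alternative avoids building delimited lists altogether by joining T1 to itself and comparing attribute counts – a sketch that relies on the (C1, C2) combinations being unique, as they are in this test table:

SELECT
  T2.C1
FROM
  T1,
  T1 T2
WHERE
  T1.C1='I'
  AND T2.C1<>'I'
  AND T1.C2=T2.C2
GROUP BY
  T2.C1
HAVING
  COUNT(*)=(SELECT COUNT(*) FROM T1 WHERE C1='I')
  AND COUNT(*)=(SELECT COUNT(*) FROM T1 T3 WHERE T3.C1=T2.C1);

The first HAVING condition verifies that a candidate C1 value matched every attribute of “I”, and the second condition verifies that the candidate has no additional attributes.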





Row Values to Comma Separated Lists, an Overly Complicated Use Case Example

26 05 2011

May 26, 2011

In a previous article I showed a simple method to convert values found in multiple table rows into a comma delimited list.  The method works very well, until the situation in which the approach is to be used becomes a bit more complex.

Assume for a moment that the following table structure exists where the approach must be implemented:

In the above diagram, the PURCHASE_ORDER table is the parent table, the PURC_ORDER_LINE table is the child table (the line-level detail), and the PURC_LINE_DEL table is the grand-child table (the sub-line level detail).  A fairly simple setup, and I suspect not a terribly uncommon one.

Now the complexity.  Assume that an ERP package using the above tables generates a report using the tables' data, and as the ERP package is flexible, it allows additional information to be added to the reports – essentially, it is possible to execute a query that retrieves a single row for every detail-level row in the report.  Great, except that in this particular report, the detail-level row level is at the PURC_ORDER_LINE table.  If there were 18 rows, for instance, in the PURC_LINE_DEL table (the schedule that indicates when the parts should be delivered) for the first order line in the PURC_ORDER_LINE table for a particular order ID that is found in the PURCHASE_ORDER table, then we have a bit of a problem.  Fortunately, this ERP package's report was constructed such that the DESIRED_RECV_DATE and USER_ORDER_QTY columns are represented in two report variables with CRLF (carriage return – line feed) delimited lists, making it very easy to list the delivery dates that the parts are due, and the quantities that are due for each of those dates.

Fantastic, so where is the complexity?  What if the purchase order must be revised after the supplier starts sending the parts on the agreed upon delivery dates, such that the first 9 of the 18 delivery schedules for line 1 of a purchase order have been satisfied, 1 of the 18 delivery schedule lines has been partially satisfied, and the remaining 8 delivery schedule lines have dates that are in the future?  This might be helpful information to provide to the supplier when presenting the updated purchase order.  Unfortunately, the ERP package's report does not deliver the USER_RECEIVED_QTY column as a CRLF delimited list, as it does for the DESIRED_RECV_DATE and USER_ORDER_QTY columns.

A complex problem, or not?  Well, the ERP package does permit a SQL statement returning a single row to be executed per detail row in the report (at the PURC_ORDER_LINE table level), so we have a possible solution.  However, if the SQL statement is too complex (having inline views is one such complication), the complex portion of the SQL statement must be isolated in a statically defined view, and then that view should be referenced by the report.  A complex problem, or not?  Let’s try building a solution starting in SQL*Plus:

SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO;

PURC_ORDER_ID   PURC_ORDER_LINE_NO USER_RECEIVED_QTY         RN
--------------- ------------------ ----------------- ----------
146825                           1                 1          1
146825                           1                 1          2
146825                           1                 2          3
146825                           1                 2          4
146825                           1                 2          5
146825                           1                 3          6
146825                           1                 3          7
146825                           1                 1          8
146825                           1                 3          9
146825                           1                 2         10
146825                           1                 0         11
146825                           1                 0         12
146825                           1                 0         13
146825                           1                 0         14
146825                           1                 0         15
146825                           1                 0         16
146825                           1                 0         17
146825                           1                 0         18 

Simple so far: the line identified by RN 10 requested an order quantity of 3, but the supplier only delivered 2 of the 3 parts for that date.  Now let's generate a comma separated list using the SYS_CONNECT_BY_PATH function:

SELECT
  SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),',') LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

LISTING
------------------------------------

,1
,1,1
,1,1,2
,1,1,2,2
,1,1,2,2,2
,1,1,2,2,2,3
,1,1,2,2,2,3,3
,1,1,2,2,2,3,3,1
,1,1,2,2,2,3,3,1,3
,1,1,2,2,2,3,3,1,3,2
,1,1,2,2,2,3,3,1,3,2,0
,1,1,2,2,2,3,3,1,3,2,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0,0
,1,1,2,2,2,3,3,1,3,2,0,0,0,0,0,0,0,0 

We only want the last row returned from the above result, and we need to change those commas to a CRLF character combination (after eliminating the initial comma):

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

LISTING
-------

1
1
2
2
2
3
3
1
3
2
0
0
0
0
0
0
0
0

Looks like mission accomplished: all of the previously received quantities are returned in a single query row… except now we have a complicated query that must be isolated in a statically defined view (silly rules…).

Before we head down the path of isolating the complex portion of the query, I thought that it might be entertaining to examine a couple of execution plans to see what happens when we isolate the complicated portion of the query in a statically defined view.  First, we will take a look at the last of the above SQL statements:

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 929864805

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE                          |               |     1 |    26 |            |          |
|*  2 |   CONNECT BY NO FILTERING WITH START-WITH|               |       |       |            |          |
|   3 |    VIEW                                  |               |     4 |   104 |     3   (0)| 00:00:01 |
|   4 |     WINDOW NOSORT                        |               |     4 |    76 |     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID         | PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("RN"-1=PRIOR "RN")
       filter("RN"=1)
   6 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The above looks simple, and the query really does execute quickly.  For the above complex portion of the query to work in a statically defined view, we need to relocate the WHERE clause outside of the inline view, so that when the report runs, it is able to pass in the appropriate values for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns.  Here is the execution plan (generated on Oracle Database 11.2.0.2) when the WHERE clause is relocated:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

The above looks to be reasonably sane, although the predicted amount of time jumped from about 1 second to about 5 seconds.  The Predicate Information section of the execution plan indicates that the filter for the specified PURC_ORDER_ID and PURC_ORDER_LINE_NO was not pushed into the inline view as I had hoped.  The actual execution time for this simple query is much longer than 5 seconds, with the session spending 100% of its time burning CPU time (I killed the query after waiting several minutes).  No problem, I will just use a hint to force the optimizer to push those predicates into the inline view:

SELECT /*+ PUSH_PRED(V) */
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  USER_RECEIVED_QTY,
  ROW_NUMBER() OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN = RN-1
START WITH
  RN = 1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

Well, the above is disappointing.  I could dig through a 10053 trace to try to determine why the hint was “ignored”, or I could try another query that produces an equivalent result.  I chose the latter.  Let's try using the COUNT analytic function to replace the ROW_NUMBER analytic function:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
ORDER BY
  DEL_SCHED_LINE_NO) V
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 617003335

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE                          |               |     1 |    26 |            |          |
|*  2 |   CONNECT BY NO FILTERING WITH START-WITH|               |       |       |            |          |
|   3 |    VIEW                                  |               |     4 |   104 |     3   (0)| 00:00:01 |
|   4 |     WINDOW BUFFER                        |               |     4 |    76 |     3   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID         | PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("RN"-1=PRIOR "RN")
       filter("RN"=1)
   6 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The execution plan appears to be reasonable – the predicates on the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns are applied early, so the query executes quickly.  Now to move the WHERE clause:

SELECT
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

As had happened before, the predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were not pushed into the inline view – the FILTER operation above the CONNECT BY processing shows that the entire table is read, sorted, and numbered before the predicates are applied – so the query executes extremely slowly.  Let's try the PUSH_PRED hint again:

SELECT /*+ PUSH_PRED(V) */
  MAX(REPLACE(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(USER_RECEIVED_QTY),','),2),',',CHR(13)||CHR(10))) LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  DEL_SCHED_LINE_NO,
  USER_RECEIVED_QTY,
  COUNT(*) OVER (PARTITION BY PURC_ORDER_ID,PURC_ORDER_LINE_NO ORDER BY DEL_SCHED_LINE_NO) RN
FROM
  PURC_LINE_DEL
ORDER BY
  DEL_SCHED_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
CONNECT BY PRIOR
  RN=RN-1
START WITH
  RN=1;

Plan hash value: 2107147172

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |               |     1 |    48 |       |  1193  (50)| 00:00:05 |
|   1 |  SORT AGGREGATE                           |               |     1 |    48 |       |            |          |
|*  2 |   FILTER                                  |               |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|               |       |       |       |            |          |
|   4 |     VIEW                                  |               | 53851 |  2524K|       |   625   (3)| 00:00:03 |
|   5 |      SORT ORDER BY                        |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   6 |       WINDOW SORT                         |               | 53851 |   999K|  1704K|   625   (3)| 00:00:03 |
|   7 |        TABLE ACCESS FULL                  | PURC_LINE_DEL | 53851 |   999K|       |    57   (8)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1)
   3 - access("RN"-1=PRIOR "RN")
       filter("RN"=1) 

The execution plan is unchanged.  Out of luck… (in hindsight that might be expected – the PUSH_PRED hint is documented as pushing join predicates into a view, and with no join in this query there may simply be nothing for the hint to act on).

I remembered reading about a new function introduced in Oracle Database 11.2.0.1 – what was that new function… LISTAGG?  I am not a fan of that function name; Oracle should have named the function __CREATE_A_DARN_COMMA_SEPARATED_LIST_QUICKLY (a double underscore function, making it a challenge to find the function).  The LISTAGG aggregate function (it may also be used in an analytic form) is described in the Oracle documentation library for 11.2.  Let's give the LISTAGG function a try (the SQL statement now also formats the USER_RECEIVED_QTY values so that the column consistently shows 2 decimal places):

SELECT
  LISTING
FROM
(SELECT
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO);

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The calculated cost for the query dropped by 1 and the CPU component of that cost dropped to 0%, so that must be a good thing (although, as we saw earlier with the estimated time, the calculated cost can be misleading).  Without AUTOTRACE TRACEONLY EXPLAIN enabled, the query executes very quickly and produces output that looks similar to this:

LISTING
-----------

       1.00
       1.00
       2.00
       2.00
       2.00
       3.00
       3.00
       1.00
       3.00
       2.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
       0.00
 
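As a side note, for readers without the ERP schema handy, the LISTAGG syntax is easy to test against a handful of generated rows – a minimal self-contained sketch:

SELECT
  LISTAGG(TO_CHAR(LEVEL), ',') WITHIN GROUP (ORDER BY LEVEL) LISTING
FROM
  DUAL
CONNECT BY
  LEVEL<=5;

LISTING
-----------
1,2,3,4,5

One caveat from the documentation: in 11.2 the LISTAGG function returns a VARCHAR2, so a long enough list of delivery lines would raise an ORA-01489 error once the concatenated result exceeds the 4,000 byte limit for SQL.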

OK, let’s re-enable AUTOTRACE TRACEONLY EXPLAIN and try moving the WHERE clause so that we are able to package the complex portion of this query into a statically defined view:

SELECT
  LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

In the above, notice that the predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were pushed into the inline view – or, more correctly, that the inline view was completely eliminated by the query optimizer, apparently through view merging (notice that there is no VIEW line in the execution plan, as there was in the previous examples above).
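
As a quick way to confirm that view merging is responsible for the efficient plan, the NO_MERGE hint should prevent the transformation and bring the VIEW operation back into the execution plan.  This is just an experiment, not something needed for the report:

SELECT /*+ NO_MERGE(V) */
  LISTING
FROM
(SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO) V
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Whether the two predicates are then pushed into the un-merged view as filters is worth checking with AUTOTRACE.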

OK, so Oracle’s query optimizer worked as I had hoped with the inline view, but what about a statically defined view – would it also execute the SQL statement efficiently?  Let’s create the view:

CREATE VIEW TEST_LA AS
SELECT
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO,
  LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
  PURC_LINE_DEL
GROUP BY
  PURC_ORDER_ID,
  PURC_ORDER_LINE_NO; 

Now let’s try the query with AUTOTRACE TRACEONLY EXPLAIN enabled:

SELECT
  LISTING
FROM
  TEST_LA
WHERE
  PURC_ORDER_ID='146825'
  AND PURC_ORDER_LINE_NO=1;

Plan hash value: 3148992471

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    19 |     3   (0)| 00:00:01 |
|   1 |  SORT GROUP BY               |               |     1 |    19 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PURC_LINE_DEL |     4 |    76 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021790  |     4 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PURC_ORDER_ID"='146825' AND "PURC_ORDER_LINE_NO"=1) 

The execution plan above shows the same cost (3) that we saw earlier, and once again the view was eliminated by Oracle's query optimizer (as indicated by the absence of a VIEW line in the execution plan), allowing the query to execute quickly because the predicates are applied early in the query execution.  Keep in mind that the efficiency depends on those predicates: a query of TEST_LA without the PURC_ORDER_ID and PURC_ORDER_LINE_NO restrictions would have to build the delimited list for every purchase order line in the table.
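
The statically defined view also opens the door to retrieving the lists for several purchase order lines at once by joining the view to its parent table.  A hypothetical sketch follows – PURC_ORDER_LINE and its LINE_NO column are assumed names for illustration, not confirmed against the actual ERP schema:

SELECT
  POL.PURC_ORDER_ID,
  POL.LINE_NO,
  TL.LISTING
FROM
  PURC_ORDER_LINE POL,
  TEST_LA TL
WHERE
  POL.PURC_ORDER_ID='146825'
  AND TL.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND TL.PURC_ORDER_LINE_NO=POL.LINE_NO;

Whether the optimizer pushes the join predicates into the view in this case (the original purpose of the PUSH_PRED hint) should be verified with AUTOTRACE before relying on the join in a report.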

So, Oracle Database 11.2.0.1 and above provide an efficient way to retrieve the information that we need in a comma (or CRLF) delimited list for the report.  When added to the report (defined as an “extended query” in the ERP package), the SQL statement looks like the following (the reporting package does not permit line breaks):

SELECT (LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO)) FROM PURC_LINE_DEL WHERE PURC_ORDER_ID=:PO_ID AND PURC_ORDER_LINE_NO=:LN_LINE_NO GROUP BY PURC_ORDER_ID, PURC_ORDER_LINE_NO

Oddly enough, once the LISTAGG function call in the SQL statement was enclosed in parentheses ( ), the reporting package did not require me to use a statically defined view for the complex portion of the query.  But even if the statically defined view had been required, as shown above, the query would still have executed efficiently.
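
As a closing usage note, the report's bind variables may be simulated in SQL*Plus to test the extended query outside of the reporting package – a quick sketch using the same purchase order as the earlier examples:

VARIABLE PO_ID VARCHAR2(15)
VARIABLE LN_LINE_NO NUMBER

EXEC :PO_ID := '146825'
EXEC :LN_LINE_NO := 1

SELECT (LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO)) FROM PURC_LINE_DEL WHERE PURC_ORDER_ID=:PO_ID AND PURC_ORDER_LINE_NO=:LN_LINE_NO GROUP BY PURC_ORDER_ID, PURC_ORDER_LINE_NO;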
