Everything or Nothing in SQL

23 05 2013

May 23, 2013

The following question recently came through an ERP mailing list (significantly rephrased):

I would like to use the Microsoft Query tool in Microsoft Excel to extract records from the ERP database.  I would like the list of parent records to be retrieved into Excel when all of the specified child records belong to the parent.

That sounds as though it would be an easy request.  The OP in the ERP mailing list later followed up with the following:

“If I may ask for another variation to pick your brain….

How would I rewrite this to show ONLY the work order(s) that have all of the 3 Parts, but instead of showing all of them, show only the ones that don’t include a specific other part?

For example, Include all the work order retrieved from your query, but then subtract those work orders that don’t contain a part or parts in another separate list.

I.E. All order containing parts (‘147555′,’227643′,’155328′) but not containing ANY of these parts (‘12345′, ‘456789’)”

In the above follow up, the term “work order(s)” refer to the “parent records” that was previously mentioned, and the term “parts” refer to the “child records” that was previously mentioned.  Below is a graphical mock up, with the light-green colored rectangles representing the “parts”, and the white colored rectangle at the bottom left representing the “work order”.  The other white colored rectangles represent subassemblies that belong to the “work order”, and the cyan colored rectangles represent the operations that utilize the “parts”.  Through normalization, the white colored rectangles represent rows in one database table (WORK_ORDER – the “parent” table), the cyan colored rectangles represent rows in another table (OPERATION – the “child” table), and the light-green colored rectangles represent rows in a third table (REQUIREMENT – the “grandchild” table).

EverythingOrNothing

The first request still seems to be easy to solve, even though there are three tables that are potentially involved.  Let’s simplify the problem further and just focus on the “grandchild” table.  Let’s create a small version of that table and populate the table with some sample rows:

CREATE TABLE T1 (
  WORKORDER_TYPE VARCHAR2(1),
  WORKORDER_BASE_ID VARCHAR2(15),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SUB_ID VARCHAR2(3),
  PIECE_NO NUMBER,
  PART_ID VARCHAR2(15),
  QTY NUMBER(12,4));

INSERT INTO
  T1
SELECT
  'W' WORKORDER_TYPE,
  TRIM(TO_CHAR(TRUNC(ROWNUM / 100 + 1)*1000,'00000')) || 'W' WORKORDER_BASE_ID,
  TO_CHAR(MOD(TRUNC(ROWNUM / 25 + 1), 999) + 1) WORKORDER_LOT_ID,
  TO_CHAR(TRUNC(MOD(ROWNUM - 1, 25) / 5) + 1) WORKORDER_SUB_ID,
  MOD(ROWNUM - 1, 25) + 1 PIECE_NO,
  CHR(65 + MOD(ROWNUM - 1, 26)) || CHR(65 + MOD(ROWNUM - 1, 15)) || MOD(ROWNUM - 1, 10)  PART_ID,
  TRUNC(DBMS_RANDOM.VALUE*100) + 1 QTY
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

In the above, the combination of the WORKORDER_BASE_ID column and the WORKORDER_LOT_ID column represent the “work order” that was mentioned by the OP.  The WORKORDER_SUB_ID column identifies the subassembly number (answering the question of to which of the white colored rectangles does the “part” belong).  The PIECE_NO column is used to uniquely identify the rows that belong to a specific subassembly number (that is the final column in the REQUIREMENT table’s primary key).  The PART_ID column is the specific “part” that is attached to a specific “work order”.  The QTY column indicates the number of the “part” that is required by the “work order” – with the sample data, this is the only column that will differ from one execution of the table creation script to the next execution.

The sample data is a bit too consistent, however, we will ignore any issues that consistency may present for now.  Let’s execute a simple SQL statement that indicates how many from the following “parts” list appear in the “work orders”: MI8, NJ9, KG6 (note the use of DISTINCT – a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, and we do not want to repeatedly count those “parts”):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
FROM
  T1 TWHERE  T.PART_ID IN ('MI8','NJ9','KG6');

W WORKORDER_BASE_ WOR WOR PART_ID                  C
- --------------- --- --- --------------- ----------
W 01000W          3   3   KG6                      3
W 01000W          3   3   MI8                      3
W 01000W          3   3   NJ9                      3
W 05000W          19  1   KG6                      3
W 05000W          19  1   MI8                      3
W 05000W          19  1   NJ9                      3
W 09000W          34  4   KG6                      3
W 09000W          34  4   MI8                      3
W 09000W          34  4   NJ9                      3

Only 9 rows returned, with three different WORKORDER_BASE_ID, WORKORDER_LOT_ID combinations.  For a moment, let’s ignore that the three “parts” appear in all three “work orders”.  If we wanted a list of those “work orders” where _all_ three of the “parts” are present, and not just _one or more_ of the three “parts”, we could simply slide the above SQL statement into an inline view and specify that our C generated column must be equal to (or greater than – the use of greater than will be explained later) 3:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9
W 09000W          34  4   KG6
W 09000W          34  4   MI8
W 09000W          34  4   NJ9

While trying to determine the OP’s business case for the requests, I wondered if the OP was attempting to replace a specific “part” with another “part” within the ERP package, and simply required a report to understand the progress of that manual “part” replacement procedure in the ERP package.  Let’s move forward with this assumption by replacing a row containing a specific PART_ID value with a new PART_ID value, but only for a single “work order”:

DELETE FROM
  T1
WHERE
  WORKORDER_TYPE='W'
  AND WORKORDER_BASE_ID='09000W'
  AND WORKORDER_LOT_ID='34'
  AND PART_ID='NJ9';

1 row deleted.

INSERT INTO T1 VALUES (
  'W',
  '09000W',
  '34',
  '4',
  10,
  'REPLACEMENT-NJ9',
  10);

1 row created.

Let’s check the result of the SQL statement that we were in the process of developing:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9

Now, there are only six rows returned from two “work orders” – the third “work order” is no longer included in the query results.

With the use of the C >= n syntax, we are able to find “work orders” with at least n number of the specified “parts”.  So, if we were interested in finding cases where at least two of the three listed “parts” are attached to a “work order”, we just need to change the 3 to 2:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=2
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   NJ9
W 05000W          19  1   MI8
W 09000W          34  4   KG6
W 09000W          34  4   MI8

The third “work order” is again included in the query resultset.

Other than putting the results into Microsoft Excel, the sample query satisfies the first request of the OP.  If we were not interested in returning the WORKORDER_SUB_ID and PART_ID columns, we could have simply used the following SQL statement:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(*)>=3;

W WORKORDER_BASE_ WOR
- --------------- ---
W 05000W          19
W 01000W          3

Now to work on the second request, only retrieving the “work orders” when all of a specified list of “parts” are attached to the “work order” and none of another list of “parts” are attached to the “work order”.  As a starting point, we will re-use the inline view from the previous solution.  The IN list will be modified to include the “parts” that must be attached to the “work order”, as well as the “parts” that must not be attached to the “work order”.  The COUNT analytic function must be modified to include either a DECODE or CASE structure to divide the “parts” into the “must” and “must not” lists so that a count of each may be obtained:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9');

W WORKORDER_BASE_ WOR WOR PART_ID                  C         NC
- --------------- --- --- --------------- ---------- ----------
W 01000W          3   3   KG6                      2          0
W 01000W          3   3   MI8                      2          0
W 05000W          19  1   KG6                      2          0
W 05000W          19  1   MI8                      2          0
W 09000W          34  4   KG6                      2          1
W 09000W          34  4   MI8                      2          1
W 09000W          34  4   REPLACEMENT-NJ9          2          1

To satisfy the OP’s second request, we are interested in those rows where the C column value is (at least) 2 and the NC column value is 0.  Just as before, we will slide the above SQL statement into an inline view and add the specifications for the C and NC generated columns:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=0
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 05000W          19  1   KG6
W 05000W          19  1   MI8

The above shows those “work orders” where the NJ9 “part” had not yet been replaced with the REPLACEMENT-NJ9 “part”.  If desired, we are also able to easily modify the OP’s second request to see the “work orders” where the “part” was already replaced just by changing NC=0 to NC=1:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=1
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- ---------------
W 09000W          34  4   KG6
W 09000W          34  4   REPLACEMENT-NJ9
W 09000W          34  4   MI8

What other related information are we able to return?  What about all “parts” (the full bill of material) for each of the work orders that match the OP’s second request.  That modification is simple, just slide the majority of the above SQL statement into a subquery (found in the first WHERE clause):

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=0)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 01000W          3   1            1 ZK5                     48
W 01000W          3   1            2 AL6                     60
W 01000W          3   1            3 BM7                     42
W 01000W          3   1            4 CN8                     26
W 01000W          3   1            5 DO9                     49
W 01000W          3   2            6 EA0                     66
W 01000W          3   2            7 FB1                     39
W 01000W          3   2            8 GC2                     57
W 01000W          3   2            9 HD3                     42
W 01000W          3   2           10 IE4                     52
W 01000W          3   3           11 JF5                     16
W 01000W          3   3           12 KG6                     61
W 01000W          3   3           13 LH7                     39
W 01000W          3   3           14 MI8                     75
W 01000W          3   3           15 NJ9                     94
W 01000W          3   4           16 OK0                     67
W 01000W          3   4           17 PL1                     12
W 01000W          3   4           18 QM2                     41
W 01000W          3   4           19 RN3                     97
W 01000W          3   4           20 SO4                     15
W 01000W          3   5           21 TA5                     54
W 01000W          3   5           22 UB6                     59
W 01000W          3   5           23 VC7                     23
W 01000W          3   5           24 WD8                      7
W 01000W          3   5           25 YJ4                     22
W 05000W          19  1            1 JF5                     31
W 05000W          19  1            2 KG6                     42
W 05000W          19  1            3 LH7                     37
W 05000W          19  1            4 MI8                     85
W 05000W          19  1            5 NJ9                    100
W 05000W          19  2            6 OK0                     43
W 05000W          19  2            7 PL1                     58
W 05000W          19  2            8 QM2                     76
W 05000W          19  2            9 RN3                     66
W 05000W          19  2           10 SO4                     75
W 05000W          19  3           11 TA5                     50
W 05000W          19  3           12 UB6                     55
W 05000W          19  3           13 VC7                     18
W 05000W          19  3           14 WD8                     33
W 05000W          19  3           15 XE9                      8
W 05000W          19  4           16 YF0                     86
W 05000W          19  4           17 ZG1                      7
W 05000W          19  4           18 AH2                     25
W 05000W          19  4           19 BI3                     38
W 05000W          19  4           20 CJ4                     34
W 05000W          19  5           21 DK5                     88
W 05000W          19  5           22 EL6                     91
W 05000W          19  5           23 FM7                     52
W 05000W          19  5           24 GN8                     71
W 05000W          19  5           25 IE4                     55

Just as before, we are able to see all “parts” for the “work orders” that already have the replacement part, just by changing the NC=0 to NC=1:

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=1)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 09000W          34  1            1 UF0                     80
W 09000W          34  1            2 VG1                     75
W 09000W          34  1            3 WH2                     75
W 09000W          34  1            4 XI3                     77
W 09000W          34  1            5 YJ4                     46
W 09000W          34  2            6 ZK5                     57
W 09000W          34  2            7 AL6                     64
W 09000W          34  2            8 BM7                     14
W 09000W          34  2            9 CN8                     30
W 09000W          34  2           10 DO9                     62
W 09000W          34  3           11 EA0                     45
W 09000W          34  3           12 FB1                     68
W 09000W          34  3           13 GC2                     25
W 09000W          34  3           14 HD3                     73
W 09000W          34  3           15 IE4                     32
W 09000W          34  4           10 REPLACEMENT-NJ9         10
W 09000W          34  4           16 JF5                     49
W 09000W          34  4           17 KG6                     23
W 09000W          34  4           18 LH7                     60
W 09000W          34  4           19 MI8                     75
W 09000W          34  5           21 OK0                     91
W 09000W          34  5           22 PL1                     66
W 09000W          34  5           23 QM2                     45
W 09000W          34  5           24 RN3                     49
W 09000W          34  5           25 TE9                      6

Using the sample table and data provided, are there any other solutions to the OP’s requests?





Grouping Data Sets by Week Number of the Month

1 05 2013

May 1, 2013

I saw a decent SQL brain teaser this morning in the comp.databases.oracle.server Usenet group.  The OP in the message thread is attempting to summarize data in one of his tables, with the summarizations broken down by month and then the week within that month.  Increasing the challenge, the OP required that the dates defining the week start and end dates fall within the month’s data set:

“Today is 4/30/2013.  The we_end [last day of the week containing April 30, 2013] value is 5/4/2013.  What I am really looking for is the weeks within the calendar month.  So, the first week for April 2013 should show 4/1 – 4/6.  And the last week should show 4/28 – 4/30″

How would you solve that particular problem?  Is there an Oracle built-in function that provides the grouping requested by the OP?

-

-

-

-

There are a couple different methods to solve the problem posed by the OP.  One method uses a virtual lookup table to determine the start date and end date of each week in the month, using the restriction that the start date of each week or the end date of each week might need to be adjusted to fall within the same month.  Interestingly, with the restrictions specified by the OP, there are 62 weeks this year.

As with all SQL statements, there is must be a sensible starting point.  First, let’s determine the start and end dates of each month in 2013.  The month start date is easy to determine, while outputting the month end date is a little more challenging – that date is one day less than the start of the next month:

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END 
--------- --------- 
01-JAN-13 31-JAN-13 
01-FEB-13 28-FEB-13 
01-MAR-13 31-MAR-13 
01-APR-13 30-APR-13 
01-MAY-13 31-MAY-13 
01-JUN-13 30-JUN-13 
01-JUL-13 31-JUL-13 
01-AUG-13 31-AUG-13 
01-SEP-13 30-SEP-13 
01-OCT-13 31-OCT-13 
01-NOV-13 30-NOV-13 
01-DEC-13 31-DEC-13

There are a couple of approaches that could be taken for the next step, finding the start of the first, second, third, fourth, fifth, and potentially sixth week in each of those months.  I am interested in determining the start of the second week – we are able to accomplish that task by using the NEXT_DAY function to find the next Sunday after the first of the month.  The end of the first week will also be calculated, even though it is not used in later calculations.  Note that this solution is subject to problems depending on the NLS settings on the client (and whether or not Sunday is considered the first day of the week):

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
 FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END END_FIRST START_SEC 
--------- --------- --------- --------- 
01-JAN-13 31-JAN-13 05-JAN-13 06-JAN-13 
01-FEB-13 28-FEB-13 02-FEB-13 03-FEB-13 
01-MAR-13 31-MAR-13 02-MAR-13 03-MAR-13 
01-APR-13 30-APR-13 06-APR-13 07-APR-13 
01-MAY-13 31-MAY-13 04-MAY-13 05-MAY-13 
01-JUN-13 30-JUN-13 01-JUN-13 02-JUN-13 
01-JUL-13 31-JUL-13 06-JUL-13 07-JUL-13 
01-AUG-13 31-AUG-13 03-AUG-13 04-AUG-13 
01-SEP-13 30-SEP-13 07-SEP-13 08-SEP-13 
01-OCT-13 31-OCT-13 05-OCT-13 06-OCT-13 
01-NOV-13 30-NOV-13 02-NOV-13 03-NOV-13 
01-DEC-13 31-DEC-13 07-DEC-13 08-DEC-13

The above may appear to be a number of related, but useless dates.  However, that is not the case.  Next, we will need a method to count through as many as six weeks per month.  This should work:

SELECT 
  LEVEL WEEK_NUMBER 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=6; 

WEEK_NUMBER 
----------- 
          1 
          2 
          3 
          4 
          5 
          6

Next, a Cartesian join will be created between the two row sources to permit stepping through each month, and each week within that month (we will refine the output later… note that this unrefined output provides a clue for a second method of solving the problem posed by the OP):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7 START_WEEK, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1 END_WEEK 
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 30-DEC-12 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 02-FEB-13 
01-FEB-13           1 27-JAN-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 02-MAR-13 
01-MAR-13           1 24-FEB-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 06-APR-13 
01-APR-13           1 31-MAR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 04-MAY-13 
01-MAY-13           1 28-APR-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 01-JUN-13 
01-JUN-13           1 26-MAY-13 01-JUN-13 
...

The above output has a slight problem – the Sunday in the first week of the month may not be in the month of interest; additionally, the Saturday in the last week of the month may not be in the month of interest.  We may overcome those problems with the help of the DECODE function (the CASE expression may be used instead, if so desired):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 01-JAN-13 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 31-JAN-13 
01-FEB-13           1 01-FEB-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 28-FEB-13 
01-MAR-13           1 01-MAR-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 31-MAR-13 
01-APR-13           1 01-APR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 30-APR-13 
01-MAY-13           1 01-MAY-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 31-MAY-13 
... 
01-DEC-13           5 29-DEC-13 31-DEC-13 

62 rows selected.

Yes, this solution added 10 extra weeks to the year.  :-)

It is easy to extend this solution to cover more than just the year 2013.  For example, the following modification sets the starting period to January 2000, and runs for 20 years:

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=(12*20)) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END;

With the virtual lookup table complete, the next step would be to slide the above SQL statement into an inline view, and then join it to the data set that will be analyzed.  Assuming that the data set has a column named DATE_SENT, the WHERE clause may appear similar to the following:

WHERE
  DATE_SENT BETWEEN START_WEEK AND END_WEEK

The above is one solution, are there other solutions?  What about a solution that uses a little simple mathematics?  Such a solution could be exponentially more efficient.  First, we will create a simple table for experimentation with 1000 rows containing somewhat random dates on or after January 1, 2013:

CREATE TABLE T1 AS
SELECT
  TRUNC(TO_DATE('01-01-2013','MM-DD-YYYY')+DBMS_RANDOM.VALUE*366) DATE_SENT
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

Next, we will generate a SQL statement that produces output that is somewhat similar to the unrefined output found in the first solution (note that the SQL statement contains a DISTINCT clause to eliminate duplicate rows – there could be several rows in the row source with the same random date value):

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE
--------- --------- --------- ---------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
...
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13

In the above, the MONTH_WEEK_START column shows the Sunday of the week that includes the DATE_SENT column value.  At this point, it makes no difference if the MONTH_WEEK_START column value is in a different month than the DATE_SENT column value.

Next, we will determine the week within the month in which the DATE_SENT column value appears:

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE WEEK_NUMBER
--------- --------- --------- --------- -----------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
04-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
...
24-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
26-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
27-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
28-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13           1

Heading toward the request made by the OP in the Usenet thread, the following SQL statement will determine a simple COUNT of the number of rows that have a DATE_SENT date in each week of each month in the T1 table’s data set (note that if there are no rows in a given week, that week will be skipped in the output):

SELECT
  TRUNC(DATE_SENT,'MM') MONTH_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER,
  COUNT(*) DAYS_IN_SET
FROM
  T1
GROUP BY
  TRUNC(DATE_SENT,'MM'),
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1
ORDER BY
  1,
  2;

MONTH_STA WEEK_NUMBER DAYS_IN_SET
--------- ----------- -----------
01-JAN-13           1          13
01-JAN-13           2          15
01-JAN-13           3          23
01-JAN-13           4          21
01-JAN-13           5          13
01-FEB-13           1           5
01-FEB-13           2          21
01-FEB-13           3          15
01-FEB-13           4          14
01-FEB-13           5          11
01-MAR-13           1           4
01-MAR-13           2          18
01-MAR-13           3          19
01-MAR-13           4          20
01-MAR-13           5          20
01-MAR-13           6           4
01-APR-13           1          14
01-APR-13           2          26
01-APR-13           3          15
01-APR-13           4          23
01-APR-13           5           9
01-MAY-13           1          12
01-MAY-13           2          15
01-MAY-13           3          22
01-MAY-13           4          23
01-MAY-13           5          17
01-JUN-13           1           1
01-JUN-13           2          14
01-JUN-13           3          21
01-JUN-13           4          18
01-JUN-13           5          21
01-JUN-13           6           4
01-JUL-13           1          15
01-JUL-13           2          21
01-JUL-13           3          13
01-JUL-13           4          24
01-JUL-13           5          14
01-AUG-13           1           6
01-AUG-13           2          10
01-AUG-13           3          21
01-AUG-13           4          22
01-AUG-13           5          28
01-SEP-13           1          16
01-SEP-13           2          16
01-SEP-13           3          22
01-SEP-13           4          23
01-SEP-13           5           8
01-OCT-13           1          15
01-OCT-13           2          15
01-OCT-13           3          17
01-OCT-13           4          18
01-OCT-13           5          21
01-NOV-13           1           6
01-NOV-13           2          20
01-NOV-13           3          20
01-NOV-13           4          17
01-NOV-13           5          18
01-DEC-13           1          22
01-DEC-13           2          15
01-DEC-13           3          19
01-DEC-13           4          20
01-DEC-13           5           6
01-JAN-14           1           1

It might be just my opinion, but the mathematical solution appears to be easier to understand than the method that uses the virtual lookup table.  We are still able to derive the start date and end date of each week, if needed, to produce the OP’s solution.

Might there be other solutions to the OP’s SQL brain teaser?





Analysis Challenges

25 04 2013

April 25, 2013

Roughly 12 years ago I was attempting to analyze customer order changes that were received through electronic document interchange (EDI), specifically X12 830 documents that show order forecasted demand for specific part numbers.  At the time, the EDI data was partially transformed and inserted into an Oracle 8.0.5 database, while that data is now inserted into an Oracle 11.2 database.  I recently took another glance that this particular problem to see if there is a better method for performing the analysis.

Complications in the analysis:

  • The EDI documents do not include the customer’s purchase order line numbers, just a list of outstanding due dates and quantities.  That complication means that there is no easy way to see if an outstanding quantity has changed, or the date for that outstanding quantity has changed.
  • It is possible to ship to the customer a partial order quantity, with the remaining quantity on the line handled as a back-order.  If the customer does not alter this particular forecast line, only the quantity outstanding will be provided, along with the due date – the original quantity is not provided.
  • It is possible to ship out an order line received in an 830 document up to five days before the customer’s due date.
  • Shipped parts may require hours or days to reach the customer’s site, thus delaying when those shipped parts are acknowledged in the list of outstanding due dates and quantities received through EDI.
  • EDI 830 documents are generally received the next weekday after the customer batch generates those documents on a nightly basis.  Thus, there is an opportunity for the received forecast to be out of sync with what is known to have arrived at the customer site, and what is acknowledged in the EDI data.
  • The customer may generate multiple EDI 830 documents for a particular part number/purchase order combination on any given day, and only the EDI 830 document received with the highest (sorted in ascending order) release number (which may not be a number) on a specific date is the valid release for the specific date.
  • EDI 830 documents may be generated for a specific part number by the customer nearly every weekday, once a month, or any frequency in between.

I recall trying to find a solution to this particular problem 12 years ago.  I thought that I could possibly pull all of the relevant data from the Oracle database into a custom application, and through some magical formula output what appeared to be a list of what the customer changed.  However, what about those part quantities that are in transit to the customer?  What about those part quantities that were received by the customer, but had not yet made their way through the customer’s batch processing system?  What about those cases where the customer sents two or more EDI 830 releases for a specific part number, and those EDI releases are received on the same date?

At the time, I was frustrated.  The quality of the output of my analysis would be dependent on the accuracy of humans (yes, and humans wrote the program that performed the analysis too).  I simply had to find the first matching due date and order quantity from the previous release with the due date and order quanitity from the current release, and then rely on an end user to correct the 5% to 10% of the cases where the matching algorithm was wrong due to one or more of the “complications” outlined above.

While recently re-examining the problem, I noticed a data element in the EDI 830 document that is described as “ACCUM RECEIPTS AS OF THIS DATE”, and a luck would have it, that particular data element is one of those elements that was selected more than 12 years ago to be transformed into the Oracle Database tables (in the sample data included with this blog article, this column is named PREVIOUS_ACCUM_QTY).  If only I recognized the meaning of the values of this column, and had access to Oracle analytical SQL functions 12 years ago, this particular analysis problem might not have had so many variables that could potentially throw off the quality of the analysis.

This particular blog article has a fairly large set of sample data – however, that sample data is a fairly small subset of the data that must be analyzed on a daily basis.  The sample data may be downloaded here: analysischallenges.sql sample code (save the file as “analysischallenges.sql” – without the .doc double extension).  If the file is saved in the same directory (folder) as SQL*Plus was started from, you may then execute the script in SQL*Plus to build the tables and populate the sample data by using the following command:

@analysischallenges.sql

The two sample table definitions are created with the following commands (also included in the script):

CREATE TABLE T1 (
  CUSTOMER_PO VARCHAR2(15),
  RELEASE_ID VARCHAR2(15),
  PART_ID VARCHAR2(30),
  RELEASE_DATE DATE,
  PREVIOUS_ACCUM_QTY NUMBER,
  CREATE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID));

CREATE TABLE T1_LINE (
  CUSTOMER_PO VARCHAR2(12),
  RELEASE_ID VARCHAR2(12),
  QTY NUMBER,
  DUE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID,DUE_DATE));

T1 Table Columns:

CUSTOMER_PO is the unique identifier for the customer purchase order received through the EDI 830 documents.  RELEASE_ID defines the specific serialized change number for the customer’s PO – the RELEASE_ID is unique for a specific PO.  PART_ID is the part number for which the customer is providing a forecast.  RELEASE_DATE is the date that the forecast was batch generated in the customer’s computer system.  PREVIOUS_ACCUM_QTY is the count of the number of this part number that the customer has already received for this specific customer purchase order.  CREATE_DATE is the date that the forecast was received from the customer and transformed/inserted into the Oracle Database tables.

T1_LINE Table Columns:

CUSTOMER_PO and RELEASE_ID are undeclared foreign key columns that point back to table T1.  QTY is the quantity of the part number that should (or is forecasted to) ship on the specfied DUE_DATE.

—–

If you are curious how I created the script to populate the tables, I executed the following in SQL*Plus, and then performed a little cleanup of the resulting spool files:

SET LINESIZE 200
SET TRIMSPOOL ON

SPOOL analysischallenges_t1.txt

SELECT
  'INSERT INTO T1 VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||CHR(39)||PART_ID||CHR(39)||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(RELEASE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||','
    ||PREVIOUS_ACCUM_QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(CREATE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID;

SPOOL OFF

SPOOL analysischallenges_t1_line.txt

SELECT
  'INSERT INTO T1_LINE VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(DUE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1_LINE
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID,
  DUE_DATE;

SPOOL OFF

—–

We must find a starting point for performing the analysis of what changed from one day’s forecast to a prior day’s forecast.  There is an issue with the EDI 830 releases that were received (identified by the CREATE_DATE column) on April 24, 2013 – two releases for the same customer purchase orders were received, and there were two purchase orders that demand the same part number.  Our analysis is only concerned with the highest numbered (sorted alphabetically) release number document for each customer purchase order that was received on any particular day, so we need to find a way to eliminate the other older releases for the same customer purchase order that was received on the same day.  The ROW_NUMBER analytic function is perfect for this task – we just need the output to show the same number (1) for the rows that we want to keep, and have some other number appear on the other rows.  Something similar to the following may work (note that the TRUNC function, or other approach that uses a date/time range, probably should be used when working with the CREATE_DATE column – realistically, there could be a time component included with those date values):

COLUMN PART_ID FORMAT A10
SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  T.RELEASE_ID,
  T.RELEASE_DATE,
  T.CREATE_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     REL_ON_DATE RELEASE_ID      RELEASE_D CREATE_DA PREVIOUS_ACCUM_QTY
---------- --------------- ----------- --------------- --------- --------- ------------------
DS1812+    1231234                   2 20130402-001    23-APR-13 24-APR-13                 24
DS1812+    1231234                   1 20130403-001    24-APR-13 24-APR-13                 24
DS1812+    1233290                   2 20130402-001    23-APR-13 24-APR-13                  4
DS1812+    1233290                   1 20130403-001    24-APR-13 24-APR-13                  4
DS212+     1133290                   2 20130402-001    23-APR-13 24-APR-13                  2
DS212+     1133290                   1 20130403-001    24-APR-13 24-APR-13                  2
RS812      1231280                   2 20130402-001    23-APR-13 24-APR-13                 60
RS812      1231280                   1 20130403-001    24-APR-13 24-APR-13                 60

To remove all of the rows that have something other than 1 in the REL_ON_DATE column, we are able to slide the above SQL statement into an inline view, and then specify that the REL_ON_DATE column value must be equal to 1.  Typically, there are supposed to be few/no changes made by the customer within eight weeks of the customer forecast’s release date, but that is not always the case.  While eliminating the unnecessary releases, it probably makes sense to also calculate the end date of this “firm” period by adding 55 days to the date on which the forecast was created:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T
WHERE
  T.REL_ON_DATE=1
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY
---------- --------------- --------------- --------- ------------------
DS1812+    1231234         20130403-001    18-JUN-13                 24
DS1812+    1233290         20130403-001    18-JUN-13                  4
DS212+     1133290         20130403-001    18-JUN-13                  2
RS812      1231280         20130403-001    18-JUN-13                 60

Next, we need to retrieve the forecasted delivery schedule (in the T1_LINES table) that are associated with the above rows.  If we take the value in the PREVIOUS_ACCUM_QTY column, and find a way to generate a running SUM of the QTY column from the T1_LINE table, in theory we should be able to determine if the total quantity ordered through a specific date has changed.  The analytic version of the SUM function offers the capability of producing a running SUM when an ORDER BY clause is included in the function call.  To reduce the data set size, I will only consider those forecasts with a scheduled delivery date that is within 101 days of the customer’s forecast date.  With those changes in place, the SQL statement is modified as follows:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY,
  TL.DUE_DATE,
  TL.QTY,
  T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
  T1_LINE TL
WHERE
  T.REL_ON_DATE=1
  AND T.CUSTOMER_PO=TL.CUSTOMER_PO
  AND T.RELEASE_ID=TL.RELEASE_ID
  AND (T.RELEASE_DATE+100)>=TL.DUE_DATE
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  TL.DUE_DATE;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------------- --------- ------------------ --------- ---------- ---------------
DS1812+    1231234         20130403-001    18-JUN-13                 24 24-APR-13          0              24
DS1812+    1233290         20130403-001    18-JUN-13                  4 24-APR-13          0               4
DS212+     1133290         20130403-001    18-JUN-13                  2 24-APR-13          0               2
RS812      1231280         20130403-001    18-JUN-13                 60 27-JUN-13          1              61
RS812      1231280         20130403-001    18-JUN-13                 60 03-JUL-13          1              62
RS812      1231280         20130403-001    18-JUN-13                 60 24-JUL-13          2              64

As may be seen above, only customer purchase order 1231280 has a forecast, calling for a quantity of 1 on June 27, quantity of 1 on July 3, and a quantity of 2 on July 24.  The previously received quantity for customer purchase order 1231280 is 60, so the additional forecast quantities will bring the total order quantities up to 61, 62, and 64 through the three dates included in the forecast.

-The Top Half of the Final SQL Statement is Above-

So, we now have a starting point – the data that was received on April 24, 2013.  How does that forecast data compare with previously received forecasts?  We need to start building the second half of the SQL statement, which looks at the historical data.  To limit the dataset size, I will limit the data to only those EDI documents that were received up to 56 days prior (the eight week “firm” period) to the April 24, 2013 date that was specified earlier in this article:

SELECT
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  TRUNC(T.CREATE_DATE) CREATE_DATE,
  T.RELEASE_ID,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
  AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.CUSTOMER_PO,
  T.RELEASE_ID;

CUSTOMER_PO     REL_ON_DATE CREATE_DA RELEASE_ID      PREVIOUS_ACCUM_QTY
--------------- ----------- --------- --------------- ------------------
1131111                   1 27-FEB-13 0371                           575
1131111                   1 01-MAR-13 0381                           577
1131111                   1 06-MAR-13 0431                           578
1131111                   1 07-MAR-13 0441                           581
1131111                   1 08-MAR-13 0451                           581
1131111                   1 11-MAR-13 0461                           581
...
1131111                   1 08-APR-13 0741                           593
1131111                   1 11-APR-13 0791                           593
1131111                   1 23-APR-13 0911                           595
1131245                   1 27-FEB-13 0371                          2299
1131245                   1 01-MAR-13 0381                          2300
1131245                   1 05-MAR-13 0421                          2303
...
1131745                   1 19-APR-13 0871                           435
1131745                   1 23-APR-13 0911                           435
1133290                   1 27-FEB-13 20130205-001                     2
1133290                   1 28-FEB-13 20130206-001                     2
1133290                   1 01-MAR-13 20130207-001                     2
...
1133290                   1 19-APR-13 20130328-001                     2
1133290                   1 23-APR-13 20130401-001                     2
1231234                   1 27-FEB-13 20130205-001                    24
1231234                   1 28-FEB-13 20130206-001                    24
...
1231234                   1 23-APR-13 20130401-001                    24
1231280                   1 27-FEB-13 20130205-001                    57
1231280                   1 28-FEB-13 20130206-001                    57
1231280                   1 01-MAR-13 20130207-001                    57
1231280                   1 04-MAR-13 20130208-001                    58
1231280                   1 05-MAR-13 20130211-001                    58
...
1231280                   1 04-APR-13 20130313-001                    60
1231280                   2 05-APR-13 20130314-002                    60
...
1231280                   1 19-APR-13 20130328-001                    60
1231280                   1 23-APR-13 20130401-001                    60
1233290                   1 27-FEB-13 20130205-001                     4
1233290                   1 28-FEB-13 20130206-001                     4
...

As we found earlier, there could be more than one release for a customer purchase order received in a single date, and we are only concerned with the most recent release received on any particular day.  We have a second issue, in that the dates may have shifted from one release to the next, so we have to artifically generate rows with the potentially missing dates, with an order quantity of 0 on those dates.

But, how to generate a list of dates?  We could use a CONNECT BY LEVEL trick:

SELECT
  TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

DUE_DATE
---------
24-APR-13
25-APR-13
26-APR-13
27-APR-13
28-APR-13
...
15-JUN-13
16-JUN-13
17-JUN-13
18-JUN-13
...
30-JUL-13
31-JUL-13
01-AUG-13

Now, by creating a Cartesian join between the dataset with the dates and the dataset with the EDI release header records, we will make certain that a date exists for every date that is potentially of interest (up to 100 days after April 24, 2013) when we later join the result with the T1_LINE table (the child table that lists the forecast delivery dates).

SELECT
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  T2.CREATE_DATE,
  T2.PREVIOUS_ACCUM_QTY,
  D.DUE_DATE
FROM
  (SELECT
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    TRUNC(T.CREATE_DATE) CREATE_DATE,
    T.RELEASE_ID,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
    AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
  ORDER BY
    T.CUSTOMER_PO,
    T.RELEASE_ID) T2,
  (SELECT
    TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) D
WHERE
  T2.REL_ON_DATE=1
ORDER BY
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  D.DUE_DATE;

...
CUSTOMER_PO     RELEASE_ID      CREATE_DA PREVIOUS_ACCUM_QTY DUE_DATE
--------------- --------------- --------- ------------------ ---------
1233290         20130318-001    09-APR-13                  4 25-MAY-13
1233290         20130318-001    09-APR-13                  4 26-MAY-13
1233290         20130318-001    09-APR-13                  4 27-MAY-13
1233290         20130318-001    09-APR-13                  4 28-MAY-13
...
1233290         20130322-001    15-APR-13                  4 22-JUN-13
1233290         20130322-001    15-APR-13                  4 23-JUN-13
1233290         20130322-001    15-APR-13                  4 24-JUN-13
1233290         20130322-001    15-APR-13                  4 25-JUN-13
...
1233290         20130401-001    23-APR-13                  4 29-JUL-13
1233290         20130401-001    23-APR-13                  4 30-JUL-13
1233290         20130401-001    23-APR-13                  4 31-JUL-13
1233290         20130401-001    23-APR-13                  4 01-AUG-13

Next, we will outer join to the T1_LINE table to bring in the actual forecasted delivery schedule from the previous dates (note that I also removed an unnecessary ORDER BY clause that was still embedded in the SQL statement – the Oracle query optimizer probably would have realized that the ORDER BY clause was meaningless to the query result, and automatically removed it – but I removed the clause just in case my assumption is incorrect):

SELECT
  T3.CUSTOMER_PO,
  T3.CREATE_DATE,
  T3.DUE_DATE,
  T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T2.CUSTOMER_PO,
    T2.RELEASE_ID,
    T2.CREATE_DATE,
    T2.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      TRUNC(T.CREATE_DATE) CREATE_DATE,
      T.RELEASE_ID,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
      AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T2.REL_ON_DATE=1) T3,
  T1_LINE TL
WHERE
  T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND T3.RELEASE_ID=TL.RELEASE_ID(+)
  AND T3.DUE_DATE=TL.DUE_DATE(+)
ORDER BY
  T3.CUSTOMER_PO,
  T3.RELEASE_ID,
  T3.DUE_DATE;

CUSTOMER_PO     CREATE_DA DUE_DATE  REL_RUNNING_SUM
--------------- --------- --------- ---------------
1131111         27-FEB-13 24-APR-13             577
1131111         27-FEB-13 25-APR-13             578
1131111         27-FEB-13 26-APR-13             579
1131111         27-FEB-13 27-APR-13             579
1131111         27-FEB-13 28-APR-13             579
1131111         27-FEB-13 29-APR-13             580
1131111         27-FEB-13 30-APR-13             581
...
1131111         23-APR-13 30-JUL-13             636
1131111         23-APR-13 31-JUL-13             638
1131111         23-APR-13 01-AUG-13             639
...
1231280         11-MAR-13 08-MAY-13              58
1231280         11-MAR-13 09-MAY-13              58
1231280         11-MAR-13 10-MAY-13              58
1231280         11-MAR-13 11-MAY-13              58
...
1231280         23-APR-13 21-JUL-13              62
1231280         23-APR-13 22-JUL-13              62
1231280         23-APR-13 23-JUL-13              62
1231280         23-APR-13 24-JUL-13              64
1231280         23-APR-13 25-JUL-13              64
1231280         23-APR-13 26-JUL-13              64
...

We are getting close.  Now we just need to join the SQL created earlier in this article with the above SQL statement so that the current customer purchase order forecasts may be compared with the earlier received customer purchase order forecasts.  The LISTAGG analytic funtion, introduced with Oracle Database 11.1, makes it easy to condense multiple rows from the resultset into a much smaller number of rows, if the resultset is grouped on at least one column.  So, the final version of the analysis challenge appears as follows:

SELECT
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM,
  LISTAGG(T4.REL_RUNNING_SUM||' ('||TO_CHAR(T4.CREATE_DATE,'MM/DD/YY')||')',', ') WITHIN GROUP (ORDER BY T4.CREATE_DATE DESC) EXCEPTIONS
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    T.RELEASE_ID,
    T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
    T.PREVIOUS_ACCUM_QTY,
    TL.DUE_DATE,
    TL.QTY,
    T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.CREATE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
    T1_LINE TL
  WHERE
    T.REL_ON_DATE=1
    AND T.CUSTOMER_PO=TL.CUSTOMER_PO
    AND T.RELEASE_ID=TL.RELEASE_ID
    AND (T.RELEASE_DATE+100)>=TL.DUE_DATE) C,
  (SELECT
    T3.CUSTOMER_PO,
    T3.CREATE_DATE,
    T3.DUE_DATE,
    T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T2.CUSTOMER_PO,
      T2.RELEASE_ID,
      T2.CREATE_DATE,
      T2.PREVIOUS_ACCUM_QTY,
      D.DUE_DATE
    FROM
      (SELECT
        T.CUSTOMER_PO,
        ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
        TRUNC(T.CREATE_DATE) CREATE_DATE,
        T.RELEASE_ID,
        T.PREVIOUS_ACCUM_QTY
      FROM
        T1 T
      WHERE 
        T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
        AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
      (SELECT
        TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
      FROM
        DUAL
      CONNECT BY
        LEVEL<=100) D
    WHERE
      T2.REL_ON_DATE=1) T3,
    T1_LINE TL
  WHERE
    T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
    AND T3.RELEASE_ID=TL.RELEASE_ID(+)
    AND T3.DUE_DATE=TL.DUE_DATE(+)) T4
WHERE
  C.CUSTOMER_PO=T4.CUSTOMER_PO
  AND C.DUE_DATE=T4.DUE_DATE
  AND C.REL_RUNNING_SUM<>T4.REL_RUNNING_SUM
GROUP BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM
ORDER BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.DUE_DATE;

PART_ID    CUSTOMER_PO     POSSIBLE_ DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------- --------- ---------- ---------------
EXCEPTIONS
-----------------------------------------------------------------------------------------------------------------------
RS812      1231280         18-JUN-13 27-JUN-13          1              61
60 (03/22/13), 60 (03/21/13), 60 (03/19/13), 60 (03/18/13), 60 (03/15/13), 60 (03/14/13), 59 (03/13/13), 59 (03/12/13),
59 (03/11/13), 59 (03/08/13), 60 (03/07/13), 60 (03/06/13), 60 (03/05/13), 60 (03/04/13), 59 (03/01/13), 59 (02/28/13),
59 (02/27/13)

RS812      1231280         18-JUN-13 03-JUL-13          1              62
61 (03/22/13), 61 (03/21/13), 61 (03/19/13), 61 (03/18/13), 61 (03/15/13), 61 (03/14/13), 60 (03/13/13), 60 (03/12/13),
60 (03/11/13), 60 (03/08/13), 61 (03/07/13), 61 (03/06/13), 61 (03/05/13), 61 (03/04/13), 60 (03/01/13), 60 (02/28/13),
60 (02/27/13)

RS812      1231280         18-JUN-13 24-JUL-13          2              64
63 (03/26/13), 63 (03/25/13), 62 (03/22/13), 62 (03/21/13), 62 (03/19/13), 62 (03/18/13), 62 (03/15/13), 62 (03/14/13),
61 (03/13/13), 61 (03/12/13), 61 (03/11/13), 61 (03/08/13), 62 (03/07/13), 62 (03/06/13), 62 (03/05/13), 62 (03/04/13),
61 (03/01/13), 61 (02/28/13), 61 (02/27/13)

It is now easy to see that the requested total quantity due through June 27, 2013 increased by 2 since the first forecast which was received on February 27, 2013.  The requested total quantity increased by 1 in the EDI release that was received on March 4, 2013, and by 1 again on the next EDI release that was received after March 22, 2013.  The second and third lines of the output show that there was a little more fluctuation in the customer’s forecast, increasing in quantity, decreasing in quantity, and then increasing in quantity again.

An unfortunate problem may be present – what if the customer completely eliminates the forecast orders on their purchase order, but still sends in a header line for the forecast?  A future enhancement may be required, where the CONNECT BY LEVEL trick is used in the first half of the SQL statement (the current releases), just as it is used in the second half of the SQL statement (the historical releases), so that when the top half of the final SQL statement is joined to the bottom half, rows returned by the bottom half of the SQL statement are not unnecessarily removed from the output.

The output is a little ugly.  One approach to clean up the output involves spooling the output to an HTML file as explained here and in the documentation here.  Another approach to clean up the output involves sending the data into an Excel spreadsheet using either a macro or the Microsoft Query Wizard, accessed from within Excel (see the articles in the Excel category of this blog for an explanation of these approaches).

—–

The final solution for this analysis challenge is so simple when built up from smaller SQL statements as shown above; I wonder why I did not consider this approach 12 years ago?  :-)





Bitten by a Virtual Column, _OPTIMIZER_IGNORE_HINTS Doesn’t Ignore Hints?

9 03 2013

March 9, 2013

I had a couple of spare minutes today, so I tried a couple of experiments with Oracle Database 11.2.0.2 just to see if I could produce some unexpected results.

First, I will create a simple database table with two indexes:

CREATE TABLE T1 (
  N1 NUMBER,
  V1 VARCHAR2(20),
  D1 DATE,
  PADDING VARCHAR2(300));

CREATE INDEX IND_T1_I1 ON T1(V1);
CREATE INDEX IND_T1_I2 ON T1(N1,1);

In the above, I indexed column V1.  The second index is a composite index with column N1 as the first column and the constant 1 as the second column (see this article to understand the reason for the constant 1).  Creating the second index causes Oracle Database to add a hidden virtual column to the table.

Let’s insert 100,000 rows into the table (column N1 includes a repeating sequence from 1 to 999 and then a NULL, column V1 is the Roman numeral for the repeating sequence from 1 to 999 and then 0, column D1 simply adds from 1 to 999 and then 0 days to the current date, and column PADDING is just a series of A characters to discourage the Oracle query optimizer from using full table scans rather than an index access path, if such an access path is legal):

INSERT INTO
  T1 
SELECT 
  DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1, 
  TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1, 
  TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1, 
  RPAD('A',300,'A') PADDING 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=100000;

COMMIT;

Let’s take a look at the columns that are included in each of the indexes:

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ ------------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    SYS_NC00005$

As can be seen above, Oracle added the SYS_NC00005$ hidden virtual column to the table so that the number 1 could be included as the second column of the composite index.  Somewhat interesting, we are able to query the value of the SYS_NC00005$ hidden virtual column in a query:

SET LINESIZE 140
SET PAGESIZE 1000

SELECT
  V1,
  N1,
  "SYS_NC00005$"
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1 SYS_NC00005$
-------------------- ---------- ------------
###############                            1
###############                            1
###############                            1
###############                            1
###############                            1
...
###############                            1
###############                            1
###############                            1

100 rows selected.

The above indicates that (according to Oracle) just as there is no Roman number higher than 3,999, there is no Roman number for 0.

While probably not wise, we are able to rename this hidden virtual column:

ALTER TABLE T1 RENAME COLUMN "SYS_NC00005$" TO "C";

Table altered.

Let’s check which columns are included in the index definitions again:

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ -----------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    C

The output of the above might suggest to the unsuspecting person that someone intentionally added a column named C to the table, and included that column in a composite index with column N1.  Interesting…

Let’s collect statistics on all of the rows in the table, as well as the table’s indexes:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

SELECT
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

DBMS_XPLAN will be used to retrieve the actual execution plan for the SQL statement that was just executed:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  f35sfgu6s8huh, child number 0
-------------------------------------
SELECT   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

The above indicates that the optimizer determined that using an index access path was likely the most efficient access path to search for cases where column N1 contains NULL values.

Oracle Database offers more than 200 hints to help control the query optimizer’s behavior (as well as other behavior).  In general, hints should be used sparingly.  For example, we are able to add a FULL hint to force the optimizer to use a full table scan when searching column N1 for NULL values:

SELECT /*+ FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  3w60zjynqtwrp, child number 0
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

As the above shows, Oracle’s optimizer selected to use a full table scan even though the calculated cost for that access path (590) is higher than the calculated cost of the index access path (102).  There is an obvious temptation when learning about hints to overuse those hints – as data volumes (and the data itself) in a table changes, those index hints may very well hurt performance, rather than help performance.

_OPTIMIZER_IGNORE_HINTS doesn’t ignore hints?

That was a recent set of search keywords used to access this blog.  Oracle Database offers a fairly large number of parameters (we will get back to hints in a moment), some of which are considered hidden parameters because those parameters begin with _ (underscore) characters, typically do not display when querying theV$PARAMETER view (unless the parameter value was modified), and typically should not be modified without the explicit request of Oracle Support.  If we had used the +OUTLINE format parameter in the DBMS_XPLAN call we would see a similarly named IGNORE_OPTIM_EMBEDDED_HINTS hint in the Outline Data section of the execution plan:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('3w60zjynqtwrp',0,'TYPICAL +OUTLINE'));

SQL_ID  3w60zjynqtwrp, child number 0
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

Let’s modify the _OPTIMIZER_IGNORE_HINTS hidden parameter at the session level and try the test query again:

ALTER SESSION SET "_optimizer_ignore_hints"=TRUE;

Session altered.

SELECT /*+ FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

The above is no surprise, so let’s take a look at the execution plan with the Outline Data section included:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  3w60zjynqtwrp, child number 1
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."N1" "T1"."C1"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

Looking at the above execution plan, you will notice that the Oracle optimizer has ignored the FULL hint that was specified in the SQL statement, and that by examining the Outline Data section of the plan you will see that FULL(@”SEL$1″ “T1″@”SEL$1″) was replaced with INDEX_RS_ASC(@”SEL$1″ “T1″@”SEL$1″ (“T1″.”N1″ “T1″.”C1″)).

I am wondering about the above search keywords that were used to access my blog.  Can I force the Oracle query optimizer NOT to ignore embedded hints when the _OPTIMIZER_IGNORE_HINTS hidden parameter is set to TRUE?  Before we start, let’s set the hidden parameter back to the default value of FALSE.

ALTER SESSION SET "_optimizer_ignore_hints"=FALSE;

Occasionally, when the Outline Data section of the execution plan is reviewed (either with DBMS_XPLAN or by reviewing a 10053 trace) an OPT_PARAM hint may appear in the Outline Data section.  Oracle Database itself inserts these hints into SQL statement, for example (from a 10046 trace file):

PARSING IN CURSOR #448514944 len=3052 dep=1 uid=64 oct=3 lid=64 tim=853194336675 hv=3275773959 ad='3edba6b28' sqlid='2sdms4r1n0q07'
SELECT /*+ opt_param('parallel_execution_enabled', 'false') */
                       /* EXEC_FROM_DBMS_XPLAN */ id, position, depth , operation, options, object_name, cardinality, bytes, temp_space, cost,

A 10053 trace file also includes a section titled PARAMETERS IN OPT_PARAM HINT that indicates the SQL statement level parameters that are modified through the OPT_PARAM hint.

Let’s test that OPT_PARAM hint to see if we are able to hint the optimizer to ignore all other hints in the SQL statement:

SELECT /*+ OPT_PARAM('_optimizer_ignore_hints','TRUE') FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  c4fxyhf0fzg4t, child number 0
-------------------------------------
SELECT /*+ OPT_PARAM('_optimizer_ignore_hints','TRUE') FULL(T1) */
V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

The hint that modified the parameter that controls whether or not hints are ignored worked.  Well, that just does not seem right – should it not be the case that that hint is also ignored?  Might this be a catch 22?

What if… we reverse the order of the hints?

SELECT /*+ FULL(T1) OPT_PARAM('_optimizer_ignore_hints','TRUE') */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  07pu6c3jvdrd0, child number 0
-------------------------------------
SELECT /*+ FULL(T1) OPT_PARAM('_optimizer_ignore_hints','TRUE') */
V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

Well, the FULL hint definitely was not ignored by the optimizer that time, as it was for the previous arrangement of the hints.  Might we be on to something?

Let’s verify that the OPT_PARAM hint does work to control optimizer parameters at the SQL statement level:

SELECT /*+ OPT_PARAM('optimizer_index_cost_adj',1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  ac45wtypubpfu, child number 0
-------------------------------------
SELECT /*+ OPT_PARAM('optimizer_index_cost_adj',1) */   V1,   N1,   C
FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

Reviewing the above, you might have noticed that the calculated cost of the index access decreased from a value of 102 to just 1 as a result of the OPT_PARAM hint, so we might be able to conclude that the hint does work.  Might this difference in the execution plan be explained as just the query optimizer NOT silently ignoring invalid hints?  Food for thought.

You might have noticed that the above queries have selected column C from the table.  Let’s take a look at the table’s definition:

DESC T1

 Name             Null?    Type
 ---------------- -------- -------------
 N1                        NUMBER
 V1                        VARCHAR2(20)
 D1                        DATE
 PADDING                   VARCHAR2(300)

Yep, no column C.  How are we able to select column C from the table if DESC T1 does not list that column?

Let’s drop the second index, the one that used the constant 1 as the second column in the index:

DROP INDEX IND_T1_I2;

Index dropped.

Now that the index is gone that created that hidden virtual column, let’s create our own column C and use that column in a new IND_T1_I2 composite index:

ALTER TABLE T1 ADD (C NUMBER DEFAULT 1 NOT NULL);

                    *
ERROR at line 1:
ORA-01430: column being added already exists in table

ORA-01430?  I never added a column C to the table, did you?  Take a look:

DESC T1

 Name             Null?    Type
 ---------------- -------- -------------
 N1                        NUMBER
 V1                        VARCHAR2(20)
 D1                        DATE
 PADDING                   VARCHAR2(300)

Be gone mysterious column C (do you think this might be the reason why it is a bad idea to rename the system generated hidden virtual columns?):

ALTER TABLE T1 DROP (C);

                     *
ERROR at line 1:
ORA-12996: cannot drop system-generated virtual column

But, but, I dropped the index that caused that hidden virtual column to be created.

ALTER TABLE T1 ADD (C1 NUMBER DEFAULT 1 NOT NULL);

Table altered.

CREATE INDEX IND_T1_I2 ON T1(N1,C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

I didn’t really want a column C any way, sniff.

Now, if we were to execute this SQL statement, will Oracle’s query optimizer use the just created IND_T1_I2 index to search for NULL values in column N1?  Would the result potentially be different if column C1 did not have a declared NOT NULL constraint?  Keep in mind that the Oracle documentation states: “The query must be guaranteed not to need any NULL values from the indexed expression, because NULL values are not stored in indexes.”

SELECT
  V1,
  N1,
  C1
FROM
  T1
WHERE
  N1 IS NULL;

Did I mention that I had a couple of spare minutes today?  That does not happen too often lately.

Above I stated that it was odd that Oracle Database did not automatically dop the automatically created hidden virtual column when the index for which the column was created was dropped.  So, what happens if the automatically created hidden virtual column is not renamed?  Here is a simple script based on the above experiment to determine what happens (change SYS_NC00005$ in the ALTER TABLE commands as necessary so that the column name is the same as the column name returned from the query of the USER_IND_COLUMNS view).

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  N1 NUMBER,
  V1 VARCHAR2(20),
  D1 DATE,
  PADDING VARCHAR2(300));

CREATE INDEX IND_T1_I1 ON T1(V1);

CREATE INDEX IND_T1_I2 ON T1(N1,1);

INSERT INTO
  T1
SELECT
  DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1,
  TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1,
  TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1,
  RPAD('A',300,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

DROP INDEX IND_T1_I2;

ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

My results follow:

SQL> DROP TABLE T1 PURGE;
Table dropped.

SQL> CREATE TABLE T1 (
  2    N1 NUMBER,
  3    V1 VARCHAR2(20),
  4    D1 DATE,
  5    PADDING VARCHAR2(300));

Table created.

SQL> CREATE INDEX IND_T1_I1 ON T1(V1);

Index created.

SQL> CREATE INDEX IND_T1_I2 ON T1(N1,1);

Index created.

SQL> INSERT INTO
  2    T1
  3  SELECT
  4    DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1,
  5    TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1,
  6    TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1,
  7    RPAD('A',300,'A') PADDING
  8  FROM
  9    DUAL
 10  CONNECT BY
 11    LEVEL<=100000;

100000 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT
  2    INDEX_NAME,
  3    COLUMN_NAME
  4  FROM
  5    USER_IND_COLUMNS
  6  WHERE
  7    TABLE_NAME = 'T1'
  8  ORDER BY
  9    INDEX_NAME,
 10    COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ --------------------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    SYS_NC00005$

SQL> ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);
ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL)
                    *
ERROR at line 1:
ORA-01430: column being added already exists in table

SQL> DROP INDEX IND_T1_I2;

Index dropped.

SQL> ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

Table altered.

With the original test case, both the first and second ALTER TABLE commands failed.





Feeling ANSI About Oracle Join Syntax? 2

7 02 2013

February 7, 2013

(Back to the Previous Post in the Series)

As I have mentioned a couple of times previously, I am not much of a fan of ANSI style joins – I prefer using the classical Oracle join syntax when possible.  I try to keep up with an ERP mailing list, and try to assist with providing answers to questions when time permits.  A SQL statement was recently shared with the ERP mailing list, demonstrating a solution that was put together to solve a particular problem.  A portion of the SQL statement follows:

...
 FROM
 dbo.WORK_ORDER RIGHT OUTER JOIN dbo.OPERATION ON
     dbo.OPERATION.WORKORDER_TYPE = dbo.WORK_ORDER.TYPE AND
     dbo.OPERATION.WORKORDER_BASE_ID = dbo.WORK_ORDER.BASE_ID AND
     dbo.OPERATION.WORKORDER_LOT_ID = dbo.WORK_ORDER.LOT_ID AND
     dbo.OPERATION.WORKORDER_SPLIT_ID = dbo.WORK_ORDER.SPLIT_ID AND
     dbo.OPERATION.WORKORDER_SUB_ID = dbo.WORK_ORDER.SUB_ID
   left join LABOR_TICKET CUR on
 ...

An ANSI join… I really wish that the person who wrote that SQL statement used Oracle’s classical (+) notation for the outer join declaration… and it probably would have helped if the OP was running with an Oracle Database backend rather than a SQL Server backend.  When I saw that ANSI outer join, I immediately started thinking about pig outer join… a response that I submitted to an Internet forum in 2006, and republished here.

After thinking about the SQL statement for a couple of minutes, I decided that the above ANSI RIGHT OUTER JOIN is equivalent to the following using Oracle’s classical (+) notation for outer joins:

WORK_ORDER.TYPE(+) = OPERATION.WORKORDER_TYPE
AND WORK_ORDER.BASE_ID(+) = OPERATION.WORKORDER_BASE_ID
AND WORK_ORDER.LOT_ID(+) = OPERATION.WORKORDER_LOT_ID
AND WORK_ORDER.SPLIT_ID(+) = OPERATION.WORKORDER_SPLIT_ID
AND WORK_ORDER.SUB_ID(+) = OPERATION.WORKORDER_SUB_ID

If you look closely at the above, you might have noticed that not only did I remove the dbo. in front of the table names, but I also switched which columns appear on the left and right side of the equal sign.  Why switch which column is on the left and which is on the right of the equal sign?  So that the columns belonging to each table were listed in the same order from left to right as the join declaration: dbo.WORK_ORDER RIGHT OUTER JOIN dbo.OPERATION

With that change made, the rules of pig outer join apply.  With the ANSI join type of RIGHT OUTER JOIN, the (+) notation is placed after the column that is to the left of the equal sign.  That column with the (+) notation appended is permitted to return a NULL value when there is no matching row in that column’s table for that join condition.  The (+) notation follows all of the columns from the WORK_ORDER table that appear in the join.  To the casual observer, one might think that there is a parent-child relationship between the two tables, maybe even a delared foreign key relationship with the OPERATION table as the parent and the WORK_ORDER table as the child.

There is in fact a declared foreign key relationship between the two tables.  For Oracle Database backends, that declared foreign key relationship was created using a command similar to the following:

ALTER TABLE OPERATION ADD(
  CONSTRAINT CONFUSE_ME_NOT FOREIGN KEY (
    WORKORDER_TYPE,
    WORKORDER_BASE_ID,
    WORKORDER_LOT_ID,
    WORKORDER_SPLIT_ID,
    WORKORDER_SUB_ID)
  REFERENCES WORK_ORDER ON DELETE CASCADE);

So yes, a delared foreign key relationship exists between the two tables.  But wait, doesn’t the above constraint require that for a row to be present in the OPERATION table, a corresponding row must already exist in the WORK_ORDER table?

Maybe I should not have flip-flopped which columns appear on the left and right side of the equal sign?  Did I mention that I am not much of a fan of ANSI style joins?  There might be a few bugs in Oracle Database related to its automatic conversion of ANSI style joins to classical Oracle joins, but let’s try a test anyway.

I will use the autotrace functionality in SQL*Plus to output the execution plan – there are times when autotrace outputs the wrong execution plan for a SQL statement, but we will ignore that quirk for now.  I will add a NO_QUERY_TRANSFORMATION hint to a much shortened version of the original poster’s (OP’s) original query – this hint was an attempt to keep Oracle’s query optimizer from recognizing that there is a declared foreign key relationship between the two tables, and automatically converting the outer join into an inner join (oddly, the Oracle query optimizer did not alter the join to an inner join when the hint was removed).

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 200
SET PAGESIZE 1000

SELECT /*+ NO_QUERY_TRANSFORMATION */
  *
FROM
  WORK_ORDER,
  OPERATION
WHERE
  WORK_ORDER.TYPE(+) = OPERATION.WORKORDER_TYPE
  AND WORK_ORDER.BASE_ID(+) = OPERATION.WORKORDER_BASE_ID
  AND WORK_ORDER.LOT_ID(+) = OPERATION.WORKORDER_LOT_ID
  AND WORK_ORDER.SPLIT_ID(+) = OPERATION.WORKORDER_SPLIT_ID
  AND WORK_ORDER.SUB_ID(+) = OPERATION.WORKORDER_SUB_ID
  AND OPERATION.WORKORDER_TYPE = 'W'
  AND OPERATION.WORKORDER_BASE_ID = '13000';

Execution Plan
----------------------------------------------------------
Plan hash value: 4262719396

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN OUTER             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| OPERATION    |    31 |  6944 |    12   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021734 |    31 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| WORK_ORDER   |    11 |  2068 |     8   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0021989 |    11 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("WORK_ORDER"."TYPE"(+)="OPERATION"."WORKORDER_TYPE" AND
              "WORK_ORDER"."BASE_ID"(+)="OPERATION"."WORKORDER_BASE_ID" AND
              "WORK_ORDER"."LOT_ID"(+)="OPERATION"."WORKORDER_LOT_ID" AND
              "WORK_ORDER"."SPLIT_ID"(+)="OPERATION"."WORKORDER_SPLIT_ID" AND
              "WORK_ORDER"."SUB_ID"(+)="OPERATION"."WORKORDER_SUB_ID")
   3 - access("OPERATION"."WORKORDER_TYPE"='W' AND
              "OPERATION"."WORKORDER_BASE_ID"='13000')
   5 - access("WORK_ORDER"."TYPE"(+)='W' AND "WORK_ORDER"."BASE_ID"(+)='13000')

For now, just focus on the Predicate Information section of the generated execution plan, in particular the information found in the “1 – access” section – that section shows how the two tables are joined together during the hash join operation.  Much of the WHERE clause from the shortened version of the query is found in the “1 – access” section (and there is an indication of an automatically generated predicate in the “5-access” section).

Let’s take a look at an ANSI join version of the shortened SQL statement, with the WORK_ORDER table listed first in the join syntax, the OPERATION table listed second in the join syntax (as it was in the OP’s SQL statement), and with the columns in the ON clause flip-flopped on each side of the = signs (the opposite order in which the columns were listed in the OP’s original SQL statement):

SELECT
  *
FROM
  WORK_ORDER RIGHT OUTER JOIN OPERATION
ON (WORK_ORDER.TYPE = OPERATION.WORKORDER_TYPE
    AND WORK_ORDER.BASE_ID = OPERATION.WORKORDER_BASE_ID
    AND WORK_ORDER.LOT_ID = OPERATION.WORKORDER_LOT_ID
    AND WORK_ORDER.SPLIT_ID = OPERATION.WORKORDER_SPLIT_ID
    AND WORK_ORDER.SUB_ID = OPERATION.WORKORDER_SUB_ID)
WHERE
  OPERATION.WORKORDER_TYPE = 'W'
  AND OPERATION.WORKORDER_BASE_ID = '13000';

Execution Plan
----------------------------------------------------------
Plan hash value: 4262719396

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN OUTER             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| OPERATION    |    31 |  6944 |    12   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021734 |    31 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| WORK_ORDER   |    11 |  2068 |     8   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0021989 |    11 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("WORK_ORDER"."SUB_ID"(+)="OPERATION"."WORKORDER_SUB_ID" AND
              "WORK_ORDER"."SPLIT_ID"(+)="OPERATION"."WORKORDER_SPLIT_ID" AND
              "WORK_ORDER"."LOT_ID"(+)="OPERATION"."WORKORDER_LOT_ID" AND
              "WORK_ORDER"."BASE_ID"(+)="OPERATION"."WORKORDER_BASE_ID" AND
              "WORK_ORDER"."TYPE"(+)="OPERATION"."WORKORDER_TYPE")
   3 - access("OPERATION"."WORKORDER_TYPE"='W' AND
              "OPERATION"."WORKORDER_BASE_ID"='13000')
   5 - access("WORK_ORDER"."TYPE"(+)='W' AND "WORK_ORDER"."BASE_ID"(+)='13000')

In the above, just focus on the Predicate Information section for a moment, in particular the information found in the “1 – access” section – that section shows how the two tables are joined together during the hash join operation.  Notice the location of the (+) symbols, immediately after the columns from the WORK_ORDER table – that arrangement exactly matches my Oracle style join syntax.  You might have also noticed that the plan hash value is 4262719396 in both of the execution plans (that plan hash value consistency does NOT verify that the Predicate Information section is identical for the two execution plans, but it does verify that otherwise the execution plans are identical).

So, the Oracle query optimizer esentially transformed the ANSI style join version of the SQL statement into the same (optimizer transformed) SQL statement as I submitted using the classical Oracle outer join syntax.  Nice, however, that still leaves a question.  Does it matter in ANSI style joins which column is to the left of the equal sign in the ON clause?

Let’s try the modified ANSI SQL statement again, this time with the tables listed in the same order of the join clause as in the OP’s SQL statement, and the columns in the same order of the ON clause as in the OP’s SQL statement:

SELECT
  *
FROM
  WORK_ORDER RIGHT OUTER JOIN OPERATION
ON (OPERATION.WORKORDER_TYPE = WORK_ORDER.TYPE
    AND OPERATION.WORKORDER_BASE_ID = WORK_ORDER.BASE_ID
    AND OPERATION.WORKORDER_LOT_ID = WORK_ORDER.LOT_ID
    AND OPERATION.WORKORDER_SPLIT_ID = WORK_ORDER.SPLIT_ID
    AND OPERATION.WORKORDER_SUB_ID = WORK_ORDER.SUB_ID)
WHERE
  OPERATION.WORKORDER_TYPE = 'W'
  AND OPERATION.WORKORDER_BASE_ID = '13000';

Execution Plan
----------------------------------------------------------
Plan hash value: 4262719396

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN OUTER             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| OPERATION    |    31 |  6944 |    12   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021734 |    31 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| WORK_ORDER   |    11 |  2068 |     8   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0021989 |    11 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("OPERATION"."WORKORDER_SUB_ID"="WORK_ORDER"."SUB_ID"(+) AND
              "OPERATION"."WORKORDER_SPLIT_ID"="WORK_ORDER"."SPLIT_ID"(+) AND
              "OPERATION"."WORKORDER_LOT_ID"="WORK_ORDER"."LOT_ID"(+) AND
              "OPERATION"."WORKORDER_BASE_ID"="WORK_ORDER"."BASE_ID"(+) AND
              "OPERATION"."WORKORDER_TYPE"="WORK_ORDER"."TYPE"(+))
   3 - access("OPERATION"."WORKORDER_TYPE"='W' AND
              "OPERATION"."WORKORDER_BASE_ID"='13000')
   5 - access("WORK_ORDER"."TYPE"(+)='W' AND "WORK_ORDER"."BASE_ID"(+)='13000')

You might have noticed that once again, the plan hash value is 4262719396, just as it was for the previous two execution plans.  What does that plan hash value consistency indicate again?  In the above, focus on the Predicate Information section for a moment, in particular the “1 – access” section.  Notice again that the (+) symbols appear immediately after the columns from the WORK_ORDER table, even though the column order, in respect to the = sign, mirrors that of the submitted SQL statement (Oracle’s query optimizer is permitted to flip-flop the columns that appear on each side of the = sign, however that rearrangement did not happen during this demonstration).

So, what have we learned from the above information?  Are you feeling ANSI yet?





Name that Table’s Column

9 11 2012

November 9, 2012

I have not had a lot of time to browse through forums lately, but I noticed an interesting thread in the comp.databases.oracle.server Usenet group.  The OP in the thread was curious why quotes (double quotes) were required around a particular column when referenced in a SQL statement, because specifying quotes around the column name is apparently a little challenging in the PHP scripting language.

I thought about this issue a bit, wondering “how did that happen” and then thought about the benefits of this approach.  In theory, a sophisticated programmer could build a 16 column table using a single four character column name (with different letters capitalized).  Such an approach is sure to evoke a couple of choice four letter words!

I thought that I would throw together a quick example table:

CREATE TABLE T1(
  My_Data NUMBER,
  "My_Data" NUMBER,
  "my_data" NUMBER,
  "MY_DATA" NUMBER);

That almost worked:

SQL> CREATE TABLE T1(
  2    My_Data NUMBER,
  3    "My_Data" NUMBER,
  4    "my_data" NUMBER,
  5    "MY_DATA" NUMBER);
  "MY_DATA" NUMBER)
  *
ERROR at line 5:
ORA-00957: duplicate column name

Columns 1 and 4 have the same name.  Let’s fix that problem and try again:

CREATE TABLE T1(
  My_Data NUMBER,
  "My_Data" NUMBER,
  "my_data" NUMBER,
  "MY_DATa" NUMBER);

Table created.

That worked, now we have a table with four columns, where all of the column names are the same.  Notice that the first column name was not wrapped in quotes.

Let’s insert a row into the table:

INSERT INTO T1 VALUES(
  1,
  2,
  3,
  4);

1 row created.

Let’s see what happens when we query the table:

SELECT
  *
FROM
  T1
WHERE
  MY_DATA=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

Notice that a row was returned, even though the upper/lower case of the column in the WHERE clause did not match the first column name when the T1 table was created (when the table was created, for the first column, the column name was created as if were specified in all uppercase letters).

Let’s see if a row is returned if we try the following (notice that the attempt at querying with the fourth column based on matching that column’s capitalization in the WHERE clause failed to restrict the query results based on the contents of the fourth column in the table):

SELECT
  *
FROM
  T1
WHERE
  MY_DATa=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

Let’s try again, this time wrapping the column name found in the WHERE clause in quotes:

SELECT
  *
FROM
  T1
WHERE
  "MY_DATA"=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

A row was still returned (as expected).  So then, how do we determine which rows in the table have a value of 1 in the fourth column?  We need to place the column name found in the WHERE clause within quotes as follows:

SELECT
  *
FROM
  T1
WHERE
  "MY_DATa"=1;

no rows selected

Notice that this time no rows were returned (as intended, and expected).

I guess that the lesson here is to be careful when creating tables in Oracle Database.  Just because other database products may place column names within quotes as standard practice, that does not indicate that you, as the developer, should continue the practice when working with Oracle Databases.  Now get out there and create some 16 column tables with choice four letter words.  :-)





On the Topic of Programming 3

3 09 2012

September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E though H in the above report (could very well extend beyond column H, depending on the number of weeks selected), are the Monday through Sunday dates for the week.
  • The second row for each employee, in columns E through H in the above report show the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report.
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, shows the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation one week, but did another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers of on the 3rd, 6th, 9th, etc. rows for each employee, shows the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be causes where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with the WORKORDER_TYPE=’W’ and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE=’M’ and the BASE_ID equal to the part ID tht is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the database RDBMS to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to W and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the the WORKORDER_TYPE=’W’ (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ name, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to ‘W’ and the SUB_ID column in the WORK_ORDER table set to ‘0’ so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.   Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number weeks included in the report?

In the words of the TV show “Name that Tune“, I can name that tune in 1 note.  Because there are many to many joins between the various table, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression.
  • Minimize as best as possible the performance impact on the database server, the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause.
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client’s dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want that same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format, should I use the PIVOT function that was introduced with Oracle Database 11.1; or maybe I should use client-side control break logic that was taught in one of my university-level structured programming classes years ago?

That sketch of a report on a sheet of paper sure looked simple, now I need a control break (must be simple one key solution for that problem – check your keyboard :-) ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7)

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and used the PIVOT function, and/or additional analytic functions to calculate the employee’s weighted efficiency average per week and the simple averages of the wieghted averages – doing so would have eliminated some potentially complex client-side logic that needed to be programmed a single time, but counter point is that every time the report executed, it would require a more of the database server’s resources than were absolutely required.

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting, now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks worth of data rather than just four weeks, and only a single part ID or employee.  Too late, those changes were anticipated, and included in the original specification in the client-side programming, implemented with a simple change to the query and pre-planning for a variable number of weeks in the report.








Follow

Get every new post delivered to your Inbox.

Join 142 other followers