Everything or Nothing in SQL

23 05 2013

May 23, 2013

The following question recently came through an ERP mailing list (significantly rephrased):

I would like to use the Microsoft Query tool in Microsoft Excel to extract records from the ERP database.  I would like the list of parent records to be retrieved into Excel when all of the specified child records belong to the parent.

That sounds as though it would be an easy request.  The OP in the ERP mailing list later followed up with the following:

“If I may ask for another variation to pick your brain….

How would I rewrite this to show ONLY the work order(s) that have all of the 3 Parts, but instead of showing all of them, show only the ones that don’t include a specific other part?

For example, Include all the work order retrieved from your query, but then subtract those work orders that don’t contain a part or parts in another separate list.

I.E. All order containing parts (‘147555′,’227643′,’155328′) but not containing ANY of these parts (‘12345′, ‘456789’)”

In the above follow up, the term “work order(s)” refer to the “parent records” that was previously mentioned, and the term “parts” refer to the “child records” that was previously mentioned.  Below is a graphical mock up, with the light-green colored rectangles representing the “parts”, and the white colored rectangle at the bottom left representing the “work order”.  The other white colored rectangles represent subassemblies that belong to the “work order”, and the cyan colored rectangles represent the operations that utilize the “parts”.  Through normalization, the white colored rectangles represent rows in one database table (WORK_ORDER – the “parent” table), the cyan colored rectangles represent rows in another table (OPERATION – the “child” table), and the light-green colored rectangles represent rows in a third table (REQUIREMENT – the “grandchild” table).

EverythingOrNothing

The first request still seems to be easy to solve, even though there are three tables that are potentially involved.  Let’s simplify the problem further and just focus on the “grandchild” table.  Let’s create a small version of that table and populate the table with some sample rows:

CREATE TABLE T1 (
  WORKORDER_TYPE VARCHAR2(1),
  WORKORDER_BASE_ID VARCHAR2(15),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SUB_ID VARCHAR2(3),
  PIECE_NO NUMBER,
  PART_ID VARCHAR2(15),
  QTY NUMBER(12,4));

INSERT INTO
  T1
SELECT
  'W' WORKORDER_TYPE,
  TRIM(TO_CHAR(TRUNC(ROWNUM / 100 + 1)*1000,'00000')) || 'W' WORKORDER_BASE_ID,
  TO_CHAR(MOD(TRUNC(ROWNUM / 25 + 1), 999) + 1) WORKORDER_LOT_ID,
  TO_CHAR(TRUNC(MOD(ROWNUM - 1, 25) / 5) + 1) WORKORDER_SUB_ID,
  MOD(ROWNUM - 1, 25) + 1 PIECE_NO,
  CHR(65 + MOD(ROWNUM - 1, 26)) || CHR(65 + MOD(ROWNUM - 1, 15)) || MOD(ROWNUM - 1, 10)  PART_ID,
  TRUNC(DBMS_RANDOM.VALUE*100) + 1 QTY
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

In the above, the combination of the WORKORDER_BASE_ID column and the WORKORDER_LOT_ID column represent the “work order” that was mentioned by the OP.  The WORKORDER_SUB_ID column identifies the subassembly number (answering the question of to which of the white colored rectangles does the “part” belong).  The PIECE_NO column is used to uniquely identify the rows that belong to a specific subassembly number (that is the final column in the REQUIREMENT table’s primary key).  The PART_ID column is the specific “part” that is attached to a specific “work order”.  The QTY column indicates the number of the “part” that is required by the “work order” – with the sample data, this is the only column that will differ from one execution of the table creation script to the next execution.

The sample data is a bit too consistent, however, we will ignore any issues that consistency may present for now.  Let’s execute a simple SQL statement that indicates how many from the following “parts” list appear in the “work orders”: MI8, NJ9, KG6 (note the use of DISTINCT – a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, and we do not want to repeatedly count those “parts”):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
FROM
  T1 TWHERE  T.PART_ID IN ('MI8','NJ9','KG6');

W WORKORDER_BASE_ WOR WOR PART_ID                  C
- --------------- --- --- --------------- ----------
W 01000W          3   3   KG6                      3
W 01000W          3   3   MI8                      3
W 01000W          3   3   NJ9                      3
W 05000W          19  1   KG6                      3
W 05000W          19  1   MI8                      3
W 05000W          19  1   NJ9                      3
W 09000W          34  4   KG6                      3
W 09000W          34  4   MI8                      3
W 09000W          34  4   NJ9                      3

Only 9 rows returned, with three different WORKORDER_BASE_ID, WORKORDER_LOT_ID combinations.  For a moment, let’s ignore that the three “parts” appear in all three “work orders”.  If we wanted a list of those “work orders” where _all_ three of the “parts” are present, and not just _one or more_ of the three “parts”, we could simply slide the above SQL statement into an inline view and specify that our C generated column must be equal to (or greater than – the use of greater than will be explained later) 3:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9
W 09000W          34  4   KG6
W 09000W          34  4   MI8
W 09000W          34  4   NJ9

While trying to determine the OP’s business case for the requests, I wondered if the OP was attempting to replace a specific “part” with another “part” within the ERP package, and simply required a report to understand the progress of that manual “part” replacement procedure in the ERP package.  Let’s move forward with this assumption by replacing a row containing a specific PART_ID value with a new PART_ID value, but only for a single “work order”:

DELETE FROM
  T1
WHERE
  WORKORDER_TYPE='W'
  AND WORKORDER_BASE_ID='09000W'
  AND WORKORDER_LOT_ID='34'
  AND PART_ID='NJ9';

1 row deleted.

INSERT INTO T1 VALUES (
  'W',
  '09000W',
  '34',
  '4',
  10,
  'REPLACEMENT-NJ9',
  10);

1 row created.

Let’s check the result of the SQL statement that we were in the process of developing:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9

Now, there are only six rows returned from two “work orders” – the third “work order” is no longer included in the query results.

With the use of the C >= n syntax, we are able to find “work orders” with at least n number of the specified “parts”.  So, if we were interested in finding cases where at least two of the three listed “parts” are attached to a “work order”, we just need to change the 3 to 2:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=2
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   NJ9
W 05000W          19  1   MI8
W 09000W          34  4   KG6
W 09000W          34  4   MI8

The third “work order” is again included in the query resultset.

Other than putting the results into Microsoft Excel, the sample query satisfies the first request of the OP.  If we were not interested in returning the WORKORDER_SUB_ID and PART_ID columns, we could have simply used the following SQL statement:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(*)>=3;

W WORKORDER_BASE_ WOR
- --------------- ---
W 05000W          19
W 01000W          3

Now to work on the second request, only retrieving the “work orders” when all of a specified list of “parts” are attached to the “work order” and none of another list of “parts” are attached to the “work order”.  As a starting point, we will re-use the inline view from the previous solution.  The IN list will be modified to include the “parts” that must be attached to the “work order”, as well as the “parts” that must not be attached to the “work order”.  The COUNT analytic function must be modified to include either a DECODE or CASE structure to divide the “parts” into the “must” and “must not” lists so that a count of each may be obtained:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9');

W WORKORDER_BASE_ WOR WOR PART_ID                  C         NC
- --------------- --- --- --------------- ---------- ----------
W 01000W          3   3   KG6                      2          0
W 01000W          3   3   MI8                      2          0
W 05000W          19  1   KG6                      2          0
W 05000W          19  1   MI8                      2          0
W 09000W          34  4   KG6                      2          1
W 09000W          34  4   MI8                      2          1
W 09000W          34  4   REPLACEMENT-NJ9          2          1

To satisfy the OP’s second request, we are interested in those rows where the C column value is (at least) 2 and the NC column value is 0.  Just as before, we will slide the above SQL statement into an inline view and add the specifications for the C and NC generated columns:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=0
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 05000W          19  1   KG6
W 05000W          19  1   MI8

The above shows those “work orders” where the NJ9 “part” had not yet been replaced with the REPLACEMENT-NJ9 “part”.  If desired, we are also able to easily modify the OP’s second request to see the “work orders” where the “part” was already replaced just by changing NC=0 to NC=1:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=1
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- ---------------
W 09000W          34  4   KG6
W 09000W          34  4   REPLACEMENT-NJ9
W 09000W          34  4   MI8

What other related information are we able to return?  What about all “parts” (the full bill of material) for each of the work orders that match the OP’s second request.  That modification is simple, just slide the majority of the above SQL statement into a subquery (found in the first WHERE clause):

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=0)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 01000W          3   1            1 ZK5                     48
W 01000W          3   1            2 AL6                     60
W 01000W          3   1            3 BM7                     42
W 01000W          3   1            4 CN8                     26
W 01000W          3   1            5 DO9                     49
W 01000W          3   2            6 EA0                     66
W 01000W          3   2            7 FB1                     39
W 01000W          3   2            8 GC2                     57
W 01000W          3   2            9 HD3                     42
W 01000W          3   2           10 IE4                     52
W 01000W          3   3           11 JF5                     16
W 01000W          3   3           12 KG6                     61
W 01000W          3   3           13 LH7                     39
W 01000W          3   3           14 MI8                     75
W 01000W          3   3           15 NJ9                     94
W 01000W          3   4           16 OK0                     67
W 01000W          3   4           17 PL1                     12
W 01000W          3   4           18 QM2                     41
W 01000W          3   4           19 RN3                     97
W 01000W          3   4           20 SO4                     15
W 01000W          3   5           21 TA5                     54
W 01000W          3   5           22 UB6                     59
W 01000W          3   5           23 VC7                     23
W 01000W          3   5           24 WD8                      7
W 01000W          3   5           25 YJ4                     22
W 05000W          19  1            1 JF5                     31
W 05000W          19  1            2 KG6                     42
W 05000W          19  1            3 LH7                     37
W 05000W          19  1            4 MI8                     85
W 05000W          19  1            5 NJ9                    100
W 05000W          19  2            6 OK0                     43
W 05000W          19  2            7 PL1                     58
W 05000W          19  2            8 QM2                     76
W 05000W          19  2            9 RN3                     66
W 05000W          19  2           10 SO4                     75
W 05000W          19  3           11 TA5                     50
W 05000W          19  3           12 UB6                     55
W 05000W          19  3           13 VC7                     18
W 05000W          19  3           14 WD8                     33
W 05000W          19  3           15 XE9                      8
W 05000W          19  4           16 YF0                     86
W 05000W          19  4           17 ZG1                      7
W 05000W          19  4           18 AH2                     25
W 05000W          19  4           19 BI3                     38
W 05000W          19  4           20 CJ4                     34
W 05000W          19  5           21 DK5                     88
W 05000W          19  5           22 EL6                     91
W 05000W          19  5           23 FM7                     52
W 05000W          19  5           24 GN8                     71
W 05000W          19  5           25 IE4                     55

Just as before, we are able to see all “parts” for the “work orders” that already have the replacement part, just by changing the NC=0 to NC=1:

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=1)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 09000W          34  1            1 UF0                     80
W 09000W          34  1            2 VG1                     75
W 09000W          34  1            3 WH2                     75
W 09000W          34  1            4 XI3                     77
W 09000W          34  1            5 YJ4                     46
W 09000W          34  2            6 ZK5                     57
W 09000W          34  2            7 AL6                     64
W 09000W          34  2            8 BM7                     14
W 09000W          34  2            9 CN8                     30
W 09000W          34  2           10 DO9                     62
W 09000W          34  3           11 EA0                     45
W 09000W          34  3           12 FB1                     68
W 09000W          34  3           13 GC2                     25
W 09000W          34  3           14 HD3                     73
W 09000W          34  3           15 IE4                     32
W 09000W          34  4           10 REPLACEMENT-NJ9         10
W 09000W          34  4           16 JF5                     49
W 09000W          34  4           17 KG6                     23
W 09000W          34  4           18 LH7                     60
W 09000W          34  4           19 MI8                     75
W 09000W          34  5           21 OK0                     91
W 09000W          34  5           22 PL1                     66
W 09000W          34  5           23 QM2                     45
W 09000W          34  5           24 RN3                     49
W 09000W          34  5           25 TE9                      6

Using the sample table and data provided, are there any other solutions to the OP’s requests?





Grouping Data Sets by Week Number of the Month

1 05 2013

May 1, 2013

I saw a decent SQL brain teaser this morning in the comp.databases.oracle.server Usenet group.  The OP in the message thread is attempting to summarize data in one of his tables, with the summarizations broken down by month and then the week within that month.  Increasing the challenge, the OP required that the dates defining the week start and end dates fall within the month’s data set:

“Today is 4/30/2013.  The we_end [last day of the week containing April 30, 2013] value is 5/4/2013.  What I am really looking for is the weeks within the calendar month.  So, the first week for April 2013 should show 4/1 – 4/6.  And the last week should show 4/28 – 4/30″

How would you solve that particular problem?  Is there an Oracle built-in function that provides the grouping requested by the OP?

-

-

-

-

There are a couple different methods to solve the problem posed by the OP.  One method uses a virtual lookup table to determine the start date and end date of each week in the month, using the restriction that the start date of each week or the end date of each week might need to be adjusted to fall within the same month.  Interestingly, with the restrictions specified by the OP, there are 62 weeks this year.

As with all SQL statements, there is must be a sensible starting point.  First, let’s determine the start and end dates of each month in 2013.  The month start date is easy to determine, while outputting the month end date is a little more challenging – that date is one day less than the start of the next month:

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END 
--------- --------- 
01-JAN-13 31-JAN-13 
01-FEB-13 28-FEB-13 
01-MAR-13 31-MAR-13 
01-APR-13 30-APR-13 
01-MAY-13 31-MAY-13 
01-JUN-13 30-JUN-13 
01-JUL-13 31-JUL-13 
01-AUG-13 31-AUG-13 
01-SEP-13 30-SEP-13 
01-OCT-13 31-OCT-13 
01-NOV-13 30-NOV-13 
01-DEC-13 31-DEC-13

There are a couple of approaches that could be taken for the next step, finding the start of the first, second, third, fourth, fifth, and potentially sixth week in each of those months.  I am interested in determining the start of the second week – we are able to accomplish that task by using the NEXT_DAY function to find the next Sunday after the first of the month.  The end of the first week will also be calculated, even though it is not used in later calculations.  Note that this solution is subject to problems depending on the NLS settings on the client (and whether or not Sunday is considered the first day of the week):

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
 FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END END_FIRST START_SEC 
--------- --------- --------- --------- 
01-JAN-13 31-JAN-13 05-JAN-13 06-JAN-13 
01-FEB-13 28-FEB-13 02-FEB-13 03-FEB-13 
01-MAR-13 31-MAR-13 02-MAR-13 03-MAR-13 
01-APR-13 30-APR-13 06-APR-13 07-APR-13 
01-MAY-13 31-MAY-13 04-MAY-13 05-MAY-13 
01-JUN-13 30-JUN-13 01-JUN-13 02-JUN-13 
01-JUL-13 31-JUL-13 06-JUL-13 07-JUL-13 
01-AUG-13 31-AUG-13 03-AUG-13 04-AUG-13 
01-SEP-13 30-SEP-13 07-SEP-13 08-SEP-13 
01-OCT-13 31-OCT-13 05-OCT-13 06-OCT-13 
01-NOV-13 30-NOV-13 02-NOV-13 03-NOV-13 
01-DEC-13 31-DEC-13 07-DEC-13 08-DEC-13

The above may appear to be a number of related, but useless dates.  However, that is not the case.  Next, we will need a method to count through as many as six weeks per month.  This should work:

SELECT 
  LEVEL WEEK_NUMBER 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=6; 

WEEK_NUMBER 
----------- 
          1 
          2 
          3 
          4 
          5 
          6

Next, a Cartesian join will be created between the two row sources to permit stepping through each month, and each week within that month (we will refine the output later… note that this unrefined output provides a clue for a second method of solving the problem posed by the OP):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7 START_WEEK, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1 END_WEEK 
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 30-DEC-12 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 02-FEB-13 
01-FEB-13           1 27-JAN-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 02-MAR-13 
01-MAR-13           1 24-FEB-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 06-APR-13 
01-APR-13           1 31-MAR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 04-MAY-13 
01-MAY-13           1 28-APR-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 01-JUN-13 
01-JUN-13           1 26-MAY-13 01-JUN-13 
...

The above output has a slight problem – the Sunday in the first week of the month may not be in the month of interest; additionally, the Saturday in the last week of the month may not be in the month of interest.  We may overcome those problems with the help of the DECODE function (the CASE expression may be used instead, if so desired):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 01-JAN-13 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 31-JAN-13 
01-FEB-13           1 01-FEB-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 28-FEB-13 
01-MAR-13           1 01-MAR-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 31-MAR-13 
01-APR-13           1 01-APR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 30-APR-13 
01-MAY-13           1 01-MAY-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 31-MAY-13 
... 
01-DEC-13           5 29-DEC-13 31-DEC-13 

62 rows selected.

Yes, this solution added 10 extra weeks to the year.  :-)

It is easy to extend this solution to cover more than just the year 2013.  For example, the following modification sets the starting period to January 2000, and runs for 20 years:

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=(12*20)) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END;

With the virtual lookup table complete, the next step would be to slide the above SQL statement into an inline view, and then join it to the data set that will be analyzed.  Assuming that the data set has a column named DATE_SENT, the WHERE clause may appear similar to the following:

WHERE
  DATE_SENT BETWEEN START_WEEK AND END_WEEK

The above is one solution, are there other solutions?  What about a solution that uses a little simple mathematics?  Such a solution could be exponentially more efficient.  First, we will create a simple table for experimentation with 1000 rows containing somewhat random dates on or after January 1, 2013:

CREATE TABLE T1 AS
SELECT
  TRUNC(TO_DATE('01-01-2013','MM-DD-YYYY')+DBMS_RANDOM.VALUE*366) DATE_SENT
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

Next, we will generate a SQL statement that produces output that is somewhat similar to the unrefined output found in the first solution (note that the SQL statement contains a DISTINCT clause to eliminate duplicate rows – there could be several rows in the row source with the same random date value):

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE
--------- --------- --------- ---------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
...
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13

In the above, the MONTH_WEEK_START column shows the Sunday of the week that includes the DATE_SENT column value.  At this point, it makes no difference if the MONTH_WEEK_START column value is in a different month than the DATE_SENT column value.

Next, we will determine the week within the month in which the DATE_SENT column value appears:

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE WEEK_NUMBER
--------- --------- --------- --------- -----------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
04-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
...
24-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
26-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
27-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
28-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13           1

Heading toward the request made by the OP in the Usenet thread, the following SQL statement will determine a simple COUNT of the number of rows that have a DATE_SENT date in each week of each month in the T1 table’s data set (note that if there are no rows in a given week, that week will be skipped in the output):

SELECT
  TRUNC(DATE_SENT,'MM') MONTH_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER,
  COUNT(*) DAYS_IN_SET
FROM
  T1
GROUP BY
  TRUNC(DATE_SENT,'MM'),
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1
ORDER BY
  1,
  2;

MONTH_STA WEEK_NUMBER DAYS_IN_SET
--------- ----------- -----------
01-JAN-13           1          13
01-JAN-13           2          15
01-JAN-13           3          23
01-JAN-13           4          21
01-JAN-13           5          13
01-FEB-13           1           5
01-FEB-13           2          21
01-FEB-13           3          15
01-FEB-13           4          14
01-FEB-13           5          11
01-MAR-13           1           4
01-MAR-13           2          18
01-MAR-13           3          19
01-MAR-13           4          20
01-MAR-13           5          20
01-MAR-13           6           4
01-APR-13           1          14
01-APR-13           2          26
01-APR-13           3          15
01-APR-13           4          23
01-APR-13           5           9
01-MAY-13           1          12
01-MAY-13           2          15
01-MAY-13           3          22
01-MAY-13           4          23
01-MAY-13           5          17
01-JUN-13           1           1
01-JUN-13           2          14
01-JUN-13           3          21
01-JUN-13           4          18
01-JUN-13           5          21
01-JUN-13           6           4
01-JUL-13           1          15
01-JUL-13           2          21
01-JUL-13           3          13
01-JUL-13           4          24
01-JUL-13           5          14
01-AUG-13           1           6
01-AUG-13           2          10
01-AUG-13           3          21
01-AUG-13           4          22
01-AUG-13           5          28
01-SEP-13           1          16
01-SEP-13           2          16
01-SEP-13           3          22
01-SEP-13           4          23
01-SEP-13           5           8
01-OCT-13           1          15
01-OCT-13           2          15
01-OCT-13           3          17
01-OCT-13           4          18
01-OCT-13           5          21
01-NOV-13           1           6
01-NOV-13           2          20
01-NOV-13           3          20
01-NOV-13           4          17
01-NOV-13           5          18
01-DEC-13           1          22
01-DEC-13           2          15
01-DEC-13           3          19
01-DEC-13           4          20
01-DEC-13           5           6
01-JAN-14           1           1

It might be just my opinion, but the mathematical solution appears to be easier to understand than the method that uses the virtual lookup table.  We are still able to derive the start date and end date of each week, if needed, to produce the OP’s solution.

Might there be other solutions to the OP’s SQL brain teaser?





Analysis Challenges

25 04 2013

April 25, 2013

Roughly 12 years ago I was attempting to analyze customer order changes that were received through electronic document interchange (EDI), specifically X12 830 documents that show order forecasted demand for specific part numbers.  At the time, the EDI data was partially transformed and inserted into an Oracle 8.0.5 database, while that data is now inserted into an Oracle 11.2 database.  I recently took another glance that this particular problem to see if there is a better method for performing the analysis.

Complications in the analysis:

  • The EDI documents do not include the customer’s purchase order line numbers, just a list of outstanding due dates and quantities.  That complication means that there is no easy way to see if an outstanding quantity has changed, or the date for that outstanding quantity has changed.
  • It is possible to ship to the customer a partial order quantity, with the remaining quantity on the line handled as a back-order.  If the customer does not alter this particular forecast line, only the quantity outstanding will be provided, along with the due date – the original quantity is not provided.
  • It is possible to ship out an order line received in an 830 document up to five days before the customer’s due date.
  • Shipped parts may require hours or days to reach the customer’s site, thus delaying when those shipped parts are acknowledged in the list of outstanding due dates and quantities received through EDI.
  • EDI 830 documents are generally received the next weekday after the customer batch generates those documents on a nightly basis.  Thus, there is an opportunity for the received forecast to be out of sync with what is known to have arrived at the customer site, and what is acknowledged in the EDI data.
  • The customer may generate multiple EDI 830 documents for a particular part number/purchase order combination on any given day, and only the EDI 830 document received with the highest (sorted in ascending order) release number (which may not be a number) on a specific date is the valid release for the specific date.
  • EDI 830 documents may be generated for a specific part number by the customer nearly every weekday, once a month, or any frequency in between.

I recall trying to find a solution to this particular problem 12 years ago.  I thought that I could possibly pull all of the relevant data from the Oracle database into a custom application, and through some magical formula output what appeared to be a list of what the customer changed.  However, what about those part quantities that are in transit to the customer?  What about those part quantities that were received by the customer, but had not yet made their way through the customer’s batch processing system?  What about those cases where the customer sents two or more EDI 830 releases for a specific part number, and those EDI releases are received on the same date?

At the time, I was frustrated.  The quality of the output of my analysis would be dependent on the accuracy of humans (yes, and humans wrote the program that performed the analysis too).  I simply had to find the first matching due date and order quantity from the previous release with the due date and order quanitity from the current release, and then rely on an end user to correct the 5% to 10% of the cases where the matching algorithm was wrong due to one or more of the “complications” outlined above.

While recently re-examining the problem, I noticed a data element in the EDI 830 document that is described as “ACCUM RECEIPTS AS OF THIS DATE”, and a luck would have it, that particular data element is one of those elements that was selected more than 12 years ago to be transformed into the Oracle Database tables (in the sample data included with this blog article, this column is named PREVIOUS_ACCUM_QTY).  If only I recognized the meaning of the values of this column, and had access to Oracle analytical SQL functions 12 years ago, this particular analysis problem might not have had so many variables that could potentially throw off the quality of the analysis.

This particular blog article has a fairly large set of sample data – however, that sample data is a fairly small subset of the data that must be analyzed on a daily basis.  The sample data may be downloaded here: analysischallenges.sql sample code (save the file as “analysischallenges.sql” – without the .doc double extension).  If the file is saved in the same directory (folder) as SQL*Plus was started from, you may then execute the script in SQL*Plus to build the tables and populate the sample data by using the following command:

@analysischallenges.sql

The two sample table definitions are created with the following commands (also included in the script):

CREATE TABLE T1 (
  CUSTOMER_PO VARCHAR2(15),
  RELEASE_ID VARCHAR2(15),
  PART_ID VARCHAR2(30),
  RELEASE_DATE DATE,
  PREVIOUS_ACCUM_QTY NUMBER,
  CREATE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID));

CREATE TABLE T1_LINE (
  CUSTOMER_PO VARCHAR2(12),
  RELEASE_ID VARCHAR2(12),
  QTY NUMBER,
  DUE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID,DUE_DATE));

T1 Table Columns:

CUSTOMER_PO is the unique identifier for the customer purchase order received through the EDI 830 documents.  RELEASE_ID defines the specific serialized change number for the customer’s PO – the RELEASE_ID is unique for a specific PO.  PART_ID is the part number for which the customer is providing a forecast.  RELEASE_DATE is the date that the forecast was batch generated in the customer’s computer system.  PREVIOUS_ACCUM_QTY is the count of the number of this part number that the customer has already received for this specific customer purchase order.  CREATE_DATE is the date that the forecast was received from the customer and transformed/inserted into the Oracle Database tables.

T1_LINE Table Columns:

CUSTOMER_PO and RELEASE_ID are undeclared foreign key columns that point back to table T1.  QTY is the quantity of the part number that should (or is forecasted to) ship on the specfied DUE_DATE.

—–

If you are curious how I created the script to populate the tables, I executed the following in SQL*Plus, and then performed a little cleanup of the resulting spool files:

SET LINESIZE 200
SET TRIMSPOOL ON

SPOOL analysischallenges_t1.txt

SELECT
  'INSERT INTO T1 VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||CHR(39)||PART_ID||CHR(39)||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(RELEASE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||','
    ||PREVIOUS_ACCUM_QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(CREATE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID;

SPOOL OFF

SPOOL analysischallenges_t1_line.txt

SELECT
  'INSERT INTO T1_LINE VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(DUE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1_LINE
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID,
  DUE_DATE;

SPOOL OFF

—–

We must find a starting point for performing the analysis of what changed from one day’s forecast to a prior day’s forecast.  There is an issue with the EDI 830 releases that were received (identified by the CREATE_DATE column) on April 24, 2013 – two releases for the same customer purchase orders were received, and there were two purchase orders that demand the same part number.  Our analysis is only concerned with the highest numbered (sorted alphabetically) release number document for each customer purchase order that was received on any particular day, so we need to find a way to eliminate the other older releases for the same customer purchase order that was received on the same day.  The ROW_NUMBER analytic function is perfect for this task – we just need the output to show the same number (1) for the rows that we want to keep, and have some other number appear on the other rows.  Something similar to the following may work (note that the TRUNC function, or other approach that uses a date/time range, probably should be used when working with the CREATE_DATE column – realistically, there could be a time component included with those date values):

COLUMN PART_ID FORMAT A10
SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  T.RELEASE_ID,
  T.RELEASE_DATE,
  T.CREATE_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     REL_ON_DATE RELEASE_ID      RELEASE_D CREATE_DA PREVIOUS_ACCUM_QTY
---------- --------------- ----------- --------------- --------- --------- ------------------
DS1812+    1231234                   2 20130402-001    23-APR-13 24-APR-13                 24
DS1812+    1231234                   1 20130403-001    24-APR-13 24-APR-13                 24
DS1812+    1233290                   2 20130402-001    23-APR-13 24-APR-13                  4
DS1812+    1233290                   1 20130403-001    24-APR-13 24-APR-13                  4
DS212+     1133290                   2 20130402-001    23-APR-13 24-APR-13                  2
DS212+     1133290                   1 20130403-001    24-APR-13 24-APR-13                  2
RS812      1231280                   2 20130402-001    23-APR-13 24-APR-13                 60
RS812      1231280                   1 20130403-001    24-APR-13 24-APR-13                 60

To remove all of the rows that have something other than 1 in the REL_ON_DATE column, we are able to slide the above SQL statement into an inline view, and then specify that the REL_ON_DATE column value must be equal to 1.  Typically, there are supposed to be few/no changes made by the customer within eight weeks of the customer forecast’s release date, but that is not always the case.  While eliminating the unnecessary releases, it probably makes sense to also calculate the end date of this “firm” period by adding 55 days to the date on which the forecast was created:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T
WHERE
  T.REL_ON_DATE=1
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY
---------- --------------- --------------- --------- ------------------
DS1812+    1231234         20130403-001    18-JUN-13                 24
DS1812+    1233290         20130403-001    18-JUN-13                  4
DS212+     1133290         20130403-001    18-JUN-13                  2
RS812      1231280         20130403-001    18-JUN-13                 60

Next, we need to retrieve the forecasted delivery schedule (in the T1_LINES table) that are associated with the above rows.  If we take the value in the PREVIOUS_ACCUM_QTY column, and find a way to generate a running SUM of the QTY column from the T1_LINE table, in theory we should be able to determine if the total quantity ordered through a specific date has changed.  The analytic version of the SUM function offers the capability of producing a running SUM when an ORDER BY clause is included in the function call.  To reduce the data set size, I will only consider those forecasts with a scheduled delivery date that is within 101 days of the customer’s forecast date.  With those changes in place, the SQL statement is modified as follows:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY,
  TL.DUE_DATE,
  TL.QTY,
  T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
  T1_LINE TL
WHERE
  T.REL_ON_DATE=1
  AND T.CUSTOMER_PO=TL.CUSTOMER_PO
  AND T.RELEASE_ID=TL.RELEASE_ID
  AND (T.RELEASE_DATE+100)>=TL.DUE_DATE
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  TL.DUE_DATE;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------------- --------- ------------------ --------- ---------- ---------------
DS1812+    1231234         20130403-001    18-JUN-13                 24 24-APR-13          0              24
DS1812+    1233290         20130403-001    18-JUN-13                  4 24-APR-13          0               4
DS212+     1133290         20130403-001    18-JUN-13                  2 24-APR-13          0               2
RS812      1231280         20130403-001    18-JUN-13                 60 27-JUN-13          1              61
RS812      1231280         20130403-001    18-JUN-13                 60 03-JUL-13          1              62
RS812      1231280         20130403-001    18-JUN-13                 60 24-JUL-13          2              64

As may be seen above, only customer purchase order 1231280 has a forecast, calling for a quantity of 1 on June 27, quantity of 1 on July 3, and a quantity of 2 on July 24.  The previously received quantity for customer purchase order 1231280 is 60, so the additional forecast quantities will bring the total order quantities up to 61, 62, and 64 through the three dates included in the forecast.

-The Top Half of the Final SQL Statement is Above-

So, we now have a starting point – the data that was received on April 24, 2013.  How does that forecast data compare with previously received forecasts?  We need to start building the second half of the SQL statement, which looks at the historical data.  To limit the dataset size, I will limit the data to only those EDI documents that were received up to 56 days prior (the eight week “firm” period) to the April 24, 2013 date that was specified earlier in this article:

SELECT
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  TRUNC(T.CREATE_DATE) CREATE_DATE,
  T.RELEASE_ID,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
  AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.CUSTOMER_PO,
  T.RELEASE_ID;

CUSTOMER_PO     REL_ON_DATE CREATE_DA RELEASE_ID      PREVIOUS_ACCUM_QTY
--------------- ----------- --------- --------------- ------------------
1131111                   1 27-FEB-13 0371                           575
1131111                   1 01-MAR-13 0381                           577
1131111                   1 06-MAR-13 0431                           578
1131111                   1 07-MAR-13 0441                           581
1131111                   1 08-MAR-13 0451                           581
1131111                   1 11-MAR-13 0461                           581
...
1131111                   1 08-APR-13 0741                           593
1131111                   1 11-APR-13 0791                           593
1131111                   1 23-APR-13 0911                           595
1131245                   1 27-FEB-13 0371                          2299
1131245                   1 01-MAR-13 0381                          2300
1131245                   1 05-MAR-13 0421                          2303
...
1131745                   1 19-APR-13 0871                           435
1131745                   1 23-APR-13 0911                           435
1133290                   1 27-FEB-13 20130205-001                     2
1133290                   1 28-FEB-13 20130206-001                     2
1133290                   1 01-MAR-13 20130207-001                     2
...
1133290                   1 19-APR-13 20130328-001                     2
1133290                   1 23-APR-13 20130401-001                     2
1231234                   1 27-FEB-13 20130205-001                    24
1231234                   1 28-FEB-13 20130206-001                    24
...
1231234                   1 23-APR-13 20130401-001                    24
1231280                   1 27-FEB-13 20130205-001                    57
1231280                   1 28-FEB-13 20130206-001                    57
1231280                   1 01-MAR-13 20130207-001                    57
1231280                   1 04-MAR-13 20130208-001                    58
1231280                   1 05-MAR-13 20130211-001                    58
...
1231280                   1 04-APR-13 20130313-001                    60
1231280                   2 05-APR-13 20130314-002                    60
...
1231280                   1 19-APR-13 20130328-001                    60
1231280                   1 23-APR-13 20130401-001                    60
1233290                   1 27-FEB-13 20130205-001                     4
1233290                   1 28-FEB-13 20130206-001                     4
...

As we found earlier, there could be more than one release for a customer purchase order received in a single date, and we are only concerned with the most recent release received on any particular day.  We have a second issue, in that the dates may have shifted from one release to the next, so we have to artifically generate rows with the potentially missing dates, with an order quantity of 0 on those dates.

But, how to generate a list of dates?  We could use a CONNECT BY LEVEL trick:

SELECT
  TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

DUE_DATE
---------
24-APR-13
25-APR-13
26-APR-13
27-APR-13
28-APR-13
...
15-JUN-13
16-JUN-13
17-JUN-13
18-JUN-13
...
30-JUL-13
31-JUL-13
01-AUG-13

Now, by creating a Cartesian join between the dataset with the dates and the dataset with the EDI release header records, we will make certain that a date exists for every date that is potentially of interest (up to 100 days after April 24, 2013) when we later join the result with the T1_LINE table (the child table that lists the forecast delivery dates).

SELECT
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  T2.CREATE_DATE,
  T2.PREVIOUS_ACCUM_QTY,
  D.DUE_DATE
FROM
  (SELECT
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    TRUNC(T.CREATE_DATE) CREATE_DATE,
    T.RELEASE_ID,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
    AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
  ORDER BY
    T.CUSTOMER_PO,
    T.RELEASE_ID) T2,
  (SELECT
    TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) D
WHERE
  T2.REL_ON_DATE=1
ORDER BY
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  D.DUE_DATE;

...
CUSTOMER_PO     RELEASE_ID      CREATE_DA PREVIOUS_ACCUM_QTY DUE_DATE
--------------- --------------- --------- ------------------ ---------
1233290         20130318-001    09-APR-13                  4 25-MAY-13
1233290         20130318-001    09-APR-13                  4 26-MAY-13
1233290         20130318-001    09-APR-13                  4 27-MAY-13
1233290         20130318-001    09-APR-13                  4 28-MAY-13
...
1233290         20130322-001    15-APR-13                  4 22-JUN-13
1233290         20130322-001    15-APR-13                  4 23-JUN-13
1233290         20130322-001    15-APR-13                  4 24-JUN-13
1233290         20130322-001    15-APR-13                  4 25-JUN-13
...
1233290         20130401-001    23-APR-13                  4 29-JUL-13
1233290         20130401-001    23-APR-13                  4 30-JUL-13
1233290         20130401-001    23-APR-13                  4 31-JUL-13
1233290         20130401-001    23-APR-13                  4 01-AUG-13

Next, we will outer join to the T1_LINE table to bring in the actual forecasted delivery schedule from the previous dates (note that I also removed an unnecessary ORDER BY clause that was still embedded in the SQL statement – the Oracle query optimizer probably would have realized that the ORDER BY clause was meaningless to the query result, and automatically removed it – but I removed the clause just in case my assumption is incorrect):

SELECT
  T3.CUSTOMER_PO,
  T3.CREATE_DATE,
  T3.DUE_DATE,
  T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T2.CUSTOMER_PO,
    T2.RELEASE_ID,
    T2.CREATE_DATE,
    T2.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      TRUNC(T.CREATE_DATE) CREATE_DATE,
      T.RELEASE_ID,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
      AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T2.REL_ON_DATE=1) T3,
  T1_LINE TL
WHERE
  T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND T3.RELEASE_ID=TL.RELEASE_ID(+)
  AND T3.DUE_DATE=TL.DUE_DATE(+)
ORDER BY
  T3.CUSTOMER_PO,
  T3.RELEASE_ID,
  T3.DUE_DATE;

CUSTOMER_PO     CREATE_DA DUE_DATE  REL_RUNNING_SUM
--------------- --------- --------- ---------------
1131111         27-FEB-13 24-APR-13             577
1131111         27-FEB-13 25-APR-13             578
1131111         27-FEB-13 26-APR-13             579
1131111         27-FEB-13 27-APR-13             579
1131111         27-FEB-13 28-APR-13             579
1131111         27-FEB-13 29-APR-13             580
1131111         27-FEB-13 30-APR-13             581
...
1131111         23-APR-13 30-JUL-13             636
1131111         23-APR-13 31-JUL-13             638
1131111         23-APR-13 01-AUG-13             639
...
1231280         11-MAR-13 08-MAY-13              58
1231280         11-MAR-13 09-MAY-13              58
1231280         11-MAR-13 10-MAY-13              58
1231280         11-MAR-13 11-MAY-13              58
...
1231280         23-APR-13 21-JUL-13              62
1231280         23-APR-13 22-JUL-13              62
1231280         23-APR-13 23-JUL-13              62
1231280         23-APR-13 24-JUL-13              64
1231280         23-APR-13 25-JUL-13              64
1231280         23-APR-13 26-JUL-13              64
...

We are getting close.  Now we just need to join the SQL created earlier in this article with the above SQL statement so that the current customer purchase order forecasts may be compared with the earlier received customer purchase order forecasts.  The LISTAGG analytic funtion, introduced with Oracle Database 11.1, makes it easy to condense multiple rows from the resultset into a much smaller number of rows, if the resultset is grouped on at least one column.  So, the final version of the analysis challenge appears as follows:

SELECT
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM,
  LISTAGG(T4.REL_RUNNING_SUM||' ('||TO_CHAR(T4.CREATE_DATE,'MM/DD/YY')||')',', ') WITHIN GROUP (ORDER BY T4.CREATE_DATE DESC) EXCEPTIONS
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    T.RELEASE_ID,
    T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
    T.PREVIOUS_ACCUM_QTY,
    TL.DUE_DATE,
    TL.QTY,
    T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.CREATE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
    T1_LINE TL
  WHERE
    T.REL_ON_DATE=1
    AND T.CUSTOMER_PO=TL.CUSTOMER_PO
    AND T.RELEASE_ID=TL.RELEASE_ID
    AND (T.RELEASE_DATE+100)>=TL.DUE_DATE) C,
  (SELECT
    T3.CUSTOMER_PO,
    T3.CREATE_DATE,
    T3.DUE_DATE,
    T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T2.CUSTOMER_PO,
      T2.RELEASE_ID,
      T2.CREATE_DATE,
      T2.PREVIOUS_ACCUM_QTY,
      D.DUE_DATE
    FROM
      (SELECT
        T.CUSTOMER_PO,
        ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
        TRUNC(T.CREATE_DATE) CREATE_DATE,
        T.RELEASE_ID,
        T.PREVIOUS_ACCUM_QTY
      FROM
        T1 T
      WHERE 
        T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
        AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
      (SELECT
        TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
      FROM
        DUAL
      CONNECT BY
        LEVEL<=100) D
    WHERE
      T2.REL_ON_DATE=1) T3,
    T1_LINE TL
  WHERE
    T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
    AND T3.RELEASE_ID=TL.RELEASE_ID(+)
    AND T3.DUE_DATE=TL.DUE_DATE(+)) T4
WHERE
  C.CUSTOMER_PO=T4.CUSTOMER_PO
  AND C.DUE_DATE=T4.DUE_DATE
  AND C.REL_RUNNING_SUM<>T4.REL_RUNNING_SUM
GROUP BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM
ORDER BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.DUE_DATE;

PART_ID    CUSTOMER_PO     POSSIBLE_ DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------- --------- ---------- ---------------
EXCEPTIONS
-----------------------------------------------------------------------------------------------------------------------
RS812      1231280         18-JUN-13 27-JUN-13          1              61
60 (03/22/13), 60 (03/21/13), 60 (03/19/13), 60 (03/18/13), 60 (03/15/13), 60 (03/14/13), 59 (03/13/13), 59 (03/12/13),
59 (03/11/13), 59 (03/08/13), 60 (03/07/13), 60 (03/06/13), 60 (03/05/13), 60 (03/04/13), 59 (03/01/13), 59 (02/28/13),
59 (02/27/13)

RS812      1231280         18-JUN-13 03-JUL-13          1              62
61 (03/22/13), 61 (03/21/13), 61 (03/19/13), 61 (03/18/13), 61 (03/15/13), 61 (03/14/13), 60 (03/13/13), 60 (03/12/13),
60 (03/11/13), 60 (03/08/13), 61 (03/07/13), 61 (03/06/13), 61 (03/05/13), 61 (03/04/13), 60 (03/01/13), 60 (02/28/13),
60 (02/27/13)

RS812      1231280         18-JUN-13 24-JUL-13          2              64
63 (03/26/13), 63 (03/25/13), 62 (03/22/13), 62 (03/21/13), 62 (03/19/13), 62 (03/18/13), 62 (03/15/13), 62 (03/14/13),
61 (03/13/13), 61 (03/12/13), 61 (03/11/13), 61 (03/08/13), 62 (03/07/13), 62 (03/06/13), 62 (03/05/13), 62 (03/04/13),
61 (03/01/13), 61 (02/28/13), 61 (02/27/13)

It is now easy to see that the requested total quantity due through June 27, 2013 increased by 2 since the first forecast which was received on February 27, 2013.  The requested total quantity increased by 1 in the EDI release that was received on March 4, 2013, and by 1 again on the next EDI release that was received after March 22, 2013.  The second and third lines of the output show that there was a little more fluctuation in the customer’s forecast, increasing in quantity, decreasing in quantity, and then increasing in quantity again.

An unfortunate problem may be present – what if the customer completely eliminates the forecast orders on their purchase order, but still sends in a header line for the forecast?  A future enhancement may be required, where the CONNECT BY LEVEL trick is used in the first half of the SQL statement (the current releases), just as it is used in the second half of the SQL statement (the historical releases), so that when the top half of the final SQL statement is joined to the bottom half, rows returned by the bottom half of the SQL statement are not unnecessarily removed from the output.

The output is a little ugly.  One approach to clean up the output involves spooling the output to an HTML file as explained here and in the documentation here.  Another approach to clean up the output involves sending the data into an Excel spreadsheet using either a macro or the Microsoft Query Wizard, accessed from within Excel (see the articles in the Excel category of this blog for an explanation of these approaches).

—–

The final solution for this analysis challenge is so simple when built up from smaller SQL statements as shown above; I wonder why I did not consider this approach 12 years ago?  :-)





Bitten by a Virtual Column, _OPTIMIZER_IGNORE_HINTS Doesn’t Ignore Hints?

9 03 2013

March 9, 2013

I had a couple of spare minutes today, so I tried a couple of experiments with Oracle Database 11.2.0.2 just to see if I could produce some unexpected results.

First, I will create a simple database table with two indexes:

CREATE TABLE T1 (
  N1 NUMBER,
  V1 VARCHAR2(20),
  D1 DATE,
  PADDING VARCHAR2(300));

CREATE INDEX IND_T1_I1 ON T1(V1);
CREATE INDEX IND_T1_I2 ON T1(N1,1);

In the above, I indexed column V1.  The second index is a composite index with column N1 as the first column and the constant 1 as the second column (see this article to understand the reason for the constant 1).  Creating the second index causes Oracle Database to add a hidden virtual column to the table.

Let’s insert 100,000 rows into the table (column N1 includes a repeating sequence from 1 to 999 and then a NULL, column V1 is the Roman numeral for the repeating sequence from 1 to 999 and then 0, column D1 simply adds from 1 to 999 and then 0 days to the current date, and column PADDING is just a series of A characters to discourage the Oracle query optimizer from using full table scans rather than an index access path, if such an access path is legal):

INSERT INTO
  T1 
SELECT 
  DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1, 
  TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1, 
  TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1, 
  RPAD('A',300,'A') PADDING 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=100000;

COMMIT;

Let’s take a look at the columns that are included in each of the indexes:

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ ------------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    SYS_NC00005$

As can be seen above, Oracle added the SYS_NC00005$ hidden virtual column to the table so that the number 1 could be included as the second column of the composite index.  Somewhat interesting, we are able to query the value of the SYS_NC00005$ hidden virtual column in a query:

SET LINESIZE 140
SET PAGESIZE 1000

SELECT
  V1,
  N1,
  "SYS_NC00005$"
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1 SYS_NC00005$
-------------------- ---------- ------------
###############                            1
###############                            1
###############                            1
###############                            1
###############                            1
...
###############                            1
###############                            1
###############                            1

100 rows selected.

The above indicates that (according to Oracle) just as there is no Roman number higher than 3,999, there is no Roman number for 0.

While probably not wise, we are able to rename this hidden virtual column:

ALTER TABLE T1 RENAME COLUMN "SYS_NC00005$" TO "C";

Table altered.

Let’s check which columns are included in the index definitions again:

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ -----------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    C

The output of the above might suggest to the unsuspecting person that someone intentionally added a column named C to the table, and included that column in a composite index with column N1.  Interesting…

Let’s collect statistics on all of the rows in the table, as well as the table’s indexes:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

SELECT
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

DBMS_XPLAN will be used to retrieve the actual execution plan for the SQL statement that was just executed:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  f35sfgu6s8huh, child number 0
-------------------------------------
SELECT   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

The above indicates that the optimizer determined that using an index access path was likely the most efficient access path to search for cases where column N1 contains NULL values.

Oracle Database offers more than 200 hints to help control the query optimizer’s behavior (as well as other behavior).  In general, hints should be used sparingly.  For example, we are able to add a FULL hint to force the optimizer to use a full table scan when searching column N1 for NULL values:

SELECT /*+ FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  3w60zjynqtwrp, child number 0
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

As the above shows, Oracle’s optimizer selected to use a full table scan even though the calculated cost for that access path (590) is higher than the calculated cost of the index access path (102).  There is an obvious temptation when learning about hints to overuse those hints – as data volumes (and the data itself) in a table changes, those index hints may very well hurt performance, rather than help performance.

_OPTIMIZER_IGNORE_HINTS doesn’t ignore hints?

That was a recent set of search keywords used to access this blog.  Oracle Database offers a fairly large number of parameters (we will get back to hints in a moment), some of which are considered hidden parameters because those parameters begin with _ (underscore) characters, typically do not display when querying theV$PARAMETER view (unless the parameter value was modified), and typically should not be modified without the explicit request of Oracle Support.  If we had used the +OUTLINE format parameter in the DBMS_XPLAN call we would see a similarly named IGNORE_OPTIM_EMBEDDED_HINTS hint in the Outline Data section of the execution plan:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('3w60zjynqtwrp',0,'TYPICAL +OUTLINE'));

SQL_ID  3w60zjynqtwrp, child number 0
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

Let’s modify the _OPTIMIZER_IGNORE_HINTS hidden parameter at the session level and try the test query again:

ALTER SESSION SET "_optimizer_ignore_hints"=TRUE;

Session altered.

SELECT /*+ FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

The above is no surprise, so let’s take a look at the execution plan with the Outline Data section included:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  3w60zjynqtwrp, child number 1
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."N1" "T1"."C1"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

Looking at the above execution plan, you will notice that the Oracle optimizer has ignored the FULL hint that was specified in the SQL statement, and that by examining the Outline Data section of the plan you will see that FULL(@”SEL$1″ “T1″@”SEL$1″) was replaced with INDEX_RS_ASC(@”SEL$1″ “T1″@”SEL$1″ (“T1″.”N1″ “T1″.”C1″)).

I am wondering about the above search keywords that were used to access my blog.  Can I force the Oracle query optimizer NOT to ignore embedded hints when the _OPTIMIZER_IGNORE_HINTS hidden parameter is set to TRUE?  Before we start, let’s set the hidden parameter back to the default value of FALSE.

ALTER SESSION SET "_optimizer_ignore_hints"=FALSE;

Occasionally, when the Outline Data section of the execution plan is reviewed (either with DBMS_XPLAN or by reviewing a 10053 trace) an OPT_PARAM hint may appear in the Outline Data section.  Oracle Database itself inserts these hints into SQL statement, for example (from a 10046 trace file):

PARSING IN CURSOR #448514944 len=3052 dep=1 uid=64 oct=3 lid=64 tim=853194336675 hv=3275773959 ad='3edba6b28' sqlid='2sdms4r1n0q07'
SELECT /*+ opt_param('parallel_execution_enabled', 'false') */
                       /* EXEC_FROM_DBMS_XPLAN */ id, position, depth , operation, options, object_name, cardinality, bytes, temp_space, cost,

A 10053 trace file also includes a section titled PARAMETERS IN OPT_PARAM HINT that indicates the SQL statement level parameters that are modified through the OPT_PARAM hint.

Let’s test that OPT_PARAM hint to see if we are able to hint the optimizer to ignore all other hints in the SQL statement:

SELECT /*+ OPT_PARAM('_optimizer_ignore_hints','TRUE') FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  c4fxyhf0fzg4t, child number 0
-------------------------------------
SELECT /*+ OPT_PARAM('_optimizer_ignore_hints','TRUE') FULL(T1) */
V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

The hint that modified the parameter that controls whether or not hints are ignored worked.  Well, that just does not seem right – should it not be the case that that hint is also ignored?  Might this be a catch 22?

What if… we reverse the order of the hints?

SELECT /*+ FULL(T1) OPT_PARAM('_optimizer_ignore_hints','TRUE') */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  07pu6c3jvdrd0, child number 0
-------------------------------------
SELECT /*+ FULL(T1) OPT_PARAM('_optimizer_ignore_hints','TRUE') */
V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

Well, the FULL hint definitely was not ignored by the optimizer that time, as it was for the previous arrangement of the hints.  Might we be on to something?

Let’s verify that the OPT_PARAM hint does work to control optimizer parameters at the SQL statement level:

SELECT /*+ OPT_PARAM('optimizer_index_cost_adj',1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  ac45wtypubpfu, child number 0
-------------------------------------
SELECT /*+ OPT_PARAM('optimizer_index_cost_adj',1) */   V1,   N1,   C
FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

Reviewing the above, you might have noticed that the calculated cost of the index access decreased from a value of 102 to just 1 as a result of the OPT_PARAM hint, so we might be able to conclude that the hint does work.  Might this difference in the execution plan be explained as just the query optimizer NOT silently ignoring invalid hints?  Food for thought.

You might have noticed that the above queries have selected column C from the table.  Let’s take a look at the table’s definition:

DESC T1

 Name             Null?    Type
 ---------------- -------- -------------
 N1                        NUMBER
 V1                        VARCHAR2(20)
 D1                        DATE
 PADDING                   VARCHAR2(300)

Yep, no column C.  How are we able to select column C from the table if DESC T1 does not list that column?

Let’s drop the second index, the one that used the constant 1 as the second column in the index:

DROP INDEX IND_T1_I2;

Index dropped.

Now that the index is gone that created that hidden virtual column, let’s create our own column C and use that column in a new IND_T1_I2 composite index:

ALTER TABLE T1 ADD (C NUMBER DEFAULT 1 NOT NULL);

                    *
ERROR at line 1:
ORA-01430: column being added already exists in table

ORA-01430?  I never added a column C to the table, did you?  Take a look:

DESC T1

 Name             Null?    Type
 ---------------- -------- -------------
 N1                        NUMBER
 V1                        VARCHAR2(20)
 D1                        DATE
 PADDING                   VARCHAR2(300)

Be gone mysterious column C (do you think this might be the reason why it is a bad idea to rename the system generated hidden virtual columns?):

ALTER TABLE T1 DROP (C);

                     *
ERROR at line 1:
ORA-12996: cannot drop system-generated virtual column

But, but, I dropped the index that caused that hidden virtual column to be created.

ALTER TABLE T1 ADD (C1 NUMBER DEFAULT 1 NOT NULL);

Table altered.

CREATE INDEX IND_T1_I2 ON T1(N1,C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

I didn’t really want a column C any way, sniff.

Now, if we were to execute this SQL statement, will Oracle’s query optimizer use the just created IND_T1_I2 index to search for NULL values in column N1?  Would the result potentially be different if column C1 did not have a declared NOT NULL constraint?  Keep in mind that the Oracle documentation states: “The query must be guaranteed not to need any NULL values from the indexed expression, because NULL values are not stored in indexes.”

SELECT
  V1,
  N1,
  C1
FROM
  T1
WHERE
  N1 IS NULL;

Did I mention that I had a couple of spare minutes today?  That does not happen too often lately.

Above I stated that it was odd that Oracle Database did not automatically dop the automatically created hidden virtual column when the index for which the column was created was dropped.  So, what happens if the automatically created hidden virtual column is not renamed?  Here is a simple script based on the above experiment to determine what happens (change SYS_NC00005$ in the ALTER TABLE commands as necessary so that the column name is the same as the column name returned from the query of the USER_IND_COLUMNS view).

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  N1 NUMBER,
  V1 VARCHAR2(20),
  D1 DATE,
  PADDING VARCHAR2(300));

CREATE INDEX IND_T1_I1 ON T1(V1);

CREATE INDEX IND_T1_I2 ON T1(N1,1);

INSERT INTO
  T1
SELECT
  DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1,
  TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1,
  TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1,
  RPAD('A',300,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

DROP INDEX IND_T1_I2;

ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

My results follow:

SQL> DROP TABLE T1 PURGE;
Table dropped.

SQL> CREATE TABLE T1 (
  2    N1 NUMBER,
  3    V1 VARCHAR2(20),
  4    D1 DATE,
  5    PADDING VARCHAR2(300));

Table created.

SQL> CREATE INDEX IND_T1_I1 ON T1(V1);

Index created.

SQL> CREATE INDEX IND_T1_I2 ON T1(N1,1);

Index created.

SQL> INSERT INTO
  2    T1
  3  SELECT
  4    DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1,
  5    TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1,
  6    TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1,
  7    RPAD('A',300,'A') PADDING
  8  FROM
  9    DUAL
 10  CONNECT BY
 11    LEVEL<=100000;

100000 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT
  2    INDEX_NAME,
  3    COLUMN_NAME
  4  FROM
  5    USER_IND_COLUMNS
  6  WHERE
  7    TABLE_NAME = 'T1'
  8  ORDER BY
  9    INDEX_NAME,
 10    COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ --------------------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    SYS_NC00005$

SQL> ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);
ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL)
                    *
ERROR at line 1:
ORA-01430: column being added already exists in table

SQL> DROP INDEX IND_T1_I2;

Index dropped.

SQL> ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

Table altered.

With the original test case, both the first and second ALTER TABLE commands failed.





Feeling ANSI About Oracle Join Syntax? 2

7 02 2013

February 7, 2013

(Back to the Previous Post in the Series)

As I have mentioned a couple of times previously, I am not much of a fan of ANSI style joins – I prefer using the classical Oracle join syntax when possible.  I try to keep up with an ERP mailing list, and try to assist with providing answers to questions when time permits.  A SQL statement was recently shared with the ERP mailing list, demonstrating a solution that was put together to solve a particular problem.  A portion of the SQL statement follows:

...
 FROM
 dbo.WORK_ORDER RIGHT OUTER JOIN dbo.OPERATION ON
     dbo.OPERATION.WORKORDER_TYPE = dbo.WORK_ORDER.TYPE AND
     dbo.OPERATION.WORKORDER_BASE_ID = dbo.WORK_ORDER.BASE_ID AND
     dbo.OPERATION.WORKORDER_LOT_ID = dbo.WORK_ORDER.LOT_ID AND
     dbo.OPERATION.WORKORDER_SPLIT_ID = dbo.WORK_ORDER.SPLIT_ID AND
     dbo.OPERATION.WORKORDER_SUB_ID = dbo.WORK_ORDER.SUB_ID
   left join LABOR_TICKET CUR on
 ...

An ANSI join… I really wish that the person who wrote that SQL statement used Oracle’s classical (+) notation for the outer join declaration… and it probably would have helped if the OP was running with an Oracle Database backend rather than a SQL Server backend.  When I saw that ANSI outer join, I immediately started thinking about pig outer join… a response that I submitted to an Internet forum in 2006, and republished here.

After thinking about the SQL statement for a couple of minutes, I decided that the above ANSI RIGHT OUTER JOIN is equivalent to the following using Oracle’s classical (+) notation for outer joins:

WORK_ORDER.TYPE(+) = OPERATION.WORKORDER_TYPE
AND WORK_ORDER.BASE_ID(+) = OPERATION.WORKORDER_BASE_ID
AND WORK_ORDER.LOT_ID(+) = OPERATION.WORKORDER_LOT_ID
AND WORK_ORDER.SPLIT_ID(+) = OPERATION.WORKORDER_SPLIT_ID
AND WORK_ORDER.SUB_ID(+) = OPERATION.WORKORDER_SUB_ID

If you look closely at the above, you might have noticed that not only did I remove the dbo. in front of the table names, but I also switched which columns appear on the left and right side of the equal sign.  Why switch which column is on the left and which is on the right of the equal sign?  So that the columns belonging to each table were listed in the same order from left to right as the join declaration: dbo.WORK_ORDER RIGHT OUTER JOIN dbo.OPERATION

With that change made, the rules of pig outer join apply.  With the ANSI join type of RIGHT OUTER JOIN, the (+) notation is placed after the column that is to the left of the equal sign.  That column with the (+) notation appended is permitted to return a NULL value when there is no matching row in that column’s table for that join condition.  The (+) notation follows all of the columns from the WORK_ORDER table that appear in the join.  To the casual observer, one might think that there is a parent-child relationship between the two tables, maybe even a delared foreign key relationship with the OPERATION table as the parent and the WORK_ORDER table as the child.

There is in fact a declared foreign key relationship between the two tables.  For Oracle Database backends, that declared foreign key relationship was created using a command similar to the following:

ALTER TABLE OPERATION ADD(
  CONSTRAINT CONFUSE_ME_NOT FOREIGN KEY (
    WORKORDER_TYPE,
    WORKORDER_BASE_ID,
    WORKORDER_LOT_ID,
    WORKORDER_SPLIT_ID,
    WORKORDER_SUB_ID)
  REFERENCES WORK_ORDER ON DELETE CASCADE);

So yes, a delared foreign key relationship exists between the two tables.  But wait, doesn’t the above constraint require that for a row to be present in the OPERATION table, a corresponding row must already exist in the WORK_ORDER table?

Maybe I should not have flip-flopped which columns appear on the left and right side of the equal sign?  Did I mention that I am not much of a fan of ANSI style joins?  There might be a few bugs in Oracle Database related to its automatic conversion of ANSI style joins to classical Oracle joins, but let’s try a test anyway.

I will use the autotrace functionality in SQL*Plus to output the execution plan – there are times when autotrace outputs the wrong execution plan for a SQL statement, but we will ignore that quirk for now.  I will add a NO_QUERY_TRANSFORMATION hint to a much shortened version of the original poster’s (OP’s) original query – this hint was an attempt to keep Oracle’s query optimizer from recognizing that there is a declared foreign key relationship between the two tables, and automatically converting the outer join into an inner join (oddly, the Oracle query optimizer did not alter the join to an inner join when the hint was removed).

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 200
SET PAGESIZE 1000

SELECT /*+ NO_QUERY_TRANSFORMATION */
  *
FROM
  WORK_ORDER,
  OPERATION
WHERE
  WORK_ORDER.TYPE(+) = OPERATION.WORKORDER_TYPE
  AND WORK_ORDER.BASE_ID(+) = OPERATION.WORKORDER_BASE_ID
  AND WORK_ORDER.LOT_ID(+) = OPERATION.WORKORDER_LOT_ID
  AND WORK_ORDER.SPLIT_ID(+) = OPERATION.WORKORDER_SPLIT_ID
  AND WORK_ORDER.SUB_ID(+) = OPERATION.WORKORDER_SUB_ID
  AND OPERATION.WORKORDER_TYPE = 'W'
  AND OPERATION.WORKORDER_BASE_ID = '13000';

Execution Plan
----------------------------------------------------------
Plan hash value: 4262719396

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN OUTER             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| OPERATION    |    31 |  6944 |    12   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021734 |    31 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| WORK_ORDER   |    11 |  2068 |     8   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0021989 |    11 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("WORK_ORDER"."TYPE"(+)="OPERATION"."WORKORDER_TYPE" AND
              "WORK_ORDER"."BASE_ID"(+)="OPERATION"."WORKORDER_BASE_ID" AND
              "WORK_ORDER"."LOT_ID"(+)="OPERATION"."WORKORDER_LOT_ID" AND
              "WORK_ORDER"."SPLIT_ID"(+)="OPERATION"."WORKORDER_SPLIT_ID" AND
              "WORK_ORDER"."SUB_ID"(+)="OPERATION"."WORKORDER_SUB_ID")
   3 - access("OPERATION"."WORKORDER_TYPE"='W' AND
              "OPERATION"."WORKORDER_BASE_ID"='13000')
   5 - access("WORK_ORDER"."TYPE"(+)='W' AND "WORK_ORDER"."BASE_ID"(+)='13000')

For now, just focus on the Predicate Information section of the generated execution plan, in particular the information found in the “1 – access” section – that section shows how the two tables are joined together during the hash join operation.  Much of the WHERE clause from the shortened version of the query is found in the “1 – access” section (and there is an indication of an automatically generated predicate in the “5-access” section).

Let’s take a look at an ANSI join version of the shortened SQL statement, with the WORK_ORDER table listed first in the join syntax, the OPERATION table listed second in the join syntax (as it was in the OP’s SQL statement), and with the columns in the ON clause flip-flopped on each side of the = signs (the opposite order in which the columns were listed in the OP’s original SQL statement):

SELECT
  *
FROM
  WORK_ORDER RIGHT OUTER JOIN OPERATION
ON (WORK_ORDER.TYPE = OPERATION.WORKORDER_TYPE
    AND WORK_ORDER.BASE_ID = OPERATION.WORKORDER_BASE_ID
    AND WORK_ORDER.LOT_ID = OPERATION.WORKORDER_LOT_ID
    AND WORK_ORDER.SPLIT_ID = OPERATION.WORKORDER_SPLIT_ID
    AND WORK_ORDER.SUB_ID = OPERATION.WORKORDER_SUB_ID)
WHERE
  OPERATION.WORKORDER_TYPE = 'W'
  AND OPERATION.WORKORDER_BASE_ID = '13000';

Execution Plan
----------------------------------------------------------
Plan hash value: 4262719396

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN OUTER             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| OPERATION    |    31 |  6944 |    12   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021734 |    31 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| WORK_ORDER   |    11 |  2068 |     8   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0021989 |    11 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("WORK_ORDER"."SUB_ID"(+)="OPERATION"."WORKORDER_SUB_ID" AND
              "WORK_ORDER"."SPLIT_ID"(+)="OPERATION"."WORKORDER_SPLIT_ID" AND
              "WORK_ORDER"."LOT_ID"(+)="OPERATION"."WORKORDER_LOT_ID" AND
              "WORK_ORDER"."BASE_ID"(+)="OPERATION"."WORKORDER_BASE_ID" AND
              "WORK_ORDER"."TYPE"(+)="OPERATION"."WORKORDER_TYPE")
   3 - access("OPERATION"."WORKORDER_TYPE"='W' AND
              "OPERATION"."WORKORDER_BASE_ID"='13000')
   5 - access("WORK_ORDER"."TYPE"(+)='W' AND "WORK_ORDER"."BASE_ID"(+)='13000')

In the above, just focus on the Predicate Information section for a moment, in particular the information found in the “1 – access” section – that section shows how the two tables are joined together during the hash join operation.  Notice the location of the (+) symbols, immediately after the columns from the WORK_ORDER table – that arrangement exactly matches my Oracle style join syntax.  You might have also noticed that the plan hash value is 4262719396 in both of the execution plans (that plan hash value consistency does NOT verify that the Predicate Information section is identical for the two execution plans, but it does verify that otherwise the execution plans are identical).

So, the Oracle query optimizer esentially transformed the ANSI style join version of the SQL statement into the same (optimizer transformed) SQL statement as I submitted using the classical Oracle outer join syntax.  Nice, however, that still leaves a question.  Does it matter in ANSI style joins which column is to the left of the equal sign in the ON clause?

Let’s try the modified ANSI SQL statement again, this time with the tables listed in the same order of the join clause as in the OP’s SQL statement, and the columns in the same order of the ON clause as in the OP’s SQL statement:

SELECT
  *
FROM
  WORK_ORDER RIGHT OUTER JOIN OPERATION
ON (OPERATION.WORKORDER_TYPE = WORK_ORDER.TYPE
    AND OPERATION.WORKORDER_BASE_ID = WORK_ORDER.BASE_ID
    AND OPERATION.WORKORDER_LOT_ID = WORK_ORDER.LOT_ID
    AND OPERATION.WORKORDER_SPLIT_ID = WORK_ORDER.SPLIT_ID
    AND OPERATION.WORKORDER_SUB_ID = WORK_ORDER.SUB_ID)
WHERE
  OPERATION.WORKORDER_TYPE = 'W'
  AND OPERATION.WORKORDER_BASE_ID = '13000';

Execution Plan
----------------------------------------------------------
Plan hash value: 4262719396

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN OUTER             |              |    31 | 12772 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| OPERATION    |    31 |  6944 |    12   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0021734 |    31 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| WORK_ORDER   |    11 |  2068 |     8   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0021989 |    11 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("OPERATION"."WORKORDER_SUB_ID"="WORK_ORDER"."SUB_ID"(+) AND
              "OPERATION"."WORKORDER_SPLIT_ID"="WORK_ORDER"."SPLIT_ID"(+) AND
              "OPERATION"."WORKORDER_LOT_ID"="WORK_ORDER"."LOT_ID"(+) AND
              "OPERATION"."WORKORDER_BASE_ID"="WORK_ORDER"."BASE_ID"(+) AND
              "OPERATION"."WORKORDER_TYPE"="WORK_ORDER"."TYPE"(+))
   3 - access("OPERATION"."WORKORDER_TYPE"='W' AND
              "OPERATION"."WORKORDER_BASE_ID"='13000')
   5 - access("WORK_ORDER"."TYPE"(+)='W' AND "WORK_ORDER"."BASE_ID"(+)='13000')

You might have noticed that once again, the plan hash value is 4262719396, just as it was for the previous two execution plans.  What does that plan hash value consistency indicate again?  In the above, focus on the Predicate Information section for a moment, in particular the “1 – access” section.  Notice again that the (+) symbols appear immediately after the columns from the WORK_ORDER table, even though the column order, in respect to the = sign, mirrors that of the submitted SQL statement (Oracle’s query optimizer is permitted to flip-flop the columns that appear on each side of the = sign, however that rearrangement did not happen during this demonstration).

So, what have we learned from the above information?  Are you feeling ANSI yet?





Name that Table’s Column

9 11 2012

November 9, 2012

I have not had a lot of time to browse through forums lately, but I noticed an interesting thread in the comp.databases.oracle.server Usenet group.  The OP in the thread was curious why quotes (double quotes) were required around a particular column when referenced in a SQL statement, because specifying quotes around the column name is apparently a little challenging in the PHP scripting language.

I thought about this issue a bit, wondering “how did that happen” and then thought about the benefits of this approach.  In theory, a sophisticated programmer could build a 16 column table using a single four character column name (with different letters capitalized).  Such an approach is sure to evoke a couple of choice four letter words!

I thought that I would throw together a quick example table:

CREATE TABLE T1(
  My_Data NUMBER,
  "My_Data" NUMBER,
  "my_data" NUMBER,
  "MY_DATA" NUMBER);

That almost worked:

SQL> CREATE TABLE T1(
  2    My_Data NUMBER,
  3    "My_Data" NUMBER,
  4    "my_data" NUMBER,
  5    "MY_DATA" NUMBER);
  "MY_DATA" NUMBER)
  *
ERROR at line 5:
ORA-00957: duplicate column name

Columns 1 and 4 have the same name.  Let’s fix that problem and try again:

CREATE TABLE T1(
  My_Data NUMBER,
  "My_Data" NUMBER,
  "my_data" NUMBER,
  "MY_DATa" NUMBER);

Table created.

That worked, now we have a table with four columns, where all of the column names are the same.  Notice that the first column name was not wrapped in quotes.

Let’s insert a row into the table:

INSERT INTO T1 VALUES(
  1,
  2,
  3,
  4);

1 row created.

Let’s see what happens when we query the table:

SELECT
  *
FROM
  T1
WHERE
  MY_DATA=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

Notice that a row was returned, even though the upper/lower case of the column in the WHERE clause did not match the first column name when the T1 table was created (when the table was created, for the first column, the column name was created as if were specified in all uppercase letters).

Let’s see if a row is returned if we try the following (notice that the attempt at querying with the fourth column based on matching that column’s capitalization in the WHERE clause failed to restrict the query results based on the contents of the fourth column in the table):

SELECT
  *
FROM
  T1
WHERE
  MY_DATa=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

Let’s try again, this time wrapping the column name found in the WHERE clause in quotes:

SELECT
  *
FROM
  T1
WHERE
  "MY_DATA"=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

A row was still returned (as expected).  So then, how do we determine which rows in the table have a value of 1 in the fourth column?  We need to place the column name found in the WHERE clause within quotes as follows:

SELECT
  *
FROM
  T1
WHERE
  "MY_DATa"=1;

no rows selected

Notice that this time no rows were returned (as intended, and expected).

I guess that the lesson here is to be careful when creating tables in Oracle Database.  Just because other database products may place column names within quotes as standard practice, that does not indicate that you, as the developer, should continue the practice when working with Oracle Databases.  Now get out there and create some 16 column tables with choice four letter words.  :-)





On the Topic of Programming 3

3 09 2012

September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E though H in the above report (could very well extend beyond column H, depending on the number of weeks selected), are the Monday through Sunday dates for the week.
  • The second row for each employee, in columns E through H in the above report show the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report.
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, shows the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation one week, but did another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers of on the 3rd, 6th, 9th, etc. rows for each employee, shows the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be causes where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with the WORKORDER_TYPE=’W’ and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE=’M’ and the BASE_ID equal to the part ID tht is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the database RDBMS to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to W and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the the WORKORDER_TYPE=’W’ (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ name, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to ‘W’ and the SUB_ID column in the WORK_ORDER table set to ‘0’ so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.   Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number weeks included in the report?

In the words of the TV show “Name that Tune“, I can name that tune in 1 note.  Because there are many to many joins between the various table, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression.
  • Minimize as best as possible the performance impact on the database server, the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause.
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client’s dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want that same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format, should I use the PIVOT function that was introduced with Oracle Database 11.1; or maybe I should use client-side control break logic that was taught in one of my university-level structured programming classes years ago?

That sketch of a report on a sheet of paper sure looked simple, now I need a control break (must be simple one key solution for that problem – check your keyboard :-) ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7)

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and used the PIVOT function, and/or additional analytic functions to calculate the employee’s weighted efficiency average per week and the simple averages of the wieghted averages – doing so would have eliminated some potentially complex client-side logic that needed to be programmed a single time, but counter point is that every time the report executed, it would require a more of the database server’s resources than were absolutely required.

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting, now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks worth of data rather than just four weeks, and only a single part ID or employee.  Too late, those changes were anticipated, and included in the original specification in the client-side programming, implemented with a simple change to the query and pre-planning for a variable number of weeks in the report.





On the Topic of Programming 2

2 09 2012

September 2, 2012

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Over the course of the last six months I developed several efficiency reports for the machining facility where I work, measuring the average time required to complete a customer’s part compared to the engineering standard “target” run time for producing the part.  Sounds simple, right?  If the engineering target is 30 widgets per hour, and only 20 widgets per hour are produced, then the average efficiency is 20/30 = 66.67%.  It does not take a person with a degree in mathematics to arrive at the conclusion that the widgets are not being produced as quickly as expected.  What is the scope of the measurement time frame for the average: a year, a month, a week, a day, one work shift in a day, a single machining center for a day, a single machining center and employee, or something else?

OK, now flip the engineering standard run times a bit so that the times are a little more consistent with the machining facility where I work.  It might take four hours, six hours, 10 hours, 20 hours, or possibly even 200 hours to complete a single operation at a machining center (one of several operations in the manufacturing process for the widget) to produce a single widget.  With the long run times of the operations, calculating the efficiency of a machining center or an employee for a specified period of time was a daunting task… a task that I was asked to solve roughly 12 years ago (with the help of Oracle Database 8.0.5).

The hours per widget run time (rather than widgets per hour) situation presented various problems for calculating employee efficiency, especially when an operation at a machining center did not complete before the end of an employee’s shift.  Consider a situation where an operation is expected to require eight hours to complete, but only when the tooling used at the machining center is new.  As such, the first shift employee installs new tooling in the machining center every morning before starting a new widget.  The first shift employee spends about 30 minutes finishing up the widget that was started the previous night, changes the tooling, and then starts the machining on the next widget.  So, the first shift employee reports that one widget (started the previous night) completed after 0.5 hours and the second widget completed in 7.5 hours.  The first shift employee’s efficiency, considering that two widgets were completed during his shift, is easily calculated as (8 + 8) / (0.5 + 7.5) * 100 = 200%.  The second shift employee’s efficiency is a consistent 0% because the operation for the widget never completes during his shift because the sharpness of the tooling deteriorates through the day (thus causing the machining operation to take longer).  This obviously leads to odd questions: 1) Why is the second shift employee more efficient when the first shift employee is on vacation (equivalent to the question in Oracle performance tuning: why does my report run faster when it rains Monday mornings?)?  Why is the second shift employee more efficient when working a nine or 10 hour shift, rather than an eight hour shift?  The fun questions that one is able to answer when familiar with the data…

When an employee starts working on a machining operation, a labor ticket transaction is created detailing the fact that the work for the machining operation is in-process.  When the machining operation completes (or when it is time to leave for the day), the employee closes the labor ticket transaction and reports the number of widgets that completed.  These labor ticket transactions are conveniently stored in a table named LABOR_TICKET.  Back in 2006 or 2007, I decided to take another attempt at solving the efficiency problem – after all, I now had access to analytic functions in Oracle Database (analytic functions did not exist in Oracle Database 8.0.5).  I expected that by extending the time range beyond a single employee and/or shift, I might be able to extract useful efficiency information from the database.  I put together a rather long SQL statement that looked something like this:

SELECT DISTINCT
  1 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-30) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-30) AND TRUNC(SYSDATE)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
UNION ALL
SELECT DISTINCT
  2 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-60) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-60) AND TRUNC(SYSDATE-31)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
...

The above SQL statement grew in length substantially based on the number of time periods that I selected to compare.  This data was returned to a program for display purposes, so reformatting the output was always a possibility.  Consider a situation where the programmer is unfamiliar with Oracle specific SQL and is asked to generate the same output in a program that he is constructing.  For a program seeking to measure efficiency, the programmer’s solution would likely involve repeated executions of SQL statements to retrieve the bits and pieces of information that need to be presented – this row by row processing will likely be very slow over a high latency WAN (VPN) connection (but certainly faster than never receiving the information, so the programmer will still be a hero), and could very well hinder the performance of the Oracle Database server.

The above solution is good, but problems such as the progressive wearing of the tooling cutter, as mentioned earlier, could still lead to unexpected differences in efficiency of different employees that are working as fast as is possible.  Fast forward a few years.  The right questions are asked from people with a more complete understanding of efficiency measurements – examine the efficiency calculation problem from an entirely different angle.  Not long ago I was handed an example of a simple efficiency report, and asked to reproduce that report with live data, output in a Microsoft Excel spreadsheet.  The example was little more than a simple sketch, so for this blog article I put together a color-coded example of the report format in Microsoft Excel:

A little explanation is required.  The entirely different angle for calculating employee efficiency in the execution of widget operations involves not looking at the date of the labor ticket transaction, or even the number of widgets produced by a particular employee in a particular time frame.  Instead, the date of the manufacturing batch’s (lot in this particular ERP system) completion date, the quantity produced in the batch, and the total hours to execute a single machining operation for the lot become the criteria when comparing against the engineering standards to determine efficiency.  The manufacturing batch’s completion date is used to divide the batches into specific timeframes (in this case weeks: Monday through Sunday).  All employees working on a particular operation, where the batch completion date is in a certain week, will receive the same efficiency rating for that particular operation (EFF_WEEK1, EFF_WEEK2, EFF_WEEK3, etc.) as all other employees working on the same operation with the same batch completion date range.  It is not uncommon for employees to generate labor transactions for multiple operations for production of the same part, as well as operations for different parts that have batch completion dates in the same week, so the employee’s efficiency rating (W_AVG_EFF1, W_AVG_EFF2, W_AVG_EFF3, etc.) weights the individual part efficiencies based on the percentage of machining time the employee spent in a given operation compared to all of the other operations the employee worked.  The employee’s efficiency rating (Avg W_VE_EFF) is the simple average of the employee’s weekly efficiency ratings.

In the above example report, the blue colored text is a static label.  The red colored text is a calculated static label that shows the Monday through Sunday date range for the week.  The black colored text is a simple “rip and read” from the database – no calculation is required.  The purple colored text indicates a calculated value.  The numbers at the right will not appear on the completed report, but are included to indicate which EFF_WEEKn and AVG_WEEKn values will be identical if the employee worked on an operation whose batch closed in the given week (notice that there are blanks in columns of the sample report, indicating that the employee did not work on that operation in the batch completion week).

The programming challenge is to determine the number of SQL statements that would be needed to retrieve the information from the database, and the number of times those SQL statements would need to be executed.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number weeks included in the report?  Before you think about the problem, we should probably investigate the data organization in the various tables.  The picture below shows the tables (and the specific columns) required for the report, showing the tables for the engineering standards in green boxes and the tables needed for the labor transaction analysis in black boxes.  The lines show the association of the data in the different tables.

(The above image was adjusted 7 hours after this blog article was initially posted.  The linking between the green OPERATION table and the green WORK_ORDER table was corrected, and the red colored text was added to indicate whether the table was to be used for retrieving the engineering master standards (WORKORDER_TYPE=’M’) or information from the production work order batches (WORKORDER_TYPE=’W’ or TYPE=’W’).)

It might also be interesting to think about what processing will be performed on the data returned by the SQL statements – that could be a pivotal design decision for the SQL statements.  Curve balls are to be expected – what is currently an examination of four weeks’ worth of efficiency numbers today might be an examination of 52 weeks tomorrow.  Oh, here is a thought, what if instead on the next day it is important to focus on the various part efficiencies and the employees who worked on the operations, rather than focusing on the employees and the various parts that the employees worked on?

Part 3 of this series will attempt to answer some of the above questions.

Late addition, sample of a completed report in Excel:





On the Topic of Programming 1

26 08 2012

August 26, 2012

(Forward to the Next Post in the Series)

As those of you who have read this blog’s About page probably know, my day to day job responsibilities involve a lot of activities that are not specific to Oracle Database performance tuning, or even remotely Oracle DBA type activities.  Those extra acttivites are part of what keeps the job fresh and interesting, whether I am swapping in a new roll of labels into a Zebra label printer (that was a three minute fun task this morning), troubleshooting phone system problems (from a fork lift truck “disabling” a phone, to programming the PBX with a clever “message delivery system” to waste the time of persistent telemarketers), swapping out the power supply in a computer very early in the morning, or engaged in a marathon of application programming into the evening and weekend hours.

One of the recent programming projects involves the continuing effort of decreasing the number of data islands, allowing the data previously contained in the islands to be accessed and cross-referenced with data generated by other applications.  One of the data island generators that I have been working to phase out is Superbase 3.0, a database platform that started life on a Commodore 64 in 1983.  Superbase 3.0 is a 16 bit client-side database, so it does not play well with the 64 bit Windows 7 that ships with most new desktop computers (16 bit applications will not run natively on 64 bit Windows, instead the 16 bit applications must be run inside a virtual machine such as Windows XP Mode, or run remotely using remote access software such as a VNC client).

The data contained in the old Superbase databases is critical to the company’s processes, so that data must be salvaged – meaning that the data must be transformed and imported into an Oracle database.  Unlike what a developer would likely create in a relational database, often with multiple tables used to store one “record” in the database, the long departed designer of the Superbase databases used a single row in a single database table to store one “record” in the database.  That organization reminds me a lot of the Microsoft Works package’s database from the early 1990s, with its fancy data entry forms which allowed users to escape the dull spreadsheet-like data entry screen.  Microsoft Excel from the early/mid 1990s could magically transform a dull spreadsheet data entry screen into a simple data entry form, in the process transforming the expensive Microsoft Excel into essentially a cheap database program.  It is a bit more of a challenge to locate the automatic data entry form creator in Excel 2010 than I recall it being in the early/mid 1990s version of Excel, but I suppose that helps further reduce the number of potential data islands:

So, what does the above discussion of Microsoft Excel have to do with anything related to Oracle Database?  The data contained in the Superbase databases must be transformed and inserted into an Oracle database.  It is good news that Superbase is able to export data to Microsoft Excel format.  The bad news is that the exported format is designed to work with Microsoft Excel 2.0 – a very old version of Microsoft Excel that seems to date back to 1987!  Time for a lot of manual data entry if that data must end up in an Oracle Database 11.2.0.x database… unless…

Microsoft Excel 2003 (version 12.0 if I remember correctly) is able to open Excel 2.0 files… success, even if the success is minor.  Now, how to go about tranferring the data from Excel into Oracle Database?  I suppose that I could have created a macro in Microsoft Excel to insert the data into Oracle Database, but at the time I was not interested in writing a macro that accomplished the task “the right way” using bind variables.  And just look at that data – some of the date values were imported as very small (roughly -65000) numbers, in some cases nearly 20 different spellings for the same customer name, and alpha-numeric text in columns that should be numeric.

So, how did I import the Superbase data that was now in Excel 2003 into the Oracle Database 11.2.0.x database without writing an Excel macro?  The particular computer with Excel 2003 that I was using also had a copy of Access 2003 installed.  Access 2003 is able to create a table “link” to an Excel 2003 spreadsheet’s worksheet, and handle that worksheet essentially the same as if it were a database table.  Now the data is “in” Microsoft Access 2003, but still not in an Oracle database.  Previous experience with this process pays off – before bringing the data into Microsoft Access, type each of the Oracle Database destination table’s column names into the first row of the Microsoft Excel spreadsheet, above the appropriate column’s data.  Importing the data into the Oracle database then becomes a simple four step process (assuming that no other data transformation is necessary)

  1. Link to the Excel spreadsheet’s worksheet and the destination table in the Oracle database.
  2. Create an Access view (stored Query) that selects all of the columns from the Excel worksheet that must be inserted into the Oracle database.
  3. Convert the view (stored Query) type to an Append type and select the linked Oracle Database table as the destination – Access will automatically find the correct destination column in the Oracle table, if the source column name (from the first row in the Excel worksheet) matches the destination column name.
  4. Execute the append type view.

A simple transformation of the data from 1994 database technology to 1987, 2003, and then on to 2011 in Oracle Database – and without writing a single line of code.  Remember that problem that I mentioned about alpha-numeric text in columns that should be numeric, such as “10&20″ in a column named OPERATION_NUMBER (or OPERATION_SEQ_NO) – it turns out that that bit of inconsistent data cannot just be thrown out (thanks Microsoft Access 2003).  To fix that problem, I would need to add another column to the Oracle Database table, and then have Microsoft Access update that table using the Microsoft Excel spreadsheet data (fixing the “10&20″, “10 & 20″, “10  &20″, “10 AND 20″, “10,20” and “10, 20″ variants into a standard format.  The SQL dialect in Microsoft Access is a bit odd at times, and I could not remember if the odd syntax applies to UPDATE statements also.  As an example of the odd syntax, the simple CREATE TABLEAS SELECT:

CREATE TABLE
  T2 AS
SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Must be written like the following in Microsoft Access:

SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
INTO
  T2
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Since I am not 100% confident in my SQL authoring skills in Microsoft Access, how do I move the data from the Excel spreadsheet into the new column of the Oracle Database table… and without writing a single line of programming code?  I simply created a temporary table (not a true temporary table, because the table data must be visible to more than one session) that contained the primary key column and a second column for the non-numeric numeric data.  Once the data was in the temporary Oracle table (using the simple four step process outlined above), I simply executed an UPDATE statement similar to this:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=(
    SELECT
      NON_NUMERIC_NUMERIC
    FROM
      T1_TEMP TT
    WHERE
      T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHERE
  T1.PRIMARY_KEY IN (
    SELECT
      PRIMARY_KEY
    FROM
      T1_TEMP);

With the data successfully transferred into an Oracle database table, the programming continues.  That brings me to the next article in this series, the internal conflicts of the “best” way to accomplish the programming task.





SQL Challenges

14 06 2012

June 14, 2012

Dominic Delmolino put together a very interesting challenge.  The challenge is to produce something called a Pascal matrix using Oracle Database… more specifically, just SQL.  I had a vague recollection of Pascal matrixes when I read Dominic’s challenge.  Basically, the goal is to create a matrix similar to the following:

The rule for generating the matrix is simply that a cell’s value is the sum of the value in the cell that is immediately to the left plus the value in the cell that is immediately above.  Sounds easy, right?

If we were just working in Microsoft Excel (or some other spreadsheet package), we could do something like this to quickly create the matrix:

Dominic’s challenge probably would not be much of a challenge if we could just type in formulas like the above into a SQL statement.  Give his challenge a try to see if you are able to derive a unique solution to the problem.  I probably spent a couple of minutes (maybe 60 seconds with the help of copy and paste) creating the above example using Microsoft Excel, but spent a couple of hours trying to produce a solution that worked using SQL.

——

Part 2 of the challenge.

Take a look at the bullet point items in this blog article about NULL values in table’s columns, in particular the first three large bullet point items.  Do you agree or disagree with the statements, and why?





Reproducing a Canned Report using a Single SQL Statement

11 04 2012

April 11, 2012

I recently received an interesting request for assistance from an ERP email mailing list.  The author of the email wanted to reproduce a canned report found in the ERP package so that the information could be published on a Microsoft Sharepoint system.  The author of the email is using SQL Server for the database backend, but we will ignore that bit of complexity for the moment.

The canned report that ships with the ERP system looks similar to the following:

Basically, the end user enters a start and an end date for the report, the ERP system checks the shop resource availability calendar for the days between the start and end dates, and then calculates the Capacity value from that calendar (there is a default shop resource calendar, and an override calendar for some of the shop resources).  The Act Hrs column in the report is simply the sum of hours calculated from the employee labor transactions that are recorded in real-time on the production floor – that column value is easy to calculate, but is subject to minor rounding errors and date calculation errors when an employee labor transaction starts before midnight and ends after midnight on either the start date or end date specified by the end user running the report.  The Est Hrs column in the report is a little more difficult to calculate, requiring a bit of calculation that determines what item the employee actually claimed to have produced, and in what quantity – what if the employee claimed to have produced 6 of 10 widgets in the specified time period, might he have completed 99% of the seventh widget but not yet reported that widget complete?  The EFF column is simply the Est Hours column divided by the Act Hrs column, with the result multiplied by 100.  The Act Util column is also easy to calculate: the Act Hrs column value divided by the Capacity column value, with the result multiplied by 100.  The Est Util column is simply the Est Hrs column value divided by the Capacity column value, with the result multiplied by 100.

So, where do we start in trying to reproduce this particular report?  How about enabling a 10046 trace for the session that creates the report.  Analyzing the resulting trace file might provide some insight into how the report is built.  Below is a summary of the SQL statements found in the trace file:

Query 1: (this query that retrieves the estimated hours, shop resource description, and various other information that does not appear on the report – this appears to be the query that provokes the ERP system to issue the queries that follow)

SELECT
  L.RESOURCE_ID,
  L.WORKORDER_TYPE,
  L.WORKORDER_BASE_ID, 
  L.WORKORDER_LOT_ID,
  L.WORKORDER_SPLIT_ID,
  L.WORKORDER_SUB_ID, 
  L.OPERATION_SEQ_NO,
  L.TRANSACTION_DATE,
  L.HOURS_WORKED,
  L.GOOD_QTY, 
  L.TYPE,
  L.EMPLOYEE_ID, 
  OP.SETUP_HRS,
  OP.RUN,
  OP.RUN_TYPE,
  OP.LOAD_SIZE_QTY,
  OP.CALC_END_QTY, 
  R.DESCRIPTION, 
  RTRIM(E.LAST_NAME),
  RTRIM(E.FIRST_NAME),
  E.MIDDLE_INITIAL, 
  W.PART_ID,
  P.DESCRIPTION,
  L.TRANSACTION_ID 
FROM
  LABOR_TICKET L,
  OPERATION OP,
  SHOP_RESOURCE R,
  EMPLOYEE E,
  WORK_ORDER W,
  PART P 
WHERE
  L.RESOURCE_ID = R.ID 
  AND L.EMPLOYEE_ID = E.ID 
  AND L.WORKORDER_TYPE = OP.WORKORDER_TYPE 
  AND L.WORKORDER_BASE_ID = OP.WORKORDER_BASE_ID 
  AND L.WORKORDER_LOT_ID = OP.WORKORDER_LOT_ID 
  AND L.WORKORDER_SPLIT_ID = OP.WORKORDER_SPLIT_ID 
  AND L.WORKORDER_SUB_ID = OP.WORKORDER_SUB_ID 
  AND L.OPERATION_SEQ_NO = OP.SEQUENCE_NO  
  AND W.TYPE = OP.WORKORDER_TYPE 
  AND W.BASE_ID = OP.WORKORDER_BASE_ID 
  AND W.LOT_ID = OP.WORKORDER_LOT_ID 
  AND W.SPLIT_ID = OP.WORKORDER_SPLIT_ID 
  AND W.SUB_ID = OP.WORKORDER_SUB_ID  
  AND W.PART_ID = P.ID (+)  
  AND TRUNC(L.TRANSACTION_DATE) BETWEEN :1  AND :2
ORDER BY
  L.RESOURCE_ID,
  L.WORKORDER_TYPE,
  L.WORKORDER_BASE_ID, 
  L.WORKORDER_LOT_ID,
  L.WORKORDER_SPLIT_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO;

Query 2: (this query returns the quantity of a specific resource that is available for each of three shifts… for instance, the number of employees with a specific skill)

SELECT
  SHIFT_1_CAPACITY,
  SHIFT_2_CAPACITY,
  SHIFT_3_CAPACITY 
FROM
  SHOP_RESOURCE 
WHERE
  ID = :1;

Query 3: (this query returns the number of hours of availability for a specific resource, for each day of the week; the default resource calendar has a NULL value for the RESOURCE_ID column, so the specified sort order will return the default resource calendar rows before the over-ride calendar for a specific shop resource)

SELECT 
  DAY_OF_WEEK,
  START_OF_DAY,
  SHIFT_1,
  SHIFT_2,
  SHIFT_3 
FROM
  CALENDAR_WEEK 
WHERE
  (SCHEDULE_ID IS NULL OR SCHEDULE_ID = 'STANDARD') 
  AND (RESOURCE_ID IS NULL OR RESOURCE_ID = :1 ) 
ORDER BY
  SCHEDULE_ID,
  RESOURCE_ID DESC,
  DAY_OF_WEEK;

Query 4: (this query returns the exceptions, planned days of unavailabilty, for the default resource calendar and the shop resource specific over-ride calendar.  I just noticed a logic error in this SQL statement – what about those scheduled exceptions that start before the user specified start date and/or those scheduled exceptions that end after the user specified end date, where some of those dates fall into the user specified date range?)

SELECT  
  START_DATE,
  END_DATE,
  START_OF_DAY,
  SHIFT_1,
  SHIFT_2,
  SHIFT_3, 
  SHIFT_1_CAPACITY,
  SHIFT_2_CAPACITY,
  SHIFT_3_CAPACITY 
FROM
  CALENDAR_CHANGE 
WHERE
  (SCHEDULE_ID IS NULL OR SCHEDULE_ID = 'STANDARD') 
  AND (RESOURCE_ID IS NULL OR RESOURCE_ID = :1 ) 
  AND START_DATE >= :2 AND END_DATE <= :3
ORDER BY
  SCHEDULE_ID,
  RESOURCE_ID,
  START_DATE;

Query 5: (this query returns the number of operation setup hours for a specific operation, time spent preparing to produce parts, reported on the shop floor – it would seem that Query 1 could be modified to return this information)

SELECT
  SUM(HOURS_WORKED) 
FROM
  LABOR_TICKET 
WHERE
  WORKORDER_TYPE = :1     
  AND WORKORDER_BASE_ID = :2
  AND WORKORDER_LOT_ID = :3
  AND WORKORDER_SPLIT_ID = :4
  AND WORKORDER_SUB_ID = :5
  AND OPERATION_SEQ_NO = :6
  AND TYPE = 'S';

Now that we have a general idea of what steps the ERP system is performing to reproduce the report, where do we start?  We start by telling the OP (the person with the SQL Server backend) that with an Oracle Database backend, we could produce this report using a single SQL statement (without any stored procedures) – that approach would certainly eliminate a lot of back and forth communication between the client computer and the database server, which is an important consideration if a high latency network connects the two devices.  For the sake of simplicity, I will set the above query 4 aside for now.  Since there is an index on the LABOR_TICKET.TRANSACTION_DATE column, the ERP system loses the benefit of that index by applying the TRUNC function to that column’s values – so we will need to fix that problem.

First, we need to know how many hours of capacity are available for each resource.  If a specific over-ride schedule (shop calendar) is not available for a shop resource, then the standard schedule is used.  To retrieve the schedules for the shop resources that do not have a defined unique schedule, we can create a Cartesian join between two tables with a NOT EXISTS clause to exclude those shop resources with an over-ride schedule:

SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'DEFAULT' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID IS NULL
  AND NOT EXISTS (
    SELECT
      C.RESOURCE_ID
    FROM
      CALENDAR_WEEK C
    WHERE
      SR.ID=C.RESOURCE_ID);

We also need to retrieve the shop resources with defined over-ride schedules:

SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'RESOURCE' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID=SR.ID;

Finally, we need to UNION ALL the above two queries:

SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'DEFAULT' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID IS NULL
  AND NOT EXISTS (
    SELECT
      C.RESOURCE_ID
    FROM
      CALENDAR_WEEK C
    WHERE
      SR.ID=C.RESOURCE_ID)
UNION ALL
SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'RESOURCE' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID=SR.ID
ORDER BY
  RESOURCE_ID,
  DAY_OF_WEEK;

The ERP system uses the following code numbers to represent each day of the week in the output produced by the above SQL statement:

0: Saturday
1: Sunday
2: Monday
3: Tuesday
4: Wednesday
5: Thursday
6: Friday

On Oracle Database, we are able to use the TO_DATE function to almost reproduce the above translation table, although NLS settings that are in effect could produce results that require further adjustment.  The following formula returns 1 for a Sunday and 6 for a Friday, but 7 for Saturday:

TO_DATE(TRANSACTION_DATE, 'D')

We are able to convert the 7 value for Saturdays into a 0 by using the MOD function to return the remainder value after dividing by 7:

MOD(TO_DATE(TRANSACTION_DATE, 'D'), 7)

Assume that we want to generate a report for the dates between March 26, 2012 and April 14, 2012.  We need a way to determine the number of Mondays, Tuesdays, Wednesdays, Thursdays, Fridays, Saturdays, and Sundays between the two dates (including the end-point dates).  There are a couple of approaches to this problem:

  1. Use the CONNECT BY LEVEL syntax to generate a row for each date between the start and end dates.
  2. Use a pre-created, statistically defined table that simply lists all of the dates between an arbitrary start and end date.
  3. Use a stored procedure to calculate the number of each day of the week between the start and end dates.
  4. Mathematically calculate within the SQL statement the number of each day of the week.

I will use the first of the above approaches… I suppose this approach would be a bit of a challenge with a SQL Server backend.

SELECT
  TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1) MY_DATE,
  TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D') AS D,
  MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK
FROM
  DUAL
CONNECT BY
  LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1;

MY_DATE   D DAY_OF_WEEK
--------- - -----------
26-MAR-12 2           2
27-MAR-12 3           3
28-MAR-12 4           4
29-MAR-12 5           5
30-MAR-12 6           6
31-MAR-12 7           0
01-APR-12 1           1
02-APR-12 2           2
03-APR-12 3           3
04-APR-12 4           4
05-APR-12 5           5
06-APR-12 6           6
07-APR-12 7           0
08-APR-12 1           1
09-APR-12 2           2
10-APR-12 3           3
11-APR-12 4           4
12-APR-12 5           5
13-APR-12 6           6
14-APR-12 7           0

If we were to use the above in a production environment, we certainly would convert the literal (constant) date values to bind variables.  The DAY_OF_WEEK column values (assuming that the NLS settings result in Sunday being the first day of the week) are the important values in the above output.  If we also implemented query 4 used by the ERP system, then the MY_DATE column values would also need to be considered.  How many Mondays, Tuesdays, Wednesdays, etc. are between the two dates?:

SELECT
  MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
  COUNT(*) NUM_DAYS
FROM
  DUAL
CONNECT BY
  LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
GROUP BY
  MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7);

DAY_OF_WEEK   NUM_DAYS
----------- ----------
          0          3
          1          2
          2          3
          3          3
          4          3
          5          3
          6          3

As shown above, just 2 Sundays, but 3 of every other day of the week.

We now have two SQL statements (the one that indicates the available capacity per day of week, and the other that indicates the number of each day of the week in the user specified time interval) that need to be joined together:

SELECT
  C.RESOURCE_ID,
  C.DAY_OF_WEEK,
  (C.HOURS_1 + C.HOURS_2 + C.HOURS_3) * D.NUM_DAYS
FROM
  (SELECT
    SR.ID AS RESOURCE_ID, 
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'DEFAULT' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID IS NULL
    AND NOT EXISTS (
      SELECT
        C.RESOURCE_ID
      FROM
        CALENDAR_WEEK C
      WHERE
        SR.ID=C.RESOURCE_ID)
  UNION ALL
  SELECT
    SR.ID AS RESOURCE_ID, 
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'RESOURCE' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID=SR.ID) C,
  (SELECT
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
    COUNT(*) NUM_DAYS
  FROM
    DUAL
  CONNECT BY
    LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
  GROUP BY
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) ) D
WHERE
  C.DAY_OF_WEEK=D.DAY_OF_WEEK;

Above is a somewhat large and complicated SQL statement, if you simply jumped into the middle of this article.

We still need to GROUP the result by the RESOURCE_ID column and add in the DESCRIPTION that is associated with each RESOURCE_ID:

SELECT
  C.RESOURCE_ID,
  C.DESCRIPTION,
  SUM((C.HOURS_1 + C.HOURS_2 + C.HOURS_3) * D.NUM_DAYS) CAPACITY
FROM
  (SELECT
    SR.ID AS RESOURCE_ID, 
    SR.DESCRIPTION,
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'DEFAULT' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID IS NULL
    AND NOT EXISTS (
      SELECT
        C.RESOURCE_ID
      FROM
        CALENDAR_WEEK C
      WHERE
        SR.ID=C.RESOURCE_ID)
  UNION ALL
  SELECT
    SR.ID AS RESOURCE_ID,
    SR.DESCRIPTION,
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'RESOURCE' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID=SR.ID) C,
  (SELECT
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
    COUNT(*) NUM_DAYS
  FROM
    DUAL
  CONNECT BY
    LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
  GROUP BY
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) ) D
WHERE
  C.DAY_OF_WEEK=D.DAY_OF_WEEK
GROUP BY
  C.RESOURCE_ID,
  C.DESCRIPTION
ORDER BY
  RESOURCE_ID;

The above SQL statement results in the following output – we now have the first column in the ERP system’s canned report:

At this point, my claim of being able to reproduce this canned report in a single SQL statement might seem suspect – all of the above work for just a single column in the report.  It’s easy, just build the report in separate units, and then join the separate units.

As I mentioned earlier, calculating the estimated average hours and average actual hours could be a little difficult.  We might start with something like this:

SELECT
  LT.RESOURCE_ID,
  O.RUN_HRS AS EST_HOURS,
  O.CALC_END_QTY AS OPERATION_QTY,
  ROUND(O.RUN_HRS/O.CALC_END_QTY,2) AS AVG_EST_HOURS,
  SUM(LT.HOURS_WORKED) AS ACT_HOURS,
  SUM(LT.GOOD_QTY) AS COMPLETED_QTY,
  ROUND(DECODE(SUM(LT.GOOD_QTY),0,SUM(LT.HOURS_WORKED),SUM(LT.HOURS_WORKED)/SUM(LT.GOOD_QTY)),2) AVG_ACT_HOURS
FROM
  LABOR_TICKET LT,
  OPERATION O
WHERE
  LT.WORKORDER_TYPE='W'
  AND LT.TYPE='R'
  AND LT.WORKORDER_TYPE=O.WORKORDER_TYPE
  AND LT.WORKORDER_BASE_ID=O.WORKORDER_BASE_ID
  AND LT.WORKORDER_LOT_ID=O.WORKORDER_LOT_ID
  AND LT.WORKORDER_SPLIT_ID=O.WORKORDER_SPLIT_ID
  AND LT.WORKORDER_SUB_ID=O.WORKORDER_SUB_ID
  AND LT.OPERATION_SEQ_NO=O.SEQUENCE_NO
  AND LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
GROUP BY
  LT.RESOURCE_ID,
  O.RUN_HRS,
  O.CALC_END_QTY
ORDER BY
  LT.RESOURCE_ID;

The output of the above SQL statement looks similar to the following:

Nice, if we were just interested in summarizing by the individual operations.  However, the above will not work as a starting point for the next two columns in the report because we need to summarized by the RESOURCE_ID column – if we group on that column, we will throw off the accuracy of the efficiency calculations in the report.  So, we need to take a step back and try again.

First, we will summarize the data from the LABOR_TICKET table by itself – notice that I am not using the TRUNC function around the date column, and I am using the SHIFT_DATE column rather than the TRANSACTION_DATE column (the SHIFT_DATE column in this table is better able to work around the issue with labor transactions that cross midnight, and the time portion of the date values are already truncated to midnight):

SELECT
  LT.WORKORDER_TYPE,
  LT.WORKORDER_BASE_ID,
  LT.WORKORDER_LOT_ID,
  LT.WORKORDER_SPLIT_ID,
  LT.WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) AS ACT_HOURS,
  SUM(LT.GOOD_QTY) AS COMPLETED_QTY
FROM
  LABOR_TICKET LT
WHERE
  LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
  AND LT.TYPE='R'
  AND LT.WORKORDER_TYPE='W'
GROUP BY
  LT.WORKORDER_TYPE,
  LT.WORKORDER_BASE_ID,
  LT.WORKORDER_LOT_ID,
  LT.WORKORDER_SPLIT_ID,
  LT.WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  LT.RESOURCE_ID
HAVING
  SUM(LT.HOURS_WORKED)>0;

Now that we have pre-summarized the results from the LABOR_TICKET table, we can slide the above SQL statement into an inline view and join that inline view with the OPERATION table:

SELECT
  LT.RESOURCE_ID,
  SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2)) AS ENG_HOURS,
  SUM(LT.ACT_HOURS) AS ACT_HOURS,
  ROUND(SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2))/SUM(LT.ACT_HOURS)*100,2) AS EFF
FROM
  (SELECT
    LT.WORKORDER_TYPE,
    LT.WORKORDER_BASE_ID,
    LT.WORKORDER_LOT_ID,
    LT.WORKORDER_SPLIT_ID,
    LT.WORKORDER_SUB_ID,
    LT.OPERATION_SEQ_NO,
    LT.RESOURCE_ID,
    SUM(LT.HOURS_WORKED) AS ACT_HOURS,
    SUM(LT.GOOD_QTY) AS COMPLETED_QTY
  FROM
    LABOR_TICKET LT
  WHERE
    LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
    AND LT.TYPE='R'
    AND LT.WORKORDER_TYPE='W'
  GROUP BY
    LT.WORKORDER_TYPE,
    LT.WORKORDER_BASE_ID,
    LT.WORKORDER_LOT_ID,
    LT.WORKORDER_SPLIT_ID,
    LT.WORKORDER_SUB_ID,
    LT.OPERATION_SEQ_NO,
    LT.RESOURCE_ID
  HAVING
    SUM(LT.HOURS_WORKED)>0) LT,
  OPERATION O
WHERE
  LT.WORKORDER_TYPE=O.WORKORDER_TYPE
  AND LT.WORKORDER_BASE_ID=O.WORKORDER_BASE_ID
  AND LT.WORKORDER_LOT_ID=O.WORKORDER_LOT_ID
  AND LT.WORKORDER_SPLIT_ID=O.WORKORDER_SPLIT_ID
  AND LT.WORKORDER_SUB_ID=O.WORKORDER_SUB_ID
  AND LT.OPERATION_SEQ_NO=O.SEQUENCE_NO
GROUP BY
  LT.RESOURCE_ID
ORDER BY
  LT.RESOURCE_ID;

The result of the above SQL statement is much more useful than the original attempt.  Note that the above SQL statement makes certain assumptions about the estimated (ENG_HOURS) that the ERP system does not make, so the resulting numbers could be slightly different.

We now have three more columns of our report.

The final step is to put the complete SQL statement together, joining the previously created portion of the SQL statement that determined the available capacity with the just created SQL statement that determined the actual and estimated utilization.  The resulting SQL statement appears as follows:

SELECT
  C.RESOURCE_ID,
  C.DESCRIPTION,
  C.CAPACITY,
  E.ACT_HOURS,
  E.ENG_HOURS AS EST_HOURS,
  E.EFF,
  DECODE(C.CAPACITY,0,0,ROUND(E.ACT_HOURS/C.CAPACITY*100,2)) AS ACT_UTIL,
  DECODE(C.CAPACITY,0,0,ROUND(E.ENG_HOURS/C.CAPACITY*100,2)) AS EST_UTIL
FROM
  (SELECT
    C.RESOURCE_ID,
    C.DESCRIPTION,
    SUM((C.HOURS_1 + C.HOURS_2 + C.HOURS_3) * D.NUM_DAYS) CAPACITY
  FROM
    (SELECT
      SR.ID AS RESOURCE_ID, 
      SR.DESCRIPTION,
      CW.DAY_OF_WEEK,
      CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
      CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
      CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
      'DEFAULT' CALENDAR_TYPE
    FROM
      CALENDAR_WEEK CW,
      SHOP_RESOURCE SR
    WHERE
      (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
      AND CW.RESOURCE_ID IS NULL
      AND NOT EXISTS (
        SELECT
          C.RESOURCE_ID
        FROM
          CALENDAR_WEEK C
        WHERE
          SR.ID=C.RESOURCE_ID)
    UNION ALL
    SELECT
      SR.ID AS RESOURCE_ID,
      SR.DESCRIPTION,
      CW.DAY_OF_WEEK,
      CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
      CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
      CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
      'RESOURCE' CALENDAR_TYPE
    FROM
      CALENDAR_WEEK CW,
      SHOP_RESOURCE SR
    WHERE
      (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
      AND CW.RESOURCE_ID=SR.ID) C,
    (SELECT
      MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
      COUNT(*) NUM_DAYS
    FROM
      DUAL
    CONNECT BY
      LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
    GROUP BY
      MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) ) D
  WHERE
    C.DAY_OF_WEEK=D.DAY_OF_WEEK
  GROUP BY
    C.RESOURCE_ID,
    C.DESCRIPTION) C,
  (SELECT
    LT.RESOURCE_ID,
    SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2)) AS ENG_HOURS,
    SUM(LT.ACT_HOURS) AS ACT_HOURS,
    ROUND(SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2))/SUM(LT.ACT_HOURS)*100,2) AS EFF
  FROM
    (SELECT
      LT.WORKORDER_TYPE,
      LT.WORKORDER_BASE_ID,
      LT.WORKORDER_LOT_ID,
      LT.WORKORDER_SPLIT_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.RESOURCE_ID,
      SUM(LT.HOURS_WORKED) AS ACT_HOURS,
      SUM(LT.GOOD_QTY) AS COMPLETED_QTY
    FROM
      LABOR_TICKET LT
    WHERE
      LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
      AND LT.TYPE='R'
      AND LT.WORKORDER_TYPE='W'
    GROUP BY
      LT.WORKORDER_TYPE,
      LT.WORKORDER_BASE_ID,
      LT.WORKORDER_LOT_ID,
      LT.WORKORDER_SPLIT_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.RESOURCE_ID
    HAVING
      SUM(LT.HOURS_WORKED)>0) LT,
    OPERATION O
  WHERE
    LT.WORKORDER_TYPE=O.WORKORDER_TYPE
    AND LT.WORKORDER_BASE_ID=O.WORKORDER_BASE_ID
    AND LT.WORKORDER_LOT_ID=O.WORKORDER_LOT_ID
    AND LT.WORKORDER_SPLIT_ID=O.WORKORDER_SPLIT_ID
    AND LT.WORKORDER_SUB_ID=O.WORKORDER_SUB_ID
    AND LT.OPERATION_SEQ_NO=O.SEQUENCE_NO
  GROUP BY
    LT.RESOURCE_ID) E
WHERE
  E.RESOURCE_ID=C.RESOURCE_ID
ORDER BY
  C.RESOURCE_ID;

The resulting output appears as follows:

Well, that was easy… :-) 

The solution reminds me of the phrase “How to Eat an Elephant“.  Elephant poaching is illegal in most parts of the world, so I propose changing this phrase to “How to Walk an Elephant”.  One step at a time, and make certain not to stand directly in front or directly behind.  ;-)





Monitoring Changes to Table Data

22 03 2012

March 22, 2012

Suppose that you receive a request stating that a particular table in one of your databases must be monitored for changes.  One such table might be the one that lists the ERP system’s suppliers and their addresses – you would not want someone to be able to set up a legitimate supplier, and later have the billing remit to name and address changed without being detected.

What approach would you take to solve the above problem?  A question similar to the above arrived recently in an ERP mailing list email – the original poster (OP) is using a SQL Server RDBMS, so that might change the proposed solution just a bit.

Here is the “How simple is too simple?” suggestion that I offered:

Periodically, create a VENDOR_SAVE table (drop it if it already exists, or delete all rows and re-insert from the original source table):

CREATE TABLE VENDOR_SAVE AS
SELECT
  *
FROM
  VENDOR;

Now, just wait a while.

The question then becomes, how do we detect:

  • A new row (record) added to the original source (VENDOR) table.
  • An old row (record) deleted from the original source (VENDOR) table.
  • A change to any column (program field) in the original source (VENDOR) table since the last time the VENDOR_SAVE table was created/refreshed.

Let’s start with the first two bullet points.  I will write the SQL statements so that the statements should work with Oracle Database 9.0.1 and above, and SQL Server with very few changes, even if I feel a bit ANSI about doing so.  I will use the COALESCE function, rather than the NVL function, and CASE syntax rather than the equivalent DECODE syntax.

To identify cases where a row has appeared in, or disappeared from the original source (VENDOR) table, we can simply perform a full outer join between the original source table and the historical mirror image of the original table (VENDOR_SAVE).  We are only interested in cases where the primary key column (ID) is found in exactly one of the two tables:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL;

So, the above SQL statement satisfies the first two bullet points.  The third bullet point is a little more challenging to accomplish… unless of course we employ UNION labor.  If we have two row sources with identical columns, and UNION the row sources together, the resulting row source will be absent of any entirely duplicated rows from the two original row sources (two rows will be reduced to a single row).  If there were no changes to any of the column values (or if the row was added to or deleted from the original source table), there will be a single row for the primary key column value.  If any columns were changed, there will be two rows containing the primary key column value.

Let’s build a SQL statement that UNIONs the rows from the two tables together, and counts the number of rows for each primary key value:

SELECT
  COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
  VL.ID,
  VL.NAME,
  VL.ADDR_1
FROM
  (SELECT
    *
  FROM
    VENDOR
  UNION
  SELECT
    *
  FROM
    VENDOR_SAVE) VL;

To complete the requirement for bullet point 3 above, we need to eliminate all rows from the result set where there is a single row for the primary key value:

SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1;

As a final step, we should join the two resultsets into a single resultset using UNION ALL:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL
UNION ALL
SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1
ORDER BY
  ID;

While somewhat limited in usefulness, the above approach will indicate which rows in the original source table should be examined because the column values in those rows changed (or were added or deleted).

Another, potentially more useful approach involves setting up a logging trigger and logging table.  I previously shared a VBS script that helps to automate and standardize the process of creating the logging trigger and logging table.  A very similar feature is built into my Hyper-Extended Oracle Performance Monitor program – but, much like the VBS script, this solution is useless for the OP who uses a SQL Server backend.

What are the other options?  Oracle Database’s built-in auditing capabilities.  Auditing built into the ERP system (this particular ERP system writes all audit/change records to a single table that uses a VARCHAR2 column to capture the before/after images of the values stored in columns).  Any other options?  (The mess that someone can create with a bit of idle time on their hands…)





Oracle Query Optimizer Vanishing Acts

3 02 2012

February 3, 2012

A couple of days ago I noticed an interesting thread in the comp.databases.oracle.server Usenet group that described a problem of vanishing tables.  The title of the thread certainly caught my attention, and I was a bit disappointed when I found that the there was little to no magic involved in the vanishing act.  The situation reported in the thread was not that the table and its data simply disappeared, but instead that a table referenced in a SQL statement simply did not appear in an execution plan.  While not quite as entertaining as the TV show Magic’s Biggest Secrets Finally Revealed, the thread is worth reading, with discussion of enhancements provided in recent Oracle Database releases.

Almost two years ago I wrote a blog article that showed a similar vanishing act, where a SQL statement referencing the same table eight times triggered an interesting enhancement:

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0000000' AND 'A 1000000'; 

Oracle Database 11.1.0.6 behaved as expected, where table T5 was included eight times in the execution plan for the above SQL statement.  Oracle Database 11.2.0.1 output the following execution plan for the above SQL statement:

Plan hash value: 2002323537

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   1000K|00:00:00.50 |   26055 |
|*  1 |  TABLE ACCESS FULL| T5   |      1 |   1000K|   1000K|00:00:00.50 |   26055 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("T51"."C1">='A 0000000' AND "T51"."C1"<='A 1000000')) 

A fine example of becoming eight times as efficient, while producing the same output.

Coincidentally, the same day that I saw the above mentioned Usenet thread, an email arrived from an ERP mailing list.  The original poster (OP) in the ERP mailing list reported that a SQL statement similar to the following was possibly causing a report to fail when the report was viewed by a typical ERP user:

SELECT
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ; 

Notice in the above that there is no declared join between the CUSTOMERS and CUSTOMER_ORDERS tables.  The ID column in each table is the primary key column, so at most one row will be returned from each table.  There is a declared foreign key constraint on the CUSTOMER_ORDERS.CUSTOMER_ID column that points to the CUSTOMERS.ID column (you might be wondering if that foreign key constraint might generate an additional predicate in the optimized version of the WHERE clause).  I suppose that if we want to be technical, we could state that the query creates a Cartesian join between the CUSTOMERS and CUSTOMER_ORDERS table, but in this case I do not see this Cartesian join as a problem since each row source will return at most one row.

Let’s try a quick experiment in SQL*Plus with Oracle Database 11.2.0.2 to see what the execution plan looks like for the above SQL statement:

EXEC :CO_CUSTOMER_ID:='CLA/COM/STA'
EXEC :CO_ID:='10002'

SET LINESIZE 120
SET PAGESIZE 1000

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Plan hash value: 31577051

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       5 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0021619    |      1 |      1 |      1 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0021612    |      1 |      1 |      1 |00:00:00.01 |       1 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID) 

So, the optimizer managed to recognize that the SQL statement is requesting a single row (the E-Rows column), where each row in the outer table (CUSTOMER_ORDERS) is expected to cause the retrieval of one row from the inner table (CUSTOMERS), and that prediction was accurate based on the values shown in the Starts and A-Rows columns.

Another reader of the ERP mailing list mentioned that the joins between tables should always be specified in SQL statements.  While logically correct, I also recall reading in the “Cost-Based Oracle Fundamentals” book about an interesting side-effect of transitive closure, where join conditions between tables in SQL statements might automatically vanish during query optimization; based on the information found in the book, the vanishing act is Oracle Database version dependent and the behavior may be affected by certain initialization parameters (Oracle Database 10.2 is said to not remove the join condition by default).  The OP in the ERP mailing list is running Oracle Database 8.1.0.7.  If we did modify the query to specify the join condition C.ID=O.CUSTOMER_ID, would that join condition still appear in the “optimized” version of the SQL statement after transitive closure on Oracle Database 8.1.0.7?  What about Oracle Database 9.0.1, 9.2.0.8, 10.2.0.5, or even something more recent such as 11.2.0.2?

Why guess, when you can easily set up a test case script?  First, we will create two sample tables with a declared foreign key relationship.  Each table will be created with a FILLER column that is declared as a VARCHAR2(200) – that column will substitute for the various other columns (in the production version of the tables) that typically consume roughly 200 characters per row:

CREATE TABLE CUSTOMERS (
  ID VARCHAR2(15) NOT NULL ENABLE,
  NAME VARCHAR2(50),
  CONTACT_FIRST_NAME VARCHAR2(30),
  CONTACT_LAST_NAME VARCHAR2(30),
  CONTACT_PHONE VARCHAR2(20),
  CONTACT_FAX VARCHAR2(20),
  TERRITORY VARCHAR2(15),
  TAX_ID_NUMBER VARCHAR2(25),
  FILLER VARCHAR2(200),
  PRIMARY KEY (ID));

CREATE TABLE CUSTOMER_ORDERS (
  ID VARCHAR2(15) NOT NULL ENABLE,
  CUSTOMER_ID VARCHAR2(15) NOT NULL ENABLE,
  CONTACT_FIRST_NAME VARCHAR2(30),
  CONTACT_LAST_NAME VARCHAR2(30),
  CONTACT_PHONE VARCHAR2(20),
  CONTACT_FAX VARCHAR2(20),
  FILLER VARCHAR2(200),
  PRIMARY KEY (ID),
  CONSTRAINT CHK_CUST FOREIGN KEY (CUSTOMER_ID)
    REFERENCES CUSTOMERS (ID) ENABLE); 

Next, we will insert a combination of reproducible and random data into the two tables (with 1,000 rows being inserted into the CUSTOMERS table and 500,000 rows being inserted into the CUSTOMER_ORDERS table), create an index on the foreign key column in the CUSTOMER_ORDERS table, and collect table and index statistics:

INSERT INTO
  CUSTOMERS
SELECT
  RPAD(CHR(MOD(ROWNUM-1,26)+65),8,CHR(MOD(ROWNUM,26)+65))||TO_CHAR(ROWNUM) ID,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),30,CHR(MOD(ROWNUM,20)+65))||TO_CHAR(ROWNUM) NAME,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),10,CHR(MOD(ROWNUM,26)+96)) CONTACT_FIRST_NAME,
  RPAD(CHR(MOD(ROWNUM+1,26)+65),10,CHR(MOD(ROWNUM+2,26)+96)) CONTACT_LAST_NAME,
  '###-###-####' CONTACT_PHONE,
  '###-###-####' CONTACT_FAX,
  DBMS_RANDOM.STRING('A',10) TERRITORY,
  DBMS_RANDOM.STRING('A',20) TAX_ID_NUMBER,
  RPAD('A',200,'A') FILLER
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

INSERT INTO
  CUSTOMER_ORDERS
SELECT /*+ LEADING(R) */
  TO_CHAR(ROWNUM+10000) ID,
  C.ID CUSTOMER_ID,
  C.CONTACT_FIRST_NAME,
  C.CONTACT_LAST_NAME,
  C.CONTACT_PHONE,
  C.CONTACT_FAX,
  C.FILLER
FROM
  CUSTOMERS C,
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=500) R;

COMMIT;

CREATE INDEX IND_CO_CUSTOMER ON CUSTOMER_ORDERS (CUSTOMER_ID);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'CUSTOMERS',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'CUSTOMER_ORDERS',CASCADE=>TRUE) 

In the test script that follows, I will use the customer order 15000.  In my test case tables, the customer ID for this customer order is LMMMMMMM1000, but your CUSTOMER_ORDERS table might have a different customer ID for that row.  Let’s see which customer ID is associated with customer order 15000:

SELECT
  CUSTOMER_ID
FROM
  CUSTOMER_ORDERS
WHERE
  ID='15000';

CUSTOMER_ID
---------------
LMMMMMMM1000 

The heart of the test script follows.  In the script, change the value LMMMMMMM1000 where the bind variable value is set so that the value matches the CUSTOMER_ID that was returned by the above SQL statement, and set the OPTIMIZER_FEATURES_ENABLE parameter value to the default value for your database version.  The test script first includes the SQL statement that appeared in the OP’s email, followed by a modified version of the SQL statement that also includes the table join condition C.ID=O.CUSTOMER_ID, and a third SQL statement that joins the foreign key column in the CUSTOMER_ORDERS table with the primary key column in the CUSTOMERS table (in substitution of explicitly specifying the C.ID = :CO_CUSTOMER_ID predicate in the WHERE clause).  After each SQL statement is executed, the execution plan is retrieved for that SQL statement.  Will the execution plans for the three SQL statements be the same, or will the execution plans for one or more SQL statements differ?:

VARIABLE CO_CUSTOMER_ID VARCHAR2(15)
VARIABLE CO_ID VARCHAR2(15)

EXEC :CO_CUSTOMER_ID:='LMMMMMMM1000'
EXEC :CO_ID:='15000'

SET LINESIZE 120
SET PAGESIZE 1000

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.2';

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID
  AND C.ID=O.CUSTOMER_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  O.ID = :CO_ID
  AND C.ID=O.CUSTOMER_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); 

Repeat the above script several times, using progressively older Oracle Database versions in the ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE line of the script (I used 11.2.0.2, 10.2.0.5, 9.2.0.8, 9.0.1, and 8.1.7).  Will the execution plans remain the same for the various executions of the script, or will the value of the OPTIMIZER_FEATURES_ENABLE parameter impact the execution plan?

Compare your results with those that I obtained below (to shorten the output, I removed the non-essential row that was returned by the SQL statements).

With OPTIMIZER_FEATURES_ENABLE=11.2.0.2:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID ;

SQL_ID  0vy1zt7jq25fs, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID
 14    AND C.ID=O.CUSTOMER_ID;

SQL_ID  6suwfw1m251j0, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    O.ID = :CO_ID
 13    AND C.ID=O.CUSTOMER_ID;

SQL_ID  abbq9g44p111b, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   O.ID =
:CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"="O"."CUSTOMER_ID") 

Notice in the above output that while the SQL_ID changes for the three versions of the SQL statement, the Plan hash value remains 1588498623 (you might obtain a different constant value, such as 3347798118).  The consistent Plan hash value does NOT mean that the Predicate Information section of the execution plan output is identical, nor does it mean that the estimated number of rows will be the same.  If you closely examine the Predicate Information section of the second SQL statement, you might notice that the 11.2.0.2 optimizer introduced an additional predicate, while at the same time removing the explicit join condition between the two tables.  Take a look at the E-Rows column in the last of the execution plans – might this be a sign of a bug.  Should the optimizer really predict that each row of the CUSTOMER_ORDERS table should return 1000 rows from the CUSTOMERS table when there is a declared foreign key relationship that points to the primary key column of the CUSTOMERS table, and is there a chance that this type of bad prediction might adversely affect the execution plans of other SQL statements?

OPTIMIZER_FEATURES_ENABLE=10.2.0.5:
(same as 11.2.0.2)

OPTIMIZER_FEATURES_ENABLE=9.2.0.8
(same as 11.2.0.2)

OPTIMIZER_FEATURES_ENABLE=9.0.1
(same as 11.2.0.2)

With OPTIMIZER_FEATURES_ENABLE=8.1.7:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID ;

SQL_ID  0vy1zt7jq25fs, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

Note
-----
   - cpu costing is off (consider enabling it)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID
 14    AND C.ID=O.CUSTOMER_ID;

SQL_ID  6suwfw1m251j0, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

Note
-----
   - cpu costing is off (consider enabling it)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    O.ID = :CO_ID
 13    AND C.ID=O.CUSTOMER_ID;

SQL_ID  abbq9g44p111b, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   O.ID =
:CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"="O"."CUSTOMER_ID")

Note
-----
   - cpu costing is off (consider enabling it) 

Other than the note cpu costing is off (consider enabling it), the output with the OPTIMIZER_FEATURES_ENABLED parameter set to 8.1.7 is identical to the output when that parameter was set to 11.2.0.2.

How about testing the real thing… actually executing the SQL statements in the test case script on an older version of Oracle Database – will you receive the same execution plans as shown above?  I tested Oracle Database 10.2.0.5 and obtained the same execution plans as I saw with 11.2.0.2.  Anyone with access to Oracle Database 10.1, 9.2, 9.0.1, or 8.1.7 that is able to test the above script?  The DBMS_XPLAN.DISPLAY_CURSOR function is not available in Oracle Database versions prior to 10.1, so you will need to be creative to display the execution plan (AUTOTRACE may show the wrong execution plan – you might experiment with a solution offered by Tanel Poder for Oracle Database 9i).

As a reminder, you may post execution plans in a blog comment by enclosing the execution plan inside <pre> </pre> tags:

<pre>
Plan hash value: 3347798118
----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0042378    |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0042375    |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)
</pre>




Non-Specific Index Hints

24 01 2012

January 24, 2012 (Modified January 25, 2012)

As I read the “Troubleshooting Oracle Performance” book for the second time a couple of months ago, I made note on page 385 that it was possible to specify table columns in an index hint, rather than specifying specific index names (or just specifying the table name).  This might be useful, for example, if you have tables in your database with primary key columns with system assigned names for the supporting indexes, and an index hint is needed to correct specific performance issues.  I was again reminded that it was possible to create non-specific index hints that specify table columns when a recent quiz was posted that asked to find specific cases where the behavior is other than expected with the newer index hint syntax.

As an example of the newer syntax, I put together a brief demonstration.  First, the test table and index creation script:

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(100),
  PRIMARY KEY(C1));

INSERT INTO
  T2
SELECT
  ROWNUM C1,
  MOD(ROWNUM-1,20)+1 C2,
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T2_C2 ON T2(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000 

Let’s try a simple query that specifies columns C1 and C2 in the WHERE clause:

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 906133967

------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |             |   500 | 54500 |   121   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | T2          |   500 | 54500 |   121   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |             |       |       |            |          |
|   3 |    BITMAP AND                    |             |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS|             |       |       |            |          |
|*  5 |      INDEX RANGE SCAN            | IND_T2_C2   |       |       |    10   (0)| 00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|             |       |       |            |          |
|   7 |      SORT ORDER BY               |             |       |       |            |          |
|*  8 |       INDEX RANGE SCAN           | SYS_C008661 |       |       |    20   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("C2"=1)
   8 - access("C1">=1 AND "C1"<=10000) 

The above shows that if this SQL statement were actually executed, the index IND_T2_C2, and the index SYS_C008661 (that is used to help enforce the primary key) would be used when executing the SQL statement.  You will only see the above execution plan in the Enterprise Edition of Oracle Database.  Let’s try again with a hint that prohibits the BITMAP CONVERSION TO FROM ROWIDS operation:

SELECT /*+ OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 | 54500 |   131   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |   500 | 54500 |   131   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1 AND "C1"<=10000 AND "C1">=1) 

The above execution plan shows that without the ability to perform the  the BITMAP CONVERSION TO FROM ROWIDS operation to allow the BITMAP AND operation, a full table scan was selected, so this is a case where the Standard Edition of Oracle Database and the Enterprise Edition might exhibit different execution performance.

Let’s use the previous SQL statement with hint as a starting point, and use a non-specific index hint to instruct the optimizer to use an index on the T2 table:

SELECT /*+ INDEX(T2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 3350885058

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   500 | 54500 |   179   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   500 | 54500 |   179   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C008661 | 10000 |       |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1)
   2 - access("C1">=1 AND "C1"<=10000) 

As shown by the above, the optimizer selected to use the SYS_C008661 primary key index, which would require it to retrieve 10,000 ROWID values from the index, rather than using the index on the T2 column which would have retrieved 5,000 (100,000 * 1/20) ROWIDs from the IND_T2_C2 index (the clustering factor of the IND_T2_C2 index was likely the deciding factor).  Let’s specifically request (demand) that the optimizer use the IND_T2_C2 index by adjusting the hint:

SELECT /*+ INDEX(T2 IND_T2_C2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 174424276

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500 | 54500 |  1596   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2        |   500 | 54500 |  1596   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C2 |  5000 |       |    10   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=10000 AND "C1">=1)
   2 - access("C2"=1) 

Notice in the above that the IND_T2_C2 index was selected, is expected to return 5,000 ROWID values from the index, and the execution plan now has a calculated cost of 1,596.  The calculated cost is a simple explanation why the optimizer did not select to use this index automatically.

Now, consider a situation where the optimizer insists on using the IND_T2_C2 index, rather than the SYS_C008661 primary key index for this SQL statement, which could happen if the CLUSTERING_FACTOR of the indexes are not set correctly:

SET AUTOTRACE OFF

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';

INDEX_NAME   CLUSTERING_FACTOR
------------ -----------------
SYS_C008661               1585
IND_T2_C2                31700

EXEC DBMS_STATS.SET_INDEX_STATS (OWNNAME=>USER,INDNAME=>'IND_T2_C2',CLSTFCT=>1585,NO_INVALIDATE=>FALSE)

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';

INDEX_NAME   CLUSTERING_FACTOR
------------ -----------------
SYS_C008661               1585
IND_T2_C2                 1585 

We started with the primary key index having a clustering factor of 1,585 and the IND_T2_C2 index having a clustering factor of 31,700.  After using DBMS_STATS.SET_INDEX_STATS, the optimizer is convinced that the IND_T2_C2 index also has a clusting factor of 1,585.  Now both of the indexes have the same CLUSTERING_FACTOR statistic value, what happens if we execute the SQL statement again that specifies an index should be used to access table T1, but does not specify the exact index name?:

SELECT /*+ INDEX(T2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 174424276

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500 | 54500 |    90   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2        |   500 | 54500 |    90   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C2 |  5000 |       |    10   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=10000 AND "C1">=1)
   2 - access("C2"=1) 

So, now that the CLUSTERING_FACTOR values are the same for both indexes, the optimizer is selecting the IND_T2_C2 index for the SQL statement.  We have successfully painted ourselves into a corner, telling the optimizer that the IND_T2_C2 index really is helpful for this SQL statement.  We are fairly certain that the IND_T2_C2 index is not ideal, and that the the SYS_C008661 primary key index is a better choice if an index access path is determined to be better than a full table scan, based on the order in which the data was inserted into the table’s blocks.  The problem now is that the SYS_C008661 index name is not consistent from one database to the next, or even in the same database if you drop table T2 and repeat the test.  How do we tell the optimizer to pick the index that is on the primary key column, column C1?  Oracle Database 10.1 introduced new syntax for index hints, which allows us to specify the column name(s), rather than the index name in the hint:

SELECT /*+ INDEX(T2 (C1)) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 3350885058

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   500 | 54500 |   179   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   500 | 54500 |   179   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C008661 | 10000 |       |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1)
   2 - access("C1">=1 AND "C1"<=10000) 

As can be seen by the above, the index on column C1, which happens to be the primary key index, was selected by the optimizer at the request of the index hint.

Let’s take a look at the documentation for Oracle Database 10.2:

“Hints that specify an index can use either a simple index name or a parenthesized list of columns as follows:

index (table (column))
  • table specifies the name
  • columnspecifies the name of a column in the specified table
    • The columns can optionally be prefixed with table qualifiers allowing the hint to specify bitmap join indexes where the index columns are on a different table than the indexed table. If tables qualifiers are present, they must be base tables, not aliases in the query.
    • Each column in an index specification must be a base column in the specified table, not an expression. Function-based indexes cannot be hinted using a column specification unless the columns specified in the index specification form the prefix of a function-based index.
  • index specifies an index name

The hint is resolved as follows:

  • If an index name is specified, only that index is considered.
  • If a column list is specified and an index exists whose columns match the specified columns in number and order, only that index is considered. If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered. In either case, the behavior is exactly as if the user had specified the same hint individually on all the matching indexes.

For example, in Example 16-3 the job_history table has a single-column index on the employee_id column and a concatenated index on employee_id and start_date columns. To specifically instruct the optimizer on index use, the query can be hinted as follows:

SELECT /*+ INDEX(v.j jhist_employee_ix (employee_id start_date)) */ * FROM v;

That is almost crystal clear (I probably poorly translated the diagram), even though it seems that the cost calculation might not have an impact (“If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered“).  If you have the sample database schema loaded into a database, jump to the view definition in Example 16-3, create the view, and then see if the index hint in the documentation (at the bottom of the above quote block) actually works – it did not work for me.  The same hint example is found in the Oracle Database 11.2 documentation.

However, in the example above, where I used the hint syntax INDEX (TABLE (COLUMN)), the hint worked as expected.  Interestingly, the hint reference for Oracle Database 11.2 does not specifically mention this new syntax, and that might be why I was slow to recognize the new hint syntax.

Now that the basics are covered, let’s see if we are able to confuse the optimizer with index hints, taking up the challenge proposed by the blog article mentioned at the start of this article.  Below is a slightly modified version of the test script that I posted as a comment in the other blog article (with an additional column in the table, an additional index, and an enabled 10053 trace).  First, we will create the table, a couple of indexes on the table, change a couple of SQL*Plus parameters, and then enable a 10053 trace:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(30) NOT NULL,
  C4 VARCHAR2(200),
  C5 VARCHAR2(10));

INSERT INTO T1
SELECT
  MOD(ROWNUM-1, 90) * 4 C1,
  ROWNUM - 1 C2,
  TO_CHAR(ROWNUM - 1, 'RN') C3,
  LPAD('A',200,'A') C4,
  LPAD('B',10,'B') C5
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE INDEX IND_T1_C1_C2 ON T1(C1,C2);
CREATE INDEX IND_T1_C1_C2_C5 ON T1(C1,C2,C5);
CREATE INDEX IND_T1_C2_C1_C3 ON T1(C2,C1,C3);
CREATE INDEX IND_T1_C3_C1_C2 ON T1(C3,C1,C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

Let’s begin the process of trying to confuse the Oracle Database 11.2.0.2 optimizer.  Unhinted, the following query accesses the index on columns C2, C1, and C3 to avoid accessing the table, and uses an INDEX FAST FULL SCAN operation that employs multi-block reads :

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'UNHINTED';

SELECT
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 2374279026

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1000K|    23M|   387   (2)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_T1_C2_C1_C3 |  1000K|    23M|   387   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

Let’s hint the optimizer to use the index on the columns C1 and C2:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 3388050039
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1000K|    23M|  1012K  (1)| 00:06:46 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    23M|  1012K  (1)| 00:06:46 |
|   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  3026  (10)| 00:00:02 |
--------------------------------------------------------------------------------------------

In the above, the optimizer obeyed the hint, even though the calculated cost from the unhinted plan increased from 387 to 1,012,000, and the optimizer selected to perform an INDEX FULL SCAN operation.

Let’s reverse the order of the columns in the index hint:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C2_C1';

SELECT /*+ INDEX(T1 (C2 C1)) */
  C1,
  C2,
  C3
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1746297295

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    23M|  4851   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C2_C1_C3 |  1000K|    23M|  4851   (1)| 00:00:01 |
------------------------------------------------------------------------------------

In the above, note that the index on columns C2, C1, and C3 was used, but the cost is now calculated at 4,851 rather than 387 as it was in the unhinted plan. The INDEX FAST FULL SCAN operation is now shown as an INDEX FULL SCAN operation.  So, we have now convinced the optimizer to use an access path that employs single block reads of an index rather than multi-block reads of an index, simply by telling the optimizer to use the index that it would have used without the hint.  Have we confused the optimizer already?  Someone should probably take a look at the 10053 trace files.  :-)

We have an index on columns C2, C1, and C3, but we also have an index on columns C3, C1, and C2. What happens when we specify the columns C3, C1, and C2 in the index hint in that order?

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C3_C1_C2';

SELECT /*+ INDEX(T1 (C3 C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 2273443829

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    23M|  4943   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |  1000K|    23M|  4943   (1)| 00:00:01 |
------------------------------------------------------------------------------------

An index full scan was selected to be performed on the IND_T1_C3_C1_C2 index with a calculated cost of 4,943, rather than using the IND_T1_C2_C1_C3 index that previously resulted in a cost of 4,851 – so the optimizer will not alter the order of the columns in the index hint to reduce the calculated cost.

Let’s try another example where we select all of the columns that are present in the IND_T1_C1_C2_C5 index – will the optimizer use that index, or will it obey the expected behavior of the index hint?:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_WC5';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3388050039

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1000K|    19M|  1003K  (1)| 00:00:23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    19M|  1003K  (1)| 00:00:23 |
|   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  2750   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

The index that exactly included the columns specified in the index hint was selected, with a calculated cost of 1,003,000.

So, what happens if we are less specific in the index hint, and just list the first column, where there are two composite indexes that start with the specified column:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_WC2_C5';

SELECT /*+ INDEX(T1 (C1)) */
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2942389535

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    19M|  4293   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C1_C2_C5 |  1000K|    19M|  4293   (1)| 00:00:01 |
------------------------------------------------------------------------------------

As shown in the above execution plan, the optimizer arrived at a cost of 4,293 by selecting to use the IND_T1_C1_C2_C5 index, which avoided the access to the T1 table.  So, sometimes the performance will be better with slightly less specific hints.

Out of curiosity, what do you think will happen if we completely eliminate the index hint?  Let’s test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'UNHINTED_C1_C2_C5';

SELECT
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2722951733

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1000K|    19M|   343   (3)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_T1_C1_C2_C5 |  1000K|    19M|   343   (3)| 00:00:01 |
----------------------------------------------------------------------------------------

In the above execution plan, the IND_T1_C1_C2_C5 index is still selected for use as it was in the previous execution plan, but notice that the access path has changed to an INDEX FAST FULL SCAN operation and the calculated cost dropped from 4,293 to 343.  So, the above output implies that sometimes the performance will be better if we simply do not hint an index access path, if the index access path would have been selected otherwise.  Someone want to take a look at the 10053 trace files and explain why?

If we add a WHERE clause that places a restriction on column C2 to be less than 10, the optimizer could use a couple of different access paths. Let’s specify the columns C1 and C2 in the index hint to see which index is selected:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C2_WHERE';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
   C2<10;

Plan hash value: 1883798457

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    10 |   250 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   103   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"<10)
       filter("C2"<10)

In the above, an INDEX SKIP SCAN operation was selected because the index with columns C1 and C2 was specified in the hint.

Let’s try another example that possibly might be considered a case where the optimizer disobeys the hint or is free to change the order of the columns specified in the index hint (this might be incorrectly considered an edge case):

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C3_C2_WHERE';

SELECT /*+ INDEX(T1 (C1 C2 C3)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 4150417361

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IND_T1_C2_C1_C3 |    10 |   250 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)

A quick peek at the above output might suggest that the optimizer could potentially decide to locate an index with columns C1, C2, and C3 in any order – but I do not believe that this is the case. I believe that the optimizer considered the index hint specified in the SQL statement as being invalid (a check of the 10053 trace might confirm).

Let’s create another index and then repeat the above SQL statement:

CREATE INDEX IND_T1_C1_C2_C3 ON T1(C1,C2,C3);

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C3_C2_WHERE2';

SELECT /*+ INDEX(T1 (C1 C2 C3)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 212907557

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |    92   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    92   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The index with the columns that matched the order of the columns in the index hint was selected, even though the calculated cost would have been lower if the optimizer were permitted to select any index with the columns listed in the index hint.

What about a case where there is an exact match between an index definition and an index hint, and there is also another index with one additional column which would avoid the table access:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C2_WHERE2';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 1883798457

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    10 |   250 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   103   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"<10)
       filter("C2"<10)

The index that exactly matched the index hint was selected by the optimizer.

What if we only specify in the index hint a leading column, when there are two indexes with that leading column, one of which allows the optimizer to avoid the table access:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_WHERE';

SELECT /*+ INDEX(T1 (C1)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 212907557

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |    92   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    92   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The optimizer selected the lowest cost access path from the two indexes that matched the hint.

What if we specify a column in the index hint that is not listed in the SELECT or WHERE clauses?

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C3_C2_WHERE';

SELECT /*+ INDEX(T1 (C3)) */
  C1,
  C2
FROM
  T1
WHERE
  C2<10;

Plan hash value: 1328421701

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |    90 |  4039   (1)| 00:00:02 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4039   (1)| 00:00:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The above shows the original execution plan (captured a couple of days ago without column C5 in the table definition, without index IND_T1_C1_C2_C5, and without an enabled 10053 trace), which shows that the optimizer still obeyed the intention of the hint – it found an index that started with the specified column and selected to perform an INDEX SKIP SCAN operation even though column C2, specified in the WHERE clause, is the third column in the index definition.

As luck would have it, with the slightly altered table definition (and possibly different statistics estimates) the execution plan has changed to use an INDEX FULL SCAN operation rather than an INDEX SKIP SCAN operation, and now the execution plan has a higher calculated cost.  So, why did the execution plan change from an INDEX SKIP SCAN?  Here is the current execution plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 2273443829

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |    90 |  4945   (1)| 00:00:01 |
|*  1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4945   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

So, for the challenge, try to explain the oddities that I pointed out above.  I have not yet reviewed the 10053 trace files, but I will later.





Finding the Next Primary Key Value, a Pre-fixed Solution

20 01 2012

January 20, 2012

A request for assistance came in from an ERP mailing list.  The original poster (OP) is running an unspecified version of Oracle Database 9i, and is in need of a solution to generate new part numbers with prefixed characters that describe the type of part, followed by a sequential number that is unique to the prefix.  The prefixes might be PAINT, BAR, BEARING, DRILL, etc.  Sample part numbers might include BAR0599, PAINT012, BEARING012345, etc.

When I first saw the request, my first thought was to create sequences for the different prefixes, similar to the following:

CREATE SEQUENCE PART_PAINT_ID START WITH 13 NOCACHE;
CREATE SEQUENCE PART_BAR_ID START WITH 600 NOCACHE;
CREATE SEQUENCE PART_BEARING_ID START WITH 12346 NOCACHE;
CREATE SEQUENCE PART_DRILL_ID START WITH 999 NOCACHE;
...

Once the above sequences are created, we could then find the next part number with a SQL statement similar to the following:

SELECT
  'PAINT'||PART_PAINT_ID.NEXTVAL NEXT_PART_ID
FROM
  DUAL;

NEXT_PART_ID
------------
PAINT13 

I suspected that there was a catch – for some reason the sample part numbers included a 0 before the sequence number, and I assumed that there could be a variable number of 0 digits before that sequence number for the different prefixes.  To fix the above, we might try working with the LPAD function to add leading zeros to the sequence number:

SELECT
  'PAINT'||LPAD(TO_CHAR(PART_PAINT_ID.NEXTVAL), 3, '0') NEXT_PART_ID
FROM
  DUAL;

NEXT_PAR
--------
PAINT014 

Need more leading zeros?  No problem, just adjust the number in the LPAD function:

SELECT
  'PAINT'||LPAD(TO_CHAR(PART_PAINT_ID.NEXTVAL), 6, '0') NEXT_PART_ID
FROM
  DUAL;

NEXT_PART_I
-----------
PAINT000015 

Need fewer zeros?:

SELECT
  'PAINT'||LPAD(TO_CHAR(PART_PAINT_ID.NEXTVAL), 1, '0') NEXT_PART_ID
FROM
  DUAL;

NEXT_P
------
PAINT1 

The above shows the first of several potential problems with this approach.  What else could go wrong?  What if a smart person decides that he does not need this “crutch” solution and creates 30 part numbers using the method of best guess or cheat sheet in the side drawer?  Because these are primary key values, the smart person might cause a number of problems that might not be detected for some time… until the Oracle sequence reaches one of the unexpected sequence numbers that were already used.

In addition to the suggestion of Oracle sequences, I offered a couple of other suggestions.  The ERP system offers a macro language that is a lot like VBScript.  The OP could create a VBScript that builds a HTML web page in real time, or possibly pull the next sequence number from a ASP (or similar) web page.  As an example of a HTML web page built in real time, this is an example that I created roughly three years ago:

Dim objIE
Dim objShell
Dim strHTML
Dim intFlag

On Error Resume Next

Set objShell = CreateObject("WScript.Shell")
strHTML = strHTML & "<form name=""Visual"">" & vbCrLf
strHTML = strHTML & "<input type=hidden id=txtOK value="" "">" & vbCrLf

strHTML = strHTML & "<table>" & vbCrLf
strHTML = strHTML & "<tr><td>Component<td><select size=""1"" id=""cboComponent"" name=""cboComponent"" style=""width:150"">" & vbCrLf
strHTML = strHTML & "<option value=""ACTIVATOR"">ACTIVATOR</option>" & vbCrLf
strHTML = strHTML & "<option value=""ACCELERATOR"">ACCELERATOR</option>" & vbCrLf
strHTML = strHTML & "<option value=""CATALYST"">CATALYST</option>" & vbCrLf
strHTML = strHTML & "<option value=""EPOXY PRIMER"">EPOXY PRIMER</option>" & vbCrLf
strHTML = strHTML & "<option value=""PRIMER"">PRIMER</option>" & vbCrLf
strHTML = strHTML & "<option value=""REDUCER"">REDUCER</option>" & vbCrLf
strHTML = strHTML & "<option value=""TOP COAT"">TOP COAT</option>" & vbCrLf
strHTML = strHTML & "</select></td></tr>" & vbCrLf

strHTML = strHTML & "<tr><td>Vendor<td><select size=""1"" id=""cboVendor"" name=""cboVendor"" style=""width:150"">" & vbCrLf
strHTML = strHTML & "<option value=""DUPONT"">DUPONT</option>" & vbCrLf
strHTML = strHTML & "<option value=""LILLY"">LILLY</option>" & vbCrLf
strHTML = strHTML & "<option value=""NILES CHEMICAL"">NILES CHEMICAL</option>" & vbCrLf
strHTML = strHTML & "<option value=""MANITOWOC"">MANITOWOC</option>" & vbCrLf
strHTML = strHTML & "<option value=""MAUTZ"">MAUTZ</option>" & vbCrLf
strHTML = strHTML & "<option value=""PAINTS AND SOLVENTS"">PAINTS AND SOLVENTS</option>" & vbCrLf
strHTML = strHTML & "<option value=""SHEBOYGAN"">SHEBOYGAN</option>" & vbCrLf
strHTML = strHTML & "</select></td></tr>" & vbCrLf

strHTML = strHTML & "<tr><td>Type<td><select size=""1"" id=""cboType"" name=""cboType"" style=""width:150"">" & vbCrLf
strHTML = strHTML & "<option value=""FLAT"">FLAT</option>" & vbCrLf
strHTML = strHTML & "<option value=""GLOSS"">GLOSS</option>" & vbCrLf
strHTML = strHTML & "<option value=""MED. GLOSS"">MED. GLOSS</option>" & vbCrLf
strHTML = strHTML & "</select></td></tr>" & vbCrLf

strHTML = strHTML & "<tr><td>Color<td><select size=""1"" id=""cboColor"" name=""cboColor"" style=""width:150"">" & vbCrLf
strHTML = strHTML & "<option value=""RED"">RED</option>" & vbCrLf
strHTML = strHTML & "<option value=""YELLOW"">YELLOW</option>" & vbCrLf
strHTML = strHTML & "<option value=""GREEN"">GREEN</option>" & vbCrLf
strHTML = strHTML & "<option value=""BLUE"">BLUE</option>" & vbCrLf
strHTML = strHTML & "<option value=""BLACK"">BLACK</option>" & vbCrLf
strHTML = strHTML & "<option value=""WHITE"">WHITE</option>" & vbCrLf
strHTML = strHTML & "<option value=""GRAY"">GRAY</option>" & vbCrLf
strHTML = strHTML & "</select></td></tr>" & vbCrLf
strHTML = strHTML & "</table>" & vbCrLf

strHTML = strHTML & "<p><center><input type=button value=""OK"" id=cmdOK onclick=""document.getElementById('txtOK').value='OK';""></center>" & vbCrLf
strHTML = strHTML & "</form>" & vbCrLf

Set objIE = CreateObject("InternetExplorer.Application")
objIE.Navigate "about:blank"
objIE.Document.Body.InnerHTML = strHTML
objIE.Document.Title="Get Part Info"
objIE.Left = 0
objIE.Top = 0
objIE.Width = 400
objIE.Height = 400
objIE.Statusbar = False
objIE.Menubar = False
objIE.Toolbar = False

objIE.Visible = True

Do While objIE.Busy <> False
    objShell.Sleep 200
Loop

intFlag = 0

'loop until the button is clicked
Do While intFlag = 0
    If Err <> 0 Then
        IntFlag = -1
    End If   
    If objIE is Nothing Then
        'User closed ID
        intFlag = -1
    Else
        If objIE.Document.All.txtOK.Value <> " " Then
            intFlag = 1
        End If
    End If
    'objShell.Sleep 250 'Throws an error?
Loop

If intFlag = 1 Then
    'Copy in the values from the web page
    USER_1 = objIE.Document.Body.All.cboComponent.Value
    USER_2 = objIE.Document.Body.All.cboVendor.Value
    USER_3 = objIE.Document.Body.All.cboType.Value
    USER_4 = objIE.Document.Body.All.cboColor.Value

    objIE.Quit
End If

Set objIE = Nothing
Set objShell = Nothing
 

The result of the above VBScript is an interactive interface that appears similar to the following:

Another option that I suggested to the OP is to use Excel to keep track of the last sequence number for each prefix – and use an Excel dialog displayed from a VBScript macro.  Roughly three years ago I created a sample macro with the following code:

Dim objExcel
Dim objForm
Dim objShell

On Error Resume Next

Set objExcel = CreateObject("Excel.Application")

'with the help of custom program, set a 1 second delay, then force the window to the top
Set objShell = CreateObject("WScript.Shell")
objShell.Run("C:\BringToTop.exe " & Chr(34) & "Paint Naming" & Chr(34) & " 2")
Set objShell = Nothing

With objExcel
    .Workbooks.Open "C:\ExcelMacroDialog.xls"

    If .Sheets("CalculateArea").Cells(1, 1).Value <> "" Then
        ID = .Sheets("CalculateArea").Cells(1, 1).Value
        DESCRIPTION = .Sheets("CalculateArea").Cells(2, 1).Value
        PRODUCT_CODE = .Sheets("CalculateArea").Cells(3, 1).Value
        COMMODITY_CODE = .Sheets("CalculateArea").Cells(4, 1).Value
        USER_5 = .Sheets("CalculateArea").Cells(5, 1).Value
        PURCHASED = True
        FABRICATED = False
    End If
End With

objExcel.DisplayAlerts = False
objExcel.ActiveWorkbook.Saved = True
objExcel.Quit
Set objExcel = Nothing  

The above macro is quite short, because most of the work is performed in the Excel spreadsheet.  The resulting Excel dialog appeared similar to this:

My first suggestion to the OP, not knowing the full scope of the problem, was to try coding a VBScript macro similar to the following:

Dim strPartID
Dim strPartIDNew
Dim strNumberOld
Dim strNumberNew
Dim i

strPartID = PART_ID
strNumberOld = ""

'strPartID = "PAINT0599"  'Remove this line after testing
strPartID = "PAINT0089"  'Remove this line after testing

For i = Len(strPartID) to 1 Step -1
  If (Mid(strPartID, i, 1) >= "0") And (Mid(strPartID, i, 1) <= "9") Then
    strNumberOld = Mid(strPartID, i, 1) & strNumberOld
  Else
    strNumberNew = cStr(cInt(strNumberOld) + 1)

    strPartIDNew = Left(strPartID, i)
    If Len(strNumberOld) > Len(strNumberNew) Then
      'Add Padding 0s
      strPartIDNew = strPartIDNew & String((Len(strNumberOld) - Len(strNumberNew)), "0")
    End If

    strPartIDNew = strPartIDNew & strNumberNew

    Exit For
  End If
Next

If strPartIDNew <> "" Then
  Msgbox "The New Part ID is " & strPartIDNew
Else
  Msgbox "Not a Valid Starting Point" & strPartID
End If 

The intention of the above macro is to locate the number 89 in the supplied strPartID variable, recognize that a 4 digit serial number is expected, and output:

The New Part ID is PAINT0090 

Nice, but that is not what the OP needs.  The highest currently sequenced number will not be provided – that value must be looked up in the database.  So close…

Let’s try a different approach, starting by creating a sample table with three sequences of part numbers with different prefixes:

CREATE TABLE T1 (
  ID VARCHAR2(30),
  DESCRIPTION VARCHAR2(40),
  PRIMARY KEY(ID));

INSERT INTO
  T1
SELECT
  'PAINT'||LPAD(TO_CHAR(ROWNUM), 6, '0') ID,
  'TESTING ' || TO_CHAR(ROWNUM) DESCRIPTION
FROM
  DUAL
CONNECT BY
  LEVEL<=35000;

INSERT INTO
  T1
SELECT
  'BAR'||LPAD(TO_CHAR(ROWNUM), 6, '0') ID,
  'TESTING ' || TO_CHAR(ROWNUM) DESCRIPTION
FROM
  DUAL
CONNECT BY
  LEVEL<=45000;

INSERT INTO
  T1
SELECT
  'BEARING'||LPAD(TO_CHAR(ROWNUM), 6, '0') ID,
  'TESTING ' || TO_CHAR(ROWNUM) DESCRIPTION
FROM
  DUAL
CONNECT BY
  LEVEL<=888;

COMMIT; 

Let’s find the next sequence number for the BAR prefix:

SELECT
  MAX(TO_NUMBER(SUBSTR(ID,4))) + 1 NN
FROM
  T1
WHERE
  ID BETWEEN 'BAR0' AND 'BAR99999999';

        NN
----------
     45001 

As long as ALL of the characters after the BAR keyword prefix are numbers, the above would tell us that the next number with BAR as the prefix is 45001.  On Oracle Database 10.1 and higher it would be a good idea to add an additional predicate to the WHERE clause that uses regular expressions to avoid potential problems where some unrelated ID column values start with the letters BAR, a number character, and then at some position to the right contain a letter character (that condition would cause the above SQL statement to fail).

Building onto the above SQL statement, we could just retrieve the next part number in the sequence from the database, when provided any existing prefixed part number as the starting point:

SELECT
  'BAR' || LPAD(TO_CHAR(MAX(TO_NUMBER(SUBSTR(ID,4))) + 1), 6, '0') NEXT_PART_ID
FROM
  T1
WHERE
  ID BETWEEN 'BAR0' AND 'BAR99999999';

NEXT_PART
---------
BAR045001 

We are able to take the above SQL statement and incorporate it into a VBScript macro to find the next prefixed sequential number for the primary key column:

Dim strPartID
Dim strPartIDNew
Dim strNumberOld
Dim strNumberNew
Dim i
Dim strUserName
Dim strPassword
Dim strDatabase
Dim strSQL
Dim dbDatabase
Dim snpData

On Error Resume Next

strUsername = "MyUserID"
strPassword = "MyPassword"
strDatabase = "MyDatabase"

Set snpData = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";"
dbDatabase.Open

strNumberOld = ""

'strPartID = PART_ID
strPartID = "BEARING0599"  'Remove when finished testing
'strPartID = "BAR0089"  'Remove when finished testing
'strPartID = "PAINT0089"  'Remove when finished testing

For i = Len(strPartID) to 1 Step -1
  If (Mid(strPartID, i, 1) >= "0") And (Mid(strPartID, i, 1) <= "9") Then
    strNumberOld = Mid(strPartID, i, 1) & strNumberOld
  Else
    strPartIDNew = Left(strPartID, i)

    strSQL = "SELECT" & VBCrLf
    strSQL = strSQL & "  '" & strPartIDNew & "' || LPAD(TO_CHAR(MAX(TO_NUMBER(SUBSTR(ID," & (i+1) &"))) + 1), 6, '0') NEXT_PART_ID" & VBCrLf
    strSQL = strSQL & "FROM" & VBCrLf
    strSQL = strSQL & "  T1" & VBCrLf
    strSQL = strSQL & "WHERE" & VBCrLf
    strSQL = strSQL & "  ID BETWEEN '" & strPartIDNew & "0' AND '" & strPartIDNew & "99999999'"

    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
      If Not(snpData.EOF) Then
        strPartIDNew = snpData("next_part_id")
      End If
      snpData.Close
    End If
    Exit For
  End If
Next

If strPartIDNew <> "" Then
  Msgbox "The New Part ID is " & strPartIDNew
Else
  Msgbox "Not a Valid Starting Point" & strPartID
End If

dbDatabase.Close
Set snpData = Nothing
Set dbDatabase = Nothing 

The OP put together a parallel solution that also used a VBScript macro.  The macro sent a SQL statement very similar to the following to the database:

SELECT
  ID
FROM
  T1
WHERE
  ID LIKE 'BAR%'; 

In the macro code the OP parsed each of the returned ID values to determine the highest sequence number, added 1 to that value, padded the new highest sequence number with “0′ digits and output the result.  Most likely due to curiosity, the OP asked why I did not simply use his VBScript macro as part of my proposed solution.  What reasons do you think that I gave to the OP?





What Number Immediately Follows 3,999? Oracle Database Refuses to Answer

16 01 2012

January 16, 2012

I put together a test case to demonstrate how the physical reads autotrace statistic could exceed the consistent gets autotrace statistic if a single-pass or multi-pass workarea execution were performed during the execution of the SQL statement.  If you are interested, you can see the test case in this recent OTN thread.  If I recall correctly, index pre-fetching could also result in a similar situation where the physical reads autotrace statistic could exceed the consistent gets autotrace statistic.

The result in the OTN test case left me a little concerned.  It might not be well known, however it is possible to instruct Oracle Database to format normal base 10 numbers as Roman numerals with the RN format specification.  For example:

SELECT
  TO_CHAR(1, 'RN') ROMAN_NUMBER
FROM
  DUAL;

ROMAN_NUMBER
---------------
              I

SELECT
  TO_CHAR(12, 'RN') ROMAN_NUMBER
FROM
  DUAL;

ROMAN_NUMBER
---------------
            XII

SELECT
  TO_CHAR(123, 'RN') ROMAN_NUMBER
FROM
  DUAL;

ROMAN_NUMBER
---------------
         CXXIII

SELECT
  TO_CHAR(1234, 'RN') ROMAN_NUMBER
FROM
  DUAL;

ROMAN_NUMBER
---------------
       MCCXXXIV 

I personally think that is a neat feature.  In the OTN test case I created a table that would hopefully contain the first 9,999,999 Roman numbers:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(30) NOT NULL,
  C4 VARCHAR2(200));

INSERT INTO T1
SELECT
  MOD(ROWNUM-1, 90) * 4 C1,
  ROWNUM - 1 C2,
  TO_CHAR(ROWNUM - 1, 'RN') C3,
  LPAD('A',200,'A') C4
FROM
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=1000000),
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1') 

As part of the OTN test case, I thought that I would demonstrate a non-optimal work area execution by determining how many unique Roman numbers made it into the T1 test table:

SELECT
  COUNT(DISTINCT C3)
FROM
  T1;

COUNT(DISTINCTC3)
-----------------
             4000 

Just 4,000?  There are 10,000,000 rows in that test table.  So, maybe we should ask Oracle Database what number immediately follows 3,999 by querying the table:

SELECT
  C2,
  C3
FROM
  T1
WHERE
  C2 BETWEEN 3950 AND 4050
ORDER BY
  C2;

---------- ------------------------------
      3950          MMMCML
      3951         MMMCMLI
      3952        MMMCMLII
      3953       MMMCMLIII
      3954        MMMCMLIV
      3955         MMMCMLV
      3956        MMMCMLVI
      3957       MMMCMLVII
      3958      MMMCMLVIII
      3959        MMMCMLIX
      3960         MMMCMLX
      3961        MMMCMLXI
      3962       MMMCMLXII
      3963      MMMCMLXIII
      3964       MMMCMLXIV
      3965        MMMCMLXV
      3966       MMMCMLXVI
      3967      MMMCMLXVII
      3968     MMMCMLXVIII
      3969       MMMCMLXIX
      3970        MMMCMLXX
      3971       MMMCMLXXI
      3972      MMMCMLXXII
      3973     MMMCMLXXIII
      3974      MMMCMLXXIV
      3975       MMMCMLXXV
      3976      MMMCMLXXVI
      3977     MMMCMLXXVII
      3978    MMMCMLXXVIII
      3979      MMMCMLXXIX
      3980       MMMCMLXXX
      3981      MMMCMLXXXI
      3982     MMMCMLXXXII
      3983    MMMCMLXXXIII
      3984     MMMCMLXXXIV
      3985      MMMCMLXXXV
      3986     MMMCMLXXXVI
      3987    MMMCMLXXXVII
      3988   MMMCMLXXXVIII
      3989     MMMCMLXXXIX
      3990         MMMCMXC
      3991        MMMCMXCI
      3992       MMMCMXCII
      3993      MMMCMXCIII
      3994       MMMCMXCIV
      3995        MMMCMXCV
      3996       MMMCMXCVI
      3997      MMMCMXCVII
      3998     MMMCMXCVIII
      3999       MMMCMXCIX
      4000 ###############
      4001 ###############
      4002 ###############
      4003 ###############
      4004 ###############
      4005 ###############
      4006 ###############
      4007 ###############
      4008 ###############
      4009 ###############
      4010 ###############
      4011 ###############
      4012 ###############
      4013 ###############
      4014 ###############
      4015 ###############
      4016 ###############
      4017 ###############
      4018 ###############
      4019 ###############
      4020 ###############
      4021 ###############
      4022 ###############
      4023 ###############
      4024 ###############
      4025 ###############
      4026 ###############
      4027 ###############
      4028 ###############
      4029 ###############
      4030 ###############
      4031 ###############
      4032 ###############
      4033 ###############
      4034 ###############
      4035 ###############
      4036 ###############
      4037 ###############
      4038 ###############
      4039 ###############
      4040 ###############
      4041 ###############
      4042 ###############
      4043 ###############
      4044 ###############
      4045 ###############
      4046 ###############
      4047 ###############
      4048 ###############
      4049 ###############
      4050 ############### 

You heard it here first, in Roman times 4,000 is equivalent to infinity.  An obvious extension to this rule is that in Roman times the value of PI had exactly 3999 digits to the right of the decimal point.   :-)





What is the Meaning of the %CPU Column in an Explain Plan? 2

1 01 2012

January 1, 2012

(Back to the Previous Post in the Series)

Nearly two years ago I posted the following execution plan in a blog article and asked whether or not there was anything strange about the %CPU column:

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |       |       |   247 (100)|          |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 | 10000 |  2236K|   247   (1)| 00:00:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          | 10000 |  2236K|   247   (1)| 00:00:03 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       | 10000 |  2236K|   247   (1)| 00:00:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C1"<=10000 AND "C1">=1)) 

For nearly two years no one mentioned that it is strange that the row in the execution plan with ID 0 can have a Cost of 247 with a %CPU of 100, while that operation’s child operations can have the same Cost of 247 with a %CPU of 1.  I had long forgotten about the challenge that I posed in the blog article, until someone mentioned the execution plan in an OTN forum thread and asked about that particular oddity.

In the earlier article I demonstrated querying the PLAN_TABLE after using EXPLAIN PLAN FOR to determine how the %CPU column is calculated.  Unfortunately, I did not perform that step two years ago for the SQL statement that was used to generate the above execution plan, so the challenge remains.

As best as I am able to determine, the following is the table creation script:

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 NUMBER NOT NULL,
  C4 NUMBER NOT NULL,
  C5 VARCHAR2(30) NOT NULL,
  C6 VARCHAR2(30) NOT NULL,
  FILLER VARCHAR2(200),
  PRIMARY KEY (C1));

INSERT INTO T1
SELECT
  ROWNUM,
  ROWNUM,
  TRUNC(ROWNUM/100+1),
  TRUNC(ROWNUM/100+1),
  CHR(65+TRUNC(ROWNUM/10000))||TRUNC(ROWNUM/100+1),
  CHR(65+TRUNC(ROWNUM/10000))||TRUNC(ROWNUM/100+1),
  LPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T1_C3 ON T1(C3);
CREATE INDEX IND_T1_C4 ON T1(C4);
CREATE INDEX IND_T1_C5 ON T1(C5);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR COLUMNS SIZE 254 C2, C4, C6') 

I originally created the execution plan on an unpatched version of Oracle Database 11.2.0.1 running on 64 bit Linux.  The following is the SQL statement that was used to generate the execution plan:

SELECT /*+ PARALLEL(4) */
  *
FROM
  T1
WHERE
  C1 BETWEEN 1 AND 10000; 

Unfortunately, the above SQL statement generates a bit of a different looking execution plan when not hinted to perform a full table scan.  In an Oracle Database 11.2.0.2 database, the following execution plan appeared (although a similar one also appears for 11.2.0.1):

Plan hash value: 2275811211

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   250 | 39750 |    17   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |   250 | 39750 |    17   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C0041509 |   450 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1">=1 AND "C1"<=10000)

Note
-----
   - Degree of Parallelism is 1 because of hint 

The execution plan did not employ parallel execution even though it was hinted, because the calculated cost for the index access was less than the calculated cost for the parallel full table scan.  Note also that the cardinality estimate is incorrect  – this query will actually return 10,000 rows (for a side challenge, how is it possible that the index is projected to return 450 rows, when the parent operation is expected to return just 250 rows without a filter predicate applied at the parent operation?).

Let’s add a couple of hints to fix the cardinality issue (note that this particular hint is undocumented), and force the parallel table scan:

DELETE FROM PLAN_TABLE;

EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(4) FULL(T1) CARDINALITY(T1 10000) */
  *
FROM
  T1
WHERE
  C1 BETWEEN 1 AND 10000;

SELECT
  ID,
  COST,
  IO_COST,
  COST-IO_COST DIFF,
  CEIL(DECODE(COST,0,0,(COST-IO_COST)/COST)*100) PER_CPU,
  CPU_COST
FROM
  PLAN_TABLE;

SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY); 

On the unpatched 11.2.0.1 database, the above script produced the following output:

ID       COST    IO_COST       DIFF    PER_CPU   CPU_COST
-- ---------- ---------- ---------- ---------- ----------
 0        247        246          1          1    5958333
 1
 2        247        246          1          1    5958333
 3        247        246          1          1    5958333
 4        247        246          1          1    5958333

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2494645258

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          | 10000 |  1552K|   247   (1)| 00:00:03 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("C1">=1 AND "C1"<=10000)

Note
-----
   - Degree of Parallelism is 4 because of hint 

The above execution plan is similar to the execution plan at the start of this article (even the costs are identical), yet not exactly the same.  The Predicate Information section of the execution plan is missing access(:Z>=:Z AND :Z<=:Z) and the %CPU column shows a value of 1 from bottom to top, rather than 100 for the top row in the plan.  The output from the query of PLAN_TABLE exactly matches the above output from DBMS_XPLAN.DISPLAY function.  So, what happened, how did I generate the execution plan that appears at the start of this blog article?

You might be curious – are we able to force the first line in the execution plan to show 100 in the %CPU column?  Let’s try an experiment where we manually change the IO_COST value for the row in the PLAN_TABLE where the ID column value is equal to 0:

UPDATE
  PLAN_TABLE
SET
  IO_COST=0
WHERE
  ID=0;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY); 

With the above change, this is the execution plan that I obtained:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2494645258

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          | 10000 |  1552K|   247 (100)| 00:00:03 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("C1">=1 AND "C1"<=10000)

Note
-----
   - Degree of Parallelism is 4 because of hint 

The first line in the above execution plan now shows 100 in the %CPU column – so if a 0 were present in that row/column combination in the PLAN_TABLE when I generated the execution plan at the start of this article, that might be one explanation for the strange output.  Note, however, that the Predicate Information section still does not match.  So, did I fake the execution plan through manipulation of PLAN_TABLE, or did I actually execute the SQL statement and pull from memory the actual execution plan:

SELECT /*+ PARALLEL(4) FULL(T1) CARDINALITY(T1 10000) */
  *
FROM
  T1
WHERE
  C1 BETWEEN 1 AND 10000;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL')); 

When I executed the above, this is the execution plan that appeared:

SQL_ID  4r3zujtat35jb, child number 0
-------------------------------------
SELECT /*+ PARALLEL(4) FULL(T1) CARDINALITY(T1 10000) */   * FROM   T1
WHERE   C1 BETWEEN 1 AND 10000

Plan hash value: 2494645258

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |       |       |   247 (100)|          |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       | 10000 |  1552K|   247   (1)| 00:00:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C1">=1 AND "C1"<=10000))

Note
-----
   - Degree of Parallelism is 4 because of hint 

Notice in the above execution plan that the %CPU column matches the values in the execution plan that appears at the start of this blog article.  Also note that the missing access(:Z>=:Z AND :Z<=:Z) now appears in the Predicate Information section.  Another possible case of explain plan lies?  I wonder what is the point of the :Z >= :Z and :Z <= :Z access predicate?  That is the equivalent of :Z BETWEEN :Z AND :Z, or 1 BETWEEN 1 AND 1 (assuming that the value of the :Z bind variable is not NULL), if you prefer.

Now that I have reproduced the execution plan at the start of this blog article (ignoring the predicted Bytes column), why did the DBMS_XPLAN function display a value of 100 in the %CPU column of the first line in the execution plan?  Is it caused by a bug in DBMS_XPLAN, a bug in the 11.2.0.1 optimizer, or something else?

Recall earlier in this blog article that I tried an experiment of changing the IO_COST for a row in PLAN_TABLE to a value of 0, and that change resulted in a value of 100 appearing in the %CPU column – does that mean that the IO_COST for the first row in the execution plan is 0?  In an execution plan, the costs shown for parent operations includes the cost shown for child operations…

In this case we cannot query the PLAN_TABLE, but we are able to query V$SQL_PLAN for the same type of information using the SQL_ID and Plan hash value that appeared in the above DBMS_XPLAN output:

SELECT
  ID,
  COST,
  IO_COST,
  COST-IO_COST DIFF,
  CPU_COST
FROM
  V$SQL_PLAN
WHERE
  SQL_ID='4r3zujtat35jb'
  AND PLAN_HASH_VALUE=2494645258
ORDER BY
  ID;

        ID       COST    IO_COST       DIFF   CPU_COST
---------- ---------- ---------- ---------- ----------
         0        247
         1
         2        247        246          1    5958333
         3        247        246          1    5958333
         4        247        246          1    5958333 

A NULL value for the IO_COST in execution plan line ID 0 (the first line in the execution plan)…

In the previous article, I proposed that the %CPU column is calculated with the following formula:

CEIL(DECODE(COST,0,0,(COST-IO_COST)/COST)*100) PER_CPU 

If the above were true, then a NULL would appear in the %CPU column for the first row.  It appears that I need to adjust the formula slightly to account for the possibility of a NULL value in the IO_COST column and handle it as if the value were 0 (who says that NULL is not equal to 0?).

CEIL(DECODE(COST,0,0,(COST-NVL(IO_COST,0))/COST)*100) PER_CPU

Later in the OTN thread the OP expressed concern about two SQL statements and their execution plans, where the “the cost per se of a single execution is low, but since the query is executed many number of times, the cpu consumption is hitting a high.”  Feel free to read the OTN thread to see my comments related to this concern.





BIN$ Index Found in the Execution Plan – Digging through the Recycle Bin

28 11 2011

November 28, 2011

There are a few articles that can be found on the Internet that describe the cause of indexes with names similar to BIN$ld5VAtb88PngQAB/AQF8hg==$0 and BIN$PESygWW5R0WhbOaDugxqwQ==$0 appearing in execution plans.  As is likely known by readers, the Oracle Database documentation describes that these object names are associated with the recycle bin that was introduced in Oracle Database 10.1.  When an object is dropped (but not purged), it is placed into the recycle bin with a name that begins with BIN$ and ends with ==$ followed by a number (the version, which in brief testing seems to always be 0).

I have answered this question a couple of times in the past in various Oracle Database forums, including a recent OTN thread.  What is the significance of having an index named, for instance, BIN$ld5VAtb88PngQAB/AQF8hg==$0 in an execution plan.  Does that mean that Oracle’s query optimizer has selected to use a dropped index?  No.  The simple answer is that the table to which the index belongs was dropped and then flashed back to before the drop.  When this happens, the table name is restored to its original name, but the names of the associated indexes are not restored.

A quick test case to demonstrate.  First, we will create a table with an index, and then collect statistics:

DROP TABLE T1 PURGE;

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  RPAD('A',255,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE) 

Let’s put together a simple SQL statement that will hopefully use the index, and then confirm that the index was in fact used by displaying the actual execution plan used for the SQL statement:

SELECT
  C1,
  SUBSTR(C2,1,2)
FROM
  T1
WHERE
  C1<=5;

 C1 SU
--- --
  1 AA
  2 AA
  3 AA
  4 AA
  5 AA

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,NULL));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  g0kkvxqg3v145, child number 0
-------------------------------------
SELECT   C1,   SUBSTR(C2,1,2) FROM   T1 WHERE   C1<=5

Plan hash value: 683303157

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |     5 |  1300 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C1 |     5 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=5) 

The above plan shows that the IND_T1_C1 index was used for this particular SQL statement.

Next, we will purge the recycle bin (make certain that there is nothing useful in the recycle bin first), drop the index, and see if it can still be used in an execution plan:

DROP INDEX IND_T1_C1;

SELECT
  C1,
  SUBSTR(C2,1,2)
FROM
  T1
WHERE
  C1<=5;

 C1 SU
--- --
  1 AA
  2 AA
  3 AA
  4 AA
  5 AA

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,NULL));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  g0kkvxqg3v145, child number 0
-------------------------------------
SELECT   C1,   SUBSTR(C2,1,2) FROM   T1 WHERE   C1<=5

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   137 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     5 |  1300 |   137   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=5)

/* SHOW RECYCLEBIN  should be roughly equivalent to the following SQL statement */
SELECT
  OBJECT_NAME,
  ORIGINAL_NAME,
  TYPE,
  CREATETIME
FROM
  RECYCLEBIN
ORDER BY
  TYPE,
  ORIGINAL_NAME,
  OBJECT_NAME;

no rows selected 

From the above, it is apparent that an index that is dropped will not appear in an execution plan.  The dropped index is not in the recycle bin either.

Let’s recreate the index:

CREATE INDEX IND_T1_C1 ON T1(C1);

Then, using SQL statements similar to those at the start of this article (without the PURGE clause in the DROP TABLE statement), we will drop and recreate the table:

DROP TABLE T1;

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  RPAD('A',255,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE) 

Next, let’s take a look at the recycle bin (note that we could simply execute SHOW RECYCLEBIN rather than execute the SQL statement, but the index would not be listed using that method):

SELECT
  OBJECT_NAME,
  ORIGINAL_NAME,
  TYPE,
  CREATETIME
FROM
  RECYCLEBIN
ORDER BY
  TYPE,
  ORIGINAL_NAME,
  OBJECT_NAME,
  CREATETIME;

OBJECT_NAME                    ORIGINAL_N TYPE  CREATETIME
------------------------------ ---------- ----- -------------------
BIN$cU4bWUSaSu2PUYdJvOq+hA==$0 IND_T1_C1  INDEX 2011-11-28:07:21:30
BIN$2smXLnTGTSqcBa8SJucvtg==$0 T1         TABLE 2011-11-28:07:21:30 

The above shows that we now have one table and its index in the recycle bin.  Let’s repeat the drop and recreate:

DROP TABLE T1;

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  RPAD('A',255,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE) 

Now let’s take a look at the recycle bin’s contents again:

SELECT
  OBJECT_NAME,
  ORIGINAL_NAME,
  TYPE,
  CREATETIME
FROM
  RECYCLEBIN
ORDER BY
  TYPE,
  ORIGINAL_NAME,
  OBJECT_NAME,
  CREATETIME;

OBJECT_NAME                    ORIGINAL_N TYPE  CREATETIME
------------------------------ ---------- ----- -------------------
BIN$cU4bWUSaSu2PUYdJvOq+hA==$0 IND_T1_C1  INDEX 2011-11-28:07:21:30
BIN$/40oC3RJSNiLmEESZ7VNEw==$0 IND_T1_C1  INDEX 2011-11-28:07:21:48
BIN$2smXLnTGTSqcBa8SJucvtg==$0 T1         TABLE 2011-11-28:07:21:30
BIN$nYId4wdGRf6IgpSXSDb4Kw==$0 T1         TABLE 2011-11-28:07:21:48 

The above output now shows that there are two tables and their associated indexes in the recycle bin.  Let’s recover one of those tables and its index:

FLASHBACK TABLE T1 TO BEFORE DROP;

Flashback complete. 

A quick check of the recycle bin shows that the most recently dropped table and its associated index are no longer in the recycle bin, but the older version of table T1 and its index are still in the recycle bin:

SELECT
  OBJECT_NAME,
  ORIGINAL_NAME,
  TYPE,
  CREATETIME
FROM
  RECYCLEBIN
ORDER BY
  TYPE,
  ORIGINAL_NAME,
  OBJECT_NAME,
  CREATETIME; 

OBJECT_NAME                    ORIGINAL_N TYPE  CREATETIME
------------------------------ ---------- ----- -------------------
BIN$cU4bWUSaSu2PUYdJvOq+hA==$0 IND_T1_C1  INDEX 2011-11-28:07:21:30
BIN$2smXLnTGTSqcBa8SJucvtg==$0 T1         TABLE 2011-11-28:07:21:30

Let’s re-execute the SQL statement that queries table T1:

SELECT
  C1,
  SUBSTR(C2,1,2)
FROM
  T1
WHERE
  C1<=5;

 C1 SU
--- --
  1 AA
  2 AA
  3 AA
  4 AA
  5 AA 

So, after recoving the table from the recycle bin, we are able to query the table.  Let’s take a look at the execution plan for this query:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,NULL));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID  g0kkvxqg3v145, child number 0
-------------------------------------
SELECT   C1,   SUBSTR(C2,1,2) FROM   T1 WHERE   C1<=5

Plan hash value: 1395723482

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                                |       |       |     3 (100)|       |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1                             |     5 |  1300 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIN$/40oC3RJSNiLmEESZ7VNEw==$0 |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=5) 

Notice in the above execution plan, the index name of BIN$/40oC3RJSNiLmEESZ7VNEw==$0 – that is what the index was named when it was sent to the recycle bin.  Let’s fix the odd BIN$ name and re-execute the query:

ALTER INDEX "BIN$/40oC3RJSNiLmEESZ7VNEw==$0" RENAME TO IND_T1_C1;

Index altered. 

SELECT
  C1,
  SUBSTR(C2,1,2)
FROM
  T1
WHERE
  C1<=5;

 C1 SU
--- --
  1 AA
  2 AA
  3 AA
  4 AA
  5 AA

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,NULL));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  g0kkvxqg3v145, child number 0
-------------------------------------
SELECT   C1,   SUBSTR(C2,1,2) FROM   T1 WHERE   C1<=5

Plan hash value: 683303157

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |     5 |  1300 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C1 |     5 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=5)

So, the above shows how a BIN$ named index might appear in an execution plan, and how to fix the name.

We still have one table and its index in the recycle bin.  Let’s take a quick look at that table:

SELECT
  C1,
  SUBSTR(C2,1,2)
FROM
  "BIN$2smXLnTGTSqcBa8SJucvtg==$0"
WHERE
  C1<=5;

 C1 SU
--- --
  1 AA
  2 AA
  3 AA
  4 AA
  5 AA 

We are able to query the table that is in the recycle bin, as long as we enclose the table name (OBJECT_NAME in the query of RECYCLEBIN) in quotation marks ().  Let’s take a look at the execution plan for the previous SQL statement:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,NULL));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID  a2a2vcsbtw5ac, child number 0
-------------------------------------
SELECT   C1,   SUBSTR(C2,1,2) FROM   "BIN$2smXLnTGTSqcBa8SJucvtg==$0"
WHERE   C1<=5

Plan hash value: 3681245720

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                                |       |       |     3 (100)|       |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIN$2smXLnTGTSqcBa8SJucvtg==$0 |     5 |  1300 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIN$cU4bWUSaSu2PUYdJvOq+hA==$0 |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=5) 

Now we have both a BIN$ prefixed table and index in the execution plan.

Let’s recover the old version of the T1 table (renaming it when it is recovered) and rename its associated recovered index:

FLASHBACK TABLE T1 TO BEFORE DROP RENAME TO T1_OLD;

Flashback complete.

ALTER INDEX "BIN$cU4bWUSaSu2PUYdJvOq+hA==$0" RENAME TO IND_T1_OLD_C1;

Index altered. 

Let’s query the recovered table (now called T1_OLD) and check the execution plan:

SELECT
  C1,
  SUBSTR(C2,1,2)
FROM
  T1_OLD
WHERE
  C1<=5;

 C1 SU
--- --
  1 AA
  2 AA
  3 AA
  4 AA
  5 AA

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,NULL));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  bv1ty7jq2hc5g, child number 0
-------------------------------------
SELECT   C1,   SUBSTR(C2,1,2) FROM   T1_OLD WHERE   C1<=5

Plan hash value: 3358254750

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1_OLD        |     5 |  1300 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_OLD_C1 |     5 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=5) 

Now let’s make certain that there is nothing in the recycle bin:

SELECT
  OBJECT_NAME,
  ORIGINAL_NAME,
  TYPE,
  CREATETIME
FROM
  RECYCLEBIN
ORDER BY
  TYPE,
  ORIGINAL_NAME,
  OBJECT_NAME,
  CREATETIME;

no rows selected 

The experimental tables and indexes are not in the recycle bin.  Let’s drop the experimental tables that we recovered from the recycle bin, this time skipping the recycle bin:

DROP TABLE T1 PURGE;
DROP TABLE T1_OLD PURGE; 

Just to confirm that the tables and their associated indexes are not in the recycle bin:

SELECT
  OBJECT_NAME,
  ORIGINAL_NAME,
  TYPE,
  CREATETIME
FROM
  RECYCLEBIN
ORDER BY
  TYPE,
  ORIGINAL_NAME,
  OBJECT_NAME,
  CREATETIME;

no rows selected 

—–

Hopefully, by now everyone is aware of those BIN$ prefixed object names that might appear in execution plans, what causes the names to be created, and how to fix the names.





Why Isn’t My Index Used… When USER2 Executes this Query?

23 11 2011

November 23, 2011

I previously wrote a couple of articles that mention reasons why an index might not be use for a particular query, including an article that was formatted as a True or False quiz with several reference articles.  A few days ago I saw an OTN thread that caught my curiosity, where the original poster (OP) claimed that the optimizer simply will not use an index to access a table when any user other than the schema owner or the SYS user executes a particular query.

Why is the OP attempting to execute the SQL statement as the SYS user?  The SYS user is special.  As mentioned in my review of the book “Practical Oracle 8i“, as I read the book I wrote the following paraphrase into my notes, the SYS user is special:

Oracle 8i introduces row level security, which uses a PL/SQL function to apply an additional WHERE clause predicate to a table – row level security does not apply to the SYS user. It is important to use CONSISTENT=Y when exporting partitioned tables. When CONSISTENT=N is specified, the export of each partition in a table is treated as a separate transaction, and may be exported at a different SCN number (incremented when any session commits). When tables are exported which contain nested tables, the two physical segments are exported in separate transactions, potentially resulting in inconsistent data during the import if the export was performed with the default CONSISTENT=N.

Is the above paraphrase from this 10 year old book a clue?  Maybe it is a problem related to secure view merging because the SQL statement uses the index when the schema owner executes the SQL statement (there is a very good example of this type of problem found in the book “Troubleshooting Oracle Performance“).  Maybe it is a problem where the public synonym for the table actually points to a view or an entirely different table – the execution plan for the non-schema owner did show a VIEW operation, while the execution plan for the schema owner did not show the VIEW operation.  Maybe it is a problem where the optimizer parameters are adjusted differently for different users – in such a case we might need to dig into the V$SYS_OPTIMIZER_ENV, V$SES_OPTIMIZER_ENV, and V$SQL_OPTIMIZER_ENV views.

Maybe taking a look at the DBMS_XPLAN output would help.  Why does the Predicate Information section of the execution plan show the following only for the non-schema owner?

7 - filter(("SEAL_FLAG" IS NULL OR "SEAL_FLAG"<>'Y'))
9 - filter(("SEAL_FLAG" IS NULL OR "SEAL_FLAG"<>'Y'))
11 - filter(("SEAL_FLAG"<>'Y' OR "SEAL_FLAG" IS NULL))
13 - filter(("SEAL_FLAG"<>'Y' OR "SEAL_FLAG" IS NULL))
19 - filter(("SEAL_FLAG"<>'Y' OR "SEAL_FLAG" IS NULL)) 

A significant clue?  If those predicates were also found in the DBMS_XPLAN generated output for the schema owner (and the SYS user), I would probably conclude that the optimizer generated those additional predicates from defined column constraints, and that a review of a 10053 trace file might help determine what caused those predicates to be automatically created.  However, those predicates did not appear in the execution plan that was generated for the schema owner.  It might be time to start checking the V$VPD_POLICY view for this particular SQL_ID, for example (a completely unrelated test case output):

SELECT
  *
FROM
  V$VPD_POLICY
WHERE
  SQL_ID='6hqw5p9d8g8wf';

ADDRESS          PARADDR            SQL_HASH SQL_ID        CHILD_NUMBER OBJECT_OWNER OBJECT_NAME                    POLICY_GROUP                   POLICY                 POLICY_FUNCTION_OWNER          PREDICATE
---------------- ---------------- ---------- ------------- ------------ ------------ ------------------------------ ------------------------------ ---------------------- ------------------------------ ------------------------------------------------------------------------------------
000007FFB7701608 000007FFB7743350 1518838670 6hqw5p9d8g8wf            0 TESTUSER     T12                            SYS_DEFAULT                    T_SEC                  TESTUSER                       ID < 10 

Maybe we should also check some of the other virtual private database (VPD) related views including ALL_POLICIES (once again from a completely unrelated test case):

SELECT
  *
FROM
  ALL_POLICIES;

OBJECT_OWNER                   OBJECT_NAME                    POLICY_GROUP                  POLICY_NAME                    PF_OWNER                       PACKAGE                       FUNCTION                       SEL INS UPD DEL IDX CHK ENA STA POLICY_TYPE              LON
------------------------------ ------------------------------ ----------------------------- ------------------------------ ------------------------------ ----------------------------- ------------------------------ --- --- --- --- --- --- --- --- ------------------------ ---
TESTUSER                       T12                            SYS_DEFAULT                   T_SEC                          TESTUSER                       S                                                            YES YES YES YES NO  NO  YES NO  DYNAMIC                  NO 

There are known performance problems related to the use of VPD, some of which are Oracle Database version dependent, and some of which have been corrected in recent versions.  Maybe a quick check of one of the following articles would help, if the OP finds that VPD is in fact in use (the second article provides step by step directions for investigation):

  • Metalink (MOS) Doc ID 728292.1 “Known Performance Issues When Using TDE and Indexes on the Encrypted Columns”
  • Metalink (MOS) Doc ID 967042.1 “How to Investigate Query Performance Regressions Caused by VPD (FGAC) Predicates?”

Take a look at the OTN thread.  Any other suggestions for the OP?





Select For Update – In What Order are the Rows Locked?

21 11 2011

November 21, 2011

A recent thread in the comp.databases.oracle.server usenet group asked whether or not a SELECT FOR UPDATE statement locks rows in the order specified by the ORDER BY clause.  Why might this be an important question?  Possibly if the SKIP LOCKED clause is implemented in the SELECT FOR UPDATE statement?  Possibly if a procedure is hanging, and it is important to know at what point an enqueue happened?  Any other reasons?

Without testing, I would have suggested that the rows are locked as the rows are read from the table blocks, and not after the ORDER BY clause alters the order of the rows.  Why?  Oracle Database is fundamentally lazy, or put another way, fundamentally efficient – in general it does not perform unnecessary work.  Locking the rows after applying the ORDER BY clause would require a second visit to the table blocks (possibly having to visit each block multiple times to lock different rows in the same block) in the order specified by the ORDER BY clause.  Such an approach could be incredibly inefficient and also error prone (what happens if a row was locked by a second session while the first session was sorting the rows per the ORDER BY clause?).

We could guess, but why guess when we are able to easily test the theory?  Let’s create a simple table with 10,000 rows:

DROP TABLE T1 PURGE;

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  RPAD('A',255,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT; 

Now we need 2 sessions (I will call them Session 1 and Session 2).  We will execute the same SELECT FOR UPDATE statement in both sessions, with Session 1 first selecting the table rows in ascending order and then Session 2 selecting the table rows in descending order.  If the ORDER BY clause determines the order in which the rows are locked, the row with a C1 value of 10,000 should be identified as the row that caused the enqueue because that is the first row that should be returned to Session 2.

In Session 1:

SELECT
  C1,
  C2
FROM
  T1
WHERE
  MOD(C1,100)=0
ORDER BY
  C1
FOR UPDATE; 

In Session 2:

SELECT
  C1,
  C2
FROM
  T1
WHERE
  MOD(C1,100)=0
ORDER BY
  C1 DESC
FOR UPDATE; 

(Session 2 is hung)

Let’s try to identify the row that Session 2 is waiting to lock.  There are a couple of ways to accomplish this task (on Oracle Database 10.1 and higher the join to the V$SESSION_WAIT view is unnecessary because that information is found in V$SESSION).

In Session 1:

COLUMN EVENT FORMAT A20
COLUMN OBJ# FORMAT 999999
COLUMN BLOCK# FORMAT 999999
COLUMN ROW# FORMAT 99

SELECT
  SW.EVENT,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

EVENT                   OBJ#      FILE#  BLOCK# ROW#
-------------------- ------- ---------- ------- ----
enq: TX - row lock c   71913          4    4262   18
ontention 

We have the object ID (71913 – technically we need the DATA_OBJECT_ID, not the OBJECT_ID for the DBMS_ROWID.ROWID_CREATE call, but the two values will be identical in this test case), absolute file number (4), block (4262), and row (18) in the block that caused the enqueue.  Let’s select that row from the table:

SELECT
  C1
FROM
  T1
WHERE
  ROWID=DBMS_ROWID.ROWID_CREATE(1, 71913, 4, 4262, 18);

 C1
---
100 

The row with a C1 value of 100 is the first row that was returned by Session 1 (it will be the last row requested to be returned by Session 2), but it is also the first row that matched the WHERE clause predicates for Session 2’s SQL statement as the blocks were read (you could confirm the order in which the blocks are read by flushing the buffer cache before the select, setting the DB_FILE_MULTIBLOCK_READ_COUNT parameter at the session level to 1, enabling a 10046 trace, and then executing the SELECT FOR UPDATE statement).

Let’s retrieve the execution plan for Session 2 to determine why the original poster (OP) might have throught that the rows were locked after the ORDER BY is applied (note that the SQL_ID and SQL_CHILD_NUMBER columns only exist in V$SESSION as of Oracle Database 10.1, so technically the join to V$SESSION_WAIT is unnecessary; however, for consistency with the previous SQL statement that determined the locked row, the join to the V$SESSION_WAIT view is included):

SELECT
  S.SQL_ID,
  S.SQL_CHILD_NUMBER
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SQL_ID        SQL_CHILD_NUMBER
------------- ----------------
2dnpymtj0rc1r                0 

Now retrieving the execution plan:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('2dnpymtj0rc1r',0,'TYPICAL'));

SQL_ID  2dnpymtj0rc1r, child number 0
-------------------------------------
SELECT   C1,   C2 FROM   T1 WHERE   MOD(C1,100)=0 ORDER BY   C1 DESC
FOR UPDATE

Plan hash value: 3991553210

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    33 (100)|          |
|   1 |  FOR UPDATE         |      |       |       |            |          |
|   2 |   SORT ORDER BY     |      |   123 | 17466 |    33   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   123 | 17466 |    32   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MOD("C1",100)=0)

Note
-----
   - dynamic sampling used for this statement (level=2) 

As mentioned by the OP, the execution plan appears to be slightly misleading – unless of course you remember that locking the rows after sorting the rows based on the ORDER BY clause  would require revisiting the rows in the table blocks.  The situation could be different in this test case if there was an index on column C1.  In such a case the index could be read in descending order, thus making it appear that the rows were attempted to be locked in the order described by the ORDER BY clause.

Let’s slightly adapt the original test case to test the second theory, that it could appear that the rows are locked in the order specified by the ORDER BY clause.

In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

Now we will create an index on column C1.

In Session 1:

CREATE INDEX IND_T1_C1 ON T1(C1);

Now the revised test begins.

In Session 1:

SELECT /*+ INDEX(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1
FOR UPDATE;

In Session 2:

SELECT /*+ INDEX(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1 DESC
FOR UPDATE;

(Session 2 is hung)

Let’s try to identify the row that Session 2 is waiting to lock.

In Session 1:

COLUMN EVENT FORMAT A20
COLUMN OBJ# FORMAT 999999
COLUMN BLOCK# FORMAT 999999
COLUMN ROW# FORMAT 99

SELECT
  SW.EVENT,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

EVENT                   OBJ#      FILE#  BLOCK# ROW#
-------------------- ------- ---------- ------- ----
enq: TX - row lock c   71913          4    4259    0
ontention 

Notice in the above output, now block number 4259 is identified, while in the earlier test script block number 4262 was identified by the above SQL statement.  We have the object ID (we actually need the DATA_OBJECT_ID, but the values will be the same in this test case), the absolute file number, the block number, and the row number in the block (0 in this case).  Let’s select that row from the table:

SELECT
  C1
FROM
  T1
WHERE
  ROWID=DBMS_ROWID.ROWID_CREATE(1, 71913, 4, 4259, 0);

 C1
---
 1 

The row with a C1 value of 1 is the first row that was returned by Session 1 (it will be the last row requested to be returned by Session 2), but it is also the first row that matched the WHERE clause predicates for Session 2’s SQL statement as the blocks were read.  This is the same result as we saw before – maybe it does not matter whether an index access path is used to avoid the sort operation that would be otherwise required to satisfy the ORDER BY clause – is the second theory false?  Before making that determination, let’s take a look at the execution plan for Session 2’s SQL statement:

SELECT
  S.SQL_ID,
  S.SQL_CHILD_NUMBER
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SQL_ID        SQL_CHILD_NUMBER
------------- ----------------
3yz7pu7rw5cw0                0

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('3yz7pu7rw5cw0',0,'TYPICAL'));

Plan hash value: 3432103074

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |       |       |     7 (100)|          |
|   1 |  FOR UPDATE                   |           |       |       |            |          |
|   2 |   SORT ORDER BY               |           |   100 | 14200 |     7  (15)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1        |   100 | 14200 |     6   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_C1 |   100 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2)

The index that we created was definitely used, but notice that there is still a SORT ORDER BY operation in the execution plan.  The rows in the index were read in ascending order, not descending order!  Let’s try again using an INDEX_DESC hint in the SQL statement for Session 2.

In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

In Session 1:

SELECT /*+ INDEX(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1
FOR UPDATE;

In Session 2:

SELECT /*+ INDEX_DESC(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1 DESC
FOR UPDATE;

(Session 2 is hung)

Let’s determine the row that Session 2 is waiting to lock.

In Session 1:

SELECT
  SW.EVENT,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

EVENT                   OBJ#      FILE#  BLOCK# ROW#
-------------------- ------- ---------- ------- ----
enq: TX - row lock c   71913          4    4262   18
ontention

SELECT
  C1
FROM
  T1
WHERE
  ROWID=DBMS_ROWID.ROWID_CREATE(1, 71913, 4, 4262, 18);

 C1
---
100

The above output now shows that Session 2 is attempting to lock the first row (with a C1 value of 100) that it intends to return.  Has the execution plan changed?

SELECT
  S.SQL_ID,
  S.SQL_CHILD_NUMBER
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SQL_ID        SQL_CHILD_NUMBER
------------- ----------------
806mtjxk7k1dv                0

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('806mtjxk7k1dv',0,'TYPICAL'));

Plan hash value: 3814195162

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |       |       |     6 (100)|          |
|   1 |  FOR UPDATE                    |           |       |       |            |          |
|   2 |   BUFFER SORT                  |           |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID | T1        |   100 | 14200 |     6   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| IND_T1_C1 |   100 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2) 

Notice in the above execution plan that there is no longer a SORT ORDER BY operation in the execution plan (replaced by a BUFFER SORT operation) and the INDEX RANGE SCAN operation was also replaced by an INDEX RANGE SCAN DESCENDING operation.  Simply because the index range scan is performed in the same order as specified by the ORDER BY clause, the rows are locked in the same order as is specified by the ORDER BY clause – that is, after all, the order in which the rows were touched.

Any other ideas for a demonstration of the order in which rows are locked when a SELECT FOR UPDATE is used?








Follow

Get every new post delivered to your Inbox.

Join 139 other followers