Everything or Nothing in SQL

May 23, 2013

The following question recently came through an ERP mailing list (significantly rephrased):

I would like to use the Microsoft Query tool in Microsoft Excel to extract records from the ERP database.  I would like the list of parent records to be retrieved into Excel when all of the specified child records belong to the parent.

That sounds as though it would be an easy request.  The OP in the ERP mailing list later followed up with the following:

“If I may ask for another variation to pick your brain….

How would I rewrite this to show ONLY the work order(s) that have all of the 3 Parts, but instead of showing all of them, show only the ones that don’t include a specific other part?

For example, Include all the work order retrieved from your query, but then subtract those work orders that don’t contain a part or parts in another separate list.

I.E. All order containing parts (‘147555’,’227643’,’155328’) but not containing ANY of these parts (‘12345’, ‘456789’)”

In the above follow up, the term “work order(s)” refers to the “parent records” mentioned previously, and the term “parts” refers to the “child records” mentioned previously.  Below is a graphical mock-up, with the light-green colored rectangles representing the “parts”, and the white colored rectangle at the bottom left representing the “work order”.  The other white colored rectangles represent subassemblies that belong to the “work order”, and the cyan colored rectangles represent the operations that utilize the “parts”.  Through normalization, the white colored rectangles represent rows in one database table (WORK_ORDER – the “parent” table), the cyan colored rectangles represent rows in another table (OPERATION – the “child” table), and the light-green colored rectangles represent rows in a third table (REQUIREMENT – the “grandchild” table).

[Figure: EverythingOrNothing – mock-up of a “work order” (bottom left, white), its subassemblies (white), operations (cyan), and “parts” (light-green)]

The first request still seems to be easy to solve, even though there are three tables that are potentially involved.  Let’s simplify the problem further and just focus on the “grandchild” table.  Let’s create a small version of that table and populate the table with some sample rows:

CREATE TABLE T1 (
  WORKORDER_TYPE VARCHAR2(1),
  WORKORDER_BASE_ID VARCHAR2(15),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SUB_ID VARCHAR2(3),
  PIECE_NO NUMBER,
  PART_ID VARCHAR2(15),
  QTY NUMBER(12,4));

INSERT INTO
  T1
SELECT
  'W' WORKORDER_TYPE,
  TRIM(TO_CHAR(TRUNC(ROWNUM / 100 + 1)*1000,'00000')) || 'W' WORKORDER_BASE_ID,
  TO_CHAR(MOD(TRUNC(ROWNUM / 25 + 1), 999) + 1) WORKORDER_LOT_ID,
  TO_CHAR(TRUNC(MOD(ROWNUM - 1, 25) / 5) + 1) WORKORDER_SUB_ID,
  MOD(ROWNUM - 1, 25) + 1 PIECE_NO,
  CHR(65 + MOD(ROWNUM - 1, 26)) || CHR(65 + MOD(ROWNUM - 1, 15)) || MOD(ROWNUM - 1, 10)  PART_ID,
  TRUNC(DBMS_RANDOM.VALUE*100) + 1 QTY
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

In the above, the combination of the WORKORDER_BASE_ID column and the WORKORDER_LOT_ID column represent the “work order” that was mentioned by the OP.  The WORKORDER_SUB_ID column identifies the subassembly number (answering the question of to which of the white colored rectangles does the “part” belong).  The PIECE_NO column is used to uniquely identify the rows that belong to a specific subassembly number (that is the final column in the REQUIREMENT table’s primary key).  The PART_ID column is the specific “part” that is attached to a specific “work order”.  The QTY column indicates the number of the “part” that is required by the “work order” – with the sample data, this is the only column that will differ from one execution of the table creation script to the next execution.

The sample data is a bit too consistent; however, we will ignore any issues that consistency may present for now.  Let’s execute a simple SQL statement that indicates how many from the following “parts” list appear in the “work orders”: MI8, NJ9, KG6 (note the use of DISTINCT – a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, and we do not want to repeatedly count those “parts”):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6');

W WORKORDER_BASE_ WOR WOR PART_ID                  C
- --------------- --- --- --------------- ----------
W 01000W          3   3   KG6                      3
W 01000W          3   3   MI8                      3
W 01000W          3   3   NJ9                      3
W 05000W          19  1   KG6                      3
W 05000W          19  1   MI8                      3
W 05000W          19  1   NJ9                      3
W 09000W          34  4   KG6                      3
W 09000W          34  4   MI8                      3
W 09000W          34  4   NJ9                      3

Only 9 rows returned, with three different WORKORDER_BASE_ID, WORKORDER_LOT_ID combinations.  For a moment, let’s ignore that the three “parts” appear in all three “work orders”.  If we wanted a list of those “work orders” where _all_ three of the “parts” are present, and not just _one or more_ of the three “parts”, we could simply slide the above SQL statement into an inline view and specify that our C generated column must be equal to (or greater than – the use of greater than will be explained later) 3:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9
W 09000W          34  4   KG6
W 09000W          34  4   MI8
W 09000W          34  4   NJ9

While trying to determine the OP’s business case for the requests, I wondered if the OP was attempting to replace a specific “part” with another “part” within the ERP package, and simply required a report to understand the progress of that manual “part” replacement procedure.  Let’s move forward with this assumption by replacing a row containing a specific PART_ID value with a new PART_ID value, but only for a single “work order”:

DELETE FROM
  T1
WHERE
  WORKORDER_TYPE='W'
  AND WORKORDER_BASE_ID='09000W'
  AND WORKORDER_LOT_ID='34'
  AND PART_ID='NJ9';

1 row deleted.

INSERT INTO T1 VALUES (
  'W',
  '09000W',
  '34',
  '4',
  10,
  'REPLACEMENT-NJ9',
  10);

1 row created.

Let’s check the result of the SQL statement that we were in the process of developing:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9

Now, there are only six rows returned from two “work orders” – the third “work order” is no longer included in the query results.

With the use of the C >= n syntax, we are able to find “work orders” with at least n of the specified “parts”.  So, if we were interested in finding cases where at least two of the three listed “parts” are attached to a “work order”, we just need to change the 3 to 2:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=2
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   NJ9
W 05000W          19  1   MI8
W 09000W          34  4   KG6
W 09000W          34  4   MI8

The third “work order” is again included in the query resultset.

Other than putting the results into Microsoft Excel, the sample query satisfies the first request of the OP.  If we were not interested in returning the WORKORDER_SUB_ID and PART_ID columns, we could have simply used the following SQL statement (note that COUNT(DISTINCT T.PART_ID), rather than COUNT(*), guards against a single “part” that appears on more than one subassembly being counted more than once):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(DISTINCT T.PART_ID)>=3;

W WORKORDER_BASE_ WOR
- --------------- ---
W 05000W          19
W 01000W          3

Now to work on the second request: retrieving the “work orders” only when all of a specified list of “parts” are attached to the “work order” and none of another list of “parts” are attached.  As a starting point, we will re-use the inline view from the previous solution.  The IN list will be modified to include the “parts” that must be attached to the “work order”, as well as the “parts” that must not be attached to the “work order”.  The COUNT analytic function must be modified to include either a DECODE or CASE structure to divide the “parts” into the “must” and “must not” lists so that a count of each may be obtained:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9');

W WORKORDER_BASE_ WOR WOR PART_ID                  C         NC
- --------------- --- --- --------------- ---------- ----------
W 01000W          3   3   KG6                      2          0
W 01000W          3   3   MI8                      2          0
W 05000W          19  1   KG6                      2          0
W 05000W          19  1   MI8                      2          0
W 09000W          34  4   KG6                      2          1
W 09000W          34  4   MI8                      2          1
W 09000W          34  4   REPLACEMENT-NJ9          2          1

To satisfy the OP’s second request, we are interested in those rows where the C column value is (at least) 2 and the NC column value is 0.  Just as before, we will slide the above SQL statement into an inline view and add the specifications for the C and NC generated columns:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=0
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 05000W          19  1   KG6
W 05000W          19  1   MI8

The above shows those “work orders” where the NJ9 “part” had not yet been replaced with the REPLACEMENT-NJ9 “part”.  If desired, we are also able to easily modify the OP’s second request to see the “work orders” where the “part” was already replaced just by changing NC=0 to NC=1:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=1
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- ---------------
W 09000W          34  4   KG6
W 09000W          34  4   REPLACEMENT-NJ9
W 09000W          34  4   MI8

What other related information are we able to return?  What about all “parts” (the full bill of material) for each of the work orders that match the OP’s second request?  That modification is simple: just slide the majority of the above SQL statement into a subquery (found in the first WHERE clause):

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=0)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 01000W          3   1            1 ZK5                     48
W 01000W          3   1            2 AL6                     60
W 01000W          3   1            3 BM7                     42
W 01000W          3   1            4 CN8                     26
W 01000W          3   1            5 DO9                     49
W 01000W          3   2            6 EA0                     66
W 01000W          3   2            7 FB1                     39
W 01000W          3   2            8 GC2                     57
W 01000W          3   2            9 HD3                     42
W 01000W          3   2           10 IE4                     52
W 01000W          3   3           11 JF5                     16
W 01000W          3   3           12 KG6                     61
W 01000W          3   3           13 LH7                     39
W 01000W          3   3           14 MI8                     75
W 01000W          3   3           15 NJ9                     94
W 01000W          3   4           16 OK0                     67
W 01000W          3   4           17 PL1                     12
W 01000W          3   4           18 QM2                     41
W 01000W          3   4           19 RN3                     97
W 01000W          3   4           20 SO4                     15
W 01000W          3   5           21 TA5                     54
W 01000W          3   5           22 UB6                     59
W 01000W          3   5           23 VC7                     23
W 01000W          3   5           24 WD8                      7
W 01000W          3   5           25 YJ4                     22
W 05000W          19  1            1 JF5                     31
W 05000W          19  1            2 KG6                     42
W 05000W          19  1            3 LH7                     37
W 05000W          19  1            4 MI8                     85
W 05000W          19  1            5 NJ9                    100
W 05000W          19  2            6 OK0                     43
W 05000W          19  2            7 PL1                     58
W 05000W          19  2            8 QM2                     76
W 05000W          19  2            9 RN3                     66
W 05000W          19  2           10 SO4                     75
W 05000W          19  3           11 TA5                     50
W 05000W          19  3           12 UB6                     55
W 05000W          19  3           13 VC7                     18
W 05000W          19  3           14 WD8                     33
W 05000W          19  3           15 XE9                      8
W 05000W          19  4           16 YF0                     86
W 05000W          19  4           17 ZG1                      7
W 05000W          19  4           18 AH2                     25
W 05000W          19  4           19 BI3                     38
W 05000W          19  4           20 CJ4                     34
W 05000W          19  5           21 DK5                     88
W 05000W          19  5           22 EL6                     91
W 05000W          19  5           23 FM7                     52
W 05000W          19  5           24 GN8                     71
W 05000W          19  5           25 IE4                     55

Just as before, we are able to see all “parts” for the “work orders” that already have the replacement part by changing the NC=0 to NC=1:

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=1)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 09000W          34  1            1 UF0                     80
W 09000W          34  1            2 VG1                     75
W 09000W          34  1            3 WH2                     75
W 09000W          34  1            4 XI3                     77
W 09000W          34  1            5 YJ4                     46
W 09000W          34  2            6 ZK5                     57
W 09000W          34  2            7 AL6                     64
W 09000W          34  2            8 BM7                     14
W 09000W          34  2            9 CN8                     30
W 09000W          34  2           10 DO9                     62
W 09000W          34  3           11 EA0                     45
W 09000W          34  3           12 FB1                     68
W 09000W          34  3           13 GC2                     25
W 09000W          34  3           14 HD3                     73
W 09000W          34  3           15 IE4                     32
W 09000W          34  4           10 REPLACEMENT-NJ9         10
W 09000W          34  4           16 JF5                     49
W 09000W          34  4           17 KG6                     23
W 09000W          34  4           18 LH7                     60
W 09000W          34  4           19 MI8                     75
W 09000W          34  5           21 OK0                     91
W 09000W          34  5           22 PL1                     66
W 09000W          34  5           23 QM2                     45
W 09000W          34  5           24 RN3                     49
W 09000W          34  5           25 TE9                      6

Using the sample table and data provided, are there any other solutions to the OP’s requests?
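
One possible alternative, when only the “work order” identifiers are needed, is to collapse the two analytic COUNT columns into plain aggregate HAVING conditions.  The following is only a sketch (requiring all three of the original “parts” and none of the replacement “part”) – with the modified sample data it should return the same two “work orders” as the NC=0 version above:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6','REPLACEMENT-NJ9')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(DISTINCT CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)>=3
  AND COUNT(DISTINCT CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)=0;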





Grouping Data Sets by Week Number of the Month

May 1, 2013

I saw a decent SQL brain teaser this morning in the comp.databases.oracle.server Usenet group.  The OP in the message thread is attempting to summarize data in one of his tables, with the summarizations broken down by month and then the week within that month.  Increasing the challenge, the OP required that the dates defining the week start and end dates fall within the month’s data set:

“Today is 4/30/2013.  The we_end [last day of the week containing April 30, 2013] value is 5/4/2013.  What I am really looking for is the weeks within the calendar month.  So, the first week for April 2013 should show 4/1 – 4/6.  And the last week should show 4/28 – 4/30”

How would you solve that particular problem?  Is there an Oracle built-in function that provides the grouping requested by the OP?

-

-

-

-

There are a couple of different methods to solve the problem posed by the OP.  One method uses a virtual lookup table to determine the start date and end date of each week in the month, with the restriction that the start date or the end date of each week might need to be adjusted so that both fall within the same month.  Interestingly, with the restrictions specified by the OP, there are 62 weeks this year.

As with all SQL statements, there must be a sensible starting point.  First, let’s determine the start and end dates of each month in 2013.  The month start date is easy to determine, while outputting the month end date is a little more challenging – that date is one day less than the start of the next month:

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END 
--------- --------- 
01-JAN-13 31-JAN-13 
01-FEB-13 28-FEB-13 
01-MAR-13 31-MAR-13 
01-APR-13 30-APR-13 
01-MAY-13 31-MAY-13 
01-JUN-13 30-JUN-13 
01-JUL-13 31-JUL-13 
01-AUG-13 31-AUG-13 
01-SEP-13 30-SEP-13 
01-OCT-13 31-OCT-13 
01-NOV-13 30-NOV-13 
01-DEC-13 31-DEC-13

There are a couple of approaches that could be taken for the next step, finding the start of the first, second, third, fourth, fifth, and potentially sixth week in each of those months.  I am interested in determining the start of the second week – we are able to accomplish that task by using the NEXT_DAY function to find the next Sunday after the first of the month.  The end of the first week will also be calculated, even though it is not used in later calculations.  Note that this solution is subject to problems depending on the NLS settings on the client (and whether or not Sunday is considered the first day of the week):

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
 FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END END_FIRST START_SEC 
--------- --------- --------- --------- 
01-JAN-13 31-JAN-13 05-JAN-13 06-JAN-13 
01-FEB-13 28-FEB-13 02-FEB-13 03-FEB-13 
01-MAR-13 31-MAR-13 02-MAR-13 03-MAR-13 
01-APR-13 30-APR-13 06-APR-13 07-APR-13 
01-MAY-13 31-MAY-13 04-MAY-13 05-MAY-13 
01-JUN-13 30-JUN-13 01-JUN-13 02-JUN-13 
01-JUL-13 31-JUL-13 06-JUL-13 07-JUL-13 
01-AUG-13 31-AUG-13 03-AUG-13 04-AUG-13 
01-SEP-13 30-SEP-13 07-SEP-13 08-SEP-13 
01-OCT-13 31-OCT-13 05-OCT-13 06-OCT-13 
01-NOV-13 30-NOV-13 02-NOV-13 03-NOV-13 
01-DEC-13 31-DEC-13 07-DEC-13 08-DEC-13

The above may appear to be a number of related but useless dates; however, that is not the case.  Next, we will need a method to count through as many as six weeks per month.  This should work:

SELECT 
  LEVEL WEEK_NUMBER 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=6; 

WEEK_NUMBER 
----------- 
          1 
          2 
          3 
          4 
          5 
          6

Next, a Cartesian join will be created between the two row sources to permit stepping through each month, and each week within that month (we will refine the output later… note that this unrefined output provides a clue for a second method of solving the problem posed by the OP):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7 START_WEEK, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1 END_WEEK 
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 30-DEC-12 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 02-FEB-13 
01-FEB-13           1 27-JAN-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 02-MAR-13 
01-MAR-13           1 24-FEB-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 06-APR-13 
01-APR-13           1 31-MAR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 04-MAY-13 
01-MAY-13           1 28-APR-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 01-JUN-13 
01-JUN-13           1 26-MAY-13 01-JUN-13 
...

The above output has a slight problem – the Sunday in the first week of the month may not be in the month of interest; additionally, the Saturday in the last week of the month may not be in the month of interest.  We may overcome those problems with the help of the DECODE function (the CASE expression may be used instead, if so desired):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 01-JAN-13 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 31-JAN-13 
01-FEB-13           1 01-FEB-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 28-FEB-13 
01-MAR-13           1 01-MAR-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 31-MAR-13 
01-APR-13           1 01-APR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 30-APR-13 
01-MAY-13           1 01-MAY-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 31-MAY-13 
... 
01-DEC-13           5 29-DEC-13 31-DEC-13 

62 rows selected.

Yes, this solution added 10 extra weeks to the year.  :-)

It is easy to extend this solution to cover more than just the year 2013.  For example, the following modification sets the starting period to January 2000, and runs for 20 years:

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=(12*20)) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END;

With the virtual lookup table complete, the next step would be to slide the above SQL statement into an inline view, and then join it to the data set that will be analyzed.  Assuming that the data set has a column named DATE_SENT, the WHERE clause may appear similar to the following:

WHERE
  DATE_SENT BETWEEN START_WEEK AND END_WEEK
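
Putting the pieces together, the join might appear similar to the following sketch (covering 2013 only) – here the analyzed data set is assumed to be a table named T1 with a DATE_SENT column, such as the test table created below, and the unused END_FIRST_WEEK column is omitted from the lookup:

SELECT
  W.MONTH_START,
  W.WEEK_NUMBER,
  W.START_WEEK,
  W.END_WEEK,
  COUNT(*) DAYS_IN_SET
FROM
  (SELECT
    MONTHS.MONTH_START,
    WN.WEEK_NUMBER,
    DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
    DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
  FROM
    (SELECT
      ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START,
      ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END,
      NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
    FROM
      DUAL
    CONNECT BY
      LEVEL<=12) MONTHS,
    (SELECT
      LEVEL WEEK_NUMBER
    FROM
      DUAL
    CONNECT BY
      LEVEL<=6) WN
  WHERE
    (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END) W,
  T1 T
WHERE
  T.DATE_SENT BETWEEN W.START_WEEK AND W.END_WEEK
GROUP BY
  W.MONTH_START,
  W.WEEK_NUMBER,
  W.START_WEEK,
  W.END_WEEK
ORDER BY
  W.MONTH_START,
  W.WEEK_NUMBER;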

The above is one solution – are there other solutions?  What about a solution that uses a little simple mathematics?  Such a solution could be considerably more efficient.  First, we will create a simple table for experimentation with 1000 rows containing somewhat random dates on or after January 1, 2013:

CREATE TABLE T1 AS
SELECT
  TRUNC(TO_DATE('01-01-2013','MM-DD-YYYY')+DBMS_RANDOM.VALUE*366) DATE_SENT
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

Next, we will generate a SQL statement that produces output that is somewhat similar to the unrefined output found in the first solution (note that the SQL statement contains a DISTINCT clause to eliminate duplicate rows – there could be several rows in the row source with the same random date value):

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE
--------- --------- --------- ---------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
...
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13

In the above, the MONTH_WEEK_START column shows the Sunday of the week that includes the DATE_SENT column value.  At this point, it makes no difference if the MONTH_WEEK_START column value is in a different month than the DATE_SENT column value.

Next, we will determine the week within the month in which the DATE_SENT column value appears:

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE WEEK_NUMBER
--------- --------- --------- --------- -----------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
04-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
...
24-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
26-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
27-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
28-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13           1

Heading toward the request made by the OP in the Usenet thread, the following SQL statement will determine a simple COUNT of the number of rows that have a DATE_SENT date in each week of each month in the T1 table’s data set (note that if there are no rows in a given week, that week will be skipped in the output):

SELECT
  TRUNC(DATE_SENT,'MM') MONTH_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER,
  COUNT(*) DAYS_IN_SET
FROM
  T1
GROUP BY
  TRUNC(DATE_SENT,'MM'),
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1
ORDER BY
  1,
  2;

MONTH_STA WEEK_NUMBER DAYS_IN_SET
--------- ----------- -----------
01-JAN-13           1          13
01-JAN-13           2          15
01-JAN-13           3          23
01-JAN-13           4          21
01-JAN-13           5          13
01-FEB-13           1           5
01-FEB-13           2          21
01-FEB-13           3          15
01-FEB-13           4          14
01-FEB-13           5          11
01-MAR-13           1           4
01-MAR-13           2          18
01-MAR-13           3          19
01-MAR-13           4          20
01-MAR-13           5          20
01-MAR-13           6           4
01-APR-13           1          14
01-APR-13           2          26
01-APR-13           3          15
01-APR-13           4          23
01-APR-13           5           9
01-MAY-13           1          12
01-MAY-13           2          15
01-MAY-13           3          22
01-MAY-13           4          23
01-MAY-13           5          17
01-JUN-13           1           1
01-JUN-13           2          14
01-JUN-13           3          21
01-JUN-13           4          18
01-JUN-13           5          21
01-JUN-13           6           4
01-JUL-13           1          15
01-JUL-13           2          21
01-JUL-13           3          13
01-JUL-13           4          24
01-JUL-13           5          14
01-AUG-13           1           6
01-AUG-13           2          10
01-AUG-13           3          21
01-AUG-13           4          22
01-AUG-13           5          28
01-SEP-13           1          16
01-SEP-13           2          16
01-SEP-13           3          22
01-SEP-13           4          23
01-SEP-13           5           8
01-OCT-13           1          15
01-OCT-13           2          15
01-OCT-13           3          17
01-OCT-13           4          18
01-OCT-13           5          21
01-NOV-13           1           6
01-NOV-13           2          20
01-NOV-13           3          20
01-NOV-13           4          17
01-NOV-13           5          18
01-DEC-13           1          22
01-DEC-13           2          15
01-DEC-13           3          19
01-DEC-13           4          20
01-DEC-13           5           6
01-JAN-14           1           1

It might be just my opinion, but the mathematical solution appears to be easier to understand than the method that uses the virtual lookup table.  We are still able to derive the start date and end date of each week, if needed, to produce the OP’s solution.
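
For example, the clamped start and end dates of the week containing each DATE_SENT value may be derived directly – a sketch using the same NEXT_DAY calculation as above (and therefore subject to the same NLS day-name caveat mentioned earlier):

SELECT DISTINCT
  MONTH_START,
  TRUNC((DATE_SENT-WEEK1_SUNDAY)/7)+1 WEEK_NUMBER,
  GREATEST(WEEK1_SUNDAY + TRUNC((DATE_SENT-WEEK1_SUNDAY)/7)*7, MONTH_START) START_WEEK,
  LEAST(WEEK1_SUNDAY + TRUNC((DATE_SENT-WEEK1_SUNDAY)/7)*7 + 6, MONTH_END) END_WEEK
FROM
  (SELECT
    DATE_SENT,
    TRUNC(DATE_SENT,'MM') MONTH_START,
    ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
    NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') WEEK1_SUNDAY
  FROM
    T1)
ORDER BY
  MONTH_START,
  WEEK_NUMBER;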

Might there be other solutions to the OP’s SQL brain teaser?
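
As a side note partially answering the earlier question about built-in functions: Oracle’s TO_CHAR date format model does include a ‘W’ (week of month) element, but ‘W’ defines week 1 as the first through the seventh day of the month regardless of the day of the week, so by itself it does not honor the OP’s Sunday-based week boundaries.  A sketch, for comparison with the counts shown above:

SELECT
  TRUNC(DATE_SENT,'MM') MONTH_START,
  TO_NUMBER(TO_CHAR(DATE_SENT,'W')) WEEK_NUMBER,
  COUNT(*) DAYS_IN_SET
FROM
  T1
GROUP BY
  TRUNC(DATE_SENT,'MM'),
  TO_NUMBER(TO_CHAR(DATE_SENT,'W'))
ORDER BY
  1,
  2;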





Analysis Challenges

April 25, 2013

Roughly 12 years ago I was attempting to analyze customer order changes that were received through electronic document interchange (EDI), specifically X12 830 documents that show forecasted order demand for specific part numbers.  At the time, the EDI data was partially transformed and inserted into an Oracle 8.0.5 database, while that data is now inserted into an Oracle 11.2 database.  I recently took another glance at this particular problem to see if there is a better method for performing the analysis.

Complications in the analysis:

  • The EDI documents do not include the customer’s purchase order line numbers, just a list of outstanding due dates and quantities.  That complication means that there is no easy way to see if an outstanding quantity has changed, or the date for that outstanding quantity has changed.
  • It is possible to ship to the customer a partial order quantity, with the remaining quantity on the line handled as a back-order.  If the customer does not alter this particular forecast line, only the quantity outstanding will be provided, along with the due date – the original quantity is not provided.
  • It is possible to ship out an order line received in an 830 document up to five days before the customer’s due date.
  • Shipped parts may require hours or days to reach the customer’s site, thus delaying when those shipped parts are acknowledged in the list of outstanding due dates and quantities received through EDI.
  • EDI 830 documents are generally received the next weekday after the customer batch generates those documents on a nightly basis.  Thus, there is an opportunity for the received forecast to be out of sync with what is known to have arrived at the customer site, and what is acknowledged in the EDI data.
  • The customer may generate multiple EDI 830 documents for a particular part number/purchase order combination on any given day, and only the EDI 830 document received with the highest (sorted in ascending order) release number (which may not be a number) on a specific date is the valid release for the specific date.
  • EDI 830 documents may be generated for a specific part number by the customer nearly every weekday, once a month, or any frequency in between.

I recall trying to find a solution to this particular problem 12 years ago.  I thought that I could possibly pull all of the relevant data from the Oracle database into a custom application, and through some magical formula output what appeared to be a list of what the customer changed.  However, what about those part quantities that are in transit to the customer?  What about those part quantities that were received by the customer, but had not yet made their way through the customer’s batch processing system?  What about those cases where the customer sends two or more EDI 830 releases for a specific part number, and those EDI releases are received on the same date?

At the time, I was frustrated.  The quality of the output of my analysis would be dependent on the accuracy of humans (yes, and humans wrote the program that performed the analysis too).  I simply had to match the first due date and order quantity from the previous release with the due date and order quantity from the current release, and then rely on an end user to correct the 5% to 10% of the cases where the matching algorithm was wrong due to one or more of the “complications” outlined above.

While recently re-examining the problem, I noticed a data element in the EDI 830 document that is described as “ACCUM RECEIPTS AS OF THIS DATE”, and as luck would have it, that particular data element is one of those elements that was selected more than 12 years ago to be transformed into the Oracle Database tables (in the sample data included with this blog article, this column is named PREVIOUS_ACCUM_QTY).  If only I had recognized the meaning of the values in this column, and had access to Oracle analytic SQL functions 12 years ago, this particular analysis problem might not have had so many variables that could potentially throw off the quality of the analysis.

This particular blog article has a fairly large set of sample data – however, that sample data is a fairly small subset of the data that must be analyzed on a daily basis.  The sample data may be downloaded here: analysischallenges.sql sample code (save the file as “analysischallenges.sql” – without the .doc double extension).  If the file is saved in the same directory (folder) as SQL*Plus was started from, you may then execute the script in SQL*Plus to build the tables and populate the sample data by using the following command:

@analysischallenges.sql

The two sample table definitions are created with the following commands (also included in the script):

CREATE TABLE T1 (
  CUSTOMER_PO VARCHAR2(15),
  RELEASE_ID VARCHAR2(15),
  PART_ID VARCHAR2(30),
  RELEASE_DATE DATE,
  PREVIOUS_ACCUM_QTY NUMBER,
  CREATE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID));

CREATE TABLE T1_LINE (
  CUSTOMER_PO VARCHAR2(12),
  RELEASE_ID VARCHAR2(12),
  QTY NUMBER,
  DUE_DATE DATE,
  PRIMARY KEY (CUSTOMER_PO,RELEASE_ID,DUE_DATE));

T1 Table Columns:

CUSTOMER_PO is the unique identifier for the customer purchase order received through the EDI 830 documents.  RELEASE_ID defines the specific serialized change number for the customer’s PO – the RELEASE_ID is unique for a specific PO.  PART_ID is the part number for which the customer is providing a forecast.  RELEASE_DATE is the date that the forecast was batch generated in the customer’s computer system.  PREVIOUS_ACCUM_QTY is the cumulative quantity of this part number that the customer has already received for this specific customer purchase order.  CREATE_DATE is the date that the forecast was received from the customer and transformed/inserted into the Oracle Database tables.

T1_LINE Table Columns:

CUSTOMER_PO and RELEASE_ID are undeclared foreign key columns that point back to table T1.  QTY is the quantity of the part number that should (or is forecasted to) ship on the specified DUE_DATE.
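
As a quick sanity check of that parent/child relationship, a simple join sketch (using customer purchase order 1231280 from the sample data) lists the forecast lines attached to each release of one purchase order:

SELECT
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.PART_ID,
  TL.DUE_DATE,
  TL.QTY
FROM
  T1 T,
  T1_LINE TL
WHERE
  T.CUSTOMER_PO=TL.CUSTOMER_PO
  AND T.RELEASE_ID=TL.RELEASE_ID
  AND T.CUSTOMER_PO='1231280'
ORDER BY
  T.RELEASE_ID,
  TL.DUE_DATE;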

—–

If you are curious how I created the script to populate the tables, I executed the following in SQL*Plus, and then performed a little cleanup of the resulting spool files:

SET LINESIZE 200
SET TRIMSPOOL ON

SPOOL analysischallenges_t1.txt

SELECT
  'INSERT INTO T1 VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||CHR(39)||PART_ID||CHR(39)||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(RELEASE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||','
    ||PREVIOUS_ACCUM_QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(CREATE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID;

SPOOL OFF

SPOOL analysischallenges_t1_line.txt

SELECT
  'INSERT INTO T1_LINE VALUES ('||CHR(39)||CUSTOMER_PO||CHR(39)||','
    ||CHR(39)||RELEASE_ID||CHR(39)||','
    ||QTY||','
    ||'TO_DATE('||CHR(39)||TO_CHAR(DUE_DATE,'DD-MON-YYYY')||CHR(39)||','||CHR(39)||'DD-MON-YYYY'||CHR(39)||')'||');' D
FROM
  T1_LINE
ORDER BY
  CUSTOMER_PO,
  RELEASE_ID,
  DUE_DATE;

SPOOL OFF

—–

We must find a starting point for performing the analysis of what changed from one day’s forecast to a prior day’s forecast.  There is an issue with the EDI 830 releases that were received (identified by the CREATE_DATE column) on April 24, 2013 – two releases for the same customer purchase orders were received, and there were two purchase orders that demand the same part number.  Our analysis is only concerned with the highest numbered (sorted alphabetically) release number document for each customer purchase order that was received on any particular day, so we need to find a way to eliminate the other older releases for the same customer purchase order that was received on the same day.  The ROW_NUMBER analytic function is perfect for this task – we just need the output to show the same number (1) for the rows that we want to keep, and have some other number appear on the other rows.  Something similar to the following may work (note that the TRUNC function, or other approach that uses a date/time range, probably should be used when working with the CREATE_DATE column – realistically, there could be a time component included with those date values):

COLUMN PART_ID FORMAT A10
SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  T.RELEASE_ID,
  T.RELEASE_DATE,
  T.CREATE_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     REL_ON_DATE RELEASE_ID      RELEASE_D CREATE_DA PREVIOUS_ACCUM_QTY
---------- --------------- ----------- --------------- --------- --------- ------------------
DS1812+    1231234                   2 20130402-001    23-APR-13 24-APR-13                 24
DS1812+    1231234                   1 20130403-001    24-APR-13 24-APR-13                 24
DS1812+    1233290                   2 20130402-001    23-APR-13 24-APR-13                  4
DS1812+    1233290                   1 20130403-001    24-APR-13 24-APR-13                  4
DS212+     1133290                   2 20130402-001    23-APR-13 24-APR-13                  2
DS212+     1133290                   1 20130403-001    24-APR-13 24-APR-13                  2
RS812      1231280                   2 20130402-001    23-APR-13 24-APR-13                 60
RS812      1231280                   1 20130403-001    24-APR-13 24-APR-13                 60

To remove all of the rows that have something other than 1 in the REL_ON_DATE column, we are able to slide the above SQL statement into an inline view, and then specify that the REL_ON_DATE column value must be equal to 1.  Typically, there are supposed to be few/no changes made by the customer within eight weeks of the customer forecast’s release date, but that is not always the case.  While eliminating the unnecessary releases, it probably makes sense to also calculate the end date of this “firm” period by adding 55 days to the date on which the forecast was created:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T
WHERE
  T.REL_ON_DATE=1
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY
---------- --------------- --------------- --------- ------------------
DS1812+    1231234         20130403-001    18-JUN-13                 24
DS1812+    1233290         20130403-001    18-JUN-13                  4
DS212+     1133290         20130403-001    18-JUN-13                  2
RS812      1231280         20130403-001    18-JUN-13                 60

Next, we need to retrieve the forecasted delivery schedule rows (in the T1_LINE table) that are associated with the above rows.  If we take the value in the PREVIOUS_ACCUM_QTY column, and find a way to generate a running SUM of the QTY column from the T1_LINE table, in theory we should be able to determine if the total quantity ordered through a specific date has changed.  The analytic version of the SUM function offers the capability of producing a running SUM when an ORDER BY clause is included in the function call.  To reduce the data set size, I will only consider those forecasts with a scheduled delivery date that is within 101 days of the customer’s forecast date.  With those changes in place, the SQL statement is modified as follows:

SELECT
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  T.PREVIOUS_ACCUM_QTY,
  TL.DUE_DATE,
  TL.QTY,
  T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.CREATE_DATE,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
  T1_LINE TL
WHERE
  T.REL_ON_DATE=1
  AND T.CUSTOMER_PO=TL.CUSTOMER_PO
  AND T.RELEASE_ID=TL.RELEASE_ID
  AND (T.RELEASE_DATE+100)>=TL.DUE_DATE
ORDER BY
  T.PART_ID,
  T.CUSTOMER_PO,
  T.RELEASE_ID,
  TL.DUE_DATE;

PART_ID    CUSTOMER_PO     RELEASE_ID      POSSIBLE_ PREVIOUS_ACCUM_QTY DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------------- --------- ------------------ --------- ---------- ---------------
DS1812+    1231234         20130403-001    18-JUN-13                 24 24-APR-13          0              24
DS1812+    1233290         20130403-001    18-JUN-13                  4 24-APR-13          0               4
DS212+     1133290         20130403-001    18-JUN-13                  2 24-APR-13          0               2
RS812      1231280         20130403-001    18-JUN-13                 60 27-JUN-13          1              61
RS812      1231280         20130403-001    18-JUN-13                 60 03-JUL-13          1              62
RS812      1231280         20130403-001    18-JUN-13                 60 24-JUL-13          2              64

As may be seen above, only customer purchase order 1231280 has a forecast, calling for a quantity of 1 on June 27, quantity of 1 on July 3, and a quantity of 2 on July 24.  The previously received quantity for customer purchase order 1231280 is 60, so the additional forecast quantities will bring the total order quantities up to 61, 62, and 64 through the three dates included in the forecast.

-The Top Half of the Final SQL Statement is Above-

So, we now have a starting point – the data that was received on April 24, 2013.  How does that forecast data compare with previously received forecasts?  We need to start building the second half of the SQL statement, which looks at the historical data.  To limit the dataset size, I will limit the data to only those EDI documents that were received up to 56 days (the eight week “firm” period) prior to the April 24, 2013 date that was specified earlier in this article:

SELECT
  T.CUSTOMER_PO,
  ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
  TRUNC(T.CREATE_DATE) CREATE_DATE,
  T.RELEASE_ID,
  T.PREVIOUS_ACCUM_QTY
FROM
  T1 T
WHERE 
  T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
  AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
ORDER BY
  T.CUSTOMER_PO,
  T.RELEASE_ID;

CUSTOMER_PO     REL_ON_DATE CREATE_DA RELEASE_ID      PREVIOUS_ACCUM_QTY
--------------- ----------- --------- --------------- ------------------
1131111                   1 27-FEB-13 0371                           575
1131111                   1 01-MAR-13 0381                           577
1131111                   1 06-MAR-13 0431                           578
1131111                   1 07-MAR-13 0441                           581
1131111                   1 08-MAR-13 0451                           581
1131111                   1 11-MAR-13 0461                           581
...
1131111                   1 08-APR-13 0741                           593
1131111                   1 11-APR-13 0791                           593
1131111                   1 23-APR-13 0911                           595
1131245                   1 27-FEB-13 0371                          2299
1131245                   1 01-MAR-13 0381                          2300
1131245                   1 05-MAR-13 0421                          2303
...
1131745                   1 19-APR-13 0871                           435
1131745                   1 23-APR-13 0911                           435
1133290                   1 27-FEB-13 20130205-001                     2
1133290                   1 28-FEB-13 20130206-001                     2
1133290                   1 01-MAR-13 20130207-001                     2
...
1133290                   1 19-APR-13 20130328-001                     2
1133290                   1 23-APR-13 20130401-001                     2
1231234                   1 27-FEB-13 20130205-001                    24
1231234                   1 28-FEB-13 20130206-001                    24
...
1231234                   1 23-APR-13 20130401-001                    24
1231280                   1 27-FEB-13 20130205-001                    57
1231280                   1 28-FEB-13 20130206-001                    57
1231280                   1 01-MAR-13 20130207-001                    57
1231280                   1 04-MAR-13 20130208-001                    58
1231280                   1 05-MAR-13 20130211-001                    58
...
1231280                   1 04-APR-13 20130313-001                    60
1231280                   2 05-APR-13 20130314-002                    60
...
1231280                   1 19-APR-13 20130328-001                    60
1231280                   1 23-APR-13 20130401-001                    60
1233290                   1 27-FEB-13 20130205-001                     4
1233290                   1 28-FEB-13 20130206-001                     4
...

As we found earlier, there could be more than one release for a customer purchase order received on a single day, and we are only concerned with the most recent release received on any particular day.  We have a second issue: the due dates may have shifted from one release to the next, so we must artificially generate rows for the potentially missing dates, with an order quantity of 0 on those dates.

But, how to generate a list of dates?  We could use a CONNECT BY LEVEL trick:

SELECT
  TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

DUE_DATE
---------
24-APR-13
25-APR-13
26-APR-13
27-APR-13
28-APR-13
...
15-JUN-13
16-JUN-13
17-JUN-13
18-JUN-13
...
30-JUL-13
31-JUL-13
01-AUG-13

Now, by creating a Cartesian join between the dataset with the dates and the dataset with the EDI release header records, we will make certain that a row exists for every date that is potentially of interest (up to 100 days after April 24, 2013) when we later join the result with the T1_LINE table (the child table that lists the forecast delivery dates).

SELECT
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  T2.CREATE_DATE,
  T2.PREVIOUS_ACCUM_QTY,
  D.DUE_DATE
FROM
  (SELECT
    T.CUSTOMER_PO,
    ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
    TRUNC(T.CREATE_DATE) CREATE_DATE,
    T.RELEASE_ID,
    T.PREVIOUS_ACCUM_QTY
  FROM
    T1 T
  WHERE 
    T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
    AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')
  ORDER BY
    T.CUSTOMER_PO,
    T.RELEASE_ID) T2,
  (SELECT
    TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100) D
WHERE
  T2.REL_ON_DATE=1
ORDER BY
  T2.CUSTOMER_PO,
  T2.RELEASE_ID,
  D.DUE_DATE;

...
CUSTOMER_PO     RELEASE_ID      CREATE_DA PREVIOUS_ACCUM_QTY DUE_DATE
--------------- --------------- --------- ------------------ ---------
1233290         20130318-001    09-APR-13                  4 25-MAY-13
1233290         20130318-001    09-APR-13                  4 26-MAY-13
1233290         20130318-001    09-APR-13                  4 27-MAY-13
1233290         20130318-001    09-APR-13                  4 28-MAY-13
...
1233290         20130322-001    15-APR-13                  4 22-JUN-13
1233290         20130322-001    15-APR-13                  4 23-JUN-13
1233290         20130322-001    15-APR-13                  4 24-JUN-13
1233290         20130322-001    15-APR-13                  4 25-JUN-13
...
1233290         20130401-001    23-APR-13                  4 29-JUL-13
1233290         20130401-001    23-APR-13                  4 30-JUL-13
1233290         20130401-001    23-APR-13                  4 31-JUL-13
1233290         20130401-001    23-APR-13                  4 01-AUG-13

Next, we will outer join to the T1_LINE table to bring in the actual forecasted delivery schedule from the previous dates (note that I also removed an unnecessary ORDER BY clause that was still embedded in the SQL statement – the Oracle query optimizer probably would have recognized that the clause was meaningless to the query result and eliminated it automatically, but I removed it just in case my assumption is incorrect):

SELECT
  T3.CUSTOMER_PO,
  T3.CREATE_DATE,
  T3.DUE_DATE,
  T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T2.CUSTOMER_PO,
    T2.RELEASE_ID,
    T2.CREATE_DATE,
    T2.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      TRUNC(T.CREATE_DATE) CREATE_DATE,
      T.RELEASE_ID,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
      AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T2.REL_ON_DATE=1) T3,
  T1_LINE TL
WHERE
  T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND T3.RELEASE_ID=TL.RELEASE_ID(+)
  AND T3.DUE_DATE=TL.DUE_DATE(+)
ORDER BY
  T3.CUSTOMER_PO,
  T3.RELEASE_ID,
  T3.DUE_DATE;

CUSTOMER_PO     CREATE_DA DUE_DATE  REL_RUNNING_SUM
--------------- --------- --------- ---------------
1131111         27-FEB-13 24-APR-13             577
1131111         27-FEB-13 25-APR-13             578
1131111         27-FEB-13 26-APR-13             579
1131111         27-FEB-13 27-APR-13             579
1131111         27-FEB-13 28-APR-13             579
1131111         27-FEB-13 29-APR-13             580
1131111         27-FEB-13 30-APR-13             581
...
1131111         23-APR-13 30-JUL-13             636
1131111         23-APR-13 31-JUL-13             638
1131111         23-APR-13 01-AUG-13             639
...
1231280         11-MAR-13 08-MAY-13              58
1231280         11-MAR-13 09-MAY-13              58
1231280         11-MAR-13 10-MAY-13              58
1231280         11-MAR-13 11-MAY-13              58
...
1231280         23-APR-13 21-JUL-13              62
1231280         23-APR-13 22-JUL-13              62
1231280         23-APR-13 23-JUL-13              62
1231280         23-APR-13 24-JUL-13              64
1231280         23-APR-13 25-JUL-13              64
1231280         23-APR-13 26-JUL-13              64
...

We are getting close.  Now we just need to join the SQL created earlier in this article with the above SQL statement so that the current customer purchase order forecasts may be compared with the earlier received customer purchase order forecasts.  The LISTAGG function, introduced with Oracle Database 11.2, makes it easy to condense multiple rows from the resultset into a much smaller number of rows when the resultset is grouped on at least one column.
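
If the LISTAGG syntax is unfamiliar, below is a minimal, self-contained demonstration (hypothetical, unrelated to the forecast tables) that uses the same DUAL/CONNECT BY row generation trick seen earlier to condense the numbers 1 through 9 into one delimited list per group:

SELECT
  GRP,
  LISTAGG(N,', ') WITHIN GROUP (ORDER BY N) VALS
FROM
  (SELECT
    LEVEL N,
    MOD(LEVEL,3) GRP
  FROM
    DUAL
  CONNECT BY
    LEVEL<=9)
GROUP BY
  GRP
ORDER BY
  GRP;

       GRP VALS
---------- --------------------
         0 3, 6, 9
         1 1, 4, 7
         2 2, 5, 8

So, the final version of the analysis challenge appears as follows: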

SELECT
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM,
  LISTAGG(T4.REL_RUNNING_SUM||' ('||TO_CHAR(T4.CREATE_DATE,'MM/DD/YY')||')',', ') WITHIN GROUP (ORDER BY T4.CREATE_DATE DESC) EXCEPTIONS
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    T.RELEASE_ID,
    T.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
    T.PREVIOUS_ACCUM_QTY,
    TL.DUE_DATE,
    TL.QTY,
    T.PREVIOUS_ACCUM_QTY + SUM(TL.QTY) OVER (PARTITION BY T.CUSTOMER_PO,T.RELEASE_ID ORDER BY TL.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.CREATE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE 
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
    T1_LINE TL
  WHERE
    T.REL_ON_DATE=1
    AND T.CUSTOMER_PO=TL.CUSTOMER_PO
    AND T.RELEASE_ID=TL.RELEASE_ID
    AND (T.RELEASE_DATE+100)>=TL.DUE_DATE) C,
  (SELECT
    T3.CUSTOMER_PO,
    T3.CREATE_DATE,
    T3.DUE_DATE,
    T3.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY T3.CUSTOMER_PO,T3.RELEASE_ID ORDER BY T3.DUE_DATE) REL_RUNNING_SUM
  FROM
    (SELECT
      T2.CUSTOMER_PO,
      T2.RELEASE_ID,
      T2.CREATE_DATE,
      T2.PREVIOUS_ACCUM_QTY,
      D.DUE_DATE
    FROM
      (SELECT
        T.CUSTOMER_PO,
        ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
        TRUNC(T.CREATE_DATE) CREATE_DATE,
        T.RELEASE_ID,
        T.PREVIOUS_ACCUM_QTY
      FROM
        T1 T
      WHERE 
        T.CREATE_DATE >= TO_DATE('24-APR-2013','DD-MON-YYYY') - 56
        AND T.CREATE_DATE < TO_DATE('24-APR-2013','DD-MON-YYYY')) T2,
      (SELECT
        TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
      FROM
        DUAL
      CONNECT BY
        LEVEL<=100) D
    WHERE
      T2.REL_ON_DATE=1) T3,
    T1_LINE TL
  WHERE
    T3.CUSTOMER_PO=TL.CUSTOMER_PO(+)
    AND T3.RELEASE_ID=TL.RELEASE_ID(+)
    AND T3.DUE_DATE=TL.DUE_DATE(+)) T4
WHERE
  C.CUSTOMER_PO=T4.CUSTOMER_PO
  AND C.DUE_DATE=T4.DUE_DATE
  AND C.REL_RUNNING_SUM<>T4.REL_RUNNING_SUM
GROUP BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.POSSIBLE_FIRM_DATE,
  C.DUE_DATE,
  C.QTY,
  C.REL_RUNNING_SUM
ORDER BY
  C.PART_ID,
  C.CUSTOMER_PO,
  C.DUE_DATE;

PART_ID    CUSTOMER_PO     POSSIBLE_ DUE_DATE         QTY REL_RUNNING_SUM
---------- --------------- --------- --------- ---------- ---------------
EXCEPTIONS
-----------------------------------------------------------------------------------------------------------------------
RS812      1231280         18-JUN-13 27-JUN-13          1              61
60 (03/22/13), 60 (03/21/13), 60 (03/19/13), 60 (03/18/13), 60 (03/15/13), 60 (03/14/13), 59 (03/13/13), 59 (03/12/13),
59 (03/11/13), 59 (03/08/13), 60 (03/07/13), 60 (03/06/13), 60 (03/05/13), 60 (03/04/13), 59 (03/01/13), 59 (02/28/13),
59 (02/27/13)

RS812      1231280         18-JUN-13 03-JUL-13          1              62
61 (03/22/13), 61 (03/21/13), 61 (03/19/13), 61 (03/18/13), 61 (03/15/13), 61 (03/14/13), 60 (03/13/13), 60 (03/12/13),
60 (03/11/13), 60 (03/08/13), 61 (03/07/13), 61 (03/06/13), 61 (03/05/13), 61 (03/04/13), 60 (03/01/13), 60 (02/28/13),
60 (02/27/13)

RS812      1231280         18-JUN-13 24-JUL-13          2              64
63 (03/26/13), 63 (03/25/13), 62 (03/22/13), 62 (03/21/13), 62 (03/19/13), 62 (03/18/13), 62 (03/15/13), 62 (03/14/13),
61 (03/13/13), 61 (03/12/13), 61 (03/11/13), 61 (03/08/13), 62 (03/07/13), 62 (03/06/13), 62 (03/05/13), 62 (03/04/13),
61 (03/01/13), 61 (02/28/13), 61 (02/27/13)

It is now easy to see that the requested total quantity due through June 27, 2013 increased by 2 since the first forecast, which was received on February 27, 2013.  The requested total quantity increased by 1 in the EDI release that was received on March 4, 2013, and by 1 again in the next EDI release that was received after March 22, 2013.  The second and third lines of the output show that there was a little more fluctuation in the customer's forecast, increasing in quantity, decreasing in quantity, and then increasing in quantity again.

An unfortunate problem may still be present – what if the customer completely eliminates the forecast orders on their purchase order, but still sends in a header line for the forecast?  A future enhancement may be required, where the CONNECT BY LEVEL trick is used in the first half of the SQL statement (the current releases), just as it is used in the second half (the historical releases), so that when the top half of the final SQL statement is joined to the bottom half, rows returned by the bottom half are not unnecessarily removed from the output.  A rough sketch of that possible enhancement appears below.
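
The untested sketch below mirrors the structure of the bottom half of the final SQL statement (the C2 alias is arbitrary): the 100 candidate dates are generated, Cartesian joined to the current release header row, and then outer joined to the T1_LINE table, so that a due date that was dropped from the current release still produces a row with a quantity of 0:

SELECT
  C2.PART_ID,
  C2.CUSTOMER_PO,
  C2.RELEASE_DATE+55 POSSIBLE_FIRM_DATE,
  C2.DUE_DATE,
  NVL(TL.QTY,0) QTY,
  C2.PREVIOUS_ACCUM_QTY + SUM(NVL(TL.QTY,0)) OVER (PARTITION BY C2.CUSTOMER_PO,C2.RELEASE_ID ORDER BY C2.DUE_DATE) REL_RUNNING_SUM
FROM
  (SELECT
    T.PART_ID,
    T.CUSTOMER_PO,
    T.RELEASE_ID,
    T.RELEASE_DATE,
    T.PREVIOUS_ACCUM_QTY,
    D.DUE_DATE
  FROM
    (SELECT
      T.PART_ID,
      T.CUSTOMER_PO,
      ROW_NUMBER() OVER (PARTITION BY T.CUSTOMER_PO,TRUNC(T.CREATE_DATE) ORDER BY T.RELEASE_ID DESC) REL_ON_DATE,
      T.RELEASE_ID,
      T.RELEASE_DATE,
      T.PREVIOUS_ACCUM_QTY
    FROM
      T1 T
    WHERE
      T.CREATE_DATE=TO_DATE('24-APR-2013','DD-MON-YYYY')) T,
    (SELECT
      TO_DATE('24-APR-2013','DD-MON-YYYY')+(LEVEL-1) DUE_DATE
    FROM
      DUAL
    CONNECT BY
      LEVEL<=100) D
  WHERE
    T.REL_ON_DATE=1) C2,
  T1_LINE TL
WHERE
  C2.CUSTOMER_PO=TL.CUSTOMER_PO(+)
  AND C2.RELEASE_ID=TL.RELEASE_ID(+)
  AND C2.DUE_DATE=TL.DUE_DATE(+);

If the C2 inline view replaced the C inline view in the final SQL statement, every due date of interest would appear in the top half, so matching rows from the bottom half would no longer be lost in the join.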

The output is a little ugly.  One approach to clean up the output involves spooling the output to an HTML file (a minimal SQL*Plus sketch appears below) as explained here and in the documentation here.  Another approach to clean up the output involves sending the data into an Excel spreadsheet using either a macro or the Microsoft Query Wizard, accessed from within Excel (see the articles in the Excel category of this blog for an explanation of these approaches).
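
A minimal SQL*Plus sketch of the HTML spooling approach (the spool file name is arbitrary):

SET MARKUP HTML ON SPOOL ON
SPOOL C:\ForecastAnalysis.html

REM execute the final SQL statement shown above

SPOOL OFF
SET MARKUP HTML OFF SPOOL OFF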

—–

The final solution for this analysis challenge is so simple when built up from smaller SQL statements as shown above; I wonder why I did not consider this approach 12 years ago?  :-)





Unexpected Timer Resolution, Unexpected Parked CPUs, Unexpected Power Consumption

19 04 2013

April 19, 2013 (Modified May 11, 2013, June 5, 2014)

This blog article is not purely Oracle Database specific, yet it may have some relevance to companies that run Oracle Database on the Windows Server platform (DBAs lucky/unlucky enough to run Oracle Database on Windows Server may find this article interesting).

I am in the process of setting up a couple of new Windows servers to perform various non-Oracle Database tasks.  I noticed that one of the servers had an odd issue – the server would occasionally become very slow at responding to mouse movements and keyboard input, for instance taking 30 seconds to move the mouse pointer a short distance across the screen.  These servers are running Windows Server 2012, which shares the same kernel and includes much the same features as Windows 8 – with the exception that the server operating system opens to the desktop rather than Windows 8’s new start screen.

Two years ago I wrote a brain teaser article that asked how it was possible that a 10046 extended SQL trace could output c=15600,e=510 on a line of the trace file when executing a SQL statement without using parallel query – essentially asking how it was possible to consume 0.015600 seconds of CPU time in 0.000510 seconds of elapsed time when the SQL statement was restricted to running on no more than one CPU.  In the comments section of the article I mentioned the ClockRes utility, but did not provide a link for the download of the program.  So, I thought that I would run the ClockRes utility on one of the new servers, make a change to the server, and then run the ClockRes utility again:

UnexpectedClockResOutput

As can be seen above, on the first execution of ClockRes the Current timer interval was 1.001 ms, while on the second execution of the ClockRes program the Current timer interval was 15.626 ms.  There is a curious similarity between that 15.626 ms interval (which oddly exceeds the reported Maximum timer interval of 15.625 ms) and the c=15600 reported in the Oracle 10046 extended SQL trace file.  So, what change did I make to the server between the first and second executions of the ClockRes utility?  For now I will just say that I stopped one of the background services on the server (more later).

I recall performing an experiment a couple of years ago with Oracle Database.  I downloaded a utility that offered to change the Windows default timer resolution from 15.625ms to 1.0ms.  That utility did in fact change the Windows timer resolution, resulting in Oracle Database outputting c= values in increments of 1000, rather than in increments of 15600.  If I am remembering correctly, a second outcome of the experiment was a decrease in performance of the test Oracle database on the computer due to the higher resolution of the Windows timer.

Could the change in the resolution of the Windows timer from the Windows default of 15.625ms to 1.001ms be responsible for the occasionally sluggish performance of the server?  One article that I found (and unfortunately did not save the link to) claimed that adjusting the Windows timer from the default of 15.625ms to a lower value, 1ms for example, could cause a significant negative impact on multitasking system performance (roughly a 30% decrease, if I recall correctly).  I located an article on Microsoft's website that offered some level of clarification; below is a short quote from the article:

“Applications can call timeBeginPeriod to increase the timer resolution. The maximum resolution of 1 ms is used to support graphical animations, audio playback, or video playback. This not only increases the timer resolution for the application to 1 ms, but also affects the global system timer resolution, because Windows uses at least the highest resolution (that is, the lowest interval) that any application requests. Therefore, if only one application requests a timer resolution of 1 ms, the system timer sets the interval (also called the “system timer tick”) to at least 1 ms. For more information, see “timeBeginPeriod Function” on the MSDN® website.

Modern processors and chipsets, particularly in portable platforms, use the idle time between system timer intervals to reduce system power consumption. Various processor and chipset components are placed into low-power idle states between timer intervals. However, these low-power idle states are often ineffective at lowering system power consumption when the system timer interval is less than the default.

If the system timer interval is decreased to less than the default, including when an application calls timeBeginPeriod with a resolution of 1 ms, the low-power idle states are ineffective at reducing system power consumption and system battery life suffers.”

The above mentioned Microsoft article also suggested running the following command from the Windows command line:

powercfg /energy

I had actually executed the above command before running the ClockRes program for the first time, and again after running the ClockRes program for the second time.  A very small portion of the powercfg generated HTML file follows, generated prior to the first execution of ClockRes:

Platform Timer Resolution:Platform Timer Resolution
The default platform timer resolution is 15.6ms (15625000ns) and should be used whenever the system is idle. If the timer resolution is increased, processor power management technologies may not be effective. The timer resolution may be increased due to multimedia playback or graphical animations.
Current Timer Resolution (100ns units) 10009
Maximum Timer Period (100ns units) 156250

Platform Timer Resolution:Outstanding Timer Request
A program or service has requested a timer resolution smaller than the platform maximum timer resolution.
Requested Period 10000
Requesting Process ID 536
Requesting Process Path \Device\HarddiskVolume4\PROGRA~2\APC\POWERC~1\agent\pbeagent.exe

This is the same section of the generated HTML file, generated after the second execution of ClockRes:

Platform Timer Resolution:Platform Timer Resolution
The default platform timer resolution is 15.6ms (15625000ns) and should be used whenever the system is idle. If the timer resolution is increased, processor power management technologies may not be effective. The timer resolution may be increased due to multimedia playback or graphical animations.
Current Timer Resolution (100ns units) 156261

That is potentially interesting.  The output of powercfg stated that PROGRA~2\APC\POWERC~1\agent\pbeagent.exe requested a timer resolution of 1.000 ms, which then changed the Windows server system-wide timer resolution to 1.0009 ms.  Interesting?  PROGRA~2\APC\POWERC~1\agent\pbeagent.exe resolves to the "APC PBE Agent" service in Windows, which is a component of the American Power Conversion (APC) PowerChute Business Edition software.  That software interfaces with an attached UPS to provide a gentle shutdown of the server in the event of an extended power outage.  The "APC PBE Agent" service happens to be the service that I shut down between the first and second execution of the ClockRes utility.

Does that suggest that installing the APC PowerChute Business Edition software on a server potentially has a significant impact on the performance of that server, due to the program's insistence on changing the Windows system-wide timer resolution to 1ms?  A quick observation indicates that the change made by the APC software to the Windows system-wide timer resolution does NOT appear to affect the reporting of the c=15600 entries in an Oracle Database 10046 extended SQL trace when the APC software is installed on the server.  The question remains whether or not this APC software could significantly decrease the performance of Oracle Database software on the server (potentially by 30%, as suggested in the one unnamed article).

——

The Windows Server that is experiencing occasionally jittery mouse and keyboard input is reasonably high-end for a Windows server: Intel Xeon E5-2690 8 core CPU at 2.9GHz (with hyperthreading enabled, giving the appearance of 16 CPUs in Windows), 64GB of memory, a RAID controller with 1GB of battery-backed cache, 16 internal 10,000 RPM hard drives, two gigabit network adapters in a teamed configuration, etc.  It should require a substantial load on the server to cause the jittery mouse and keyboard input behavior.

The power option plan in Windows was set to High Performance, while the default plan in Windows Server is Balanced.  Various articles on Microsoft's website state that the Balanced plan allows the server/operating system to use CPU speed throttling (reducing the CPU speed from the stated speed rating, 2.9GHz in the case of this server) and core parking (essentially putting one or more CPU cores to sleep) in order to reduce energy consumption.  Some articles on Microsoft's site indicate that, at least with Windows Server 2008, CPU parking may increase IO latencies – that, of course, would be bad if Oracle Database were installed on the server.  Other articles on Microsoft's site indicate that there are bugs, at least with Windows Server 2008, related to core parking that cause the parked cores not to wake up when the CPU load increases.  I wonder if this particular bug is playing a part in the performance issue faced in this very recent Usenet thread that describes poor performance of Oracle Database running in Hyper-V on Windows?

Here is a screen capture of the Power Options window and Task Manager on the Windows Server 2012 machine that is experiencing occasionally jittery mouse and keyboard input (screen capture taken when the server was mostly idle):

UnexpectedPowerOptionsTaskManager

Notice the inconsistency?  The server's CPU is throttled down from 2.9GHz to just 1.16GHz while the power option plan is set to High Performance.  The Microsoft-published "Performance Tuning Guidelines for Windows Server 2012" document states the following on pages 16-17 (I highlighted some of the words in red):

“Balanced (recommended): Default setting. Targets good energy efficiency with minimal performance impact.  Matches capacity to demand. Energy-saving features balance power and performance.

High Performance: Increases performance at the cost of high energy consumption. Power and thermal limitations, operating expenses, and reliability considerations apply.  Processors are always locked at the highest performance state (including “turbo” frequencies). All cores are unparked.

Power Saver: Limits performance to save energy and reduce operating cost.  Caps processor frequency at a percentage of maximum (if supported), and enables other energy-saving features.”

Well, that is interesting, and is inconsistent with the above screen capture.  Incidentally, when the server was experiencing the worst of the occasionally jittery mouse and keyboard input, the CPU utilization was hovering around 6% and the CPU speed was still coasting at 1.16GHz to 1.18GHz, the network performance hovered between 600Mbps and 1100Mbps, and the server's internal hard drives barely noticed the traffic passing to/from the disks through the network interface (lower than 75MB/s and 137MB/s, respectively).  6% CPU utilization causes the mouse and keyboard input to become jittery?  With hyperthreading enabled, there are essentially 16 CPU seconds available per second of elapsed time.  A quick check: 1/16 = 0.0625, so 1 (hyperthreaded) CPU at 100% utilization would be reported as a system-wide utilization of 6.25%.  Interesting, but is that statistic relevant?

I happened to have the Windows Resource Monitor open during one of the jittery episodes.  The Resource Monitor showed, shockingly, that 14 (possibly 15) of the hyperthreaded "CPUs" were parked!  That result is also in conflict with the Microsoft document mentioned above, which states that "all cores are unparked" when the High Performance power plan is selected.  So, at 6% CPU utilization the server was CPU constrained.  Modifying the server's BIOS setting that controls whether or not cores may be parked, so that the cores could not be parked, fixed the issue in Windows Server 2012 that resulted in the 30 second delay that accompanied moving the mouse pointer a short distance across the screen.

The server still exhibits a bit of jittery behavior with mouse and keyboard input when the server’s teamed network cards are heavily used for file transfers to the server, but at least the CPU activity is no longer confined to a single hyperthreaded “CPU”:

UnexpectedResourceMonitor

Considering that this server was ordered from the manufacturer as “performance optimized”, I am a bit surprised at the power consumption of the server.  The server was ordered with dual (redundant) 1100 watt power supplies.  With the CPU’s 135 watt maximum TDP (per Intel: “Thermal Design Power (TDP) represents the near maximum power a product can draw for a thermally significant period while running commercially available software.”), 16 hard drives, and 64GB of memory, I fully expected the server to consume between 700 and 900 watts of electrical power.

Here is the server’s power consumption when the server is lightly loaded with roughly 68 running processes (note that the server is connected to a 120 volt power outlet):

UnexpectedPowerConsumptionLittleLoad

Here is the server’s power consumption when the server is moderately loaded with between 600Mbps and 1100Mbps of network traffic (the mouse pointer was slightly jittery at this point):

UnexpectedPowerConsumptionNetworkLoad

So, the server consumes 1.2 amps (126 watts) when lightly loaded and 1.4 amps (154 watts) when moderately loaded.  Keeping in mind that many of the popular incandescent light bulbs require 100 watts of power (note that some governments have now restricted the manufacturing of high wattage incandescent light bulbs), this server is consuming just a little more electrical power than a light bulb that might have been hung overhead just a decade or two ago.

One of the common arguments for server virtualization is energy savings – the above screen captures may suggest that energy savings may not be a significant cost-savings factor for virtualization with modern server hardware.  One might question how much energy is really being saved when the network interface is maxed out by a single virtualized server, just 6% CPU utilization results in a jittering mouse pointer, and there are eight to ten virtualized servers stacked on the physical hardware (all competing for the scarce CPU and network resources).

Added May 11, 2013:

Dell BIOS setting to enable or disable CPU parking in Windows Server 2012:

UnexpectedPowerConsumptionBIOSProcIdle

With the BIOS option set to enabled, disk activity caused by network traffic results in occasionally jittery mouse movements on the server.  Based on a bit of research, installing the Hyper-V role on either Windows Server 2012 or Windows 8 may disable CPU throttling and/or disable CPU parking.

Added June 5, 2014:

I finally had sufficient time to fully analyze this problem, where a 2.9GHz CPU in a Dell PowerEdge T620 server crawled along at a leisurely pace of about 1.16GHz, actually throttling back performance further as demand for the server's resources increased.  A second Dell PowerEdge T620 server with a 2.6GHz CPU that was purchased at the same time also coasted along at roughly 1.16GHz, but that server did not seem to throttle back performance further as demand for the server's resources increased.

As a review, the screen capture shown below at the left shows the Windows Server 2012 Power Options settings and the Performance tab of the Task Manager.  The screen capture below at the right shows the Windows Server 2012 Power Options settings and the Performance tab of the Task Manager after fixing this particular problem – note that the 2.9GHz CPU is now essentially overclocked at 3.28GHz (it has operated at roughly that speed since the fix).

UnexpectedT620PowerOptionsTaskManager UnexpectedT620PowerOptionsTaskManager2

The 2.9GHz PowerEdge T620 and the 2.6GHz PowerEdge T620 are both Active Directory domain controllers and internal DNS servers (along with supporting other tasks), so the occasionally slow (or extremely slow) performance of the servers negatively impacted the performance of other servers as well as client workstations.

There was a BIOS firmware update released in the third quarter of 2013, which was supposed to address some CPU throttling issues – that BIOS update did not seem to help the problem that I experienced.

I thought that the low power consumption of the big server with the 2.9 GHz E5-2690 8 core CPU was a significant clue when I tried troubleshooting the server a year ago since that CPU is rated to consume up to 135 watts and the server liked to hover between 120 watts and 140 watts regardless of the server’s workload.  The Dell T620 (and other recent Dell servers) has some pretty sophisticated power management capabilities.  Dell sells a utility that is able to alter the electrical power profile of a server, and I thought that Dell might have imposed a 140 watt limit on the server for some reason, but I could not find where that limit was specified.  The 2.9 GHz E5-2690 8 core CPU apparently has some additional electrical power limiting capabilities.  A year ago I even tried downloading a demo of Dell’s power management utility – that did not help resolve the issue (I think that the installation might have caused some other issues that I had to fix).  Last week Tuesday I read the following articles:
http://www.dell.com/learn/us/en/19/financial-services-markets-solutions-processor-acceleration-technology
http://www.dell.com/us/business/p/dell-openmanage-power-center/pd
http://www.intel.com/content/www/us/en/data-center/data-center-management/how-to-configure-node-manager-video.html
ftp://ftp.dell.com/Manuals/Common/poweredge-r720_Concept%20Guide_en-us.pdf
http://en.community.dell.com/techcenter/power-cooling/w/wiki/3536.openmanage-power-center-faq.aspx

I rebooted the server, pressed F2, and dug around in the settings a bit.  I found that the System Profile Setting was set to "Performance per Watt" (I believe that this was how it was set when it left the Dell factory).  I changed that setting to "Performance", saved the changes, and rebooted the server again.  The server is now consuming 200+ watts, and the CPU is freely exceeding its rated speed.  The pictures below show the System BIOS configuration changes that remove the electrical power cap, thus allowing the server to behave as it should have from the factory:

UnexpectedT620SystemProfile1 UnexpectedT620SystemProfile2

UnexpectedT620SystemProfile3 UnexpectedT620SystemProfile4

I suppose that if a Dell PowerEdge T620 (or similar recent model Dell server) seems to be running a bit slower than expected (note that the particular problem mentioned above is NOT Windows specific – a Dell PowerEdge T620 running Linux should be affected in the same way), you might take a quick peek at the System Profile Setting in the System BIOS to make certain that the System Profile is set to Performance.  As shipped from the factory, two Dell PowerEdge T620 servers purchased this year were NOT affected by the problems mentioned in this blog article.





Value of Improving One’s Knowledge

24 03 2013

March 24, 2013

As I type this blog article I am in the middle of reading a second book on the topic of Windows Server 2008 R2.  I bought the books several months ago, possibly even a year or two ago, and just had not found the time to invest in reading yet another dry IT-related 1,500 page book.  I would much prefer to read an exciting Oracle Database book, but those books seem to be in short supply (and I really do need to finish these two Windows Server 2008 R2 books so that I can move on to the Windows Server 2012 books, the Exchange Server 2013 books, and the Oracle Database 12c books that will be released eventually).

Three years ago I put together an article about finding the motivation to keep reading through various IT books, and offered a suggested reading list for people interested specifically in various aspects of Oracle Database (totally unrelated, but I recall that the word aspect was a pet peeve of one of my college English professors… he would say, “What is an aspect?” – I probably have used that word about five times since then).  I think that my list of suggested books is still a decent list.  However, Kyle Hailey recently put together a related article, “Where to begin with Oracle and SQL” that is excellent, and a bit more up to date than my list – the only book on Kyle’s list that I have not read is written by Dan Tow.

So, for people working in the information technology (IT) disciplines, how much effort do you put into improving your knowledge, or just maintaining that knowledge?  Do you actively try to break things, in the hope of discovering a solution?  Do you participate in discussion forums, such as the OTN forums or Usenet?  Do you read magazines, books, and various online articles?  Or, do you simply continue to perform the same actions that you have for the last five, 10, or 20 years (if the RULE worked just fine back then, why change)?

I try to read at least a couple of IT focused books a year.  For many of the Oracle Database specific books read since mid-2008 I have tried to write formal reviews of those books, posted to Amazon.com.  One of those reviews stretched to 35.5 typewritten pages!  Early yesterday morning a reader left the following comment attached to that review:

Mar 22, 2013 11:43:09 PM PDT SavvyShopper says:
Charles, you seem Very Intelligent(?) and should then Write a Book, fixing all these issues, than writing so much … :) How do you get so much time , if you are really “Working” ??

Oddly, the person deleted their comment a mere two hours after I posted a reply.  My first reaction to the comment was “must spending time expanding one’s knowledge be mutually exclusive to being employed?”  That was not the first time that someone using Amazon left a similar comment attached to one of my reviews:

Mar 31, 2011 12:37:49 PM PDT Wanda A. Cadogan says:
And don’t you ever wonder how guys like Mr. Hooper have so much time to write a 24 page review?

Is it really that hard to believe that guys like Mr. Hooper take notes (sometimes extensive notes) while reading technical publications?

I noticed that SavvyShopper had previously reviewed a couple of Oracle Database related books, as well as a couple of woodworking products.  Quite a while ago I toyed a bit with woodworking (a bit in woodworking… cuts both ways); as a matter of fact, I thought that I was headed to college to become a woodworking instructor.  So, I possibly have an odd connection with SavvyShopper – might this be an opportunity for a teachable moment?  I noticed too that in one of his reviews SavvyShopper appeared to be struggling with the concept that a softwood can in fact be harder than a hardwood.  I am a little disappointed that SavvyShopper deleted his comment after all the effort that I put into formulating a reply:

SavvyShopper,

Thank you for leaving your comment.

Some of my book reviews, especially when there are many errors in the book being reviewed, tend to be very long.  I read computer related books to improve my knowledge, whether that means that I am simply reminded how something works, or I am learning something entirely new.   Investing effort (and occasionally an excessive amount of time) into improving my skill set is important to me, and I hope that it is important to other people in the IT related professions.  The Oracle Database book reviews that I post to Amazon are mostly a by-product of that self-improvement process.

I am still happily employed at the same place where I have worked for more than a decade as an IT professional.  Making time for expanding one’s computer skills through reading sometimes means sacrificing time spent with hobbies or other outside of work activities.  I believe that it is that sacrifice that is an important ingredient in the formula for long-term gainful employment.

Woodworking is a hobby of mine, and judging by your reviews on Amazon, woodworking may also be one of your hobbies.  I see that you gave a thin kerf Forrest Woodworker II saw blade a 2 out of 5 rating (coincidentally the same rating that I gave to this book), primarily because you found that the blade did not impress you when cutting pine and redwood while building one project.  Pine, much more than redwood, leaves deposits of pitch on the face and sides of saw blade teeth (kerosene (or Simple Green) and an old toothbrush will remove the pitch deposits, but I think that there are commercial spray on products on the market now that serve the same purpose).  This buildup of pitch effectively reduces the saw blade’s ability to cut smoothly.  The thin kerf body of the saw blade that you reviewed is less resistant to flexing in a heavy cut, so combined with the pitch build up on the blade, you probably did experience poor quality cuts.  If you mounted the saw blade on an inexpensive saw with a weak/bad bearing setup, that will also explain the poor quality cut that you received.  If you want a treat, mount a regular kerf Woodworker II on an older, properly tuned 3HP or 5HP Unisaw (or Powermatic Model 66) and witness the glass smooth cuts in (even three inch thick) hardwood such as poplar, oak, walnut, or hard maple.

Getting back on topic of your comment, I currently have very little extra time to put into writing.  While probably not your intention, I do not see a book that summarizes mistakes found in other books and the accompanying corrections as selling more than a couple of dozen copies; based on some Oracle book authors' remarkable ability to abuse the DMCA (Digital Millennium Copyright Act), such a book could also become a legal nightmare for the publishing company.  I have been approached a couple of times to see if I had an interest in writing a book on the topic of Oracle Database.  At this time, I simply cannot make that commitment (writer's block, lack of available free time, required reading to stay current in the computer-related field, wanting to spend more time working with non-computer related routers and bits, etc.).

A couple of years ago I contributed to an Oracle Database specific book with fellow OakTable Network members.  The technical standards of the members of that group are extremely high (do a Google search, if you are interested).  As such, I would estimate that I spent roughly eight hours per page writing, testing, proofreading, rewriting, proofreading, testing, rewriting, and proofreading again.  It was quite an experience, especially when Oracle Corporation announced the general availability of a new major version release a day or two before the first draft of the section of the book had to be submitted to the publisher – more testing, rewriting, and proofreading of the book section followed after that first draft was submitted.

Thank you again for leaving the comment.

So, do you go out of your way to learn something new every day, or are you more likely to "let it roll" for five, 10, or 20 years?  For the record, I have a rather large stack of unread woodworking magazines (most ranging from 0 to 7 years old) sitting on the shelf, waiting for the time when spending time expanding one's knowledge is mutually exclusive to being employed (in the IT industry).

A new logo in development – resawn from 1 inch x 2 inch quarter sawn oak.

Bookmatch Arrangement #1:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #2:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #3:

OLYMPUS DIGITAL CAMERA

The New Logo:

OLYMPUS DIGITAL CAMERA





Bitten by a Virtual Column, _OPTIMIZER_IGNORE_HINTS Doesn’t Ignore Hints?

9 03 2013

March 9, 2013

I had a couple of spare minutes today, so I tried a couple of experiments with Oracle Database 11.2.0.2 just to see if I could produce some unexpected results.

First, I will create a simple database table with two indexes:

CREATE TABLE T1 (
  N1 NUMBER,
  V1 VARCHAR2(20),
  D1 DATE,
  PADDING VARCHAR2(300));

CREATE INDEX IND_T1_I1 ON T1(V1);
CREATE INDEX IND_T1_I2 ON T1(N1,1);

In the above, I indexed column V1.  The second index is a composite index with column N1 as the first column and the constant 1 as the second column (see this article to understand the reason for the constant 1).  Creating the second index causes Oracle Database to add a hidden virtual column to the table.
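
Because indexing a constant effectively creates a function-based index, the constant should also be visible as a recorded index expression.  The following quick check is not part of the original test script, but the output should resemble what is shown below the query:

SELECT
  INDEX_NAME,
  COLUMN_POSITION,
  COLUMN_EXPRESSION
FROM
  USER_IND_EXPRESSIONS
WHERE
  TABLE_NAME = 'T1';

INDEX_NAME   COLUMN_POSITION COLUMN_EXPRESSION
------------ --------------- -----------------
IND_T1_I2                  2 1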

Let’s insert 100,000 rows into the table (column N1 includes a repeating sequence from 1 to 999 and then a NULL, column V1 is the Roman numeral for the repeating sequence from 1 to 999 and then 0, column D1 simply adds from 1 to 999 and then 0 days to the current date, and column PADDING is just a series of A characters to discourage the Oracle query optimizer from using full table scans rather than an index access path, if such an access path is legal):

INSERT INTO
  T1 
SELECT 
  DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1, 
  TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1, 
  TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1, 
  RPAD('A',300,'A') PADDING 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=100000;

COMMIT;

Let’s take a look at the columns that are included in each of the indexes:

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ ------------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    SYS_NC00005$

As can be seen above, Oracle added the SYS_NC00005$ hidden virtual column to the table so that the number 1 could be included as the second column of the composite index.  Somewhat interestingly, we are able to query the value of the SYS_NC00005$ hidden virtual column:

SET LINESIZE 140
SET PAGESIZE 1000

SELECT
  V1,
  N1,
  "SYS_NC00005$"
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1 SYS_NC00005$
-------------------- ---------- ------------
###############                            1
###############                            1
###############                            1
###############                            1
###############                            1
...
###############                            1
###############                            1
###############                            1

100 rows selected.

The above indicates that (according to Oracle) just as there is no Roman numeral higher than 3,999, there is no Roman numeral for 0.
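
That claim is easy to verify – the RN number format supports only whole numbers from 1 to 3999, so 0 and 4000 should both trigger the familiar number format overflow indicator (a quick supplemental check):

SELECT
  TRIM(TO_CHAR(0,'RN')) ROMAN_0,
  TRIM(TO_CHAR(4000,'RN')) ROMAN_4000,
  TRIM(TO_CHAR(3999,'RN')) ROMAN_3999
FROM
  DUAL;

ROMAN_0         ROMAN_4000      ROMAN_3999
--------------- --------------- ---------------
############### ############### MMMCMXCIX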

While probably not wise, we are able to rename this hidden virtual column:

ALTER TABLE T1 RENAME COLUMN "SYS_NC00005$" TO "C";

Table altered.

Let’s check which columns are included in the index definitions again:

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ -----------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    C

The output of the above might suggest to the unsuspecting person that someone intentionally added a column named C to the table, and included that column in a composite index with column N1.  Interesting…

Let’s collect statistics on all of the rows in the table, as well as the table’s indexes:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

SELECT
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

DBMS_XPLAN will be used to retrieve the actual execution plan for the SQL statement that was just executed:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  f35sfgu6s8huh, child number 0
-------------------------------------
SELECT   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

The above indicates that the optimizer determined that using an index access path was likely the most efficient access path to search for cases where column N1 contains NULL values.

Oracle Database offers more than 200 hints to help control the query optimizer’s behavior (as well as other behavior).  In general, hints should be used sparingly.  For example, we are able to add a FULL hint to force the optimizer to use a full table scan when searching column N1 for NULL values:

SELECT /*+ FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  3w60zjynqtwrp, child number 0
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

As the above shows, Oracle's optimizer elected to use a full table scan even though the calculated cost for that access path (590) is higher than the calculated cost of the index access path (102).  There is an obvious temptation when learning about hints to overuse them – as the data volume (and the data itself) in a table changes, those index hints may very well hurt performance rather than help it.

_OPTIMIZER_IGNORE_HINTS doesn’t ignore hints?

That was a recent set of search keywords used to access this blog.  Oracle Database offers a fairly large number of parameters (we will get back to hints in a moment), some of which are considered hidden parameters because those parameters begin with _ (underscore) characters, typically do not display when querying the V$PARAMETER view (unless the parameter value was modified), and typically should not be modified without the explicit request of Oracle Support.  If we had used the +OUTLINE format parameter in the DBMS_XPLAN call we would see a similarly named IGNORE_OPTIM_EMBEDDED_HINTS hint in the Outline Data section of the execution plan:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('3w60zjynqtwrp',0,'TYPICAL +OUTLINE'));

SQL_ID  3w60zjynqtwrp, child number 0
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

Let’s modify the _OPTIMIZER_IGNORE_HINTS hidden parameter at the session level and try the test query again:

ALTER SESSION SET "_optimizer_ignore_hints"=TRUE;

Session altered.

SELECT /*+ FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

The above is no surprise, so let’s take a look at the execution plan with the Outline Data section included:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  3w60zjynqtwrp, child number 1
-------------------------------------
SELECT /*+ FULL(T1) */   V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."N1" "T1"."C1"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

Looking at the above execution plan, you will notice that the Oracle optimizer ignored the FULL hint that was specified in the SQL statement; by examining the Outline Data section of the plan you will see that FULL(@"SEL$1" "T1"@"SEL$1") was replaced with INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."N1" "T1"."C1")).

I am wondering about the above search keywords that were used to access my blog.  Can I force the Oracle query optimizer NOT to ignore embedded hints when the _OPTIMIZER_IGNORE_HINTS hidden parameter is set to TRUE?  Before we start, let’s set the hidden parameter back to the default value of FALSE.

ALTER SESSION SET "_optimizer_ignore_hints"=FALSE;

Occasionally, when the Outline Data section of the execution plan is reviewed (either with DBMS_XPLAN or by reviewing a 10053 trace), an OPT_PARAM hint may appear in that section.  Oracle Database itself inserts these hints into SQL statements, for example (from a 10046 trace file):

PARSING IN CURSOR #448514944 len=3052 dep=1 uid=64 oct=3 lid=64 tim=853194336675 hv=3275773959 ad='3edba6b28' sqlid='2sdms4r1n0q07'
SELECT /*+ opt_param('parallel_execution_enabled', 'false') */
                       /* EXEC_FROM_DBMS_XPLAN */ id, position, depth , operation, options, object_name, cardinality, bytes, temp_space, cost,

A 10053 trace file also includes a section titled PARAMETERS IN OPT_PARAM HINT that indicates the SQL statement level parameters that are modified through the OPT_PARAM hint.

Let’s test that OPT_PARAM hint to see if we are able to hint the optimizer to ignore all other hints in the SQL statement:

SELECT /*+ OPT_PARAM('_optimizer_ignore_hints','TRUE') FULL(T1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  c4fxyhf0fzg4t, child number 0
-------------------------------------
SELECT /*+ OPT_PARAM('_optimizer_ignore_hints','TRUE') FULL(T1) */
V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   102 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

The hint that modified the parameter that controls whether or not hints are ignored worked.  Well, that just does not seem right – should that hint not also be ignored?  Might this be a catch-22?

What if… we reverse the order of the hints?

SELECT /*+ FULL(T1) OPT_PARAM('_optimizer_ignore_hints','TRUE') */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  07pu6c3jvdrd0, child number 0
-------------------------------------
SELECT /*+ FULL(T1) OPT_PARAM('_optimizer_ignore_hints','TRUE') */
V1,   N1,   C FROM   T1 WHERE   N1 IS NULL

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   590 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1500 |   590   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1" IS NULL)

Well, the FULL hint definitely was not ignored by the optimizer that time, as it was for the previous arrangement of the hints.  Might we be on to something?

Let’s verify that the OPT_PARAM hint does work to control optimizer parameters at the SQL statement level:

SELECT /*+ OPT_PARAM('optimizer_index_cost_adj',1) */
  V1,
  N1,
  C
FROM
  T1
WHERE
  N1 IS NULL;

V1                           N1          C
-------------------- ---------- ----------
###############                          1
###############                          1
###############                          1
###############                          1
###############                          1
...
###############                          1
###############                          1
###############                          1

100 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  ac45wtypubpfu, child number 0
-------------------------------------
SELECT /*+ OPT_PARAM('optimizer_index_cost_adj',1) */   V1,   N1,   C
FROM   T1 WHERE   N1 IS NULL

Plan hash value: 2648162318

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |   100 |  1500 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_I2 |   100 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1" IS NULL)

Reviewing the above, you might have noticed that the calculated cost of the index access decreased from a value of 102 to just 1 as a result of the OPT_PARAM hint, so we might be able to conclude that the hint does work.  Or might the difference in the calculated cost be explained as just the query optimizer NOT silently ignoring invalid hints?  Food for thought.
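The magnitude of the cost change is consistent with the documented behavior of the OPTIMIZER_INDEX_COST_ADJ parameter, which scales index access costs to the specified percentage of the normally calculated cost: 102 * (1 / 100) rounds to 1.  For additional confirmation that the OPT_PARAM hint modified the optimizer environment for that specific child cursor, a query of V$SQL_OPTIMIZER_ENV might help – this is just a sketch, so substitute the SQL_ID and child number from your own DBMS_XPLAN.DISPLAY_CURSOR output (the values below are taken from the output shown above):

SELECT
  NAME,
  ISDEFAULT,
  VALUE
FROM
  V$SQL_OPTIMIZER_ENV
WHERE
  SQL_ID = 'ac45wtypubpfu'
  AND CHILD_NUMBER = 0
  AND NAME = 'optimizer_index_cost_adj';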

You might have noticed that the above queries have selected column C from the table.  Let’s take a look at the table’s definition:

DESC T1

 Name             Null?    Type
 ---------------- -------- -------------
 N1                        NUMBER
 V1                        VARCHAR2(20)
 D1                        DATE
 PADDING                   VARCHAR2(300)

Yep, no column C.  How are we able to select column C from the table if DESC T1 does not list that column?
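DESC shows only the visible columns.  A query of the USER_TAB_COLS view (note: _COLS, not _COLUMNS – the USER_TAB_COLUMNS view filters out hidden columns) should reveal where column C is hiding.  A sketch of such a check:

SELECT
  COLUMN_NAME,
  DATA_TYPE,
  HIDDEN_COLUMN,
  VIRTUAL_COLUMN
FROM
  USER_TAB_COLS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INTERNAL_COLUMN_ID;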

Let’s drop the second index, the one that used the constant 1 as the second column in the index:

DROP INDEX IND_T1_I2;

Index dropped.

Now that the index that created the hidden virtual column is gone, let’s create our own column C and use that column in a new IND_T1_I2 composite index:

ALTER TABLE T1 ADD (C NUMBER DEFAULT 1 NOT NULL);

ALTER TABLE T1 ADD (C NUMBER DEFAULT 1 NOT NULL)
                    *
ERROR at line 1:
ORA-01430: column being added already exists in table

ORA-01430?  I never added a column C to the table, did you?  Take a look:

DESC T1

 Name             Null?    Type
 ---------------- -------- -------------
 N1                        NUMBER
 V1                        VARCHAR2(20)
 D1                        DATE
 PADDING                   VARCHAR2(300)

Be gone, mysterious column C (do you think this might be the reason why it is a bad idea to rename system-generated hidden virtual columns?):

ALTER TABLE T1 DROP (C);

ALTER TABLE T1 DROP (C)
                     *
ERROR at line 1:
ORA-12996: cannot drop system-generated virtual column

But, but, I dropped the index that caused that hidden virtual column to be created.

ALTER TABLE T1 ADD (C1 NUMBER DEFAULT 1 NOT NULL);

Table altered.

CREATE INDEX IND_T1_I2 ON T1(N1,C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

I didn’t really want a column C anyway, sniff.

Now, if we were to execute the following SQL statement, would Oracle’s query optimizer use the just-created IND_T1_I2 index to search for NULL values in column N1?  Would the result potentially be different if column C1 did not have a declared NOT NULL constraint?  Keep in mind that the Oracle documentation states: “The query must be guaranteed not to need any NULL values from the indexed expression, because NULL values are not stored in indexes.”

SELECT
  V1,
  N1,
  C1
FROM
  T1
WHERE
  N1 IS NULL;

Did I mention that I had a couple of spare minutes today?  That does not happen too often lately.
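For readers who would like to test a prediction before executing the query, EXPLAIN PLAN shows the optimizer’s intended execution plan without actually running the statement (a hedged approach – the estimated plan is not guaranteed to match the plan that is used at runtime):

EXPLAIN PLAN FOR
SELECT
  V1,
  N1,
  C1
FROM
  T1
WHERE
  N1 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);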

Above I stated that it was odd that Oracle Database did not automatically drop the automatically created hidden virtual column when the index for which the column was created was dropped.  So, what happens if the automatically created hidden virtual column is not renamed?  Here is a simple script, based on the above experiment, to determine what happens (change SYS_NC00005$ in the ALTER TABLE commands as necessary, so that the column name matches the column name returned by the query of the USER_IND_COLUMNS view).

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  N1 NUMBER,
  V1 VARCHAR2(20),
  D1 DATE,
  PADDING VARCHAR2(300));

CREATE INDEX IND_T1_I1 ON T1(V1);

CREATE INDEX IND_T1_I2 ON T1(N1,1);

INSERT INTO
  T1
SELECT
  DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1,
  TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1,
  TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1,
  RPAD('A',300,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

SELECT
  INDEX_NAME,
  COLUMN_NAME
FROM
  USER_IND_COLUMNS
WHERE
  TABLE_NAME = 'T1'
ORDER BY
  INDEX_NAME,
  COLUMN_POSITION;

ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

DROP INDEX IND_T1_I2;

ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

My results follow:

SQL> DROP TABLE T1 PURGE;

Table dropped.

SQL> CREATE TABLE T1 (
  2    N1 NUMBER,
  3    V1 VARCHAR2(20),
  4    D1 DATE,
  5    PADDING VARCHAR2(300));

Table created.

SQL> CREATE INDEX IND_T1_I1 ON T1(V1);

Index created.

SQL> CREATE INDEX IND_T1_I2 ON T1(N1,1);

Index created.

SQL> INSERT INTO
  2    T1
  3  SELECT
  4    DECODE(MOD(ROWNUM,1000),0,NULL,MOD(ROWNUM,1000)) N1,
  5    TRIM(TO_CHAR(MOD(ROWNUM,1000),'RN')) V1,
  6    TRUNC(SYSDATE) + MOD(ROWNUM,1000) D1,
  7    RPAD('A',300,'A') PADDING
  8  FROM
  9    DUAL
 10  CONNECT BY
 11    LEVEL<=100000;

100000 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT
  2    INDEX_NAME,
  3    COLUMN_NAME
  4  FROM
  5    USER_IND_COLUMNS
  6  WHERE
  7    TABLE_NAME = 'T1'
  8  ORDER BY
  9    INDEX_NAME,
 10    COLUMN_POSITION;

INDEX_NAME   COLUMN_NAME
------------ --------------------
IND_T1_I1    V1
IND_T1_I2    N1
IND_T1_I2    SYS_NC00005$

SQL> ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);
ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL)
                    *
ERROR at line 1:
ORA-01430: column being added already exists in table

SQL> DROP INDEX IND_T1_I2;

Index dropped.

SQL> ALTER TABLE T1 ADD ("SYS_NC00005$" NUMBER DEFAULT 1 NOT NULL);

Table altered.

With the original test case, both the first and second ALTER TABLE commands failed.
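As a final sanity check, it might be worthwhile to confirm the status of the SYS_NC00005$ column that now exists in the table.  A sketch using the same USER_TAB_COLS approach as before, which should indicate whether the just-added column is an ordinary visible column rather than a system-generated hidden virtual column:

SELECT
  COLUMN_NAME,
  HIDDEN_COLUMN,
  VIRTUAL_COLUMN
FROM
  USER_TAB_COLS
WHERE
  TABLE_NAME = 'T1'
  AND COLUMN_NAME = 'SYS_NC00005$';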





What does it Mean when a Select Statement in Oracle is using 100% CPU?

14 02 2013

February 14, 2013

A couple of days ago I noticed that an interesting set of search keywords was used to access this blog.  The search keywords:

What does it mean when a select statement in Oracle is using 100% cpu

I had two thoughts when I first saw that set of search keywords:

  • Well Done!
  • What an inconsiderate Oracle developer!

I feel a bit conflicted about those two first thoughts.  Think about the above bullet points for a moment.

I was interested enough in the keywords that were phrased in the form of a question that I went through the search keywords used to access this blog during the last two weeks, and found the following related search keywords:

  • ora-01861 Oracle causing cpu 100% usage
  • Oracle one session 100% cpu
  • Oracle cpu utilisation 100%, is it a good or bad sign
  • Oracle database server cpu utilization is 100

Interesting, and possibly thought-provoking.  I recalled an article that I wrote on this blog roughly three years ago.

Possible related thoughts of a temporarily rogue DBA:

  1. I fixed your problem without changing the execution plan… take a look at the EXPLAIN PLAN output for yourself.
  2. I fixed your problem without changing the execution plan… the server is more efficient now that we have consolidated three more database instances onto the server.
  3. I fixed your problem without changing the execution plan… that PGA_AGGREGATE_TARGET parameter was clear on the other side of 100 MB, so I set it to 10 MB.
  4. I fixed your problem without changing the execution plan… a group of the guys in the office found a way to load the game Doom on the server.
  5. I fixed your problem without changing the execution plan… I took half of the hard drives out of the array and moved the server to a 10 Mb hub.

Take a close look at the bullet-pointed search terms above.  If someone were to ask you one of the following questions, how would you respond if you were earnestly attempting to help the person?

  1. What does it mean when a Select statement in Oracle is using 100% CPU?
  2. The server admin reported that the Oracle database server cpu utilization hits 100% too frequently, what should I do about the problem?
  3. Oracle is throwing an ORA-01861 error, and that is causing 100% CPU usage.  What should I do?

Share your thoughts.

Summary of Suggestions/Causes Shared by Readers (Last Updated February 14, 2013):

  • Intended or unintended Cartesian product between row sources – does the query have a DISTINCT clause?  (Rodger)
  • Intended or unintended data type conversions – is the data model storing numbers or date values in VARCHAR2 columns?  (Jeremy Kendrick)
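While waiting for additional reader suggestions, a possible starting point when investigating this type of report is to determine which SQL statements have consumed the most CPU time since the statements were loaded into the shared pool.  The following is just a sketch (in the V$SQLSTATS view, the CPU_TIME and ELAPSED_TIME columns are reported in microseconds):

SELECT
  SQL_ID,
  EXECUTIONS,
  ROUND(CPU_TIME / 1000000, 2) CPU_SECONDS,
  ROUND(ELAPSED_TIME / 1000000, 2) ELAPSED_SECONDS
FROM
  (SELECT
    SQL_ID,
    EXECUTIONS,
    CPU_TIME,
    ELAPSED_TIME
  FROM
    V$SQLSTATS
  ORDER BY
    CPU_TIME DESC)
WHERE
  ROWNUM <= 10;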






