On the Topic of Programming 2


September 2, 2012

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Over the course of the last six months I developed several efficiency reports for the machining facility where I work, measuring the average time required to complete a customer’s part compared to the engineering standard “target” run time for producing the part.  Sounds simple, right?  If the engineering target is 30 widgets per hour, and only 20 widgets per hour are produced, then the average efficiency is 20/30 = 66.67%.  It does not take a person with a degree in mathematics to arrive at the conclusion that the widgets are not being produced as quickly as expected.  What is the scope of the measurement time frame for the average: a year, a month, a week, a day, one work shift in a day, a single machining center for a day, a single machining center and employee, or something else?
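Expressed in SQL, the base calculation is trivial – the hard part is choosing the grouping scope.  A minimal sketch, assuming a hypothetical PRODUCTION_LOG table with one row per completion report (the table and column names are illustrations, not the real schema):

SELECT
  SHIFT_DATE,
  ROUND(SUM(GOOD_QTY) / SUM(TARGET_QTY) * 100, 2) AS EFFICIENCY_PCT
FROM
  PRODUCTION_LOG
GROUP BY
  SHIFT_DATE;

Swap SHIFT_DATE for a month, an employee ID, or a machining center ID in the SELECT and GROUP BY clauses, and the same statement answers a completely different efficiency question.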

OK, now flip the engineering standard run times a bit so that the times are a little more consistent with the machining facility where I work.  It might take four hours, six hours, 10 hours, 20 hours, or possibly even 200 hours to complete a single operation at a machining center (one of several operations in the manufacturing process for the widget) to produce a single widget.  With the long run times of the operations, calculating the efficiency of a machining center or an employee for a specified period of time was a daunting task… a task that I was asked to solve roughly 12 years ago (with the help of Oracle Database 8.0.5).

The hours per widget run time (rather than widgets per hour) situation presented various problems for calculating employee efficiency, especially when an operation at a machining center did not complete before the end of an employee’s shift.  Consider a situation where an operation is expected to require eight hours to complete, but only when the tooling used at the machining center is new.  As such, the first shift employee installs new tooling in the machining center every morning before starting a new widget.  The first shift employee spends about 30 minutes finishing up the widget that was started the previous night, changes the tooling, and then starts the machining on the next widget.  So, the first shift employee reports that one widget (started the previous night) completed after 0.5 hours and the second widget completed in 7.5 hours.  The first shift employee’s efficiency, considering that two widgets were completed during his shift, is easily calculated as (8 + 8) / (0.5 + 7.5) * 100 = 200%.  The second shift employee’s efficiency is a consistent 0% because the operation for the widget never completes during his shift – the sharpness of the tooling deteriorates through the day, causing the machining operation to take longer.  This obviously leads to odd questions: 1) Why is the second shift employee more efficient when the first shift employee is on vacation (equivalent to the question in Oracle performance tuning: why does my report run faster when it rains Monday mornings?)?  2) Why is the second shift employee more efficient when working a nine or 10 hour shift, rather than an eight hour shift?  The fun questions that one is able to answer when familiar with the data…

When an employee starts working on a machining operation, a labor ticket transaction is created detailing the fact that the work for the machining operation is in-process.  When the machining operation completes (or when it is time to leave for the day), the employee closes the labor ticket transaction and reports the number of widgets that completed.  These labor ticket transactions are conveniently stored in a table named LABOR_TICKET.  Back in 2006 or 2007, I decided to make another attempt at solving the efficiency problem – after all, I now had access to analytic functions in Oracle Database (analytic functions did not exist in Oracle Database 8.0.5).  I expected that by extending the time range beyond a single employee and/or shift, I might be able to extract useful efficiency information from the database.  I put together a rather long SQL statement that looked something like this:

SELECT DISTINCT
  1 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-30) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-30) AND TRUNC(SYSDATE)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
UNION ALL
SELECT DISTINCT
  2 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-60) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-60) AND TRUNC(SYSDATE-31)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
...

The above SQL statement grew in length substantially based on the number of time periods that I selected to compare.  This data was returned to a program for display purposes, so reformatting the output was always a possibility.  Consider a situation where the programmer is unfamiliar with Oracle specific SQL and is asked to generate the same output in a program that he is constructing.  For a program seeking to measure efficiency, the programmer’s solution would likely involve repeated executions of SQL statements to retrieve the bits and pieces of information that need to be presented – this row by row processing will likely be very slow over a high latency WAN (VPN) connection (but certainly faster than never receiving the information, so the programmer will still be a hero), and could very well hinder the performance of the Oracle Database server.

The above solution is good, but problems such as the progressive wearing of the tooling cutter, as mentioned earlier, could still lead to unexpected differences in the efficiency of different employees who are working as fast as possible.  Fast forward a few years.  The right questions are asked by people with a more complete understanding of efficiency measurements – examine the efficiency calculation problem from an entirely different angle.  Not long ago I was handed an example of a simple efficiency report, and asked to reproduce that report with live data, output in a Microsoft Excel spreadsheet.  The example was little more than a simple sketch, so for this blog article I put together a color-coded example of the report format in Microsoft Excel:

A little explanation is required.  The entirely different angle for calculating employee efficiency in the execution of widget operations involves looking at neither the date of the labor ticket transaction nor the number of widgets produced by a particular employee in a particular time frame.  Instead, the manufacturing batch’s (lot, in this particular ERP system) completion date, the quantity produced in the batch, and the total hours to execute a single machining operation for the lot become the criteria when comparing against the engineering standards to determine efficiency.  The manufacturing batch’s completion date is used to divide the batches into specific timeframes (in this case weeks: Monday through Sunday).  All employees working on a particular operation, where the batch completion date is in a certain week, will receive the same efficiency rating for that particular operation (EFF_WEEK1, EFF_WEEK2, EFF_WEEK3, etc.) as all other employees working on the same operation with the same batch completion date range.  It is not uncommon for employees to generate labor transactions for multiple operations for production of the same part, as well as operations for different parts that have batch completion dates in the same week, so the employee’s weekly efficiency rating (W_AVG_EFF1, W_AVG_EFF2, W_AVG_EFF3, etc.) weights the individual part efficiencies based on the percentage of machining time the employee spent in a given operation compared to all of the other operations the employee worked.  The employee’s overall efficiency rating (Avg W_AVG_EFF) is the simple average of the employee’s weekly efficiency ratings.
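A minimal sketch of that weighting, assuming a hypothetical EMPLOYEE_WEEK_EFF row source that already contains one row per employee, operation, and week (EFF_WEEK being the operation’s efficiency for the week, and HOURS_WORKED the employee’s hours in that operation – all names here are illustrations):

SELECT
  EMPLOYEE_ID,
  WEEK_NUMBER,
  ROUND(SUM(EFF_WEEK * HOURS_WORKED) / SUM(HOURS_WORKED), 2) AS W_AVG_EFF
FROM
  EMPLOYEE_WEEK_EFF
GROUP BY
  EMPLOYEE_ID,
  WEEK_NUMBER;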

In the above example report, the blue colored text is a static label.  The red colored text is a calculated static label that shows the Monday through Sunday date range for the week.  The black colored text is a simple “rip and read” from the database – no calculation is required.  The purple colored text indicates a calculated value.  The numbers at the right will not appear on the completed report, but are included to indicate which EFF_WEEKn and W_AVG_EFFn values will be identical if the employee worked on an operation whose batch closed in the given week (notice that there are blanks in some columns of the sample report, indicating that the employee did not work on that operation in the batch completion week).

The programming challenge is to determine the number of SQL statements that would be needed to retrieve the information from the database, and the number of times those SQL statements would need to be executed.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number of weeks included in the report?  Before you think about the problem, we should probably investigate the data organization in the various tables.  The picture below shows the tables (and the specific columns) required for the report, with the tables for the engineering standards in green boxes and the tables needed for the labor transaction analysis in black boxes.  The lines show the association of the data in the different tables.

(The above image was adjusted 7 hours after this blog article was initially posted.  The linking between the green OPERATION table and the green WORK_ORDER table was corrected, and the red colored text was added to indicate whether the table was to be used for retrieving the engineering master standards (WORKORDER_TYPE='M') or information from the production work order batches (WORKORDER_TYPE='W' or TYPE='W').)
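While part 3 will work through the full solution, a rough sketch of the two retrievals implied by the diagram appears below.  The join columns follow the other queries in this series; the column lists are merely illustrative assumptions:

-- Engineering master standards (green boxes), WORKORDER_TYPE='M':
SELECT
  WO.PART_ID,
  O.SEQUENCE_NO,
  O.RESOURCE_ID,
  O.RUN_HRS
FROM
  WORK_ORDER WO,
  OPERATION O
WHERE
  WO.TYPE='M'
  AND WO.TYPE=O.WORKORDER_TYPE
  AND WO.BASE_ID=O.WORKORDER_BASE_ID
  AND WO.LOT_ID=O.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
  AND WO.SUB_ID=O.WORKORDER_SUB_ID;

-- Production labor transactions (black boxes), WORKORDER_TYPE='W':
SELECT
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  LT.HOURS_WORKED,
  LT.GOOD_QTY
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID;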

It might also be interesting to think about what processing will be performed on the data returned by the SQL statements – that could be a pivotal design decision for the SQL statements.  Curve balls are to be expected – what is currently an examination of four weeks’ worth of efficiency numbers today might be an examination of 52 weeks tomorrow.  Oh, here is a thought: what if, instead, on the next day it is important to focus on the various part efficiencies and the employees who worked on the operations, rather than focusing on the employees and the various parts that the employees worked on?

Part 3 of this series will attempt to answer some of the above questions.

Late addition, sample of a completed report in Excel:





On the Topic of Programming 1


August 26, 2012

(Forward to the Next Post in the Series)

As those of you who have read this blog’s About page probably know, my day to day job responsibilities involve a lot of activities that are not specific to Oracle Database performance tuning, or even remotely Oracle DBA type activities.  Those extra activities are part of what keeps the job fresh and interesting, whether I am swapping a new roll of labels into a Zebra label printer (that was a three minute fun task this morning), troubleshooting phone system problems (from a fork lift truck “disabling” a phone, to programming the PBX with a clever “message delivery system” to waste the time of persistent telemarketers), swapping out the power supply in a computer very early in the morning, or engaged in a marathon of application programming into the evening and weekend hours.

One of the recent programming projects involves the continuing effort of decreasing the number of data islands, allowing the data previously contained in the islands to be accessed and cross-referenced with data generated by other applications.  One of the data island generators that I have been working to phase out is Superbase 3.0, a database platform that started life on a Commodore 64 in 1983.  Superbase 3.0 is a 16 bit client-side database, so it does not play well with the 64 bit Windows 7 that ships with most new desktop computers (16 bit applications will not run natively on 64 bit Windows; instead, the 16 bit applications must be run inside a virtual machine such as Windows XP Mode, or run remotely using remote access software such as a VNC client).

The data contained in the old Superbase databases is critical to the company’s processes, so that data must be salvaged – meaning that the data must be transformed and imported into an Oracle database.  Unlike what a developer would likely create in a relational database, often with multiple tables used to store one “record” in the database, the long departed designer of the Superbase databases used a single row in a single database table to store one “record” in the database.  That organization reminds me a lot of the Microsoft Works package’s database from the early 1990s, with its fancy data entry forms which allowed users to escape the dull spreadsheet-like data entry screen.  Microsoft Excel from the early/mid 1990s could magically transform a dull spreadsheet data entry screen into a simple data entry form, in the process transforming the expensive Microsoft Excel into essentially a cheap database program.  It is a bit more of a challenge to locate the automatic data entry form creator in Excel 2010 than I recall it being in the early/mid 1990s version of Excel, but I suppose that helps further reduce the number of potential data islands:

So, what does the above discussion of Microsoft Excel have to do with anything related to Oracle Database?  The data contained in the Superbase databases must be transformed and inserted into an Oracle database.  It is good news that Superbase is able to export data to Microsoft Excel format.  The bad news is that the exported format is designed to work with Microsoft Excel 2.0 – a very old version of Microsoft Excel that seems to date back to 1987!  Time for a lot of manual data entry if that data must end up in an Oracle Database 11.2.0.x database… unless…

Microsoft Excel 2003 (version 11.0) is able to open Excel 2.0 files… success, even if the success is minor.  Now, how to go about transferring the data from Excel into Oracle Database?  I suppose that I could have created a macro in Microsoft Excel to insert the data into Oracle Database, but at the time I was not interested in writing a macro that accomplished the task “the right way” using bind variables.  And just look at that data – some of the date values were imported as very small (roughly -65000) numbers, in some cases nearly 20 different spellings for the same customer name, and alpha-numeric text in columns that should be numeric.

So, how did I import the Superbase data that was now in Excel 2003 into the Oracle Database 11.2.0.x database without writing an Excel macro?  The particular computer with Excel 2003 that I was using also had a copy of Access 2003 installed.  Access 2003 is able to create a table “link” to an Excel 2003 spreadsheet’s worksheet, and handle that worksheet essentially the same as if it were a database table.  Now the data is “in” Microsoft Access 2003, but still not in an Oracle database.  Previous experience with this process pays off – before bringing the data into Microsoft Access, type each of the Oracle Database destination table’s column names into the first row of the Microsoft Excel spreadsheet, above the appropriate column’s data.  Importing the data into the Oracle database then becomes a simple four step process, assuming that no other data transformation is necessary (a rough sketch of the resulting append query appears after the list):

  1. Link to the Excel spreadsheet’s worksheet and the destination table in the Oracle database.
  2. Create an Access view (stored Query) that selects all of the columns from the Excel worksheet that must be inserted into the Oracle database.
  3. Convert the view (stored Query) type to an Append type and select the linked Oracle Database table as the destination – Access will automatically find the correct destination column in the Oracle table, if the source column name (from the first row in the Excel worksheet) matches the destination column name.
  4. Execute the append type view.
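Behind the scenes, the append query built in steps 2 through 4 is roughly the following Access SQL (the table and column names here are hypothetical):

INSERT INTO ORA_OPERATION_HIST (PART_ID, OPERATION_SEQ_NO, DESCRIPTION)
SELECT
  PART_ID,
  OPERATION_SEQ_NO,
  DESCRIPTION
FROM
  EXCEL_LINKED_SHEET;

Because the first worksheet row supplies the column names, the SELECT list matches the destination columns without any manual field mapping.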

A simple transformation of the data from 1994 database technology to 1987, 2003, and then on to 2011 in Oracle Database – and without writing a single line of code.  Remember that problem that I mentioned about alpha-numeric text in columns that should be numeric, such as “10&20” in a column named OPERATION_NUMBER (or OPERATION_SEQ_NO) – it turns out that that bit of inconsistent data cannot just be thrown out (thanks Microsoft Access 2003).  To fix that problem, I would need to add another column to the Oracle Database table, and then have Microsoft Access update that table using the Microsoft Excel spreadsheet data (fixing the “10&20”, “10 & 20”, “10  &20”, “10 AND 20”, “10,20” and “10, 20” variants into a standard format).  The SQL dialect in Microsoft Access is a bit odd at times, and I could not remember if the odd syntax applies to UPDATE statements also.  As an example of the odd syntax, the simple CREATE TABLE AS SELECT:

CREATE TABLE
  T2 AS
SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Must be written like the following in Microsoft Access:

SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
INTO
  T2
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Since I am not 100% confident in my SQL authoring skills in Microsoft Access, how do I move the data from the Excel spreadsheet into the new column of the Oracle Database table… and without writing a single line of programming code?  I simply created a temporary table (not a true temporary table, because the table data must be visible to more than one session) that contained the primary key column and a second column for the non-numeric numeric data.  Once the data was in the temporary Oracle table (using the simple four step process outlined above), I simply executed an UPDATE statement similar to this:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=(
    SELECT
      NON_NUMERIC_NUMERIC
    FROM
      T1_TEMP TT
    WHERE
      T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHERE
  T1.PRIMARY_KEY IN (
    SELECT
      PRIMARY_KEY
    FROM
      T1_TEMP);
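With the raw text now sitting in the NON_NUMERIC_NUMERIC column, the actual standardization of the “10&20” style variants could then also be handled in Oracle rather than Access.  A hedged sketch (Oracle 10g and above for the REGEXP functions; the pattern is an illustration, not the exact cleanup I performed):

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=
    REGEXP_REPLACE(NON_NUMERIC_NUMERIC, '[[:space:]]*(&|AND|,)[[:space:]]*', ' & ')
WHERE
  REGEXP_LIKE(NON_NUMERIC_NUMERIC, '&|AND|,');

The single pattern collapses the “&”, “AND”, and “,” separators, along with any surrounding spaces, so that “10&20”, “10 AND 20”, and “10, 20” all become “10 & 20”.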

With the data successfully transferred into an Oracle database table, the programming continues.  That brings me to the next article in this series, the internal conflicts of the “best” way to accomplish the programming task.





SQL Challenges


June 14, 2012

Dominic Delmolino put together a very interesting challenge.  The challenge is to produce something called a Pascal matrix using Oracle Database… more specifically, just SQL.  I had a vague recollection of Pascal matrices when I read Dominic’s challenge.  Basically, the goal is to create a matrix similar to the following:

The rule for generating the matrix is simply that a cell’s value is the sum of the value in the cell that is immediately to the left plus the value in the cell that is immediately above.  Sounds easy, right?

If we were just working in Microsoft Excel (or some other spreadsheet package), we could do something like this to quickly create the matrix:

Dominic’s challenge probably would not be much of a challenge if we could just type in formulas like the above into a SQL statement.  Give his challenge a try to see if you are able to derive a unique solution to the problem.  I probably spent a couple of minutes (maybe 60 seconds with the help of copy and paste) creating the above example using Microsoft Excel, but spent a couple of hours trying to produce a solution that worked using SQL.
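(Spoiler warning: for readers who want a hint at one possible shape of a solution, the MODEL clause available in Oracle Database 10.1 and above can encode the left-plus-above rule directly.  The sketch below returns one row per cell of a 5×5 Pascal matrix, rather than a pivoted grid, and is certainly not the only approach – nor necessarily the one Dominic had in mind:)

SELECT
  R, C, V
FROM
  (SELECT
     R.RN AS R,
     C.RN AS C,
     0 AS V
   FROM
     (SELECT LEVEL RN FROM DUAL CONNECT BY LEVEL <= 5) R,
     (SELECT LEVEL RN FROM DUAL CONNECT BY LEVEL <= 5) C)
MODEL
  DIMENSION BY (R, C)
  MEASURES (V)
  RULES (
    -- First row and first column are 1; every other cell is the sum of the
    -- cell immediately above and the cell immediately to the left.
    V[ANY, ANY] ORDER BY R, C =
      CASE
        WHEN CV(R) = 1 OR CV(C) = 1 THEN 1
        ELSE V[CV(R)-1, CV(C)] + V[CV(R), CV(C)-1]
      END)
ORDER BY
  R, C;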

——

Part 2 of the challenge.

Take a look at the bullet point items in this blog article about NULL values in a table’s columns, in particular the first three large bullet point items.  Do you agree or disagree with the statements, and why?





Reproducing a Canned Report using a Single SQL Statement


April 11, 2012

I recently received an interesting request for assistance from an ERP email mailing list.  The author of the email wanted to reproduce a canned report found in the ERP package so that the information could be published on a Microsoft SharePoint system.  The author of the email is using SQL Server for the database backend, but we will ignore that bit of complexity for the moment.

The canned report that ships with the ERP system looks similar to the following:

Basically, the end user enters a start and an end date for the report, the ERP system checks the shop resource availability calendar for the days between the start and end dates, and then calculates the Capacity value from that calendar (there is a default shop resource calendar, and an override calendar for some of the shop resources).  The Act Hrs column in the report is simply the sum of hours calculated from the employee labor transactions that are recorded in real-time on the production floor – that column value is easy to calculate, but is subject to minor rounding errors and date calculation errors when an employee labor transaction starts before midnight and ends after midnight on either the start date or end date specified by the end user running the report.  The Est Hrs column in the report is a little more difficult to calculate, requiring a bit of calculation that determines what item the employee actually claimed to have produced, and in what quantity – what if the employee claimed to have produced 6 of 10 widgets in the specified time period, might he have completed 99% of the seventh widget but not yet reported that widget complete?  The EFF column is simply the Est Hrs column divided by the Act Hrs column, with the result multiplied by 100.  The Act Util column is also easy to calculate: the Act Hrs column value divided by the Capacity column value, with the result multiplied by 100.  The Est Util column is simply the Est Hrs column value divided by the Capacity column value, with the result multiplied by 100.
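Expressed as SQL expressions over a hypothetical per-resource summary row source, the derived columns reduce to the following (RESOURCE_SUMMARY and its columns are illustrations; NULLIF guards against division by a zero capacity):

SELECT
  RESOURCE_ID,
  ROUND(EST_HRS / NULLIF(ACT_HRS, 0) * 100, 2) AS EFF,
  ROUND(ACT_HRS / NULLIF(CAPACITY, 0) * 100, 2) AS ACT_UTIL,
  ROUND(EST_HRS / NULLIF(CAPACITY, 0) * 100, 2) AS EST_UTIL
FROM
  RESOURCE_SUMMARY;

The real work, as we will see, is building the RESOURCE_SUMMARY style row source that supplies the Capacity, Act Hrs, and Est Hrs values for each resource.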

So, where do we start in trying to reproduce this particular report?  How about enabling a 10046 trace for the session that creates the report.  Analyzing the resulting trace file might provide some insight into how the report is built.  Below is a summary of the SQL statements found in the trace file:

Query 1: (this query retrieves the estimated hours, shop resource description, and various other information that does not appear on the report – this appears to be the query that provokes the ERP system to issue the queries that follow)

SELECT
  L.RESOURCE_ID,
  L.WORKORDER_TYPE,
  L.WORKORDER_BASE_ID, 
  L.WORKORDER_LOT_ID,
  L.WORKORDER_SPLIT_ID,
  L.WORKORDER_SUB_ID, 
  L.OPERATION_SEQ_NO,
  L.TRANSACTION_DATE,
  L.HOURS_WORKED,
  L.GOOD_QTY, 
  L.TYPE,
  L.EMPLOYEE_ID, 
  OP.SETUP_HRS,
  OP.RUN,
  OP.RUN_TYPE,
  OP.LOAD_SIZE_QTY,
  OP.CALC_END_QTY, 
  R.DESCRIPTION, 
  RTRIM(E.LAST_NAME),
  RTRIM(E.FIRST_NAME),
  E.MIDDLE_INITIAL, 
  W.PART_ID,
  P.DESCRIPTION,
  L.TRANSACTION_ID 
FROM
  LABOR_TICKET L,
  OPERATION OP,
  SHOP_RESOURCE R,
  EMPLOYEE E,
  WORK_ORDER W,
  PART P 
WHERE
  L.RESOURCE_ID = R.ID 
  AND L.EMPLOYEE_ID = E.ID 
  AND L.WORKORDER_TYPE = OP.WORKORDER_TYPE 
  AND L.WORKORDER_BASE_ID = OP.WORKORDER_BASE_ID 
  AND L.WORKORDER_LOT_ID = OP.WORKORDER_LOT_ID 
  AND L.WORKORDER_SPLIT_ID = OP.WORKORDER_SPLIT_ID 
  AND L.WORKORDER_SUB_ID = OP.WORKORDER_SUB_ID 
  AND L.OPERATION_SEQ_NO = OP.SEQUENCE_NO  
  AND W.TYPE = OP.WORKORDER_TYPE 
  AND W.BASE_ID = OP.WORKORDER_BASE_ID 
  AND W.LOT_ID = OP.WORKORDER_LOT_ID 
  AND W.SPLIT_ID = OP.WORKORDER_SPLIT_ID 
  AND W.SUB_ID = OP.WORKORDER_SUB_ID  
  AND W.PART_ID = P.ID (+)  
  AND TRUNC(L.TRANSACTION_DATE) BETWEEN :1  AND :2
ORDER BY
  L.RESOURCE_ID,
  L.WORKORDER_TYPE,
  L.WORKORDER_BASE_ID, 
  L.WORKORDER_LOT_ID,
  L.WORKORDER_SPLIT_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO;

Query 2: (this query returns the quantity of a specific resource that is available for each of three shifts… for instance, the number of employees with a specific skill)

SELECT
  SHIFT_1_CAPACITY,
  SHIFT_2_CAPACITY,
  SHIFT_3_CAPACITY 
FROM
  SHOP_RESOURCE 
WHERE
  ID = :1;

Query 3: (this query returns the number of hours of availability for a specific resource, for each day of the week; the default resource calendar has a NULL value for the RESOURCE_ID column, so the specified sort order will return the default resource calendar rows before the over-ride calendar for a specific shop resource)

SELECT 
  DAY_OF_WEEK,
  START_OF_DAY,
  SHIFT_1,
  SHIFT_2,
  SHIFT_3 
FROM
  CALENDAR_WEEK 
WHERE
  (SCHEDULE_ID IS NULL OR SCHEDULE_ID = 'STANDARD') 
  AND (RESOURCE_ID IS NULL OR RESOURCE_ID = :1 ) 
ORDER BY
  SCHEDULE_ID,
  RESOURCE_ID DESC,
  DAY_OF_WEEK;

Query 4: (this query returns the exceptions, planned days of unavailability, for the default resource calendar and the shop resource specific over-ride calendar.  I just noticed a logic error in this SQL statement – what about those scheduled exceptions that start before the user specified start date and/or those scheduled exceptions that end after the user specified end date, where some of those dates fall into the user specified date range?  A sketch of a corrected predicate appears after the query.)

SELECT  
  START_DATE,
  END_DATE,
  START_OF_DAY,
  SHIFT_1,
  SHIFT_2,
  SHIFT_3, 
  SHIFT_1_CAPACITY,
  SHIFT_2_CAPACITY,
  SHIFT_3_CAPACITY 
FROM
  CALENDAR_CHANGE 
WHERE
  (SCHEDULE_ID IS NULL OR SCHEDULE_ID = 'STANDARD') 
  AND (RESOURCE_ID IS NULL OR RESOURCE_ID = :1 ) 
  AND START_DATE >= :2 AND END_DATE <= :3
ORDER BY
  SCHEDULE_ID,
  RESOURCE_ID,
  START_DATE;
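Regarding the logic error noted above: the usual fix is an interval-overlap test, so that exceptions that merely overlap the user’s date range are still returned, rather than only those fully contained within it.  A sketch of the corrected predicate:

  AND START_DATE <= :3
  AND END_DATE >= :2

Dates falling outside the user specified range would then need to be trimmed when computing the affected capacity.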

Query 5: (this query returns the number of operation setup hours for a specific operation, time spent preparing to produce parts, reported on the shop floor – it would seem that Query 1 could be modified to return this information)

SELECT
  SUM(HOURS_WORKED) 
FROM
  LABOR_TICKET 
WHERE
  WORKORDER_TYPE = :1     
  AND WORKORDER_BASE_ID = :2
  AND WORKORDER_LOT_ID = :3
  AND WORKORDER_SPLIT_ID = :4
  AND WORKORDER_SUB_ID = :5
  AND OPERATION_SEQ_NO = :6
  AND TYPE = 'S';

Now that we have a general idea of what steps the ERP system is performing to reproduce the report, where do we start?  We start by telling the OP (the person with the SQL Server backend) that with an Oracle Database backend, we could produce this report using a single SQL statement (without any stored procedures) – that approach would certainly eliminate a lot of back and forth communication between the client computer and the database server, which is an important consideration if a high latency network connects the two devices.  For the sake of simplicity, I will set the above query 4 aside for now.  Since there is an index on the LABOR_TICKET.TRANSACTION_DATE column, the ERP system loses the benefit of that index by applying the TRUNC function to that column’s values – so we will need to fix that problem.
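The rewritten queries later in this article sidestep the issue by filtering on the already-truncated SHIFT_DATE column; the general-purpose fix is to move the manipulation off the indexed column and onto the bind values.  A sketch, assuming the bind values are supplied as dates at midnight:

  AND L.TRANSACTION_DATE >= :1
  AND L.TRANSACTION_DATE < :2 + 1

With the column left bare, the index on TRANSACTION_DATE remains usable, and widening the upper bound by one day covers transactions with time components on the user specified end date.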

First, we need to know how many hours of capacity are available for each resource.  If a specific over-ride schedule (shop calendar) is not available for a shop resource, then the standard schedule is used.  To retrieve the schedules for the shop resources that do not have a defined unique schedule, we can create a Cartesian join between two tables with a NOT EXISTS clause to exclude those shop resources with an over-ride schedule:

SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'DEFAULT' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID IS NULL
  AND NOT EXISTS (
    SELECT
      C.RESOURCE_ID
    FROM
      CALENDAR_WEEK C
    WHERE
      SR.ID=C.RESOURCE_ID);

We also need to retrieve the shop resources with defined over-ride schedules:

SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'RESOURCE' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID=SR.ID;

Finally, we need to UNION ALL the above two queries:

SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'DEFAULT' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID IS NULL
  AND NOT EXISTS (
    SELECT
      C.RESOURCE_ID
    FROM
      CALENDAR_WEEK C
    WHERE
      SR.ID=C.RESOURCE_ID)
UNION ALL
SELECT
  SR.ID AS RESOURCE_ID, 
  CW.DAY_OF_WEEK,
  CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
  CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
  CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
  'RESOURCE' CALENDAR_TYPE
FROM
  CALENDAR_WEEK CW,
  SHOP_RESOURCE SR
WHERE
  (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
  AND CW.RESOURCE_ID=SR.ID
ORDER BY
  RESOURCE_ID,
  DAY_OF_WEEK;

The ERP system uses the following code numbers to represent each day of the week in the output produced by the above SQL statement:

0: Saturday
1: Sunday
2: Monday
3: Tuesday
4: Wednesday
5: Thursday
6: Friday

On Oracle Database, we are able to use the TO_CHAR function to almost reproduce the above translation table, although NLS settings that are in effect could produce results that require further adjustment.  The following formula returns 1 for a Sunday and 6 for a Friday, but 7 for Saturday:

TO_CHAR(TRANSACTION_DATE, 'D')

We are able to convert the 7 value for Saturdays into a 0 by using the MOD function to return the remainder value after dividing by 7:

MOD(TO_CHAR(TRANSACTION_DATE, 'D'), 7)

Assume that we want to generate a report for the dates between March 26, 2012 and April 14, 2012.  We need a way to determine the number of Mondays, Tuesdays, Wednesdays, Thursdays, Fridays, Saturdays, and Sundays between the two dates (including the end-point dates).  There are a couple of approaches to this problem:

  1. Use the CONNECT BY LEVEL syntax to generate a row for each date between the start and end dates.
  2. Use a pre-created, statically defined table that simply lists all of the dates between an arbitrary start and end date.
  3. Use a stored procedure to calculate the number of each day of the week between the start and end dates.
  4. Mathematically calculate within the SQL statement the number of each day of the week.

I will use the first of the above approaches… I suppose this approach would be a bit of a challenge with a SQL Server backend.

SELECT
  TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1) MY_DATE,
  TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D') AS D,
  MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK
FROM
  DUAL
CONNECT BY
  LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1;

MY_DATE   D DAY_OF_WEEK
--------- - -----------
26-MAR-12 2           2
27-MAR-12 3           3
28-MAR-12 4           4
29-MAR-12 5           5
30-MAR-12 6           6
31-MAR-12 7           0
01-APR-12 1           1
02-APR-12 2           2
03-APR-12 3           3
04-APR-12 4           4
05-APR-12 5           5
06-APR-12 6           6
07-APR-12 7           0
08-APR-12 1           1
09-APR-12 2           2
10-APR-12 3           3
11-APR-12 4           4
12-APR-12 5           5
13-APR-12 6           6
14-APR-12 7           0

If we were to use the above in a production environment, we certainly would convert the literal (constant) date values to bind variables.  The DAY_OF_WEEK column values (assuming that the NLS settings result in Sunday being the first day of the week) are the important values in the above output.  If we also implemented query 4 used by the ERP system, then the MY_DATE column values would also need to be considered.  How many Mondays, Tuesdays, Wednesdays, etc. are between the two dates?

SELECT
  MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
  COUNT(*) NUM_DAYS
FROM
  DUAL
CONNECT BY
  LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
GROUP BY
  MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7);

DAY_OF_WEEK   NUM_DAYS
----------- ----------
          0          3
          1          2
          2          3
          3          3
          4          3
          5          3
          6          3

As shown above, just 2 Sundays, but 3 of every other day of the week.

We now have two SQL statements (the one that indicates the available capacity per day of week, and the other that indicates the number of each day of the week in the user specified time interval) that need to be joined together:

SELECT
  C.RESOURCE_ID,
  C.DAY_OF_WEEK,
  (C.HOURS_1 + C.HOURS_2 + C.HOURS_3) * D.NUM_DAYS
FROM
  (SELECT
    SR.ID AS RESOURCE_ID, 
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'DEFAULT' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID IS NULL
    AND NOT EXISTS (
      SELECT
        C.RESOURCE_ID
      FROM
        CALENDAR_WEEK C
      WHERE
        SR.ID=C.RESOURCE_ID)
  UNION ALL
  SELECT
    SR.ID AS RESOURCE_ID, 
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'RESOURCE' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID=SR.ID) C,
  (SELECT
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
    COUNT(*) NUM_DAYS
  FROM
    DUAL
  CONNECT BY
    LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
  GROUP BY
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) ) D
WHERE
  C.DAY_OF_WEEK=D.DAY_OF_WEEK;

The above SQL statement might look somewhat large and complicated if you simply jumped into the middle of this article.

We still need to GROUP the result by the RESOURCE_ID column and add in the DESCRIPTION that is associated with each RESOURCE_ID:

SELECT
  C.RESOURCE_ID,
  C.DESCRIPTION,
  SUM((C.HOURS_1 + C.HOURS_2 + C.HOURS_3) * D.NUM_DAYS) CAPACITY
FROM
  (SELECT
    SR.ID AS RESOURCE_ID, 
    SR.DESCRIPTION,
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'DEFAULT' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID IS NULL
    AND NOT EXISTS (
      SELECT
        C.RESOURCE_ID
      FROM
        CALENDAR_WEEK C
      WHERE
        SR.ID=C.RESOURCE_ID)
  UNION ALL
  SELECT
    SR.ID AS RESOURCE_ID,
    SR.DESCRIPTION,
    CW.DAY_OF_WEEK,
    CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
    CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
    CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
    'RESOURCE' CALENDAR_TYPE
  FROM
    CALENDAR_WEEK CW,
    SHOP_RESOURCE SR
  WHERE
    (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
    AND CW.RESOURCE_ID=SR.ID) C,
  (SELECT
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
    COUNT(*) NUM_DAYS
  FROM
    DUAL
  CONNECT BY
    LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
  GROUP BY
    MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) ) D
WHERE
  C.DAY_OF_WEEK=D.DAY_OF_WEEK
GROUP BY
  C.RESOURCE_ID,
  C.DESCRIPTION
ORDER BY
  RESOURCE_ID;

The above SQL statement results in the following output – we now have the first column in the ERP system’s canned report:

At this point, my claim of being able to reproduce this canned report in a single SQL statement might seem suspect – all of the above work produced just a single column in the report.  It’s easy: just build the report in separate units, and then join the separate units.

As I mentioned earlier, calculating the estimated average hours and average actual hours could be a little difficult.  We might start with something like this:

SELECT
  LT.RESOURCE_ID,
  O.RUN_HRS AS EST_HOURS,
  O.CALC_END_QTY AS OPERATION_QTY,
  ROUND(O.RUN_HRS/O.CALC_END_QTY,2) AS AVG_EST_HOURS,
  SUM(LT.HOURS_WORKED) AS ACT_HOURS,
  SUM(LT.GOOD_QTY) AS COMPLETED_QTY,
  ROUND(DECODE(SUM(LT.GOOD_QTY),0,SUM(LT.HOURS_WORKED),SUM(LT.HOURS_WORKED)/SUM(LT.GOOD_QTY)),2) AVG_ACT_HOURS
FROM
  LABOR_TICKET LT,
  OPERATION O
WHERE
  LT.WORKORDER_TYPE='W'
  AND LT.TYPE='R'
  AND LT.WORKORDER_TYPE=O.WORKORDER_TYPE
  AND LT.WORKORDER_BASE_ID=O.WORKORDER_BASE_ID
  AND LT.WORKORDER_LOT_ID=O.WORKORDER_LOT_ID
  AND LT.WORKORDER_SPLIT_ID=O.WORKORDER_SPLIT_ID
  AND LT.WORKORDER_SUB_ID=O.WORKORDER_SUB_ID
  AND LT.OPERATION_SEQ_NO=O.SEQUENCE_NO
  AND LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
GROUP BY
  LT.RESOURCE_ID,
  O.RUN_HRS,
  O.CALC_END_QTY
ORDER BY
  LT.RESOURCE_ID;

The output of the above SQL statement looks similar to the following:

Nice, if we were just interested in summarizing by the individual operations.  However, the above will not work as a starting point for the next two columns in the report because we need to summarize by the RESOURCE_ID column – if we group on that column, we will throw off the accuracy of the efficiency calculations in the report.  So, we need to take a step back and try again.

First, we will summarize the data from the LABOR_TICKET table by itself – notice that I am not using the TRUNC function around the date column, and I am using the SHIFT_DATE column rather than the TRANSACTION_DATE column (the SHIFT_DATE column in this table is better able to work around the issue with labor transactions that cross midnight, and the time portion of its date values is already truncated to midnight):

SELECT
  LT.WORKORDER_TYPE,
  LT.WORKORDER_BASE_ID,
  LT.WORKORDER_LOT_ID,
  LT.WORKORDER_SPLIT_ID,
  LT.WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) AS ACT_HOURS,
  SUM(LT.GOOD_QTY) AS COMPLETED_QTY
FROM
  LABOR_TICKET LT
WHERE
  LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
  AND LT.TYPE='R'
  AND LT.WORKORDER_TYPE='W'
GROUP BY
  LT.WORKORDER_TYPE,
  LT.WORKORDER_BASE_ID,
  LT.WORKORDER_LOT_ID,
  LT.WORKORDER_SPLIT_ID,
  LT.WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  LT.RESOURCE_ID
HAVING
  SUM(LT.HOURS_WORKED)>0;

Now that we have pre-summarized the results from the LABOR_TICKET table, we can slide the above SQL statement into an inline view and join that inline view with the OPERATION table:

SELECT
  LT.RESOURCE_ID,
  SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2)) AS ENG_HOURS,
  SUM(LT.ACT_HOURS) AS ACT_HOURS,
  ROUND(SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2))/SUM(LT.ACT_HOURS)*100,2) AS EFF
FROM
  (SELECT
    LT.WORKORDER_TYPE,
    LT.WORKORDER_BASE_ID,
    LT.WORKORDER_LOT_ID,
    LT.WORKORDER_SPLIT_ID,
    LT.WORKORDER_SUB_ID,
    LT.OPERATION_SEQ_NO,
    LT.RESOURCE_ID,
    SUM(LT.HOURS_WORKED) AS ACT_HOURS,
    SUM(LT.GOOD_QTY) AS COMPLETED_QTY
  FROM
    LABOR_TICKET LT
  WHERE
    LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
    AND LT.TYPE='R'
    AND LT.WORKORDER_TYPE='W'
  GROUP BY
    LT.WORKORDER_TYPE,
    LT.WORKORDER_BASE_ID,
    LT.WORKORDER_LOT_ID,
    LT.WORKORDER_SPLIT_ID,
    LT.WORKORDER_SUB_ID,
    LT.OPERATION_SEQ_NO,
    LT.RESOURCE_ID
  HAVING
    SUM(LT.HOURS_WORKED)>0) LT,
  OPERATION O
WHERE
  LT.WORKORDER_TYPE=O.WORKORDER_TYPE
  AND LT.WORKORDER_BASE_ID=O.WORKORDER_BASE_ID
  AND LT.WORKORDER_LOT_ID=O.WORKORDER_LOT_ID
  AND LT.WORKORDER_SPLIT_ID=O.WORKORDER_SPLIT_ID
  AND LT.WORKORDER_SUB_ID=O.WORKORDER_SUB_ID
  AND LT.OPERATION_SEQ_NO=O.SEQUENCE_NO
GROUP BY
  LT.RESOURCE_ID
ORDER BY
  LT.RESOURCE_ID;

The result of the above SQL statement is much more useful than the original attempt.  Note that the above SQL statement makes certain assumptions about the estimated hours (ENG_HOURS) that the ERP system does not make, so the resulting numbers could be slightly different.

We now have three more columns of our report.

The final step is to put the complete SQL statement together, joining the previously created portion of the SQL statement that determined the available capacity with the just created SQL statement that determined the actual and estimated utilization.  The resulting SQL statement appears as follows:

SELECT
  C.RESOURCE_ID,
  C.DESCRIPTION,
  C.CAPACITY,
  E.ACT_HOURS,
  E.ENG_HOURS AS EST_HOURS,
  E.EFF,
  DECODE(C.CAPACITY,0,0,ROUND(E.ACT_HOURS/C.CAPACITY*100,2)) AS ACT_UTIL,
  DECODE(C.CAPACITY,0,0,ROUND(E.ENG_HOURS/C.CAPACITY*100,2)) AS EST_UTIL
FROM
  (SELECT
    C.RESOURCE_ID,
    C.DESCRIPTION,
    SUM((C.HOURS_1 + C.HOURS_2 + C.HOURS_3) * D.NUM_DAYS) CAPACITY
  FROM
    (SELECT
      SR.ID AS RESOURCE_ID, 
      SR.DESCRIPTION,
      CW.DAY_OF_WEEK,
      CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
      CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
      CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
      'DEFAULT' CALENDAR_TYPE
    FROM
      CALENDAR_WEEK CW,
      SHOP_RESOURCE SR
    WHERE
      (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
      AND CW.RESOURCE_ID IS NULL
      AND NOT EXISTS (
        SELECT
          C.RESOURCE_ID
        FROM
          CALENDAR_WEEK C
        WHERE
          SR.ID=C.RESOURCE_ID)
    UNION ALL
    SELECT
      SR.ID AS RESOURCE_ID,
      SR.DESCRIPTION,
      CW.DAY_OF_WEEK,
      CW.SHIFT_1*SR.SHIFT_1_CAPACITY AS HOURS_1,
      CW.SHIFT_2*SR.SHIFT_2_CAPACITY AS HOURS_2,
      CW.SHIFT_3*SR.SHIFT_3_CAPACITY AS HOURS_3,
      'RESOURCE' CALENDAR_TYPE
    FROM
      CALENDAR_WEEK CW,
      SHOP_RESOURCE SR
    WHERE
      (CW.SCHEDULE_ID IS NULL OR CW.SCHEDULE_ID = 'STANDARD')
      AND CW.RESOURCE_ID=SR.ID) C,
    (SELECT
      MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) DAY_OF_WEEK,
      COUNT(*) NUM_DAYS
    FROM
      DUAL
    CONNECT BY
      LEVEL <= (TO_DATE('04/14/2012','MM/DD/YYYY') - TO_DATE('03/26/2012','MM/DD/YYYY'))+1
    GROUP BY
      MOD(TO_CHAR(TO_DATE('03/26/2012','MM/DD/YYYY')+(LEVEL-1), 'D'),7) ) D
  WHERE
    C.DAY_OF_WEEK=D.DAY_OF_WEEK
  GROUP BY
    C.RESOURCE_ID,
    C.DESCRIPTION) C,
  (SELECT
    LT.RESOURCE_ID,
    SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2)) AS ENG_HOURS,
    SUM(LT.ACT_HOURS) AS ACT_HOURS,
    ROUND(SUM(ROUND(O.RUN_HRS/O.CALC_END_QTY*LT.COMPLETED_QTY,2))/SUM(LT.ACT_HOURS)*100,2) AS EFF
  FROM
    (SELECT
      LT.WORKORDER_TYPE,
      LT.WORKORDER_BASE_ID,
      LT.WORKORDER_LOT_ID,
      LT.WORKORDER_SPLIT_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.RESOURCE_ID,
      SUM(LT.HOURS_WORKED) AS ACT_HOURS,
      SUM(LT.GOOD_QTY) AS COMPLETED_QTY
    FROM
      LABOR_TICKET LT
    WHERE
      LT.SHIFT_DATE BETWEEN TO_DATE('03/26/2012','MM/DD/YYYY') AND TO_DATE('04/14/2012','MM/DD/YYYY')
      AND LT.TYPE='R'
      AND LT.WORKORDER_TYPE='W'
    GROUP BY
      LT.WORKORDER_TYPE,
      LT.WORKORDER_BASE_ID,
      LT.WORKORDER_LOT_ID,
      LT.WORKORDER_SPLIT_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.RESOURCE_ID
    HAVING
      SUM(LT.HOURS_WORKED)>0) LT,
    OPERATION O
  WHERE
    LT.WORKORDER_TYPE=O.WORKORDER_TYPE
    AND LT.WORKORDER_BASE_ID=O.WORKORDER_BASE_ID
    AND LT.WORKORDER_LOT_ID=O.WORKORDER_LOT_ID
    AND LT.WORKORDER_SPLIT_ID=O.WORKORDER_SPLIT_ID
    AND LT.WORKORDER_SUB_ID=O.WORKORDER_SUB_ID
    AND LT.OPERATION_SEQ_NO=O.SEQUENCE_NO
  GROUP BY
    LT.RESOURCE_ID) E
WHERE
  E.RESOURCE_ID=C.RESOURCE_ID
ORDER BY
  C.RESOURCE_ID;

The resulting output appears as follows:

Well, that was easy… :-) 

The solution reminds me of the phrase “How to Eat an Elephant”.  Elephant poaching is illegal in most parts of the world, so I propose changing this phrase to “How to Walk an Elephant”.  One step at a time, and make certain not to stand directly in front or directly behind.  ;-)





Monitoring Changes to Table Data


March 22, 2012

Suppose that you receive a request stating that a particular table in one of your databases must be monitored for changes.  One such table might be the one that lists the ERP system’s suppliers and their addresses – you would not want someone to be able to set up a legitimate supplier, and later have the billing remit-to name and address changed without being detected.

What approach would you take to solve the above problem?  A question similar to the above arrived recently in an ERP mailing list email – the original poster (OP) is using a SQL Server RDBMS, so that might change the proposed solution just a bit.

Here is the “How simple is too simple?” suggestion that I offered:

Periodically, create a VENDOR_SAVE table (drop it if it already exists, or delete all rows and re-insert from the original source table):

CREATE TABLE VENDOR_SAVE AS
SELECT
  *
FROM
  VENDOR;

Now, just wait a while.

The question then becomes, how do we detect:

  • A new row (record) added to the original source (VENDOR) table.
  • An old row (record) deleted from the original source (VENDOR) table.
  • A change to any column (program field) in the original source (VENDOR) table since the last time the VENDOR_SAVE table was created/refreshed.

Let’s start with the first two bullet points.  I will write the SQL statements so that the statements should work with Oracle Database 9.0.1 and above, and SQL Server with very few changes, even if I feel a bit ANSI about doing so.  I will use the COALESCE function, rather than the NVL function, and CASE syntax rather than the equivalent DECODE syntax.
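As a side note on that translation, DECODE and CASE are not perfect equivalents where NULL values are involved: DECODE treats two NULLs as equal, while an equality comparison in CASE does not – which is why the searched CASE … IS NULL form appears below.  A quick check on DUAL:

SELECT
  DECODE(V.X, V.Y, 'MATCH', 'NO MATCH') DECODE_RESULT,
  CASE WHEN V.X = V.Y THEN 'MATCH' ELSE 'NO MATCH' END CASE_RESULT
FROM
  (SELECT CAST(NULL AS VARCHAR2(1)) X, CAST(NULL AS VARCHAR2(1)) Y FROM DUAL) V;

DECODE_RESULT returns MATCH, while CASE_RESULT returns NO MATCH.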

To identify cases where a row has appeared in, or disappeared from the original source (VENDOR) table, we can simply perform a full outer join between the original source table and the historical mirror image of the original table (VENDOR_SAVE).  We are only interested in cases where the primary key column (ID) is found in exactly one of the two tables:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL;

So, the above SQL statement satisfies the first two bullet points.  The third bullet point is a little more challenging to accomplish… unless of course we employ UNION labor.  If we have two row sources with identical columns, and UNION the row sources together, the resulting row source will contain no entirely duplicated rows from the two original row sources (two identical rows will be reduced to a single row).  If there were no changes to any of the column values (or if the row was added to or deleted from the original source table), there will be a single row for the primary key column value.  If any columns were changed, there will be two rows containing the primary key column value.

Let’s build a SQL statement that UNIONs the rows from the two tables together, and counts the number of rows for each primary key value:

SELECT
  COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
  VL.ID,
  VL.NAME,
  VL.ADDR_1
FROM
  (SELECT
    *
  FROM
    VENDOR
  UNION
  SELECT
    *
  FROM
    VENDOR_SAVE) VL;

To complete the requirement for bullet point 3 above, we need to eliminate all rows from the result set where there is a single row for the primary key value:

SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1;

As a final step, we should combine the two result sets into a single result set using UNION ALL:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL
UNION ALL
SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1
ORDER BY
  ID;

While somewhat limited in usefulness, the above approach will indicate which rows in the original source table should be examined because the column values in those rows changed (or were added or deleted).

Another, potentially more useful approach involves setting up a logging trigger and logging table.  I previously shared a VBS script that helps to automate and standardize the process of creating the logging trigger and logging table.  A very similar feature is built into my Hyper-Extended Oracle Performance Monitor program – but, much like the VBS script, this solution is useless for the OP who uses a SQL Server backend.
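For readers on other platforms, a minimal sketch of that approach follows; the generated versions capture every column in the table, while this abbreviated illustration logs just the NAME column (the log table layout and column lengths are assumptions):

CREATE TABLE VENDOR_LOG (
  CHANGE_DATE DATE,
  CHANGED_BY  VARCHAR2(30),
  ACTION      VARCHAR2(1),
  ID          VARCHAR2(15),
  OLD_NAME    VARCHAR2(50),
  NEW_NAME    VARCHAR2(50));

CREATE OR REPLACE TRIGGER VENDOR_LOG_TRG
AFTER INSERT OR UPDATE OR DELETE ON VENDOR
FOR EACH ROW
DECLARE
  V_ACTION VARCHAR2(1);
BEGIN
  -- INSERTING/UPDATING/DELETING must be evaluated in PL/SQL, not in SQL.
  IF INSERTING THEN
    V_ACTION := 'I';
  ELSIF UPDATING THEN
    V_ACTION := 'U';
  ELSE
    V_ACTION := 'D';
  END IF;

  INSERT INTO VENDOR_LOG
  VALUES (SYSDATE, USER, V_ACTION, COALESCE(:NEW.ID, :OLD.ID), :OLD.NAME, :NEW.NAME);
END;
/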

What are the other options?  Oracle Database’s built-in auditing capabilities.  Auditing built into the ERP system (this particular ERP system writes all audit/change records to a single table that uses a VARCHAR2 column to capture the before/after images of the values stored in columns).  Any other options?  (The mess that someone can create with a bit of idle time on their hands…)
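For completeness, enabling Oracle Database’s built-in standard auditing for the table is a one-liner (the AUDIT_TRAIL initialization parameter must already be set; note that standard auditing records who changed the table and when, not the before/after column values):

AUDIT INSERT, UPDATE, DELETE ON VENDOR BY ACCESS;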





Oracle Query Optimizer Vanishing Acts


February 3, 2012

A couple of days ago I noticed an interesting thread in the comp.databases.oracle.server Usenet group that described a problem of vanishing tables.  The title of the thread certainly caught my attention, and I was a bit disappointed when I found that there was little to no magic involved in the vanishing act.  The situation reported in the thread was not that the table and its data simply disappeared, but instead that a table referenced in a SQL statement simply did not appear in an execution plan.  While not quite as entertaining as the TV show Magic’s Biggest Secrets Finally Revealed, the thread is worth reading, with discussion of enhancements provided in recent Oracle Database releases.

Almost two years ago I wrote a blog article that showed a similar vanishing act, where a SQL statement referencing the same table eight times triggered an interesting enhancement:

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0000000' AND 'A 1000000'; 

Oracle Database 11.1.0.6 behaved as expected, where table T5 was included eight times in the execution plan for the above SQL statement.  Oracle Database 11.2.0.1 output the following execution plan for the above SQL statement:

Plan hash value: 2002323537

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   1000K|00:00:00.50 |   26055 |
|*  1 |  TABLE ACCESS FULL| T5   |      1 |   1000K|   1000K|00:00:00.50 |   26055 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("T51"."C1">='A 0000000' AND "T51"."C1"<='A 1000000')) 

A fine example of becoming eight times as efficient, while producing the same output.

Coincidentally, the same day that I saw the above mentioned Usenet thread, an email arrived from an ERP mailing list.  The original poster (OP) in the ERP mailing list reported that a SQL statement similar to the following was possibly causing a report to fail when the report was viewed by a typical ERP user:

SELECT
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ; 

Notice in the above that there is no declared join between the CUSTOMERS and CUSTOMER_ORDERS tables.  The ID column in each table is the primary key column, so at most one row will be returned from each table.  There is a declared foreign key constraint on the CUSTOMER_ORDERS.CUSTOMER_ID column that points to the CUSTOMERS.ID column (you might be wondering whether that foreign key constraint will generate an additional predicate in the optimized version of the WHERE clause).  I suppose that if we want to be technical, we could state that the query creates a Cartesian join between the CUSTOMERS and CUSTOMER_ORDERS tables, but in this case I do not see this Cartesian join as a problem since each row source will return at most one row.

Let’s try a quick experiment in SQL*Plus with Oracle Database 11.2.0.2 to see what the execution plan looks like for the above SQL statement:

VARIABLE CO_CUSTOMER_ID VARCHAR2(15)
VARIABLE CO_ID VARCHAR2(15)

EXEC :CO_CUSTOMER_ID:='CLA/COM/STA'
EXEC :CO_ID:='10002'

SET LINESIZE 120
SET PAGESIZE 1000

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Plan hash value: 31577051

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       5 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0021619    |      1 |      1 |      1 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0021612    |      1 |      1 |      1 |00:00:00.01 |       1 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID) 

So, the optimizer managed to recognize that the SQL statement is requesting a single row (the E-Rows column), where each row in the outer table (CUSTOMER_ORDERS) is expected to cause the retrieval of one row from the inner table (CUSTOMERS), and that prediction was accurate based on the values shown in the Starts and A-Rows columns.

Another reader of the ERP mailing list mentioned that the joins between tables should always be specified in SQL statements.  While logically correct, I also recall reading in the “Cost-Based Oracle Fundamentals” book about an interesting side-effect of transitive closure, where join conditions between tables in SQL statements might automatically vanish during query optimization (with transitive closure, given the predicates C.ID = :CO_CUSTOMER_ID and C.ID = O.CUSTOMER_ID, the optimizer is able to derive the additional predicate O.CUSTOMER_ID = :CO_CUSTOMER_ID – and in some releases it may then discard the original join predicate); based on the information found in the book, the vanishing act is Oracle Database version dependent and the behavior may be affected by certain initialization parameters (Oracle Database 10.2 is said to not remove the join condition by default).  The OP in the ERP mailing list is running Oracle Database 8.1.7.  If we modified the query to specify the join condition C.ID=O.CUSTOMER_ID, would that join condition still appear in the “optimized” version of the SQL statement after transitive closure on Oracle Database 8.1.7?  What about Oracle Database 9.0.1, 9.2.0.8, 10.2.0.5, or even something more recent such as 11.2.0.2?

Why guess, when you can easily set up a test case script?  First, we will create two sample tables with a declared foreign key relationship.  Each table will be created with a FILLER column that is declared as a VARCHAR2(200) – that column will substitute for the various other columns (in the production version of the tables) that typically consume roughly 200 characters per row:

CREATE TABLE CUSTOMERS (
  ID VARCHAR2(15) NOT NULL ENABLE,
  NAME VARCHAR2(50),
  CONTACT_FIRST_NAME VARCHAR2(30),
  CONTACT_LAST_NAME VARCHAR2(30),
  CONTACT_PHONE VARCHAR2(20),
  CONTACT_FAX VARCHAR2(20),
  TERRITORY VARCHAR2(15),
  TAX_ID_NUMBER VARCHAR2(25),
  FILLER VARCHAR2(200),
  PRIMARY KEY (ID));

CREATE TABLE CUSTOMER_ORDERS (
  ID VARCHAR2(15) NOT NULL ENABLE,
  CUSTOMER_ID VARCHAR2(15) NOT NULL ENABLE,
  CONTACT_FIRST_NAME VARCHAR2(30),
  CONTACT_LAST_NAME VARCHAR2(30),
  CONTACT_PHONE VARCHAR2(20),
  CONTACT_FAX VARCHAR2(20),
  FILLER VARCHAR2(200),
  PRIMARY KEY (ID),
  CONSTRAINT CHK_CUST FOREIGN KEY (CUSTOMER_ID)
    REFERENCES CUSTOMERS (ID) ENABLE); 

Next, we will insert a combination of reproducible and random data into the two tables (with 1,000 rows being inserted into the CUSTOMERS table and 500,000 rows being inserted into the CUSTOMER_ORDERS table), create an index on the foreign key column in the CUSTOMER_ORDERS table, and collect table and index statistics:

INSERT INTO
  CUSTOMERS
SELECT
  RPAD(CHR(MOD(ROWNUM-1,26)+65),8,CHR(MOD(ROWNUM,26)+65))||TO_CHAR(ROWNUM) ID,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),30,CHR(MOD(ROWNUM,20)+65))||TO_CHAR(ROWNUM) NAME,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),10,CHR(MOD(ROWNUM,26)+96)) CONTACT_FIRST_NAME,
  RPAD(CHR(MOD(ROWNUM+1,26)+65),10,CHR(MOD(ROWNUM+2,26)+96)) CONTACT_LAST_NAME,
  '###-###-####' CONTACT_PHONE,
  '###-###-####' CONTACT_FAX,
  DBMS_RANDOM.STRING('A',10) TERRITORY,
  DBMS_RANDOM.STRING('A',20) TAX_ID_NUMBER,
  RPAD('A',200,'A') FILLER
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

INSERT INTO
  CUSTOMER_ORDERS
SELECT /*+ LEADING(R) */
  TO_CHAR(ROWNUM+10000) ID,
  C.ID CUSTOMER_ID,
  C.CONTACT_FIRST_NAME,
  C.CONTACT_LAST_NAME,
  C.CONTACT_PHONE,
  C.CONTACT_FAX,
  C.FILLER
FROM
  CUSTOMERS C,
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=500) R;

COMMIT;

CREATE INDEX IND_CO_CUSTOMER ON CUSTOMER_ORDERS (CUSTOMER_ID);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'CUSTOMERS',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'CUSTOMER_ORDERS',CASCADE=>TRUE) 

In the test script that follows, I will use the customer order 15000.  In my test case tables, the customer ID for this customer order is LMMMMMMM1000, but your CUSTOMER_ORDERS table might have a different customer ID for that row.  Let’s see which customer ID is associated with customer order 15000:

SELECT
  CUSTOMER_ID
FROM
  CUSTOMER_ORDERS
WHERE
  ID='15000';

CUSTOMER_ID
---------------
LMMMMMMM1000 

The heart of the test script follows.  In the script, change the value LMMMMMMM1000 where the bind variable value is set so that the value matches the CUSTOMER_ID that was returned by the above SQL statement, and set the OPTIMIZER_FEATURES_ENABLE parameter value to the default value for your database version.  The test script first includes the SQL statement that appeared in the OP’s email, followed by a modified version of the SQL statement that also includes the table join condition C.ID=O.CUSTOMER_ID, and a third SQL statement that joins the foreign key column in the CUSTOMER_ORDERS table to the primary key column in the CUSTOMERS table (in place of the explicit C.ID = :CO_CUSTOMER_ID predicate in the WHERE clause).  After each SQL statement is executed, the execution plan is retrieved for that SQL statement.  Will the execution plans for the three SQL statements be the same, or will the execution plans for one or more SQL statements differ?:

VARIABLE CO_CUSTOMER_ID VARCHAR2(15)
VARIABLE CO_ID VARCHAR2(15)

EXEC :CO_CUSTOMER_ID:='LMMMMMMM1000'
EXEC :CO_ID:='15000'

SET LINESIZE 120
SET PAGESIZE 1000

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.2';

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID
  AND C.ID=O.CUSTOMER_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  O.ID = :CO_ID
  AND C.ID=O.CUSTOMER_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); 

Repeat the above script several times, using progressively older Oracle Database versions in the ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE line of the script (I used 11.2.0.2, 10.2.0.5, 9.2.0.8, 9.0.1, and 8.1.7).  Will the execution plans remain the same for the various executions of the script, or will the value of the OPTIMIZER_FEATURES_ENABLE parameter impact the execution plan?

Compare your results with those that I obtained below (to shorten the output, I removed the non-essential row that was returned by the SQL statements).

With OPTIMIZER_FEATURES_ENABLE=11.2.0.2:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID ;

SQL_ID  0vy1zt7jq25fs, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID
 14    AND C.ID=O.CUSTOMER_ID;

SQL_ID  6suwfw1m251j0, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    O.ID = :CO_ID
 13    AND C.ID=O.CUSTOMER_ID;

SQL_ID  abbq9g44p111b, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   O.ID =
:CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"="O"."CUSTOMER_ID") 

Notice in the above output that while the SQL_ID changes for the three versions of the SQL statement, the Plan hash value remains 1588498623 (you might obtain a different constant value, such as 3347798118).  The consistent Plan hash value does NOT mean that the Predicate Information section of the execution plan output is identical, nor does it mean that the estimated number of rows will be the same.  If you closely examine the Predicate Information section of the second SQL statement, you might notice that the 11.2.0.2 optimizer introduced an additional predicate, while at the same time removing the explicit join condition between the two tables.  Take a look at the E-Rows column in the last of the execution plans – might this be a sign of a bug?  Should the optimizer really predict that each row of the CUSTOMER_ORDERS table will return 1,000 rows from the CUSTOMERS table when there is a declared foreign key relationship that points to the primary key column of the CUSTOMERS table, and is there a chance that this type of bad prediction might adversely affect the execution plans of other SQL statements?
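
For reference, the declared foreign key relationship that the optimizer seems to ignore in that cardinality estimate is easily confirmed with a quick data dictionary query (the system-assigned name of the referenced constraint will differ in your schema):

SELECT
  CONSTRAINT_NAME,
  CONSTRAINT_TYPE,
  R_CONSTRAINT_NAME
FROM
  USER_CONSTRAINTS
WHERE
  TABLE_NAME='CUSTOMER_ORDERS'
  AND CONSTRAINT_TYPE='R';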

With OPTIMIZER_FEATURES_ENABLE=10.2.0.5:
(same as 11.2.0.2)

With OPTIMIZER_FEATURES_ENABLE=9.2.0.8:
(same as 11.2.0.2)

With OPTIMIZER_FEATURES_ENABLE=9.0.1:
(same as 11.2.0.2)

With OPTIMIZER_FEATURES_ENABLE=8.1.7:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID ;

SQL_ID  0vy1zt7jq25fs, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

Note
-----
   - cpu costing is off (consider enabling it)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID
 14    AND C.ID=O.CUSTOMER_ID;

SQL_ID  6suwfw1m251j0, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

Note
-----
   - cpu costing is off (consider enabling it)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    O.ID = :CO_ID
 13    AND C.ID=O.CUSTOMER_ID;

SQL_ID  abbq9g44p111b, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   O.ID =
:CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"="O"."CUSTOMER_ID")

Note
-----
   - cpu costing is off (consider enabling it) 

Other than the “cpu costing is off (consider enabling it)” note, the output with the OPTIMIZER_FEATURES_ENABLE parameter set to 8.1.7 is identical to the output when that parameter was set to 11.2.0.2.

How about testing the real thing… actually executing the SQL statements in the test case script on an older version of Oracle Database – will you receive the same execution plans as shown above?  I tested Oracle Database 10.2.0.5 and obtained the same execution plans as I saw with 11.2.0.2.  Is anyone with access to Oracle Database 10.1, 9.2, 9.0.1, or 8.1.7 able to test the above script?  The DBMS_XPLAN.DISPLAY_CURSOR function is not available in Oracle Database versions prior to 10.1, so you will need to be creative to display the execution plan (AUTOTRACE may show the wrong execution plan – you might experiment with a solution offered by Tanel Poder for Oracle Database 9i).
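
For instance, on Oracle Database 9.2 you might pull a rough execution plan directly from V$SQL_PLAN – a sketch only, since the output lacks the Predicate Information section, and you must first locate the SQL statement’s HASH_VALUE in V$SQL:

SELECT
  LPAD(' ',2*DEPTH)||OPERATION||' '||OPTIONS OPERATION,
  OBJECT_NAME
FROM
  V$SQL_PLAN
WHERE
  HASH_VALUE=&HASH_VALUE
ORDER BY
  ID;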

As a reminder, you may post execution plans in a blog comment by enclosing the execution plan inside <pre> </pre> tags:

<pre>
Plan hash value: 3347798118
----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0042378    |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0042375    |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)
</pre>




Non-Specific Index Hints

24 01 2012

January 24, 2012 (Modified January 25, 2012)

As I read the “Troubleshooting Oracle Performance” book for the second time a couple of months ago, I made note on page 385 that it is possible to specify table columns in an index hint, rather than specifying specific index names (or just specifying the table name).  This might be useful, for example, if you have tables in your database whose primary key indexes have system-assigned names, and an index hint is needed to correct specific performance issues.  I was again reminded that it is possible to create non-specific index hints that specify table columns when a recent quiz was posted that asked readers to find specific cases where the newer index hint syntax behaves other than expected.

As an example of the newer syntax, I put together a brief demonstration.  First, the test table and index creation script:

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(100),
  PRIMARY KEY(C1));

INSERT INTO
  T2
SELECT
  ROWNUM C1,
  MOD(ROWNUM-1,20)+1 C2,
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T2_C2 ON T2(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000 

Let’s try a simple query that specifies columns C1 and C2 in the WHERE clause:

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 906133967

------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |             |   500 | 54500 |   121   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | T2          |   500 | 54500 |   121   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |             |       |       |            |          |
|   3 |    BITMAP AND                    |             |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS|             |       |       |            |          |
|*  5 |      INDEX RANGE SCAN            | IND_T2_C2   |       |       |    10   (0)| 00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|             |       |       |            |          |
|   7 |      SORT ORDER BY               |             |       |       |            |          |
|*  8 |       INDEX RANGE SCAN           | SYS_C008661 |       |       |    20   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("C2"=1)
   8 - access("C1">=1 AND "C1"<=10000) 

The above shows that if this SQL statement were actually executed, the index IND_T2_C2 and the index SYS_C008661 (which helps enforce the primary key) would be used when executing the SQL statement.  You will only see the above execution plan in the Enterprise Edition of Oracle Database.  Let’s try again with a hint that prohibits the BITMAP CONVERSION TO ROWIDS and BITMAP CONVERSION FROM ROWIDS operations:

SELECT /*+ OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 | 54500 |   131   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |   500 | 54500 |   131   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1 AND "C1"<=10000 AND "C1">=1) 

The above execution plan shows that without the ability to perform the BITMAP CONVERSION FROM ROWIDS operations that feed the BITMAP AND operation, a full table scan was selected, so this is a case where the Standard Edition and the Enterprise Edition of Oracle Database might exhibit different execution performance.
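
If you are not certain which edition is in use, the version banner reveals it (a quick check; the banner wording varies by release):

SELECT BANNER FROM V$VERSION;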

Let’s use the previous SQL statement with hint as a starting point, and use a non-specific index hint to instruct the optimizer to use an index on the T2 table:

SELECT /*+ INDEX(T2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 3350885058

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   500 | 54500 |   179   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   500 | 54500 |   179   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C008661 | 10000 |       |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1)
   2 - access("C1">=1 AND "C1"<=10000) 

As shown by the above, the optimizer selected to use the SYS_C008661 primary key index, which would require it to retrieve 10,000 ROWID values from the index, rather than using the IND_T2_C2 index on column C2, which would have retrieved only 5,000 (100,000 * 1/20) ROWIDs (the clustering factor of the IND_T2_C2 index was likely the deciding factor).  Let’s specifically request (demand) that the optimizer use the IND_T2_C2 index by adjusting the hint:

SELECT /*+ INDEX(T2 IND_T2_C2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 174424276

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500 | 54500 |  1596   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2        |   500 | 54500 |  1596   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C2 |  5000 |       |    10   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=10000 AND "C1">=1)
   2 - access("C2"=1) 

Notice in the above that the IND_T2_C2 index was selected, is expected to return 5,000 ROWID values from the index, and the execution plan now has a calculated cost of 1,596.  The calculated cost is a simple explanation of why the optimizer did not select this index automatically.

Now, consider a situation where the optimizer insists on using the IND_T2_C2 index, rather than the SYS_C008661 primary key index, for this SQL statement – which could happen if the CLUSTERING_FACTOR statistics of the indexes are not set correctly:

SET AUTOTRACE OFF

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';

INDEX_NAME   CLUSTERING_FACTOR
------------ -----------------
SYS_C008661               1585
IND_T2_C2                31700

EXEC DBMS_STATS.SET_INDEX_STATS (OWNNAME=>USER,INDNAME=>'IND_T2_C2',CLSTFCT=>1585,NO_INVALIDATE=>FALSE)

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';

INDEX_NAME   CLUSTERING_FACTOR
------------ -----------------
SYS_C008661               1585
IND_T2_C2                 1585 

We started with the primary key index having a clustering factor of 1,585 and the IND_T2_C2 index having a clustering factor of 31,700.  After using DBMS_STATS.SET_INDEX_STATS, the optimizer is convinced that the IND_T2_C2 index also has a clustering factor of 1,585.  Now that both of the indexes have the same CLUSTERING_FACTOR statistic value, what happens if we execute again the SQL statement that specifies an index should be used to access table T2, but does not specify the exact index name?:

SELECT /*+ INDEX(T2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 174424276

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500 | 54500 |    90   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2        |   500 | 54500 |    90   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C2 |  5000 |       |    10   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=10000 AND "C1">=1)
   2 - access("C2"=1) 

So, now that the CLUSTERING_FACTOR values are the same for both indexes, the optimizer is selecting the IND_T2_C2 index for the SQL statement.  We have successfully painted ourselves into a corner, telling the optimizer that the IND_T2_C2 index really is helpful for this SQL statement.  We are fairly certain that the IND_T2_C2 index is not ideal, and that the SYS_C008661 primary key index is a better choice if an index access path is determined to be better than a full table scan, based on the order in which the data was inserted into the table’s blocks.  The problem now is that the SYS_C008661 index name is not consistent from one database to the next, or even in the same database if you drop table T2 and repeat the test.  How do we tell the optimizer to pick the index that is on the primary key column, column C1?  Oracle Database 10.1 introduced new syntax for index hints, which allows us to specify the column name(s), rather than the index name in the hint:

SELECT /*+ INDEX(T2 (C1)) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 3350885058

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   500 | 54500 |   179   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   500 | 54500 |   179   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C008661 | 10000 |       |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1)
   2 - access("C1">=1 AND "C1"<=10000) 

As can be seen by the above, the index on column C1, which happens to be the primary key index, was selected by the optimizer at the request of the index hint.

Let’s take a look at the documentation for Oracle Database 10.2:

“Hints that specify an index can use either a simple index name or a parenthesized list of columns as follows:

index (table (column))
  • table specifies the name
  • column specifies the name of a column in the specified table
    • The columns can optionally be prefixed with table qualifiers allowing the hint to specify bitmap join indexes where the index columns are on a different table than the indexed table. If table qualifiers are present, they must be base tables, not aliases in the query.
    • Each column in an index specification must be a base column in the specified table, not an expression. Function-based indexes cannot be hinted using a column specification unless the columns specified in the index specification form the prefix of a function-based index.
  • index specifies an index name

The hint is resolved as follows:

  • If an index name is specified, only that index is considered.
  • If a column list is specified and an index exists whose columns match the specified columns in number and order, only that index is considered. If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered. In either case, the behavior is exactly as if the user had specified the same hint individually on all the matching indexes.

For example, in Example 16-3 the job_history table has a single-column index on the employee_id column and a concatenated index on employee_id and start_date columns. To specifically instruct the optimizer on index use, the query can be hinted as follows:

SELECT /*+ INDEX(v.j jhist_employee_ix (employee_id start_date)) */ * FROM v;

That is almost crystal clear (I probably poorly translated the diagram), even though it seems that the cost calculation might not have an impact on which matching index is considered (“If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered“).  If you have the sample database schema loaded into a database, jump to the view definition in Example 16-3, create the view, and then see if the index hint in the documentation (at the bottom of the above quote block) actually works – it did not work for me.  The same hint example is found in the Oracle Database 11.2 documentation.

However, in the example above, where I used the hint syntax INDEX (TABLE (COLUMN)), the hint worked as expected.  Interestingly, the hint reference for Oracle Database 11.2 does not specifically mention this new syntax, and that might be why I was slow to recognize the new hint syntax.

Now that the basics are covered, let’s see if we are able to confuse the optimizer with index hints, taking up the challenge proposed by the blog article mentioned at the start of this article.  Below is a slightly modified version of the test script that I posted as a comment in the other blog article (with an additional column in the table, an additional index, and an enabled 10053 trace).  First, we will create the table, a couple of indexes on the table, change a couple of SQL*Plus parameters, and then enable a 10053 trace:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(30) NOT NULL,
  C4 VARCHAR2(200),
  C5 VARCHAR2(10));

INSERT INTO T1
SELECT
  MOD(ROWNUM-1, 90) * 4 C1,
  ROWNUM - 1 C2,
  TO_CHAR(ROWNUM - 1, 'RN') C3,
  LPAD('A',200,'A') C4,
  LPAD('B',10,'B') C5
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE INDEX IND_T1_C1_C2 ON T1(C1,C2);
CREATE INDEX IND_T1_C1_C2_C5 ON T1(C1,C2,C5);
CREATE INDEX IND_T1_C2_C1_C3 ON T1(C2,C1,C3);
CREATE INDEX IND_T1_C3_C1_C2 ON T1(C3,C1,C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
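
To locate the 10053 trace files that the following tests generate, on Oracle Database 11g you might query V$DIAG_INFO – the TRACEFILE_IDENTIFIER values set before each test will appear in the trace file names:

SELECT VALUE FROM V$DIAG_INFO WHERE NAME='Default Trace File';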

Let’s begin the process of trying to confuse the Oracle Database 11.2.0.2 optimizer.  Unhinted, the following query accesses the index on columns C2, C1, and C3 to avoid accessing the table, and uses an INDEX FAST FULL SCAN operation that employs multi-block reads:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'UNHINTED';

SELECT
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 2374279026

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1000K|    23M|   387   (2)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_T1_C2_C1_C3 |  1000K|    23M|   387   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

Let’s hint the optimizer to use the index on the columns C1 and C2:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 3388050039

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1000K|    23M|  1012K  (1)| 00:06:46 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    23M|  1012K  (1)| 00:06:46 |
|   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  3026  (10)| 00:00:02 |
--------------------------------------------------------------------------------------------

In the above, the optimizer obeyed the hint, selecting to perform an INDEX FULL SCAN operation, even though the calculated cost increased from 387 in the unhinted plan to 1,012,000.

Let’s reverse the order of the columns in the index hint:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C2_C1';

SELECT /*+ INDEX(T1 (C2 C1)) */
  C1,
  C2,
  C3
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1746297295

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    23M|  4851   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C2_C1_C3 |  1000K|    23M|  4851   (1)| 00:00:01 |
------------------------------------------------------------------------------------

In the above, note that the index on columns C2, C1, and C3 was used, but the cost is now calculated at 4,851 rather than 387 as it was in the unhinted plan. The INDEX FAST FULL SCAN operation is now shown as an INDEX FULL SCAN operation.  So, we have now convinced the optimizer to use an access path that employs single block reads of an index rather than multi-block reads of an index, simply by telling the optimizer to use the index that it would have used without the hint.  Have we confused the optimizer already?  Someone should probably take a look at the 10053 trace files.  :-)

We have an index on columns C2, C1, and C3, but we also have an index on columns C3, C1, and C2. What happens when we specify the columns C3, C1, and C2 in the index hint in that order?

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C3_C1_C2';

SELECT /*+ INDEX(T1 (C3 C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 2273443829

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    23M|  4943   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |  1000K|    23M|  4943   (1)| 00:00:01 |
------------------------------------------------------------------------------------

An index full scan was selected to be performed on the IND_T1_C3_C1_C2 index with a calculated cost of 4,943, rather than using the IND_T1_C2_C1_C3 index that previously resulted in a cost of 4,851 – so the optimizer will not alter the order of the columns in the index hint to reduce the calculated cost.

Let’s try another example where we select all of the columns that are present in the IND_T1_C1_C2_C5 index – will the optimizer use that index, or will it obey the expected behavior of the index hint?:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_WC5';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3388050039

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1000K|    19M|  1003K  (1)| 00:00:23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    19M|  1003K  (1)| 00:00:23 |
|   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  2750   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

The index that exactly matched the columns specified in the index hint was selected, with a calculated cost of 1,003,000.

So, what happens if we are less specific in the index hint, and just list the first column, where there are two composite indexes that start with the specified column:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_WC2_C5';

SELECT /*+ INDEX(T1 (C1)) */
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2942389535

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    19M|  4293   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C1_C2_C5 |  1000K|    19M|  4293   (1)| 00:00:01 |
------------------------------------------------------------------------------------

As shown in the above execution plan, the optimizer arrived at a cost of 4,293 by selecting to use the IND_T1_C1_C2_C5 index, which avoided the access to the T1 table.  So, sometimes the performance will be better with slightly less specific hints.

Out of curiosity, what do you think will happen if we completely eliminate the index hint?  Let’s test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'UNHINTED_C1_C2_C5';

SELECT
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2722951733

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1000K|    19M|   343   (3)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_T1_C1_C2_C5 |  1000K|    19M|   343   (3)| 00:00:01 |
----------------------------------------------------------------------------------------

In the above execution plan, the IND_T1_C1_C2_C5 index is still selected for use as it was in the previous execution plan, but notice that the access path has changed to an INDEX FAST FULL SCAN operation and the calculated cost dropped from 4,293 to 343.  So, the above output implies that sometimes the performance will be better if we simply do not hint an index access path when that access path would have been selected anyway.  Someone want to take a look at the 10053 trace files and explain why?

If we add a WHERE clause that places a restriction on column C2 to be less than 10, the optimizer could use a couple of different access paths. Let’s specify the columns C1 and C2 in the index hint to see which index is selected:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C2_WHERE';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
   C2<10;

Plan hash value: 1883798457

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    10 |   250 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   103   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"<10)
       filter("C2"<10)

In the above, an INDEX SKIP SCAN operation was selected because the index with columns C1 and C2 was specified in the hint.

Let’s try another example that might be considered a case where the optimizer disobeys the hint or is free to change the order of the columns specified in the index hint (this might be incorrectly considered an edge case):

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C3_C2_WHERE';

SELECT /*+ INDEX(T1 (C1 C2 C3)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 4150417361

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IND_T1_C2_C1_C3 |    10 |   250 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)

A quick peek at the above output might suggest that the optimizer could decide to locate an index with columns C1, C2, and C3 in any order – but I do not believe that this is the case.  I believe that the optimizer considered the index hint specified in the SQL statement to be invalid (a check of the 10053 trace might confirm this).

Let’s create another index and then repeat the above SQL statement:

CREATE INDEX IND_T1_C1_C2_C3 ON T1(C1,C2,C3);

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C3_C2_WHERE2';

SELECT /*+ INDEX(T1 (C1 C2 C3)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 212907557

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |    92   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    92   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The index with the columns that matched the order of the columns in the index hint was selected, even though the calculated cost would have been lower if the optimizer were permitted to select any index with the columns listed in the index hint.

What about a case where there is an exact match between an index definition and an index hint, and there is also another index with one additional column which would avoid the table access:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C2_WHERE2';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 1883798457

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    10 |   250 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   103   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"<10)
       filter("C2"<10)

The index that exactly matched the index hint was selected by the optimizer.

What if we only specify in the index hint a leading column, when there are two indexes with that leading column, one of which allows the optimizer to avoid the table access:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_WHERE';

SELECT /*+ INDEX(T1 (C1)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 212907557

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |    92   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    92   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The optimizer selected the lowest cost access path from the two indexes that matched the hint.

What if we specify a column in the index hint that is not listed in the SELECT or WHERE clauses?

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C3_C2_WHERE';

SELECT /*+ INDEX(T1 (C3)) */
  C1,
  C2
FROM
  T1
WHERE
  C2<10;

Plan hash value: 1328421701

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |    90 |  4039   (1)| 00:00:02 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4039   (1)| 00:00:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The above shows the original execution plan (captured a couple of days ago without column C5 in the table definition, without index IND_T1_C1_C2_C5, and without an enabled 10053 trace).  The optimizer still obeyed the intention of the hint – it found an index that started with the specified column and selected to perform an INDEX SKIP SCAN operation, even though column C2, specified in the WHERE clause, is the third column in the index definition.

As luck would have it, with the slightly altered table definition (and possibly different statistics estimates) the execution plan has changed to use an INDEX FULL SCAN operation rather than an INDEX SKIP SCAN operation, and now the execution plan has a higher calculated cost.  So, why did the execution plan change from an INDEX SKIP SCAN?  Here is the current execution plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 2273443829

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |    90 |  4945   (1)| 00:00:01 |
|*  1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4945   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

So, for the challenge, try to explain the oddities that I pointed out above.  I have not yet reviewed the 10053 trace files, but I will later.







