The New Order Oracle Coding Challenge 1

July 31, 2011

Years ago I played a couple of different games with letters, rather than numbers, on dice – attempting to form words from the letters displayed on top of the dice.  I was not very good at those games, and I recall attempting to create a computer program to help me with the challenge.  The screaming fast 1 MHz CPU and 64KB of memory proved to be no match for the words listed in the paper-bound dictionary sitting on the shelf.  Computers are significantly faster now, with widespread access to Internet-based word lists, so it probably would not be much of a challenge today to build a solution for one of those dice letter games.

Finding different combinations with number digits is a bit easier than working with letters – we are able to use math rules to determine if the specified conditions are met rather than a dictionary.  We will start with an easy problem that I found on the web, but I will keep the source of that problem a secret for now.  Consider the number 989,901.  If we write the digits in that number from right to left, we obtain the new number 109,989.  What is special about that new number?  The number is evenly divisible by 3 and 9, but more interesting is that the new number divides evenly into the original number (989,901).

With the help of Oracle Database, find all of the numbers from 1 to 1,000,000 where the number formed by the digits listed from left to right is evenly divisible by the number formed by those same digits listed from right to left.  The numbers that end with 0 are a special case: reversing the order of the digits in those numbers will result in the 0 digit effectively disappearing – as such, exclude any number that ends with 0 from being tested.
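
One of the possible digit-swapping methods can be sketched with plain string functions.  The query below is only a sketch under assumptions that are not part of the challenge: it hard-codes six SUBSTR calls (enough for every candidate below 1,000,000, since 1,000,000 itself ends with 0 and is excluded), and it treats palindromes such as 121 as valid matches, which the specification neither requires nor forbids.

```sql
SELECT
  N,
  R
FROM
  (SELECT
     N,
     -- Build the reversed number one character at a time; SUBSTR returns
     -- NULL past the end of the string, and NULL concatenates as nothing
     TO_NUMBER(
       SUBSTR(S,6,1)||SUBSTR(S,5,1)||SUBSTR(S,4,1)||
       SUBSTR(S,3,1)||SUBSTR(S,2,1)||SUBSTR(S,1,1)) R
   FROM
     (SELECT
        LEVEL N,
        TO_CHAR(LEVEL) S
      FROM
        DUAL
      CONNECT BY
        LEVEL<=1000000)
   WHERE
     MOD(N,10)<>0)   -- skip the numbers that end with 0
WHERE
  MOD(N,R)=0
ORDER BY
  N;
```

To leave the palindromes out of the result, an additional N<>R predicate could be added to the outer WHERE clause.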

There are several methods to swap the order of the digits in the number.  Would you use a different method to test all of the numbers between 1 and 10,000, or to test all of the numbers up to 10,000,000?

Might it work to store the numbers in a reverse key index, and then dump the resulting index values – is that the fourth method to switch the order of the digits?  ;-)





The Unique Result Oracle Database Coding Challenge

July 28, 2011

I must say that I am impressed with the number of unique solutions that were developed for the previous coding challenge (FizzBuzz).  While not all solutions were extremely efficient (a couple were intentionally designed to be as inefficient as possible), the various techniques provide views of different approaches to solving a problem that was not well defined (even though at first glance it appeared to be well defined).  While not all of the solutions presented are optimal for the FizzBuzz challenge, derivatives of those solutions might be perfect for real-world problems (for instance, side-tracking a pesky DBA or developer with a performance challenge).

Time for another coding challenge.  This challenge is adapted from one that was posed in a Usenet thread several years ago.  Remember that if a specification is not well defined, feel free to interpret the specification – one of the goals of this blog article is to see a variety of solutions, but a secondary goal is to determine what might happen when weak specifications are provided in a request for a solution.

Assume that there are two tables, T1 and T2, as designed and populated below.  The two tables contain the results from experiments performed by two different analysts.

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;

CREATE TABLE T1 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

CREATE TABLE T2 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

INSERT INTO T1 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T1 VALUES ('TEST2A','TEST2B','TEST2C','TEST2D');
INSERT INTO T1 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D');
INSERT INTO T1 VALUES ('TEST4A','TEST4B','TEST4C','TEST4D');
INSERT INTO T1 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D');

INSERT INTO T2 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T2 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D');
INSERT INTO T2 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D');
INSERT INTO T2 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D');
INSERT INTO T2 VALUES ('TEST6A','TEST5B','TEST5C','TEST6D');  

This is a three part challenge.

Part 1: Return all rows from table T1 where the C2 value is not found in table T2’s column C2 values, and return all rows from table T2 where the C2 value is not found in table T1’s column C2 values.  In short, we want the rows that have unique results found in column C2 so that we may identify the differences found by the two analysts.

Part 2: Using a single select statement, return all rows found in table T1 that are not duplicated in table T2, and all rows found in table T2 that are not duplicated in table T1.  The resulting output must identify the table from which the row was found.

Part 3: A third analyst performed experiments and entered his results in table T3, as designed and populated below.  Using a single SQL statement, retrieve the rows from tables T1, T2, and T3 that are not duplicated in the other tables – find the unique rows from the three tables.  Each row should identify its source table (T1, T2, or T3).

DROP TABLE T3 PURGE;

CREATE TABLE T3 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

INSERT INTO T3 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T3 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D');
INSERT INTO T3 VALUES ('TEST6A','TEST6B','TEST6C','TEST6D'); 

When you post your solutions, please identify Part 1, Part 2, or Part 3 next to the solution.

General tips for comments:

  • Include your code portion of the solution in a monospaced font (Courier) to retain the alignment spaces in the solution.  To do this, use a <pre> tag just before the code portion of the solution and a </pre> tag just after the solution ([s ourcecode] and [/s ourcecode] tags (without the space between the first two letters) should have the same result, just with a smaller font size and a non-scrolling code area).
  • Less than (<) and greater than (>) signs have a special meaning in HTML web pages.  As such, specify &lt; for a less than sign and &gt; for a greater than sign in code sections to avoid having portions of your code sections magically disappear.

Think about how you would build the solutions before scrolling down to the comments section.

-

-

-

-

-

-

-

-
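
Part 1 (one possible reading of the specification, sketched with NOT EXISTS; the FROM_TABLE column is an addition that Part 1 does not strictly ask for):

```sql
-- Rows in T1 whose C2 value never appears in T2.C2
SELECT
  'T1' FROM_TABLE,
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4
FROM
  T1
WHERE
  NOT EXISTS (
    SELECT
      1
    FROM
      T2
    WHERE
      T2.C2=T1.C2)
UNION ALL
-- Rows in T2 whose C2 value never appears in T1.C2
SELECT
  'T2' FROM_TABLE,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T2
WHERE
  NOT EXISTS (
    SELECT
      1
    FROM
      T1
    WHERE
      T1.C2=T2.C2);
```

With the sample data above, this should return the TEST2A and TEST4A rows from T1 and no rows from T2, because every C2 value in T2 also appears in T1.  NOT EXISTS was picked over NOT IN here because a NULL in the C2 column of the other table would cause NOT IN to return no rows at all.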

Part 2:

SELECT
  NVL(T1.C1,T2.C1) C1,
  NVL(T1.C2,T2.C2) C2,
  NVL(T1.C3,T2.C3) C3,
  NVL(T1.C4,T2.C4) C4,
  NVL2(T1.C1,'T1','T2') FROM_TABLE
FROM
  T1
  FULL OUTER JOIN
  T2
ON
  T1.C1=T2.C1
  AND T1.C2=T2.C2
  AND T1.C3=T2.C3
  AND T1.C4=T2.C4
WHERE
  T1.C1 IS NULL
  OR T2.C1 IS NULL;
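
Part 3 (a sketch of one interpretation only: "not duplicated in the other tables" is taken to mean that the row appears in exactly one of the three tables; the SRC column name is a hypothetical helper):

```sql
SELECT
  MIN(SRC) FROM_TABLE,   -- only one source remains after the HAVING filter
  C1,
  C2,
  C3,
  C4
FROM
  (SELECT 'T1' SRC, C1, C2, C3, C4 FROM T1
   UNION ALL
   SELECT 'T2' SRC, C1, C2, C3, C4 FROM T2
   UNION ALL
   SELECT 'T3' SRC, C1, C2, C3, C4 FROM T3)
GROUP BY
  C1,
  C2,
  C3,
  C4
HAVING
  COUNT(DISTINCT SRC)=1   -- the row was found in a single table
ORDER BY
  FROM_TABLE,
  C1;
```

Counting the distinct source tables, rather than counting rows, means that a row entered twice by the same analyst is still reported (once) as unique to that analyst's table.  A stricter reading of the specification might want COUNT(*)=1 instead.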

-

-

-

-

-

-





The FizzBuzz Oracle Database Coding Challenge

July 26, 2011

Through a web search I located a page titled “Coding Horror: Why Can’t Programmers.. Program?”  A simple question was asked in an interview, and apparently 199 of 200 programmers struggled to build a solution for the problem in less than ten minutes.  The problem must be that the 199 people who did not succeed did not have access to an Oracle Database.  The same question was posed to SQL Server developers in the form of a quiz.  Before looking at the articles, see if you are able to solve the following problem with the help of Oracle Database:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

How many different solutions are there for this problem?  Think about the problem before scrolling down.

-

-

-

-

-

-

-

-

-

-

-

-

-

My solution:

SELECT
  NVL(DECODE(ROWNUM/3,TRUNC(ROWNUM/3),'Fizz',NULL)||DECODE(ROWNUM/5,TRUNC(ROWNUM/5),'Buzz',NULL),TO_CHAR(ROWNUM)) FIZZBUZZ
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 
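
As one example of a different approach, the same output could probably be produced with a searched CASE expression and the MOD function.  This is just a sketch of an alternative, not a claim that it is better than the DECODE version above:

```sql
SELECT
  CASE
    WHEN MOD(LEVEL,15)=0 THEN 'FizzBuzz'   -- multiple of both 3 and 5
    WHEN MOD(LEVEL,3)=0  THEN 'Fizz'
    WHEN MOD(LEVEL,5)=0  THEN 'Buzz'
    ELSE TO_CHAR(LEVEL)
  END FIZZBUZZ
FROM
  DUAL
CONNECT BY
  LEVEL<=100;
```

The multiple-of-15 test has to come first, because a CASE expression returns the result of the first WHEN condition that is true.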

-

-

-

-

-

-

-

I think that I remember solving a similar problem on an IBM PC Jr. with BASICA years ago.  How many ways can this problem be solved with the help of Oracle Database?  Before you answer, you might be asking yourself why someone would ask such a simple question.  Could there be an overly complex solution, something that the interviewer had never seen before, that was the intended response to the question?





How Many Ways to Solve this Problem? Generate a Calendar that Displays Average Hours Per Day

July 18, 2011

I am surprised at the significant number of unique solutions to the SQL problems that I have previously posed on this blog.  For fun I thought that I would give people another chance to demonstrate their unique approaches to solving another problem.

Suppose that a transaction table exists with the following definition:

CREATE TABLE T1 (
  TRANS_ID NUMBER,
  EMPLOYEE_ID VARCHAR2(10),
  PART_ID VARCHAR2(20),
  CLOCK_IN DATE,
  CLOCK_OUT DATE,
  QTY_COMPLETE NUMBER,
  PRIMARY KEY (TRANS_ID));  

The above table simulates a transaction table that might capture production run times for people (or even robots) to produce sets of parts.  The requirement is rather simple – we want to know the average number of hours required to produce a particular part by day in a calendar type layout.  The only restriction is that the output must appear in a SQL*Plus window (or SQL*Plus may be used to spool the result to a text file).  For example, the output might look like this (feel free to be creative):

PART_ID    W     SUNDAY     MONDAY    TUESDAY  WEDNESDAY   THURSDAY     FRIDAY   SATURDAY
-------- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------
PART1      1        .98        .56        .67       1.21        .23        .98        .02
PART1      2        .57        .87        .76        .01        .45        .55        .77
PART1      3        .01        .77        .66        .54        .02        .44        .92
PART1      4        .55        .02        .88        .11        .05        .33        .89
PART1      5        .11        .43        .55        .14        .03        .90        .78
PART1      6        .87        .45        .63        .02        .82        .78        .55
PART1      7        .02                                         .01 

If there was no transaction for a particular part on a particular date, a blank (NULL) should appear; to prevent division by zero errors, the hours should be output divided by one if the total QTY_COMPLETE is zero.

Let’s create some reproducible sample data in table T1:

DROP TABLE T1_EMP PURGE;

CREATE TABLE T1_EMP AS
SELECT
  'EMP'||TO_CHAR(ROWNUM) EMPLOYEE_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

DROP TABLE T1_PART PURGE;

CREATE TABLE T1_PART AS
SELECT
  'PART'||TO_CHAR(ROWNUM) PART_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=200;

INSERT INTO
  T1
SELECT /*+ LEADING(P E) */
  ROWNUM TRANS_ID,
  E.EMPLOYEE_ID,
  P.PART_ID,
  TO_DATE('01-JAN-2000','DD-MON-YYYY')+ROWNUM/500 CLOCK_IN,
  TO_DATE('01-JAN-2000','DD-MON-YYYY')+ROWNUM/500 + ((MOD(ROWNUM,20) + 1) * 0.5) / 24 CLOCK_OUT,
  ROUND(ABS(SIN(ROWNUM/180*3.141592)*10)) QTY_COMPLETE
FROM
  (SELECT
     EMPLOYEE_ID,
     ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) RN
   FROM
     T1_EMP) E,
  (SELECT
     PART_ID,
     ROW_NUMBER() OVER (ORDER BY PART_ID) RN
   FROM
     T1_PART) P,
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) D
WHERE
  MOD(E.RN,8)=MOD(P.RN,8);

COMMIT; 

Taking a quick look at the sample data in table T1:

COLUMN PART_ID FORMAT A8
COLUMN HOURS FORMAT 90.00
SET PAGESIZE 1000
SET LINESIZE 140
SET TRIMSPOOL ON

SELECT
  PART_ID,
  EMPLOYEE_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  CLOCK_OUT-CLOCK_IN HOURS,
  QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
ORDER BY
  CLOCK_IN;

PART_ID  EMPLOYEE_I SHIFT_DAT  HOURS QTY_COMPLETE
-------- ---------- --------- ------ ------------
PART1    EMP811     31-JAN-01   0.19            9
PART1    EMP811     31-JAN-01   0.21            9
PART1    EMP811     31-JAN-01   0.23            9
PART1    EMP819     04-FEB-01   0.04           10
PART1    EMP819     04-FEB-01   0.06           10
PART1    EMP819     04-FEB-01   0.08           10
PART1    EMP819     04-FEB-01   0.10           10
PART1    EMP819     04-FEB-01   0.13           10
PART1    EMP819     04-FEB-01   0.15           10
...
PART1    EMP912     28-MAR-01   0.19            2
PART1    EMP912     28-MAR-01   0.21            2
PART1    EMP912     28-MAR-01   0.23            2
PART1    EMP92      01-APR-01   0.04            0
PART1    EMP92      01-APR-01   0.06            0
PART1    EMP92      01-APR-01   0.08            1
PART1    EMP92      01-APR-01   0.10            1
...
PART1    EMP992     11-MAY-01   0.19            5
PART1    EMP992     11-MAY-01   0.21            5
PART1    EMP992     11-MAY-01   0.23            5

1250 rows selected. 

Now just a quick example of how you might calculate the average hours per piece (note that the HOURS column actually shows the number of days, NOT hours – multiply the numbers in that column by 24 to convert to hours):

SELECT
  PART_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  SUM(CLOCK_OUT-CLOCK_IN) HOURS,
  SUM(QTY_COMPLETE) QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);

PART_ID  SHIFT_DAT  HOURS QTY_COMPLETE
-------- --------- ------ ------------
PART1    01-JAN-00   1.35           90
PART1    05-JAN-00   1.35          100
PART1    09-JAN-00   1.35           98
PART1    13-JAN-00   1.35           81
PART1    17-JAN-00   1.35           56
PART1    21-JAN-00   1.35           25
PART1    25-JAN-00   1.35           10
PART1    29-JAN-00   1.35           44
...
PART1    29-APR-01   1.35           56
PART1    03-MAY-01   1.35           25
PART1    07-MAY-01   1.35           10
PART1    11-MAY-01   1.35           44

125 rows selected. 

Interesting how the HOURS column always sums to 1.35 per day…
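
To move from days to hours, and from hours to hours per piece, the previous query might be extended as sketched below.  The AVG_HRS_PC column name is an assumption made for illustration; the DECODE divides by one when nothing was completed, as described in the requirements:

```sql
SELECT
  PART_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  SUM(CLOCK_OUT-CLOCK_IN)*24 HOURS,   -- date arithmetic yields days, so multiply by 24
  SUM(QTY_COMPLETE) QTY_COMPLETE,
  ROUND(SUM(CLOCK_OUT-CLOCK_IN)*24
        / DECODE(SUM(QTY_COMPLETE),0,1,SUM(QTY_COMPLETE)),2) AVG_HRS_PC
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);
```

With this change the 1.35 days shown above corresponds to 32.4 hours.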

OK, for you self-starters, create the calendar output.  Think about any other kinds of analysis that might be done with this data – is it possible to determine which PART_ID takes the least (or the most) average time per piece on any given day?
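
As a hint for the least/most question, one approach that might work is the FIRST/LAST (KEEP ... DENSE_RANK) form of the aggregate functions.  The column aliases below are hypothetical, and ties are broken arbitrarily in favor of the alphabetically lowest PART_ID:

```sql
SELECT
  SHIFT_DATE,
  MIN(PART_ID) KEEP (DENSE_RANK FIRST ORDER BY AVG_HRS_PC) LEAST_PART,
  ROUND(MIN(AVG_HRS_PC),2) LEAST_AVG_HRS_PC,
  MIN(PART_ID) KEEP (DENSE_RANK LAST ORDER BY AVG_HRS_PC) MOST_PART,
  ROUND(MAX(AVG_HRS_PC),2) MOST_AVG_HRS_PC
FROM
  (SELECT
     PART_ID,
     TRUNC(CLOCK_IN) SHIFT_DATE,
     -- hours per piece for one part on one day (divide by 1 if nothing completed)
     SUM(CLOCK_OUT-CLOCK_IN)*24
       / DECODE(SUM(QTY_COMPLETE),0,1,SUM(QTY_COMPLETE)) AVG_HRS_PC
   FROM
     T1
   GROUP BY
     PART_ID,
     TRUNC(CLOCK_IN))
GROUP BY
  SHIFT_DATE
ORDER BY
  SHIFT_DATE;
```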

-

-

-

For the non-self-starters, scroll down.

-

-

-

-

-

-

-

-

-

-

-

-

-

One of the challenges that we face is determining the calendar row and column for any given shift date (the date associated with the CLOCK_IN date/time stamp).  Since in this case we know that the first row of data has a date of 01-JAN-2000, we need to determine the Sunday of that week.  One way to do that is to use the NEXT_DAY function, like this:

SELECT
  NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY') S
FROM
  DUAL;

S
---------
26-DEC-99 

The next step is to divide the shift dates into rows and columns based on the zero date of 26-DEC-99.  TRUNC and MOD could be used for this calculation, but to keep things interesting I will use TO_CHAR rather than MOD:

COLUMN R FORMAT 99
COLUMN C FORMAT 9

SELECT
  PART_ID,
  NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY') S,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
  TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
  SUM(CLOCK_OUT-CLOCK_IN) HOURS,
  SUM(QTY_COMPLETE) QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);

PART_ID  S         SHIFT_DAT   R  C  HOURS QTY_COMPLETE
-------- --------- --------- --- -- ------ ------------
PART1    26-DEC-99 01-JAN-00   1  7   1.35           90
PART1    26-DEC-99 05-JAN-00   2  4   1.35          100
PART1    26-DEC-99 09-JAN-00   3  1   1.35           98
PART1    26-DEC-99 13-JAN-00   3  5   1.35           81
PART1    26-DEC-99 17-JAN-00   4  2   1.35           56
PART1    26-DEC-99 21-JAN-00   4  6   1.35           25
PART1    26-DEC-99 25-JAN-00   5  3   1.35           10
...
PART1    26-DEC-99 16-NOV-00  47  5   1.35           72
PART1    26-DEC-99 20-NOV-00  48  2   1.35           90
PART1    26-DEC-99 24-NOV-00  48  6   1.35          100
PART1    26-DEC-99 28-NOV-00  49  3   1.35           98
PART1    26-DEC-99 02-DEC-00  49  7   1.35           81
...
PART1    26-DEC-99 25-APR-01  70  4   1.35           81
PART1    26-DEC-99 29-APR-01  71  1   1.35           56
PART1    26-DEC-99 03-MAY-01  71  5   1.35           25
PART1    26-DEC-99 07-MAY-01  72  2   1.35           10
PART1    26-DEC-99 11-MAY-01  72  6   1.35           44

125 rows selected. 
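
For comparison, the TRUNC and MOD variation mentioned earlier might look something like the sketch below.  It assumes that the zero date (Sunday, 26-DEC-1999) is supplied as an ANSI date literal rather than calculated with NEXT_DAY.  One possible advantage is that the result does not depend on session settings: the 'D' format element of TO_CHAR follows NLS_TERRITORY, and the day name passed to NEXT_DAY follows NLS_DATE_LANGUAGE.

```sql
SELECT
  SHIFT_DATE,
  TRUNC((SHIFT_DATE - DATE '1999-12-26')/7)+1 R,   -- calendar row (week number)
  MOD(SHIFT_DATE - DATE '1999-12-26',7)+1 C        -- calendar column, 1 = Sunday
FROM
  (SELECT DISTINCT
     TRUNC(CLOCK_IN) SHIFT_DATE
   FROM
     T1
   WHERE
     PART_ID='PART1')
ORDER BY
  SHIFT_DATE;
```

For 01-JAN-2000 the difference from the zero date is 6 days, so R is 1 and C is 7, matching the first row of the output above.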

The next step is to collapse all of the rows with the same R value into a single row – we will do this with a combination of the MAX and DECODE functions:

SELECT
  PART_ID,
  R,
  ROUND(MAX(DECODE(C,1,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) SUNDAY,
  ROUND(MAX(DECODE(C,2,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) MONDAY,
  ROUND(MAX(DECODE(C,3,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) TUESDAY,
  ROUND(MAX(DECODE(C,4,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) WEDNESDAY,
  ROUND(MAX(DECODE(C,5,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) THURSDAY,
  ROUND(MAX(DECODE(C,6,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) FRIDAY,
  ROUND(MAX(DECODE(C,7,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) SATURDAY
FROM
  (SELECT
    PART_ID,
    TRUNC(CLOCK_IN) SHIFT_DATE,
    TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
    TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
    SUM(CLOCK_OUT-CLOCK_IN) HOURS,
    SUM(QTY_COMPLETE) QTY_COMPLETE
  FROM
    T1
  WHERE
    PART_ID='PART1'
  GROUP BY
    PART_ID,
    TRUNC(CLOCK_IN))
GROUP BY
  PART_ID,
  R
ORDER BY
  PART_ID,
  R;

PART_ID    R     SUNDAY     MONDAY    TUESDAY  WEDNESDAY   THURSDAY     FRIDAY   SATURDAY
-------- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------
PART1      1                                                                          .02
PART1      2                                         .01
PART1      3        .01                                         .02
PART1      4                   .02                                         .05
PART1      5                              .14                                         .03
PART1      6                                         .02
PART1      7        .02                                         .01
PART1      8                   .01                                         .02
...
PART1     69                              .01                                         .01
PART1     70                                         .02
PART1     71        .02                                         .05
PART1     72                   .14                                         .03

72 rows selected. 

Of course there is a problem with the above – the scale of the time is actually in days, rather than hours, but that is easy enough to fix.
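
One way that the fix might look is sketched below: the day-to-hour conversion (multiplying by 24) and the division by QTY_COMPLETE are moved into the inline view, and the PART_ID restriction is removed so that all parts are returned.  The AVG_HRS_PC alias is an assumption made for illustration, and the R column is renamed to W to match the sample layout shown at the start of this article.

```sql
SELECT
  PART_ID,
  R W,
  ROUND(MAX(DECODE(C,1,AVG_HRS_PC)),2) SUNDAY,
  ROUND(MAX(DECODE(C,2,AVG_HRS_PC)),2) MONDAY,
  ROUND(MAX(DECODE(C,3,AVG_HRS_PC)),2) TUESDAY,
  ROUND(MAX(DECODE(C,4,AVG_HRS_PC)),2) WEDNESDAY,
  ROUND(MAX(DECODE(C,5,AVG_HRS_PC)),2) THURSDAY,
  ROUND(MAX(DECODE(C,6,AVG_HRS_PC)),2) FRIDAY,
  ROUND(MAX(DECODE(C,7,AVG_HRS_PC)),2) SATURDAY
FROM
  (SELECT
     PART_ID,
     TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
     TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
     -- days * 24 = hours; divide by 1 when nothing was completed
     SUM(CLOCK_OUT-CLOCK_IN)*24
       / DECODE(SUM(QTY_COMPLETE),0,1,SUM(QTY_COMPLETE)) AVG_HRS_PC
   FROM
     T1
   GROUP BY
     PART_ID,
     TRUNC(CLOCK_IN))
GROUP BY
  PART_ID,
  R
ORDER BY
  PART_ID,
  R;
```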

-

-

-

-

-

-

-

Anyone care to share their approach to solving the original problem?

-

-

Assume that the output will be used by a production supervisor – what other kinds of analysis can be performed?  For example, could you produce an analysis like the one below that compares recent transactions with those from prior periods and color codes those prior periods based on how the current period (the Avg Hrs Pc column in the below screen capture) compares with the prior periods:

Be creative – performance, while important, is not the primary objective.





How Many Ways to Solve this Problem? Add the Sequential Numbers x Through y

July 13, 2011

I am not entirely sure why, but a couple of days ago the following search keywords were used to access one or more articles on this blog:

the sum of 1+2+3+4...+98+99+100

The above request I found to be a bit interesting, and there is a 50/50 chance that the person found the right answer to the sum of the numbers between 1 and 100. 

If you had to solve this problem with the help of Oracle Database, how would you accomplish the task?  If it helps, I think that I recall that the mathematical notation representing the problem posed by the searcher is as follows:

$$\sum_{i=1}^{100} i$$

Would your answer be any different if the person needed to know the sum of the numbers between 6 and 105:

$$\sum_{i=6}^{105} i$$

What about generalizing the problem even further:

$$\sum_{i=x}^{y} i$$

Think about the problem before scrolling down.  How many unique solutions are able to produce the answer?

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

————————————

If you think about the problem, in its simplest form it is really just a matter of repeatedly adding the highest and lowest unmatched numbers (all of those results should be the same, otherwise your calculator needs new batteries) and then multiplying by the number of pairs (there are half as many pairs as there are numbers in the sequence to be summed):

01 + 100 = 101
02 + 99 = 101
03 + 98 = 101
04 + 97 = 101
05 + 96 = 101
...
50 + 51 = 101

So, the general formula is:

(max - min + 1) / 2 * (min + max)

And the SQL statements to produce the results:

SELECT   (100 - 1 + 1) /2 * (1 + 100) FROM   DUAL;  
SELECT   (105 - 6 + 1) /2 * (6 + 105) FROM   DUAL;

————————————

-

-

-

-

-

-

-

-

-

-

-

-

For now, ignore the above section.  How many ways can this particular problem be solved with the help of Oracle Database?  Are there any built-in functions that will help?
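
As a starting point, one brute force approach is sketched below: generate one row per number with CONNECT BY, then let the SUM aggregate function do the work.  The inline view P and its X and Y columns are hypothetical names used to hold the lower and upper bounds; the FORMULA column is included only to cross-check the result against the general formula shown earlier.

```sql
WITH P AS (
  SELECT
    6 X,      -- lower bound
    105 Y     -- upper bound
  FROM
    DUAL)
SELECT
  SUM(X + LEVEL - 1) BRUTE_FORCE,             -- X, X+1, ... Y added one row at a time
  MAX((Y - X + 1) / 2 * (X + Y)) FORMULA      -- (max - min + 1) / 2 * (min + max)
FROM
  P
CONNECT BY
  LEVEL <= Y - X + 1;
```

For the bounds 6 and 105 both columns should show 5550; changing X and Y to 1 and 100 should produce 5050.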





What Would Cause a NO_INDEX Hint to Not Work as Expected?

July 11, 2011

Recently, the following search keywords were used to access an article on my site, and that search triggered an idea for another blog article:

no_index hint oracle 10g not working 

In Oracle Database, hints are directives that must be obeyed (with a couple of minor exceptions that include bugs).  I started wondering what might cause a NO_INDEX hint to not work as expected.  Let’s create a test table for a couple of experiments:

CREATE TABLE T3(
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(300));

INSERT INTO
  T3
SELECT
  ROWNUM C1,
  TRUNC(ROWNUM/10000) C2,
  MOD(ROWNUM,10000) C3,
  LPAD('A',300,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T3_C2 ON T3(C2);
CREATE INDEX IND_T3_C3 ON T3(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

SET LINESIZE 140
SET PAGESIZE 1000 

Let’s try a simple query that accesses the table, and display the execution plan for that query (note that these test SQL statements are being executed on Oracle Database 11.2.0.2):

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  0s5xrvx04309f, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1<=20

Plan hash value: 1371903174

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=20) 

The IND_T3_C1 index was automatically selected to assist data retrieval.

Let’s try a NO_INDEX hint just to verify that the hint can work as expected:

SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  d0gpwhvg7629r, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
  C3,   SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1<=20

Plan hash value: 4161002650

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |     20 |00:00:00.01 |   45583 |
|*  1 |  TABLE ACCESS FULL| T3   |      1 |     20 |     20 |00:00:00.01 |   45583 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=20) 

As can be seen above, a full table scan is performed for data retrieval, rather than the index that was used in the previous example.

Let’s try another example with the NO_INDEX hint:

SELECT /*+ NO_INDEX(IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  4p5xpu625cw5a, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
C3,   SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1<=20

Plan hash value: 1371903174

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=20) 

That did not work quite as someone might think – you must specify the table/view name in the NO_INDEX hint.

Let’s try another example:

SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3 T
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  fnjc3pc41a2mh, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
  C3,   SUBSTR(C4,1,10) C4 FROM   T3 T WHERE   C1<=20

Plan hash value: 1371903174

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"<=20) 

Again, the index was used despite the NO_INDEX hint – if you alias a table/view, you must specify the alias in the NO_INDEX hint.  This seems to be a common problem when people report in Internet forums that Oracle hints do not work as expected.
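
For reference, a sketch of the aliased query with the alias (T) specified in the hint is shown below.  This variation was not executed as part of the test case, so no execution plan is shown; based on the earlier un-aliased example, a full table scan would be the expected result.

```sql
-- The hint names the alias T, not the table name T3
SELECT /*+ NO_INDEX(T IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3 T
WHERE
  C1<=20;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));
```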

Another example:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(T3 IND_T3_C1) */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Plan hash value: 587667290

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |     20 |00:00:00.01 |      13 |       |       |          |
|*  1 |  HASH JOIN SEMI              |           |      1 |     20 |     20 |00:00:00.01 |      13 |   705K|   705K| 1125K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     20 |     20 |00:00:00.01 |       4 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       3 |       |       |          |
|*  4 |   TABLE ACCESS FULL          | T3        |      1 |     20 |     20 |00:00:00.01 |       9 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   3 - access("C1"<=20)
   4 - filter("C1"<=20) 

As seen above, the IND_T3_C1 index was still used even though the hint was correctly formed.  What is wrong?  The scope of the NO_INDEX hint is limited to the subquery found in the WHERE clause, and that hint does not apply to the main portion of the SQL statement – the hint did work in the scope of the subquery (the full table scan on plan line 4).  Note that the execution plan shows that the query was transformed into a join (the HASH JOIN SEMI operation).

A similar SQL statement, with the NO_INDEX hint relocated:

SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+  */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  3n76qa6km68r2, child number 0
-------------------------------------
SELECT /*+ NO_INDEX(T3 IND_T3_C1) GATHER_PLAN_STATISTICS */   C1,   C2,
  C3,   SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+  */
      C1      FROM        T3      WHERE        C1<=20)

Plan hash value: 3266157401

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |     20 |00:00:00.07 |   45586 |       |       |          |
|*  1 |  HASH JOIN SEMI    |           |      1 |     20 |     20 |00:00:00.07 |   45586 |   705K|   705K| 1140K (0)|
|*  2 |   TABLE ACCESS FULL| T3        |      1 |     20 |     20 |00:00:00.01 |   45581 |       |       |          |
|*  3 |   INDEX RANGE SCAN | IND_T3_C1 |      1 |     20 |     20 |00:00:00.01 |       5 |       |       |          |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C1"<=20)
   3 - access("C1"<=20) 

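As can be seen above, the IND_T3_C1 index was still used, this time by the subquery, because the scope of the NO_INDEX hint is only the main body of the SQL statement, not the subquery found in the WHERE clause.  The hint did work within its scope: the T3 table in the main body was read with a full table scan.  Once again, the execution plan shows that the query was transformed into a join (a HASH JOIN SEMI in this plan).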
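If the goal were to block the index in both places from a single hint block, one possible approach (a sketch only, assuming Oracle Database 10g or later where the QB_NAME hint is available) is to name the subquery's query block and reference that name from the main body.  The query block name SUBQ is a hypothetical label picked for this example, and I have not captured an execution plan for this variation, so verify the outcome on your own system:

```sql
-- Sketch only: SUBQ is a hypothetical query block name, not part of the original test case.
-- The first NO_INDEX hint targets T3 in the main body; the second targets T3 in the
-- subquery by referencing the query block name assigned with QB_NAME.
SELECT /*+ NO_INDEX(T3 IND_T3_C1) NO_INDEX(@SUBQ T3 IND_T3_C1) GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ QB_NAME(SUBQ) */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));
```

If both hints are honored, the IND_T3_C1 index should not appear in the Name column of the resulting plan.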

OK, so we saw in the previous examples that query transformations happened and Oracle’s optimizer was able to keep track of the scope of the NO_INDEX hint, even when the query was transformed into a simple join.  Let’s try another example, this time with a NO_QUERY_TRANSFORMATION hint, a NO_INDEX hint, and an INDEX hint:

SELECT /*+ INDEX(T3 IND_T3_C1) NO_QUERY_TRANSFORMATION GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(T3 IND_T3_C1) */
       C1
     FROM
       T3
     WHERE
       C1<=20);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  bbs14bbywgfq2, child number 0
-------------------------------------
SELECT /*+ INDEX(T3 IND_T3_C1) NO_QUERY_TRANSFORMATION
GATHER_PLAN_STATISTICS */   C1,   C2,   C3,   SUBSTR(C4,1,10) C4 FROM
T3 WHERE   C1 IN     (SELECT /*+ NO_INDEX(T3 IND_T3_C1) */        C1
  FROM        T3      WHERE        C1<=20)

Plan hash value: 371539318

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |     20 |00:00:00.01 |   45723 |
|*  1 |  FILTER             |      |      1 |        |     20 |00:00:00.01 |   45723 |
|   2 |   TABLE ACCESS FULL | T3   |      1 |   1000K|   1000K|00:00:00.16 |   45583 |
|*  3 |   FILTER            |      |   1000K|        |     20 |00:00:00.11 |     140 |
|*  4 |    TABLE ACCESS FULL| T3   |     20 |      1 |     20 |00:00:00.01 |     140 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter(:B1<=20)
   4 - filter(("C1"=:B1 AND "C1"<=20)) 

Note in the above that no query transformation happened, but also notice that two full table scans were performed – it *appears* that the INDEX hint was ignored.  For extra credit, explain why the optimizer could not apply the INDEX hint.

What if we create a view with an embedded hint?

CREATE VIEW V3 AS
SELECT /*+ INDEX(TV3 IND_T3_C2) */
  C1,
  C2,
  C3,
  C4
FROM
  T3 TV3
WHERE
  C2 <= 20; 

Now a query with a NO_INDEX hint that uses that view:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(V3 IND_T3_C2) */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  agjapbkt2n8av, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+ NO_INDEX(V3
IND_T3_C2) */        C1      FROM        V3      WHERE        C1<=200)

Plan hash value: 1309751330

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    200 |00:00:00.01 |      54 |       |       |          |
|*  1 |  HASH JOIN RIGHT SEMI        |           |      1 |     42 |    200 |00:00:00.01 |      54 |  1452K|  1452K| 1282K (0)|
|*  2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     42 |    200 |00:00:00.01 |      13 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |       3 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    200 |    200 |00:00:00.01 |      41 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |      17 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C2"<=20)
   3 - access("C1"<=200)
   5 - access("C1"<=200) 

As seen by the above, the NO_INDEX hint was applied and the conflicting hint that was embedded in the view was not applied.
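The hint above names the view (V3) rather than the table alias inside the view (TV3).  Oracle's documentation also describes a dotted notation for global hints that reaches a table inside a view.  The following is only a sketch of that notation applied to this test case; I have not captured a plan for it, so treat the outcome as something to test rather than a stated result:

```sql
-- Sketch only: V3.TV3 uses the documented view.table_alias form of a global hint
-- to reference the TV3 alias that is defined inside the V3 view.
SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ NO_INDEX(V3.TV3 IND_T3_C2) */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));
```

Comparing the plan from this variation with the plan shown above would help confirm which of the two hint forms the optimizer actually honors.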

For fun, let’s reverse the location of the INDEX and NO_INDEX hints to see if the NO_INDEX hint always overrides the INDEX hint.  First, the view definition:

CREATE OR REPLACE VIEW V3 AS
SELECT /*+ NO_INDEX(TV3 IND_T3_C2) */
  C1,
  C2,
  C3,
  C4
FROM
  T3 TV3
WHERE
  C2 <= 20; 

Now the query:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+ INDEX(V3 IND_T3_C2) */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  5j745zr4dmqzx, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+ INDEX(V3
IND_T3_C2) */        C1      FROM        V3      WHERE        C1<=200)

Plan hash value: 3864333899

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    200 |00:00:00.06 |   10017 |       |       |          |
|*  1 |  HASH JOIN RIGHT SEMI        |           |      1 |     42 |    200 |00:00:00.06 |   10017 |  1452K|  1452K| 1269K (0)|
|*  2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     42 |    200 |00:00:00.01 |    9976 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |    209K|    209K|00:00:00.02 |     410 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    200 |    200 |00:00:00.01 |      41 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |      17 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C1"<=200)
   3 - access("C2"<=20)
   5 - access("C1"<=200)

As can be seen by the above, the NO_INDEX hint in the view was not applied due to the conflicting INDEX hint in the main query.  So, that is another case where the NO_INDEX hint could appear to not work as expected.  Just for confirmation that the NO_INDEX hint in the view works as expected, we will re-execute the query without the INDEX hint:

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2,
  C3,
  SUBSTR(C4,1,10) C4
FROM
  T3
WHERE
  C1 IN
    (SELECT /*+  */
       C1
     FROM
       V3
     WHERE
       C1<=200);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  fq3g6pr7ffj2f, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2,   C3,
SUBSTR(C4,1,10) C4 FROM   T3 WHERE   C1 IN     (SELECT /*+  */
C1      FROM        V3      WHERE        C1<=200)

Plan hash value: 1309751330

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    200 |00:00:00.01 |      54 |       |       |          |
|*  1 |  HASH JOIN RIGHT SEMI        |           |      1 |     42 |    200 |00:00:00.01 |      54 |  1452K|  1452K| 1232K (0)|
|*  2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |     42 |    200 |00:00:00.01 |      13 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |       3 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    200 |    200 |00:00:00.01 |      41 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T3_C1 |      1 |    200 |    200 |00:00:00.01 |      17 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C1"="C1")
   2 - filter("C2"<=20)
   3 - access("C1"<=200)
   5 - access("C1"<=200) 

  1. If you execute the above test scripts on Oracle Database 11.1, 10.2, 10.1, 9.2, or 9.0.1, do you see different results?  I am trying to understand why the person performing the search might have included 10g in the search keywords.  Is it possible that the hint appeared to work correctly in 9.2 simply by coincidence, and a query transformation in 10.1 or 10.2 exposed the fact that the hint was malformed?
  2. Is it possible that a query transformation can cause a NO_INDEX hint to be ignored?  If yes, please provide a test case that demonstrates a NO_INDEX hint being ignored due to a transformation.
  3. Are there any other examples where a NO_INDEX hint will appear to not work properly?  Could an index organized table cause problems for this hint?




How Many Ways to Solve this SQL Problem?

July 6, 2011

Since there were so many unique solutions to the last blog article that posed a SQL challenge, I thought that I would try another blog article that asks a similar type of question.  Assume that someone showed you the following output:

 C2   D
--- ---
100   0
150  50
200  50
201   1
300  99
350  50
400  50
500 100 

You have the following table definition, and rows in the table:

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER);

INSERT INTO T2 VALUES (1,100);
INSERT INTO T2 VALUES (4,150);
INSERT INTO T2 VALUES (7,200);
INSERT INTO T2 VALUES (8,201);
INSERT INTO T2 VALUES (10,300);
INSERT INTO T2 VALUES (14,350);
INSERT INTO T2 VALUES (18,400);
INSERT INTO T2 VALUES (24,500);

COMMIT;

Assume that you know nothing other than the fact that the C2 values are listed in ascending order when sorted by column C1.  How many different ways can this particular problem be solved?  Yes, there is an easy way, but assume that you were trying to “help educate” the person who provided the requested output.

My least-shortest-path solution follows:

SELECT
  C2,
  0 D
FROM
  T2
WHERE
  C1=(SELECT
        MIN(C1)
      FROM
        T2)
UNION ALL
SELECT
  V2.C2,
  V2.C2-MAX(T2.C2) D
FROM
  T2,
  (SELECT
    C1,
    C2
  FROM
    T2) V2
WHERE
  T2.C1<V2.C1
GROUP BY
  V2.C2
ORDER BY
  C2;

  C2    D
---- ----
 100    0
 150   50
 200   50
 201    1
 300   99
 350   50
 400   50
 500  100

8 rows selected. 

In the above, the row with the 0 in the D column was the hardest part of the solution.  Why would I use UNION ALL and not UNION – what was not in the specification?
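For comparison, the “easy way” is not spelled out in this article; my assumption is that it refers to an analytic function such as LAG.  The following is a minimal sketch under that assumption, and it requires an Oracle Database release that supports analytic functions (as far as I know, that rules out what was available in 1998):

```sql
-- Sketch only: assumes the "easy way" means the LAG analytic function.
-- LAG(C2,1,C2) returns the C2 value from the previous row when ordered by C1;
-- the third argument supplies the current C2 for the first row, which makes D = 0 there.
SELECT
  C2,
  C2 - LAG(C2,1,C2) OVER (ORDER BY C1) D
FROM
  T2
ORDER BY
  C1;
```

With the eight rows inserted into T2 above, this should return the same C2 and D values as the UNION ALL solution, while reading the table only once and ordering by C1 rather than relying on C2 being in ascending order.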

This blog article was inspired by an old question found in a Usenet group from 1998 – if you were answering the question in 1998, would your answer be any different?  Be creative with your solution.  While you are thinking about a solution, take a look at this old Usenet thread and consider how difficult it was to find the “50 highest paid workers” in the last century.







