Different Execution Plans when Converting SQL from SELECT to UPDATE … IN SELECT, ROWNUM Related Bug in 11.2.0.2

28 06 2012

June 28, 2012 (Modified June 28, 2012)

Sometimes it is possible to build a very effective test case to reproduce a particular problem.  Sometimes the test case drifts a bit offtrack, in the process revealing other information that is not directly related to the original problem that the test case attempted to simulate.  You will see several examples of test cases scripts used in various articles on this blog, including one that attempts to describe how to build useful test cases.  It is not uncommon to see test cases used to demonstrate various types of behavior, and there are several well known people in the Oracle Database community that use test cases to not only demonstrate problems, but also to teach methods of avoiding those problems.

The FAQ (Frequently Asked Questions) for the SQL and PL/SQL OTN forum includes links to two OTN threads that everyone posting to the forum about performance problems should read carefully:

When your query takes too long…
How to post a SQL statement tuning request

A handful of the good test cases have been posted to the OTN forums over the last couple of years that have then sparked an interesting article on this blog.  Take, for example, this OTN thread that provided a test case script to show that a poor execution plan was generated by the Oracle Database optimizer when ROWNUM was included in the WHERE clause.  Prior to Oracle Database 11.2.0.1, the cardinality calculation for steps in the execution plan that could not possibly be short-circuited early (such as a hash join with a subsequent ORDER BY) were affected by the ROWNUM predicate in the WHERE clause (of a parent query block).  That test case resulted in at least one article on this blog, which showed that with a bit of modification to the SQL statement, that it was possible to use the ROW_NUMBER analytic function as a replacement for ROWNUM, thus avoiding the automatic switch to a FIRST_ROWS(n) optimizer mode for the SQL statement.  This particular problem was determined to be a bug, and corrected with the release of Oracle Database 11.2.0.1.

Fast forward two and a half years, and we see another interesting, and oddly related test case in a thread on the OTN forum.  The OP (original poster) in the OTN forum thread generated the test case, posting his results from Oracle Database 10.2.0.5 (Standard Edition), while I posted my observed results from Oracle Database 11.2.0.2 in an attempt to find a more efficient method for updating rows that match specific criteria (other people helped also, possibly executing the test case script in other Oracle Database release versions).

What do we learn from the OTN test case script?  Here is a start:

  • When hinting a SQL statement to generate an expected execution plan, it is important to be as specific as possible with the hints, without boxing the SQL statement into a corner as the data volume increases in the various tables accessed by the SQL statement.  The hints also must appear in the correct section of a complex SQL statement with one or more inline views or subqueries.
  • If you use ROWNUM in the WHERE clause of a SQL statement, and then convert that SELECT statement to a UPDATE tablex WHERE coly IN (SELECT…) statement, you may find that the execution plan for the SELECT portion of the UPDATE statement changes dramatically from what was seen when just the SELECT statement was executed.  With Oracle Database 11.2.0.2, you may find that the first execution is efficient, and on the second execution find that cardinality feedback triggered a new hard parse of the SQL statement, resulting in a significantly less efficient execution plan (or possibly significantly more efficient execution plan).
  • AUTOTRACE in SQL*Plus lies about the execution plan sometimes, such as when cardinality feedback triggers the generation of a new execution plan.
  • Using a UNION ALL in an inline view will likely not permit the Oracle runtime engine to short-circuit an operation in the inline view when a ROWNUM condition is present in the parent SQL statement, even if the query optimizer’s generated execution plan indicates that this short-circuit will happen.  If possible, rewrite the SQL statement to avoid the UNION ALL operation, if both sections of the UNION ALL are accessing the same tables.
  • The GATHER_PLAN_STATISTICS hint, when added to a SQL statement, permits the use of DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST format parameter.  An execution plan generated with that format parameter will indicate whether or not short-circuiting of steps in an execution plan took place.  This short-circuiting will be indicated in the A-Rows column, the Starts column, and/or the Predicate Information section of the execution plan.
  • Within an inline view, an index-driven nested loop join that permits the elimination of an ORDER BY clause can be short-circuited by a ROWNUM condition in the parent portion of the SQL statement, while the same is not true for hash or sort-merge joins.
  • The ROWNUM cardinality calculation bug is still present in Oracle Database 11.2.0.2 if the inline view contains a UNION ALL clause – this bug is most apparent when cardinality feedback triggers the generation of a new execution plan on the second execution of the SQL statement.
  • In Oracle Database 11.2.0.2, query blocks in a SQL statement that cannot be short-circuited by a parent query block’s ROWNUM predicate are still optimized with an automatic FIRST_ROWS(n) optimizer mode (where n is the number associated with the parent operation’s ROWNUM predicate).  Through an apparent bug in the query optimizer, this automatic optimization may cause odd cost calculation problems, such as dropping the cost of a full table scan from 35.01 to 2.02 (with the cost of the corresponding hash join dropping from 56.76 to just 23.83); at the same time, due to cardinality feedback, the cost of an efficient index access path with a nested loops join (that may be short-circuited by the runtime engine) at the same point in the execution plan may increase from a cost of 41.32 to 2947.89.  As a result of the costing change, the nested loop join with index access path is likely to be replaced with a hash join and full table scan.

What else may we learn from the latest OTN test case thread?

Late Addition to the Article (June 28, 2012)

A side-by-side comparison of a portion of the 10053 trace file (showing the possible bug related to ROWNUM and cardinality feedback) that was generated by the following script (using the OP’s sample data):

DROP INDEX IND_DTL_TEST;

SET AUTOTRACE OFF
SET LINESIZE 140
SET PAGESIZE 1000

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET TRACEFILE_IDENTIFIER='TESTING 1';

select /*+ GATHER_PLAN_STATISTICS */
  dtlid
from
  (select
     dtlid
   from
     (select
        dtlid,
        process_date
      from
        dtl
      where
        hdrid is null
        and process_date < (sysdate + 30)
        and process_ind = 'NEW'
      UNION ALL
      select
        dtl.dtlid,
        dtl.process_date
      from
        dtl,
        hdr
      where
        dtl.hdrid = hdr.hdrid
        and dtl.process_date < (sysdate + 30)
        and dtl.process_ind = 'NEW'
        and hdr_ind in (0, 2))
   order by
     dtlid,
     process_date)
where
  rownum <= 10;

SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ALTER SESSION SET TRACEFILE_IDENTIFIER='TESTING 2';

select /*+ GATHER_PLAN_STATISTICS */
  dtlid
from
  (select
     dtlid
   from
     (select
        dtlid,
        process_date
      from
        dtl
      where
        hdrid is null
        and process_date < (sysdate + 30)
        and process_ind = 'NEW'
      UNION ALL
      select
        dtl.dtlid,
        dtl.process_date
      from
        dtl,
        hdr
      where
        dtl.hdrid = hdr.hdrid
        and dtl.process_date < (sysdate + 30)
        and dtl.process_ind = 'NEW'
        and hdr_ind in (0, 2))
   order by
     dtlid,
     process_date)
where
  rownum <= 10;

SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF'; 

The generated execution plan, before cardinality feedback:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4dwv2ktktv1ck, child number 1
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */   dtlid from   (select      dtlid
  from      (select         dtlid,         process_date       from
   dtl       where         hdrid is null         and process_date <
(sysdate + 30)         and process_ind = 'NEW'       UNION ALL
select         dtl.dtlid,         dtl.process_date       from
dtl,         hdr       where         dtl.hdrid = hdr.hdrid         and
dtl.process_date < (sysdate + 30)         and dtl.process_ind = 'NEW'
      and hdr_ind in (0, 2))    order by      dtlid,      process_date)
where   rownum <= 10

Plan hash value: 2201656677

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |          |      1 |        |     10 |00:00:00.05 |   31500 |       |       |          |
|*  1 |  COUNT STOPKEY                     |          |      1 |        |     10 |00:00:00.05 |   31500 |       |       |          |
|   2 |   VIEW                             |          |      1 |     22 |     10 |00:00:00.05 |   31500 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY           |          |      1 |     22 |     10 |00:00:00.05 |   31500 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW                           |          |      1 |     22 |   1367 |00:00:00.04 |   31500 |       |       |          |
|   5 |      UNION-ALL                     |          |      1 |        |   1367 |00:00:00.04 |   31500 |       |       |          |
|*  6 |       TABLE ACCESS FULL            | DTL      |      1 |     11 |    160 |00:00:00.01 |    1117 |       |       |          |
|   7 |       NESTED LOOPS                 |          |      1 |        |   1207 |00:00:00.07 |   30383 |       |       |          |
|   8 |        NESTED LOOPS                |          |      1 |     11 |   2838 |00:00:00.04 |   27566 |       |       |          |
|   9 |         TABLE ACCESS BY INDEX ROWID| DTL      |      1 |     32 |  15034 |00:00:00.02 |   15068 |       |       |          |
|* 10 |          INDEX RANGE SCAN          | DTL_IDX1 |      1 |        |  15034 |00:00:00.01 |      56 |       |       |          |
|* 11 |         INDEX UNIQUE SCAN          | HDR_PK   |  15034 |      1 |   2838 |00:00:00.01 |   12498 |       |       |          |
|* 12 |        TABLE ACCESS BY INDEX ROWID | HDR      |   2838 |      1 |   1207 |00:00:00.01 |    2817 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   6 - filter(("HDRID" IS NULL AND "PROCESS_IND"='NEW' AND "PROCESS_DATE"<SYSDATE@!+30))
  10 - access("DTL"."PROCESS_IND"='NEW' AND "DTL"."PROCESS_DATE"<SYSDATE@!+30)
  11 - access("DTL"."HDRID"="HDR"."HDRID")
  12 - filter(("HDR_IND"=0 OR "HDR_IND"=2)) 

After cardinality feedback:

SQL_ID  4dwv2ktktv1ck, child number 2
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */   dtlid from   (select      dtlid
  from      (select         dtlid,         process_date       from
   dtl       where         hdrid is null         and process_date <
(sysdate + 30)         and process_ind = 'NEW'       UNION ALL
select         dtl.dtlid,         dtl.process_date       from
dtl,         hdr       where         dtl.hdrid = hdr.hdrid         and
dtl.process_date < (sysdate + 30)         and dtl.process_ind = 'NEW'
      and hdr_ind in (0, 2))    order by      dtlid,      process_date)
where   rownum <= 10

Plan hash value: 986254152

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |     10 |00:00:00.01 |    2437 |       |       |          |
|*  1 |  COUNT STOPKEY          |      |      1 |        |     10 |00:00:00.01 |    2437 |       |       |          |
|   2 |   VIEW                  |      |      1 |   2104 |     10 |00:00:00.01 |    2437 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY|      |      1 |   2104 |     10 |00:00:00.01 |    2437 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW                |      |      1 |   2104 |   1367 |00:00:00.01 |    2437 |       |       |          |
|   5 |      UNION-ALL          |      |      1 |        |   1367 |00:00:00.01 |    2437 |       |       |          |
|*  6 |       TABLE ACCESS FULL | DTL  |      1 |    160 |    160 |00:00:00.01 |    1117 |       |       |          |
|*  7 |       HASH JOIN         |      |      1 |   1944 |   1207 |00:00:00.01 |    1320 |  1517K|  1517K| 1398K (0)|
|*  8 |        TABLE ACCESS FULL| HDR  |      1 |   6667 |   4285 |00:00:00.01 |     203 |       |       |          |
|*  9 |        TABLE ACCESS FULL| DTL  |      1 |  15034 |  15034 |00:00:00.01 |    1117 |       |       |          |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   6 - filter(("HDRID" IS NULL AND "PROCESS_IND"='NEW' AND "PROCESS_DATE"<SYSDATE@!+30))
   7 - access("DTL"."HDRID"="HDR"."HDRID")
   8 - filter(("HDR_IND"=0 OR "HDR_IND"=2))
   9 - filter(("DTL"."PROCESS_IND"='NEW' AND "DTL"."PROCESS_DATE"<SYSDATE@!+30))

Note
-----
   - cardinality feedback used for this statement 

In the above example, cardinality feedback helped to reduce the execution time.  Notice in the first plan that the E-Rows column shows that the ROWNUM predicate was pushed too far into the inline view’s cardinality calculations.  So, cardinality feedback helped in the above example.  Now, let’s look at what happens to my optimized version of the SQL statement that removes the UNION ALL:

select
  dtlid
from
  (select
     dtlid
   from
     (select /*+ INDEX(DTL) INDEX(HRD) USE_NL(DTL HDR) */
        dtl.dtlid,
        dtl.process_date
      from
        dtl,
        hdr
      where
        dtl.hdrid = hdr.hdrid(+)
        and dtl.process_date < (sysdate + 30)
        and dtl.process_ind = 'NEW'
        and (
            (hdr_ind in (0, 2) and dtl.hdrid IS NOT NULL)
            OR (dtl.hdrid IS NULL)))
   order by
     dtlid,
     process_date) 

The execution plan before cardinality feedback:

SQL_ID  5d8128vjhm1j0, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */   dtlid from   (select      dtlid
  from      (select         dtl.dtlid,         dtl.process_date
from         dtl,         hdr       where         dtl.hdrid =
hdr.hdrid(+)         and dtl.process_date < (sysdate + 30)         and
dtl.process_ind = 'NEW'         and (             (hdr_ind in (0, 2)
and dtl.hdrid IS NOT NULL)             OR (dtl.hdrid IS NULL)))
order by      dtlid,      process_date) where   rownum <= 10

Plan hash value: 1187173211

----------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |      1 |        |     10 |00:00:00.01 |      80 |
|*  1 |  COUNT STOPKEY                  |              |      1 |        |     10 |00:00:00.01 |      80 |
|   2 |   VIEW                          |              |      1 |     11 |     10 |00:00:00.01 |      80 |
|*  3 |    FILTER                       |              |      1 |        |     10 |00:00:00.01 |      80 |
|   4 |     NESTED LOOPS OUTER          |              |      1 |     11 |     19 |00:00:00.01 |      80 |
|   5 |      TABLE ACCESS BY INDEX ROWID| DTL          |      1 |  19635 |     19 |00:00:00.01 |      52 |
|*  6 |       INDEX FULL SCAN           | IND_DTL_TEST |      1 |     16 |     19 |00:00:00.01 |      49 |
|   7 |      TABLE ACCESS BY INDEX ROWID| HDR          |     19 |      1 |     18 |00:00:00.01 |      28 |
|*  8 |       INDEX UNIQUE SCAN         | HDR_PK       |     19 |      1 |     18 |00:00:00.01 |      10 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(((INTERNAL_FUNCTION("HDR_IND") AND "DTL"."HDRID" IS NOT NULL) OR "DTL"."HDRID" IS
              NULL))
   6 - access("DTL"."PROCESS_IND"='NEW' AND "DTL"."PROCESS_DATE"<SYSDATE@!+30)
       filter(("DTL"."PROCESS_IND"='NEW' AND "DTL"."PROCESS_DATE"<SYSDATE@!+30))
   8 - access("DTL"."HDRID"="HDR"."HDRID") 

The execution plan after cardinality feedback:

SQL_ID  5d8128vjhm1j0, child number 1
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */   dtlid from   (select      dtlid
  from      (select         dtl.dtlid,         dtl.process_date
from         dtl,         hdr       where         dtl.hdrid =
hdr.hdrid(+)         and dtl.process_date < (sysdate + 30)         and
dtl.process_ind = 'NEW'         and (             (hdr_ind in (0, 2)
and dtl.hdrid IS NOT NULL)             OR (dtl.hdrid IS NULL)))
order by      dtlid,      process_date) where   rownum <= 10

Plan hash value: 672193791

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |        |      1 |        |     10 |00:00:00.02 |    5315 |       |       |          |
|*  1 |  COUNT STOPKEY                   |        |      1 |        |     10 |00:00:00.02 |    5315 |       |       |          |
|   2 |   VIEW                           |        |      1 |      9 |     10 |00:00:00.02 |    5315 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY         |        |      1 |      9 |     10 |00:00:00.02 |    5315 |  2048 |  2048 | 2048  (0)|
|*  4 |     FILTER                       |        |      1 |        |   1374 |00:00:00.02 |    5315 |       |       |          |
|   5 |      NESTED LOOPS OUTER          |        |      1 |      9 |  15115 |00:00:00.02 |    5315 |       |       |          |
|*  6 |       TABLE ACCESS FULL          | DTL    |      1 |     19 |  15115 |00:00:00.01 |    1117 |       |       |          |
|   7 |       TABLE ACCESS BY INDEX ROWID| HDR    |  15115 |      1 |   2853 |00:00:00.01 |    4198 |       |       |          |
|*  8 |        INDEX UNIQUE SCAN         | HDR_PK |  15115 |      1 |   2853 |00:00:00.01 |    1345 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   4 - filter(((INTERNAL_FUNCTION("HDR_IND") AND "DTL"."HDRID" IS NOT NULL) OR "DTL"."HDRID" IS NULL))
   6 - filter(("DTL"."PROCESS_IND"='NEW' AND "DTL"."PROCESS_DATE"<SYSDATE@!+30))
   8 - access("DTL"."HDRID"="HDR"."HDRID")

Note
-----
   - cardinality feedback used for this statement 

In the above, notice that the reported A-Time increased from 0.01 seconds (the actual time could be must faster than that) to 0.02 seconds.  The number of consistent gets also increased from 80 to 5,315.  Cardinality feedback will NOT automatically change the execution plan back to the more efficient execution plan during later executions of the SQL statement.





Why is a Full Table Scan Selected – Will the Execution Plan Provide a Clue?

20 06 2012

June 20, 2012

I have not had much time to respond in OTN threads recently, although I do still occasionally read threads on the forum.  I was a little surprised by one of the late responses in one of the recent threads, where one of the responders suggested actually testing the problem with the assistance of execution plans.  Additionally, that responder suggested that the earlier responses missed the target.  Ouch!

The OP did not provide DDL to create the table or indexes, or DML to populate the table.  The SQL statement provided by the OP was apparently looking for rows where the VARCHAR2 column A was blank (as in containing a zero length string).  If an index exists on column A, why would the optimizer select to perform a full table scan when attempting to return all rows with a zero length string in column A?  (Raise your hand if you know the answer.)

Let’s create a test table with four indexes so that we are able to test some of the theories (or possible causes) that were proposed in the thread:

DROP TABLE T1 PURGE;

CREATE TABLE T1 AS 
SELECT
  ROWNUM C1,
  MOD(ROWNUM,10) C2,
  TRUNC(ROWNUM/10000) C3,
  RPAD(TO_CHAR(ROWNUM),6,'A') C4,
  LPAD('A',200,'A') C5
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

ALTER TABLE T1 MODIFY (
  C1 NOT NULL,
  C2 NOT NULL,
  C3 NOT NULL);

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);
CREATE INDEX IND_T1_C3 ON T1(C3);
CREATE INDEX IND_T1_C4 ON T1(C4);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)

SET LINESIZE 140
SET PAGESIZE 1000 

The above script creates a table with 100,000 rows, with the first 3 columns declared as NOT NULL.  Column C1 contains the numbers 1 through 100,000.  Column C2 contains the numbers 0 through 9 in a repeating pattern.  Column C3 contains the numbers 0 through 10 with most of the rows containing the same value likely closely bunched together (a single row contains the number 10).  Column C4 is a simple VARCHAR2 column that is the number 1 through 100,000 padded to six characters using the letter A.  The statistics for the table and indexes were collected with a 100% sampling percentage.

Using the test table created by the above script, we might simulate the OP’s SQL statement using the following.  One of the last responders in the OTN thread recommended looking at the execution plan, so we will retrieve that also:

SELECT
  *
FROM
  T1
WHERE
  C4='';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL')); 

Unfortunately, the two single quotes () in the SQL statement are not interpretted as a test for a zero length string as happens on other DB platforms, and as the OP likely intended (see this AskTom thread).  The Oracle Database 11.2.0.2 output of the above follows:

no rows selected

SQL>

SQL_ID  5tc2j4q52493b, child number 0
-------------------------------------
SELECT   * FROM   T1 WHERE   C4=''

Plan hash value: 3332582666

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     1 (100)|          |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|   252   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL) 

So, the Predicate Information section shows NULL IS NOT NULL, which never has a result of TRUE.  Notice that the full table scan has a cost of 252, yet plan row 0, which is a grandparent of row 2, has a cost of 1 (costs for parent operations are supposed to include the calculated cost of child operations plus the calculated cost of the work performed at the parent operation).  (Raise your hand if you know the answer why plan row 0 has a cost of 1 when plan row 2 has a cost of 252.)  We will have to come back to this execution plan oddity later.

For a SQL statement like the following, which should return one row, you would expect the index on column C1 to be used:

SELECT
  C1,
  C2,
  C3,
  C4,
  SUBSTR(C5,1,10) C5
FROM
  T1
WHERE
  C1=9;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL')); 

Here is the output of the above:

        C1         C2         C3 C4     C5
---------- ---------- ---------- ------ ----------
         9          9          0 9AAAAA AAAAAAAAAA

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  90d7zdbavdpuu, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   C4,   SUBSTR(C5,1,10) C5 FROM   T1 WHERE
C1=9

Plan hash value: 683303157

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |     1 |   219 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C1 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"=9) 

So, an index range scan was performed to retrieve the one row from table T1.  (Raise your hand if you know the answer why an index range scan operation was selected when every value in column C1 is unique and an equality predicate was used in the WHERE clause.)

For the following SQL statement, where an index on column C2 would likely have a large clustering factor value, should an index be used to retrieve 10% of the rows from the table?

SELECT
  C1,
  C2,
  C3,
  C4,
  SUBSTR(C5,1,10) C5
FROM
  T1
WHERE
  C2=9;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL')); 

Here is a portion of the output from the above SQL statement:

...
     99969          9          9 99969A AAAAAAAAAA
     99979          9          9 99979A AAAAAAAAAA
     99989          9          9 99989A AAAAAAAAAA
     99999          9          9 99999A AAAAAAAAAA

10000 rows selected.

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  459zvkh9p725k, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   C4,   SUBSTR(C5,1,10) C5 FROM   T1 WHERE
C2=9

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   252 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  2138K|   252   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=9) 

The optimizer selected to use a full table scan in order to retrieve 10% of the rows from the table.  (Raise your hand if you know the answer why a full table scan was selected.)

A final SQL statement that also retrieves 10% of the rows from the table, this time using column C3:

SELECT
  C1,
  C2,
  C3,
  C4,
  SUBSTR(C5,1,10) C5
FROM
  T1
WHERE
  C3=9;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL')); 

For the above SQL statement, would you expect the optimizer to select the use of an index range scan operation, or a full table scan operation?  Here is the output that I received:

...
     99996          6          9 99996A AAAAAAAAAA
     99997          7          9 99997A AAAAAAAAAA
     99998          8          9 99998A AAAAAAAAAA
     99999          9          9 99999A AAAAAAAAAA

10000 rows selected.

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID  4c1s2vujtc1ff, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   C4,   SUBSTR(C5,1,10) C5 FROM   T1 WHERE
C3=9

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   252 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |  9091 |  1944K|   252   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C3"=9) 

So, once again the optimizer selected to use a full table scan to facilitate the retrieval of 10% of the rows from the table.  But why, the clustering factor should be reasonably low.  (Raise your hand if you know the answer why a full table scan was selected.)

But wait… the optimizer from another 11.2.0.2 database instance decided differently:

...
     99996          6          9 99996A AAAAAAAAAA
     99997          7          9 99997A AAAAAAAAAA
     99998          8          9 99998A AAAAAAAAAA
     99999          9          9 99999A AAAAAAAAAA

10000 rows selected.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
SQL_ID  4c1s2vujtc1ff, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   C4,   SUBSTR(C5,1,10) C5 FROM   T1 WHERE
C3=9

Plan hash value: 1220227203

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   304 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |  9091 |  1944K|   304   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |  9091 |       |    18   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3"=9) 

Why did the optimizer in one of the 11.2.0.2 database instances select to use a full table scan operation, while the optimizer in the other database instance selected to use an index range scan operation?  (Raise your hand if you know the answer why there is a difference between the two execution plan results.)

If we take another look at the execution plan for the SQL statement that simulates the problem experienced by the OP, I wonder if we are able to determine why the parent operation has a lower calculated cost than its child (or grandchild’s) calculated cost?

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C4='';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST')); 

Let’s take a look at the output of the above:

no rows selected

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
SQL_ID  63yz0tn58sz4v, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C4=''

Plan hash value: 3332582666

----------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     1 (100)|      0 |00:00:00.01 |
|*  1 |  FILTER            |      |      1 |        |            |      0 |00:00:00.01 |
|   2 |   TABLE ACCESS FULL| T1   |      0 |    100K|   252   (1)|      0 |00:00:00.01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL) 

Do you see it?  Does it matter if a full table scan of table T1 was selected by the optimizer?   (Raise your hand if you know the answer.)

What if we force an index access path:

SELECT /*+ GATHER_PLAN_STATISTICS INDEX(T1) */
  *
FROM
  T1
WHERE
  C4='';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST')); 

Here is the output of the above:

no rows selected

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
SQL_ID  06ucccd8j8ss0, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS INDEX(T1) */   * FROM   T1 WHERE
C4=''

Plan hash value: 1158503954

-------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |     1 (100)|      0 |00:00:00.01 |
|*  1 |  FILTER                      |           |      1 |        |            |      0 |00:00:00.01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |      0 |    100K|  3323   (1)|      0 |00:00:00.01 |
|   3 |    INDEX FULL SCAN           | IND_T1_C3 |      0 |    100K|   196   (1)|      0 |00:00:00.01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL) 

Is the above execution plan even legal because the SQL statement is essentially looking for NULL values?  (Raise your hand if you know the answer.)

—-

OK, put down your hand – the other people in the office are probably laughing at your hand-waving by now.  ;-)

I wonder if the OP now understands the problem with his SQL statement?

—–

Late Addition June 20, 2012:

Recall that the optimizer selected that a full table scan should be used for the initial SELECT statement.  Part 2 of a previous blog article on this blog pointed to an article on another blog, asking whether or not a couple of statements made on the other blog were true regarding whether or not NULL values were ever stored in a B*tree index structure.  One of my previous articles seemed to offer a counter-point, that in fact it is possible for a B*tree index to contain NULL values in certain situations.

What might happen if we swap in a bind variable in place of the literal in the initial SELECT statement?  If we set the bind variable to have a value of (the same value as the literal), will the optimizer select to use a full table scan operation or an index range scan operation?  Will the STARTS column in the execution plan contain 0 for one or more of the rows in the execution plan?  Let’s test:

VARIABLE V1 VARCHAR2
EXEC :V1:=''
 
SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C4= :V1;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST')); 

Here is the output:

no rows selected
 
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
SQL_ID  c4faj4cgbdrw1, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C4= :V1
 
Plan hash value: 7035821
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     2 (100)|      0 |00:00:00.01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |      1 |     2   (0)|      0 |00:00:00.01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |      1 |     1   (0)|      0 |00:00:00.01 |
------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=:V1)

Notice that none of the rows in the execution plan have a STARTS value of 0, and that filter(NULL IS NOT NULL) does not appear in the Predicate Information section of the execution plan as had happened when we used a literal with the same value as the bind variable.  So, if the OP would like to see an index access path in the execution plan, perhaps he should use bind variables rather than literals?  Is the above execution plan more efficient or less efficient than the execution plan with the full table scan operation that was seen when a literal was passed in the SQL statement?

Let’s insert a row into table T4 with our bind variable value being placed in column C4:

INSERT INTO T1 VALUES(
  -1,
  -1,
  -1,
  :V1,
  'A');
 
COMMIT;
 
SELECT /*+ GATHER_PLAN_STATISTICS  FIND_ME */
  *
FROM
  T1
WHERE
  C4= :V1;
  
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST'));

Will a row be returned?  Will the execution plan change?  Here is the output for the above script:

no rows selected
 
SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +COST'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
SQL_ID  csmkq5csd6f0a, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS  FIND_ME */   * FROM   T1 WHERE   C4=
:V1
 
Plan hash value: 7035821
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     2 (100)|      0 |00:00:00.01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |      1 |     2   (0)|      0 |00:00:00.01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |      1 |     1   (0)|      0 |00:00:00.01 |
------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=:V1)

No rows selected, even though a row was just inserted into the table with the same value  as what is in bind variable V1.  Let’s DUMP the values of a couple of columns from the row that was just inserted:

COLUMN DUMP_C1 FORMAT A25
COLUMN DUMP_C4 FORMAT A25
 
SELECT
  C1,
  DUMP(C1) DUMP_C1,
  C4,
  DUMP(C4) DUMP_C4
FROM
  T1
WHERE
  C1=-1;
 
        C1 DUMP_C1                   C4     DUMP_C4
---------- ------------------------- ------ -------------------------
        -1 Typ=2 Len=3: 62,100,102          NULL

The above output shows that column C4 of this row contains a NULL value.  I suggest that it might not matter whether or not the execution plan shows an index access path or a full table scan for this SQL statement – if the SQL statement is answering a different question than what the OP expects, then it does not matter whether or not the SQL statement executes efficiently.

However, as a bonus we were able to see the optimizer using an index access path to check for a NULL value.





SQL Challenges

14 06 2012

June 14, 2012

Dominic Delmolino put together a very interesting challenge.  The challenge is to produce something called a Pascal matrix using Oracle Database… more specifically, just SQL.  I had a vague recollection of Pascal matrixes when I read Dominic’s challenge.  Basically, the goal is to create a matrix similar to the following:

The rule for generating the matrix is simply that a cell’s value is the sum of the value in the cell that is immediately to the left plus the value in the cell that is immediately above.  Sounds easy, right?

If we were just working in Microsoft Excel (or some other spreadsheet package), we could do something like this to quickly create the matrix:

Dominic’s challenge probably would not be much of a challenge if we could just type in formulas like the above into a SQL statement.  Give his challenge a try to see if you are able to derive a unique solution to the problem.  I probably spent a couple of minutes (maybe 60 seconds with the help of copy and paste) creating the above example using Microsoft Excel, but spent a couple of hours trying to produce a solution that worked using SQL.

——

Part 2 of the challenge.

Take a look at the bullet point items in this blog article about NULL values in table’s columns, in particular the first three large bullet point items.  Do you agree or disagree with the statements, and why?








Follow

Get every new post delivered to your Inbox.

Join 143 other followers