Test Case Showing Oracle Database 11.2.0.1 Completely Ignoring an Index Even when Hinted

22 09 2010

September 22, 2010

In a recent OTN thread a person asked an interesting question: why isn’t my index being used?  A query of a table with 8,000,000 rows should quickly return exactly 3 rows when an available index is used, and that index is used when the WHERE clause is simply:

WHERE
  C200000020 LIKE 'BOSS' || '%' 

However, the application is submitting a WHERE clause that includes an impossible condition in an OR clause, like the following, which is not much different from stating OR 1=2:

WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = ''

That constant1 = constant2 predicate, at least on Oracle Database 10.1 and above, is sufficient to keep the index from being used, thus the query performs a full table scan.  But why?

I think that we need a test case to see what is happening.  First, we will create a simple table with our column of interest and a large column that should help to discourage full table scans:

CREATE TABLE T1 (
  C200000020 VARCHAR2(20),
  PADDING VARCHAR2(250));

Next, we will insert 10,000,000 rows into the table such that an index built on the column C200000020 will have a very high clustering factor, and 3 rows will have a value that begins with BOSS (as a result of the DECODE statement):

INSERT INTO
  T1
SELECT
  DECODE(MOD(ROWNUM,3000000),0,'BOSS'||ROWNUM,
  CHR(90-MOD(ROWNUM-1,26))||
  CHR(75+MOD(ROWNUM,10))||
  CHR(80+MOD(ROWNUM,5))||
  'S'||ROWNUM) C200000020,
  RPAD('A',200,'A') PADDING
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

Now to create the index and collect statistics:

CREATE INDEX IND_T1 ON T1(C200000020);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

Let’s take a look at the execution plans:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%';

Plan hash value: 634656657

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |   213 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   213 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C200000020" LIKE 'BOSS%')
       filter("C200000020" LIKE 'BOSS%')

An index access, just like we had hoped.  The optimizer is predicting a single row to be retrieved.  Let’s try the other query:

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('BOSS'='' OR "C200000020" LIKE 'BOSS%')

A full table scan, just like the original poster in the OTN thread experienced.  Notice that the optimizer is now predicting that 100,000 rows (1% of the rows) will be retrieved.  Repeating, 1% of the rows and a full table scan.  Let’s generate a 10053 trace for the SQL statement:

ALTER SYSTEM FLUSH SHARED_POOL;

SET AUTOTRACE OFF

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'T1_10053';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

Inside the 10053 trace, my 11.2.0.1 test database produced the following:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000000  #Blks:  303031  AvgRowLen:  213.00
Index Stats::
  Index: IND_T1  Col#: 1
    LVLS: 2  #LB: 32323  #DK: 9939968  LB/K: 1.00  DB/K: 1.00  CLUF: 10120176.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000000.000000  Rounded: 100001  Computed: 100001.02  Non Adjusted: 100001.02
  Access Path: TableScan
    Cost:  82352.89  Resp: 82352.89  Degree: 0
      Cost_io: 82073.00  Cost_cpu: 7150017105
      Resp_io: 82073.00  Resp_cpu: 7150017105
  ****** trying bitmap/domain indexes ******
  ****** finished trying bitmap/domain indexes ******
  Best:: AccessPath: TableScan
         Cost: 82352.89  Degree: 1  Resp: 82352.89  Card: 100001.02  Bytes: 0

***************************************

OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
***********************
Best so far:  Table#: 0  cost: 82352.8851  card: 100001.0195  bytes: 21300213
***********************
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:2000

*********************************
Number of join permutations tried: 1
*********************************

The unknown result of the constant in the WHERE clause (‘BOSS’ = ”) caused Oracle to predict that the cardinality will be (1 row) + (1% of the rows) = 100,001.  With a clustering factor of 10,120,176 the optimizer is (possibly) convinced that it will need to perform single block physical reads of a large number of table blocks to read the 100,001 rows that it expects to retrieve, so it decided that a full table scan would complete faster.  But the situation is worse than that – it did not even consider an index access path.  As a demonstration, I will manually set the index’s clustering factor to a low value and check the execution plan again:

EXEC DBMS_STATS.SET_INDEX_STATS(OWNNAME=>USER,INDNAME=>'IND_T1',CLSTFCT=>10000)

SET AUTOTRACE TRACEONLY EXPLAIN

ALTER SYSTEM FLUSH SHARED_POOL;

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('BOSS'='' OR "C200000020" LIKE 'BOSS%')

Still a full table scan.  If we had generated a 10053 trace, we would see that the clustering factor for the index was indeed adjusted from what we saw earlier:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000000  #Blks:  303031  AvgRowLen:  213.00
Index Stats::
  Index: IND_T1  Col#: 1
    LVLS: 2  #LB: 32323  #DK: 9939968  LB/K: 1.00  DB/K: 1.00  CLUF: 10000.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000000.000000  Rounded: 100001  Computed: 100001.02  Non Adjusted: 100001.02
  Access Path: TableScan
    Cost:  82352.89  Resp: 82352.89  Degree: 0
      Cost_io: 82073.00  Cost_cpu: 7150017105
      Resp_io: 82073.00  Resp_cpu: 7150017105
  ****** trying bitmap/domain indexes ******
  ****** finished trying bitmap/domain indexes ******
  Best:: AccessPath: TableScan
         Cost: 82352.89  Degree: 1  Resp: 82352.89  Card: 100001.02  Bytes: 0

Let’s force the execution plan with an index hint to see what happens:

SELECT /*+ INDEX(T1 IND_T1) */
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('BOSS'='' OR "C200000020" LIKE 'BOSS%')

Note that the optimizer did not (or could not) obey the hint.  It decided to apply the ‘BOSS’=” predicate first, so maybe that is the problem.  Let’s try a hint to force the optimizer to apply the predicates in order, rather than based on calculated cost:

SELECT /*+ ORDERED_PREDICATES INDEX(T1 IND_T1) */
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C200000020" LIKE 'BOSS%' OR 'BOSS'='')

The predicate section of the execution plan changed, but the optimizer still will not consider an index access path for the SQL statement.  There is a chance that the OP could do something to force the index access path by hacking a stored outline for the query, but my guess is that the restriction on the C200000020 column changes from time to time, so an outline likely will not work.  The OP could try to file an Oracle bug report because the optimizer completely ignored the index access paths (as shown in the 10053 trace file), but a better course of action would be to have the application submitting the SQL statement fixed.

Let’s try a small variation on the original test SQL statement.  Let’s see what happens when we add a space between the two characters:

SET AUTOTRACE TRACEONLY EXPLAIN

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'T1_10053-3';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = ' ';

Execution Plan
----------------------------------------------------------
Plan hash value: 634656657
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |   213 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   213 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C200000020" LIKE 'BOSS%')
       filter("C200000020" LIKE 'BOSS%')

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

Notice by looking at the Predicate Information section of the plan that Oracle removed the nonsensical OR clause.  The 10053 trace file showed this:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000000  #Blks:  303031  AvgRowLen:  213.00
Index Stats::
  Index: IND_T1  Col#: 1
    LVLS: 2  #LB: 32323  #DK: 9939968  LB/K: 1.00  DB/K: 1.00  CLUF: 10000.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000000.000000  Rounded: 1  Computed: 1.03  Non Adjusted: 1.03
  Access Path: TableScan
    Cost:  82255.34  Resp: 82255.34  Degree: 0
      Cost_io: 82073.00  Cost_cpu: 4658017105
      Resp_io: 82073.00  Resp_cpu: 4658017105
kkofmx: index filter:"T1"."C200000020" LIKE 'BOSS%'
  Access Path: index (RangeScan)
    Index: IND_T1
    resc_io: 4.00  resc_cpu: 29226
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000
    Cost: 4.00  Resp: 4.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IND_T1
         Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.03  Bytes: 0

So, it appears that if the optimizer is presented with a zero length VARCHAR being compared with another VARCHAR in the WHERE clause, there could be unexpected cases were index access paths will not be used even when hinted.

Toon Koppelaars mentioned in the OTN thread that the WHERE clause should be using bind variables, and suggested the following for the WHERE clause:

( (T131.C200000020 LIKE (:B0 || '%')) OR (:B0 IS NULL))

I agree with Toon regarding the use of bind variables.  Unfortunately, it does not look like bind variables improve the situation, at least in my test case.

I cannot use AUTOTRACE due to the risk that it will display an incorrect execution plan due to the bind variables, so I will use DBMS_XPLAN.DISPLAY_CURSOR along with a GATHER_PLAN_STATISTICS hint in the SQL statement.  First the statistics collection (to correct any manual adjustment to the index’s clustering factor that was performed earlier) and bind variable setup:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

VARIABLE B0 VARCHAR2(50)
VARIABLE B1 VARCHAR2(50)
VARIABLE B2 VARCHAR2(50)

EXEC :B0 := 'BOSS'
EXEC :B1 := 'BOSS'
EXEC :B2 := ''

SET AUTOTRACE OFF

Since we do not know the intention of the developer, I will try a couple of combinations to see what happens:

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B1 = :B2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  b47qzqbb6wymu, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C200000020
LIKE :B0 || '%'   OR :B1 = :B2

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    100K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B1=:B2 OR "C200000020" LIKE :B0||'%'))

The above resulted in the same full table scan that we saw earlier.

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B0 = :B2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  1vdyc7t6wazhz, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C200000020
LIKE :B0 || '%'   OR :B0 = :B2

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    100K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B0=:B2 OR "C200000020" LIKE :B0||'%'))

Specifying the B0 bind variable twice did not help.

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B0 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  8n1bg0z9j0001, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C200000020
LIKE :B0 || '%'   OR :B0 IS NULL

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    500K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B0 IS NULL OR "C200000020" LIKE :B0||'%'))

Specifying that the B0 bind variable IS NULL did not help either, but notice the change in the predicted cardinality (the 100,000 rows increased to 500,000 rows).

Let’s try an index hint:

SELECT /*+ GATHER_PLAN_STATISTICS INDEX(T1 IND_T1) */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B0 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  a241dy7mvudtk, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS INDEX(T1 IND_T1) */   * FROM   T1
WHERE   C200000020 LIKE :B0 || '%'   OR :B0 IS NULL

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    500K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B0 IS NULL OR "C200000020" LIKE :B0||'%'))

The optimizer still did not use the index.

—————–

Cases where Oracle’s optimizer ignores index hints are typically indications of bugs in the optimizer – as we saw, the optimizer did not even consider (generate a calculated cost for) an index access path when no space appeared between the two ‘ characters in the original SQL statement.  Other cases of Oracle’s optimizer ignoring hints may be found here: Demonstration of Oracle “Ignoring” an Index Hint.








Follow

Get every new post delivered to your Inbox.

Join 147 other followers