Non-Specific Index Hints

24 01 2012

January 24, 2012 (Modified January 25, 2012)

As I read the “Troubleshooting Oracle Performance” book for the second time a couple of months ago, I made note on page 385 that it was possible to specify table columns in an index hint, rather than specifying specific index names (or just specifying the table name).  This might be useful, for example, if you have tables in your database with primary key columns with system assigned names for the supporting indexes, and an index hint is needed to correct specific performance issues.  I was again reminded that it was possible to create non-specific index hints that specify table columns when a recent quiz was posted that asked to find specific cases where the behavior is other than expected with the newer index hint syntax.

As an example of the newer syntax, I put together a brief demonstration.  First, the test table and index creation script:

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(100),
  PRIMARY KEY(C1));

INSERT INTO
  T2
SELECT
  ROWNUM C1,
  MOD(ROWNUM-1,20)+1 C2,
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T2_C2 ON T2(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000 

Let’s try a simple query that specifies columns C1 and C2 in the WHERE clause:

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 906133967

------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |             |   500 | 54500 |   121   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | T2          |   500 | 54500 |   121   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |             |       |       |            |          |
|   3 |    BITMAP AND                    |             |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS|             |       |       |            |          |
|*  5 |      INDEX RANGE SCAN            | IND_T2_C2   |       |       |    10   (0)| 00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|             |       |       |            |          |
|   7 |      SORT ORDER BY               |             |       |       |            |          |
|*  8 |       INDEX RANGE SCAN           | SYS_C008661 |       |       |    20   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("C2"=1)
   8 - access("C1">=1 AND "C1"<=10000) 

The above shows that if this SQL statement were actually executed, the index IND_T2_C2, and the index SYS_C008661 (that is used to help enforce the primary key) would be used when executing the SQL statement.  You will only see the above execution plan in the Enterprise Edition of Oracle Database.  Let’s try again with a hint that prohibits the BITMAP CONVERSION TO FROM ROWIDS operation:

SELECT /*+ OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 | 54500 |   131   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |   500 | 54500 |   131   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1 AND "C1"<=10000 AND "C1">=1) 

The above execution plan shows that without the ability to perform the  the BITMAP CONVERSION TO FROM ROWIDS operation to allow the BITMAP AND operation, a full table scan was selected, so this is a case where the Standard Edition of Oracle Database and the Enterprise Edition might exhibit different execution performance.

Let’s use the previous SQL statement with hint as a starting point, and use a non-specific index hint to instruct the optimizer to use an index on the T2 table:

SELECT /*+ INDEX(T2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 3350885058

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   500 | 54500 |   179   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   500 | 54500 |   179   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C008661 | 10000 |       |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1)
   2 - access("C1">=1 AND "C1"<=10000) 

As shown by the above, the optimizer selected to use the SYS_C008661 primary key index, which would require it to retrieve 10,000 ROWID values from the index, rather than using the index on the T2 column which would have retrieved 5,000 (100,000 * 1/20) ROWIDs from the IND_T2_C2 index (the clustering factor of the IND_T2_C2 index was likely the deciding factor).  Let’s specifically request (demand) that the optimizer use the IND_T2_C2 index by adjusting the hint:

SELECT /*+ INDEX(T2 IND_T2_C2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 174424276

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500 | 54500 |  1596   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2        |   500 | 54500 |  1596   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C2 |  5000 |       |    10   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=10000 AND "C1">=1)
   2 - access("C2"=1) 

Notice in the above that the IND_T2_C2 index was selected, is expected to return 5,000 ROWID values from the index, and the execution plan now has a calculated cost of 1,596.  The calculated cost is a simple explanation why the optimizer did not select to use this index automatically.

Now, consider a situation where the optimizer insists on using the IND_T2_C2 index, rather than the SYS_C008661 primary key index for this SQL statement, which could happen if the CLUSTERING_FACTOR of the indexes are not set correctly:

SET AUTOTRACE OFF

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';

INDEX_NAME   CLUSTERING_FACTOR
------------ -----------------
SYS_C008661               1585
IND_T2_C2                31700

EXEC DBMS_STATS.SET_INDEX_STATS (OWNNAME=>USER,INDNAME=>'IND_T2_C2',CLSTFCT=>1585,NO_INVALIDATE=>FALSE)

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';

INDEX_NAME   CLUSTERING_FACTOR
------------ -----------------
SYS_C008661               1585
IND_T2_C2                 1585 

We started with the primary key index having a clustering factor of 1,585 and the IND_T2_C2 index having a clustering factor of 31,700.  After using DBMS_STATS.SET_INDEX_STATS, the optimizer is convinced that the IND_T2_C2 index also has a clusting factor of 1,585.  Now both of the indexes have the same CLUSTERING_FACTOR statistic value, what happens if we execute the SQL statement again that specifies an index should be used to access table T1, but does not specify the exact index name?:

SELECT /*+ INDEX(T2) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 174424276

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   500 | 54500 |    90   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2        |   500 | 54500 |    90   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C2 |  5000 |       |    10   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<=10000 AND "C1">=1)
   2 - access("C2"=1) 

So, now that the CLUSTERING_FACTOR values are the same for both indexes, the optimizer is selecting the IND_T2_C2 index for the SQL statement.  We have successfully painted ourselves into a corner, telling the optimizer that the IND_T2_C2 index really is helpful for this SQL statement.  We are fairly certain that the IND_T2_C2 index is not ideal, and that the the SYS_C008661 primary key index is a better choice if an index access path is determined to be better than a full table scan, based on the order in which the data was inserted into the table’s blocks.  The problem now is that the SYS_C008661 index name is not consistent from one database to the next, or even in the same database if you drop table T2 and repeat the test.  How do we tell the optimizer to pick the index that is on the primary key column, column C1?  Oracle Database 10.1 introduced new syntax for index hints, which allows us to specify the column name(s), rather than the index name in the hint:

SELECT /*+ INDEX(T2 (C1)) OPT_PARAM('_B_TREE_BITMAP_PLANS','FALSE') */
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C1 BETWEEN 1 AND 10000
  AND C2=1;

Plan hash value: 3350885058

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   500 | 54500 |   179   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   500 | 54500 |   179   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C008661 | 10000 |       |    20   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=1)
   2 - access("C1">=1 AND "C1"<=10000) 

As can be seen by the above, the index on column C1, which happens to be the primary key index, was selected by the optimizer at the request of the index hint.

Let’s take a look at the documentation for Oracle Database 10.2:

“Hints that specify an index can use either a simple index name or a parenthesized list of columns as follows:

index (table (column))
  • table specifies the name
  • columnspecifies the name of a column in the specified table
    • The columns can optionally be prefixed with table qualifiers allowing the hint to specify bitmap join indexes where the index columns are on a different table than the indexed table. If tables qualifiers are present, they must be base tables, not aliases in the query.
    • Each column in an index specification must be a base column in the specified table, not an expression. Function-based indexes cannot be hinted using a column specification unless the columns specified in the index specification form the prefix of a function-based index.
  • index specifies an index name

The hint is resolved as follows:

  • If an index name is specified, only that index is considered.
  • If a column list is specified and an index exists whose columns match the specified columns in number and order, only that index is considered. If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered. In either case, the behavior is exactly as if the user had specified the same hint individually on all the matching indexes.

For example, in Example 16-3 the job_history table has a single-column index on the employee_id column and a concatenated index on employee_id and start_date columns. To specifically instruct the optimizer on index use, the query can be hinted as follows:

SELECT /*+ INDEX(v.j jhist_employee_ix (employee_id start_date)) */ * FROM v;

That is almost crystal clear (I probably poorly translated the diagram), even though it seems that the cost calculation might not have an impact (“If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered“).  If you have the sample database schema loaded into a database, jump to the view definition in Example 16-3, create the view, and then see if the index hint in the documentation (at the bottom of the above quote block) actually works – it did not work for me.  The same hint example is found in the Oracle Database 11.2 documentation.

However, in the example above, where I used the hint syntax INDEX (TABLE (COLUMN)), the hint worked as expected.  Interestingly, the hint reference for Oracle Database 11.2 does not specifically mention this new syntax, and that might be why I was slow to recognize the new hint syntax.

Now that the basics are covered, let’s see if we are able to confuse the optimizer with index hints, taking up the challenge proposed by the blog article mentioned at the start of this article.  Below is a slightly modified version of the test script that I posted as a comment in the other blog article (with an additional column in the table, an additional index, and an enabled 10053 trace).  First, we will create the table, a couple of indexes on the table, change a couple of SQL*Plus parameters, and then enable a 10053 trace:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(30) NOT NULL,
  C4 VARCHAR2(200),
  C5 VARCHAR2(10));

INSERT INTO T1
SELECT
  MOD(ROWNUM-1, 90) * 4 C1,
  ROWNUM - 1 C2,
  TO_CHAR(ROWNUM - 1, 'RN') C3,
  LPAD('A',200,'A') C4,
  LPAD('B',10,'B') C5
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE INDEX IND_T1_C1_C2 ON T1(C1,C2);
CREATE INDEX IND_T1_C1_C2_C5 ON T1(C1,C2,C5);
CREATE INDEX IND_T1_C2_C1_C3 ON T1(C2,C1,C3);
CREATE INDEX IND_T1_C3_C1_C2 ON T1(C3,C1,C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

Let’s begin the process of trying to confuse the Oracle Database 11.2.0.2 optimizer.  Unhinted, the following query accesses the index on columns C2, C1, and C3 to avoid accessing the table, and uses an INDEX FAST FULL SCAN operation that employs multi-block reads :

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'UNHINTED';

SELECT
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 2374279026

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1000K|    23M|   387   (2)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_T1_C2_C1_C3 |  1000K|    23M|   387   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

Let’s hint the optimizer to use the index on the columns C1 and C2:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 3388050039
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1000K|    23M|  1012K  (1)| 00:06:46 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    23M|  1012K  (1)| 00:06:46 |
|   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  3026  (10)| 00:00:02 |
--------------------------------------------------------------------------------------------

In the above, the optimizer obeyed the hint, even though the calculated cost from the unhinted plan increased from 387 to 1,012,000, and the optimizer selected to perform an INDEX FULL SCAN operation.

Let’s reverse the order of the columns in the index hint:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C2_C1';

SELECT /*+ INDEX(T1 (C2 C1)) */
  C1,
  C2,
  C3
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1746297295

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    23M|  4851   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C2_C1_C3 |  1000K|    23M|  4851   (1)| 00:00:01 |
------------------------------------------------------------------------------------

In the above, note that the index on columns C2, C1, and C3 was used, but the cost is now calculated at 4,851 rather than 387 as it was in the unhinted plan. The INDEX FAST FULL SCAN operation is now shown as an INDEX FULL SCAN operation.  So, we have now convinced the optimizer to use an access path that employs single block reads of an index rather than multi-block reads of an index, simply by telling the optimizer to use the index that it would have used without the hint.  Have we confused the optimizer already?  Someone should probably take a look at the 10053 trace files.  :-)

We have an index on columns C2, C1, and C3, but we also have an index on columns C3, C1, and C2. What happens when we specify the columns C3, C1, and C2 in the index hint in that order?

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C3_C1_C2';

SELECT /*+ INDEX(T1 (C3 C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1;

Plan hash value: 2273443829

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    23M|  4943   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |  1000K|    23M|  4943   (1)| 00:00:01 |
------------------------------------------------------------------------------------

An index full scan was selected to be performed on the IND_T1_C3_C1_C2 index with a calculated cost of 4,943, rather than using the IND_T1_C2_C1_C3 index that previously resulted in a cost of 4,851 – so the optimizer will not alter the order of the columns in the index hint to reduce the calculated cost.

Let’s try another example where we select all of the columns that are present in the IND_T1_C1_C2_C5 index – will the optimizer use that index, or will it obey the expected behavior of the index hint?:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_WC5';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3388050039

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1000K|    19M|  1003K  (1)| 00:00:23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    19M|  1003K  (1)| 00:00:23 |
|   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  2750   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

The index that exactly included the columns specified in the index hint was selected, with a calculated cost of 1,003,000.

So, what happens if we are less specific in the index hint, and just list the first column, where there are two composite indexes that start with the specified column:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_WC2_C5';

SELECT /*+ INDEX(T1 (C1)) */
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2942389535

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |  1000K|    19M|  4293   (1)| 00:00:01 |
|   1 |  INDEX FULL SCAN | IND_T1_C1_C2_C5 |  1000K|    19M|  4293   (1)| 00:00:01 |
------------------------------------------------------------------------------------

As shown in the above execution plan, the optimizer arrived at a cost of 4,293 by selecting to use the IND_T1_C1_C2_C5 index, which avoided the access to the T1 table.  So, sometimes the performance will be better with slightly less specific hints.

Out of curiosity, what do you think will happen if we completely eliminate the index hint?  Let’s test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'UNHINTED_C1_C2_C5';

SELECT
  C1,
  C2,
  C5
FROM
  T1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2722951733

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |  1000K|    19M|   343   (3)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IND_T1_C1_C2_C5 |  1000K|    19M|   343   (3)| 00:00:01 |
----------------------------------------------------------------------------------------

In the above execution plan, the IND_T1_C1_C2_C5 index is still selected for use as it was in the previous execution plan, but notice that the access path has changed to an INDEX FAST FULL SCAN operation and the calculated cost dropped from 4,293 to 343.  So, the above output implies that sometimes the performance will be better if we simply do not hint an index access path, if the index access path would have been selected otherwise.  Someone want to take a look at the 10053 trace files and explain why?

If we add a WHERE clause that places a restriction on column C2 to be less than 10, the optimizer could use a couple of different access paths. Let’s specify the columns C1 and C2 in the index hint to see which index is selected:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C2_WHERE';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
   C2<10;

Plan hash value: 1883798457

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    10 |   250 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   103   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"<10)
       filter("C2"<10)

In the above, an INDEX SKIP SCAN operation was selected because the index with columns C1 and C2 was specified in the hint.

Let’s try another example that possibly might be considered a case where the optimizer disobeys the hint or is free to change the order of the columns specified in the index hint (this might be incorrectly considered an edge case):

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C3_C2_WHERE';

SELECT /*+ INDEX(T1 (C1 C2 C3)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 4150417361

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IND_T1_C2_C1_C3 |    10 |   250 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)

A quick peek at the above output might suggest that the optimizer could potentially decide to locate an index with columns C1, C2, and C3 in any order – but I do not believe that this is the case. I believe that the optimizer considered the index hint specified in the SQL statement as being invalid (a check of the 10053 trace might confirm).

Let’s create another index and then repeat the above SQL statement:

CREATE INDEX IND_T1_C1_C2_C3 ON T1(C1,C2,C3);

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C3_C2_WHERE2';

SELECT /*+ INDEX(T1 (C1 C2 C3)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 212907557

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |    92   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    92   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The index with the columns that matched the order of the columns in the index hint was selected, even though the calculated cost would have been lower if the optimizer were permitted to select any index with the columns listed in the index hint.

What about a case where there is an exact match between an index definition and an index hint, and there is also another index with one additional column which would avoid the table access:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_C2_WHERE2';

SELECT /*+ INDEX(T1 (C1 C2)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 1883798457

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    10 |   250 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   103   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    92   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"<10)
       filter("C2"<10)

The index that exactly matched the index hint was selected by the optimizer.

What if we only specify in the index hint a leading column, when there are two indexes with that leading column, one of which allows the optimizer to avoid the table access:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C1_C2_WHERE';

SELECT /*+ INDEX(T1 (C1)) */
  C1,
  C2,
  C3
FROM
  T1
WHERE
  C2<10;

Plan hash value: 212907557

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |   250 |    92   (0)| 00:00:01 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    92   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The optimizer selected the lowest cost access path from the two indexes that matched the hint.

What if we specify a column in the index hint that is not listed in the SELECT or WHERE clauses?

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HINT_C3_C2_WHERE';

SELECT /*+ INDEX(T1 (C3)) */
  C1,
  C2
FROM
  T1
WHERE
  C2<10;

Plan hash value: 1328421701

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |    90 |  4039   (1)| 00:00:02 |
|*  1 |  INDEX SKIP SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4039   (1)| 00:00:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

The above shows the original execution plan (captured a couple of days ago without column C5 in the table definition, without index IND_T1_C1_C2_C5, and without an enabled 10053 trace), which shows that the optimizer still obeyed the intention of the hint – it found an index that started with the specified column and selected to perform an INDEX SKIP SCAN operation even though column C2, specified in the WHERE clause, is the third column in the index definition.

As luck would have it, with the slightly altered table definition (and possibly different statistics estimates) the execution plan has changed to use an INDEX FULL SCAN operation rather than an INDEX SKIP SCAN operation, and now the execution plan has a higher calculated cost.  So, why did the execution plan change from an INDEX SKIP SCAN?  Here is the current execution plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 2273443829

------------------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |    10 |    90 |  4945   (1)| 00:00:01 |
|*  1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4945   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C2"<10)
       filter("C2"<10)

So, for the challenge, try to explain the oddities that I pointed out above.  I have not yet reviewed the 10053 trace files, but I will later.








Follow

Get every new post delivered to your Inbox.

Join 147 other followers