Repeat After Me: NULL Values are Not Stored in Indexes?

28 02 2012

February 28, 2012

I do not always get as much benefit from the books that I read as the books’ authors probably intended, although the contents of books, whether right or wrong, sometimes help me remember nearly forgotten facts.  Some of the books, for instance “Pro Oracle SQL” describe how to use b*tree indexes to locate rows with NULL values in a column.  The book “Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition” also shows several techniques for utilizing b*tree indexes to locate rows with NULL values in a column.  Richard Foote’s blog also has at least one article that describes how to use b*tree indexes to locate NULL values in a column.

So, what headed me down the path of NULL Values are Not Stored in Indexes?  I read the following sentence in the book “Oracle Database 11gR2 Performance Tuning Cookbook” on page 177:

“NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.”

I can’t help but feel that I have seen very similar statements in the past.  The Oracle Database documentation, at least from 8.1 through 11.1 included the following quote:

“The query must be guaranteed not to need any NULL values from the indexed expression, because NULL values are not stored in indexes.”

But where else have I seen that message? A Google search for: oracle NULL values are not stored in indexes found a couple of places where the message is repeated.

Oracle Database 11g: The Complete Reference” (I read the 9i version of this book years ago):

NULL values are not stored in indexes. Therefore, the following query will not use an index; there is no way the index could help to resolve the query:”

select Title
  from BOOKSHELF
 where CategoryName is null;

Oracle 9i Performance Tuning Tips & Techniques” page 39 (I quickly paged through the 10g book in a book store once, and read the sample chapter on Statspack reports):

“Using IS NULL or IS NOT NULL will also suppress index use because the value of NULL is undefined.”

Oracle Data Warehouse Tuning for 10g” page 51 (I have not had the opportunity to read this book):

“For a BTree index, NULL values are not included in the BTree structure and, thus, not even accessible through the index.”

Expert Indexing in Oracle Database 11g” page 159 (any problems here – I think that I have a test case somewhere that suggests that the second column should be a number, I thought about buying this book):

“If all index columns are NULL, Oracle Database doesn’t include rows into an index. However, you can actually index NULL values by simply adding another column to the index, like so:

SQL> create index with_null on employees(nullable_column, '1');

So, what is my review comment for the Cookbook?

The book states, “NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.”  The book’s description is incomplete.  NULL values are not stored in single column b*tree indexes.  There are at least four methods to work around this issue and allow indexes to be used to identify rows with a NULL value in the indexed column:

  1. Define a composite index with at least one other column that has a NOT NULL constraint – ideally, the column in which the NULL values might appear would be the leading column in the composite index.
  2. Define a composite index with a numeric constant (such as 1) as the second column in the composite index.
  3. Bitmap indexes always store NULL values – if appropriate (column experiences few updates, deletes, inserts, and an Enterprise Edition database), create a bitmap index for the column.
  4. If the number of NULL values in a column will be relatively small (compared to the number of rows in the table), and the original SQL statement may be modified, create a function based index that converts NULL values to 1 and non-NULL values to NULL:
    • DECODE(C3,NULL,1)
    • (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)
    • (NVL2(C3,NULL,1))

Let’s build a little test case to demonstrate.  First, a table is created with 1,000,000 rows, two indexes are created, and then statistics are gathered with histograms generated for all indexed columns:

CREATE TABLE T2 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(15) NOT NULL,
  C3 DATE,
  C4 VARCHAR2(255));

INSERT INTO T2
SELECT
  ROWNUM C1,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),15,'0') C2,
  DECODE(MOD(ROWNUM,100),0,NULL,SYSDATE+ROWNUM/10000) C3,
  RPAD('A',255,'A') C4
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

CREATE INDEX IND_T2_C3_C2 ON T2(C3,C2);
CREATE INDEX IND_T2_C3_C ON T2(C3,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 254') 

In the above table, 1% of the rows will have a NULL value in column C3.

Let’s try the first test to see if one of the above two indexes may be used to locate the NULL values in column C3:

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 895813321

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |   389 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |   389   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_C2 |   385 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3" IS NULL AND "C2"='D00000000000000')
       filter("C2"='D00000000000000') 

Well, it appears that the composite index on columns C3 and C2 might have helped quickly locate the rows with NULL values in column C3 (we just tested point #1 above).  Let’s drop that index and try again:

DROP INDEX IND_T2_C3_C2;

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 1053304445

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |   550 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   385 | 14245 |   550   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_C | 10000 |       |    33   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("C3" IS NULL) 

This time we cannot simply state that “NULL Values are Not Stored in Indexes”.  The Predicate Information section of the execution plan shows access(“C3” IS NULL) (we just tested point #2 above).

Let’s drop the index and try something else:

DROP INDEX IND_T2_C3_C;

CREATE BITMAP INDEX IND_T2_C3_BIN ON T2(C3);

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 1153509852

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |       |       |  1954 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID | T2            |   385 | 14245 |  1954   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|               |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | IND_T2_C3_BIN |       |       |            |          |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   3 - access("C3" IS NULL) 

Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.  The Predicate Information section of the execution plan shows access(“C3” IS NULL) (we just tested point #3 above).

Let’s drop the index and try just indexing the NULLs (sub-point 1 of point #4 above):

DROP INDEX IND_T2_C3_BIN;

CREATE INDEX IND_T2_C3_FN ON T2 DECODE(C3,NULL,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  8psj7gcwcn72m, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND DECODE(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3752 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |  3752   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3234   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1) 

The index that we created was used.  Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 2 of point #4 above):

DROP INDEX IND_T2_C3_FN;

CREATE INDEX IND_T2_C3_FN ON T2 (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  2w9h5jm4tdzpz, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND (CASE WHEN C3 IS NULL THEN 1 ELSE
NULL END)=1

Plan hash value: 941108248

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       | 10024 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 15015 | 10024   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_FN | 10000 |       |    20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("T2"."SYS_NC00005$"=1) 

The index that we created was used, although notice that the Predicate Information section of the plan is a bit different from before, and this time we have an INDEX RANGE SCAN operation rather than an INDEX FULL SCAN operation.  Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 3 of point #4 above):

DROP INDEX IND_T2_C3_FN;

CREATE INDEX IND_T2_C3_FN ON T2 NVL2(C3,NULL,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  fmsg8vztz32d2, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM   T2 WHERE
C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3282 (100)|          |
|*  1 |  TABLE ACCESS FULL| T2   |   385 | 14245 |  3282   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"='D00000000000000' AND NVL2("C3",NULL,1)=1)) 

The index that we created was… not used?  But is it not the case that the NVL2(C3,NULL,1) function result is the equivalent of the DECODE and the CASE function results?  Let’s give this one another try with a hint:

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  f510adk7fqwyu, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3740 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |  3740   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3223   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(NVL2("C3",NULL,1)=1) 

That’s better, even if we did receive an INDEX FULL SCAN operation rather than in INDEX RANGE SCAN operation as was the intended result.

Something fun to think about – why did the technique using the CASE syntax that resulted in an INDEX RANGE SCAN operation have a plan with a calculated cost of 10,024, when the plans with the INDEX FULL SCAN operations have a calculated cost of about 3,750?


Actions

Information

18 responses

28 02 2012
Oracle

So what is your statement about null values and indexes?
Does it have maybe influence on db version?

28 02 2012
Charles Hooper

I think that Yasser may have already answered your question, but then I may be misunderstanding your question.

In short, indexes may be used to locate rows that contain NULL values in a table column. NULLs may be stored in indexes. Whether or not a specific index should be created or adapted specifically to locate NULL values will depend on the importance of query performance, compared to the use of additional disk space for the index and index maintenance delays during inserts, updates, and deletes of the column data as additional redo and undo is created and the index structure grows.

Method 1 uses a standard composite b*tree index. Method 2 creates a function based composite index with the second column containing a numeric constant (I think that the Enterprise Edition of Oracle Database 8i was the first version that supported function based indexes, and that feature was extended to the Standard Edition in 9i). Method 3 creates a bitmap index – that requires the Enterprise Edition. Method 4 creates a function based index that indexes only cases where a column value is null.

28 02 2012
Yasser

I think up-front statement “NULL values are not stored in indexes” in renowned Oracle book, with author having idea behind the scenes to avoid complexity can sometimes cause reader to feel uncomfortable, but i think we should be thankful to author for publishing the scenarios and making our brains to think in different views and avoid “Compulsive Tuning Disorders”

To me this statement “NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.” seems to be WRONG and incompletely framed ‘sentence’.

-Yasser

29 02 2012
Narendra

Charles,

Once again, nice demonstration and good question.
why did the technique using the CASE syntax that resulted in an INDEX RANGE SCAN operation have a plan with a calculated cost of 10,024, when the plans with the INDEX FULL SCAN operations have a calculated cost of about 3,750?
I must admit I don’t know the answer but my best guess is the costing has something to do with the way index is being used (access vs. filter). In case of INDEX RANGE SCAN, while the index step has ACCESS predicate (and a low cost of 20) whereas in case of INDEX FULL SCAN, the index step has FILTER predicate (and significantly higher cost of 3234). Till this date, am not quite clear about the difference in ACCESS and FILTER predicates.

29 02 2012
Charles Hooper

Narendra,

I am not sure that I have a nice, short description of access and filter predicates. You might think of it this way, if you think about the index in the front of a book:
Access Predicate: You know that the topic of interest is found in chapter 8, under a heading of “Examining Execution Plans and Plan Statistics” which is in a section of the book from pages 219 through 226 (or 227). You then flip to pages 219 through 226 and start reading.
Filter Predicate: As you are reading through the section, you are discarding sentences that do not contain the exact information that is of interest, and remembering the information that is of interest. (If the index were more detailed, but the specific item of interest was not detailed – you did not know to look specifically under the heading “Examining Execution Plans and Plan Statistics”, then part of this filtering might have taken place while examining the index).

This is the description that is provided on page 222 of the book “Expert Oracle Practices”:

The Predicate Information section of the execution plan shows the conditions applied to indexes and joins during access (the access entries), as well as the restrictions applied to the table data to filter out unwanted data as specified by the conditions in the WHERE clause (the filter entries) and automatically generated predicates added by the optimizer. Filter predicates may appear for plan lines showing index accesses when the leading column(s) of a composite index are not specified in the WHERE clause (resulting in an index skip scan operation in the plan); when intermediate columns of an index are omitted from the WHERE clause, for instance, specifying access criteria for columns 1, 2, and 4 of a composite index, but not column 3, which prevents column 4 from being used as restriction criteria during the access operation; and when a range comparison (less than, greater than, between, and so forth) is used on a composite index’s column, those restrictions placed on columns in the index definition after the column with the range operation may not be examined during the access operation. The filter predicate entries are also able to reveal potentially time-consuming implicit datatype conversions in joining and filtering predicates.

From the Oracle Database documentation:
http://docs.oracle.com/cd/E11882_01/server.112/e16638/ex_plan.htm

ACCESS_PREDICATES: Predicates used to locate rows in an access structure. For example, start or stop predicates for an index range scan.
FILTER_PREDICATES: Predicates used to filter rows before producing them.

Description by Karen Morton:
https://method-r.fogbugz.com/default.asp?method-r.11.552.2

Description from the book “Cost Based Oracle Fundamentals”:
http://books.google.com/books?id=TGSd3pkMx5IC&pg=PA383#v=onepage&q&f=false

Description from the book “Pro Oracle SQL”:
http://books.google.com/books?id=nrwu-f4JwjcC&pg=PA164#v=onepage&q&f=false

Description from the book “Troubleshooting Oracle Performance”:
http://books.google.com/books?id=b3DIkYO2gBQC&pg=PA211#v=onepage&q&f=false

29 02 2012
Narendra

Charles,

With all due respect, I don’t think your analogy explains my issue and neither does the description in the links that you have provided. Your analogy is in-line with what happens in the case of INDEX RANGE SCAN example above. Essentially, I understand that, typically, an ACCESS PREDICATE is applied to the index access step in the plan (as you have described with book index analogy) and the FILTER PREDICATE is applied to the table access step in the plan (as you described in “As you are reading through the section…”).
However, I had pointed out the difference in the way predicates are reported for index access only. In your examples, with DECODE index the plan reports the predicate for index access step as filter predicate (with much higher cost) as compared to the access predicate (with much lower cost) reported when a CASE based index is used.
So far, the only resource that comes close to this is http://jonathanlewis.wordpress.com/2010/08/31/filter-bug/
Would love to know if there are any better or more detailed explanations available.
BTW, apologies for diverting the discussions away from your main question (which, as I said, I don’t know the answer for)

29 02 2012
Charles Hooper

I might not be answering your question yet. However, did you notice that the CASE example that showed an INDEX RANGE SCAN had the following in the Predicate Information section access(“T2”.”SYS_NC00005$”=1) – SYS_NC00005$ is a hidden/virtual column that was automatically generated when the function based index was built:

SQL> SELECT COLUMN_NAME FROM USER_TAB_COLUMNS WHERE TABLE_NAME='T2';

COLUMN_NAME
------------
C1
C2
C3
C4

SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';

COLUMN_NAME  HID VIR
------------ --- ---
C1           NO  NO
C2           NO  NO
C3           NO  NO
C4           NO  NO
SYS_NC00005$ YES YES

So, why did we not see the same virtual columns with the NVL2(C3,NULL,1) function. I could say that this test result was intentional, or I could say that I accidentally omitted the () around the NVL function when I created the index. Take a look at the output of the following script that first incorrectly formats the index specification, and then correctly formats the index specification:

DROP INDEX IND_T2_C3_FN;
 
CREATE INDEX IND_T2_C3_FN ON T2 NVL2(C3,NULL,1);
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
 
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;
  
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
  
DROP INDEX IND_T2_C3_FN;
  
CREATE INDEX IND_T2_C3_FN ON T2 (NVL2(C3,NULL,1));
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
 
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
SQL_ID  f510adk7fqwyu, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND NVL2(C3,NULL,1)=1
 
Plan hash value: 2303772603
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3740 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   100 |  3000 |  3740   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3223   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(NVL2("C3",NULL,1)=1)
 
---
 
SQL_ID  f510adk7fqwyu, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 941108248

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       | 10024 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   100 |  3200 | 10024   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_FN | 10000 |       |    20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("T2"."SYS_NC00005$"=1)

Notice now that the hidden/virtual column appears in the second execution plan.

SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';
 
COLUMN_NAME  HID VIR
------------ --- ---
C1           NO  NO
C2           NO  NO
C3           NO  NO
C4           NO  NO
SYS_NC00005$ YES YES

The hidden/virtual column was created.

Now, let’s fix the DECODE test case:

DROP INDEX IND_T2_C3_FN;
 
CREATE INDEX IND_T2_C3_FN ON T2 DECODE(C3,NULL,1);
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
 
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
  
DROP INDEX IND_T2_C3_FN;
 
CREATE INDEX IND_T2_C3_FN ON T2 (DECODE(C3,NULL,1));
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
 
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The output:

SQL_ID  8psj7gcwcn72m, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND DECODE(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3752 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   100 |  3000 |  3752   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3234   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1)
 
SQL_ID  8psj7gcwcn72m, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND DECODE(C3,NULL,1)=1

Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3281 (100)|          |
|*  1 |  TABLE ACCESS FULL| T2   |   100 |  3000 |  3281   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("C2"='D00000000000000' AND
              DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1))

No index access in the second execution plan, even though an index access path was hinted. So, what happened in the second execution plan? Let’s take a look at the hidden/virtual columns:

SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';

COLUMN_NAME  HID VIR
------------ --- ---
C1           NO  NO
C2           NO  NO
C3           NO  NO
C4           NO  NO
SYS_NC00005$ YES YES

The hidden column exists. Maybe the index was never created? Let’s check:

SELECT
  INDEX_NAME
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T2';
 
INDEX_NAME
------------
IND_T2_C3_FN

Yes, the index is there. Maybe the Filter Predicate in the execution plan provides a clue DECODE(INTERNAL_FUNCTION(“C3”),NULL,1)=1.

The new challenge then, why was a full table scan selected?

29 02 2012
Narendra

I guess the answer lies in USER_IND_EXPRESSIONS (or ALL). I remember quite some time back reading about why a function-based index was not being used on AskTom site and Tom pointed out that the function-based index can be used provided the predicate matches the expression in those views for the index. Not sure what version was that and can not find the link to it. In that thread, Tom had explained how the ALL_IND_EXPRESSIONS store the expression in different way that the one specified while creating the index (in some particular case). Not sure it is still valid.

29 02 2012
Narendra

Here is my test on a 10.2.0.5 db (Can’t remember how to format as code). But you can see the difference when, as you said, brackets are used or not.

SQL> CREATE INDEX IND_T2_C3_FN ON T2 NVL2(C3,NULL,1);

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
;

PL/SQL procedure successfully completed.

SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';

COLUMN_NAME		       HID VIR
------------------------------ --- ---
C1			       NO  NO
C2			       NO  NO
C3			       NO  NO
C4			       NO  NO
SYS_NC00005$		       YES YES
SYS_NC00006$		       YES YES

6 rows selected.

SQL> select index_name, column_expression from user_ind_expressions where table_name = 'T2' ;

INDEX_NAME		       COLUMN_EXPRESSION
------------------------------ --------------------------------------------------------------------------------
IND_T2_C3_FN		       NULL
IND_T2_C3_FN		       1

SQL> set autotrace traceonly explain
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
 10    AND NVL2(C3,NULL,1)=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation		    | Name	   | Rows  | Bytes | Cost (%CPU)| Time	   |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		   |   100 |  3000 |  3734   (1)| 00:00:45 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2	   |   100 |  3000 |  3734   (1)| 00:00:45 |
|*  2 |   INDEX FULL SCAN	    | IND_T2_C3_FN | 10000 |	   |  3216   (1)| 00:00:39 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C2"='D00000000000000')
   2 - filter(NVL2("C3",NULL,1)=1)

SQL> set autotrace off
SQL> DROP INDEX IND_T2_C3_FN;

Index dropped.

SQL> CREATE INDEX IND_T2_C3_FN ON T2 (NVL2(C3,NULL,1)) nologging ;

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE) ;

PL/SQL procedure successfully completed.

SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';

COLUMN_NAME		       HID VIR
------------------------------ --- ---
C1			       NO  NO
C2			       NO  NO
C3			       NO  NO
C4			       NO  NO
SYS_NC00005$		       YES YES

SQL> select index_name, column_expression from user_ind_expressions where table_name = 'T2' ;

INDEX_NAME		       COLUMN_EXPRESSION
------------------------------ --------------------------------------------------------------------------------
IND_T2_C3_FN		       NVL2("C3",NULL,1)

SQL> set autotrace traceonly explain
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;
 10  
Execution Plan
----------------------------------------------------------
Plan hash value: 941108248

--------------------------------------------------------------------------------------------
| Id  | Operation		    | Name	   | Rows  | Bytes | Cost (%CPU)| Time	   |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		   |   100 |  3000 | 10023   (1)| 00:02:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2	   |   100 |  3000 | 10023   (1)| 00:02:01 |
|*  2 |   INDEX RANGE SCAN	    | IND_T2_C3_FN | 10000 |	   |	20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C2"='D00000000000000')
   2 - access(NVL2("C3",NULL,1)=1)
29 02 2012
Narendra

Charles,

Apologies but the INTERNAL_FUNCTION appears because C3 is DATE datatype and it is being compared to NULL (which, by default, is VARCHAR2, I think). Hence the conversion for data type compatibility.

29 02 2012
Charles Hooper

Narendra,

Well done. Another case where a single test case is insufficient, especially if an intentional (or unintentional) mistake is made in the test case 🙂

SQL> SELECT COLUMN_EXPRESSION FROM USER_IND_EXPRESSIONS WHERE INDEX_NAME='IND_T2_C3_FN';
 
COLUMN_EXPRESSION
----------------------------------------
DECODE(TO_CHAR("C3",'DD-MON-RR'),NULL,1)

Well, that is not what I requested – the problem extends beyond problems with CURSOR_SHARING=FORCE – NLS settings could also be an issue.

Back to the original test, and a fix.

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
 
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(TO_CHAR("C3",'DD-MON-RR'),NULL,1)=1;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

Results:

SQL_ID  8psj7gcwcn72m, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND DECODE(C3,NULL,1)=1
 
Plan hash value: 1513984157
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3346 (100)|          |
|*  1 |  TABLE ACCESS FULL| T2   |   769 | 23070 |  3346   (1)| 00:00:01 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"='D00000000000000' AND
              DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1))
 
---
 
SQL_ID  8av2y4gms3baq, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND
DECODE(TO_CHAR("C3",'DD-MON-RR'),NULL,1)=1
 
Plan hash value: 941108248
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       | 10024 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   100 |  3200 | 10024   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_FN | 10000 |       |    20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("T2"."SYS_NC00005$"=1)
29 02 2012
Narendra

Charles,

Guess what? I don’t get table scan when using DECODE. My results are same as that using NVL (again, apologies for not formatting)

SQL> drop INDEX IND_T2_C3_FN ;

Index dropped.

SQL> CREATE INDEX IND_T2_C3_FN ON T2 DECODE(C3,NULL,1);

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE) ;

PL/SQL procedure successfully completed.

SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';

COLUMN_NAME		       HID VIR
------------------------------ --- ---
C1			       NO  NO
C2			       NO  NO
C3			       NO  NO
C4			       NO  NO
SYS_NC00006$		       YES YES
SYS_NC00005$		       YES YES

6 rows selected.

SQL> select index_name, column_expression from user_ind_expressions where table_name = 'T2' ;

INDEX_NAME		       COLUMN_EXPRESSION
------------------------------ --------------------------------------------------------------------------------
IND_T2_C3_FN		       NULL
IND_T2_C3_FN		       1

SQL> set autotrace traceonly explain
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;
 10  
Execution Plan
----------------------------------------------------------
Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation		    | Name	   | Rows  | Bytes | Cost (%CPU)| Time	   |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		   |   100 |  3000 |  3743   (1)| 00:00:45 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2	   |   100 |  3000 |  3743   (1)| 00:00:45 |
|*  2 |   INDEX FULL SCAN	    | IND_T2_C3_FN | 10000 |	   |  3225   (1)| 00:00:39 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C2"='D00000000000000')
   2 - filter(DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1)

SQL> DROP INDEX IND_T2_C3_FN;

Index dropped.

SQL> CREATE INDEX IND_T2_C3_FN ON T2 (DECODE(C3,NULL,1)) nologging ;

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE) ;

PL/SQL procedure successfully completed.

SQL> set autotrace off
SQL> SELECT COLUMN_NAME,HIDDEN_COLUMN,VIRTUAL_COLUMN FROM USER_TAB_COLS WHERE TABLE_NAME='T2';

COLUMN_NAME		       HID VIR
------------------------------ --- ---
C1			       NO  NO
C2			       NO  NO
C3			       NO  NO
C4			       NO  NO
SYS_NC00005$		       YES YES

SQL> select index_name, column_expression from user_ind_expressions where table_name = 'T2' ;

INDEX_NAME		       COLUMN_EXPRESSION
------------------------------ --------------------------------------------------------------------------------
IND_T2_C3_FN		       DECODE("C3",NULL,1)

SQL> set autotrace traceonly explain
SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
 10    AND DECODE(C3,NULL,1)=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 941108248

--------------------------------------------------------------------------------------------
| Id  | Operation		    | Name	   | Rows  | Bytes | Cost (%CPU)| Time	   |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		   |   100 |  3000 | 10023   (1)| 00:02:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2	   |   100 |  3000 | 10023   (1)| 00:02:01 |
|*  2 |   INDEX RANGE SCAN	    | IND_T2_C3_FN | 10000 |	   |	20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C2"='D00000000000000')
   2 - access(DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1)
29 02 2012
Narendra

Charles,
Back to the original test, and a fix.
Nice one. What do you get if you convert NULL and not C3 ? 🙂

29 02 2012
David Fitzjarrell (@ddfdba)

Proved this almost four years ago on my own blog:

http://oratips-ddf.blogspot.com/2008/04/tale-of-two-indexes.html

29 02 2012
Charles Hooper

David,

Thank you for posting a link to your article – always good to see links to articles that demonstrate helpful information. You certainly did capture all of the options.

The first time I read the article, my eyes became a bit blurry… why would Oracle’s optimizer select to use an INDEX FULL SCAN operation, which reads one block at a time (from disk), when it can just about as quickly read 8, 16, or 128 8KB blocks at a time (from disk) during a FULL TABLE SCAN operation. Then it hit me on the second read through… your test table only had 10 very narrow rows – the indexes probably contained only an index root block.

I extended your test case to 1,000,000 rows, with each row being much wider (column C_2 is padded to 255 characters) (tested on 11.2.0.2, I am explaining the steps for the other people who may read this comment, not necessarily for David):

create table test1(
       c_1 number not null,
       c_2 varchar2(255)
  );
 
INSERT INTO
  TEST1
SELECT
  ROWNUM,
  RPAD('Test '||ROWNUM,255,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;
 
create index test1_c_2 on test1(c_2);
create index test1_c_1 on test1(c_1);
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'TEST1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
 
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
SET LINESIZE 120
SET PAGESIZE 1000

Will the optimizer select to use the TEST1_C_2 index if c_2 IS NOT NULL is specified in the WHERE clause:

SELECT
  C_2
FROM
  TEST1
WHERE
  C_2 IS NOT NULL;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4122059633
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1000K|   244M| 13143   (5)| 00:00:06 |
|*  1 |  TABLE ACCESS FULL| TEST1 |  1000K|   244M| 13143   (5)| 00:00:06 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C_2" IS NOT NULL)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     103774  consistent gets
          0  physical reads
          0  redo size
  264867023  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

Well, in this case a FULL TABLE SCAN operation was selected with a calculated cost of 13,143.

Let’s force an index access path:

SELECT /*+ INDEX(TEST1) */
  C_2
FROM
  TEST1
WHERE
  C_2 IS NOT NULL;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2529630288
 
------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |  1000K|   244M| 54890   (2)| 00:00:22 |
|*  1 |  INDEX FULL SCAN | TEST1_C_2 |  1000K|   244M| 54890   (2)| 00:00:22 |
------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C_2" IS NOT NULL)
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     117174  consistent gets
          0  physical reads
          0  redo size
  264867023  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

An INDEX FULL SCAN operation was used, but it was not automatically selected because the 54,890 cost is greater than the 13,143 cost for the FULL TABLE SCAN operation.

Let’s try again with a narrower version of the table, with column C_2 restricted to 10 characters:

TRUNCATE TABLE TEST1;
 
INSERT INTO
  TEST1
SELECT
  ROWNUM,
  RPAD('Test '||ROWNUM,10,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;
  
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'TEST1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL,NO_INVALIDATE=>FALSE)

Repeating the test with C_2 IS NOT NULL specified in the WHERE clause:

SELECT
  C_2
FROM
  TEST1
WHERE
  C_2 IS NOT NULL;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4122059633
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1000K|    10M|  1239  (23)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST1 |  1000K|    10M|  1239  (23)| 00:00:01 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C_2" IS NOT NULL)
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      69372  consistent gets
          0  physical reads
          0  redo size
   17867023  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

A full table scan again, this time with a calculated cost of 1,239.

Let’s force an index access path:

SELECT /*+ INDEX(TEST1) */
  C_2
FROM
  TEST1
WHERE
  C_2 IS NOT NULL;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2529630288
 
------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |  1000K|    10M|  5472   (6)| 00:00:03 |
|*  1 |  INDEX FULL SCAN | TEST1_C_2 |  1000K|    10M|  5472   (6)| 00:00:03 |
------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C_2" IS NOT NULL)
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      71481  consistent gets
          0  physical reads
          0  redo size
   11833627  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

The INDEX FULL SCAN operation had a calculated cost of 5,472, which is more than the 1,239 for the FULL TABLE SCAN operation, so that is why the full table scan was selected.

Let’s check selecting column C_1 to see if we also experience a FULL TABLE SCAN for this query:

SELECT
  C_1
FROM
  TEST1;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2062908727
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |  1000K|  4882K|   921  (20)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| TEST1_C_1 |  1000K|  4882K|   921  (20)| 00:00:01 |
----------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68759  consistent gets
          0  physical reads
          0  redo size
   11846825  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

No full table scan, but we did receive an INDEX FAST FULL SCAN operation, which performs multi-block reads (from disk) of the index blocks, much like what would happen with the table blocks during a FULL TABLE SCAN operation.

Let’s see what happens if we try to force an index access path using a hint for the same query:

SELECT /*+ INDEX(TEST1) */
  C_1
FROM
  TEST1;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2345822858
 
------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |  1000K|  4882K|  2509  (12)| 00:00:02 |
|   1 |  INDEX FULL SCAN | TEST1_C_1 |  1000K|  4882K|  2509  (12)| 00:00:02 |
------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68746  consistent gets
          0  physical reads
          0  redo size
   11846825  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

Notice that the cost increased dramatically, and that the execution plan shows an INDEX FULL SCAN operation rather than an INDEX FAST FULL SCAN operation.

For fun, let’s check the calculated cost of a FULL TABLE SCAN so that we are able to compare the cost with the INDEX FULL SCAN operation:

SELECT /*+ FULL(TEST1) */
  C_1
FROM
  TEST1
WHERE
  C_1 IS NOT NULL;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4122059633
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1000K|  4882K|  1187  (19)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST1 |  1000K|  4882K|  1187  (19)| 00:00:01 |
---------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      69372  consistent gets
          0  physical reads
          0  redo size
   11846825  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

The cost of the FULL TABLE SCAN is less than that of the INDEX FULL SCAN operation – so if the INDEX FAST FULL SCAN operation was not legal, the FULL TABLE SCAN operation would have been selected.

Maybe it is a problem with skewed system statistics?

COLUMN PNAME FORMAT A20
 
SELECT
  PNAME,
  PVAL1
FROM
  SYS.AUX_STATS$;
 
PNAME                     PVAL1
-------------------- ----------
STATUS
DSTART
DSTOP
FLAGS                         1
CPUSPEEDNW           2116.57559
IOSEEKTIM                    10
IOTFRSPEED                 4096
SREADTIM                     .4
MREADTIM                     .8
CPUSPEED                   1922
MBRC                          6
MAXTHR                155910144
SLAVETHR

The MBRC is just 6, so the optimizer definitely was not under-costing the FULL TABLE SCAN (and INDEX FAST FULL SCAN) operation.

Let’s bump the MBRC to 16 just to see what happens:

EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC',16)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'TEST1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL,NO_INVALIDATE=>FALSE)
 
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
 
SELECT
  C_1
FROM
  TEST1;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2062908727
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |  1000K|  4882K|   457  (39)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| TEST1_C_1 |  1000K|  4882K|   457  (39)| 00:00:01 |
----------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68759  consistent gets
          0  physical reads
          0  redo size
   11846825  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

The INDEX FAST FULL SCAN operation’s cost dropped from 921 to 457.

So, what would happen if an INDEX FAST FULL SCAN operation were not valid for this SQL statement?:

SELECT /*+ NO_INDEX_FFS(TEST1) */
  C_1
FROM
  TEST1;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4122059633
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1000K|  4882K|   585  (38)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST1 |  1000K|  4882K|   585  (38)| 00:00:01 |
---------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      69372  consistent gets
          0  physical reads
          0  redo size
   11846825  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

A full table scan with a cost of 585.

Let’s switch back to an INDEX FULL SCAN operation:

SELECT /*+ INDEX(TEST1) */
  C_1
FROM
  TEST1;
 
1000000 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2345822858
 
------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |  1000K|  4882K|  2509  (12)| 00:00:02 |
|   1 |  INDEX FULL SCAN | TEST1_C_1 |  1000K|  4882K|  2509  (12)| 00:00:02 |
------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68746  consistent gets
          0  physical reads
          0  redo size
   11846825  bytes sent via SQL*Net to client
     733686  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

The calculated cost is unchanged after the change of the MBRC system statistic.

David,

Would you be able to re-run your test with Oracle Database 11.2.0.1, 11.2.0.2, or 11.2.0.3 with a larger number of rows? I am curious to know if you will see INDEX FAST FULL SCAN operations, INDEX FULL SCAN operations, or TABLE ACCESS FULL operations in the execution plans.

24 04 2012
damirvadas

Oracle doc:
http://docs.oracle.com/cd/B12037_01/server.101/b10736/indexes.htm
“Unlike most other types of indexes, bitmap indexes include rows that have NULL values. ”

And now what?
🙂

Rg,
Damir

11 06 2012
3 10 2012
Yes, yes you can index nulls » SQLfail

[…] by posts from Charles Hooper and Richard Foote. Share this:FacebookTwitter  Posted by Chris Saxon at 08:00  Tagged […]

Leave a reply to Yes, yes you can index nulls » SQLfail Cancel reply