Repeat After Me: NULL Values are Not Stored in Indexes?

28 02 2012

February 28, 2012

I do not always get as much benefit from the books that I read as the books’ authors probably intended, although the contents of books, whether right or wrong, sometimes help me remember nearly forgotten facts.  Some of the books, for instance “Pro Oracle SQL” describe how to use b*tree indexes to locate rows with NULL values in a column.  The book “Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition” also shows several techniques for utilizing b*tree indexes to locate rows with NULL values in a column.  Richard Foote’s blog also has at least one article that describes how to use b*tree indexes to locate NULL values in a column.

So, what headed me down the path of NULL Values are Not Stored in Indexes?  I read the following sentence in the book “Oracle Database 11gR2 Performance Tuning Cookbook” on page 177:

“NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.”

I can’t help but feel that I have seen very similar statements in the past.  The Oracle Database documentation, at least from 8.1 through 11.1 included the following quote:

“The query must be guaranteed not to need any NULL values from the indexed expression, because NULL values are not stored in indexes.”

But where else have I seen that message? A Google search for: oracle NULL values are not stored in indexes found a couple of places where the message is repeated.

Oracle Database 11g: The Complete Reference” (I read the 9i version of this book years ago):

NULL values are not stored in indexes. Therefore, the following query will not use an index; there is no way the index could help to resolve the query:”

select Title
  from BOOKSHELF
 where CategoryName is null;

Oracle 9i Performance Tuning Tips & Techniques” page 39 (I quickly paged through the 10g book in a book store once, and read the sample chapter on Statspack reports):

“Using IS NULL or IS NOT NULL will also suppress index use because the value of NULL is undefined.”

Oracle Data Warehouse Tuning for 10g” page 51 (I have not had the opportunity to read this book):

“For a BTree index, NULL values are not included in the BTree structure and, thus, not even accessible through the index.”

Expert Indexing in Oracle Database 11g” page 159 (any problems here – I think that I have a test case somewhere that suggests that the second column should be a number, I thought about buying this book):

“If all index columns are NULL, Oracle Database doesn’t include rows into an index. However, you can actually index NULL values by simply adding another column to the index, like so:

SQL> create index with_null on employees(nullable_column, '1');

So, what is my review comment for the Cookbook?

The book states, “NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.”  The book’s description is incomplete.  NULL values are not stored in single column b*tree indexes.  There are at least four methods to work around this issue and allow indexes to be used to identify rows with a NULL value in the indexed column:

  1. Define a composite index with at least one other column that has a NOT NULL constraint – ideally, the column in which the NULL values might appear would be the leading column in the composite index.
  2. Define a composite index with a numeric constant (such as 1) as the second column in the composite index.
  3. Bitmap indexes always store NULL values – if appropriate (column experiences few updates, deletes, inserts, and an Enterprise Edition database), create a bitmap index for the column.
  4. If the number of NULL values in a column will be relatively small (compared to the number of rows in the table), and the original SQL statement may be modified, create a function based index that converts NULL values to 1 and non-NULL values to NULL:
    • DECODE(C3,NULL,1)
    • (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)
    • (NVL2(C3,NULL,1))

Let’s build a little test case to demonstrate.  First, a table is created with 1,000,000 rows, two indexes are created, and then statistics are gathered with histograms generated for all indexed columns:

CREATE TABLE T2 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(15) NOT NULL,
  C3 DATE,
  C4 VARCHAR2(255));

INSERT INTO T2
SELECT
  ROWNUM C1,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),15,'0') C2,
  DECODE(MOD(ROWNUM,100),0,NULL,SYSDATE+ROWNUM/10000) C3,
  RPAD('A',255,'A') C4
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

CREATE INDEX IND_T2_C3_C2 ON T2(C3,C2);
CREATE INDEX IND_T2_C3_C ON T2(C3,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 254') 

In the above table, 1% of the rows will have a NULL value in column C3.

Let’s try the first test to see if one of the above two indexes may be used to locate the NULL values in column C3:

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 895813321

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |   389 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |   389   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_C2 |   385 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3" IS NULL AND "C2"='D00000000000000')
       filter("C2"='D00000000000000') 

Well, it appears that the composite index on columns C3 and C2 might have helped quickly locate the rows with NULL values in column C3 (we just tested point #1 above).  Let’s drop that index and try again:

DROP INDEX IND_T2_C3_C2;

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 1053304445

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |   550 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   385 | 14245 |   550   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_C | 10000 |       |    33   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("C3" IS NULL) 

This time we cannot simply state that “NULL Values are Not Stored in Indexes”.  The Predicate Information section of the execution plan shows access(“C3″ IS NULL) (we just tested point #2 above).

Let’s drop the index and try something else:

DROP INDEX IND_T2_C3_C;

CREATE BITMAP INDEX IND_T2_C3_BIN ON T2(C3);

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 1153509852

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |       |       |  1954 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID | T2            |   385 | 14245 |  1954   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|               |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | IND_T2_C3_BIN |       |       |            |          |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   3 - access("C3" IS NULL) 

Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.  The Predicate Information section of the execution plan shows access(“C3″ IS NULL) (we just tested point #3 above).

Let’s drop the index and try just indexing the NULLs (sub-point 1 of point #4 above):

DROP INDEX IND_T2_C3_BIN;

CREATE INDEX IND_T2_C3_FN ON T2 DECODE(C3,NULL,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  8psj7gcwcn72m, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND DECODE(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3752 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |  3752   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3234   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1) 

The index that we created was used.  Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 2 of point #4 above):

DROP INDEX IND_T2_C3_FN;

CREATE INDEX IND_T2_C3_FN ON T2 (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  2w9h5jm4tdzpz, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND (CASE WHEN C3 IS NULL THEN 1 ELSE
NULL END)=1

Plan hash value: 941108248

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       | 10024 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 15015 | 10024   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_FN | 10000 |       |    20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("T2"."SYS_NC00005$"=1) 

The index that we created was used, although notice that the Predicate Information section of the plan is a bit different from before, and this time we have an INDEX RANGE SCAN operation rather than an INDEX FULL SCAN operation.  Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 3 of point #4 above):

DROP INDEX IND_T2_C3_FN;

CREATE INDEX IND_T2_C3_FN ON T2 NVL2(C3,NULL,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  fmsg8vztz32d2, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM   T2 WHERE
C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3282 (100)|          |
|*  1 |  TABLE ACCESS FULL| T2   |   385 | 14245 |  3282   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"='D00000000000000' AND NVL2("C3",NULL,1)=1)) 

The index that we created was… not used?  But is it not the case that the NVL2(C3,NULL,1) function result is the equivalent of the DECODE and the CASE function results?  Let’s give this one another try with a hint:

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  f510adk7fqwyu, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3740 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |  3740   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3223   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(NVL2("C3",NULL,1)=1) 

That’s better, even if we did receive an INDEX FULL SCAN operation rather than in INDEX RANGE SCAN operation as was the intended result.

Something fun to think about – why did the technique using the CASE syntax that resulted in an INDEX RANGE SCAN operation have a plan with a calculated cost of 10,024, when the plans with the INDEX FULL SCAN operations have a calculated cost of about 3,750?








Follow

Get every new post delivered to your Inbox.

Join 142 other followers