Repeat After Me: NULL Values are Not Stored in Indexes?

28 02 2012

February 28, 2012

I do not always get as much benefit from the books that I read as the books’ authors probably intended, although the contents of books, whether right or wrong, sometimes help me remember nearly forgotten facts.  Some of the books, for instance “Pro Oracle SQL” describe how to use b*tree indexes to locate rows with NULL values in a column.  The book “Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition” also shows several techniques for utilizing b*tree indexes to locate rows with NULL values in a column.  Richard Foote’s blog also has at least one article that describes how to use b*tree indexes to locate NULL values in a column.

So, what headed me down the path of NULL Values are Not Stored in Indexes?  I read the following sentence in the book “Oracle Database 11gR2 Performance Tuning Cookbook” on page 177:

“NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.”

I can’t help but feel that I have seen very similar statements in the past.  The Oracle Database documentation, at least from 8.1 through 11.1 included the following quote:

“The query must be guaranteed not to need any NULL values from the indexed expression, because NULL values are not stored in indexes.”

But where else have I seen that message? A Google search for: oracle NULL values are not stored in indexes found a couple of places where the message is repeated.

Oracle Database 11g: The Complete Reference” (I read the 9i version of this book years ago):

NULL values are not stored in indexes. Therefore, the following query will not use an index; there is no way the index could help to resolve the query:”

select Title
  from BOOKSHELF
 where CategoryName is null;

Oracle 9i Performance Tuning Tips & Techniques” page 39 (I quickly paged through the 10g book in a book store once, and read the sample chapter on Statspack reports):

“Using IS NULL or IS NOT NULL will also suppress index use because the value of NULL is undefined.”

Oracle Data Warehouse Tuning for 10g” page 51 (I have not had the opportunity to read this book):

“For a BTree index, NULL values are not included in the BTree structure and, thus, not even accessible through the index.”

Expert Indexing in Oracle Database 11g” page 159 (any problems here – I think that I have a test case somewhere that suggests that the second column should be a number, I thought about buying this book):

“If all index columns are NULL, Oracle Database doesn’t include rows into an index. However, you can actually index NULL values by simply adding another column to the index, like so:

SQL> create index with_null on employees(nullable_column, '1');

So, what is my review comment for the Cookbook?

The book states, “NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.”  The book’s description is incomplete.  NULL values are not stored in single column b*tree indexes.  There are at least four methods to work around this issue and allow indexes to be used to identify rows with a NULL value in the indexed column:

  1. Define a composite index with at least one other column that has a NOT NULL constraint – ideally, the column in which the NULL values might appear would be the leading column in the composite index.
  2. Define a composite index with a numeric constant (such as 1) as the second column in the composite index.
  3. Bitmap indexes always store NULL values – if appropriate (column experiences few updates, deletes, inserts, and an Enterprise Edition database), create a bitmap index for the column.
  4. If the number of NULL values in a column will be relatively small (compared to the number of rows in the table), and the original SQL statement may be modified, create a function based index that converts NULL values to 1 and non-NULL values to NULL:
    • DECODE(C3,NULL,1)
    • (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)
    • (NVL2(C3,NULL,1))

Let’s build a little test case to demonstrate.  First, a table is created with 1,000,000 rows, two indexes are created, and then statistics are gathered with histograms generated for all indexed columns:

CREATE TABLE T2 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(15) NOT NULL,
  C3 DATE,
  C4 VARCHAR2(255));

INSERT INTO T2
SELECT
  ROWNUM C1,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),15,'0') C2,
  DECODE(MOD(ROWNUM,100),0,NULL,SYSDATE+ROWNUM/10000) C3,
  RPAD('A',255,'A') C4
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

CREATE INDEX IND_T2_C3_C2 ON T2(C3,C2);
CREATE INDEX IND_T2_C3_C ON T2(C3,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 254') 

In the above table, 1% of the rows will have a NULL value in column C3.

Let’s try the first test to see if one of the above two indexes may be used to locate the NULL values in column C3:

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 895813321

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |   389 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |   389   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_C2 |   385 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3" IS NULL AND "C2"='D00000000000000')
       filter("C2"='D00000000000000') 

Well, it appears that the composite index on columns C3 and C2 might have helped quickly locate the rows with NULL values in column C3 (we just tested point #1 above).  Let’s drop that index and try again:

DROP INDEX IND_T2_C3_C2;

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 1053304445

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |   550 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2          |   385 | 14245 |   550   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_C | 10000 |       |    33   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("C3" IS NULL) 

This time we cannot simply state that “NULL Values are Not Stored in Indexes”.  The Predicate Information section of the execution plan shows access(“C3″ IS NULL) (we just tested point #2 above).

Let’s drop the index and try something else:

DROP INDEX IND_T2_C3_C;

CREATE BITMAP INDEX IND_T2_C3_BIN ON T2(C3);

SELECT
  C1,
  C2,
  C3
FROM
  T2
WHERE
  C2='D00000000000000'
  AND C3 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  3sjqqpjradbgz, child number 0
-------------------------------------
SELECT   C1,   C2,   C3 FROM   T2 WHERE   C2='D00000000000000'   AND C3
IS NULL

Plan hash value: 1153509852

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |       |       |  1954 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID | T2            |   385 | 14245 |  1954   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|               |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | IND_T2_C3_BIN |       |       |            |          |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   3 - access("C3" IS NULL) 

Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.  The Predicate Information section of the execution plan shows access(“C3″ IS NULL) (we just tested point #3 above).

Let’s drop the index and try just indexing the NULLs (sub-point 1 of point #4 above):

DROP INDEX IND_T2_C3_BIN;

CREATE INDEX IND_T2_C3_FN ON T2 DECODE(C3,NULL,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND DECODE(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  8psj7gcwcn72m, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND DECODE(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3752 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |  3752   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3234   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(DECODE(INTERNAL_FUNCTION("C3"),NULL,1)=1) 

The index that we created was used.  Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 2 of point #4 above):

DROP INDEX IND_T2_C3_FN;

CREATE INDEX IND_T2_C3_FN ON T2 (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND (CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  2w9h5jm4tdzpz, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND (CASE WHEN C3 IS NULL THEN 1 ELSE
NULL END)=1

Plan hash value: 941108248

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       | 10024 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 15015 | 10024   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T2_C3_FN | 10000 |       |    20   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - access("T2"."SYS_NC00005$"=1) 

The index that we created was used, although notice that the Predicate Information section of the plan is a bit different from before, and this time we have an INDEX RANGE SCAN operation rather than an INDEX FULL SCAN operation.  Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 3 of point #4 above):

DROP INDEX IND_T2_C3_FN;

CREATE INDEX IND_T2_C3_FN ON T2 NVL2(C3,NULL,1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL HIDDEN COLUMNS SIZE 254 FOR ALL INDEXED COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

SELECT
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  fmsg8vztz32d2, child number 0
-------------------------------------
SELECT   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM   T2 WHERE
C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3282 (100)|          |
|*  1 |  TABLE ACCESS FULL| T2   |   385 | 14245 |  3282   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"='D00000000000000' AND NVL2("C3",NULL,1)=1)) 

The index that we created was… not used?  But is it not the case that the NVL2(C3,NULL,1) function result is the equivalent of the DECODE and the CASE function results?  Let’s give this one another try with a hint:

SELECT /*+ INDEX(T2) */
  C1,
  C2,
  C3,
  NVL2(C3,NULL,1) C4
FROM
  T2
WHERE
  C2='D00000000000000'
  AND NVL2(C3,NULL,1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  f510adk7fqwyu, child number 0
-------------------------------------
SELECT /*+ INDEX(T2) */   C1,   C2,   C3,   NVL2(C3,NULL,1) C4 FROM
T2 WHERE   C2='D00000000000000'   AND NVL2(C3,NULL,1)=1

Plan hash value: 2303772603

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  3740 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T2           |   385 | 14245 |  3740   (1)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | IND_T2_C3_FN | 10000 |       |  3223   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='D00000000000000')
   2 - filter(NVL2("C3",NULL,1)=1) 

That’s better, even if we did receive an INDEX FULL SCAN operation rather than in INDEX RANGE SCAN operation as was the intended result.

Something fun to think about – why did the technique using the CASE syntax that resulted in an INDEX RANGE SCAN operation have a plan with a calculated cost of 10,024, when the plans with the INDEX FULL SCAN operations have a calculated cost of about 3,750?





Full Table Scans and the Buffer Cache in 11.2 – What is Wrong with this Quote?

26 02 2012

February 26, 2012 (Modified February 27, 2012)

I found another interesting quote in the “Oracle Database 11gR2 Performance Tuning Cookbook“, this time related to tables and full table scans.  This quote is found on page 170 of the book:

“If we do an FTS [full table scan], database buffers are used to read all the table data, and this situation may lead to flushing the buffer cache data to make room for the FTS data. To avoid this situation and to limit the consequences on the database buffer cache, the database blocks from FTS operations are put on the top of the LRU (Least Recently Used) list.”

What, if anything, is wrong (and/or right) with the above quote from the book?

Added February 27, 2012:

Part 2:

An additional interesting quote is found on page 176 related to full table scans when indexes are present.  The test case that follows is slightly different than what is presented in the book, however the outcome is the same.  Consider the following table and indexes (note that histograms will be created on columns C1 and C2, and that column C2 will have a single row with a 1 value and 999,999 rows with a 0 value):

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  DECODE(ROWNUM,1,1,0) C2,
  LPAD('A',255,'A') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE UNIQUE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);

ALTER TABLE T1 MODIFY (C1 NOT NULL, C2 NOT NULL);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 254') 

The test case script:

SET LINESIZE 120
SET PAGESIZE 1000

SELECT
  C1,
  C2
FROM
  T1
WHERE
  C2=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
 
SELECT
  C1,
  C2
FROM
  T1
WHERE
  C2<>0;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The (slightly reformatted) output from my execution of the above script on Oracle Database 11.2.0.2:

SQL> SELECT
  2    C1,
  3    C2
  4  FROM
  5    T1
  6  WHERE
  7    C2=1;

        C1         C2
---------- ----------
         1          1

SQL_ID  8fv30tbr8jdds, child number 0
-------------------------------------
SELECT   C1,   C2 FROM   T1 WHERE   C2=1

Plan hash value: 236868917

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |     1 |     8 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |     1 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=1) 

---

SQL> SELECT
  2    C1,
  3    C2
  4  FROM
  5    T1
  6  WHERE
  7    C2<>0;

        C1         C2
---------- ----------
         1          1

SQL_ID  bu17044puyhkx, child number 0
-------------------------------------
SELECT   C1,   C2 FROM   T1 WHERE   C2<>0

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |  3049 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     8 |  3049   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"<>0)

Notice in the above execution plans that the Oracle query optimizer correctly determined that only 1 row would be returned in both cases, yet in the first case an index was used, and in the second case a full table scan.  The book states the following:

“Why did the database optimizer switch back to a long-running FTS operation, instead of the previous Index Range Scan? The answer is simple – indexes cannot be used when we compare values with a not equal operator.”

I have seen the above answer, with slight variations, provided in at least two other books.  What, if anything, is wrong (and/or right) with the above quote from the book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





Interesting Index “Facts” – What is Wrong with these Quotes?

22 02 2012

February 22, 2012

There are some amazing (or just plain interesting) facts to be found about how indexes work in Oracle Database.  One Internet site in particular that has been helpful in my understanding of Oracle indexes is Richard Foote’s blog.  I occasionally see information on the Internet, in the Oracle documentation, in Metalink (My Oracle Support), and in books that conflict with one another – just how does one determine what information is correct, and what information is incorrect? 

If you paid money to learn something, should you just assume that what you are learning is correct?  What if test cases are included, should you just assume that the test case is showing what the document claims that the test case shows?  What if, you never ask “What if?” … what if, you do not know when to ask “What if?”

As I mentioned in a previous article, I am in the process of reading and reviewing the book “Oracle Database 11gR2 Performance Tuning Cookbook“, and I seem to be spending quite a bit of time asking What if … or why does this statement not agree with what I believe to be true?  For this blog article, I have pulled seven quotes from the book that will be mentioned in my review of the book – I thought that I would give the readers of this blog an opportunity to ask, “What if”.  As always, I attempted to keep the length of the quotes to a minimum – just long enough so that the context (and meaning) of the quote is not lost.

#1 Page 88:

Faster scans: tables and indexes that require full scans can see faster performance when placed in a large block size.”

#2 Page 119:

“We create the IX3_CUSTOMERS index to show how we can create a descending index, that is, an index which stores the data for one or more fields in descending order…

We can also create a function-based descending index.”

#3 Page 119:

“… this test allows us to dispel a myth. Oracle uses the indexes even if the leading columns are not referenced in the WHERE predicate of the query.  We can see that in such a case, the operation will be an INDEX FAST FULL SCAN.  Conversely, when we reference the leading column of the index, we will end up in an INDEX RANGE SCAN.

#4 Page 120:

“The last type [descending indexes] is useful when we want to use queries in which we want the data sorted in a mixed ascending/descending way on the fields indexed. If we use a regular index to access the data, Oracle is unable to do the sort in a mixed way, in a query like this…  ORDER BY FIELD1 DESC, FIELD2

#5 Page 127:

“However, when we have a table on which there are many INSERTs and DELETEs, we could schedule an index rebuild, because when deleting an index entry, the space is not freed in the index leaf, but just marked as deleted. If we have massive DELETE and INSERT operations, we could have a skewed index structure, which could slow performance due to intra-block fragmentation.” 

#6 Page 127:

“If the value for DEL_LF_ROWS/LF_ROWS is greater than 2, or LF_ROWS is lower than LF_BLKS, or HEIGHT is 4 then the index should be rebuilt.”

#7 Page 139:

“When rows are frequently inserted, deleted, and updated, there is a performance bottleneck if we use a bitmap index. When the index is updated, all the bitmap segments are locked.”

What, if anything, is wrong (and/or right) with the above quotes from the book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





Directly Executing a Query is Slower than Returning a REFCURSOR from a PL/SQL Procedure – What is Wrong with this Quote?

20 02 2012

February 20, 2012

My copy of the “Oracle Database 11gR2 Performance Tuning Cookbook” arrived from Amazon, and I will say that I like the seven steps for solving performance problems that is found on page 12, although the diagram of the process on page 14 may lead to a condition known as Compulsive Tuning Disorder.

I am delighted to see that the book makes use of test case scripts, which often allow the reader to demonstrate that the book’s suggestion works in the reader’s Oracle environment.  One such test case is provided on pages 54-55.  I will not reproduce the test case script here, but the script may be found in the download library for the book, in chapter 2′s 2602_02_StoredProcedure.sql file.  The download library for the book may be found by selecting the book from this page, and entering your email address.

After connecting to the database:

sqlplus /@TESTDB AS SYSDBA

The test case script compares the performance of this SQL statement that is directly executed in SQL*Plus:

SELECT PROD_ID, SUM(AMOUNT_SOLD) FROM SH.SALES GROUP BY PROD_ID; 

With the performance of returning a SYS_REFCURSOR to a SQL*Plus variable that is declared as a REFCURSOR datatype.  The book shows that the SQL statement executed directly in SQL*Plus required 1.26 seconds, while the method using the REFCURSOR required just 0.45 seconds, with the obvious extension being that the second method is more efficient.  I tried the test case, and found that the normal SQL statement executed in SQL*Plus required 0.47 seconds, and the REFCURSOR method required just 0.26 seconds, so my results are consistent with those from the book - the REFCURSOR method shows less Elapsed time just as stated in the book.

What, if anything, is wrong with the above quote (test case) from the book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





Tables and Their Indexes Should be Located on Different Disks – What is Wrong with this Quote?

17 02 2012

February 17, 2012

As I mentioned in a previous blog article, I placed an order for the book “Oracle Database 11gR2 Performance Tuning Cookbook“, mostly because I was curious how the book would compare with the “Oracle Database 11g Performance Tuning Recipes” book (my review of that book was a total of about 35 typewritten pages in length), and some of the other Oracle performance related books that I have reviewed.

After placing my order for the book, I took a quick look at a couple of the book’s preview pages on Amazon, downloaded the sample chapter from the book publisher - chapter 10 “Tuning I/O”, and spent a bit of time reading the sample chapter (making comments in the margins of the printed pages).  I circled the following paragraph (sorry for the long quote – I want to make certain that the context of the quote is not lost), found on page 422 of the book (page 17 in the PDF download):

“We can also distribute tables and related indexes on different disks, to obtain performance gain in both read and write operations. If we have tables and indexes on the same disk, we need to read and write in two different places on the same disk. By dividing the work between two disks, we can perform an index range scan faster than when the index and the table are on the same disk.”

What, if anything, is wrong with the above quote from the book?  For some reason, the acronym MASE came to mind when I read the above paragraph. 

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





Will Enabling a 10046 Trace Make My Query Slower (or Faster)?

13 02 2012

February 13, 2012 (Modified February 14, 2012)

It seems that I often wonder about the things that I read… is it really true?  Several years ago, one of the items that I wondered about is whether or not an execution plan for a SQL statement could change magically simply because the Oracle RDBMS knows that it is being watched (a bit short-sighted, but stay with me for a couple of minutes).  When you first learn to read raw 10046 extended SQL trace files, you might also start to wonder whether or not enabling a 10046 trace could make the Oracle RDBMS suddenly start to behave differently.

Let’s take a quick look at a small portion of a 11.2.0.2 trace file:

PARSING IN CURSOR #371254768 len=53 dep=0 uid=62 oct=3 lid=62 tim=184435767875 hv=1334288504 ad='7ffb8c1be40' sqlid='a75xc757sg83s'
SELECT /*+ INDEX(T3) */
  *
FROM
  T3
WHERE
  C1<=200
END OF STMT
PARSE #371254768:c=0,e=699,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=216260220,tim=184435767874
EXEC #371254768:c=0,e=25,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=216260220,tim=184435767947
WAIT #371254768: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=184435767976
WAIT #371254768: nam='db file scattered read' ela= 320 file#=4 block#=4145624 blocks=8 obj#=73036 tim=184435768358
WAIT #371254768: nam='db file scattered read' ela= 276 file#=4 block#=4145616 blocks=8 obj#=73035 tim=184435768704
FETCH #371254768:c=0,e=764,p=16,cr=2,cu=0,mis=0,r=1,dep=0,og=1,plh=216260220,tim=184435768757 

In the above, notice the mis=1 entry on the PARSE #371254768 line – that entry indicates a hard parse, a re-optimization of the SQL statement by the query optimizer.  If this SQL statement used bind variables, a feature known as bind variable peeking (introduced in Oracle 9.0.1) would take place during re-optimization, which might provide the optimizer with more information (with the help of a histogram, for instance) and potentially result in a change to the SQL statement’s execution plan.

(A partially related note: I just recently placed an order for the book “Oracle Database 11gR2 Performance Tuning Cookbook“, and while taking a quick peek at two or three pages in the book’s preview on Amazon, I found an error related to the above mentioned feature.)

Let’s take a look at a quick example of the above situation.  We will start by creating a table, inserting 100,000 rows where 20% of the rows have the value 0 in column C2, creating an index on column C2, and gathering statistics with histograms:

CREATE TABLE T5(
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(300) NOT NULL);

INSERT INTO
  T5
SELECT
  ROWNUM C1,
  DECODE(MOD(ROWNUM,5),0,0,ROWNUM) C2,
  RPAD('A',300,'A') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T5_C2 ON T5(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T5',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254') 

Now for the first segment of the test script, where a single row will be selected from the table.  To save space, I will not show the output from the select of the table T5, but will show the DBMS_XPLAN output:

ALTER SESSION SET TRACEFILE_IDENTIFIER = '10046TraceFindMe';

SET LINESIZE 120
SET PAGESIZE 1000
VAR V2 NUMBER
EXEC :V2:=4

SELECT
  C1
FROM
  T5
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  cmm43s6usjbk1, child number 0
-------------------------------------
SELECT   C1 FROM   T5 WHERE   C2=:V2

Plan hash value: 3443248787

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5        |     1 |    10 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T5_C2 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2) 

An index range scan, that is reasonable.  Let’s change the value of the bind variable so that 20% of the rows from the table will be selected, and take a look at the execution plan:

EXEC :V2:=0

SELECT
  C1
FROM
  T5
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  cmm43s6usjbk1, child number 0
-------------------------------------
SELECT   C1 FROM   T5 WHERE   C2=:V2

Plan hash value: 3443248787

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5        |     1 |    10 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T5_C2 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2) 

Notice in the above that the child number is still 0 and that the execution plan remained the same, even though we requested the retrieval of 20% of the rows from table T5.  In Oracle Database 11.1 and above, adaptive cursor sharing might eventually step in to change the execution plan for this bind variable value to use a full table scan, but that change could not happen on the first execution.

Now, we will enable a 10046 trace, set the bind variable value to the second value listed above, and return the execution plan:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

EXEC :V2:=0

SELECT
  C1
FROM
  T5
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  cmm43s6usjbk1, child number 1
-------------------------------------
SELECT   C1 FROM   T5 WHERE   C2=:V2

Plan hash value: 2002323537

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   358 (100)|          |
|*  1 |  TABLE ACCESS FULL| T5   | 19685 |   192K|   358   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2) 

Notice in the above output that while the SQL_ID has remained constant, the child number is now 1, and the execution plan changed to a full table scan.  Bind variable peeking and the histogram on column C2 provided the optimizer with enough information to result in an execution plan change.

Let’s switch back to the original bind variable value, which will result in a single row being returned:

SELECT
  C1
FROM
  T5
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  cmm43s6usjbk1, child number 1
-------------------------------------
SELECT   C1 FROM   T5 WHERE   C2=:V2

Plan hash value: 2002323537

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   358 (100)|          |
|*  1 |  TABLE ACCESS FULL| T5   | 19685 |   192K|   358   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2) 

Note that child cursor 1 is still used – a full table scan where an index range scan was used previous to enabling a 10046 trace.  So, while we have not examined execution times, we have now seen cases where a query could potentially execute faster or slower due to a change in the execution plan after a 10046 extended SQL trace is enabled.

Let’s try one more time with the 10046 trace still enabled:

SELECT
  C1
FROM
  T5
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL')); 

SQL_ID  cmm43s6usjbk1, child number 2
-------------------------------------
SELECT   C1 FROM   T5 WHERE   C2=:V2

Plan hash value: 3443248787

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5        |     1 |    10 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T5_C2 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Now the child cursor number is displayed as 2, and the execution plan returned to an index range scan.  Adaptive cursor sharing kicked in and caused a re-optimization of the SQL statement.

Let’s disable the 10046 extended SQL trace:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

A quick look at a small (marked up) portion of the raw 10046 extended SQL trace file – the first execution with the 10046 trace enabled:

PARSING IN CURSOR #429376080 len=20 dep=0 uid=62 oct=47 lid=62 tim=11461702777 hv=1269726367 ad='7ffb76adba8' sqlid='5dxda7t5uwz4z'
BEGIN :V2:=0; END;
END OF STMT
PARSE #429376080:c=0,e=56,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=11461702777
WAIT #429376080: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11461702853
EXEC #429376080:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=11461702867
WAIT #429376080: nam='SQL*Net message from client' ela= 1401 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11461704296
CLOSE #429376080:c=0,e=12,dep=0,type=0,tim=11461704325
=====================
PARSING IN CURSOR #429376080 len=36 dep=0 uid=62 oct=3 lid=62 tim=11461704495 hv=3045633601 ad='7ffb77363b0' sqlid='cmm43s6usjbk1'
SELECT
  C1
FROM
  T5
WHERE
  C2=:V2
END OF STMT
PARSE #429376080:c=0,e=152,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=11461704494   <--- Hard Parse
EXEC #429376080:c=0,e=784,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2002323537,tim=11461705325
WAIT #429376080: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11461705357
FETCH #429376080:c=0,e=35,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=2002323537,tim=11461705408
...
WAIT #429376080: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11463556616
FETCH #429376080:c=0,e=21,p=0,cr=1,cu=0,mis=0,r=4,dep=0,og=1,plh=2002323537,tim=11463556630
STAT #429376080 id=1 cnt=20000 pid=0 pos=1 obj=73243 op='TABLE ACCESS FULL T5 (cr=5813 pr=0 pw=0 time=15413 us cost=358 size=196850 card=19685)'
... 

In the above, notice the mis=1 entry on the PARSE #429376080 line – the optimizer hard parsed the SQL statement.  Moving down a bit in the trace file, the second execution with the changed bind variable value:

PARSING IN CURSOR #429376080 len=20 dep=0 uid=62 oct=47 lid=62 tim=11505336665 hv=1039981032 ad='7ffb873a408' sqlid='857u964yztqg8'
BEGIN :V2:=4; END;
END OF STMT
PARSE #429376080:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=11505336664
WAIT #429376080: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11505336754
EXEC #429376080:c=0,e=63,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=11505336769
WAIT #429376080: nam='SQL*Net message from client' ela= 2382 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11505339191
CLOSE #429376080:c=0,e=13,dep=0,type=0,tim=11505339229
=====================
PARSING IN CURSOR #429376080 len=36 dep=0 uid=62 oct=3 lid=62 tim=11505339288 hv=3045633601 ad='7ffb77363b0' sqlid='cmm43s6usjbk1'
SELECT
  C1
FROM
  T5
WHERE
  C2=:V2
END OF STMT
PARSE #429376080:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2002323537,tim=11505339288   <--- NO Hard Parse
EXEC #429376080:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2002323537,tim=11505339355
WAIT #429376080: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=11505339381
FETCH #429376080:c=0,e=296,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=2002323537,tim=11505339693
... 

In the above, notice the mis=0 entry on the PARSE #429376080 line – the optimizer did NOT re-optimize this SQL statement, so the execution plan remained a full table scan.  Now the final section that was influenced by adaptive cursor sharing:

PARSING IN CURSOR #429376080 len=36 dep=0 uid=62 oct=3 lid=62 tim=12566538414 hv=3045633601 ad='7ffb77363b0' sqlid='cmm43s6usjbk1'
SELECT
  C1
FROM
  T5
WHERE
  C2=:V2
END OF STMT
PARSE #429376080:c=0,e=55,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2002323537,tim=12566538414   <--- NO Hard Parse
EXEC #429376080:c=0,e=1594,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3443248787,tim=12566540065  <--- Hard Parse
WAIT #429376080: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=532 tim=12566540198
FETCH #429376080:c=0,e=29,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3443248787,tim=12566540245 

In the above, notice the mis=0 entry on the PARSE #429376080 line – the optimizer did NOT re-optimize this SQL statement during the parse phase.  However, remember that the cursor number changed from the previous value of 1 to 2; the above EXEC #429376080 contains mis=1, so a hard parse was performed during the execution phase.

You might be thinking that maybe the execution plans can only change if we use bind variables when histograms are present.  Let’s try another experiment.  First, we will create a table, create an index on that table, insert 100,000 rows with the same characteristics as those found in table T5, and NOT collect statistics:

CREATE TABLE T6 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(300) NOT NULL);

CREATE INDEX IND_T6_C2 ON T6(C2);

INSERT INTO
  T6
SELECT
  ROWNUM C1,
  DECODE(MOD(ROWNUM,5),0,0,ROWNUM) C2,
  RPAD('A',300,'A') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT; 

Let’s repeat the previous test case:

SET LINESIZE 120
SET PAGESIZE 1000
VAR V2 NUMBER
EXEC :V2:=4

SELECT
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  9krhnbpjpvpwh, child number 0
-------------------------------------
SELECT   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 4242288932

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |     1 |    26 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2) 

The above shows that dynamic sampling was used during the hard parse, and as expected an index range scan was performed.  Part 2 of the previous test script:

EXEC :V2:=0

SELECT
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  9krhnbpjpvpwh, child number 0
-------------------------------------
SELECT   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 4242288932

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |     1 |    26 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2) 

Again, nothing new – no hard parse when the bind variable was changed, and so the index was still used.  Part 3 of the previous test script:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

EXEC :V2:=0

SELECT
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  9krhnbpjpvpwh, child number 1
-------------------------------------
SELECT   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 4242288932

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   142 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        | 34140 |   866K|   142   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 | 34140 |       |   142   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2) 

There was a hard parse, as is noted by the change in the child cursor number, yet the optimizer still decided that it was most efficient to perform an index range scan.  The optimizer estimated that 34,140 rows would be returned with an estimated size of 866KB.  You might recall that earlier the same test for table T5 showed the following execution plan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   358 (100)|          |
|*  1 |  TABLE ACCESS FULL| T5   | 19685 |   192K|   358   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2) 

With a histogram (and without dynamic sampling), the optimizer estimated that 19,685 rows would be returned with an estimated size of 192KB – and in that case a full table scan was performed.  But why was a full table scan not performed when experimenting with dynamic sampling?  Is it possible that if the query optimizer uses dynamic sampling, that the execution plan is NOT subject to change when a 10046 extended SQL trace is enabled, or is there a simplier cause?

Let’s try executing the query with a FULL hint:

SELECT /*+ FULL(T6) */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  bhm179jq4vvca, child number 0
-------------------------------------
SELECT /*+ FULL(T6) */   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 1930642322

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   357 (100)|          |
|*  1 |  TABLE ACCESS FULL| T6   | 34140 |   866K|   357   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2) 

Well, that explains part of the issue.  While the calculated cost of the full table scan access path is 357 (very close to the 358 calculated for the same SQL statement that accessed table T5), the calculated cost of the index access path was only 142 – the lowest calculated cost access path wins.

What if we force an index access path for table T5 with the same value of 0 for the bind variable.  Will the calculated cost also be roughly 142?

SELECT /*+ INDEX(T5) */
  C1
FROM
  T5
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  5y34jm13up8xh, child number 0
-------------------------------------
SELECT /*+ INDEX(T5) */   C1 FROM   T5 WHERE   C2=:V2

Plan hash value: 3443248787

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |  1826 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5        | 19685 |   192K|  1826   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T5_C2 | 19685 |       |    43   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)  

Obviously, no – otherwise the index access path would have been selected for table T5 also – as shown above, the cost is actually 1826.

—-

The challenge (drifting a bit from the topic of this article): If dynamic sampling is used, and the optimizer knows that a many (possibly 34%) of the rows will be returned when the query is executed, why is an index access path used?  In other words, what causes the index access path to be calculated so cheaply when compared to the calculated cost for the same access path with table T5?  Is the above simply a bug found in Oracle Database 11.2.0.2, or is there a better explanation?

-

Added February 14, 2012:

One person (Narendra) noticed that there appeared to be a bug in the optimizer’s calculations.  If you closely examine the execution plans above, you just might notice that the Cost displayed on the INDEX RANGE SCAN operation’s line of the execution plan (142) is identical to the Cost displayed on its the parent operation TABLE ACCESS BY INDEX ROWID (142).  As is probably well known, a parent operation’s Cost, Buffers, Reads, and various other columns display their child statistics summed with the statistics that are directly associated with the parent operation.  Take another look at a portion of one of the earlier execution plans:

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   142 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        | 34140 |   866K|   142   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 | 34140 |       |   142   (0)| 00:00:01 |
----------------------------------------------------------------------------------------- 

The cost displayed for the TABLE ACCESS BY INDEX ROWID operation should be greater than the cost displayed for the index access – unless the optimizer forgot that the table must be accessed when dynamic sampling is performed.  Oracle Database 11.2.0.1 and 11.2.0.2 appear to be affected by this particular bug, while Oracle Database 10.2.0.5 and 11.1.0.7 seem to under-estimate the cost of the table access when dynamic sampling is performed.  As such, if statistics on some of the tables (for instance temp tables) are locked without any statistics set on the tables (to force dynamic sampling), there could be a minor issue that becomes a more significant issue when upgrading to 11.2.0.1 or 11.2.0.2.

Recreating the same test table as earlier:

DROP TABLE T6 PURGE;

CREATE TABLE T6 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(300) NOT NULL);

CREATE INDEX IND_T6_C2 ON T6(C2);

INSERT INTO
  T6
SELECT
  ROWNUM C1,
  DECODE(MOD(ROWNUM,5),0,0,ROWNUM) C2,
  RPAD('A',300,'A') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT; 

The first portion of the test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = '10053NO10046TraceFindMe';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SET LINESIZE 120
SET PAGESIZE 1000
VAR V2 NUMBER
EXEC :V2:=4

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

SQL_ID  bxnurhc1nmq2g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 4242288932

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     1 (100)|      1 |00:00:00.01 |       4 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       4 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 4

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2)

The second portion of the test:

EXEC :V2:=0

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

SQL_ID  bxnurhc1nmq2g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 4242288932

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     1 (100)|  20000 |00:00:00.06 |    6981 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |      1 |     1   (0)|  20000 |00:00:00.06 |    6981 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |      1 |     1   (0)|  20000 |00:00:00.03 |    1410 |
----------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 4

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2) 

Notice in the above that the execution plan has not changed, and the peeked bind variable value is still 4 (this is expected).

The third portion of the test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = '1005310046TraceFindMe';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

EXEC :V2:=0

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

SQL_ID  bxnurhc1nmq2g, child number 1
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 4242288932

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |    49 (100)|  20000 |00:00:00.06 |    6944 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |  18399 |    49   (0)|  20000 |00:00:00.06 |    6944 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |  18399 |    49   (0)|  20000 |00:00:00.03 |    1373 |
----------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2) 

Notice in the above that the Cost displayed for the INDEX RANGE SCAN operation is identical to the Cost displayed for the TABLE ACCESS BY INDEX ROWID – also notice the low estimated Cost for the index access (this might be a reasonable value, as will be shown below).

The fourth portion of the test, checking the cost of the full table scan access path:

SELECT /*+ FULL(T6) GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

Plan hash value: 1930642322

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |  1579 (100)|  20000 |00:00:00.05 |    5815 |
|*  1 |  TABLE ACCESS FULL| T6   |      1 |  18399 |  1579   (5)|  20000 |00:00:00.05 |    5815 |
-------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2)

Note
-----
   - dynamic sampling used for this statement (level=2)

So, the calculated cost of the full table scan access path is quite high compared to the cost of the index access path.

Let’s try an undocumented OPT_ESTIMATE hint to see what happens:

SELECT /*+ OPT_ESTIMATE(TABLE, T6, ROWS=20000) GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

Plan hash value: 4242288932

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     1 (100)|  20000 |00:00:00.06 |    6944 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |  20000 |     1   (0)|  20000 |00:00:00.06 |    6944 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |      1 |     1   (0)|  20000 |00:00:00.02 |    1373 |
----------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Now the calculated Cost of the index access path is just 1 – bad idea for a solution.

Let’s collect statistics on the table with a histogram on column C2 to see what happens:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T6',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR COLUMNS C2 SIZE 254',NO_INVALIDATE=>FALSE)

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

SQL_ID  bxnurhc1nmq2g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2

Plan hash value: 1930642322

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |  1582 (100)|  20000 |00:00:00.05 |    5814 |
|*  1 |  TABLE ACCESS FULL| T6   |      1 |  19685 |  1582   (5)|  20000 |00:00:00.05 |    5814 |
-------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2)

The optimizer now selected to use a full table scan, rather than an index access path.

Let’s check the cost of the index access path:

SELECT /*+ INDEX(T6) GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));

SQL_ID  4nqwgus0ghxmc, child number 0
-------------------------------------
SELECT /*+ INDEX(T6) GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE
C2=:V2

Plan hash value: 4242288932

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  1847 (100)|  20000 |00:00:00.07 |    6944 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |  19685 |  1847   (2)|  20000 |00:00:00.07 |    6944 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |  19685 |    44  (12)|  20000 |00:00:00.02 |    1373 |
----------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

-

The results of the test from 10.2.0.5:

The first portion of the test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = '10053NO10046TraceFindMe';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
 
SET LINESIZE 120
SET PAGESIZE 1000
VAR V2 NUMBER
EXEC :V2:=4
 
SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));
 
SQL_ID  bxnurhc1nmq2g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 4242288932
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     2 (100)|      1 |00:00:00.01 |       4 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 4
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)
 
Note
-----
   - dynamic sampling used for this statement

Notice in the above output that the Cost displayed for the TABLE ACCESS BY INDEX ROWID operation is 2, while on 11.2.0.2 it was 1.

The second portion of the test script on 10.2.0.5:

EXEC :V2:=0
  
SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));
 
SQL_ID  bxnurhc1nmq2g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 4242288932
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     2 (100)|  20000 |00:00:00.04 |    6969 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |      1 |     2   (0)|  20000 |00:00:00.04 |    6969 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |      1 |     1   (0)|  20000 |00:00:00.01 |    1407 |
----------------------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 4
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)
 
Note
-----
   - dynamic sampling used for this statement

No change in the execution plan, as expected.

The third portion of the script on 10.2.0.5:

ALTER SESSION SET TRACEFILE_IDENTIFIER = '1005310046TraceFindMe';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
  
EXEC :V2:=0
 
SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));
 
SQL_ID  bxnurhc1nmq2g, child number 1
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 4242288932
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   209 (100)|  20000 |00:00:00.02 |    6932 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |  20556 |   209   (0)|  20000 |00:00:00.02 |    6932 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |  20556 |    49   (0)|  20000 |00:00:00.01 |    1370 |
----------------------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)
 
Note
-----
   - dynamic sampling used for this statement

Notice in the output above the that while the cost of the TABLE ACCESS BY INDEX ROWID operation is significantly higher than the INDEX RANGE SCAN child operation, the Cost was not high enough to result in a full table scan.

Checking the calculated cost of a full table scan:

SELECT /*+ FULL(T6) GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));
 
SQL_ID  bk8rwkf0ab8h5, child number 0
-------------------------------------
SELECT /*+ FULL(T6) GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 1930642322
 
-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   358 (100)|  20000 |00:00:00.02 |    5815 |
|*  1 |  TABLE ACCESS FULL| T6   |      1 |  20556 |   358   (1)|  20000 |00:00:00.02 |    5815 |
-------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2)
 
Note
-----
   - dynamic sampling used for this statement 

358 is greater than 209, so no full table scan without a hint.

OPT_ESTIMATE hint test:

SELECT /*+ OPT_ESTIMATE(TABLE, T6, ROWS=20000) GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST')); 
 
SQL_ID  bud5avdhnmssf, child number 0
-------------------------------------
SELECT /*+ OPT_ESTIMATE(TABLE, T6, ROWS=20000) GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 4242288932
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     1 (100)|  20000 |00:00:00.04 |    6932 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |  20000 |     1   (0)|  20000 |00:00:00.04 |    6932 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |      1 |     1   (0)|  20000 |00:00:00.01 |    1370 |
----------------------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

Gather statistics on 10.2.0.5 with a histogram on column C2, then repeat the test:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>NULL,TABNAME=>'T6',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR COLUMNS C2 SIZE 254',NO_INVALIDATE=>FALSE) 
 
SELECT /*+ GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));
 
SQL_ID  bxnurhc1nmq2g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 1930642322
 
-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   358 (100)|  20000 |00:00:00.02 |    5815 |
|*  1 |  TABLE ACCESS FULL| T6   |      1 |  19685 |   358   (1)|  20000 |00:00:00.02 |    5815 |
-------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:V2)

The optimizer determined that a full table scan would be most efficient.

Let’s check the cost of the index access path on 10.2.0.5:

SELECT /*+ INDEX(T6) GATHER_PLAN_STATISTICS */
  C1
FROM
  T6
WHERE
  C2=:V2;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +COST'));
 
SQL_ID  4nqwgus0ghxmc, child number 0
-------------------------------------
SELECT /*+ INDEX(T6) GATHER_PLAN_STATISTICS */   C1 FROM   T6 WHERE   C2=:V2
 
Plan hash value: 4242288932
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  1822 (100)|  20000 |00:00:00.02 |    6932 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T6        |      1 |  19685 |  1822   (1)|  20000 |00:00:00.02 |    6932 |
|*  2 |   INDEX RANGE SCAN          | IND_T6_C2 |      1 |  19685 |    39   (0)|  20000 |00:00:00.01 |    1370 |
----------------------------------------------------------------------------------------------------------------
 
Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 0
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:V2)

The Cost of 1822 is greater than the Cost of 358 for the full table scan access path, so that explains why the full table scan access path was selected.





SESSIONS Derived Value Formula – Doc Bug Explains

8 02 2012

February 8, 2012

Default parameter values are great, as long as it is possible to predict the default values, and the defaults are appropriate for the environment in which the defaults are present.  It is sometimes a challenge to remember all of the rules, and exceptions to those rules, that determine the defaults. I had a vague recollection of how the value of the SESSIONS parameter value is derived from the value of the PROCESSES parameter value.  An Oracle Database 11.2.0.2 question recently arrived in my email box from an ERP mailing list.  The original poster (OP) showed the following output in the email:

SQL> show parameter sessions

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
sessions                             integer     624

SQL> show parameter processes

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
processes                            integer     400 

At first glance, I thought that the OP made a mistake with the manual specification of the SESSIONS parameter value when taking into consideration the current value of the PROCESSES parameter value.

A Google search found a couple of helpful discussion threads that seemed to provide a formula that showed how the default value for the SESSIONS parameter is derived.  I decided to also perform a Google search of the Oracle documentation (trying to avoid circular references).   The same formula found in the helpful discussion threads is also included in the documentation, so that lends a bit of credibility to the information.  Let’s take a look at the Oracle Database 11.2 documentation for the SESSIONS parameter:

Default Value: Derived: (1.1 * PROCESSES) + 5

“… You should always set this parameter explicitly to a value equivalent to your estimate of the maximum number of concurrent users, plus the number of background processes, plus approximately 10% for recursive sessions.

Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead.”

A second page from the documentation library included additional information about the PROCESSES parameter:

“Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead…”

If you plan on running 50 user processes, a good estimate would be to set the PROCESSES initialization parameter to 70.”

So, let’s take another look at the PROCESSES value specified by the OP, keeping in mind the information from the documentation.  The OP has a database instance with a PROCESSES value of 400, and with the insight provided by the first documentation quote, that would suggest that the SESSIONS parameter would have been autoset to a value of (1.1 * 400) + 5 = 445.  That formula result coupled with the first paragraph of the second quote might lead someone to believe that the OP manually set the SESSIONS value to 624 – there could be a good reason for doing so.  The second paragraph in the second quote from the documentation suggests that OP’s PROCESSES value would allow about (400 – (70 – 50)) =  380 user processes  (each session will require 1 dedicated process, so that value would permit about 380 user sessions).

In an earlier blog article we saw that the auto-tuned value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter is dependent, in part, on the value of the SESSIONS parameter and the size of the SGA.  In another blog article we found that the maximum number of blocks read by serial direct path read is the greatest power of 2 that is equal to or less than the value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter.  Various other items are also affected by the SESSIONS parameter value, so setting the value to too high of a value has few rewards.

So, did the OP make a mistake when setting the SESSIONS parameter value?  I took a look at a test 11.2.0.2 database instance where I had set the PROCESSES parameter to a value of 150 and left the SESSIONS parameter unset.  The SESSIONS parameter value in the 11.2.0.2 database instance showed a value of 248 when I opened the database and executed:

SHOW PARAMETER SESSIONS

Something a bit stange is happening.  Oracle Database 10.2 seemed to respect the 11.2 documentation’s formula for the auto-calculation of the SESSIONS parameter, but Oracle Database 11.2.0.2 seems to use a different formula.  It was late at night when I posted my reply, so I guessed that the formula was about (1.52 * PROCESSES) + 20 (it was too late at night for thinking in terms of basic algebra, so I just picked a set of numbers that worked with my test PROCESSES and SESSIONS parameter values).

Since responding to the email, I took a sampling of a couple of 11.2.0.2 database instances, and summarized those findings below, including the actual derived SESSIONS value and the calculated value using the formula from the documentation:

PROCESSES SESSIONS
(Actual Derived)
SESSIONS
(Documentation Derived)
330 520 368
150 248 170
400 624 445
60 112 71
80 144 93

The actual numbers don’t quite match what the documentation suggests.

Time for a little basic algebra.  The formula from the documentation could be represented as:

(x * PROCESSES) + y = SESSIONS

If the structure of the above formula is still correct, just with unknown values for x and y, we will need at least two sets of equations to determine the value of the x and y variables.  To find the x and y variable values, we will use the known PROCESSES and actual derived SESSIONS values from first two rows of the above table:

(x * 330) + y = 520
(x * 150) + y = 248

Solving for y in the first equation:

y = 520 - (x * 330)
y = 520 - 330x

Substituting the above value of y into the second equation to solve for x:

(x * 150) + (520 - 330x) = 248
150x + 520 - 330x = 248
-180x = -272
x = 272/180
x = 1.51111111

Plug the value for x into the above equation for y:

y = 520 - 330x
y = 520 - 330*(1.51111111)
y = 21.3333337

So, based on the values from the first two rows from the above table, in Oracle Database 11.2.0.2:

SESSIONS = (1.51111111 * PROCESSES) + 21.3333337

The above formula works for the first, second, and fourth rows in the above table, but the formula is a bit off for the remaining two rows:

PROCESSES SESSIONS
(Actual Derived)
SESSIONS
(Documentation Derived)
   Formula 1
330 520 368   520
150 248 170   248
400 624 445   625.77778
60 112 71   112
80 144 93   142.22222

Let’s try solving the formula again using the values from the two rows that did not agree with the new formula for the derived SESSIONS parameter value:

(x * 400) + y = 624
(x * 80) + y = 144

Solving for y in the first equation:

y = 624 - (x * 400)
y = 624 - 400x

Substituting the above value of y into the second equation to solve for x:

(x * 80) + (624 - 400x) = 144
80x - 400x + 624 = 144
-320x = -480
x = 480/320
x = 1.5

Plug the value for x into the above equation for y:

y = 624 - 400 * 1.5
y = 624 - 600
y = 24

The above result leads to the following formula:

SESSIONS = (1.5 * PROCESSES) + 24

Let’s take a look at the values generated by the two formulas side-by-side with the actual derived SESSIONS values:

PROCESSES SESSIONS
(Actual Derived)
SESSIONS
(Documentation Derived)
  Formula 1 Formula 2
330 520 368   520 519
150 248 170   248 249
400 624 445   625.77778 624
60 112 71   112 114
80 144 93   142.22222 144

Interesting results… Doc Bug?  Why three formulas (plus the guess that I wrote in the email)?  No wonder it is sometimes difficult to remember the default derived values for parameters.

What about this suggestion from the documentation:

“If you plan on running 50 user processes, a good estimate would be to set the PROCESSES initialization parameter to 70.”

That suggestion might also be out of date.  From a generic test 11.2.0.2 Standard Edition database instance running on Windows:

SELECT
  COUNT(*)
FROM
  V$SESSION
WHERE
  USERNAME IS NULL;

COUNT(*)
--------
      27

SELECT
  PROGRAM
FROM
  V$SESSION
WHERE
  USERNAME IS NULL
ORDER BY
  PROGRAM;

PROGRAM
-----------------
ORACLE.EXE (ARC0)
ORACLE.EXE (ARC1)
ORACLE.EXE (ARC2)
ORACLE.EXE (ARC3)
ORACLE.EXE (CJQ0)
ORACLE.EXE (CKPT)
ORACLE.EXE (DBRM)
ORACLE.EXE (DBW0)
ORACLE.EXE (DIA0)
ORACLE.EXE (DIAG)
ORACLE.EXE (GEN0)
ORACLE.EXE (J000)
ORACLE.EXE (J001)
ORACLE.EXE (LGWR)
ORACLE.EXE (MMAN)
ORACLE.EXE (MMNL)
ORACLE.EXE (MMON)
ORACLE.EXE (PMON)
ORACLE.EXE (PSP0)
ORACLE.EXE (Q001)
ORACLE.EXE (Q002)
ORACLE.EXE (QMNC)
ORACLE.EXE (RECO)
ORACLE.EXE (SMCO)
ORACLE.EXE (SMON)
ORACLE.EXE (VKTM)
ORACLE.EXE (W000) 

Interesting results… Doc Bug?  So this generic Standard Edition database has 27 background processes.  Maybe this page in the documentation needs a little work also?  Ah, Doc Bug explains it.





Oracle Query Optimizer Vanishing Acts

3 02 2012

February 3, 2012

A couple of days ago I noticed an interesting thread in the comp.databases.oracle.server Usenet group that described a problem of vanishing tables.  The title of the thread certainly caught my attention, and I was a bit disappointed when I found that the there was little to no magic involved in the vanishing act.  The situation reported in the thread was not that the table and its data simply disappeared, but instead that a table referenced in a SQL statement simply did not appear in an execution plan.  While not quite as entertaining as the TV show Magic’s Biggest Secrets Finally Revealed, the thread is worth reading, with discussion of enhancements provided in recent Oracle Database releases.

Almost two years ago I wrote a blog article that showed a similar vanishing act, where a SQL statement referencing the same table eight times triggered an interesting enhancement:

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0000000' AND 'A 1000000'; 

Oracle Database 11.1.0.6 behaved as expected, where table T5 was included eight times in the execution plan for the above SQL statement.  Oracle Database 11.2.0.1 output the following execution plan for the above SQL statement:

Plan hash value: 2002323537

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   1000K|00:00:00.50 |   26055 |
|*  1 |  TABLE ACCESS FULL| T5   |      1 |   1000K|   1000K|00:00:00.50 |   26055 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("T51"."C1">='A 0000000' AND "T51"."C1"<='A 1000000')) 

A fine example of becoming eight times as efficient, while producing the same output.

Coincidentally, the same day that I saw the above mentioned Usenet thread, an email arrived from an ERP mailing list.  The original poster (OP) in the ERP mailing list reported that a SQL statement similar to the following was possibly causing a report to fail when the report was viewed by a typical ERP user:

SELECT
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ; 

Notice in the above that there is no declared join between the CUSTOMERS and CUSTOMER_ORDERS tables.  The ID column in each table is the primary key column, so at most one row will be returned from each table.  There is a declared foreign key constraint on the CUSTOMER_ORDERS.CUSTOMER_ID column that points to the CUSTOMERS.ID column (you might be wondering if that foreign key constraint might generate an additional predicate in the optimized version of the WHERE clause).  I suppose that if we want to be technical, we could state that the query creates a Cartesian join between the CUSTOMERS and CUSTOMER_ORDERS table, but in this case I do not see this Cartesian join as a problem since each row source will return at most one row.

Let’s try a quick experiment in SQL*Plus with Oracle Database 11.2.0.2 to see what the execution plan looks like for the above SQL statement:

EXEC :CO_CUSTOMER_ID:='CLA/COM/STA'
EXEC :CO_ID:='10002'

SET LINESIZE 120
SET PAGESIZE 1000

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Plan hash value: 31577051

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       5 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0021619    |      1 |      1 |      1 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0021612    |      1 |      1 |      1 |00:00:00.01 |       1 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID) 

So, the optimizer managed to recognize that the SQL statement is requesting a single row (the E-Rows column), where each row in the outer table (CUSTOMER_ORDERS) is expected to cause the retrieval of one row from the inner table (CUSTOMERS), and that prediction was accurate based on the values shown in the Starts and A-Rows columns.

Another reader of the ERP mailing list mentioned that the joins between tables should always be specified in SQL statements.  While logically correct, I also recall reading in the “Cost-Based Oracle Fundamentals” book about an interesting side-effect of transitive closure, where join conditions between tables in SQL statements might automatically vanish during query optimization; based on the information found in the book, the vanishing act is Oracle Database version dependent and the behavior may be affected by certain initialization parameters (Oracle Database 10.2 is said to not remove the join condition by default).  The OP in the ERP mailing list is running Oracle Database 8.1.0.7.  If we did modify the query to specify the join condition C.ID=O.CUSTOMER_ID, would that join condition still appear in the “optimized” version of the SQL statement after transitive closure on Oracle Database 8.1.0.7?  What about Oracle Database 9.0.1, 9.2.0.8, 10.2.0.5, or even something more recent such as 11.2.0.2?

Why guess, when you can easily set up a test case script?  First, we will create two sample tables with a declared foreign key relationship.  Each table will be created with a FILLER column that is declared as a VARCHAR2(200) – that column will substitute for the various other columns (in the production version of the tables) that typically consume roughly 200 characters per row:

CREATE TABLE CUSTOMERS (
  ID VARCHAR2(15) NOT NULL ENABLE,
  NAME VARCHAR2(50),
  CONTACT_FIRST_NAME VARCHAR2(30),
  CONTACT_LAST_NAME VARCHAR2(30),
  CONTACT_PHONE VARCHAR2(20),
  CONTACT_FAX VARCHAR2(20),
  TERRITORY VARCHAR2(15),
  TAX_ID_NUMBER VARCHAR2(25),
  FILLER VARCHAR2(200),
  PRIMARY KEY (ID));

CREATE TABLE CUSTOMER_ORDERS (
  ID VARCHAR2(15) NOT NULL ENABLE,
  CUSTOMER_ID VARCHAR2(15) NOT NULL ENABLE,
  CONTACT_FIRST_NAME VARCHAR2(30),
  CONTACT_LAST_NAME VARCHAR2(30),
  CONTACT_PHONE VARCHAR2(20),
  CONTACT_FAX VARCHAR2(20),
  FILLER VARCHAR2(200),
  PRIMARY KEY (ID),
  CONSTRAINT CHK_CUST FOREIGN KEY (CUSTOMER_ID)
    REFERENCES CUSTOMERS (ID) ENABLE); 

Next, we will insert a combination of reproducible and random data into the two tables (with 1,000 rows being inserted into the CUSTOMERS table and 500,000 rows being inserted into the CUSTOMER_ORDERS table), create an index on the foreign key column in the CUSTOMER_ORDERS table, and collect table and index statistics:

INSERT INTO
  CUSTOMERS
SELECT
  RPAD(CHR(MOD(ROWNUM-1,26)+65),8,CHR(MOD(ROWNUM,26)+65))||TO_CHAR(ROWNUM) ID,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),30,CHR(MOD(ROWNUM,20)+65))||TO_CHAR(ROWNUM) NAME,
  RPAD(CHR(MOD(ROWNUM-1,26)+65),10,CHR(MOD(ROWNUM,26)+96)) CONTACT_FIRST_NAME,
  RPAD(CHR(MOD(ROWNUM+1,26)+65),10,CHR(MOD(ROWNUM+2,26)+96)) CONTACT_LAST_NAME,
  '###-###-####' CONTACT_PHONE,
  '###-###-####' CONTACT_FAX,
  DBMS_RANDOM.STRING('A',10) TERRITORY,
  DBMS_RANDOM.STRING('A',20) TAX_ID_NUMBER,
  RPAD('A',200,'A') FILLER
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

INSERT INTO
  CUSTOMER_ORDERS
SELECT /*+ LEADING(R) */
  TO_CHAR(ROWNUM+10000) ID,
  C.ID CUSTOMER_ID,
  C.CONTACT_FIRST_NAME,
  C.CONTACT_LAST_NAME,
  C.CONTACT_PHONE,
  C.CONTACT_FAX,
  C.FILLER
FROM
  CUSTOMERS C,
  (SELECT
     ROWNUM RN
   FROM
     DUAL
   CONNECT BY
     LEVEL<=500) R;

COMMIT;

CREATE INDEX IND_CO_CUSTOMER ON CUSTOMER_ORDERS (CUSTOMER_ID);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'CUSTOMERS',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'CUSTOMER_ORDERS',CASCADE=>TRUE) 

In the test script that follows, I will use the customer order 15000.  In my test case tables, the customer ID for this customer order is LMMMMMMM1000, but your CUSTOMER_ORDERS table might have a different customer ID for that row.  Let’s see which customer ID is associated with customer order 15000:

SELECT
  CUSTOMER_ID
FROM
  CUSTOMER_ORDERS
WHERE
  ID='15000';

CUSTOMER_ID
---------------
LMMMMMMM1000 

The heart of the test script follows.  In the script, change the value LMMMMMMM1000 where the bind variable value is set so that the value matches the CUSTOMER_ID that was returned by the above SQL statement, and set the OPTIMIZER_FEATURES_ENABLE parameter value to the default value for your database version.  The test script first includes the SQL statement that appeared in the OP’s email, followed by a modified version of the SQL statement that also includes the table join condition C.ID=O.CUSTOMER_ID, and a third SQL statement that joins the foreign key column in the CUSTOMER_ORDERS table with the primary key column in the CUSTOMERS table (in substitution of explicitly specifying the C.ID = :CO_CUSTOMER_ID predicate in the WHERE clause).  After each SQL statement is executed, the execution plan is retrieved for that SQL statement.  Will the execution plans for the three SQL statements be the same, or will the execution plans for one or more SQL statements differ?:

VARIABLE CO_CUSTOMER_ID VARCHAR2(15)
VARIABLE CO_ID VARCHAR2(15)

EXEC :CO_CUSTOMER_ID:='LMMMMMMM1000'
EXEC :CO_ID:='15000'

SET LINESIZE 120
SET PAGESIZE 1000

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='11.2.0.2';

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID ;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  C.ID = :CO_CUSTOMER_ID
  AND O.ID = :CO_ID
  AND C.ID=O.CUSTOMER_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */
  C.TERRITORY,
  C.TAX_ID_NUMBER,
  O.CONTACT_FIRST_NAME,
  O.CONTACT_LAST_NAME,
  O.CONTACT_PHONE,
  O.CONTACT_FAX
FROM
  CUSTOMERS C,
  CUSTOMER_ORDERS O
WHERE
  O.ID = :CO_ID
  AND C.ID=O.CUSTOMER_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); 

Repeat the above script several times, using progressively older Oracle Database versions in the ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE line of the script (I used 11.2.0.2, 10.2.0.5, 9.2.0.8, 9.0.1, and 8.1.7).  Will the execution plans remain the same for the various executions of the script, or will the value of the OPTIMIZER_FEATURES_ENABLE parameter impact the execution plan?

Compare your results with those that I obtained below (to shorten the output, I removed the non-essential row that was returned by the SQL statements).

With OPTIMIZER_FEATURES_ENABLE=11.2.0.2:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID ;

SQL_ID  0vy1zt7jq25fs, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID
 14    AND C.ID=O.CUSTOMER_ID;

SQL_ID  6suwfw1m251j0, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    O.ID = :CO_ID
 13    AND C.ID=O.CUSTOMER_ID;

SQL_ID  abbq9g44p111b, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   O.ID =
:CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"="O"."CUSTOMER_ID") 

Notice in the above output that while the SQL_ID changes for the three versions of the SQL statement, the Plan hash value remains 1588498623 (you might obtain a different constant value, such as 3347798118).  The consistent Plan hash value does NOT mean that the Predicate Information section of the execution plan output is identical, nor does it mean that the estimated number of rows will be the same.  If you closely examine the Predicate Information section of the second SQL statement, you might notice that the 11.2.0.2 optimizer introduced an additional predicate, while at the same time removing the explicit join condition between the two tables.  Take a look at the E-Rows column in the last of the execution plans – might this be a sign of a bug.  Should the optimizer really predict that each row of the CUSTOMER_ORDERS table should return 1000 rows from the CUSTOMERS table when there is a declared foreign key relationship that points to the primary key column of the CUSTOMERS table, and is there a chance that this type of bad prediction might adversely affect the execution plans of other SQL statements?

OPTIMIZER_FEATURES_ENABLE=10.2.0.5:
(same as 11.2.0.2)

OPTIMIZER_FEATURES_ENABLE=9.2.0.8
(same as 11.2.0.2)

OPTIMIZER_FEATURES_ENABLE=9.0.1
(same as 11.2.0.2)

With OPTIMIZER_FEATURES_ENABLE=8.1.7:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID ;

SQL_ID  0vy1zt7jq25fs, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

Note
-----
   - cpu costing is off (consider enabling it)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    C.ID = :CO_CUSTOMER_ID
 13    AND O.ID = :CO_ID
 14    AND C.ID=O.CUSTOMER_ID;

SQL_ID  6suwfw1m251j0, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   C.ID =
:CO_CUSTOMER_ID   AND O.ID = :CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)

Note
-----
   - cpu costing is off (consider enabling it)

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C.TERRITORY,
  3    C.TAX_ID_NUMBER,
  4    O.CONTACT_FIRST_NAME,
  5    O.CONTACT_LAST_NAME,
  6    O.CONTACT_PHONE,
  7    O.CONTACT_FAX
  8  FROM
  9    CUSTOMERS C,
 10    CUSTOMER_ORDERS O
 11  WHERE
 12    O.ID = :CO_ID
 13    AND C.ID=O.CUSTOMER_ID;

SQL_ID  abbq9g44p111b, child number 4
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C.TERRITORY,   C.TAX_ID_NUMBER,
 O.CONTACT_FIRST_NAME,   O.CONTACT_LAST_NAME,   O.CONTACT_PHONE,
O.CONTACT_FAX FROM   CUSTOMERS C,   CUSTOMER_ORDERS O WHERE   O.ID =
:CO_ID   AND C.ID=O.CUSTOMER_ID

Plan hash value: 1588498623

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C008716     |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C008713     |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"="O"."CUSTOMER_ID")

Note
-----
   - cpu costing is off (consider enabling it) 

Other than the note cpu costing is off (consider enabling it), the output with the OPTIMIZER_FEATURES_ENABLED parameter set to 8.1.7 is identical to the output when that parameter was set to 11.2.0.2.

How about testing the real thing… actually executing the SQL statements in the test case script on an older version of Oracle Database – will you receive the same execution plans as shown above?  I tested Oracle Database 10.2.0.5 and obtained the same execution plans as I saw with 11.2.0.2.  Anyone with access to Oracle Database 10.1, 9.2, 9.0.1, or 8.1.7 that is able to test the above script?  The DBMS_XPLAN.DISPLAY_CURSOR function is not available in Oracle Database versions prior to 10.1, so you will need to be creative to display the execution plan (AUTOTRACE may show the wrong execution plan - you might experiment with a solution offered by Tanel Poder for Oracle Database 9i).

As a reminder, you may post execution plans in a blog comment by enclosing the execution plan inside <pre> </pre> tags:

<pre>
Plan hash value: 3347798118
----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                |                 |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_ORDERS |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0042378    |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CUSTOMERS       |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0042375    |      1 |      1 |      1 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("O"."CUSTOMER_ID"=:CO_CUSTOMER_ID)
   3 - access("O"."ID"=:CO_ID)
   5 - access("C"."ID"=:CO_CUSTOMER_ID)
</pre>







Follow

Get every new post delivered to your Inbox.

Join 139 other followers