Plan Cardinality Estimates Problem with 11.1.0.7 and 11.2.0.1

15 02 2010

February 15, 2010

Here is a fun test on Oracle Database 11.1.0.7 and 11.2.0.1 (all posted tests are from 11.2.0.1).

First, we will create a table and collect statistics for the table with indexes on a couple of the columns and histograms on a couple of the columns:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 NUMBER NOT NULL,
  C4 NUMBER NOT NULL,
  C5 VARCHAR2(30) NOT NULL,
  C6 VARCHAR2(30) NOT NULL,
  FILLER VARCHAR2(200),
  PRIMARY KEY (C1));

INSERT INTO T1
SELECT
  ROWNUM,
  ROWNUM,
  TRUNC(ROWNUM/100+1),
  TRUNC(ROWNUM/100+1),
  CHR(65+TRUNC(ROWNUM/10000))||TRUNC(ROWNUM/100+1),
  CHR(65+TRUNC(ROWNUM/10000))||TRUNC(ROWNUM/100+1),
  LPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T1_C3 ON T1(C3);
CREATE INDEX IND_T1_C4 ON T1(C4);
CREATE INDEX IND_T1_C5 ON T1(C5);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR COLUMNS SIZE 254 C2, C4, C6')

Just for a review of that GATHER_TABLE_STATS procedure, from the documentation:

When gathering statistics on a table, DBMS_STATS gathers information about the data distribution of the columns within the table. The most basic information about the data distribution is the maximum value and minimum value of the column. However, this level of statistics may be insufficient for the optimizer’s needs if the data within the column is skewed. For skewed data distributions, histograms can also be created as part of the column statistics to describe the data distribution of a given column. Histograms are described in more details in “Viewing Histograms“.

Histograms are specified using the METHOD_OPT argument of the DBMS_STATS gathering procedures. Oracle recommends setting the METHOD_OPT to FOR ALL COLUMNS SIZE AUTO. With this setting, Oracle Database automatically determines which columns require histograms and the number of buckets (size) of each histogram. You can also manually specify which columns should have histograms and the size of each histogram.

The table created by the above script will have 100,000 rows with indexes on columns C3, C4, and C5.  Columns C2, C4, and C6 will have histograms with 254 buckets.  Let’s see the maximum values for the table columns (column C2 values are identical to C1, column C4 values are identical to C3, column C6 values are identical to C5)):

SELECT
  MAX(C1) MAX_C1,
  MAX(C3) MAX_C3,
  MAX(C5) MAX_C5
FROM
  T1;

MAX_C1     MAX_C3 MAX_C5
------ ---------- ------
100000       1001 K1001

The maximum value for column C3 is 1001 – most of the distinct values in that column have 100 matching rows, except for the value 1001 which has a single row.  We will try a couple of tests with my VBS tool for automatically generating DBMS_XPLANs, with the DBMS_XPLAN Type set to “ALLSTATS LAST” and Stats Level set to “ALL“.  First, we will try the maximum value for column C3 (and also C4):

SELECT
  *
FROM
  T1
WHERE
  C3=1001;

The returned execution plan is displayed below (after this point the execution plan will be displayed directly below the SQL statement):

SQL_ID  0vcvak7bgbdzt, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3=1001

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      1 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   1000 |      1 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    400 |      1 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3"=1001)

OK, it was more or less expected that the cardinality estimate would be incorrect here because there is no histogram on column C3.  It is a bit odd that the optimizer predicts that the index will return 400 rows that then causes 1000 rows to be returned from the table.  Now let’s try the same SQL statement referencing column C4:

SELECT
  *
FROM
  T1
WHERE
  C4=1001;

SQL_ID  d3zfa447tsjrz, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4=1001

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      1 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    100 |      1 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |    100 |      1 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=1001)

100 is closer to the actual number of rows than were predicted for the same SQL statement using column C3, so the histogram probably helped.  What happens when we specify values for C3 and C4 that exceed the maximum values in those columns?

SELECT
  *
FROM
  T1
WHERE
  C3=1101;

SQL_ID  7hy399svng33n, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3=1101

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   1000 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    400 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3"=1101)

-

SELECT
  *
FROM
  T1
WHERE
  C4=1101;

SQL_ID  at676unwj7uk1, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4=1101

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |     90 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |     90 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=1101)

In the above, the SQL statement that accesses column C3 continued to estimate the same number of rows would be returned when the value exceeded the maximum value for the column by roughly 10%.  When the same restriction was applied to column C4, the optimizer predicted that 10% less rows would be returned.  Interesting…

SELECT
  *
FROM
  T1
WHERE
  C3=1201;

SQL_ID  f94b21zwvsn11, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3=1201

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   1000 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    400 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3"=1201)

-

SELECT
  *
FROM
  T1
WHERE
  C4=1201;

SQL_ID  4sf3hjx44u1sn, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4=1201

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |     80 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |     80 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=1201)

Exceeding the maximum value by 20% returned the same cardinality estimate for the SQL statement using column C3 as seen earlier, while the cardinality estimate decreased by 20% for the SQL statement accessing column C4.

SELECT
  *
FROM
  T1
WHERE
  C3=1901;

SQL_ID  86z7nbb5u2p26, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3=1901

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   1000 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    400 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3"=1901)

-

SELECT
  *
FROM
  T1
WHERE
  C4=1901;

SQL_ID  08rg4uf562h4x, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4=1901

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |     10 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |     10 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=1901)

When exceeding the maximum value by 90% we see the same pattern, the cardinality estimates for the first SQL statement were unaffected, while the cardinality estimate for the second SQL statement decreased by 90%.  With a value of 2001 specified for column C4 the optimizer’s predicted cardinality decreased to 1 row.

What about value ranges?

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN 1101 AND 1201;

SQL_ID  95zzq8vb523gf, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN 1101 AND 1201

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    250 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    450 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3">=1101 AND "C3"<=1201)

Notice that the optimizer is estimating that 250 rows will be returned, which is less than the sum of the estimated 1,000 rows that would be returned for the value 1101 and the estimated 1,000 rows for value 1201.  Let’s try again specifying column C4:

SELECT
  *
FROM
  T1
WHERE
  C4 BETWEEN 1101 AND 1201;

SQL_ID  9t2fv8dcz7jsv, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4 BETWEEN 1101 AND 1201

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    100 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |     90 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4">=1101 AND "C4"<=1201)

Interesting – the index is predicted to return 90 rows which then causes the table to return 100 rows (edit: Oracle 10.2.0.2 predicts 90 rows for both the index and the table).  Let’s try again specifying ranges that exceed the maximum values by 90% to 100%:

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN 1901 AND 2001;

SQL_ID  99btm350uvvbp, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN 1901 AND 2001

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    250 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    450 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3">=1901 AND "C3"<=2001)

-

SELECT
  *
FROM
  T1
WHERE
  C4 BETWEEN 1901 AND 2001;

SQL_ID  bnwyf98m74q26, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4 BETWEEN 1901 AND 2001

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    100 |      0 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |     10 |      0 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4">=1901 AND "C4"<=2001)

The first query is predicting the same number of rows will be returned as seen earlier.  The second query is again predicting that 100 rows will be returned from the table as a result of the 10 predicted rows that will be returned from the index (edit: Oracle 10.2.0.2 predicts 10 rows for both the index and the table).  When the range was changed to 1101 through 2001, the same cardinality estimates displayed for the range of 1101 to 1201 were again returned for both SQL statements.

If the above accurately depicts what happens when the maximum recorded value for a column is exceeded, what might happen when statistics are not gathered on a regular basis?  Hold that thought.

Let’s try again using a value range for C3 and C4 that are not beyond the maximum values for the columns:

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN 101 AND 201;

SQL_ID  8jd9h693mbkvc, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN 101 AND 201

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10100 |00:00:00.02 |     547 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    250 |  10100 |00:00:00.02 |     547 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    450 |  10100 |00:00:00.01 |     124 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3">=101 AND "C3"<=201)

-

SELECT
  *
FROM
  T1
WHERE
  C4 BETWEEN 101 AND 201;

SQL_ID  2bk0njs0atfpw, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4 BETWEEN 101 AND 201

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10100 |00:00:00.02 |     547 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |  10039 |  10100 |00:00:00.02 |     547 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |  10039 |  10100 |00:00:00.01 |     124 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4">=101 AND "C4"<=201)

Once again, the first SQL statement is estimating that the index range scan will return 450 rows which will then cause the table to return 250 rows.  In actuality, both the index and the table return 10,100 rows (do we possibly have a statistics problem here?).  The second query returns cardinality estimates that are closer to the actual number of rows returned – the histogram helped here.

Let’s try again with a wider value range:

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN 101 AND 1401;

SQL_ID  0bapwrbn3ch8j, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN 101 AND 1401

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  90001 |00:00:00.13 |    4865 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |    250 |  90001 |00:00:00.13 |    4865 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    450 |  90001 |00:00:00.04 |    1090 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3">=101 AND "C3"<=1401)

-

SELECT
  *
FROM
  T1
WHERE
  C4 BETWEEN 101 AND 1401;

SQL_ID  3ppn43z5ukur6, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4 BETWEEN 101 AND 1401

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |  90001 |00:00:00.05 |    4147 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |  89961 |  90001 |00:00:00.05 |    4147 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C4">=101 AND "C4"<=1401))

Again, the cardinality estimates and execution plan are the same as before for the query using column C3, even though we are selecting 90% of the rows in the table.  The query using column C4 switched to a full table scan, and has a much more accurate cardinality estimate.  This same pattern holds true regardless of the values specified for the low and high ends of the range.

Let’s switch back to SQL*Plus and try a couple of experiments with bind variables (note that I am actually submitting the SQL statements using my Toy Project so that I do not have to watch all of the rows scroll on the screen):

VARIABLE N1 NUMBER
VARIABLE N2 NUMBER

ALTER SESSION SET STATISTICS_LEVEL='ALL';

EXEC :N1:=101
EXEC :N2:=1401

SET ARRAYSIZE 100

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN :N1 AND :N2;

...

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  2ksn64btq6fx4, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN :1 AND :2

Plan hash value: 108045900

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  90001 |00:00:00.18 |    4865 |
|*  1 |  FILTER                      |           |      1 |        |  90001 |00:00:00.18 |    4865 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |      1 |    250 |  90001 |00:00:00.13 |    4865 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_C3 |      1 |    450 |  90001 |00:00:00.04 |    1090 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:1<=:2)
   3 - access("C3">=:1 AND "C3"<=:2)

The same execution plan as returned when we used literals (constants), so bind peeking is probably working.  Now let’s try the query that accesses column C4:

SELECT
  *
FROM
  T1
WHERE
  C4 BETWEEN :N1 AND :N2;

...

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  gwgk5h8u3k4tp, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4 BETWEEN :1 AND :2

Plan hash value: 3332582666

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |  90001 |00:00:00.10 |    4147 |
|*  1 |  FILTER            |      |      1 |        |  90001 |00:00:00.10 |    4147 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |  89961 |  90001 |00:00:00.05 |    4147 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:1<=:2)
   2 - filter(("C4">=:1 AND "C4"<=:2))

Bind variable peeking was used to obtain a close estimate for the cardinalities.  So, what happens when we change the value of bind variable N2?

EXEC :N2:=101

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN :N1 AND :N2;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  2ksn64btq6fx4, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN :1 AND :2

Plan hash value: 108045900

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    100 |00:00:00.01 |       7 |
|*  1 |  FILTER                      |           |      1 |        |    100 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |      1 |    250 |    100 |00:00:00.01 |       7 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_C3 |      1 |    450 |    100 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:1<=:2)
   3 - access("C3">=:1 AND "C3"<=:2)

-

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN :N1 AND :N2;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  gwgk5h8u3k4tp, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4 BETWEEN :1 AND :2

Plan hash value: 3332582666

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100 |00:00:00.01 |    3263 |
|*  1 |  FILTER            |      |      1 |        |    100 |00:00:00.01 |    3263 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |  89961 |    100 |00:00:00.01 |    3263 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   2 - filter(("C4">=:1 AND "C4"<=:2))

The bind variable in the presence of bind peeking with a histogram came back to bit us – ideally Oracle would have used an index range scan for this set of bind variables.

Let’s add another 10,000 rows to the table without gathering statistics and see what happens to the cardinality estimates in the plans:

INSERT INTO T1
SELECT
  (100000+ROWNUM),
  (100000+ROWNUM),
  TRUNC((100000+ROWNUM)/100+1),
  TRUNC((100000+ROWNUM)/100+1),
  CHR(65+TRUNC((100000+ROWNUM)/10000))||TRUNC((100000+ROWNUM)/100+1),
  CHR(65+TRUNC((100000+ROWNUM)/10000))||TRUNC((100000+ROWNUM)/100+1),
  LPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

ALTER SYSTEM FLUSH SHARED_POOL;

Here are the execution plans:

SELECT
  *
FROM
  T1
WHERE
  C3=1901;

SQL_ID  86z7nbb5u2p26, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3=1901

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |    100 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   1000 |    100 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |    400 |    100 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3"=1901)

-

SELECT
  *
FROM
  T1
WHERE
  C4=1901;

SQL_ID  08rg4uf562h4x, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C4=1901

Plan hash value: 7035821

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |    100 |00:00:00.01 |       8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |     10 |    100 |00:00:00.01 |       8 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C4 |      1 |     10 |    100 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C4"=1901)

The cardinality estimates are unchanged from what we saw earlier.  The first query is estimating 10 times too many rows will be returned, and the second query is estimating 10 times too few rows will be returned.  Let’s delete the rows that were just added and check a couple of statistics:

DELETE FROM
  T1
WHERE
  C1>100000;

100000 rows deleted.

COMMIT;

SELECT
  SUBSTR(COLUMN_NAME,1,3) COL,
  DENSITY,
  NUM_BUCKETS,
  LAST_ANALYZED
FROM
  DBA_TAB_COLUMNS
WHERE
  TABLE_NAME='T1'
ORDER BY
  1;

COL    DENSITY NUM_BUCKETS LAST_ANAL
--- ---------- ----------- ---------
C1
C2      .00001         254 14-FEB-10
C3
C4   .00099998         254 14-FEB-10
C5
C6   .00099998         254 14-FEB-10
FIL

No density values for columns C1, C3, or C5 and the LAST_ANALYZED column is NULL for those same entries (edit: same results on Oracle 10.2.0.2).  Let’s try collecting statistics again without specifying the columns for which histograms should be created:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,NO_INVALIDATE=>FALSE)

SELECT
  SUBSTR(COLUMN_NAME,1,3) COL,
  DENSITY,
  NUM_BUCKETS,
  LAST_ANALYZED
FROM
  DBA_TAB_COLUMNS
WHERE
  TABLE_NAME='T1'
ORDER BY
  1;

COL    DENSITY NUM_BUCKETS LAST_ANAL
--- ---------- ----------- ---------
C1      .00001           1 14-FEB-10
C2      .00001           1 14-FEB-10
C3  .000999001           1 14-FEB-10
C4  .000999001           1 14-FEB-10
C5  .000999001           1 14-FEB-10
C6  .000999001           1 14-FEB-10
FIL          1           1 14-FEB-10

Now we have no histograms, but the density and LAST_ANALYZED columns are populated for all rows returned by the above query (edit: same results on Oracle 10.2.0.2).  For fun let’s retry one of the queries that returned odd cardinality estimates earlier:

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN 101 AND 201;

SQL_ID  8jd9h693mbkvc, child number 0
-------------------------------------
SELECT    *  FROM    T1  WHERE    C3 BETWEEN 101 AND 201

Plan hash value: 1220227203

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10100 |00:00:00.02 |     547 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |  10200 |  10100 |00:00:00.02 |     547 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C3 |      1 |  10200 |  10100 |00:00:00.01 |     124 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C3">=101 AND "C3"<=201)

The estimated cardinality is just about as accurate as it was with the query that accessed column C4 which had a histogram.

What are your thoughts?  Is the METHOD_OPT parameter of DBMS_STATS.GATHER_TABLE_STATS only used to specify how histograms will be collected for columns, or does that parameter also specify for which columns column-level statistics should be gathered?  If you again collect statistics using the DBMS_STATS.GATHER_TABLE_STATS command found near the start of this article, the query of DBA_TAB_COLUMNS returns the following, which is what was originally expected (edit: same results on Oracle 10.2.0.2):

COL    DENSITY NUM_BUCKETS LAST_ANAL
--- ---------- ----------- ---------
C1      .00001           1 14-FEB-10
C2      .00001         254 14-FEB-10
C3  .000999001           1 14-FEB-10
C4   .00099998         254 14-FEB-10
C5  .000999001           1 14-FEB-10
C6   .00099998         254 14-FEB-10
FIL          1           1 14-FEB-10

——-

Follow-up (8 hours after this article was scheduled to appear):

It is important at times to check more than one source in the Oracle documentation (bold/italic emphasis is mine):

“METHOD_OPT – The value controls column statistics collection and histogram creation.”

Here is an example where Christian Antognini tried to drive that point home:

“Since the syntax is FOR ALL INDEXED COLUMNS, you are gathering statistics for all indexed columns only. I.e. not for all columns. FOR ALL COLUMNS should be used for that…”

“When the option FOR ALL INDEXED COLUMNS is specified, columns statistics are gathered only for the indexed columns.”

“My point was that the parameter method_opt not only impacts histograms, but also column statistics.”

Greg Rahn also made the point in this blog article.

“The METHOD_OPT parameter of DBMS_STATS controls two things:

  1. on which columns statistics will be collected
  2. on which columns histograms will be collected (and how many buckets)”

The message of this blog article is to make certain that you know what the GATHER_TABLE_STATS procedure is actually doing when you use the METHOD_OPT parameter (or alter the default value for the parameter).  The apparent benefit from having the histogram in place might actually be a false benefit – it might be that previously you were not updating the column statistics for that column, until you started creating a histogram on that column (assuming that you were previously collecting histograms on other columns in the table, and you assumed that column statistics were updated for the remaining columns).

Incidentally, if instead of using this to collect statistics for the table:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR COLUMNS SIZE 254 C2, C4, C6')

You used this:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL INDEXED COLUMNS size skewonly')

You would end up with the following when executing the query against DBA_TAB_COLUMNS:

COL    DENSITY NUM_BUCKETS LAST_ANAL
--- ---------- ----------- ---------
C1      .00001           1 15-FEB-10
C2
C3  .000999001           1 15-FEB-10
C4  .000999001           1 15-FEB-10
C5   .00099998         254 15-FEB-10
C6
FIL

The above output confirms Christian Antognini’s comment in the OTN thread – column-level statistics were not collected for columns C2, C6, and FILLER.  There is effectively no histogram on columns C3 and C4 (a single bucket, therefore no histogram), but the cardinality estimates in the plan will be close because column-level statistics are present for columns C3 and C4.


Actions

Information

5 responses

15 02 2010
Narendra

What are your thoughts? Is the METHOD_OPT parameter of DBMS_STATS.GATHER_TABLE_STATS only used to specify how histograms will be collected for columns, or does that parameter also specify for which columns column-level statistics should be gathered?

Is the “intelligence” in statistics collection, that checks SYS.COL_USAGE$, the reason ?

15 02 2010
Charles Hooper

Narendra,

I am not sure that I understand your question. Please see the follow-up to the blog article (at the end of the blog article) to see if your question is answered in that section.

17 03 2010
coskan

I think Narenda tried to explain the issue Hemant covered here

http://hemantoracledba.blogspot.com/2008/08/more-tests-of-colusage.html

Basically when “for all columns size auto” used data distribution and the workload of the columns (column usage) are considered.mentioned in doc .

16 02 2010
Narendra

Charles,

Apologies. My followup point was completely misguided.

18 03 2010
Charles Hooper

Coskan,

Thanks for filling in the missing details, Narendra’s comment makes a little more sense with the links that you provided.

Incidentally, if you do not want to delete from the data dictionary as Hemant did (with appropriate cautions mentioned), you can see whether or not SYS.COL_USAGE$ is used during a statistics collection with an approach like this (using one of the tables created for the March 18, 2010 blog article)”

EXEC DBMS_STATS.DELETE_TABLE_STATS(OWNNAME=>USER,TABNAME=>'USER_PROGRAM_PERMISSION')
 
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'STATISTICS';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 1';
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'USER_PROGRAM_PERMISSION',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO')
 
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

If you pass the resulting trace file through a program, such as TKPROF, you will see a variety of SQL statements with the dep= set to 1 or greater. One of those SQL statements looks something like this:

SELECT /*+ ordered use_nl(o c cu h) index(u i_user1) index(o i_obj2) 
               index(ci_obj#) index(cu i_col_usage$) 
               index(h i_hh_obj#_intcol#) */
  C.NAME COL_NAME, C.TYPE# COL_TYPE, C.CHARSETFORM COL_CSF,
  C.DEFAULT$ COL_DEF, C.NULL$ COL_NULL, C.PROPERTY COL_PROP,
  C.COL# COL_UNUM, C.INTCOL# COL_INUM, C.OBJ# COL_OBJ, C.SCALE COL_SCALE,
  H.BUCKET_CNT H_BCNT, (T.ROWCNT-H.NULL_CNT)/GREATEST(H.DISTCNT,1) H_PFREQ,
  C.LENGTH COL_LEN, CU.TIMESTAMP CU_TIME, CU.EQUALITY_PREDS CU_EP,
  CU.EQUIJOIN_PREDS CU_EJP, CU.RANGE_PREDS CU_RP, CU.LIKE_PREDS CU_LP,
  CU.NONEQUIJOIN_PREDS CU_NEJP, CU.NULL_PREDS NP
FROM
  SYS.USER$ U, SYS.OBJ$ O, SYS.TAB$ T, SYS.COL$ C,
  SYS.COL_USAGE$ CU, SYS.HIST_HEAD$ H 
WHERE
  :B3 = '0' AND U.NAME = :B2 AND O.OWNER# = U.USER# AND O.TYPE# = 2
  AND O.NAME = :B1 AND O.OBJ# = T.OBJ# AND O.OBJ# = C.OBJ#
  AND C.OBJ# = CU.OBJ#(+) AND C.INTCOL# = CU.INTCOL#(+) AND
  C.OBJ# = H.OBJ#(+) AND C.INTCOL# = H.INTCOL#(+)
UNION ALL 
SELECT /*+ ordered use_nl(c) */
  C.KQFCONAM COL_NAME, C.KQFCODTY COL_TYPE, DECODE(C.KQFCODTY, 1, 1, 0) COL_CSF,
  NULL COL_DEF, 0 COL_NULL, 0 COL_PROP, C.KQFCOCNO COL_UNUM, C.KQFCOCNO COL_INUM,
  O.KQFTAOBJ COL_OBJ, DECODE(C.KQFCODTY, 2, -127, 0) COL_SCALE,
  H.BUCKET_CNT H_BCNT, (ST.ROWCNT-NULL_CNT)/GREATEST(H.DISTCNT,1) H_PFREQ,
  DECODE(C.KQFCODTY, 2, 22, C.KQFCOSIZ) COL_LEN, CU.TIMESTAMP CU_TIME,
  CU.EQUALITY_PREDS CU_EP, CU.EQUIJOIN_PREDS CU_EJP, CU.RANGE_PREDS CU_RP,
  CU.LIKE_PREDS CU_LP, CU.NONEQUIJOIN_PREDS CU_NEJP, CU.NULL_PREDS NP
FROM
  SYS.X$KQFTA O, SYS.TAB_STATS$ ST, SYS.X$KQFCO C,
  SYS.COL_USAGE$ CU, SYS.HIST_HEAD$ H
WHERE
  :B3 != '0' AND :B2 = 'SYS' AND O.KQFTANAM = :B1 AND O.KQFTAOBJ = ST.OBJ#(+)
  AND O.KQFTAOBJ = C.KQFCOTOB AND C.KQFCOTOB = CU.OBJ#(+)
  AND C.KQFCOCNO = CU.INTCOL#(+) AND C.KQFCOTOB = H.OBJ#(+)
  AND C.KQFCOCNO = H.INTCOL#(+);

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 142 other followers

%d bloggers like this: