Histograms and Bind Variables, But Why?

29 01 2011

January 29, 2011

In a recent OTN thread a person asked a couple of good questions about why histograms and the use of bind variables sometimes cause problems.  The questions did not ask whether or not one should use histograms on the table columns where those columns often appear in WHERE clauses, with the columns compared to  bind variables.  Instead, the original poster (OP) asked the all important question WHY.  In a previous article I provided my response to an OTN thread where the OP of that thread wanted to use histograms to fix bind peeking problems.

The specific questions asked in the recent OTN thread include:

When a SQL is using bind variables how histograms affect the excution plan?

Why histograms can’t work well with bind variables?

I remember a document mentioned that “do not use histograms when using bind variables”. But why? 

The answers to these questions have been answered many times in articles written by a number of authors, for example:

Rather than point the OP to one of the above articles, I decided instead to create a test case to demonstrate what could happen on Oracle Database 10.2.0.4 (simulated) and 11.2.0.2 when columns that are compared to bind variables in the WHERE clause also have histograms.  Below is my response, slightly reworded:

—-

Histograms can work with bind variables, but the end result is typically not the desired outcome. Bind variables are used to reduce the number of different execution plans. Histograms are used to help the optimizer find what is supposed to be the best execution plan for the supplied predicates, and in the case of bind variables, those are the peeked values of the bind variables. So, if you have a histogram on a column, and for the initial hard parse of the SQL statement the most common value in that column is submitted in the bind variable – the generated execution plan is considered by the optimizer to be the “best” execution plan for the supplied bind variable values. Now assume that instead, the least popular value in the column is specified – the optimizer could produce a very different execution plan for the same SQL statement, one that is optimized for the least popular value (this might be an index range scan, rather than a full table scan). Assume that the execution plan cannot change when the bind variable values change during future executions – if the table column contains a single popular value and many unpopular values, if the initial hard parse is performed with the single popular value, you could find that all future executions of that SQL statement perform full table scans, even when only a couple of rows from the table are selected.

Here is a quick test case on Oracle Database 11.2.0.2 to demonstrate:

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(300));

INSERT INTO
  T1
SELECT
  *
FROM
  (SELECT
    ROWNUM C1,
    DECODE(MOD(ROWNUM,100),99,99,1) C2,
    RPAD('A',300,'A') C3
  FROM
    DUAL
  CONNECT BY
    LEVEL <= 1000000)
ORDER BY
  C2;

CREATE INDEX IND_T1_C2 ON T1(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 254') 

The above created a table with 1,000,000 rows where 99% of the rows have a value of 1 in column C2 and 1% have a value of 99, and the rows are inserted with a perfect clustering factor due to the ORDER BY clause. A histogram was created on the indexed column.

Let’s try a test, we will pick an unpopular value of 2 for the bind variable when the query is initially hard parsed:

VARIABLE N1 NUMBER
EXEC :N1:=2

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2
FROM
  T1
WHERE
  C2 = :N1;

no rows selected

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       3 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   5957 |      0 |00:00:00.01 |       3 |      1 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   5957 |      0 |00:00:00.01 |       3 |      1 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:N1) 

So, there were no rows selected, the optimizer predicted that 5,957 rows would be returned, and an index access path was selected for data retrieval. Would this index access path also be appropriate for a bind variable value of 1? Let’s continue the test, this time picking the value 99 for the bind variable:

EXEC :N1:=99
SET TIMING ON

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2
FROM
  T1
WHERE
  C2 = :N1;

...
10000 rows selected.

Elapsed: 00:00:05.35

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10000 |00:00:00.02 |    1783 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   5957 |  10000 |00:00:00.02 |    1783 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   5957 |  10000 |00:00:00.01 |     690 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:N1) 

Once again, the execution plan shows that the optimizer predicted 5,957 rows would be retrieved even though 10,000 rows were actually retrieved. Notice also that the child number is still shown as 0, indicating that a hard parse was not performed. Let’s continue the test, this time with a bind variable value of 1:

EXEC :N1:=1

SET AUTOTRACE TRACEONLY STATISTICS

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2
FROM
  T1
WHERE
  C2 = :N1;

990000 rows selected.

Elapsed: 00:00:18.78

Statistics
---------------------------------------------------
          1  recursive calls
          1  db block gets
     108571  consistent gets
          0  physical reads
         96  redo size
   21958348  bytes sent via SQL*Net to client
     726508  bytes received via SQL*Net from client
      66001  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     990000  rows processed

SET AUTOTRACE OFF 

Because I used AUTOTRACE to prevent the 990,000 rows from scrolling on screen, I have to specify the SQL_ID and CHILD_NUMBER to retrieve the execution plan:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('c7su63uw7nch6',0,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10000 |00:00:00.02 |    1783 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   5957 |  10000 |00:00:00.02 |    1783 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   5957 |  10000 |00:00:00.01 |     690 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:N1) 

That cannot be the execution plan that was used because it still shows that 10,000 rows were retrieved during the last execution, where the AUTOTRACE statistics showed that 990,000 rows were actually retrieved. Let’s try again, this time retrieving the execution plan for CHILD_NUMBER 1:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('c7su63uw7nch6',1,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 1
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    990K|00:00:00.83 |     108K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    988K|    990K|00:00:00.83 |     108K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"=:N1) 

The above shows the actual execution plan that was used (sse the article Explain Plan Lies, Autotrace Lies, TKPROF Lies, What is the Plan?  to see why we cannot use AUTOTRACE or EXPLAIN PLAN to see the actual execution plan). Adaptive cursor sharing (first available with Oracle Database 11.1) stepped in and forced the re-evaluation of the execution plan to prevent a very slow retrieval through the index – that re-evaluation will not happen prior to Oracle Database 11.1 (CURSOR_SHARING=’SIMILAR’ might have the same effect in older Oracle Database releases when literal values are used in the SQL statement).

Just to demonstrate:

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='10.2.0.4';

VARIABLE N1 NUMBER
EXEC :N1:=2

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2
FROM
  T1
WHERE
  C2 = :N1;

no rows selected

Elapsed: 00:00:00.00

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 2
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   5957 |      0 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   5957 |      0 |00:00:00.01 |       3 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:N1) 

Note in the above that the CHILD_NUMBER is now 2 because we changed the optimizer’s execution environment (see the articles How to Determine which First Rows OPTIMIZER_MODE was SpecifiedSELECT Statement is Fast, INSERT INTO Using the SELECT Statement is Brutally Slow 3Reviewing Session-Level Parameters to better understand what might trigger a change in the optimizer’s execution environment).

Continuing:

EXEC :N1:=99
SET TIMING ON

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2
FROM
  T1
WHERE
  C2 = :N1;

10000 rows selected.

Elapsed: 00:00:05.31

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 2
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10000 |00:00:00.02 |    1783 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   5957 |  10000 |00:00:00.02 |    1783 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   5957 |  10000 |00:00:00.01 |     690 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:N1)

The CHILD_NUMBER is still 2, so there was no hard parse.

Continuing:

EXEC :N1:=1

SET AUTOTRACE TRACEONLY STATISTICS

SELECT /*+ GATHER_PLAN_STATISTICS */
  C1,
  C2
FROM
  T1
WHERE
  C2 = :N1;

990000 rows selected.

Elapsed: 00:00:16.91

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
     175927  consistent gets
          0  physical reads
          0  redo size
   21958348  bytes sent via SQL*Net to client
     726508  bytes received via SQL*Net from client
      66001  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     990000  rows processed

SET AUTOTRACE OFF

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('c7su63uw7nch6',2,'ALLSTATS LAST'));

SQL_ID  c7su63uw7nch6, child number 2
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |    990K|00:00:01.63 |     175K|
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   5957 |    990K|00:00:01.63 |     175K|
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   5957 |    990K|00:00:00.68 |   67932 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2"=:N1) 

The above is the execution plan for CHILD_NUMBER 2 – notice that this time it is reporting 990,000 rows retrieved, so this IS the execution plan that was used for the bind variable value that exists in 99% of the table rows.  Adaptive cursor sharing did not take effect and force the re-evaluation of the execution plan – the execution plan was NOT changed to a full table scan. That is the risk that you take if you allow histograms to exist on columns that have unequal distributions of values, bind variables are used in the WHERE clause that references the column, and bind variable peeking is enabled (enabled by default in Oracle Database 9i and above, bind variable peeking is controlled by the hidden parameter _OPTIM_PEEK_USER_BINDS, which defaults to TRUE).


Actions

Information

6 responses

29 01 2011
Amardeep Sidhu

Nice post Charles !

30 01 2011
neerajbhatia

Hi Charles,

What a perfect timing! I was working on part-2 of my “Everything you want to know about Oracle Histogram” series and then yesterday I found your post.

Anyway, it’s very well written and left no scope of improvement for me!

Appreciate your good work.

Cheers, Neeraj

31 01 2011
Centinul

Charles —

Excellent work as usual. There is an option to the DISPLAY_CURSOR format parameter that allows you to display the peeked bind variables. So your command would look like:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS'));

Here is the output from your first run with the +PEEKED_BINDS option:

SQL > VARIABLE N1 NUMBER
SQL > EXEC :N1:=2

PL/SQL procedure successfully completed.

SQL > SELECT /*+ GATHER_PLAN_STATISTICS */
  2    C1,
  3    C2
  4  FROM
  5    T1
  6  WHERE
  7    C2 = :N1;

no rows selected

SQL > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------

SQL_ID  c7su63uw7nch6, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      0 |00:00:00.01 |       3 |   1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   4322 |      0 |00:00:00.01 |       3 |   1 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   4322 |      0 |00:00:00.01 |       3 |   1 |
------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 2

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C2"=:N1)

Now running with this:

SQL > VARIABLE N1 NUMBER
SQL > EXEC :N1:=99</code>

We see:

<code>SQL > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------

SQL_ID  c7su63uw7nch6, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   C1,   C2 FROM   T1 WHERE   C2 =
:N1

Plan hash value: 236868917

------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |  10000 |00:00:00.06 |    1781 |  86 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |   4322 |  10000 |00:00:00.06 |    1781 |  86 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C2 |      1 |   4322 |  10000 |00:00:00.04 |     690 |   0 |
------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 2

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C2"=:N1)

We still see that the peeked value has not changed at all. I think this new section in the execution plan helps further illustrate this point.

This was all run from an 11.2.0.2 environment.

31 01 2011
Charles Hooper

Centinul,
Great idea for an enhancement, and probably would have saved me from having to derive the fact that the execution plan was not designed just for the current bind variable values.

I mentioned the PEEKED_BINDS format parameter here, but oddly in the example at the end of the article the bind variable values did not display:

http://hoopercharles.wordpress.com/2010/03/01/dbms_xplan-format-parameters/

Jonathan Lewis also covered the topic, including a couple of different ways to examine bind variable values used in SQL statements:

http://jonathanlewis.wordpress.com/2008/07/24/bind-capture/

One of those methods involves querying V$SQL_BIND_DATA, which according to the documentation, will only show the bind variables submitted by the current session for SQL statements executed by the current session:

http://download.oracle.com/docs/cd/E14072_01/server.112/e10820/dynviews_3044.htm

2 02 2011
Ravi

The topics not covered by many are 1) how do we do a good capacity planning for oracle DB using the current workload? What is a good scentific way to arrive at the capacity( CPU, memory and I/O) needs of a DB server? Could you please share your thoughts on these topics?

7 02 2011

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 141 other followers

%d bloggers like this: