Parallel Parking – What is causing the Query to be Executed in Parallel?

31 07 2012

July 31, 2012

I encountered an interesting OTN thread that caused me to stop and think for a couple of minutes.  The OP (original poster) in the OTN thread wondered why parallel execution was used for a query when the parallel degree for the table accessed by the query and its indexes were specified to have a parallel degree of 1.  For some reason, while the OP provided DBMS_XPLAN generated execution plan output, the Note section from the output was not included in the OTN post.  With parallel execution enabled in the session (ALTER SESSION ENABLE PARALLEL QUERY;) the execution plan showed a cost of 2,025.  With parallel execution disabled in the session (ALTER SESSION DISABLE PARALLEL QUERY;) the execution plan showed a cost of 36,504.  From this, we can determine that the parallel degree for the query is 20 (36504 / 2025 / 0.9 = 20.03).  The table is partitioned on a date column, apparently partitioned by month.

Interesting?

Could the parallel execution be caused by the PARALLEL_AUTOMATIC_TUNING parameter’s value?  The DBMS_XPLAN output indicates that the OP is using Oracle Database 11.2.0.2, where that parameter is deprecated.  What about the PARALLEL_DEGREE_POLICY parameter?  When that parameter is set to AUTO, the query optimizer is free to plan a parallel execution for a query, even when the tables (and their indexes) accessed by the query are set to a parallel degree of 1.

A table creation script for a couple of tests (note that the table created in the test case is using INTERVAL partitioning using an approach very similar to that found on page 576 in the book Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition except that I have a DATE column in the table rather than a TIMESTAMP column):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  C1 NUMBER,
  C2 VARCHAR2(10),
  C3 DATE,
  C4 VARCHAR2(20))
  PARTITION BY RANGE(C3)
  INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
  ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2007', 'DD-MM-YYYY')));

INSERT INTO
  T1
SELECT
  ROWNUM,
  RPAD(ROWNUM,10,'A'),
  TRUNC(SYSDATE)+MOD(ROWNUM-1,3000),
  RPAD('A',20,'A')
FROM
  (SELECT
    1 C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<100000),
  (SELECT
    1 C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<1000);

COMMIT; 

Let’s see, a Cartesian join between a rowsource with 1,000 rows and a rowsource with 100,000 rows produces (on 11.2.0.1, 11.2.0.2, and 11.2.0.3):

99899001 rows created. 

Interesting, I might have to investigate that oddity at a later time.  I was expecting 1,000 * 100,000 = 100,000,000 rows.

Let’s try a simple test:

select count(*) from t1;

SET LINESIZE 140
SET PAGESIZE 1000

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1

Plan hash value: 2705263620

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       | 59271 (100)|          |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      |   115M| 59271   (2)| 00:07:55 |     1 |1048575|
|   3 |    TABLE ACCESS FULL | T1   |   115M| 59271   (2)| 00:07:55 |     1 |1048575|
-------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

1,048,575 partitions… I was expecting about 100 partitions because there should be 3,000 distinct dates in column C3.  Maybe I should find some time to investigate?  Moving on…

To test the theory that the unexpected parallel execution might be caused by a non-default value for the PARALLEL_DEGREE_POLICY parameter, we need to first use the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure to indicate performance metrics for the server.  To save time, I found a blog article (by another OakTable Network member) that shows a back-door approach that does not require the CALIBRATE_IO procedure (the usual warnings apply here – we are technically modifying objects in the SYS schema).  The back-door approach, when connected as the SYS user (note that the script first selects from RESOURCE_IO_CALIBRATE$ so that if values are currently present, those values may be restored after testing completes):

SELECT
  *
FROM
  RESOURCE_IO_CALIBRATE$;

DELETE FROM
  RESOURCE_IO_CALIBRATE$;

INSERT INTO
  RESOURCE_IO_CALIBRATE$
VALUES(
  CURRENT_TIMESTAMP,
  CURRENT_TIMESTAMP,
  0,
  0,
  200,
  0,
  0);

COMMIT; 

I will start by posting results that I obtained from Oracle Database 11.2.0.3.

Let’s try the previous select from T1 again (note that I am first setting the PARALLEL_DEGREE_POLICY parameter at the session level to its default value):

ALTER SESSION SET PARALLEL_DEGREE_POLICY=MANUAL;

select count(*) from t1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5bc0v4my7dvr5',NULL,'TYPICAL'));

SQL_ID  5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1

Plan hash value: 2705263620

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       | 59271 (100)|          |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      |   115M| 59271   (2)| 00:00:03 |     1 |1048575|
|   3 |    TABLE ACCESS FULL | T1   |   115M| 59271   (2)| 00:00:03 |     1 |1048575|
-------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Let’s try again with a modified value for that parameter:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

select count(*) from t1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5bc0v4my7dvr5',NULL,'TYPICAL'));

SQL_ID  5bc0v4my7dvr5, child number 2
-------------------------------------
select count(*) from t1

Plan hash value: 974985148

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       | 32790 (100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR       |          |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   115M| 32790   (2)| 00:00:02 |     1 |1048575|  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |   115M| 32790   (2)| 00:00:02 |     1 |1048575|  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 2 

In the above, the Note section indicates that the automatic degree of parallelism was set to 2, and that dynamic sampling was performed at level 2.  I have yet to collect statistics on table T1, so that explains why dynamic sampling was used when the query was executed with and without parallel execution.  If you take a close look at the OTN thread that is linked to at the start of this blog article, you will see that the dynamic sampling level was set to 6, but only for the parallel execution.  The simple answer for this change in behavior is that it is expected, as described in this article by Maria Colgan (another OakTable Network member).  We can confirm that the cost calculation is working as expected: 59271 / 32790 / 0.9 = 2.01.

Let’s try executing the query of table T1 using a WHERE clause that is similar to what is found in the OTN thread:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=MANUAL;

select count(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  1ggk1hy0r22kt, child number 0
-------------------------------------
select count(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |   235K(100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     9 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2286K|    19M|   235K(100)| 00:00:10 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2286K|    19M|   235K(100)| 00:00:10 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - dynamic sampling used for this statement (level=2) 

Let’s try again, this time with the non-default value for the PARALLEL_DEGREE_POLICY:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

select count(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  1ggk1hy0r22kt, child number 1
-------------------------------------
select count(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2946076746

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |   835K(100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     9 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     9 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     9 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   112M|   968M|   835K(100)| 00:00:33 |    69 |    71 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |   112M|   968M|   835K(100)| 00:00:33 |    69 |    71 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      SHARED(16)
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C3"<=TO_DATE(' 2012-10-14
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 16 because of degree limit 

Notice anything strange… really strange about the above output?  The new version of the query is limited to just 3 partitions, yet the calculated cost of the non-parallel execution is approximately 235,000 (much higher than the calculated cost of selecting from the full table without the WHERE clause) - that is just one reason why relying on cost for tuning performance purposes (as suggested by a couple of books) is potentially misleading.  But wait, the plan using parallel execution with an automatic degree of parallelism of 16 has a plan cost of approximately 835,000 and the optimizer still selected that plan!  I strongly suspect that a query optimizer bug caused a miscalculation of the plan cost; I could not reproduce the odd cost calculation on Oracle Database 11.2.0.1 or 11.2.0.2.  Still suspicious of the cost in the execution plan, at the end of testing I dropped and recreated the table.  After recreating the test table, the following execution plan was generated with a less crazy cost calculation:

SQL_ID  1ggk1hy0r22kt, child number 1
-------------------------------------
select count(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2946076746

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       | 23822 (100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     9 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     9 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     9 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |  2286K|    19M| 23822 (100)| 00:00:01 |    69 |    71 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |  2286K|    19M| 23822 (100)| 00:00:01 |    69 |    71 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      SHARED(11)
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C3"<=TO_DATE(' 2012-10-14
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 11 

Let’s double-check the calculated cost in the parallel execution plan: 235,000 / 23822 / 0.9 = 10.96 – that calculated cost is consistent with the Computed Degree of Parallelism.

I am curious about whether or not dynamic sampling might affect just the parallel execution.  Let’s collect statistics on table T1 with 100% sampling (note that it could take a long time for this statistics collection to complete):

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)

Let’s try our query again (note that I have forced a hard parse of the SQL statement by changing the capitalization of a couple parts of the SQL statement):

ALTER SESSION SET PARALLEL_DEGREE_POLICY=MANUAL;

SELECT COUNT(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  2q2uz15vyzqsy, child number 1
-------------------------------------
SELECT COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  1816 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Notice in the above that the execution plan for the non-parallel execution has a calculated cost that decreased from 235,000 to just 1,816, even though the same Pstart (69) and Pstop (71) columns are listed, and approximately the same estimated number of rows (~2,286,000 vs ~2,531,000) are expected to be returned.  So, if you are attempting to performance tune by attempting to simply reduce the calculated cost of a query (not a valid approach), you might be lead to believe that collecting statistics improved the performance of this query by a factor of 129.4 (the performance did improve a little due to dynamic sampling no longer being required, but the performance did not improve nearly that much).

Let’s try the query again with a modified value for the PARALLEL_DEGREE_POLICY parameter:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

select COUNT(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  2q2uz15vyzqsy, child number 0
-------------------------------------
SELECT COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  1816 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      NO_PARALLEL
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

The Note section indicates that parallel execution was considered, but not used.  So, after collecting statistics, the parallel degree for this query decreased from 16 (or 11 in the non-bug affected execution plan) to 1 – a non-parallel execution.

The above output was generated with Oracle Database 11.2.0.3 using the following system (CPU) statistics:

COLUMN PNAME FORMAT A14
COLUMN PVAL2 FORMAT A20

SELECT
  PNAME,
  PVAL1,
  PVAL2
FROM
  SYS.AUX_STATS$;

PNAME               PVAL1 PVAL2
-------------- ---------- ----------------
STATUS                    COMPLETED
DSTART                    07-11-2012 15:11
DSTOP                     07-11-2012 15:11
FLAGS                   1
CPUSPEEDNW     1720.20725
IOSEEKTIM              10
IOTFRSPEED           4096
SREADTIM                8
MREADTIM               10
CPUSPEED             2664
MBRC                   16
MAXTHR           19181568
SLAVETHR

While experimenting with Oracle Database 11.2.0.2, I used the following system (CPU) statistics:

COLUMN PNAME FORMAT A14
COLUMN PVAL2 FORMAT A20

SELECT
  PNAME,
  PVAL1,
  PVAL2
FROM
  SYS.AUX_STATS$;

PNAME               PVAL1 PVAL2
-------------- ---------- ----------------
STATUS                    COMPLETED
DSTART                    02-29-2012 19:07
DSTOP                     02-29-2012 19:07
FLAGS                   1
CPUSPEEDNW     2116.57559
IOSEEKTIM              10
IOTFRSPEED           4096
SREADTIM               .4
MREADTIM               .8
CPUSPEED             1922
MBRC                   16
MAXTHR          155910144
SLAVETHR

11.2.0.2 showed the following execution plan after collecting statistics for table T1 and setting the PARALLEL_DEGREE_POLICY parameter to AUTO:

SQL_ID  81dw7f2yz0jxp, child number 1
-------------------------------------
select COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  4082 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  4082  (29)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  4082  (29)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      NO_PARALLEL
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

In the above execution plan, notice that the calculated cost is 4,082 (this is higher than the calculated cost returned by Oracle Database 11.2.0.3).  The SREADTIM and MREADTIM system statistics were manually set to very small numbers – due to a bug in Oracle Database 11.2.0.1 and 11.2.0.2 these system statistics could be set to much large values, such as 2,491 and 9,163, respectively.  What might happen to the calculated cost if we were to increase the SREADTIM and MREADTIM system statistics by a factor of 10?  Such a change would indicate to the query optimizer that each disk access will required 10 times longer to complete.  Should the calculated cost of this query increase or decrease after changing the system statistics?

First, let’s manually alter the value of the system statistics:

EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM',4)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM',8)

Next, execute the query and display the execution plan (notice that capitalization of the word SELECT was changed to force a hard parse of the query):

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

SELECT COUNT(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  2q2uz15vyzqsy, child number 1
-------------------------------------
SELECT COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  3020 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  3020   (4)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  3020   (4)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      NO_PARALLEL
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

Notice the changes in the execution plan?  The calculated cost shown in the execution plan decreased from 4,082 to 3,020, and the %CPU for the third line in the execution plan decreased from 29 to 4.  Why did the calculated cost decrease when the optimizer is informed that disk accesses require 10 times as long to complete?  For the answer, take a look at this blog article written by Randolf Geist (another OakTable Network member).

OK, so I drifted a bit from the focus of the OTN message thread that provoked this article.  What do you think caused parallel execution to be used when the OP executed the query found in the OTN thread?





Book Review: Oracle Database 11gR2 Performance Tuning Cookbook (Part 2)

23 07 2012

July 23, 2012

Oracle Database Performance Tuning Test Cases without Many “Why”, “When”, and “How Much” Filler Details
http://www.amazon.com/Oracle-Database-Performance-Tuning-Cookbook/dp/1849682607

(Back to the Previous Post in the Series)

This blog article contains the review for the second half of the “Oracle Database 11gR2 Performance Tuning Cookbook” book.  It has been nearly five months since I posted the first half of this review.  My full review of this book is roughly 26 typewritten pages in length.

This book definitely is not an easy read.  Instead, it is necessary to pick through the material in the book, in search of correct information, and then perform a bit of research to determine where the book’s contents deviated from describing actual Oracle Database behavior.  Unfortunately, the same review process was required when reading a handful of other Oracle Database performance related books, so this book is not unique in this regard.

Five months ago I submitted 21 errata items for the first 80 or so pages of this book.  As of July 23, 2012, the publisher’s website shows the following when attempting to find the errata for this book:

“No errata currently available.

Known errata are listed here. Please let us know if you have found any errata not on this list by completing the errata submission form. Our editors will check them and post them to the list. Thank you.”

The errata list found on the book’s errata page is disappointing!

Test case scripts are used throughout this book, and that is one of the biggest selling points of this book.  One of the problems, however, is that it appears that the primary goal of this book is to demonstrate the author’s 50+ scripts, show screen captures of the each script’s execution, describe the action performed by each command in the script, and then describe the result of the script.  The all-important “how”, “when”, “how much”, and “why” components were largely excluded in the book.  There are also many errors, misstatements, and misinterpretations of the scripts’ output in the book – items that should have never survived the technical review cycle for the book.  I have no desire to “bash” this book, or to discourage readers of this blog from buying the book.

This review currently excludes chapter 11 on the topic of “Tuning Contention” and a couple of pages in another chapter.  The format of this review mirrors the format used in several of my recent book reviews.

Data Dictionary Views:

  • V$SYSSTAT (pages 208, 400)
  • V$SQL_WORKAREA_HISTOGRAM (page 209)
  • V$PGA_TARGET_ADVICE (page 210)
  • V$SORT_SEGMENT,V$TEMPFILE (page 249)
  • V$SORT_USAGE (page 252)
  • DBA_OPTSTAT_OPERATIONS (page 305)
  • USER_TAB_HISTOGRAMS (pages 306, 318)
  • V$SGA, V$SGAINFO (page 376)
  • V$PROCESS (page 377)
  • V$LIBRARYCACHE, V$SQLAREA, V$SQL_PLAN (page 384)
  • V$SHARED_POOL_RESERVED,V$ROWCACHE (page 389)
  • V$DB_OBJECT_CACHE (pages 389, 394)
  • V$SQLAREA (page 395)
  • V$SHARED_POOL_ADVICE (page 395)
  • V$ROWCACHE (page 395)
  • V$MYSTAT, V$SESSTAT, V$STATNAME (page 399)
  • V$DB_CACHE_ADVICE (page 400)
  • DBA_TABLES, DBA_INDEXES, V$BUFFER_POOL_STATISTICS (page 401)
  • X$BH, DBA_OBJECTS (page 402)
  • V$FILESTAT, V$TEMPSTAT, DBA_DATA_FILES, V$LOGFILE (page 413)
  • ALL_INDEXES (page 419)
  • DBA_HIST_SEG_STAT (page 420)
  • V$INSTANCE_RECOVERY (page 429)
  • V$SYSTEM_EVENT, V$LOGFILE, V$LOG, V$LOG_HISTORY (page 433)

Parameters:

  • WORKAREA_SIZE_POLICY, SORT_AREA_SIZE, SORT_AREA_RETAINED_SIZE (page 208)
  • PGA_AGGREGATE_TARGET (pages 211, 215)
  • STATISTICS_LEVEL (page 215)
  • PLSQL_CODE_TYPE (page 276)
  • PLSQL_OPTIMIZE_LEVEL (page 284)
  • OPTIMIZER_DYNAMIC_SAMPLING (page 304)
  • USE_STORED_OUTLINES (page 311)
  • CREATE_STORED_OUTLINES (page 316)
  • USE_PRIVATE_OUTLINES (page 317)
  • _OPTIMIZER_EXTENDED_CURSOR_SHARING_REL (page 327)
  • CLIENT_RESULT_CACHE_SIZE (page 342)
  • RESULT_CACHE_MODE (page 344)
  • RESULT_CACHE_MAX_SIZE (page 345)
  • PARALLEL_MIN_SERVERS, PARALLEL_MAX_SERVERS, PARALLEL_DEGREE_POLICY (page 350)
  • LOCK_SGA (pages 377, 381)
  • CURSOR_SHARING (pages 388, 399)
  • SHARED_POOL_SIZE (pages 388, 396)
  • SHARED_POOL_RESERVED_SIZE (page 392)
  • OPEN_CURSORS, SESSION_CACHED_CURSORS (page 399)
  • TIMED_STATISTICS (page 412)
  • LOG_ARCHIVE_DEST_1, LOG_ARCHIVE_DEST_2, LOG_ARCHIVE_DEST_3 (page 414)
  • FILESYSTEMIO_OPTIONS (page 425)
  • LOG_CHECKPOINTS_TO_ALERT (page 428)
  • LOG_CHECKPOINT_INTERVAL, LOG_CHECKPOINT_TIMEOUT, FAST_START_MTTR_TARGET (page 431)

Hints:

  • ALL_ROWS, FIRST_ROWS(n), FULL, CLUSTER, HASH (page 296)
  • INDEX, NO_INDEX, LEADING, ORDERED, USE_NL, USE_MERGE, USE_HASH, PARALLEL, STAR_TRANSFORMATION, REWRITE, APPEND, CACHE (page 297)
  • DYNAMIC_SAMPLING (page 304)
  • RESULT_CACHE (page 343)
  • NO_RESULT_CACHE (page 345)
  • PARALLEL (page 347)
  • APPEND (page 352, 355)

Error Messages:

  • ORA-4031: “unable to allocate X bytes of shared memory” (page 390)
  • RMAN-06207 (page 417)

Comments, Corrections, and Problems:

  • A script in the book attempts to set the SORT_AREA_SIZE to a value of 1000 (less than 1KB), after setting the WORKAREA_SIZE_POLICY parameter to MANUAL.  A side effect of this change is that the HASH_AREA_SIZE defaults to twice the SORT_AREA_SIZE, thus the change set the HASH_AREA_SIZE to 2000 bytes.  The execution plan printed in the book shows a single sort operation, and two hash join operations – thus the author is not just testing what happens when the SORT_AREA_SIZE parameter is set to less than 1/65 of its already small default value.  The script in the book orders the rows by AMOUNT_SOLD DESC, while the script in the script library orders the rows by TIME_ID ASC. (page 208)
  • The book states, “We then set a MANUAL policy, reserving the size of 1000 blocks of memory for sorting, with the following statements… ALTER SESSION SET SORT_AREA_SIZE = 1000; …”  The SORT_AREA_SIZE parameter is NOT specified in blocks, but instead in bytes. (page 212)
  • The book states, “In fact, the DB_BLOCK_SIZE for our database is 8 KB, which reserves 8192 KB of memory for the sort operation, but actually needs 24 MB of memory to complete the sort in-memory.”  If the unit of measure of the SORT_AREA_SIZE parameter is blocks (it is bytes, not blocks), 8 KB * 1000 is 8000 KB, not 8192 KB.  The 24 MB figure from the autotrace execution plan is only an estimate of the temp space requirements for the query – my test execution showed 4096 KB was actually required. (page 213)
  • The book states, “If we use an Oracle Shared Server Connection, the User Global Area is not inside the PGA, but it’s in the Shared Pool. Hence, an application which often performs sorting should not use Oracle Shared Server Connections.”  The second sentence is not a logical conclusion of the first sentence.  The first sentence is not completely accurate, as the UGA could be in the large pool, rather than the shared pool. (page 215)
  • The script output shows a HASH UNIQUE operation in the execution plan when the author issued a SQL statement with a DISTINCT clause to demonstrate that a sort to disk would be performed when DISTINCT is specified in a SQL statement – HASH UNIQUE is not a sort operation.  The results of this test case script conflicts with the second paragraph on page 207. (page 218)
  • The script output shows a HASH GROUP BY operation in the execution plan when the author issued a SQL statement with a GROUP BY clause to demonstrate that a sort to disk would be performed when GROUP BY is specified in a SQL statement – HASH GROUP BY is not a sort operation.  The results of this test case script conflicts with the second paragraph on page 207. (page 219)
  • The book states, “Select the first 10 customers, ordered by their age, from youngest to oldest.”  It might be better to state that 10 customers will be selected at random, specifically those customers with rows in the first set of table blocks – this statement was clarified on page 226. (page 224)
  • The book states that changing the where clause to “WHERE ROWNUM < 11” will permit the query using RANK to obtain the same result as earlier.  There are a couple of reasons why this statement is incorrect: 1) the order of the rows is COINCIDENTALLY the same as the value assigned by RANK, but that will not always be true.  2) there are 31 rows that have a RANK of 1, so even if the rows were produced in sequential rank order, there is no guarantee that the same 10 of 31 rows will appear as had happened earlier. (page 227)
  • The example in the book shows that an “INDEX FULL SCAN (MIN/MAX)” operation is used when an index is present on a column to which the MAX or MIN functions are used.  The book also shows that if both the MIN and MAX values of a column are requested in a single SELECT statement, the optimizer switched to a “TABLE ACCESS FULL”  operation, but the book did not explain why that happened.  The work-around suggested by the book almost works, except that the solution produces two rows rather than a single row. (pages 234-236)
  • While the author’s statement that “the myth SELECT (1) is faster than SELECT(*) is wrong – they are the same query” is headed in the right direction, the approach is wrong.  It is simply insufficient to look at the execution plan to determine that two SQL statements are the same (and are performing the same work).  Had the author suggested generating a 10053 trace for both SQL statement, he would have had the opportunity to mention that the query optimizer silently transformed the “SELECT COUNT(1)” query into “SELECT COUNT(*)” – so yes, the queries are the same, but the book’s approach to determining the queries are the same is not correct. (page 237)
  • The author created a temporary tablespace with a 16MB uniform extent size without identifying why that extent size was selected rather than a 1MB default extent size. (page 249)
  • The book states, “If we haven’t defined a default temporary tablespace for a user, the SYSTEM tablespace is used to store SORT SEGMENTS.”  The statement in the book appears to be out of date – the Oracle Database 11.2 documentation states, “Users who are not explicitly assigned a temporary tablespace use the database default temporary tablespace, which for new installations is TEMP.” (page 251)
  • The book’s discussion of defining the INITIAL and NEXT storage parameters for temporary tablespaces as a multiple of the SORT_AREA_SIZE parameter seems to be out of place in an Oracle Database 11g R2 performance tuning book – dictionary managed tablespaces were deprecated with the release of Oracle Database 9.2, and the effect of these parameters is different in locally managed tablespaces. (page 252)
  • The book states, “The PCTINCREASE parameter cannot be specified when we use the AUTOALLOCATE option for the tablespace.”  This statement appears to be incorrect – the Oracle Database documentation states that a specified value for PCTINCREASE will be ignored when creating an AUTOALLOCATE tablespace. (page 252)
  • The book states, “We can have different tablespaces for each user and we can query the dynamic performance view V$SORT_USAGE to retrieve the details about the active disk sorts occurring in the instance.”  This statement is a bit confusing, so it is probably best to break the sentence into two separate logical sentences for analysis.  The first half of the statement seems to suggest that a separate temp tablespace should (or could) be created for each user – I am not sure that this is the author’s intended interpretation; the default temporary tablespace may be set at the user level so that not all users are required to use the same (default) temporary tablespace.  For the second logical sentence, the V$SORT_USAGE performance view was deprecated with the release of Oracle Database 9.2, replaced with V$TEMPSEG_USAGE.  In Oracle Database 11.2, the V$SORT_USAGE view is based on the GV$SORT_USAGE view which selects from X$KTSSO (SELECT VIEW_DEFINITION FROM V$FIXED_VIEW_DEFINITION WHERE VIEW_NAME=’GV$SORT_USAGE’;).  The GV$SORT_USAGE  view definition, which is pointed to by both the V$SORT_USAGE and V$TEMPSEG_USAGE synonyms, indicates that the SEGTYPE column may be one of SORT, HASH, DATA, INDEX, LOB_ DATA, LOB_INDEX , or UNDEFINED.  The potential values of the SEGTYPE column suggests that the view potentially shows a variety of activity in addition to what the book mentions (the WHERE clause should be adapted to restrict the rows returned from this view, so that the results are consistent with this chapter’s contents).  The V$SORT_USAGE view also shows entries for aborted SQL statements, for instance when Ctrl-C is pressed in SQL*Plus and another SQL statement has not been executed by the session. (page 252)
  • The book states, “Please note that – due to their nature – temporary tablespaces are not affected by backup and restore operations.”  This is a vague statement at best, and the accuracy of the statement depends in part on the type of backup performed (online or offline, user-managed or RMAN hot backup, etc.).  An online RMAN backup will not back up the tempfiles of locally managed temporary tablespaces; however, when those backed up databases are restored and recovered, Oracle Database 10.2 and later will recreate the temporary tablespace tempfiles when the database is opened. (page 252)
  • The book states, “Using the LIMIT clause, we can see a slight decline in performance, so why use it?”  The screen captures in the book actually show the test without the LIMIT clause completing in 0.44 seconds, and the test with the LIMIT clause completing in 0.30 seconds, so in this case the LIMIT clause appears to have improved performance. (page 261)
  • Better clarification is needed in the section describing the use of the PL/SQL NOCOPY clause.  The book states that when a parameter variable is passed by reference, a value change to that variable actually changes the value of a copy of that variable, without affecting the originally passed in variable – this concept is backward.  The book should also make an effort to clarify the statement that the NOCOPY clause is a hint, touching on some of the causes of the NOCOPY clause being intentionally ignored as described in the documentation. (page 266)
  • The description of the CLUSTER hint is almost a verbatim copy of the description from the Oracle Database documentation.  The description in the book is a little vague regarding the usage of that hint, considering that the author lists the hint among the most commonly used hints.  The book did not mention this fact found in the documentation: “This hint applies only to tables in an index cluster.” (reference page 296)
  • The book’s description of the LEADING hint is misleading if there are more than two tables (or other row sources) specified in the SQL statement.  The book states, “Asks the optimizer to join table_name1 and table_name2 in the given order.”  First, a hint is a directive – it does not “ask” of the optimizer, it “demands” of the optimizer (this “Asks” wording is used in several of the hint descriptions).  The LEADING hint does not just specify that two tables should be joined in the listed order.  That hint states that the join order should begin with the first table (or other row source) specified in the hint, joining each successive table (or other row source) listed in the hint in the specified order. (page 297)
  • The book states that the default value for the OPTIMIZER_DYNAMIC_SAMPLING parameter is 2, which is true for Oracle Database 10.1 and higher.  However, the book also describes dynamic sampling as being introduced in Oracle Database 9.2.  Oracle Database 9.2 had a default value of 1 for this parameter.  The book offered no description of what happens when a value, such as 8, is used in a DYNAMIC_SAMPLING hint, other than 0 means that dynamic sampling should not be used, and 10 means “use aggressive sampling”. (page 304)
  • The book states, “Histograms are useful only on indexed columns containing skewed values, because they help the optimizer to choose whether to use the index or not to access values.  Obviously, if the frequency for a specific value is very high, using the index won’t be the best choice.”  “Don’t use histograms in situations where: 1. the column is not used in the WHERE clauses of the queries; 2. the data in the column is uniformly distributed; 3. bind variables are used when comparing against the column.”  The quoted section from the book seems to be closely derived from a page of the Oracle Database 8.1.7 documentation which was modified and finally removed from later releases of the documentation.  Histograms are potentially useful when created on non-indexed columns, with the goal of improving cardinality estimates, which are used to determine join order and join type when multiple tables are referenced in a SQL statement.  Histograms are also potentially useful when the data is not skewed, helping to improve the cardinality estimates of range scans when outlier values are present (such as an out of range value taking the place of NULL values).  Regarding the statement, “Obviously, if the frequency for a specific value is very high, using the index won’t be the best choice,” the frequency of a value in a column is but one of several factors that the optimizer considers, but a high frequency of a value is not necessarily a problem if the use of the index avoids the need to visit the related table.  Regarding the bullet points that describes when histograms should not be used, “The column is not used in the WHERE clauses of the queries,” with the introduction of ANSI join syntax in Oracle Database 9.0.1, a column that only appears in JOIN clauses could potentially benefit from a histogram.  Prior to the introduction of adaptive cursor sharing in Oracle Database 11.1, bind variables when combined with bind variable peeking and histograms could generate execution plans that, while efficient for the first execution, were potentially considerably inefficient when different bind variable values were submitted during later executions of the SQL statement.  Adaptive cursor sharing attempts to address the issues caused by the combination of bind variable peeking and histograms.  In cases where the same value is always submitted for a comparison with a submitted bind variable, a histogram on the comparison column could significantly improve the optimizer’s cardinality estimates such that a more efficient execution plan is developed (consider a case when an index on a STATUS column is present, and that STATUS column contains the value of either PROCESSED or NOT PROCESSED).  The book only mentioned that adaptive cursor sharing would be discussed in a later recipe – there was no comment in this section of the book regarding whether adaptive cursor sharing helped or further hindered the bullet-pointed items in the book. (page 309)
  • The discussion of stored outlines missed a couple of key items.  One item missed is that other sessions will not use stored outlines, unless the USE_STORED_OUTLINES parameter is set to a value other than FALSE at either the session (after the session connects) or system level (after each bounce of the database) because the parameter cannot be included in pfiles (init.ora) or spfiles.  In the demonstration script, the author displayed the execution plan for a query, then displayed the execution plan for the same query modified with an index hint, before creating a stored outline for the query; it appears that while creating the script, that the author thought that the unhinted query would use the stored outline for the hinted version of the query – that will not happen without implementing a technique that was not discussed in the book.  The book states, “If we want to avoid changes to the execution plan, we can store them in stored outlines – the plans in stored outlines don’t change…”  A stored outline does not contain an execution plan, it stores a series of hints (visible in DBA_OUTLINE_HINTS), and as stated in the Oracle Database documentation, “The degree to which plan stability controls execution plans is dictated by how much the Oracle Database hint mechanism controls execution plans…” (pages 310-317)
  • The AUTOTRACE generated plans for a SQL statement using bind variables, intended to show adaptive cursor sharing in action, may in fact show a different execution plan thzn the actual execution plan that will be used because AUTOTRACE handles all bind variables as VARCHAR2, does not perform bind variable peeking, and does not pull an existing execution plan from the library cache. (pages 321-322)
  • The TKPROF output shows two executions, implying that the TKPROF output shows the output for test script line 16 and 17, not just 16 as stated in the book. (page 325)
  • The TKPROF output shows that the execution plan changed with the different bind variable value, which means that the author must have re-executed the SQL statement again after the first execution with a bind variable value of 1 resulted in an inefficient execution plan.  The book does not mention that adaptive cursor sharing only takes effect after the runtime engine first executes the SQL statement at least once with an inefficient execution plan. (page 326)
  • The book states, “Pre Oracle Database 10g, bind variables and histograms could not be used together.”  This is a confusing sentence – is the book implying that histograms could not be utilized on any columns, even those that are only compared with literals, if the SQL statement uses bind variables?  Bind variable peeking was introduced with Oracle Database 9.0.1, which is sufficient to allow the optimizer to take advantage of histograms on columns referenced in the WHERE clause when those columns are compared with bind variables. (page 326)
  • The book states that in an OLTP type database, “we probably want to disable the Adaptive Cursor Sharing feature to eliminate the related overhead.”  The book then suggests changing the CURSOR_SHARING parameter to a value of EXACT, and the  _OPTIMIZER_EXTENDED_CURSOR_SHARING_REL parameter to a value of NONE.  First, the book should not suggest altering a hidden parameter without mentioning that hidden parameters should only be changed after consulting Oracle support.  Second, it is not the CURSOR_SHARING parameter that should be set to a value of EXACT, but the _OPTIMIZER_ADAPTIVE_CURSOR_SHARING parameter that should be set to a value of FALSE (see Metalink (MOS) Doc ID 11657468.8).  Third, the blanket statement that adaptive cursor sharing should be disabled in OLTP databases seems to be an incredibly silly suggestion for any Oracle Database version other than 11.1.0.6 (this version contained a bug that lead to an impressive number of child cursors due to repeated executions of a SQL statement with different bind variable values). (page 327)
  • (Skipped pages 328-340)
  • The example demonstrating the client-side result cache was of limited use because the author specified “SET AUTOT TRACE EXP” in the script, meaning that neither the query without the RESULT_CACHE hint, nor the query with the hint actually executed.  For the example to be useful, autotrace should have been disabled, DBMS_XPLAN.DISPLAY_CURSOR should have been used to display the ALLSTATS LAST statistics, and the query with the RESULT_CACHE hint should have been executed twice. (pages 342-344)
  • When trying to compare the relative performance of the PARALLEL hint, rather than using SET TERM OFF, and SPOOL /DEV/NULL to avoid outputting the results to the screen, and selecting the SYSDATE for timing information, it probably would have been better to use SET AUTOTRACE TRACEONLY STATISTICS and SET TIMING ON.  Caching at the operating system, SAN, and disk level also needs to be considered.  For example, simply executing the SQL statement with the PARALLEL hint first might have required that version of the SQL statement to take longer than the unhinted SQL statement (hinted: 12.81 seconds, unhinted 10.05 seconds; hinted:10.80 seconds, unhinted: 7.58 seconds; hinted: 9.15 seconds, unhinted 13.00 seconds). (page 346-349)
  • The book states, “In OLTP environments parallel executions cause too many locks and consume more resources – limiting the scalability of the application.”  The author should clarify the statement that parallel executions cause too many locks – is the author describing parallel DML, parallel DDL, or parallel query? (page 350)
  • The book states, “Another issue can arise when using direct path insert – remember that conventional load inserts in memory – and only the redo log is written to disk on commit.”  This statement is confusing and misleading.  The contents of the redo log buffer are flushed to disk on commit – a commit in any session.  But the redo log buffer is also flushed every 3 seconds, and when it is at least 1/3 full.  The section of the author’s test script without the APPEND hint generated roughly 37 MB of redo (in a NOARCHIVELOG mode test database) when I tested, and the book shows that the insert required 5.3 seconds during the author’s test run.  DBWR is free to write blocks modified by a session to disk before the session issues a commit.  The demonstration should have mentioned that triggers and foreign key constraints on the table will prevent the APPEND hint from working as expected, and that more redo would be expected to be written in an ARCHIVELOG mode database unless NOLOGGING is set on the table or NOLOGGING is specified in the INSERT statement. (page 355)
  • The book states, “The NOLOGGING clause doesn’t write redo log entries for the operation, which is unrecoverable due to this behavior.”  The wording of this statement makes it a little difficult to understand the point of the statement.  Additionally, including the NOLOGGING clause in the CREATE TABLE AS SELECT statement does NOT prevent the generation of redo log entries – 178KB of redo was written when I reproduced the author’s test case.  That clause will minimize the amount of redo generated, not eliminate the generation of redo. (page 358)
  • The book includes a good test case example showing the effects of row level triggers and excessive indexing.  The author possibly could have made the performance difference point a bit clearer (or more dramatic) by using single row inserts, rather than INSERT INTO … SELECT. (reference pages 359-365)
  • The book states, “We will execute SQL*Plus from the /oracle/home/ directory…” – it appears from looking at the script that the intended to write “/home/oracle/ directory”.  The bang (!) shortcut for the SQL*Plus host command might not work on some operating systems such as VMS and Windows, so ideally the author would have spelled out the word host (or the first two characters of the word) in the scripts. (pages 366-367)
  • The book states that automatic memory management should not be used with Oracle Multithreaded server.  Considering that the “Multithreaded server” name was replaced with “Shared server” with the release of Oracle Database 9.0.1, and Automatic Memory Management is a feature introduced in Oracle Database 11.1.0.6, this statement in the book is a bit confusing. (page 381)
  • The select from V$SQL_PLAN using only HASH_VALUE in the WHERE clause is very odd, it would seem that using the SQL_ID and CHILD_NUMBER columns would make more sense.  Considering that the book is targeting Oracle Database 11.2, using DBMS_XPLAN rather than directly querying V$SQL_PLAN probably would have yielded more useful results than simply listing the operation, owner, and object name. (page 384)
  • The sample code showing how to pin a cursor in the library cache using DBMS_SHARED_POOL.KEEP is missing a single quote after the SQL statement’s hash value and the letter C (or any letter other than P, T, R, or Q) in single quotes.  As a result, the error “ORA-01756: quoted string not properly terminated” will be returned when that sample code is executed.  In addition to showing the sample code, the author should have provided guidance regarding why this feature should or should not be used, and when the feature should or should not be used (this is a frequent problem throughout the book – showing the how without showing the “why”, “when”, and “how much” details). (pages 390, 394)
  • The book states, “The UGA is located in the PGA when using dedicated server environments, and inside the Shared Pool when using shared servers.”  A similar sentence appears on the previous page.  It would be a bit more accurate to state that in a shared server environment, the Large Pool (if configured) contains the session UGA.  With a value set for the SGA_TARGET parameter, the Large Pool should be configured automatically, with its size indicated by the __LARGE_POOL_SIZE hidden parameter. (reference page 397)
  • The query that calculates the MAX of the “session uga memory max” statistic found in V$SESSTAT is limited in usefulness for a couple of reasons: 1) PGA memory used by sessions is allocated as needed and deallocated when no longer needed – summing the maximum value for the various sessions does not indicate the concurrent high watermark of session memory usage, rather it overstates the high watermark; 2) when a session disconnects, the statistics representing the PGA memory used by that session will disappear from V$SESSTAT. (page 399)
  • The formula for calculating the buffer cache hit ratio, while it appeared in the Oracle Database 9.2 documentation, is incorrect.  The author used the “physical reads”, “physical reads direct”, “physical reads direct (lob)” and “session logical reads” statistics from V$SYSSTAT, while the Oracle Database 11.2 documentation uses the statistics “physical reads cache”, “consistent gets from cache”, and “db block gets from cache” for calculating the buffer cache hit ratio.  The usefulness of the derived metric is limited at best. (page 400)
  • The book states, “The main use of the LRU list is to add blocks to the LRU end using a full table scan, while the normal operations add blocks to the MRU end of the list, and hence they are quickly replaced by blocks required for subsequent operations.”  This statement is similar to one made on page 170 of the book.  Recent releases (9.0.1 and above?) of Oracle Database perform mid-point insertions, even with full table scans.  If the CACHE keyword is specified when an object is created or altered, then that object’s blocks will be placed at the MRU end of the list when the object’s blocks are accessed from disk. (reference page 408).
  • The book states, “On this basis, we will establish a strategy to distribute the Oracle files on different disks to help improve performance.”  Perhaps a suggestion to use SAME (stripe and mirror everything), as recommended by Oracle Corporation, would be a better approach. (page 412)
  • The book states that the TIMED_STATISTICS parameter should be set to a value of TRUE in order to diagnose I/O performance issues.  While it is true that this parameter should have a value of TRUE, that parameter should default to a value of TRUE when the value of the STATISTICS_LEVEL parameter is set at its default value of TYPICAL, or adjusted to a value of ALL. (page 412)
  • The book states, “The following steps will show how to destribute Oravle files to increase performance.”  That sentence includes two uncorrected typos. (page 412)
  • The query of V$FILESTAT and the query of V$TEMPSTAT are of limited value because there was no attempt to calculate delta values for the statistics.  The queries do not take into consideration the effects of backup activity and statistics collection, because the delta period is since the opening of the database. (page 413)
  • Rather than just arbitrarily moving the redo logs from one directory belonging to the database TESTDB to another directory apparently on the same hard drive volume, the directory which belongs to the database TESTDB2 (assuming that optimal flexible architecture approaches are in use), the author should explain the factors that influence the decision whether or not to move the redo logs. (page 413)
  • Justification should be provided for changing the archived redo log locations.  Justification should be provided for moving the datafiles for a tablespace – where is the evidence that the current location of the datafiles is causing a performance problem that will be addressed by changing the directory in which the datafiles are located? (page 414)
  • When describing the output of the V$FILESTAT query, the author states, “For each file, we can see the number of physical reads and writes performed and the time spent (in milliseconds) for these operations.”  V$FILESTAT displays the time in centiseconds (1/100 of a second) not in milliseconds (1/1000 of a second).  The output from V$FILESTAT shows that none of the files were heavily accessed, so there is probably little value in using the output to make the decision that one or another datafile should be relocated. (page 415)
  • The book states, “When we heavily access data files and we know which file is executing the query against the statistics (shown in step 2 and step 3), we can gain performance by separating the heavily-accessed data files on different disks.”  This statement is confusing – a file does not execute a query.  Just because the blocks in two datafiles are heavily accessed, that does not mean that there will be contention between the datafiles if those datafiles reside on the same disk volume.  Considering the number of times the author recommended relocating files to “different disks” it might be interesting to understand how many disk volumes the author considers as being essential to efficient operation of the database. (page 417)
  • The book states, “We distribute the data files based on disk controller allocation – distributing them on different mount points managed by the same disk controller brings no performance improvement.”  This statement is limited in accuracy, and because the sentence is stated as an absolute, it is incorrect.  The current and maximum throughput of the disk controller are but two of several factors that might have an impact on performance. (page 418)
  • The book states, “The CKPT, LGWR, and ARCn processes can only read and write control files.”  I think that the author intended something other than what is written in this quote – otherwise someone might wonder which process writes to the online redo logs, and which process archives those redo logs once filled. (page 418)
  • The book states, “LGWR writes online redo logs sequentially, using a RAID 5 on the disks, where online redo logs are stored.”  I think that this sentence requires a lot of reworking before it is understandable.  The book goes on to state, “This can lead to poor performance due to the slower write times that characterize this type of disk array [RAID 5] – using RAID 0+1 is preferable.”  Suggesting the more robust RAID 10, rather than RAID 0+1, would be a good idea in this sentence. (page 419)
  • The author suggested querying the DBA_HIST_SEG_STAT view without mentioning the licensing requirements for accessing that AWR view. (page 420)
  • The book states, “If we have tables and indexes on the same disk, we need to read and write in two different places on the same disk.  By dividing the work between two disks, we can perform an index range scan faster than when the index and table are on the same disk.”  While a statement similar to this one has appeared in a couple of older Oracle Database related books, the statement has been shown to be incorrect for various reasons. (page 422)
  • The book states, “The latter [hardware-based RAID] offers guaranteed performance and no overhead on the CPU.”  Hardware-based RAID does not offer guaranteed performance, although it should have less CPU overhead than software based RAID. (page 423)
  • The book states, “Starting with step 1, in terms of performance the best RAID level for an Oracle database is RAID 0+1, also known as RAID 10.”  RAID 0+1 and RAID 10 are not the same – the two approaches have different fault tolerances.  The same mistake is made on page 425.  (page 423)
  • The book states, “It’s not a good idea to store redo logs and temporary segments on RAID level 1 configurations, because redo logs should be multiplexed on different disks, and there is no need to protect temporary segments from failures.”  Using multiple RAID 1 arrays for the online redo logs is certainly possible – it is not clear if the book suggests that the location where the online redo logs are located should not be protected by one of the redundant RAID levels (RAID 1, RAID 10, etc.).  It could easily be argued that the tempfiles should be located on an array with a redundant RAID level; it is true that the tempfiles can be recreated, but losing the drive hosting the tempfile could be disastrous for the connected sessions. (page 424)
  • The description of RAID 5 should have described how losing one drive in the array affects performance, and how the performance is affected when a second drive in the array is lost. (page 424)
  • The book states, “A checkpoint is used to ensure consistency in the database; during this operation, all data files are synchronized with the data blocks in memory.”  This statement needs a bit of clarification, first of what is the meaning of “synchronized with the data blocks in memory.”  There are several types of checkpoints, including an object level checkpoint, a checkpoint which takes place prior to the execution of a parallel query for all tables that are referenced in the query. (page 428)
  • The script demonstrating how checkpoints work has a couple of flaws.  The script changes the value of the LOG_CHECKPOINTS_TO_ALERT parameter with a scope of SPFILE, and does not mention that the database must be bounced after executing this command.  The script issues an “ALTER SYSTEM SWITCH LOGFILE” command, rather than an “ALTER SYSTEM CHECKPOINT” command. (page 428)
  • The book states, “In the following screenshot, you can see that the checkpoint information was written to the alert log.”  The screen capture shows the log switch, but not the checkpoint message. (page 430)
  • The book essentially states that the value of the LOG_CHECKPOINT_INTERVAL multiplied by the redo block size (typically 512 bytes, but may be 1024 or 4096 in later releases) cannot exceed 90% of the smallest redo log file – that statement conflicts with the Oracle Database 10.2 documentation.  Regardless of that fact, both the LOG_CHECKPOINT_INTERVAL and LOG_CHECKPOINT_TIMEOUT parameters described on the page are deprecated as of Oracle Database 10.1 per Metalink (MOS) Doc ID 472821.1. (page 431)
  • The book states, “The symptoms of I/O issues, related to slow disks on which redo log files are written, are highlighted by a high value for log file sync statistic. Often a high value for the log file parallel write is confirmed by a high value for this parameter.”  The log file sync wait event probably should not be referred to as either a statistic or a parameter, because those labels have very specific meaning in Oracle Database terminology.  The excessive time spent in the log file sync wait event could be caused by several factors other than slow disks where the redo logs are located, including excessive CPU competition and simply excessive COMMITs (or ROLLBACKs) issued by a session. (page 434)




Parallel Execution Challenge – It’s About Time

8 07 2012

July 8, 2012

As I was reading a book, I saw a comment essentially stating that with multiple CPUs/cores, parallel execution will be faster than serial execution of the same query.  The book provided a test case - I decided to change around the test case a bit so that the results were a bit more fair, and then I performed a bit of testing using the sample database data that Oracle Corporation provides. 

My testing was performed in Oracle Database 11.2.0.1 on 64 bit Windows with a quad core CPU that supports hyperthreading, and two SSD drives in a RAID 0 array (with all scripts executed on the server).  So, is parallel query always faster than serial execution when the server has multiple CPUs/cores?  But wait, that is not the interesting question… not the point of this parallel execution challenge, but still an interesting question.  Here is the test script, which will be executed three times, with the results of the first execution thrown out so that we may eliminate the side effects of the hard parse:

ALTER SESSION SET CURRENT_SCHEMA=SH;
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
SET TIMING ON
SET LINESIZE 145
SET PAGESIZE 1000

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC; 

Notice in the above that the test case performs the parallel execution first, with a degree of parallelism of 2, and then executes the same query without using parallel execution.

Let’s look at the output for the parallel execution:

918843 rows selected.

Elapsed: 00:00:11.97

Execution Plan
----------------------------------------------------------
Plan hash value: 2055439529

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY        |          |   918K|    19M|    31M|   308  (14)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE      | :TQ10000 |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,00 | P->P | RANGE      |
|   6 |       PX BLOCK ITERATOR |          |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| SALES    |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 2 because of hint

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1802  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed 

918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,802 consistent gets, an estimated cost of 308, and the execution completed in 11.97 seconds.

Let’s take a look at the serial execution output:

918843 rows selected.

Elapsed: 00:00:08.84

Execution Plan
----------------------------------------------------------
Plan hash value: 3803407550

------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |   918K|    19M|       |  6637   (2)| 00:01:20 |       |       |
|   1 |  SORT ORDER BY       |       |   918K|    19M|    31M|  6637   (2)| 00:01:20 |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1718  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed 

918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,718 consistent gets, an estimated cost of 6,637, and the execution completed in 8.84 seconds.

* The first part of the challenge is to describe how the cost is derived in the above execution plan for the parallel execution.  Here is the easy part of the cost calculation:

cost of serial full table scan / degree of parallel / 0.90
499 / 2 / 0.90 = 277.22

The above calculation is easy to understand, and is explained in this article.  The SORT ORDER BY operation in the parallel query execution plan added 31 to the calculated cost (308 – 277), while the same operation in the serial execution added 6,138 to the calculated cost (6637 – 499).

The first challenge is to describe why the calculated cost of the parallel execution plan is so small compared to the serial execution.

The timing output indicated that the serial execution required 3.13 fewer seconds than the parallel execution.  Let’s look at the results of the third execution of this test case script:

918843 rows selected.

Elapsed: 00:00:10.66

Execution Plan
----------------------------------------------------------
Plan hash value: 2055439529

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY        |          |   918K|    19M|    31M|   308  (14)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE      | :TQ10000 |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,00 | P->P |      RANGE |
|   6 |       PX BLOCK ITERATOR |          |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| SALES    |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 2 because of hint

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1802  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed 

The above shows the results of the parallel execution.  918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,802 consistent gets, an estimated cost of 308, and completed in 10.66 seconds.  Notice that with the exception of the elapsed time, the statistics are identical to what we saw during the previous execution of the test script.

918843 rows selected.

Elapsed: 00:00:08.48

Execution Plan
----------------------------------------------------------
Plan hash value: 3803407550

------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |   918K|    19M|       |  6637   (2)| 00:01:20 |       |       |
|   1 |  SORT ORDER BY       |       |   918K|    19M|    31M|  6637   (2)| 00:01:20 |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1718  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed 

The above shows the results of the serial execution. 918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,718 consistent gets, an estimated cost of 6,637, and completed in 8.48 seconds. Notice that with the exception of the elapsed time, the statistics are identical to what we saw during the previous execution of the test script.

The second challenge is to describe why the serial execution completed at least 25% faster than the parallel execution of the same query – this result is counter to what the calculated cost of the execution plans suggest, and counter to what the book claimed.

Maybe we need to look at the result from a different angle.  The book did not use SET TIMING ON as my test case above did, so let’s change the test case again.  This time, we will use the GATHER_PLAN_STATISTICS hint so that the runtime engine will collect the timing and various other statistics, and then use DBMS_XPLAN to display the actual execution plan.  To eliminate the effects of hard parsing, the script will be executed three times with the first execution thrown out.  The new script:

ALTER SESSION SET CURRENT_SCHEMA=SH;
SET AUTOTRACE TRACEONLY STATISTICS
SET TIMING OFF
SET LINESIZE 145
SET PAGESIZE 1000

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT /*+ Q1 PARALLEL(2) GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT /*+ Q2 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

SET AUTOTRACE OFF 

SELECT
  SQL_ID,
  CHILD_NUMBER,
  SUBSTR(SQL_TEXT,1,30) SQL_TEXT
FROM
  V$SQL
WHERE
  SQL_TEXT LIKE 'SELECT /*+ Q%'
ORDER BY
  3,
  1,
  2; 

The last of the above SQL statements simply retrieves the SQL_ID and CHILD_NUMBER for the two previous queries.  For my results, I received the following output from the last of the above SQL statements:

SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ------------------------------
5p47nbddfn0k7            0 SELECT /*+ Q1 PARALLEL(2) GATH
8k6bdqr09xc8u            0 SELECT /*+ Q2 GATHER_PLAN_STAT 

To display the execution plans with the runtime statistics, we are able to use the following two statements:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5p47nbddfn0k7',0,'ALLSTATS LAST +COST'));

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('8k6bdqr09xc8u',0,'ALLSTATS LAST +COST')); 

Here are the autotrace statistics for the parallel execution:

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1802  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed 

The DBMS_XPLAN generated execution plan for the parallel execution follows:

SQL_ID  5p47nbddfn0k7, child number 0
-------------------------------------
SELECT /*+ Q1 PARALLEL(2) GATHER_PLAN_STATISTICS */ S.PROD_ID,
S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC

Plan hash value: 2055439529

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |   308 (100)|    918K|00:00:00.70 |     116 |     28 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |    918K|00:00:00.70 |     116 |     28 |       |       |          |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    918K|   308  (14)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT ORDER BY        |          |      0 |    918K|   308  (14)|      0 |00:00:00.01 |       0 |      0 |    83M|  3142K|   37M (0)|
|   4 |     PX RECEIVE          |          |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  7 |        TABLE ACCESS FULL| SALES    |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 2 because of hint 

Notice that the execution plan claims that the query completed in 0.70 seconds, that the calculated cost of the execution plan is (still) 308, that 28 blocks were read from disk, and 116 consistent gets were performed.

Here are the autotrace statistics for the serial execution:

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1718  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed 

The DBMS_XPLAN generated execution plan for the serial execution plan follows:

SQL_ID  8k6bdqr09xc8u, child number 0
-------------------------------------
SELECT /*+ Q2 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID
FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC

Plan hash value: 3803407550

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |  6637 (100)|    918K|00:00:00.93 |    1718 |   1647 |       |       |          |
|   1 |  SORT ORDER BY       |       |      1 |    918K|  6637   (2)|    918K|00:00:00.93 |    1718 |   1647 |    42M|  2308K|   37M (0)|
|   2 |   PARTITION RANGE ALL|       |      1 |    918K|   499   (4)|    918K|00:00:00.31 |    1718 |   1647 |       |       |          |
|   3 |    TABLE ACCESS FULL | SALES |     28 |    918K|   499   (4)|    918K|00:00:00.22 |    1718 |   1647 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------- 

Notice that the execution plan claims that the query completed in 0.93 seconds, that the calculated cost of the execution plan is (still) 6,637, that 1,647 blocks were read from disk, and 1,718 consistent gets were performed.

The third challenge is to describe why DBMS_XPLAN claims that the parallel execution completed roughly 32.9% faster than the serial execution, when the timing output in SQL*Plus indicated that the reverse was true.  Bonus: Why do the Buffers and Reads statistics in the top row of the execution plan for the serial execution plan exactly match the autotrace generated consistent gets and physical reads statistics, yet the same is not true for the execution plan for the parallel execution?  Double Bonus: the execution plan claims that the query completed in 0.93 seconds, yet the timing information in SQL*Plus indicated that the query required an additional 7.55 seconds, why is there so much difference between the reported times (keep in mind that all testing was performed on the server, and the output was discarded without formatting and without displaying on screen)?

If cost is time, and time is money, what is the exchange rate for parallel execution?  ;-)

I have not fully worked the answers to the above challenge questions, and I will wait a couple of days before providing any feedback regarding proposed answers supplied by readers.  I strongly suspect that some readers know without a doubt the answers to all of the above questions.





Histograms – What is Wrong with this Quote?

4 07 2012

July 4, 2012

It has been several months since I posted my review of the first half of the “Oracle Database 11gR2 Performance Tuning Cookbook” book.  I had planned to post the review for the second half of the book a couple of weeks later, however four months have passed by since that time, and the second half of the review has yet to materialize.  Someone recently left the following comment attached to one of my reviews on Amazon:

“This is the most comprehensive feedback I have ever read!!”

I appeciate the comment.  Keep in mind that particular review was only roughly 9.5 typewritten pages in length; it takes a lot of time to put together an 18, 24, or 35 typewritten page review that is just as comprehensive as the one that elicited the above comment.

On to the topic of this article…

While reading the “Oracle Database 11gR2 Performance Tuning Cookbook” book, I noticed an interesting set of statements about the use of histograms (bolded sections appeared bolded in the book):

“Histograms are useful only on indexed columns containing skewed values, because they help the optimizer to choose whether to use the index or not to access values.  Obviously, if the frequency for a specific value is very high, using the index won’t be the best choice….”

“Don’t use histograms in situations where:

  • The column is not used in the WHERE clauses of the queries
  • The data in the column is uniformly distributed
  • Bind variables are used when comparing against the column”

What, if anything, is wrong with the above quote from the book?  If possible, provide test cases to support or refute the above quote from the book (see the tips at the bottom of the blue section at the right of this blog page regarding how to post code sections).

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.








Follow

Get every new post delivered to your Inbox.

Join 148 other followers