Parallel Execution Challenge – It’s About Time

8 07 2012

July 8, 2012

As I was reading a book, I saw a comment essentially stating that with multiple CPUs/cores, parallel execution will be faster than serial execution of the same query. The book provided a test case – I decided to change around the test case a bit so that the results were a bit more fair, and then I performed a bit of testing using the sample database data that Oracle Corporation provides.

My testing was performed in Oracle Database 11.2.0.1 on 64 bit Windows with a quad core CPU that supports hyperthreading, and two SSD drives in a RAID 0 array (with all scripts executed on the server). So, is parallel query always faster than serial execution when the server has multiple CPUs/cores? But wait, that is not the interesting question… not the point of this parallel execution challenge, but still an interesting question. Here is the test script, which will be executed three times, with the results of the first execution thrown out so that we may eliminate the side effects of the hard parse:

ALTER SESSION SET CURRENT_SCHEMA=SH;
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
SET TIMING ON
SET LINESIZE 145
SET PAGESIZE 1000

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

Notice in the above that the test case performs the parallel execution first, with a degree of parallelism of 2, and then executes the same query without using parallel execution.

Let’s look at the output for the parallel execution:

918843 rows selected.

Elapsed: 00:00:11.97

Execution Plan
----------------------------------------------------------
Plan hash value: 2055439529

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY        |          |   918K|    19M|    31M|   308  (14)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE      | :TQ10000 |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,00 | P->P | RANGE      |
|   6 |       PX BLOCK ITERATOR |          |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| SALES    |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 2 because of hint

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1802  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed

918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,802 consistent gets, an estimated cost of 308, and the execution completed in 11.97 seconds.

Let’s take a look at the serial execution output:

918843 rows selected.

Elapsed: 00:00:08.84

Execution Plan
----------------------------------------------------------
Plan hash value: 3803407550

------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |   918K|    19M|       |  6637   (2)| 00:01:20 |       |       |
|   1 |  SORT ORDER BY       |       |   918K|    19M|    31M|  6637   (2)| 00:01:20 |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1718  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed

918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,718 consistent gets, an estimated cost of 6,637, and the execution completed in 8.84 seconds.

* The first part of the challenge is to describe how the cost is derived in the above execution plan for the parallel execution. Here is the easy part of the cost calculation:

cost of serial full table scan / degree of parallel / 0.90
499 / 2 / 0.90 = 277.22

The above calculation is easy to understand, and is explained in this article. The SORT ORDER BY operation in the parallel query execution plan added 31 to the calculated cost (308 – 277), while the same operation in the serial execution added 6,138 to the calculated cost (6637 – 499).

The first challenge is to describe why the calculated cost of the parallel execution plan is so small compared to the serial execution.

—

The timing output indicated that the serial execution required 3.13 fewer seconds than the parallel execution. Let’s look at the results of the third execution of this test case script:

918843 rows selected.

Elapsed: 00:00:10.66

Execution Plan
----------------------------------------------------------
Plan hash value: 2055439529

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |   918K|    19M|       |   308  (14)| 00:00:04 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY        |          |   918K|    19M|    31M|   308  (14)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE      | :TQ10000 |   918K|    19M|       |   277   (4)| 00:00:04 |       |       |  Q1,00 | P->P |      RANGE |
|   6 |       PX BLOCK ITERATOR |          |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| SALES    |   918K|    19M|       |   277   (4)| 00:00:04 |     1 |    28 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - Degree of Parallelism is 2 because of hint

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1802  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed

The above shows the results of the parallel execution. 918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,802 consistent gets, an estimated cost of 308, and completed in 10.66 seconds. Notice that with the exception of the elapsed time, the statistics are identical to what we saw during the previous execution of the test script.

918843 rows selected.

Elapsed: 00:00:08.48

Execution Plan
----------------------------------------------------------
Plan hash value: 3803407550

------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |   918K|    19M|       |  6637   (2)| 00:01:20 |       |       |
|   1 |  SORT ORDER BY       |       |   918K|    19M|    31M|  6637   (2)| 00:01:20 |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|    19M|       |   499   (4)| 00:00:06 |     1 |    28 |
------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1718  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed

The above shows the results of the serial execution. 918,843 rows selected (not displayed in SQL*Plus due to the AUTOTRACE setting), 1,647 blocks read from disk, 1,718 consistent gets, an estimated cost of 6,637, and completed in 8.48 seconds. Notice that with the exception of the elapsed time, the statistics are identical to what we saw during the previous execution of the test script.

The second challenge is to describe why the serial execution completed at least 25% faster than the parallel execution of the same query – this result is counter to what the calculated cost of the execution plans suggest, and counter to what the book claimed.

—

Maybe we need to look at the result from a different angle. The book did not use SET TIMING ON as my test case above did, so let’s change the test case again. This time, we will use the GATHER_PLAN_STATISTICS hint so that the runtime engine will collect the timing and various other statistics, and then use DBMS_XPLAN to display the actual execution plan. To eliminate the effects of hard parsing, the script will be executed three times with the first execution thrown out. The new script:

ALTER SESSION SET CURRENT_SCHEMA=SH;
SET AUTOTRACE TRACEONLY STATISTICS
SET TIMING OFF
SET LINESIZE 145
SET PAGESIZE 1000

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT /*+ Q1 PARALLEL(2) GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT /*+ Q2 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

SET AUTOTRACE OFF 

SELECT
  SQL_ID,
  CHILD_NUMBER,
  SUBSTR(SQL_TEXT,1,30) SQL_TEXT
FROM
  V$SQL
WHERE
  SQL_TEXT LIKE 'SELECT /*+ Q%'
ORDER BY
  3,
  1,
  2;

The last of the above SQL statements simply retrieves the SQL_ID and CHILD_NUMBER for the two previous queries. For my results, I received the following output from the last of the above SQL statements:

SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ------------------------------
5p47nbddfn0k7            0 SELECT /*+ Q1 PARALLEL(2) GATH
8k6bdqr09xc8u            0 SELECT /*+ Q2 GATHER_PLAN_STAT

To display the execution plans with the runtime statistics, we are able to use the following two statements:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5p47nbddfn0k7',0,'ALLSTATS LAST +COST'));

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('8k6bdqr09xc8u',0,'ALLSTATS LAST +COST'));

Here are the autotrace statistics for the parallel execution:

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1802  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed

The DBMS_XPLAN generated execution plan for the parallel execution follows:

SQL_ID  5p47nbddfn0k7, child number 0
-------------------------------------
SELECT /*+ Q1 PARALLEL(2) GATHER_PLAN_STATISTICS */ S.PROD_ID,
S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC

Plan hash value: 2055439529

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |   308 (100)|    918K|00:00:00.70 |     116 |     28 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |    918K|00:00:00.70 |     116 |     28 |       |       |          |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    918K|   308  (14)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT ORDER BY        |          |      0 |    918K|   308  (14)|      0 |00:00:00.01 |       0 |      0 |    83M|  3142K|   37M (0)|
|   4 |     PX RECEIVE          |          |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  7 |        TABLE ACCESS FULL| SALES    |      0 |    918K|   277   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 2 because of hint

Notice that the execution plan claims that the query completed in 0.70 seconds, that the calculated cost of the execution plan is (still) 308, that 28 blocks were read from disk, and 116 consistent gets were performed.

Here are the autotrace statistics for the serial execution:

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1718  consistent gets
       1647  physical reads
          0  redo size
   22116448  bytes sent via SQL*Net to client
    1102982  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed

The DBMS_XPLAN generated execution plan for the serial execution plan follows:

SQL_ID  8k6bdqr09xc8u, child number 0
-------------------------------------
SELECT /*+ Q2 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID
FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC

Plan hash value: 3803407550

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |  6637 (100)|    918K|00:00:00.93 |    1718 |   1647 |       |       |          |
|   1 |  SORT ORDER BY       |       |      1 |    918K|  6637   (2)|    918K|00:00:00.93 |    1718 |   1647 |    42M|  2308K|   37M (0)|
|   2 |   PARTITION RANGE ALL|       |      1 |    918K|   499   (4)|    918K|00:00:00.31 |    1718 |   1647 |       |       |          |
|   3 |    TABLE ACCESS FULL | SALES |     28 |    918K|   499   (4)|    918K|00:00:00.22 |    1718 |   1647 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Notice that the execution plan claims that the query completed in 0.93 seconds, that the calculated cost of the execution plan is (still) 6,637, that 1,647 blocks were read from disk, and 1,718 consistent gets were performed.

The third challenge is to describe why DBMS_XPLAN claims that the parallel execution completed roughly 32.9% faster than the serial execution, when the timing output in SQL*Plus indicated that the reverse was true. Bonus: Why do the Buffers and Reads statistics in the top row of the execution plan for the serial execution plan exactly match the autotrace generated consistent gets and physical reads statistics, yet the same is not true for the execution plan for the parallel execution? Double Bonus: the execution plan claims that the query completed in 0.93 seconds, yet the timing information in SQL*Plus indicated that the query required an additional 7.55 seconds, why is there so much difference between the reported times (keep in mind that all testing was performed on the server, and the output was discarded without formatting and without displaying on screen)?

—

If cost is time, and time is money, what is the exchange rate for parallel execution? 😉

—

I have not fully worked the answers to the above challenge questions, and I will wait a couple of days before providing any feedback regarding proposed answers supplied by readers. I strongly suspect that some readers know without a doubt the answers to all of the above questions.

Actions

Information

Date : July 8, 2012
Categories : Performance

15 responses

9 07 2012

Romeo (21:55:19) :

What book are you referring? I’m interested in PX.

9 07 2012

Charles Hooper (22:26:19) :

Romeo,

I suggest taking a look at some of the articles that Doug Burns wrote on his blog. I believe that his articles on the topic are well regarded (and free as a bonus). Clicking the following link should help you locate those articles:
http://www.google.com/#hl=en&q=site:oracledoug.com+parallel

If you are looking for a *vry* light-weight article on parallel execution, you can take a look at one of my articles:
https://hoopercharles.wordpress.com/2010/02/17/parallel-reasoning/

I have read several books on the topic of Oracle Database performance tuning (some much better than others). The only general performance tuning book for which I did not have page after page of review notes pointing out errors is the “Troubleshooting Oracle Performance” book (yes, I made a considerable effort to find logic and technical errors in the book). You can read my review of that book here:
https://hoopercharles.wordpress.com/2011/11/07/book-review-troubleshooting-oracle-performance-part-2/

Some of my review notes from that book related to parallel execution follow:

* Various details of parallel processing (query, DML, and DDL) are described. (pages 489-513)
* “Parallel DML statements are disabled by default (be careful, this is the opposite of parallel queries).” Use a command such as “ALTER SESSION ENABLE PARALLEL DML” or “ALTER SESSION FORCE PARALLEL DML PARALLEL 4” to specify the default degree of DML parallelism. (page 503)
* The book states: “Parallel DDL statements are enabled by default.” (page 505)
* Disabling parallel execution at the session level does not disable parallel execution for recursive SQL statements. (page 509)
* Parallel execution should not be enabled unless there is sufficient CPU/memory/IO bandwidth available and the query requires at least a minute to execute. “It is important to stress that if these two conditions are not met, the performance could decrease instead of increase.” (page 509)
* Regarding the various parallel related hints, the book states: “It is very important to understand that the hints parallel and parallel_index do not force the query optimizer to use parallel processing. Instead, they override the degree of parallelism defined at the table or index level.” (page 509)

You should be able to find the “Troubleshooting Oracle Performance” book in the online Google books site, and read a couple of pages to see if the book it is worth your money to buy the book.

The parallel execution hints (and some of the parallel execution behavior) changed a bit in Oracle Database 11.2, so you might want to seek out articles that are specific to 11.2.0.1 and above, if that is the release version of Oracle Database that you are using.

The book that I am in the process of finalizing the second half of the review for only has a couple of pages on the topic of parallel execution – that book used a table level parallel hint, while my test case above used a statement level parallel hint.

10 07 2012

olivier (04:18:47) :

Thanks for all these links Charles, much appreciated. I also enjoy very much your chalenges because they are simply very helpfull to learn again and again. regarding above challenges, I think i got the answers, but i prefer to think about it twice before posting something that would simply be stupide 🙂

10 07 2012

Charles Hooper (20:41:45) :

I thought for sure that I would have received several theories for the challenge questions by now.

I wonder if changing the script slightly to enable a couple types of traces might provide a clue:

ALTER SESSION SET CURRENT_SCHEMA=SH;
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
SET TIMING ON
SET LINESIZE 145
SET PAGESIZE 1000
 
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10032 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10033 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET TRACEFILE_IDENTIFIER='parallel';
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;
 
ALTER SESSION SET TRACEFILE_IDENTIFIER='flush'; 
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='serial';
SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;
 
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10032 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10033 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

Taking a look inside the large trace file containing “parallel” in its name, among other items, the file contains:

*********************************
Number of join permutations tried: 1
*********************************
    SORT ressource         Sort statistics
      Sort width:         598 Area size:     1048576 Max Area size:   104857600
      Degree:               1
      Blocks to Sort: 3938 Row size:     35 Total Rows:         918843
      Initial runs:   2 Merge passes:  1 IO Cost / pass:       2134
      Total IO sort cost: 6072      Total CPU sort cost: 931257443
      Total Temp space used: 33317000
    SORT response         Sort statistics
      Sort width:         598 Area size:    20971520 Max Area size:   104857600
      Degree:               2
      Blocks to Sort: 1969 Row size:     35 Total Rows:         459422
      Initial runs:   1 Merge passes:  0 IO Cost / pass:          0
      Total IO sort cost: 0      Total CPU sort cost: 448383289
      Total Temp space used: 0
Trying or-Expansion on query block SEL$1 (#0)
Transfer Optimizer annotations for query block SEL$1 (#0)
Final cost for query block SEL$1 (#0) - All Rows Plan:
  Best join order: 1
  Cost: 308.4848  Degree: 2  Card: 918843.0000  Bytes: 20214546
  Resc: 6636.0126  Resc_io: 6553.0000  Resc_cpu: 1179345053
  Resp: 308.4848  Resp_io: 267.2222  Resc_cpu: 586209739
 
...
 
---- Sort Parameters ------------------------------
sort_area_size                    65536
sort_area_retained_size           65536
sort_multiblock_read_count        1
max intermediate merge width      3
 
...
 
---- Sort Statistics ------------------------------
Input records                             180
Output records                            180
Total number of comparisons performed     396
  Comparisons performed by in-memory sort 396
Total amount of memory used               10240
Uses version 1 sort

Taking a look inside the large trace file containing “serial” in its name, among other items, the file contains:

*********************************
Number of join permutations tried: 1
*********************************
    SORT ressource         Sort statistics
      Sort width:         598 Area size:     1048576 Max Area size:   104857600
      Degree:               1
      Blocks to Sort: 3938 Row size:     35 Total Rows:         918843
      Initial runs:   2 Merge passes:  1 IO Cost / pass:       2134
      Total IO sort cost: 6072      Total CPU sort cost: 931257443
      Total Temp space used: 33317000
Trying or-Expansion on query block SEL$1 (#0)
Transfer Optimizer annotations for query block SEL$1 (#0)
Final cost for query block SEL$1 (#0) - All Rows Plan:
  Best join order: 1
  Cost: 6636.8993  Degree: 1  Card: 918843.0000  Bytes: 20214546
  Resc: 6636.8993  Resc_io: 6553.0000  Resc_cpu: 1191942880
  Resp: 6636.8993  Resp_io: 6553.0000  Resc_cpu: 1191942880
 
...
 
---- Sort Parameters ------------------------------
sort_area_size                    39721984
sort_area_retained_size           39721984
sort_multiblock_read_count        1
max intermediate merge width      2423
 
...
 
---- Sort Statistics ------------------------------
Input records                             918843
Output records                            918843
Total number of comparisons performed     5061139
  Comparisons performed by in-memory sort 5061139
Total amount of memory used               39721984
Uses version 2 sort
---- End of Sort Statistics -----------------------

Interesting? Remember that the PARALLEL hint does not force parallel execution, it only changes plan costing. Silly thought – could simply adding an ORDER BY clause cause a query to switch from serial execution to parallel execution? 😉

—

Let’s process the trace files with TKPROF. First, the large trace file containing “serial” in its name, just the section related to the test SQL statement:

SQL ID: gs08hn87cyhsv
Plan Hash: 3803407550
SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID 
FROM
 SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    61258      3.30       3.57       1654       1718          0      918843
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    61260      3.32       3.59       1654       1718          0      918843
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
    918843     918843     918843  SORT ORDER BY (cr=1718 pr=1654 pw=0 time=825396 us cost=6637 size=20214546 card=918843)
    918843     918843     918843   PARTITION RANGE ALL PARTITION: 1 28 (cr=1718 pr=1654 pw=0 time=317031 us cost=499 size=20214546 card=918843)
    918843     918843     918843    TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=1718 pr=1654 pw=0 time=238966 us cost=499 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   61258        0.00          0.05
  db file sequential read                        29        0.00          0.01
  db file scattered read                        226        0.00          0.06
  SQL*Net message from client                 61258        0.00          8.67

Now, the large trace file containing “parallel” in its name:

SQL ID: 14wuxha2d93v3
Plan Hash: 2055439529
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID 
FROM
 SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.02          0          0          0           0
Execute      1      0.00       0.05         28        116          0           0
Fetch    61258      3.08       3.32          0          0          0      918843
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    61260      3.10       3.41         28        116          0      918843
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
    918843     918843     918843  PX COORDINATOR  (cr=116 pr=28 pw=0 time=543650 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
         0          0          0    SORT ORDER BY (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0       PX BLOCK ITERATOR PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  db file parallel read                           1        0.00          0.00
  os thread startup                               4        0.00          0.02
  PX Deq Credit: send blkd                        4        0.00          0.00
  PX Deq: Join ACK                                2        0.00          0.01
  PX Deq: Parse Reply                             5        0.01          0.01
  SQL*Net message to client                   61258        0.00          0.05
  PX Deq: Execute Reply                          30        0.07          0.40
  PX Deq: Table Q qref                            1        0.00          0.00
  SQL*Net message from client                 61258        0.00          8.87
  PX Deq: Signal ACK RSG                          1        0.00          0.00
  PX Deq: Signal ACK EXT                          3        0.00          0.00
  PX Deq: Slave Session Stats                     6        0.00          0.00

The above “query” value of 116 matches the “Buffers” value found in the DBMS_XPLAN output for the parallel execution, and the “disk” value matches the “Reads” value found in the DBMS_XPLAN output. If you take another look at the DBMS_XPLAN output, you will see that the “Starts” column value for plan line IDs 2 through 7 is 0 (indicating that line in the plan was never executed), and the “A-Rows” column for those same plan line IDs is 0, just as the above execution plan shows. The “A-Time” value in the DBMS_XPLAN output does not exactly match the “time=” value shown above – of course enabling a trace is sufficient to increase the execution time, but the above indicates that the plan required 0.54 seconds, while the DBMS_XPLAN output suggests 0.70 seconds. I wonder if there is a clue here?

You might have noticed that I stated “the large trace file containing ‘parallel’ in its name” above. Let’s process the other trace files containing “parallel” in their names.
p003:

SQL ID: 14wuxha2d93v3
Plan Hash: 2055439529
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID 
FROM
 SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.26       0.32        817        845          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.26       0.32        817        845          0           0
 
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 88     (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
         0          0          0    SORT ORDER BY (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
    464069     464069     464069       PX BLOCK ITERATOR PARTITION: 1 28 (cr=845 pr=817 pw=0 time=177690 us cost=277 size=20214546 card=918843)
    464069     464069     464069        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=845 pr=817 pw=0 time=114705 us cost=277 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                          15        0.00          0.00
  Disk file operations I/O                        1        0.00          0.00
  db file scattered read                        114        0.00          0.05
  PX Deq: Table Q Get Keys                        3        0.00          0.00
  PX Deq Credit: need buffer                    268        0.00          0.01
  db file sequential read                         1        0.00          0.00

p002:

SQL ID: 14wuxha2d93v3
Plan Hash: 2055439529
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID 
FROM
 SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.31        809        841          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.31        809        841          0           0
 
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 88     (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
         0          0          0    SORT ORDER BY (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
    454774     454774     454774       PX BLOCK ITERATOR PARTITION: 1 28 (cr=841 pr=809 pw=0 time=166442 us cost=277 size=20214546 card=918843)
    454774     454774     454774        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=841 pr=809 pw=0 time=106440 us cost=277 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                          14        0.00          0.00
  Disk file operations I/O                        1        0.00          0.00
  db file scattered read                        113        0.00          0.05
  PX Deq: Table Q Get Keys                        3        0.00          0.00
  PX Deq Credit: need buffer                    214        0.00          0.00
  PX qref latch                                   1        0.00          0.00

p001:

SQL ID: 14wuxha2d93v3
Plan Hash: 2055439529
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID 
FROM
 SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00      17.98          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00      17.98          0          0          0           0
 
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 88     (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
    901466     901466     901466    SORT ORDER BY (cr=0 pr=0 pw=0 time=334881 us cost=308 size=20214546 card=918843)
    901466     901466     901466     PX RECEIVE  (cr=0 pr=0 pw=0 time=230518 us cost=277 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0       PX BLOCK ITERATOR PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                           3        0.00          0.00
  PX Deq Credit: send blkd                     1044        0.32         17.25
  PX Deq: Table Q Sample                          1        0.00          0.00
  PX Deq: Table Q Normal                        284        0.00          0.02
  PX qref latch                                   2        0.00          0.00

p000:

SQL ID: 14wuxha2d93v3
Plan Hash: 2055439529
SELECT /*+ PARALLEL(2) */ S.PROD_ID, S.CUST_ID, S.TIME_ID 
FROM
 SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.68          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.01       0.68          0          0          0           0
 
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 88     (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=308 size=20214546 card=918843)
     17377      17377      17377    SORT ORDER BY (cr=0 pr=0 pw=0 time=5405 us cost=308 size=20214546 card=918843)
     17377      17377      17377     PX RECEIVE  (cr=0 pr=0 pw=0 time=2612404 us cost=277 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0       PX BLOCK ITERATOR PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
         0          0          0        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=277 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                           3        0.07          0.07
  PX Deq Credit: send blkd                       19        0.01          0.27
  PX Deq: Table Q Sample                          1        0.00          0.00
  PX Deq: Table Q Normal                         28        0.04          0.31

—

I do not know if these MOS (Metalink) documents apply, but I thought that the documents appeared to be interesting:
Bug 13040301: “CBO CHOOSE PARALLEL EXECUTION DUE CHEAPER COST BUT SERIAL IS RUNNING FASTER” – costing problem, related to views and partitioned tables
Bug 12339245: “OPTIMIZER IMPROPERLY USING PARALLELISM FOR QUERIES” – probably does not apply, makes no mention of cost

A couple of possibly helpful documentation links:
http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm
http://docs.oracle.com/cd/E11882_01/server.112/e16638/ex_plan.htm

—-

OK – anyone want to try answering the parallel execution challenge questions?

11 07 2012

Lukasz (09:05:20) :

First problem I see with this query is fact that data distribution between parallel servers is totally incorrect. p001 receives almost whole dataset from both parallel servers performing table scans (900K rows), and almost whole sorting is performed by this server, while second sorting PX server (p000) sorts only 17K rows. This parallel (not-so-parallel?) sort operation, which is the most time-consuming operation of this query, won’t be performed faster than sort in serial execution. I suspect that PX coordination overhead is not so significant to make whole parallel query slower, so probably there are more issues here.

11 07 2012

Colin 't Hart (02:17:52) :

Something about the sorting looks fishy to me 🙂

11 07 2012

Charles Hooper (15:47:23) :

Colin,

I was thinking something similar. I thought that people running other Oracle Database release versions might determine if they see the same behavior (a table level PARALLEL hint will be required if using 11.1.0.7 or below). The fish have not had any bites yet, so I tested Oracle Database 11.2.0.3 with patch 7 (released less than a month ago) with a fresh copy of the sample database schema.

In the sample database, I hardcoded the system statistics and left all parameters at the default (SGA_TARGET at 6000M, PGA_AGGREGATE_TARGET at 2000M):

EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC',16)
EXEC DBMS_STATS.SET_SYSTEM_STATS('CPUSPEED',2664)
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM',8)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM',10)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MAXTHR',19181568)

Parallel run timing:

Elapsed: 00:00:03.70
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2055439529
 
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   918K|    19M|       |   112  (25)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |   918K|    19M|       |   112  (25)| 00:00:01 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY        |          |   918K|    19M|    31M|   112  (25)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |   918K|    19M|       |    90   (7)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE      | :TQ10000 |   918K|    19M|       |    90   (7)| 00:00:01 |       |       |  Q1,00 | P->P |      RANGE |
|   6 |       PX BLOCK ITERATOR |          |   918K|    19M|       |    90   (7)| 00:00:01 |     1 |    28 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| SALES    |   918K|    19M|       |    90   (7)| 00:00:01 |     1 |    28 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------
 
Note
-----
   - Degree of Parallelism is 2 because of hint
 
Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       1684  consistent gets
       1635  physical reads
          0  redo size
   26710873  bytes sent via SQL*Net to client
     674335  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     918843  rows processed

Serial run timing:

Elapsed: 00:00:03.59
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3803407550
 
------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |   918K|    19M|       |  5245   (2)| 00:00:42 |       |       |
|   1 |  SORT ORDER BY       |       |   918K|    19M|    31M|  5245   (2)| 00:00:42 |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|    19M|       |   163   (8)| 00:00:02 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|    19M|       |   163   (8)| 00:00:02 |     1 |    28 |
------------------------------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1635  consistent gets
       1635  physical reads
          0  redo size
   26710873  bytes sent via SQL*Net to client
     674335  bytes received via SQL*Net from client
      61258  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed

Notice that the same odd sort costing is still in effect for the parallel execution.

From the 10053 parallel run:

*********************************
Number of join permutations tried: 1
*********************************
Enumerating distribution method (advanced)
    SORT ressource         Sort statistics
      Sort width:        1198 Area size:     1048576 Max Area size:   209715200
      Degree:               1
      Blocks to Sort: 3938 Row size:     35 Total Rows:         918843
      Initial runs:   2 Merge passes:  1 IO Cost / pass:       1100
      Total IO sort cost: 5038      Total CPU sort cost: 938362617
      Total Temp space used: 33317000
    SORT response         Sort statistics
      Sort width:        1198 Area size:    41943040 Max Area size:   209715200
      Degree:               2
      Blocks to Sort: 1969 Row size:     35 Total Rows:         459422
      Initial runs:   1 Merge passes:  0 IO Cost / pass:          0
      Total IO sort cost: 0      Total CPU sort cost: 456277927
      Total Temp space used: 0
Trying or-Expansion on query block SEL$1 (#0)
Transfer Optimizer annotations for query block SEL$1 (#0)
Final cost for query block SEL$1 (#0) - All Rows Plan:
  Best join order: 1
  Cost: 111.7654  Degree: 2  Card: 918843.0000  Bytes: 20214546
  Resc: 5244.6705  Resc_io: 5189.0000  Resc_cpu: 1186450227
  Resp: 111.7654  Resp_io: 83.8889  Resc_cpu: 594104377

From the 10053 serial run:

*********************************
Number of join permutations tried: 1
*********************************
Enumerating distribution method (advanced)
    SORT ressource         Sort statistics
      Sort width:        1198 Area size:     1048576 Max Area size:   209715200
      Degree:               1
      Blocks to Sort: 3938 Row size:     35 Total Rows:         918843
      Initial runs:   2 Merge passes:  1 IO Cost / pass:       1100
      Total IO sort cost: 5038      Total CPU sort cost: 938362617
      Total Temp space used: 33317000
Trying or-Expansion on query block SEL$1 (#0)
Transfer Optimizer annotations for query block SEL$1 (#0)
Final cost for query block SEL$1 (#0) - All Rows Plan:
  Best join order: 1
  Cost: 5245.3078  Degree: 1  Card: 918843.0000  Bytes: 20214546
  Resc: 5245.3078  Resc_io: 5189.0000  Resc_cpu: 1200030813
  Resp: 5245.3078  Resp_io: 5189.0000  Resc_cpu: 1200030813

—

Lukasz:
p000:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=112 size=20214546 card=918843)
     19661      19661      19661    SORT ORDER BY (cr=0 pr=0 pw=0 time=201185 us cost=112 size=20214546 card=918843)
     19661      19661      19661     PX RECEIVE  (cr=0 pr=0 pw=0 time=4997 us cost=90 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=90 size=20214546 card=918843)
         0          0          0       PX BLOCK ITERATOR PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=90 size=20214546 card=918843)
         0          0          0        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=90 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  library cache: mutex X                          1        0.00          0.00
  PX Deq: Execution Msg                           4        5.49          5.54
  PX Deq Credit: send blkd                       23        0.00          0.10
  PX Deq: Table Q Sample                          1        0.00          0.00
  PX Deq: Table Q Normal                         31        0.02          0.18

p001:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us cost=112 size=20214546 card=918843)
    899182     899182     899182    SORT ORDER BY (cr=0 pr=0 pw=0 time=501150 us cost=112 size=20214546 card=918843)
    899182     899182     899182     PX RECEIVE  (cr=0 pr=0 pw=0 time=190214 us cost=90 size=20214546 card=918843)
         0          0          0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=90 size=20214546 card=918843)
         0          0          0       PX BLOCK ITERATOR PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=90 size=20214546 card=918843)
         0          0          0        TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us cost=90 size=20214546 card=918843)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  library cache: mutex X                          1        0.00          0.00
  PX Deq: Execution Msg                           4        0.00          0.01
  PX Deq Credit: send blkd                     1044        0.12          5.40
  PX Deq: Table Q Sample                          1        0.00          0.00
  PX Deq: Table Q Normal                       1331        0.00          0.02

The same type of skewed division of rows for the sort operation is still present for the parallel slaves.

14 07 2012

Log Buffer #277, A Carnival of the Vanities for DBAs | The Pythian Blog (00:18:35) :

[…] Charles Hooper is dealing with a parallel execution challenge. […]

18 07 2012

stefan (11:47:28) :

the trace files claim not to use the same sorting mechanism v1 vs v2. but as I have forgoten most
of what I read about this on Jonthan Lewis blog I cannot jude if this makes a difference.

pure speculation, not tested anywhere, two theories about optimizer costing

1) There is something I never really understood in these and similar plans (maybe I should start reading some serious books)
If parallel slaves are (pre)sorting the data, is there not another sort or merge needed once they different sources are combined?
Is maybe only the merge/additional sort costed in the parallel case?

2) could it be that the optimizer thinks, the lower number of rows to sort per slave fits into memory,
whereas serial execution needs to spill to temp table space? hence more IO higher cost

18 07 2012

stefan (11:50:38) :

besides, does gather plan statistics gather all info or only the query coordinator?

19 07 2012

Charles Hooper (12:09:39) :

Stefan,

I also wondered the same thing about the GATHER_PLAN_STATISTICS hint – maybe that is why the actual runtime statistics are not updated in the execution plan. I then recalled that the GATHER_PLAN_STATISTICS hint is roughly equivalent to setting STATISTICS_LEVEL=ALL in regard to execution actual runtime statistics, but the hint has a much lower performance impacting overhead than does changing STATISTICS_LEVEL from the default of TYPICAL to ALL.

Let’s try the test again:

ALTER SESSION SET CURRENT_SCHEMA=SH;
ALTER SYSTEM SET STATISTICS_LEVEL=ALL;
SET AUTOTRACE TRACEONLY STATISTICS
SET TIMING OFF
SET LINESIZE 145
SET PAGESIZE 1000
 
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10032 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10033 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
 
ALTER SESSION SET TRACEFILE_IDENTIFIER='parallel';
SELECT /*+ Q1 PARALLEL(2) GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;
 
ALTER SYSTEM FLUSH BUFFER_CACHE;
 
ALTER SESSION SET TRACEFILE_IDENTIFIER='serial';
SELECT /*+ Q2 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;

ALTER SYSTEM FLUSH BUFFER_CACHE;
 
ALTER SESSION SET "_NEWSORT_ENABLED"=FALSE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='serial2';
SELECT /*+ Q3 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;
 
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10032 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10033 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
 
SET AUTOTRACE OFF 
 
SELECT
  SQL_ID,
  CHILD_NUMBER,
  SUBSTR(SQL_TEXT,1,30) SQL_TEXT
FROM
  V$SQL
WHERE
  SQL_TEXT LIKE 'SELECT /*+ Q%'
ORDER BY
  3,
  1,
  2;

The SQL_IDs:

SQL_ID        CHILD_NUMBER SQL_TEXT
------------- ------------ ------------------------------
5p47nbddfn0k7            0 SELECT /*+ Q1 PARALLEL(2) GATH
8k6bdqr09xc8u            0 SELECT /*+ Q2 GATHER_PLAN_STAT
1vzjwgsacny45            0 SELECT /*+ Q3 GATHER_PLAN_STAT

Retrieving the actual statistics:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5p47nbddfn0k7',0,'ALLSTATS LAST +COST'));
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('8k6bdqr09xc8u',0,'ALLSTATS LAST +COST')); 
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('1vzjwgsacny45',0,'ALLSTATS LAST +COST'));

The execution plan for the parallel execution still does not include the runtime statistics for the parallel slaves:

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5p47nbddfn0k7, child number 0
-------------------------------------
SELECT /*+ Q1 PARALLEL(2) GATHER_PLAN_STATISTICS */ S.PROD_ID,
S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
Plan hash value: 2055439529
 
-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |   112 (100)|    918K|00:00:00.81 |      48 |     16 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |    918K|00:00:00.81 |      48 |     16 |       |       |          |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    918K|   112  (25)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT ORDER BY        |          |      0 |    918K|   112  (25)|      0 |00:00:00.01 |       0 |      0 |    83M|  3138K|  865K (0)|
|   4 |     PX RECEIVE          |          |      0 |    918K|    90   (7)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |    918K|    90   (7)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      0 |    918K|    90   (7)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  7 |        TABLE ACCESS FULL| SALES    |      0 |    918K|    90   (7)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access(:Z>=:Z AND :Z<=:Z)
 
Note
-----
   - Degree of Parallelism is 2 because of hint

Good that you caught the version 1 and version 2 sort difference. After reviewing the trace files generated by the above script, it appears that only the parallel query coordinator performs a version 1 sort, while it appears that parallel slave p001 performs two version 2 sorts. What is a version 1 sort, and what is a version 2 sort? Jonathan Lewis has a couple of articles that describe version 1 sorts well, and touch on version 2 sorts:
http://jonathanlewis.wordpress.com/2009/10/01/_smm_isort_cap/
http://jonathanlewis.wordpress.com/2009/12/28/short-sorts/

One of Jonathan’s articles mentions the _SMM_ISORT_CAP parameter, so I thought that I would check the value of that parameter (connected as the SYS user):

COLUMN NAME FORMAT A15
COLUMN VALUE FORMAT A15
 
SELECT 
  UPPER(NAM.KSPPINM) NAME,
  VAL.KSPPSTVL VALUE,
  NAM.INDX+1 NUM,
  NAM.KSPPITY TYPE,
  VAL.KSPPSTDF ISDEFAULT,
  DECODE(BITAND(NAM.KSPPIFLG/256,1),1,'TRUE','FALSE') ISSES_MODIFIABLE,
  DECODE(BITAND(NAM.KSPPIFLG/65536,3),
    1,'IMMEDIATE',
    2,'DEFERRED' ,
    3,'IMMEDIATE',
    'FALSE') ISSYS_MODIFIABLE,
  DECODE(BITAND(VAL.KSPPSTVF,7),
    1,'MODIFIED',
    4,'SYSTEM MODIFIED',
    'FALSE') ISMODIFIED,
  DECODE(BITAND(VAL.KSPPSTVF,2),2,'TRUE', 'FALSE') ISADJUSTED,
  NAM.KSPPDESC DESCRIPTION
FROM
  X$KSPPI NAM,
  X$KSPPSV VAL
WHERE 
  NAM.INDX = VAL.INDX 
  AND UPPER(NAM.KSPPINM)='_SMM_ISORT_CAP'
ORDER BY
  UPPER(DECODE(SUBSTR(NAM.KSPPINM,1,2),'__',SUBSTR(NAM.KSPPINM,3),
          DECODE(SUBSTR(NAM.KSPPINM,1,1),'_',SUBSTR(NAM.KSPPINM,2),NAM.KSPPINM))),
  UPPER(NAM.KSPPINM);
 
NAME            VALUE                  NUM       TYPE ISDEFAULT ISSES ISSYS_MOD ISMODIFIED      ISADJ DESCRIPTION
--------------- --------------- ---------- ---------- --------- ----- --------- --------------- ----- ----------------------------------------
_SMM_ISORT_CAP  102400                2233          3 TRUE      TRUE  IMMEDIATE FALSE           FALSE maximum work area for insertion sort(v1)

Just 102400? If the scale of the parameter is in bytes, that is just 100KB.

As shown in the above script, it is possible to disable the version 2 sort by setting the _NEWSORT_ENABLED hidden parameter to false.

The sort statistics from the trace file:
Query coordinator for the parallel execution:

---- Sort Parameters ------------------------------
sort_area_size                    65536
sort_area_retained_size           65536
sort_multiblock_read_count        1
max intermediate merge width      3
 
---- Sort Statistics ------------------------------
Input records                             180
Output records                            180
Total number of comparisons performed     280
  Comparisons performed by in-memory sort 280
Total amount of memory used               10240
Uses version 1 sort

p001 for the parallel execution:

---- Sort Parameters ------------------------------
sort_area_size                    38818816
sort_area_retained_size           38818816
sort_multiblock_read_count        1
max intermediate merge width      2368
 
---- Sort Statistics ------------------------------
Input records                             899182
Output records                            899182
Total number of comparisons performed     5065950
  Comparisons performed by in-memory sort 5065950
Total amount of memory used               38818816
Uses version 2 sort
---- End of Sort Statistics -----------------------
...
---- Sort Parameters ------------------------------
sort_area_size                    38818816
sort_area_retained_size           38818816
sort_multiblock_read_count        1
max intermediate merge width      2368

---- Sort Statistics ------------------------------
Input records                             899182
Output records                            899182
Total number of comparisons performed     4990078
  Comparisons performed by in-memory sort 4990078
Total amount of memory used               38818816
Uses version 2 sort

With version 2 sort enabled, the execution plan and sort statistics:

SQL_ID  8k6bdqr09xc8u, child number 0
-------------------------------------
SELECT /*+ Q2 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID
FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
Plan hash value: 3803407550
 
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |  5245 (100)|    918K|00:00:00.90 |    1635 |   1635 |       |       |          |
|   1 |  SORT ORDER BY       |       |      1 |    918K|  5245   (2)|    918K|00:00:00.90 |    1635 |   1635 |    42M|  2308K|   37M (0)|
|   2 |   PARTITION RANGE ALL|       |      1 |    918K|   163   (8)|    918K|00:00:00.41 |    1635 |   1635 |       |       |          |
|   3 |    TABLE ACCESS FULL | SALES |     28 |    918K|   163   (8)|    918K|00:00:00.25 |    1635 |   1635 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------
 
---- Sort Parameters ------------------------------
sort_area_size                    39721984
sort_area_retained_size           39721984
sort_multiblock_read_count        1
max intermediate merge width      2423
 
---- Sort Statistics ------------------------------
Input records                             918843
Output records                            918843
Total number of comparisons performed     5061139
  Comparisons performed by in-memory sort 5061139
Total amount of memory used               39721984
Uses version 2 sort
---- End of Sort Statistics -----------------------

With version 2 sort disabled, the execution plan and sort statistics:

SQL_ID  1vzjwgsacny45, child number 0
-------------------------------------
SELECT /*+ Q3 GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID
FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
Plan hash value: 3803407550
 
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |  5245 (100)|    918K|00:00:01.06 |    1635 |   1635 |       |       |          |
|   1 |  SORT ORDER BY       |       |      1 |    918K|  5245   (2)|    918K|00:00:01.06 |    1635 |   1635 |    48M|  2447K|   43M (0)|
|   2 |   PARTITION RANGE ALL|       |      1 |    918K|   163   (8)|    918K|00:00:00.42 |    1635 |   1635 |       |       |          |
|   3 |    TABLE ACCESS FULL | SALES |     28 |    918K|   163   (8)|    918K|00:00:00.26 |    1635 |   1635 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------
 
---- Sort Parameters ------------------------------
sort_area_size                    45205504
sort_area_retained_size           45205504
sort_multiblock_read_count        1
max intermediate merge width      2758
 
---- Sort Statistics ------------------------------
Input records                             918843
Output records                            918843
Total number of comparisons performed     9022589
  Comparisons performed by in-memory sort 9022589
Total amount of memory used               45205504
Uses version 1 sort
---- End of Sort Statistics -----------------------

Still to be determined is why the serial execution was costed with a sort area size of 1MB, while the parallel execution was costed with a sort area size of 40MB (the value of the _PGA_MAX_SIZE parameter as it appeared in the trace files) – that would be the likely explanation for the costing difference of the sort operation.

23 07 2012

Dom Brooks (05:06:11) :

ALLSTATS LAST is not suitable for parallel executions because it only reports statistics from the QC.

Similarly, ALLSTATS ALL can be better but not if there have been multiple executions.

This is why Real Time SQL Monitoring is the better tool for reporting parallel execution statistics, but it requires Diagnostic+Tuning license.

Maria Colgan commented the above on a post of mine here:
http://orastory.wordpress.com/2011/03/10/real-time-sql-monitoring/

23 07 2012

Charles Hooper (06:52:11) :

Dominic,

The comments in the article that you linked to above cover the third challenge of this blog article very well; your summarization of the comments from the article is well stated.

I did not have a good explanation for the behavior – I assumed that the query was executed once, not once per parallel server process. Thinking out loud, by what value will the EXECUTIONS and PX_SERVERS_EXECUTIONS columns of V$SQL (http://docs.oracle.com/cd/E14072_01/server.112/e10820/dynviews_3042.htm ) and V$SQLAREA (http://docs.oracle.com/cd/E11882_01/server.112/e17110/dynviews_3064.htm ) be increased when a query is executed in parallel: 1) The values in those columns will be increased by a value of 1 for each execution by the query coordinator; 2) The values in those columns will be increased by a value of 1 per parallel server process (and quey coordinator) involved in the query. Time for another test case script…

23 07 2012

Dom Brooks (08:43:36) :

> By what value will the EXECUTIONS and PX_SERVER_EXECUTIONS be increased…
Questions that ring a bell with a recent OTN forum thread wondering how to tell what the real DOP was for a query:
https://forums.oracle.com/forums/thread.jspa?messageID=10447813&#10447813

23 07 2012

Charles Hooper (09:44:35) :

PX_SERVER_EXECUTIONS is described in that OTN forum thread – I had previously read that thread, but missed the mention of the PX_SERVER_EXECUTIONS statistic.

A little test script that uses the sample database supplied by Oracle Corp.:

SET AUTOTRACE TRACEONLY STATISTICS
ALTER SESSION SET CURRENT_SCHEMA=SH;
SET LINESIZE 145
SET PAGESIZE 1000
 
SELECT /*+ Q1 PARALLEL(4) GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;
 
SET AUTOTRACE OFF
 
SELECT
  SQL_ID,
  CHILD_NUMBER,
  EXECUTIONS,
  PX_SERVERS_EXECUTIONS,
  SUBSTR(SQL_TEXT,1,30) SQL_TEXT
FROM
  V$SQL
WHERE
  SQL_TEXT LIKE 'SELECT /*+ Q%'
ORDER BY
  1,
  2;

I executed the script 5 times, the results are posted below:

SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS SQL_TEXT
------------- ------------ ---------- --------------------- ------------------------------
a41qh973zdz2k            0          1                     8 SELECT /*+ Q1 PARALLEL(4) GATH
 
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS SQL_TEXT
------------- ------------ ---------- --------------------- ------------------------------
a41qh973zdz2k            0          2                    16 SELECT /*+ Q1 PARALLEL(4) GATH
 
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS SQL_TEXT
------------- ------------ ---------- --------------------- ------------------------------
a41qh973zdz2k            0          3                    24 SELECT /*+ Q1 PARALLEL(4) GATH
 
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS SQL_TEXT
------------- ------------ ---------- --------------------- ------------------------------
a41qh973zdz2k            0          4                    32 SELECT /*+ Q1 PARALLEL(4) GATH
 
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS SQL_TEXT
------------- ------------ ---------- --------------------- ------------------------------
a41qh973zdz2k            0          5                    40 SELECT /*+ Q1 PARALLEL(4) GATH

With Jonathan’s explanation from the OTN thread, would this be a DOP (degree of parallel) of 4 with 2 sets of slaves to run it?

Let’s take a look at the execution plan:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('a41qh973zdz2k',0,'TYPICAL +PARALLEL'));
 
SQL_ID  a41qh973zdz2k, child number 0
-------------------------------------
SELECT /*+ Q1 PARALLEL(4) GATHER_PLAN_STATISTICS */ S.PROD_ID,
S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC
 
Plan hash value: 2055439529
 
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |       |       |       |    56 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |   918K|    19M|       |    56  (25)| 00:00:01 |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY        |          |   918K|    19M|    31M|    56  (25)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |   918K|    19M|       |    45   (7)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE      | :TQ10000 |   918K|    19M|       |    45   (7)| 00:00:01 |       |       |  Q1,00 | P->P |      RANGE |
|   6 |       PX BLOCK ITERATOR |          |   918K|    19M|       |    45   (7)| 00:00:01 |     1 |    28 |  Q1,00 | PCWC |            |
|*  7 |        TABLE ACCESS FULL| SALES    |   918K|    19M|       |    45   (7)| 00:00:01 |     1 |    28 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access(:Z>=:Z AND :Z<=:Z)
 
Note
-----
   - Degree of Parallelism is 4 because of hint

As is probably known by several readers, the degree of parallel in the note section may differ from the actual degree of parallel (this was mentioned in the thread).

Let’s try the test script again, this time with a GATHER_PLAN_STATISTICS hint in the query:

SET AUTOTRACE TRACEONLY STATISTICS
ALTER SESSION SET CURRENT_SCHEMA=SH;
SET LINESIZE 145
SET PAGESIZE 1000
 
SELECT /*+ Q1 GATHER_PLAN_STATISTICS PARALLEL(4) GATHER_PLAN_STATISTICS */ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY S.AMOUNT_SOLD DESC;
 
SET AUTOTRACE OFF
 
SELECT
  SQL_ID,
  CHILD_NUMBER,
  EXECUTIONS,
  PX_SERVERS_EXECUTIONS,
  SUBSTR(SQL_TEXT,1,30) SQL_TEXT
FROM
  V$SQL
WHERE
  SQL_TEXT LIKE 'SELECT /*+ Q%'
ORDER BY
  1,
  2;
 
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS SQL_TEXT
------------- ------------ ---------- --------------------- ------------------------------
5jkxkzarc5myx            0          1                     8 SELECT /*+ Q1 GATHER_PLAN_STAT
a41qh973zdz2k            0          5                    40 SELECT /*+ Q1 PARALLEL(4) GATH

Will the ALLSTATS ALL format parameter of the DBMS_XPLAN call help? The ALLSTATS LAST format parameter call’s output:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5jkxkzarc5myx',0,'ALLSTATS LAST +PARALLEL'));
 
SQL_ID  5jkxkzarc5myx, child number 0
-------------------------------------
SELECT /*+ Q1 GATHER_PLAN_STATISTICS PARALLEL(4) GATHER_PLAN_STATISTICS
*/ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY
S.AMOUNT_SOLD DESC
 
Plan hash value: 2055439529
 
------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |        |      |            |    918K|00:00:00.42 |      48 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |        |      |            |    918K|00:00:00.42 |      48 |       |       |          |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    918K|  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT ORDER BY        |          |      0 |    918K|  Q1,01 | PCWP |            |      0 |00:00:00.01 |       0 |    73M|  2952K|  865K (0)|
|   4 |     PX RECEIVE          |          |      0 |    918K|  Q1,01 | PCWP |            |      0 |00:00:00.01 |       0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |    918K|  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      0 |    918K|  Q1,00 | PCWC |            |      0 |00:00:00.01 |       0 |       |       |          |
|*  7 |        TABLE ACCESS FULL| SALES    |      0 |    918K|  Q1,00 | PCWP |            |      0 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access(:Z>=:Z AND :Z<=:Z)
 
Note
-----
   - Degree of Parallelism is 4 because of hint

Now, for comparison the ALLSTATS ALL format parameter call’s output:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5jkxkzarc5myx',0,'ALLSTATS ALL +PARALLEL'));
 
SQL_ID  5jkxkzarc5myx, child number 0
-------------------------------------
SELECT /*+ Q1 GATHER_PLAN_STATISTICS PARALLEL(4) GATHER_PLAN_STATISTICS
*/ S.PROD_ID, S.CUST_ID, S.TIME_ID FROM SH.SALES S ORDER BY
S.AMOUNT_SOLD DESC
 
Plan hash value: 2055439529
 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |       |       |    56 (100)|          |       |       |        |      |            |    918K|00:00:00.42 |      48 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |       |       |            |          |       |       |        |      |            |    918K|00:00:00.42 |      48 |       |       |          |
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    918K|    19M|       |    56  (25)| 00:00:01 |       |       |  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT ORDER BY        |          |      4 |    918K|    19M|    31M|    56  (25)| 00:00:01 |       |       |  Q1,01 | PCWP |            |    918K|00:00:01.08 |       0 |    73M|  2952K|     3/0/0|
|   4 |     PX RECEIVE          |          |      4 |    918K|    19M|       |    45   (7)| 00:00:01 |       |       |  Q1,01 | PCWP |            |    918K|00:00:01.18 |       0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |    918K|    19M|       |    45   (7)| 00:00:01 |       |       |  Q1,00 | P->P |      RANGE |      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      4 |    918K|    19M|       |    45   (7)| 00:00:01 |     1 |    28 |  Q1,00 | PCWC |            |    918K|00:00:00.49 |    1665 |       |       |          |
|*  7 |        TABLE ACCESS FULL| SALES    |     46 |    918K|    19M|       |    45   (7)| 00:00:01 |     1 |    28 |  Q1,00 | PCWP |            |    918K|00:00:00.27 |    1665 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   7 - SEL$1 / S@SEL$1
 
Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access(:Z>=:Z AND :Z<=:Z)
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7]
   2 - (#keys=0) "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7]
   3 - (#keys=1) INTERNAL_FUNCTION("S"."AMOUNT_SOLD")[22], "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7]
   4 - INTERNAL_FUNCTION("S"."AMOUNT_SOLD")[22], "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7]
   5 - (#keys=1) INTERNAL_FUNCTION("S"."AMOUNT_SOLD")[22], "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7]
   6 - "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7], "S"."AMOUNT_SOLD"[NUMBER,22]
   7 - "S"."PROD_ID"[NUMBER,22], "S"."CUST_ID"[NUMBER,22], "S"."TIME_ID"[DATE,7], "S"."AMOUNT_SOLD"[NUMBER,22]
 
Note
-----
   - Degree of Parallelism is 4 because of hint

The ALLSTATS ALL format parameter of the DBMS_XPLAN.DISPLAY_CURSOR function worked as described by Maria Colgan (and Doug Burns).

	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Huge Presentation – Work…
	Dan on Huge Presentation – Work…
	Charles Hooper on Review of Synology DS415+, How…
	Matthias Clausen on Review of Synology DS415+, How…
	Charles Hooper on _SMALL_TABLE_THRESHOLD Paramet…
	mohd atif on _SMALL_TABLE_THRESHOLD Paramet…
	Charles Hooper on Free ANSI SQL to Oracle Specif…
	Kalin Agrawal on Free ANSI SQL to Oracle Specif…
	Charles Hooper on Large Page Support or Not…
	vaurob on Large Page Support or Not…
	Locating Oracle… on Finding a New Home for a Clien…

Charles Hooper's Oracle Notes