Different Performance from Standard Edition and Enterprise Edition? 3

22 11 2010

November 22, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

In the previous articles of this series we looked at different performance behavior exhibited by Standard Edition and Enterprise Editions for release versions 10.2.0.5 and 11.2.0.1, when presented with a very simple SQL statement that forced an index access path.  Today’s article takes a slightly different path, where we will see radically different execution plans for the Standard Edition and Enterprise Edition, as well as an issue that will cause some of the proponents of the buffer cache hit ratio to panic when the statistics produced by the Enterprise Edition of 10.2.0.5 are compared to the statistics produced by the Enterprise Edition of 11.2.0.1 (another new discussion thread on the OTN forums discussing the BCHR). 

Once again, we will manually specify the system (CPU) statistics as follows so that we might prevent slight differences in the statistics already set for the databases from causing unexpected results:

EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC',16)
EXEC DBMS_STATS.SET_SYSTEM_STATS('CPUSPEED',2664)
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM',8)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM',10)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MAXTHR',19181568) 

The script to create the two test tables and associated indexes:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20) NOT NULL,
  C3 NUMBER NOT NULL,
  C4 DATE NOT NULL,
  PADDING VARCHAR2(500));

INSERT INTO
  T1
SELECT
  TO_NUMBER(TO_CHAR(TRUNC(SYSDATE)+(ROWNUM-1), 'DD')),
  TRIM(TO_CHAR(TRUNC(SYSDATE)+(ROWNUM-1), 'DAY')),
  TO_NUMBER(TO_CHAR(TRUNC(SYSDATE)+(ROWNUM-1), 'DDD')),
  TRUNC(SYSDATE)+(ROWNUM-1),
  LPAD('A',500,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=500000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);
CREATE INDEX IND_T1_C3 ON T1(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',NO_INVALIDATE=>FALSE)

CREATE TABLE T2 AS
SELECT
  *
FROM
  T1;

CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2);
CREATE INDEX IND_T2_C3 ON T2(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',NO_INVALIDATE=>FALSE)

COLUMN SEGMENT FORMAT A12

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T1_C1','IND_T1_C2','IND_T1_C3','T1')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

The tables were created in an ASSM autoallocate tablespace, with extents allocated as follows (consistent results with 10.2.0.5 and 11.2.0.1):

SEGMENT         EXTENTS EXT_SIZE_KB   TOTAL_MB
------------ ---------- ----------- ----------
IND_T1_C1            16          64          1
IND_T1_C1             7        1024          7
IND_T1_C2            16          64          1
IND_T1_C2            10        1024         10
IND_T1_C3            16          64          1
IND_T1_C3             8        1024          8
T1                   16          64          1
T1                   63        1024         63
T1                   30        8192        240 

The test script, once again divided into two nearly identical halves, with the results from the first half thrown out.  The first SQL statement retrieves the dates on which the day of the week is Monday and the day of the year is between 200 and 255.  The second SQL statement repeats the request with table T1 joined in a many to many type join to table T2:

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IND_JOIN_10046_10053_JUNK';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* JUNK */
  T1.C4
FROM
  T1
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255;

SELECT /* JUNK */
  T1.C4,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255
  AND T1.C1=T2.C1
  AND T1.C3=T2.C3;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
SET AUTOTRACE OFF

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET AUTOTRACE TRACEONLY STATISTICS

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IND_JOIN_10046_10053';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* FIND_ME */
  T1.C4
FROM
  T1
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IND_JOIN_10046_10053_J';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* FIND_ME */
  T1.C4,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255
  AND T1.C1=T2.C1
  AND T1.C3=T2.C3;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
SET AUTOTRACE OFF  

We will start by comparing the statistics when the above script is executed in Oracle Database 10.2.0.5.

Standard Edition:

SQL> SELECT /* FIND_ME */
  2    T1.C4
  3  FROM
  4    T1
  5  WHERE
  6    T1.C2='MONDAY'
  7    AND T1.C3 BETWEEN 200 AND 255;

10952 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      38681  consistent gets
      38658  physical reads
          0  redo size
     104979  bytes sent via SQL*Net to client
        582  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10952  rows processed 

Enterprise Edition:

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
       6878  consistent gets
      33648  physical reads
          0  redo size
     104979  bytes sent via SQL*Net to client
        582  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10952  rows processed 

The Standard Edition read 38,658 blocks from disk and performed 38,681 consistent gets, so it might have selected to perform a full table scan (using the extent allocation displayed above, 304MB / 8KB = 38,912, which is close to the number of physical blocks that were read).  The Enterprise Edition read 33,648 blocks from disk and performed only 6,878 consistent gets (what happened to my BCHR, quick – downgrade to Standard Edition).  It is interesting that the number of consistent gets is significantly less than the number of blocks read from disk, and the number of blocks read from disk is less than that of the Standard Edition.  Interesting, how did the Enterprise Edition do that?  Let’s take a look at the TKPROF output.

Standard Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       0.02          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch       12      0.15       3.72      38658      38681          0       10952
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       16      0.17       3.74      38658      38681          0       10952

Misses in library cache during parse: 1
Parsing user id: 47 

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS FULL T1 (cr=38681 pr=38658 pw=0 time=25071316 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      13        0.00          0.00
  SQL*Net message from client                    13        0.01          0.02
  db file sequential read                         1        0.01          0.01
  db file scattered read                        319        0.05          3.49
  SQL*Net more data to client                    46        0.00          0.00 

Enterprise Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       0.05          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch       12      0.40       3.93      33648       6878          0       10952
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       16      0.42       3.99      33648       6878          0       10952

Misses in library cache during parse: 1
Parsing user id: 164 

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS BY INDEX ROWID T1 (cr=6878 pr=33648 pw=0 time=5271629 us)
  10952   BITMAP CONVERSION TO ROWIDS (cr=346 pr=376 pw=0 time=162037 us)
      2    BITMAP AND  (cr=346 pr=376 pw=0 time=189455 us)
     11     BITMAP CONVERSION FROM ROWIDS (cr=183 pr=200 pw=0 time=103387 us)
  71429      INDEX RANGE SCAN IND_T1_C2 (cr=183 pr=200 pw=0 time=65401 us)(object id 51682)
      3     BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=71984 us)
  76664      SORT ORDER BY (cr=163 pr=176 pw=0 time=57590 us)
  76664       INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=13038 us)(object id 51683)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      13        0.00          0.00
  SQL*Net message from client                    13        0.01          0.01
  db file scattered read                       4181        0.04          3.47
  SQL*Net more data to client                    46        0.00          0.00
  db file sequential read                       206        0.03          0.13 

Even though the Standard Edition read more blocks from disk, and performed considerably more consistent gets, it executed the SQL statement 0.25 seconds faster than the Enterprise Edition.  Enterprise Edition certainly gnerated a more interesting execution plan.  So, why are the consistent gets significantly less than the number of blocks read from disk (cr= and query indicates consistent gets, pr= and disk indicates blocks read from disk)?  Is the difference caused by a bug (scroll down and compare with the output from 11.2.0.1)?  Moving on to the second SQL statement.

Standard Edition:

SQL> SELECT /* FIND_ME */
  2    T1.C4,
  3    T2.C4
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C2='MONDAY'
  9    AND T1.C3 BETWEEN 200 AND 255
 10    AND T1.C1=T2.C1
 11    AND T1.C3=T2.C3;

9484744 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      85913  consistent gets
      77121  physical reads
          0  redo size
  125651171  bytes sent via SQL*Net to client
     104796  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    9484744  rows processed  

Enterprise Edition:

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      54035  consistent gets
      72111  physical reads
          0  redo size
  125651171  bytes sent via SQL*Net to client
     104796  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    9484744  rows processed 

The physical reads are close for the two editions, but again the consistent gets differs significantly.  Taking a look at the output of TKPROF.

Standard Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       0.04          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     9486      6.50      10.21      77121      85913          0     9484744
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9490      6.52      10.26      77121      85913          0     9484744

Misses in library cache during parse: 1
Parsing user id: 47 

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN  (cr=85913 pr=77121 pw=0 time=3814912 us)
  10952   TABLE ACCESS FULL T1 (cr=38670 pr=38658 pw=0 time=20749036 us)
  76664   TABLE ACCESS FULL T2 (cr=47243 pr=38463 pw=0 time=2036828 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9487        0.00          0.00
  SQL*Net message from client                  9487        0.01          5.04
  db file sequential read                         2        0.01          0.01
  db file scattered read                        637        0.06          4.01
  SQL*Net more data to client                 56908        0.00          1.54 

Enterprise Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.03       0.09          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     9486      5.91       8.81      72111      54035          0     9484744
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9490      5.94       8.91      72111      54035          0     9484744

Misses in library cache during parse: 1
Parsing user id: 164 

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN  (cr=54035 pr=72111 pw=0 time=2857143 us)
  10952   TABLE ACCESS BY INDEX ROWID T1 (cr=6870 pr=33648 pw=0 time=2991228 us)
  10952    BITMAP CONVERSION TO ROWIDS (cr=345 pr=376 pw=0 time=145613 us)
      2     BITMAP AND  (cr=345 pr=376 pw=0 time=172112 us)
     11      BITMAP CONVERSION FROM ROWIDS (cr=182 pr=200 pw=0 time=75187 us)
  71429       INDEX RANGE SCAN IND_T1_C2 (cr=182 pr=200 pw=0 time=35170 us)(object id 51682)
      3      BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=89497 us)
  76664       SORT ORDER BY (cr=163 pr=176 pw=0 time=74771 us)
  76664        INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=30103 us)(object id 51683)
  76664   TABLE ACCESS FULL T2 (cr=47165 pr=38463 pw=0 time=1095520 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9487        0.00          0.00
  SQL*Net message from client                  9487        0.01          5.34
  db file scattered read                       4499        0.04          2.78
  db file sequential read                       207        0.01          0.05
  SQL*Net more data to client                 56908        0.00          1.18 

In the Standard Edition, we see that Oracle performed a full table scan of table T1 and T2, and joined those tables using a hash join.  Enterprise Edition was much more creative when generating the execution plan, allowing it to complete 1.35 seconds faster.  Are we seeing another bug on the second line of the Row Source Operation execution plan with the accounting of the consistent gets and physical reads?

——————–

Repeating the tests with the Standard Edition of 11.2.0.1 and the Enterprise Edition of 11.2.0.1.

Standard Edition:

SQL> SELECT /* FIND_ME */
  2    T1.C4
  3  FROM
  4    T1
  5  WHERE
  6    T1.C2='MONDAY'
  7    AND T1.C3 BETWEEN 200 AND 255;

10952 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      38680  consistent gets
      38658  physical reads
          0  redo size
     104990  bytes sent via SQL*Net to client
        629  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10952  rows processed 

Enterprise Edition:

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
       6874  consistent gets
       7700  physical reads
          0  redo size
     104990  bytes sent via SQL*Net to client
        630  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10952  rows processed 

Interesting, Enterprise Edition performed roughly 20% of the consistent gets and physical reads reported by Standard Edition. Taking a look at the output of TKPROF.

Standard Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       12      0.34       2.87      38658      38680          0       10952
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       14      0.35       2.89      38658      38680          0       10952

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 51 

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS FULL T1 (cr=38680 pr=38658 pw=0 time=2879855 us cost=3041 size=234234 card=11154)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      12        0.00          0.00
  db file sequential read                         1        0.01          0.01
  db file scattered read                        319        0.04          2.55
  SQL*Net message from client                    12        0.00          0.00
  SQL*Net more data to client                    11        0.00          0.00 

Enterprise Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       12      0.17       3.29       7700       6874          0       10952
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       14      0.18       3.31       7700       6874          0       10952

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 194 

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS BY INDEX ROWID T1 (cr=6874 pr=7700 pw=0 time=15073 us cost=2976 size=234234 card=11154)
  10952   BITMAP CONVERSION TO ROWIDS (cr=346 pr=376 pw=0 time=1417 us)
      2    BITMAP AND  (cr=346 pr=376 pw=0 time=31853 us)
     11     BITMAP CONVERSION FROM ROWIDS (cr=183 pr=200 pw=0 time=27060 us)
  71429      INDEX RANGE SCAN IND_T1_C2 (cr=183 pr=200 pw=0 time=11755 us cost=194 size=0 card=0)(object id 83186)
      3     BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=10656 us)
  76664      SORT ORDER BY (cr=163 pr=176 pw=0 time=9598 us)
  76664       INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=8830 us cost=164 size=0 card=0)(object id 83187)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      12        0.00          0.00
  db file scattered read                        202        0.02          0.33
  SQL*Net message from client                    12        0.00          0.00
  db file parallel read                         274        0.05          2.78
  SQL*Net more data to client                    11        0.00          0.00
  db file sequential read                         1        0.00          0.00 

Are you ready to toss out the Enterprise Edition?  Even though Standard Edition perform 5 times as many physical block reads and 5 times as many consistent gets, it completed the query 0.42 seconds faster.  Spend some time comparing the Row Source Operation execution plan with that of 10.2.0.5 – which Enterprise Edition contains the bug in the reporting of physical block reads and/or consistent gets?  Continuing with the second SQL statement:

Standard Edition:

SQL> SELECT /* FIND_ME */
  2    T1.C4,
  3    T2.C4
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C2='MONDAY'
  9    AND T1.C3 BETWEEN 200 AND 255
 10    AND T1.C1=T2.C1
 11    AND T1.C3=T2.C3;

9484744 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      85894  consistent gets
      77121  physical reads
          0  redo size
  125651186  bytes sent via SQL*Net to client
     104843  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    9484744  rows processed 

Enterprise Edition:

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      54096  consistent gets
      46238  physical reads
          0  redo size
  125651186  bytes sent via SQL*Net to client
     104844  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    9484744  rows processed 

Taking a look at the output of TKPROF.

Standard Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     9486      4.11       7.24      77121      85894          0     9484744
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9488      4.13       7.26      77121      85894          0     9484744

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 51 

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN  (cr=85894 pr=77121 pw=0 time=2663572 us cost=6087 size=2994849 card=76791)
  10952   TABLE ACCESS FULL T1 (cr=38670 pr=38658 pw=0 time=6680619 us cost=3041 size=267696 card=11154)
  76664   TABLE ACCESS FULL T2 (cr=47224 pr=38463 pw=0 time=1238145 us cost=3045 size=1171125 card=78075)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9486        0.00          0.00
  db file sequential read                         2        0.01          0.02
  db file scattered read                        637        0.04          2.82
  SQL*Net message from client                  9486        0.00         10.21
  SQL*Net more data to client                  9485        0.00          0.35
  asynch descriptor resize                        1        0.00          0.00 

Enterprise Edition:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     9486      4.71       7.46      46238      54096          0     9484744
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9488      4.72       7.48      46238      54096          0     9484744

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 194 

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN  (cr=54096 pr=46238 pw=0 time=2306193 us cost=6022 size=2994849 card=76791)
  10952   TABLE ACCESS BY INDEX ROWID T1 (cr=6870 pr=7775 pw=0 time=2895266 us cost=2976 size=267696 card=11154)
  10952    BITMAP CONVERSION TO ROWIDS (cr=345 pr=376 pw=0 time=2292 us)
      2     BITMAP AND  (cr=345 pr=376 pw=0 time=30695 us)
     11      BITMAP CONVERSION FROM ROWIDS (cr=182 pr=200 pw=0 time=25870 us)
  71429       INDEX RANGE SCAN IND_T1_C2 (cr=182 pr=200 pw=0 time=11883 us cost=194 size=0 card=0)(object id 83186)
      3      BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=11066 us)
  76664       SORT ORDER BY (cr=163 pr=176 pw=0 time=12158 us)
  76664        INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=10878 us cost=164 size=0 card=0)(object id 83187)
  76664   TABLE ACCESS FULL T2 (cr=47226 pr=38463 pw=0 time=291011 us cost=3045 size=1171125 card=78075)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9486        0.00          0.00
  db file scattered read                        210        0.03          0.34
  db file parallel read                         274        0.04          2.69
  db file sequential read                         2        0.01          0.01
  direct path read                                9        0.01          0.04
  SQL*Net message from client                  9486        0.00         10.33
  SQL*Net more data to client                  9485        0.00          0.35
  asynch descriptor resize                        2        0.00          0.00 

Standard Edition again performed more consistent gets and physical reads than Enterprise Edition, yet it completed the execution 0.22 seconds faster than Enterprise Edition.  Compare the output of the Enterprise Edition of 11.2.0.1 with that of the Enterprise Edition of 10.2.0.5 – which release version is incorrectly reporting the number of physical reads and/or consistent gets?

——

Things to try:

  • Change some of the manually specified system (CPU) statistics, what statistic values will cause the Standard Edition to perform index range scans rather than full table scans?
  • Modify the SQL statements that join table T1 to table T2 so that instead of selecting from table T2, select from T1 aliased as T2 (create a self join).  How do the statistics generated by 10.2.0.5 differ from those generated by 11.2.0.1 when this change is made?
  • Change MONDAY to Monday in the SQL statements, and repeat the test – do the execution plans change when the test is repeated?
  • Change MONDAY to Monday in the SQL statements, generate histograms for the various column, and repeat the test  – do the execution plans change when the test is repeated?

Sample commands to generate histograms:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)

——

We might need another test script, one which joins several tables having primary and foreign keys, before we see any significant performance differences.  Yet, the results of the above script are interesting.


Actions

Information

9 responses

22 11 2010
Noons

Thanks for sharing this. “interesting” is nott he word I’d use. More like: “scary”.
There is no description or hint anywhere claiming EE performs differently than SE other than in the use of parallelism and partitioning where possible.
Not the case here?

23 11 2010
dombrooks

We’re already expecting parallel operations not to work on Standard Edition.
The same is true of bitmap indexes – an EE edition only feature.

But this demonstrates that it’s not just the creation of a bitmap index – which is what you might initially thing – it’s also internal operations like the old btree bitmap conversion.

This operation should also be familiar as a potential troublemaker (not always) and so it being available for consideration by the optimizer in EE is not always a good thing.

23 11 2010
Niall Litchfield

Noons,

SE does perform differently than EE for a variety of reasons, mostly related to features available in EE but not SE, these aren’t limited to parallelism and partitioning and the effects can be subtle. The interesting thing with Charles’ test is how suitable many of his queries are (regarded as being by the cbo) for b_tree_bitmap_plans. These are not available in SE (though I definitely seem to recall seeing them – and disabling them – in either 9.0 or 9.2 SE, don’t have either of those available to test anymore). In general you can expect these plans to occur when you get either an ‘and’ or an ‘or’ predicate on two columns which are each indexed and the indexes are somewhat selective.

Charles,
My you are pumping out these tests fast, and stealing some of my thunder from that email conversation :), I was working through a similar line of enquiry, but the day job and next weeks presentation are getting in the way 🙂 I am a little bit suprised that you didn’t get a CONCATENATION operation in the index join case in SE – that’s what I got in 11.2.0.1 from your first test. It would be interesting (to me) to see the 10053 traces and cost calculations.

Niall

23 11 2010
Charles Hooper

Niall,

I suspect that there is still a lot that I have not covered yet regarding the differences between Standard Edition and Enterprise Edition – probably plenty that I am not aware of yet. I look forward to reading your blog article series on the topic. I will have to take a closer look at the 11.2.0.1 Standard Edition results – did you use the same workload system statistics that I listed at the start of the article, or did you use other settings that are more accurate for your server environment? It took close to 5 hours to put together this blog article, even though it probably looks like 20 minutes of effort (writing the darn test script took too long, and I ended up deleting the last half of the test script because the results were too similar to the first half).

23 11 2010
Noons

I’m always a bit turned off by the “effects can be subtle” thing. To me that reads “we don’t exactly know”. And given the hopelessness of getting any true in depth information from Oracle for more than one release on exactly how does the optimizer work, we’re better off testing,testing,testing…
Which is not good and doesn’t help make a case for clients out there upgrading their software: they can see clearly how expensive it’s truly gonna be.
I agree with you entirely that the differences in behaviour present here are due to features of EE that are simply not present – or partly disabled – in SE. Bitmaps being an obvious one.
There should really be an option for the optimizer to indicate which features have not been considered in elaborating a plan.
What I find interesting is that the 10g-onwards auto-maintenance default jobs make full use of creating/dropping partitions to clean up their work tables, regardless of the db being SE/EE/EE with PO. Which seems to indicate the ability is there for the optimizer to clearly indicate what has been turned off/on, regardless of database level/version.

23 11 2010
Niall Litchfield

Hi Noons.

I said it and I agree! BTree/Bitmap conversion is kinda interesting to me, since you can still set the parameter in SE even in 11.2.0.1

SYS @ SE >@show_parameter _b_tree_bitmap_plans
old   4: AND kSPPINM='&1'
new   4: AND kSPPINM='_b_tree_bitmap_plans'

NAME                                                             VALUE
---------------------------------------------------------------- ------------------------------
_b_tree_bitmap_plans                                             TRUE

SYS @ SE >

but it doesn’t seem to be honoured (perhaps rightly, perhaps not).

I like your suggestion re: indicating which features have been considered. Perhaps externalizing the presumably existing STRUCT(s?) behind optimizer_features_enable would be useful. I’m pretty sure that even back to 8i there were the odd partitioned objects used by SYS even though constructing them was not available in the product.

23 11 2010
Noons

Exactly. What we need really is a “10054” – or a “10053 on steroids” if you prefer – that clearly spells out what optmizer-affecting db options have been taken into consideration in a given plan. It’d go a long way in dispelling all these “peformance ghosts” we keep seeing on every release/patch level combo and between instances.

Having been tasked to make our test, acceptance and production instances behave the same way when running the same SQL, I find myself fighting this same battle over and over again, when really it should be a lot easier. Constant source of headaches, as is.

What’s worse: every new release/patch level introduces the potential for different behaviour. Exhaustively testing all that and preventing it turns out to be expensive, if one considers the perspective of the software developer as pointed out by Charles. Any wonder why sometimes those developers take weird decisions in terms of db design and SQL usage? I can hear their pain…

Charles pointed out some of the very important CBO behaviour differences with seemingly harmless changes in unrelated parameters. Hidden features like the “behind the scenes parallelism” we both mentioned add to this problem: there is no way to be sure we’re not seeing a side effect of one of those… I’m abstracting the side effects of any bu^H^Hfeatures that might creep in, of course!

I suppose it makes the job interesting, but really I’d prefer to have it simplified!

And no: OEM and grid’s performance pack won’t resolve this problem, it doesn’t even begin to understand it! (just pre-empting the strike before someone else jumps in with that furphy!) 🙂

23 11 2010
Charles Hooper

I was a bit concerned that I did not post enough detail in the article in between the scripts and script output. Noons, Dombrooks, and Niall, I appreciate your comments – those comments filled in what I considered to be the worst of the gaps in this blog article.

Just to spin things a bit, I repeated the test on my laptop, which has the Enterprise Edition of 11.2.0.1. (For some reason, TKPROF in 11.2.0.1 has now decided to start including the “Rows (1st) Rows (avg) Rows (max)” columns in the output – previously I could only see those columns when processing a trace file with TKPROF from 11.2.0.2.) I did not change the workload system (CPU) statistics, so the 10053 portion of the trace file showed those values as:

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
  Using WORKLOAD Stats
  CPUSPEED: 2714 millions instructions/sec
  SREADTIM: 0.100000 milliseconds
  MREADTIM: 0.200000 millisecons
  MBRC: 78 blocks
  MAXTHR: 302080 bytes/sec
  SLAVETHR: -1 bytes/sec

Also, it appears that DB_FILE_MULTIBLOCK_READ_COUNT was forced to a value of 128, while it was autoset to 128 in the other databases. And rather than having the SGA_TARGET set to 8000M and the PGA_AGGREGATE_TARGET set 1800M, those parameters were set to 12000M and 2000M, respectively.

Now, what do you think happened to the statistics displayed in TKPROF and the execution plans?

The SQL*Plus output for the first SQL statement:

SQL> SELECT /* FIND_ME */
  2    T1.C4
  3  FROM
  4    T1
  5  WHERE
  6    T1.C2='MONDAY'
  7    AND T1.C3 BETWEEN 200 AND 255;
 
10952 rows selected.
 
Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      38485  consistent gets
      38463  physical reads
          0  redo size
     103802  bytes sent via SQL*Net to client
        469  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10952  rows processed

The TKPROF output for the first SQL statement:

SELECT /* FIND_ME */
  T1.C4
FROM
  T1
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       12      0.21       0.55      38463      38485          0       10952
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       14      0.23       0.57      38463      38485          0       10952
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 286  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
     10952      10952      10952  TABLE ACCESS FULL T1 (cr=38485 pr=38463 pw=0 time=3880132 us cost=2459 size=234234 card=11154)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      12        0.00          0.00
  db file sequential read                         1        0.00          0.00
  db file scattered read                        316        0.00          0.40
  SQL*Net message from client                    12        0.00          0.00
  SQL*Net more data to client                    11        0.00          0.00

The SQL*Plus output for the second SQL statement:

SQL> SELECT /* FIND_ME */
  2    T1.C4,
  3    T2.C4
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C2='MONDAY'
  9    AND T1.C3 BETWEEN 200 AND 255
 10    AND T1.C1=T2.C1
 11    AND T1.C3=T2.C3;
 
9484744 rows selected.
 
Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      85724  consistent gets
      76926  physical reads
          0  redo size
  124731018  bytes sent via SQL*Net to client
     104683  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    9484744  rows processed

The TKPROF output for the second SQL statement:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     9486      3.99       4.87      76926      85724          0     9484744
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9488      4.02       4.89      76926      85724          0     9484744
  
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 286  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   9484744    9484744    9484744  HASH JOIN  (cr=85724 pr=76926 pw=0 time=4592034 us cost=4982 size=2994849 card=76791)
     10952      10952      10952   TABLE ACCESS FULL T1 (cr=38475 pr=38463 pw=0 time=3799997 us cost=2459 size=267696 card=11154)
     76664      76664      76664   TABLE ACCESS FULL T2 (cr=47249 pr=38463 pw=0 time=2836531 us cost=2487 size=1171125 card=78075)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9486        0.00          0.00
  db file sequential read                         2        0.00          0.00
  db file scattered read                        632        0.00          0.66
  SQL*Net message from client                  9486        0.06         10.27
  SQL*Net more data to client                  9485        0.00          0.35
  asynch descriptor resize                        1        0.00          0.00

Now, compare the above Row Source Operation execution plans with those of Standard Edition in the article – the Row Source Operation execution plans are quite similar to what we saw with Standard Edition, but the wait event might be a bit different. Note also the “asynch descriptor resize” wait event – I wonder what that means?

If I have learned anything from this exercise it is that there are plenty of opportunities to see different performance in development, test, and production databases when the database edition does not match, and when the workload system (CPU) statistics do not match. Scary? Yes, especially for application development companies that develop database applications using an entirely different database configuration than their customers.

23 11 2010
Charles Hooper

An interesting twist, maybe. In the article I mentioned that in several cases the number of blocks read from disk exceeded the number of consistent gets – those were visible in the Row Source Operation execution plans in the Enterprise Edition. I just re-executed the test on my laptop using an unset value for DB_FILE_MULTIBLOCK_READ_COUNT and the system (CPU) statistics listed at the start of this blog article. With the changes on my laptop, I obtained the same execution plans as seen in the blog article, with one exception. The consistent gets are now always greater than or equal to the number of blocks physically read from disk. Interesting?

SQL> SELECT /* FIND_ME */
  2    T1.C4
  3  FROM
  4    T1
  5  WHERE
  6    T1.C2='MONDAY'
  7    AND T1.C3 BETWEEN 200 AND 255;

10952 rows selected.


Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
       6884  consistent gets
       6876  physical reads
          0  redo size
     103802  bytes sent via SQL*Net to client
        470  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10952  rows processed
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       12      0.24       0.55       6876       6884          0       10952
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       14      0.26       0.56       6876       6884          0       10952
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 286  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
     10952      10952      10952  TABLE ACCESS BY INDEX ROWID T1 (cr=6884 pr=6876 pw=0 time=20871 us cost=2973 size=234234 card=11154)
     10952      10952      10952   BITMAP CONVERSION TO ROWIDS (cr=346 pr=345 pw=0 time=2963 us)
         2          2          2    BITMAP AND  (cr=346 pr=345 pw=0 time=24470 us)
        11         11         11     BITMAP CONVERSION FROM ROWIDS (cr=183 pr=182 pw=0 time=57010 us)
     71428      71428      71428      INDEX RANGE SCAN IND_T1_C2 (cr=183 pr=182 pw=0 time=70788 us cost=194 size=0 card=0)(object id 82006)
         3          3          3     BITMAP CONVERSION FROM ROWIDS (cr=163 pr=163 pw=0 time=10584 us)
     76664      76664      76664      SORT ORDER BY (cr=163 pr=163 pw=0 time=12414 us)
     76664      76664      76664       INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=163 pw=0 time=19709 us cost=164 size=0 card=0)(object id 82007)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      12        0.00          0.00
  db file sequential read                       493        0.00          0.08
  SQL*Net message from client                    12        0.00          0.00
  db file parallel read                         274        0.00          0.30
  SQL*Net more data to client                    11        0.00          0.00
SQL> SELECT /* FIND_ME */
  2    T1.C4,
  3    T2.C4
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C2='MONDAY'
  9    AND T1.C3 BETWEEN 200 AND 255
 10    AND T1.C1=T2.C1
 11    AND T1.C3=T2.C3;

9484744 rows selected.


Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      54125  consistent gets
      45339  physical reads
          0  redo size
  124731018  bytes sent via SQL*Net to client
     104684  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    9484744  rows processed
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     9486      4.16       4.92      45339      54125          0     9484744
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9488      4.19       4.94      45339      54125          0     9484744
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 286  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   9484744    9484744    9484744  HASH JOIN  (cr=54125 pr=45339 pw=0 time=4969125 us cost=5999 size=2994849 card=76791)
     10952      10952      10952   TABLE ACCESS BY INDEX ROWID T1 (cr=6876 pr=6876 pw=0 time=1043019 us cost=2973 size=267696 card=11154)
     10952      10952      10952    BITMAP CONVERSION TO ROWIDS (cr=345 pr=345 pw=0 time=4456 us)
         2          2          2     BITMAP AND  (cr=345 pr=345 pw=0 time=24642 us)
        11         11         11      BITMAP CONVERSION FROM ROWIDS (cr=182 pr=182 pw=0 time=53970 us)
     71428      71428      71428       INDEX RANGE SCAN IND_T1_C2 (cr=182 pr=182 pw=0 time=68743 us cost=194 size=0 card=0)(object id 82006)
         3          3          3      BITMAP CONVERSION FROM ROWIDS (cr=163 pr=163 pw=0 time=11190 us)
     76664      76664      76664       SORT ORDER BY (cr=163 pr=163 pw=0 time=10622 us)
     76664      76664      76664        INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=163 pw=0 time=22013 us cost=164 size=0 card=0)(object id 82007)
     76664      76664      76664   TABLE ACCESS FULL T2 (cr=47249 pr=38463 pw=0 time=3398812 us cost=3025 size=1171125 card=78075)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9486        0.00          0.00
  db file sequential read                       620        0.00          0.15
  db file parallel read                         274        0.00          0.31
  db file scattered read                        316        0.00          0.15
  SQL*Net message from client                  9486        0.06         10.45
  SQL*Net more data to client                  9485        0.00          0.34

Now I am starting to wonder if Oracle Corp. released a buggy version of 11.2.0.1 that can’t count physical block reads correctly, and a less buggy version that can count physical block reads and also adds the “Rows (1st) Rows (avg) Rows (max)” columns in the TKPROF output?

Leave a reply to Noons Cancel reply