November 22, 2010
In the previous articles of this series we looked at the different performance behavior exhibited by the Standard Edition and Enterprise Edition of release versions 10.2.0.5 and 11.2.0.1 when presented with a very simple SQL statement that forced an index access path. Today’s article takes a slightly different path: we will see radically different execution plans for the Standard Edition and Enterprise Edition, as well as an issue that may cause some of the proponents of the buffer cache hit ratio to panic when the statistics produced by the Enterprise Edition of 10.2.0.5 are compared with those produced by the Enterprise Edition of 11.2.0.1 (another new discussion thread on the OTN forums is discussing the BCHR).
Once again, we will manually specify the system (CPU) statistics as follows, so that slight differences in the statistics already set for the databases do not cause unexpected results:
EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC',16)
EXEC DBMS_STATS.SET_SYSTEM_STATS('CPUSPEED',2664)
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM',8)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM',10)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MAXTHR',19181568)
The script to create the two test tables and associated indexes:
CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20) NOT NULL,
  C3 NUMBER NOT NULL,
  C4 DATE NOT NULL,
  PADDING VARCHAR2(500));

INSERT INTO T1
SELECT
  TO_NUMBER(TO_CHAR(TRUNC(SYSDATE)+(ROWNUM-1), 'DD')),
  TRIM(TO_CHAR(TRUNC(SYSDATE)+(ROWNUM-1), 'DAY')),
  TO_NUMBER(TO_CHAR(TRUNC(SYSDATE)+(ROWNUM-1), 'DDD')),
  TRUNC(SYSDATE)+(ROWNUM-1),
  LPAD('A',500,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=500000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);
CREATE INDEX IND_T1_C3 ON T1(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',NO_INVALIDATE=>FALSE)

CREATE TABLE T2 AS
SELECT
  *
FROM
  T1;

CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2);
CREATE INDEX IND_T2_C3 ON T2(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',NO_INVALIDATE=>FALSE)

COLUMN SEGMENT FORMAT A12

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T1_C1','IND_T1_C2','IND_T1_C3','T1')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;
The tables were created in an ASSM autoallocate tablespace, with extents allocated as follows (consistent results with 10.2.0.5 and 11.2.0.1):
SEGMENT         EXTENTS EXT_SIZE_KB   TOTAL_MB
------------ ---------- ----------- ----------
IND_T1_C1            16          64          1
IND_T1_C1             7        1024          7
IND_T1_C2            16          64          1
IND_T1_C2            10        1024         10
IND_T1_C3            16          64          1
IND_T1_C3             8        1024          8
T1                   16          64          1
T1                   63        1024         63
T1                   30        8192        240
The test script is once again divided into two nearly identical halves, with the results from the first half thrown out. The first SQL statement retrieves the dates on which the day of the week is Monday and the day of the year is between 200 and 255. The second SQL statement repeats the request with table T1 joined in a many-to-many type join to table T2:
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IND_JOIN_10046_10053_JUNK';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* JUNK */
  T1.C4
FROM
  T1
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255;

SELECT /* JUNK */
  T1.C4,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255
  AND T1.C1=T2.C1
  AND T1.C3=T2.C3;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
SET AUTOTRACE OFF

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET AUTOTRACE TRACEONLY STATISTICS

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IND_JOIN_10046_10053';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* FIND_ME */
  T1.C4
FROM
  T1
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IND_JOIN_10046_10053_J';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* FIND_ME */
  T1.C4,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C2='MONDAY'
  AND T1.C3 BETWEEN 200 AND 255
  AND T1.C1=T2.C1
  AND T1.C3=T2.C3;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
SET AUTOTRACE OFF
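As a side note, the cardinalities of the three predicates can be sanity-checked with a quick simulation. This is a sketch, assuming TRUNC(SYSDATE) was 2010-11-22 (the article's publication date) when the CONNECT BY insert ran:

```python
from datetime import date, timedelta

# Reproduce the 500,000 consecutive dates generated by the CONNECT BY insert,
# assuming TRUNC(SYSDATE) was 2010-11-22 (the article's publication date).
start = date(2010, 11, 22)
days = [start + timedelta(days=n) for n in range(500000)]

# C2 = 'MONDAY'
mondays = sum(1 for d in days if d.weekday() == 0)
# C3 BETWEEN 200 AND 255 (day of year, matching TO_CHAR(..., 'DDD'))
in_window = sum(1 for d in days if 200 <= d.timetuple().tm_yday <= 255)
# Both predicates combined
both = sum(1 for d in days
           if d.weekday() == 0 and 200 <= d.timetuple().tm_yday <= 255)

print(mondays, in_window, both)
```

Under that start-date assumption the counts come out to 71429, 76664, and 10952 – the same row counts that appear in the index range scans and the final result set below.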
We will start by comparing the statistics when the above script is executed in Oracle Database 10.2.0.5.
Standard Edition:
SQL> SELECT /* FIND_ME */
  2    T1.C4
  3  FROM
  4    T1
  5  WHERE
  6    T1.C2='MONDAY'
  7    AND T1.C3 BETWEEN 200 AND 255;

10952 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      38681  consistent gets
      38658  physical reads
          0  redo size
     104979  bytes sent via SQL*Net to client
        582  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10952  rows processed
Enterprise Edition:
Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
       6878  consistent gets
      33648  physical reads
          0  redo size
     104979  bytes sent via SQL*Net to client
        582  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10952  rows processed
The Standard Edition read 38,658 blocks from disk and performed 38,681 consistent gets, so it appears to have performed a full table scan (using the extent allocation displayed above, 304MB / 8KB = 38,912 blocks, which is close to the number of blocks that were read). The Enterprise Edition read 33,648 blocks from disk and performed only 6,878 consistent gets (what happened to my BCHR – quick, downgrade to Standard Edition). It is interesting that the number of consistent gets is significantly less than the number of blocks read from disk, and that the number of blocks read from disk is less than that of the Standard Edition. How did the Enterprise Edition do that? Let’s take a look at the TKPROF output.
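For those keeping score with the buffer cache hit ratio, the classic formula produces nonsense from the Enterprise Edition statistics above. A sketch (the simple BCHR formula shown here is just one of several variants in circulation):

```python
def bchr(consistent_gets, db_block_gets, physical_reads):
    """Classic buffer cache hit ratio: 1 - physical reads / logical reads."""
    logical = consistent_gets + db_block_gets
    return 1 - physical_reads / logical

# Standard Edition 10.2.0.5: 38,681 consistent gets, 38,658 physical reads
se = bchr(38681, 0, 38658)   # barely above zero: nearly every get hit disk

# Enterprise Edition 10.2.0.5: 6,878 consistent gets, 33,648 physical reads
ee = bchr(6878, 0, 33648)    # negative: more physical reads than logical reads

print(round(se, 4), round(ee, 4))
```

The Standard Edition's ratio is a dismal 0.0006, while the Enterprise Edition's ratio is roughly -3.89 – a negative "hit ratio" from the edition that ran the more sophisticated plan, which is exactly why the BCHR proponents might panic.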
Standard Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        2      0.01       0.02          0          0          0          0
Execute      2      0.00       0.00          0          0          0          0
Fetch       12      0.15       3.72      38658      38681          0      10952
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.17       3.74      38658      38681          0      10952

Misses in library cache during parse: 1
Parsing user id: 47

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS FULL T1 (cr=38681 pr=38658 pw=0 time=25071316 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      13        0.00          0.00
  SQL*Net message from client                    13        0.01          0.02
  db file sequential read                         1        0.01          0.01
  db file scattered read                        319        0.05          3.49
  SQL*Net more data to client                    46        0.00          0.00
Enterprise Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        2      0.01       0.05          0          0          0          0
Execute      2      0.00       0.00          0          0          0          0
Fetch       12      0.40       3.93      33648       6878          0      10952
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.42       3.99      33648       6878          0      10952

Misses in library cache during parse: 1
Parsing user id: 164

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS BY INDEX ROWID T1 (cr=6878 pr=33648 pw=0 time=5271629 us)
  10952   BITMAP CONVERSION TO ROWIDS (cr=346 pr=376 pw=0 time=162037 us)
      2    BITMAP AND (cr=346 pr=376 pw=0 time=189455 us)
     11     BITMAP CONVERSION FROM ROWIDS (cr=183 pr=200 pw=0 time=103387 us)
  71429      INDEX RANGE SCAN IND_T1_C2 (cr=183 pr=200 pw=0 time=65401 us)(object id 51682)
      3     BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=71984 us)
  76664      SORT ORDER BY (cr=163 pr=176 pw=0 time=57590 us)
  76664       INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=13038 us)(object id 51683)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      13        0.00          0.00
  SQL*Net message from client                    13        0.01          0.01
  db file scattered read                       4181        0.04          3.47
  SQL*Net more data to client                    46        0.00          0.00
  db file sequential read                       206        0.03          0.13
Even though the Standard Edition read more blocks from disk, and performed considerably more consistent gets, it executed the SQL statement 0.25 seconds faster than the Enterprise Edition. The Enterprise Edition certainly generated a more interesting execution plan. So, why is the number of consistent gets significantly less than the number of blocks read from disk (cr= and query indicate consistent gets; pr= and disk indicate blocks read from disk)? Is the difference caused by a bug (scroll down and compare with the output from 11.2.0.1)? Moving on to the second SQL statement.
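Conceptually, the BITMAP AND in the Enterprise Edition plan intersects the rowid sets returned by the two index range scans, so only the surviving rowids are used to visit the table. A minimal sketch of the idea (the rowids are hypothetical, and Oracle actually converts rowids to bitmaps, ANDs the bitmaps, then converts back, rather than using plain sets):

```python
# Hypothetical rowid sets returned by two index range scans.
rowids_c2 = {"AAA1", "AAA2", "AAA5", "AAA7"}   # rows where C2='MONDAY'
rowids_c3 = {"AAA2", "AAA3", "AAA5", "AAA9"}   # rows where C3 BETWEEN 200 AND 255

# The BITMAP AND behaves like a set intersection on the rowids; only the
# survivors require a TABLE ACCESS BY INDEX ROWID.
surviving = sorted(rowids_c2 & rowids_c3)
print(surviving)
```

This is why the Enterprise Edition can satisfy the query with far fewer consistent gets against the table than a full table scan would require: rows failing either predicate are eliminated before the table is touched.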
Standard Edition:
SQL> SELECT /* FIND_ME */
  2    T1.C4,
  3    T2.C4
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C2='MONDAY'
  9    AND T1.C3 BETWEEN 200 AND 255
 10    AND T1.C1=T2.C1
 11    AND T1.C3=T2.C3;

9484744 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      85913  consistent gets
      77121  physical reads
          0  redo size
  125651171  bytes sent via SQL*Net to client
     104796  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    9484744  rows processed
Enterprise Edition:
Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      54035  consistent gets
      72111  physical reads
          0  redo size
  125651171  bytes sent via SQL*Net to client
     104796  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    9484744  rows processed
The physical reads are close for the two editions, but again the consistent gets differ significantly. Taking a look at the TKPROF output.
Standard Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        2      0.01       0.04          0          0          0          0
Execute      2      0.00       0.00          0          0          0          0
Fetch     9486      6.50      10.21      77121      85913          0    9484744
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     9490      6.52      10.26      77121      85913          0    9484744

Misses in library cache during parse: 1
Parsing user id: 47

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN (cr=85913 pr=77121 pw=0 time=3814912 us)
  10952   TABLE ACCESS FULL T1 (cr=38670 pr=38658 pw=0 time=20749036 us)
  76664   TABLE ACCESS FULL T2 (cr=47243 pr=38463 pw=0 time=2036828 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9487        0.00          0.00
  SQL*Net message from client                  9487        0.01          5.04
  db file sequential read                         2        0.01          0.01
  db file scattered read                        637        0.06          4.01
  SQL*Net more data to client                 56908        0.00          1.54
Enterprise Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        2      0.03       0.09          0          0          0          0
Execute      2      0.00       0.00          0          0          0          0
Fetch     9486      5.91       8.81      72111      54035          0    9484744
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     9490      5.94       8.91      72111      54035          0    9484744

Misses in library cache during parse: 1
Parsing user id: 164

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN (cr=54035 pr=72111 pw=0 time=2857143 us)
  10952   TABLE ACCESS BY INDEX ROWID T1 (cr=6870 pr=33648 pw=0 time=2991228 us)
  10952    BITMAP CONVERSION TO ROWIDS (cr=345 pr=376 pw=0 time=145613 us)
      2     BITMAP AND (cr=345 pr=376 pw=0 time=172112 us)
     11      BITMAP CONVERSION FROM ROWIDS (cr=182 pr=200 pw=0 time=75187 us)
  71429       INDEX RANGE SCAN IND_T1_C2 (cr=182 pr=200 pw=0 time=35170 us)(object id 51682)
      3      BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=89497 us)
  76664       SORT ORDER BY (cr=163 pr=176 pw=0 time=74771 us)
  76664        INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=30103 us)(object id 51683)
  76664   TABLE ACCESS FULL T2 (cr=47165 pr=38463 pw=0 time=1095520 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9487        0.00          0.00
  SQL*Net message from client                  9487        0.01          5.34
  db file scattered read                       4499        0.04          2.78
  db file sequential read                       207        0.01          0.05
  SQL*Net more data to client                 56908        0.00          1.18
In the Standard Edition, we see that Oracle performed full table scans of tables T1 and T2, and joined those tables using a hash join. The Enterprise Edition was much more creative when generating the execution plan, allowing it to complete 1.35 seconds faster. Are we seeing another bug on the second line of the Row Source Operation execution plan, in the accounting of the consistent gets and physical reads?
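The HASH JOIN in both plans builds an in-memory hash table on the smaller row source (the 10,952 T1 rows) keyed on the join columns, then probes the table with every T2 row, emitting one row per match – which is how a many-to-many join can return 9,484,744 rows from two modest inputs. A toy sketch of the algorithm (hypothetical data, not the actual T1/T2 contents):

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the smaller input, probe with the larger input."""
    table = defaultdict(list)
    for row in build_rows:
        table[build_key(row)].append(row)
    for row in probe_rows:
        # Every build row sharing the key matches; duplicates multiply.
        for match in table.get(probe_key(row), []):
            yield (match, row)

# Toy data: (C1, C3, C4) tuples, joined on (C1, C3) as in the test query.
t1 = [(22, 205, "d1"), (23, 210, "d2")]
t2 = [(22, 205, "x1"), (22, 205, "x2"), (23, 210, "x3"), (24, 300, "x4")]

key = lambda r: (r[0], r[1])
result = list(hash_join(t1, t2, key, key))
print(len(result))  # 3 matches: two for (22,205), one for (23,210)
```

Because C1 and C3 are derived from the same date, many T2 rows share each (C1, C3) pair across the ~1,369 years of generated dates, so each of the 10,952 build rows matches hundreds of probe rows.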
——————–
Repeating the tests with the Standard Edition of 11.2.0.1 and the Enterprise Edition of 11.2.0.1.
Standard Edition:
SQL> SELECT /* FIND_ME */
  2    T1.C4
  3  FROM
  4    T1
  5  WHERE
  6    T1.C2='MONDAY'
  7    AND T1.C3 BETWEEN 200 AND 255;

10952 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      38680  consistent gets
      38658  physical reads
          0  redo size
     104990  bytes sent via SQL*Net to client
        629  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10952  rows processed
Enterprise Edition:
Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
       6874  consistent gets
       7700  physical reads
          0  redo size
     104990  bytes sent via SQL*Net to client
        630  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10952  rows processed
Interesting – the Enterprise Edition performed roughly 20% of the consistent gets and physical reads reported by the Standard Edition. Taking a look at the TKPROF output.
Standard Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.01          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch       12      0.34       2.87      38658      38680          0      10952
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       14      0.35       2.89      38658      38680          0      10952

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 51

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS FULL T1 (cr=38680 pr=38658 pw=0 time=2879855 us cost=3041 size=234234 card=11154)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      12        0.00          0.00
  db file sequential read                         1        0.01          0.01
  db file scattered read                        319        0.04          2.55
  SQL*Net message from client                    12        0.00          0.00
  SQL*Net more data to client                    11        0.00          0.00
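As an aside, the cost=3041 reported for the Standard Edition's full table scan is consistent with the manually set system statistics. A sketch of the commonly cited CBO I/O cost formula, ignoring the CPU component and using the 38,912 allocated blocks as an approximation of the blocks below the high-water mark (both simplifications are assumptions on my part):

```python
import math

# Manually set workload system statistics from the start of the article
MBRC = 16          # multiblock read count used for costing
SREADTIM = 8.0     # single-block read time, ms
MREADTIM = 10.0    # multiblock read time, ms

# 304MB of allocated extents / 8KB block size = 38,912 blocks; used here as
# an approximation of the blocks below the high-water mark (an assumption).
blocks = (304 * 1024 * 1024) // 8192

multiblock_reads = math.ceil(blocks / MBRC)
# I/O cost in units of single-block reads, plus 1 (the table-scan cost
# adjustment assumed to be in effect).
cost = 1 + multiblock_reads * MREADTIM / SREADTIM
print(int(cost))
```

With these inputs the formula lands on 3041, matching the cost shown in the Row Source Operation line; the actual optimizer arithmetic also includes a (small) CPU component.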
Enterprise Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.01          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch       12      0.17       3.29       7700       6874          0      10952
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       14      0.18       3.31       7700       6874          0      10952

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 194

Rows     Row Source Operation
-------  ---------------------------------------------------
  10952  TABLE ACCESS BY INDEX ROWID T1 (cr=6874 pr=7700 pw=0 time=15073 us cost=2976 size=234234 card=11154)
  10952   BITMAP CONVERSION TO ROWIDS (cr=346 pr=376 pw=0 time=1417 us)
      2    BITMAP AND (cr=346 pr=376 pw=0 time=31853 us)
     11     BITMAP CONVERSION FROM ROWIDS (cr=183 pr=200 pw=0 time=27060 us)
  71429      INDEX RANGE SCAN IND_T1_C2 (cr=183 pr=200 pw=0 time=11755 us cost=194 size=0 card=0)(object id 83186)
      3     BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=10656 us)
  76664      SORT ORDER BY (cr=163 pr=176 pw=0 time=9598 us)
  76664       INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=8830 us cost=164 size=0 card=0)(object id 83187)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      12        0.00          0.00
  db file scattered read                        202        0.02          0.33
  SQL*Net message from client                    12        0.00          0.00
  db file parallel read                         274        0.05          2.78
  SQL*Net more data to client                    11        0.00          0.00
  db file sequential read                         1        0.00          0.00
Are you ready to toss out the Enterprise Edition? Even though the Standard Edition performed 5 times as many physical block reads and 5 times as many consistent gets, it completed the query 0.42 seconds faster. Spend some time comparing the Row Source Operation execution plan with that of 10.2.0.5 – which Enterprise Edition contains the bug in the reporting of physical block reads and/or consistent gets? Continuing with the second SQL statement:
Standard Edition:
SQL> SELECT /* FIND_ME */
  2    T1.C4,
  3    T2.C4
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C2='MONDAY'
  9    AND T1.C3 BETWEEN 200 AND 255
 10    AND T1.C1=T2.C1
 11    AND T1.C3=T2.C3;

9484744 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      85894  consistent gets
      77121  physical reads
          0  redo size
  125651186  bytes sent via SQL*Net to client
     104843  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    9484744  rows processed
Enterprise Edition:
Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      54096  consistent gets
      46238  physical reads
          0  redo size
  125651186  bytes sent via SQL*Net to client
     104844  bytes received via SQL*Net from client
       9486  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    9484744  rows processed
Taking a look at the output of TKPROF.
Standard Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.02          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch     9486      4.11       7.24      77121      85894          0    9484744
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     9488      4.13       7.26      77121      85894          0    9484744

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 51

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN (cr=85894 pr=77121 pw=0 time=2663572 us cost=6087 size=2994849 card=76791)
  10952   TABLE ACCESS FULL T1 (cr=38670 pr=38658 pw=0 time=6680619 us cost=3041 size=267696 card=11154)
  76664   TABLE ACCESS FULL T2 (cr=47224 pr=38463 pw=0 time=1238145 us cost=3045 size=1171125 card=78075)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9486        0.00          0.00
  db file sequential read                         2        0.01          0.02
  db file scattered read                        637        0.04          2.82
  SQL*Net message from client                  9486        0.00         10.21
  SQL*Net more data to client                  9485        0.00          0.35
  asynch descriptor resize                        1        0.00          0.00
Enterprise Edition:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.02          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch     9486      4.71       7.46      46238      54096          0    9484744
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     9488      4.72       7.48      46238      54096          0    9484744

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 194

Rows     Row Source Operation
-------  ---------------------------------------------------
9484744  HASH JOIN (cr=54096 pr=46238 pw=0 time=2306193 us cost=6022 size=2994849 card=76791)
  10952   TABLE ACCESS BY INDEX ROWID T1 (cr=6870 pr=7775 pw=0 time=2895266 us cost=2976 size=267696 card=11154)
  10952    BITMAP CONVERSION TO ROWIDS (cr=345 pr=376 pw=0 time=2292 us)
      2     BITMAP AND (cr=345 pr=376 pw=0 time=30695 us)
     11      BITMAP CONVERSION FROM ROWIDS (cr=182 pr=200 pw=0 time=25870 us)
  71429       INDEX RANGE SCAN IND_T1_C2 (cr=182 pr=200 pw=0 time=11883 us cost=194 size=0 card=0)(object id 83186)
      3      BITMAP CONVERSION FROM ROWIDS (cr=163 pr=176 pw=0 time=11066 us)
  76664       SORT ORDER BY (cr=163 pr=176 pw=0 time=12158 us)
  76664        INDEX RANGE SCAN IND_T1_C3 (cr=163 pr=176 pw=0 time=10878 us cost=164 size=0 card=0)(object id 83187)
  76664   TABLE ACCESS FULL T2 (cr=47226 pr=38463 pw=0 time=291011 us cost=3045 size=1171125 card=78075)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    9486        0.00          0.00
  db file scattered read                        210        0.03          0.34
  db file parallel read                         274        0.04          2.69
  db file sequential read                         2        0.01          0.01
  direct path read                                9        0.01          0.04
  SQL*Net message from client                  9486        0.00         10.33
  SQL*Net more data to client                  9485        0.00          0.35
  asynch descriptor resize                        2        0.00          0.00
Standard Edition again performed more consistent gets and physical reads than Enterprise Edition, yet it completed the execution 0.22 seconds faster than Enterprise Edition. Compare the output of the Enterprise Edition of 11.2.0.1 with that of the Enterprise Edition of 10.2.0.5 – which release version is incorrectly reporting the number of physical reads and/or consistent gets?
——
Things to try:
- Change some of the manually specified system (CPU) statistics – what statistic values will cause the Standard Edition to perform index range scans rather than full table scans?
- Modify the SQL statements that join table T1 to table T2 so that, instead of selecting from table T2, they select from table T1 aliased as T2 (creating a self-join). How do the statistics generated by 10.2.0.5 differ from those generated by 11.2.0.1 when this change is made?
- Change MONDAY to Monday in the SQL statements, and repeat the test – do the execution plans change when the test is repeated?
- Change MONDAY to Monday in the SQL statements, generate histograms for the various columns, and repeat the test – do the execution plans change when the test is repeated?
Sample commands to generate histograms:
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254',NO_INVALIDATE=>FALSE)
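Why the histogram matters for the Monday variation: with METHOD_OPT SIZE 1 the optimizer only knows the number of distinct values, so an equality predicate on any literal – even one that matches no rows – is estimated at num_rows/NDV, while a frequency histogram records the per-value counts and lets the optimizer see that 'Monday' (mixed case) matches nothing. A sketch of the idea only; the exact out-of-histogram estimate Oracle uses varies by release, and the counts below are assumed for illustration:

```python
def est_without_histogram(num_rows, num_distinct):
    # SIZE 1 statistics: every equality literal gets the same estimate.
    return num_rows / num_distinct

def est_with_frequency_histogram(value_counts, value):
    # Frequency histogram: per-value row counts are known; a value absent
    # from the histogram is estimated at/near zero (release-dependent
    # adjustments are ignored here).
    return value_counts.get(value, 0)

# Abbreviated, assumed histogram for T1.C2 (all stored values are uppercase).
counts = {"MONDAY": 71429, "TUESDAY": 71429}

print(est_without_histogram(500000, 7))          # ~71428.6 for ANY literal
print(est_with_frequency_histogram(counts, "Monday"))  # 0: not in the data
```

That swing from ~71,429 estimated rows down to (nearly) zero is large enough to flip the access path, which is why the execution plans may change once histograms exist.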
——
We might need another test script, one which joins several tables having primary and foreign keys, before we see any significant performance differences. Yet, the results of the above script are interesting.
Thanks for sharing this. “Interesting” is not the word I’d use. More like: “scary”.
There is no description or hint anywhere claiming EE performs differently than SE other than in the use of parallelism and partitioning where possible.
Not the case here?
We’re already expecting parallel operations not to work on Standard Edition.
The same is true of bitmap indexes – an EE edition only feature.
But this demonstrates that it’s not just the creation of a bitmap index – which is what you might initially think – it’s also internal operations like the old B-tree to bitmap conversion.
This operation should also be familiar as a potential troublemaker (not always) and so it being available for consideration by the optimizer in EE is not always a good thing.
Noons,
SE does perform differently than EE for a variety of reasons, mostly related to features available in EE but not SE; these aren’t limited to parallelism and partitioning, and the effects can be subtle. The interesting thing with Charles’ test is how suitable many of his queries are (as regarded by the CBO) for b_tree_bitmap_plans. These are not available in SE (though I definitely seem to recall seeing them – and disabling them – in either 9.0 or 9.2 SE; I don’t have either of those available to test anymore). In general you can expect these plans to occur when you get either an ‘and’ or an ‘or’ predicate on two columns which are each indexed and the indexes are somewhat selective.
Charles,
My, you are pumping out these tests fast, and stealing some of my thunder from that email conversation :). I was working through a similar line of enquiry, but the day job and next week’s presentation are getting in the way 🙂 I am a little bit surprised that you didn’t get a CONCATENATION operation in the index join case in SE – that’s what I got in 11.2.0.1 from your first test. It would be interesting (to me) to see the 10053 traces and cost calculations.
Niall
Niall,
I suspect that there is still a lot that I have not covered yet regarding the differences between Standard Edition and Enterprise Edition – probably plenty that I am not aware of yet. I look forward to reading your blog article series on the topic. I will have to take a closer look at the 11.2.0.1 Standard Edition results – did you use the same workload system statistics that I listed at the start of the article, or did you use other settings that are more accurate for your server environment? It took close to 5 hours to put together this blog article, even though it probably looks like 20 minutes of effort (writing the darn test script took too long, and I ended up deleting the last half of the test script because the results were too similar to the first half).
I’m always a bit turned off by the “effects can be subtle” thing. To me that reads “we don’t exactly know”. And given the hopelessness of getting any true in depth information from Oracle for more than one release on exactly how does the optimizer work, we’re better off testing,testing,testing…
Which is not good and doesn’t help make a case for clients out there upgrading their software: they can see clearly how expensive it’s truly gonna be.
I agree with you entirely that the differences in behaviour present here are due to features of EE that are simply not present – or partly disabled – in SE. Bitmaps being an obvious one.
There should really be an option for the optimizer to indicate which features have not been considered in elaborating a plan.
What I find interesting is that the 10g-onwards auto-maintenance default jobs make full use of creating/dropping partitions to clean up their work tables, regardless of the db being SE/EE/EE with PO. Which seems to indicate the ability is there for the optimizer to clearly indicate what has been turned off/on, regardless of database level/version.
Hi Noons.
I said it and I agree! BTree/Bitmap conversion is kinda interesting to me, since you can still set the parameter in SE even in 11.2.0.1, but it doesn’t seem to be honoured (perhaps rightly, perhaps not).
I like your suggestion re: indicating which features have been considered. Perhaps externalizing the presumably existing STRUCT(s?) behind optimizer_features_enable would be useful. I’m pretty sure that even back in 8i there was the odd partitioned object used by SYS even though constructing them was not available in the product.
Exactly. What we need really is a “10054” – or a “10053 on steroids” if you prefer – that clearly spells out which optimizer-affecting db options have been taken into consideration in a given plan. It’d go a long way in dispelling all these “performance ghosts” we keep seeing on every release/patch level combo and between instances.
Having been tasked to make our test, acceptance and production instances behave the same way when running the same SQL, I find myself fighting this same battle over and over again, when really it should be a lot easier. Constant source of headaches, as is.
What’s worse: every new release/patch level introduces the potential for different behaviour. Exhaustively testing all that and preventing it turns out to be expensive, if one considers the perspective of the software developer as pointed out by Charles. Any wonder why sometimes those developers take weird decisions in terms of db design and SQL usage? I can hear their pain…
Charles pointed out some of the very important CBO behaviour differences with seemingly harmless changes in unrelated parameters. Hidden features like the “behind the scenes parallelism” we both mentioned add to this problem: there is no way to be sure we’re not seeing a side effect of one of those… I’m abstracting the side effects of any bu^H^Hfeatures that might creep in, of course!
I suppose it makes the job interesting, but really I’d prefer to have it simplified!
And no: OEM and grid’s performance pack won’t resolve this problem, it doesn’t even begin to understand it! (just pre-empting the strike before someone else jumps in with that furphy!) 🙂
I was a bit concerned that I did not post enough detail in the article in between the scripts and script output. Noons, Dombrooks, and Niall, I appreciate your comments – those comments filled in what I considered to be the worst of the gaps in this blog article.
Just to spin things a bit, I repeated the test on my laptop, which has the Enterprise Edition of 11.2.0.1. (For some reason, TKPROF in 11.2.0.1 has now decided to start including the “Rows (1st) Rows (avg) Rows (max)” columns in the output – previously I could only see those columns when processing a trace file with TKPROF from 11.2.0.2.) I did not change the workload system (CPU) statistics, so the 10053 portion of the trace file showed those values as:
Also, it appears that DB_FILE_MULTIBLOCK_READ_COUNT was forced to a value of 128, while it was autoset to 128 in the other databases. And rather than having the SGA_TARGET set to 8000M and the PGA_AGGREGATE_TARGET set to 1800M, those parameters were set to 12000M and 2000M, respectively.
Now, what do you think happened to the statistics displayed in TKPROF and the execution plans?
The SQL*Plus output for the first SQL statement:
The TKPROF output for the first SQL statement:
The SQL*Plus output for the second SQL statement:
The TKPROF output for the second SQL statement:
Now, compare the above Row Source Operation execution plans with those of the Standard Edition in the article – the Row Source Operation execution plans are quite similar to what we saw with the Standard Edition, but the wait events might be a bit different. Note also the “asynch descriptor resize” wait event – I wonder what that means?
If I have learned anything from this exercise it is that there are plenty of opportunities to see different performance in development, test, and production databases when the database edition does not match, and when the workload system (CPU) statistics do not match. Scary? Yes, especially for application development companies that develop database applications using an entirely different database configuration than their customers.
An interesting twist, maybe. In the article I mentioned that in several cases the number of blocks read from disk exceeded the number of consistent gets – those were visible in the Row Source Operation execution plans in the Enterprise Edition. I just re-executed the test on my laptop using an unset value for DB_FILE_MULTIBLOCK_READ_COUNT and the system (CPU) statistics listed at the start of this blog article. With the changes on my laptop, I obtained the same execution plans as seen in the blog article, with one exception. The consistent gets are now always greater than or equal to the number of blocks physically read from disk. Interesting?
Now I am starting to wonder if Oracle Corp. released a buggy version of 11.2.0.1 that can’t count physical block reads correctly, and a less buggy version that can count physical block reads and also adds the “Rows (1st) Rows (avg) Rows (max)” columns in the TKPROF output?