TKPROF Elapsed Time Challenge – the Elapsed Time is Half of the Wait Event Time

20 05 2012

May 20, 2012

An interesting quirk was recently brought to my attention by Mich Talebzadeh.  He generated a 10046 trace at level 8 for a session, executed some SQL statements, disabled the trace, and then processed the resulting trace file with TKPROF.  His TKPROF output included the following:

UPDATE TESTWRITES SET PADDING1 = RPAD('y',4000,'y')
WHERE
OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.00       0.00          0          0          0           0
Execute  17293     21.62    1245.41    1064667    2411398    6280588     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    17297     21.62    1245.41    1064667    2411398    6280588     1000000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  TESTWRITES (cr=214 pr=427 pw=0 time=0 us)
    100   INDEX UNIQUE SCAN TESTWRITES_PK (cr=110 pr=3 pw=0 time=0 us cost=2 size=4007 card=1)(object id 92380)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        2        0.00          0.00
  db file sequential read                   2160263        1.57       2304.22
  db file scattered read                      27769        0.26         99.78 

Mich noticed that the reported elapsed time, at 1,245.41 seconds, is less than the sum of the wait event times, at 2,400.00 seconds.  Typically, the sum of the wait events plus the CPU time should be equal to or slightly less than the elapsed time (assuming that parallel query is not used, and the CPU reporting granularity did not cause odd side effects, and the session’s process did not spend much time in the CPU run queue).  In this case, the elapsed time is about 51.9% of the wait event time for this SQL statement. 

How is this possible?  Mich stated the following:

“A comment has been it is due to using different clock granularities.”

Think about that for a minute – what could explain the odd inconsistency in the output?

I recall reading in the book “Optimizing Oracle Performance” about double-counting of statistics due to recursive database calls.  Page 92 of that book includes the following quote:

“In Oracle releases through at least Oracle9i Release 2, a database call’s c, e, p, cr, and cu statistics contain an aggregation of the resources consumed by the database call itself and its entire recursive progeny.”

Page 91 of the same book makes the following statement:

“The rule for determining the recursive relationships among database calls is simple:
A database call with dep=n + 1 is the recursive child of the first subsequent dep=n database call listed in the SQL trace data stream.”

Could it be the case that this particular issue with recursive calls was addressed in later releases of TKPROF?  Might that fix explain why the elapsed time is 51.9% of the wait event time?

The OP provided a partial test case script with a description of how the tables were created.  I reduced the size of the original source table from 1,700,000 rows to 60,000 rows, and reduced the secondary source table from 1,000,000 rows to 50,000 rows.  My test script follows (note that the line that builds the IND_T2_OBJ index is commented out – we will change that later):

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;

CREATE TABLE T1 AS
SELECT
  *
FROM
  ALL_OBJECTS
WHERE
  ROWNUM<=60000;

ALTER TABLE T1 ADD (
  PADDING1 VARCHAR2(4000),
  PADDING2 VARCHAR2(4000));

CREATE TABLE T2 AS
SELECT
  *
FROM
  T1
WHERE
  ROWNUM<=50000;

CREATE UNIQUE INDEX IND_T1_OBJ ON T1(OBJECT_ID);
--CREATE UNIQUE INDEX IND_T2_OBJ ON T2(OBJECT_ID);

ALTER SYSTEM FLUSH SHARED_POOL;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true, plan_stat=>'ALL_EXECUTIONS' );

DECLARE
        type ObjIdArray is table of T1.object_id%TYPE index by binary_integer;
        l_ids objIdArray;
        CURSOR c IS SELECT object_id FROM T1;
BEGIN
        OPEN c;
        LOOP
                BEGIN
                        FETCH c BULK COLLECT INTO l_ids LIMIT 100;

                        FORALL rs in 1 .. l_ids.COUNT
                                UPDATE T2
                                  SET PADDING1 =  RPAD('y',4000,'y')
                                WHERE object_id = l_ids(rs);
                        commit;
                        FORALL rs in 1 .. l_ids.COUNT
                                DELETE FROM T2
                                WHERE object_id = l_ids(rs);
                        commit;
                        FORALL rs in 1 .. l_ids.COUNT
                                INSERT INTO T2
                                SELECT * FROM T1 t WHERE t.object_id = l_ids(rs);
                        commit;
                        EXIT WHEN C%NOTFOUND;
                EXCEPTION
                  WHEN NO_DATA_FOUND THEN
                    NULL;
                  WHEN OTHERS THEN
                    DBMS_OUTPUT.PUT_LINE('Transaction failed');
                END;
        END LOOP;
        CLOSE c;
END;
/

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;

That test case definitely contains calls in the trace file at a dep greater than zero.  As a matter of fact, the SQL statement of interest is at a dep of one, resulting in space management (and other calls) at a dep of two or greater:

PARSING IN CURSOR #397285536 len=66 dep=1 uid=64 oct=6 lid=64 tim=1879257875778 hv=3592437181 ad='3edea5858' sqlid='1ahnbczb20gdx'
UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 
END OF STMT
PARSE #397285536:c=0,e=169,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1879257875778

I executed the test case script on 64 bit Windows with Oracle Database 11.2.0.2 with patch 5 installed.  I processed the resulting trace file with TKPROF from the 32 bit Windows Oracle 11.2.0.1 client with the ODBC update patch 10155837 installed.  My TKPROF output for the UPDATE statement showed the following:

SQL ID: 1ahnbczb20gdx
Plan Hash: 2722410703
UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') 
WHERE
 OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    600    197.38     197.60        452   50299237     426582       50000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      601    197.38     197.60        452   50299237     426582       50000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=80409 pr=436 pw=0 time=357525 us)
       100        100        100   TABLE ACCESS FULL T2 (cr=80182 pr=420 pw=0 time=337295 us cost=116 size=16120 card=8)
         0          0          0  UPDATE  T2 (cr=83795 pr=0 pw=0 time=344133 us)
       100        100        100   TABLE ACCESS FULL T2 (cr=83598 pr=0 pw=0 time=332180 us cost=116 size=16120 card=8)
         0          0          0  UPDATE  T2 (cr=83790 pr=2 pw=0 time=303134 us)
       100        100        100   TABLE ACCESS FULL T2 (cr=83596 pr=0 pw=0 time=288815 us cost=116 size=16120 card=8)
         0          0          0  UPDATE  T2 (cr=83789 pr=2 pw=0 time=283738 us)
       100        100        100   TABLE ACCESS FULL T2 (cr=83595 pr=0 pw=0 time=271190 us cost=116 size=16120 card=8)
...
         0          0          0  UPDATE  T2 (cr=87900 pr=0 pw=0 time=375297 us)
         0          0          0   TABLE ACCESS FULL T2 (cr=87900 pr=0 pw=0 time=375015 us cost=116 size=16120 card=8)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                        36        0.00          0.01
  db file scattered read                         31        0.00          0.01

I decided to compare the output of TKPROF with the output of my Hyper-Extended Oracle Performance Monitor program, which I know does not attempt to resolve parent and child recursive statistics:

Statement Depth 1 (Trigger Code)
Cursor 220   Ver 1   Parse at 0.254283 SQL_ID='1ahnbczb20gdx'  (TD Prev 0.000877)  Similar Cnt 1
|PARSEs       1|CPU S    0.000000|CLOCK S    0.000169|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      1|
|EXECs      600|CPU S  197.403668|CLOCK S  197.659011|ROWs    50000|PHY RD BLKs       774|CON RD BLKs (Mem)  50299674|CUR RD BLKs (Mem)    426582|SHARED POOL MISs      1|
|FETCHs       0|CPU S    0.000000|CLOCK S    0.000000|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      0|
  CPU S 25.08%  CLOCK S 25.01%
|               +++++||               +++++|
UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1

       (Rows 0)   UPDATE  T2 (cr=80409 pr=436 pw=0 time=357525 us)
     (Rows 100)    TABLE ACCESS FULL T2 (cr=80182 pr=420 pw=0 time=337295 us cost=116 size=16120 card=8)

       (Rows 0)   UPDATE  T2 (cr=83795 pr=0 pw=0 time=344133 us)
     (Rows 100)    TABLE ACCESS FULL T2 (cr=83598 pr=0 pw=0 time=332180 us cost=116 size=16120 card=8)
       (Rows 0)   UPDATE  T2 (cr=83790 pr=2 pw=0 time=303134 us)
     (Rows 100)    TABLE ACCESS FULL T2 (cr=83596 pr=0 pw=0 time=288815 us cost=116 size=16120 card=8)
...

Notice that the TKPROF summary shows 0.059011 fewer elapsed seconds for this SQL statement, 0.023668 fewer CPU seconds, 322 fewer blocks read from disk, 437 fewer consistent gets, and the same number of current mode reads.  Why the differences?

My program wrote the wait event details for the UPDATE statement to an Excel spreadsheet.  Examination of the wait events indicated that there were 36 db file sequential read waits (thus 36 total blocks read from disk), and 31 db file scattered read waits (416 total blocks read from disk) – the wait event details support the statistics output by TKPROF.  So, maybe the double counting of recursive calls was fixed since the release of Oracle Database 9.2.  Nice!  But we are not done yet, we still do not have the index on table T2, as was described by the OP.

Locate in the script the following line:

--CREATE UNIQUE INDEX IND_T2_OBJ ON T2(OBJECT_ID);

Change that line by removing the at the start of the line, so that the line appears like this:

CREATE UNIQUE INDEX IND_T2_OBJ ON T2(OBJECT_ID);

Disconnect from the database, then reconnect (so that you will be assigned a different default trace filename).  Now, re-execute the script, and process the resulting trace file with TKPROF.  My TKPROF output for the UPDATE SQL statement looks like this:

SQL ID: 1ahnbczb20gdx
Plan Hash: 1751280057
UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') 
WHERE
 OBJECT_ID = :B1 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    190      1.35       1.42        348      52552     165435       19000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      191      1.35       1.42        348      52552     165435       19000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=332 pr=30 pw=0 time=19223 us)
       100        100        100   INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=8 pw=0 time=1372 us cost=1 size=2015 card=1)(object id 103855)
         0          0          0  UPDATE  T2 (cr=288 pr=2 pw=0 time=8629 us)
       100        100        100   INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0
...
         0          0          0  UPDATE  T2 (cr=291 pr=0 pw=0 time=6051 us)
       100        100        100   INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 pw=0 time=430 us cost=1 size=2015 card=1)(object id 103855)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                         40        0.00          0.03
  db file sequential read                        32        0.00          0.01
  log file switch completion                      2        0.00          0.01

For the sake of comparison, I also processed the trace file using my Hyper-Extended Oracle Performance Monitor program.  This is my program’s output:

Statement Depth 1 (Trigger Code)
Cursor 910   Ver 1   Parse at 0.187068 SQL_ID='1ahnbczb20gdx'  (TD Prev 0.000996)  Similar Cnt 1
|PARSEs       1|CPU S    0.000000|CLOCK S    0.000162|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      1|
|EXECs      600|CPU S    3.603618|CLOCK S    3.683344|ROWs    50000|PHY RD BLKs       912|CON RD BLKs (Mem)    144826|CUR RD BLKs (Mem)    426496|SHARED POOL MISs      1|
|FETCHs       0|CPU S    0.000000|CLOCK S    0.000000|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      0|
  CPU S 16.91%  CLOCK S 15.45%
|                 +++||                 +++|
UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1

       (Rows 0)   UPDATE  T2 (cr=332 pr=30 pw=0 time=19223 us)
     (Rows 100)    INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=8 pw=0 time=1372 us cost=1 size=2015 card=1)

       (Rows 0)   UPDATE  T2 (cr=288 pr=2 pw=0 time=8629 us)
     (Rows 100)    INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 pw=0 time=430 us cost=1 size=2015 card=1)
...
       (Rows 0)   UPDATE  T2 (cr=105 pr=0 pw=0 time=193 us)
       (Rows 0)    INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 pw=0 time=153 us cost=1 size=2015 card=1)

Notice that there is now a significant difference in the elapsed time at 2.263344 seconds difference, CPU seconds at a difference of 2.253618 seconds, and the number of blocks read from disk at a difference of 564 blocks.

Let’s take a look at the wait events that my program wrote to Excel.   Examination of the wait events indicated that there were 42 db file sequential read waits (thus 42 total blocks read from disk), and 107 db file scattered read waits (842 total blocks read from disk), for a grand total of 884 blocks read from disk that are associated with this UPDATE SQL statement.  This time, not only are the TKPROF statistics not supported by the raw wait events, but the wait event summary included in the TKPROF summary also is not supported by the raw wait events.  Where did those number from TKPROF originate?

Well, let’s just process the 10046 trace file using TKPROF from 11.2.0.2 to confirm the output:

SQL ID: 1ahnbczb20gdx Plan Hash: 1751280057

UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') 
WHERE
 OBJECT_ID = :B1 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    600      3.57       3.64        884     144451     426496       50000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      601      3.57       3.64        884     144451     426496       50000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64     (recursive depth: 1)
Number of plan statistics captured: 600

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=241 pr=1 pw=0 time=5550 us)
       100         83        100   INDEX UNIQUE SCAN IND_T2_OBJ (cr=106 pr=0 pw=0 time=425 us cost=1 size=2015 card=1)(object id 103855)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                        107        0.00          0.06
  db file sequential read                        42        0.00          0.01
  log file switch completion                      2        0.00          0.01

Those TKPROF numbers appear quite different; translation – it you do not like the numbers produced by TKPROF, just use a different version of TKPROF.  🙂

It appears that TKPROF in 11.2.0.1 potentially lies when reporting on a trace file generated by Oracle Database 11.2.0.2.  I wonder if this problem is caused by the large cursor number in 11.2.0.2+, or if TKPROF in 11.2.0.1 also has the same problem with trace files generated by Oracle Database 11.2.0.1?

Is anyone able to reproduce the problem reported by the OP?

If you wish to experiment with my trace file and TKPROF summaries, you may download the following file:

or1122p_ora_1808.zip

When you download the file, rename the download from or1122p_ora_1808-zip.doc to or1122p_ora_1808.zip.  The enclosed or1122p_ora_1808.trc file is the raw 10446 trace file, or1122p_ora_1808.txt is the 11.2.0.1 TKPROF generated output file, and or1122p_ora_1808-2.txt is the 11.2.0.2 generated output file.


Actions

Information

7 responses

22 05 2012
Mich Talebzadeh

The problem seems to be multiple statements in PL/SQL black. If I just did the update (thus excluding DELETES AND INSERTS), I get the following output.

SQL ID: 5qvgta8arnhya
Plan Hash: 2666452017
UPDATE TESTWRITES SET PADDING1 = RPAD('y',4000,'y')
WHERE
 OBJECT_ID = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  17293     29.44    1070.20    1126609    2891927    8091191     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    17294     29.44    1070.20    1126609    2891927    8091191     1000000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  UPDATE  TESTWRITES (cr=214 pr=426 pw=0 time=0 us)
    100   INDEX UNIQUE SCAN TESTWRITES_PK (cr=110 pr=3 pw=0 time=0 us cost=2 size=4007 card=1)(object id 92592)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    686458        1.00        741.53
  Disk file operations I/O                        2        0.00          0.00
  db file scattered read                      55783        1.16        147.49
  log file switch completion                      5        0.15          0.47
  log file switch (private strand flush incomplete)
                                                  3        0.21          0.41
********************************************************************************

which gives Total Waited as 889.9 which looks more reasonable compared to the elasped time of 1070.20 sec. However, it begs the question if there is bug with the version of tkprof I am using (Release 11.2.0.1.0). I guess I can go directly to the trace file and use UNIX sed and awk to extract and sum data. However, that defeats the object. Well the issue we saw before is not a quantization error. Well I do not think one can attibute 49% differences to error in meassurement as in the original case.

Cheers,

Mich

22 05 2012
Charles Hooper

Mich,

Thank you for the follow up.

Is there any way that you can pass the original trace file through tkprof from 11.2.0.2 or 11.2.0.3 so that you can confirm that there was a bug in 11.2.0.1’s tkprof? If you download the zip file at the end of my blog article (https://hoopercharles.files.wordpress.com/2012/05/or1122p_ora_1808-zip.doc ) and process the enclosed .trc file with tkprof, do you obtain the incorrect statistics from 11.2.0.1 shown above, or do you obtain the correct statistics from 11.2.0.2 above?

A search through Metalink (MOS) found an interesting bug fix in Exadata 11.2.0.1 bundle patch 7: “Bug 8328200 SHRD CRSRS: Tkprof does not aggregate STAT lines for ALL_EXECUTIONS”. If you have access to MOS, you can see the bug report here:
https://supporthtml.oracle.com/epmos/faces/ui/km/DocumentDisplay.jspx?_afrLoop=3303558429470000&id=8328200.8

22 05 2012
Mich Talebzadeh

Charles,

This is the processed tkprof file with source from your zipped trc file

tkprof or1122p_ora_1808.trc mytrace.txt sys=no

SQL ID: 1ahnbczb20gdx
Plan Hash: 1751280057
UPDATE T2 SET PADDING1 = RPAD('y',4000,'y')
WHERE
 OBJECT_ID = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    190      1.35       1.42        348      52552     165435       19000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      191      1.35       1.42        348      52552     165435       19000
..
..
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                         40        0.00          0.03
  db file sequential read                        32        0.00          0.01
  log file switch completion                      2        0.00          0.01
********************************************************************************

Now why am I getting 19000 rows as opposed to 50000 rows?

22 05 2012
Charles Hooper

Mich,

The good news is that what you posted above exactly matches the output that I posted above from the 11.2.0.1 version of tkprof on my system, so there is some consistency. The bad news is that it is consistently bad – the output does NOT match the output from the 11.2.0.2 version of tkprof. The bad news probably extends also to the trace file that you generated and the processed with tkprof, and that probably explains why the figures in the tkprof output from your trace file are not self-consistent.

At this point, I recommend trying to pass your trace file through tkprof from 11.2.0.2 or 11.2.0.3 and compare the output. I have a strong feeling that you are encounting the same bug that you found when you processed my trace file.

It appears that there is a similar topic on the OTN forums. If you find that tkprof from 11.2.0.2 or 11.2.0.3 produces non-self-conflicting output, you might post that information in the OTN thread:
https://forums.oracle.com/forums/thread.jspa?messageID=10343194

22 05 2012
Mich Talebzadeh

Thanks Charles will do that.

Ok that was easy enough. However, I did that test because I wanted to load a typical OLTP simulation for solid state disks versus normal magnetic disks
(SSD vs HDD). I recall a while back you dis some tests on SSD as well. The challenge is whether it is possible for Oracle to write the same volume of data to disk with SSD against HDD.

If I rewrite the code to take AWR snapshots with checkpoints before and after, I should be able to verify the volume of writes.

alter system flush buffer_cache;
alter system flush shared_pool;
ALTER SYSTEM CHECKPOINT;
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
DECLARE
        type ObjIdArray is table of tdash.object_id%TYPE index by binary_integer;
        l_ids objIdArray;
        CURSOR c IS SELECT object_id FROM tdash;
BEGIN
        OPEN c;
        LOOP
                BEGIN
                        FETCH c BULK COLLECT INTO l_ids LIMIT 100;

                        FORALL rs in 1 .. l_ids.COUNT
                                UPDATE testwrites
                                  SET PADDING1 =  RPAD('y',4000,'y')
                                WHERE object_id = l_ids(rs);
                        commit;
                        FORALL rs in 1 .. l_ids.COUNT
                                DELETE FROM testwrites
                                WHERE object_id = l_ids(rs);
                        commit;
                        FORALL rs in 1 .. l_ids.COUNT
                                INSERT INTO testwrites
                                SELECT * FROM tdash t WHERE t.object_id = l_ids(rs);
                        commit;
                        EXIT WHEN C%NOTFOUND;
                EXCEPTION
                  WHEN NO_DATA_FOUND THEN
                    NULL;
                  WHEN OTHERS THEN
                    DBMS_OUTPUT.PUT_LINE('Transaction failed');
                END;
        END LOOP;
        CLOSE c;
END;
/
ALTER SYSTEM CHECKPOINT;
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

If we run the script against tables on SSD and again HDD it would be expected to see roughly the same volume of writes. However, is that the case? Look below. The first section is about HDD and tablescape is called APP1_HDD

 Tablespace
------------------------------
                 Av       Av     Av                       Av     Buffer  Av Buf
         Reads Reads/s  Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
APP1_HDD
     3,535,159     738     1.2     1.1    1,396,750      292          0     0.0
UNDOTBS1
           104       0     0.2     1.0      513,323      107          0     0.0
SYSTEM
         8,050       2     0.2     1.5          190        0          0     0.0
SYSAUX
         3,736       1     0.2     2.0        1,496        0          0     0.0
APP1
           144       0     0.0     1.0          117        0          0     0.0

Now the same stats for SSD tablespace APP1

Tablespace
------------------------------
                 Av       Av     Av                       Av     Buffer  Av Buf
         Reads Reads/s  Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
APP1
     2,860,666   2,268     0.3     1.1    3,564,644    2,826          0     0.0
UNDOTBS1
            90       0     9.8     1.0    1,990,759    1,578          0     0.0
SYSTEM
         7,190       6     0.2     1.3          204        0          0     0.0
SYSAUX
         1,938       2     0.3     2.1        1,047        1          0     0.0
APP1_HDD
           105       0     0.1     1.0          105        0          0     0.0
 

So I still do not see why the volumes of writes are higher for SSD except perhaps that updates (erase OS blocks) are causing extra writes for SSD.

23 05 2012
Charles Hooper

Mich,

That is an interesting observation. The Oracle Database statistics should NOT be able to see the extra writes to erase blocks on SSD – that should all be hidden in the operating system call to write to disk; Oracle Database would be able to determine the time required for the call to complete, but not the amount of internal disk-level work done.

Is there any chance that one of your test runs could have been performed during the time when an automatic AWR collection was initiated (by default at the start of every hour)?

I think that looking at the number of blocks written by DBWR as shown in the AWR report could cause some confusion. Oracle Database is fundamentally lazy… it procrastinates doing tasks that are time consuming and that do not need to be done immediately. When the commits in the PL/SQL block are executed, the redo log buffer *must* be flushed to disk, but DBWR does not need to flush the commited changes in the blocks to disk immediately. The uncommitted changes in the blocks could also be flushed to disk before a session issues a commit, as additional space is needed in the buffer cache. Throw delayed block cleanout and dynamically resized buffer caches into the mix, and you could find that the same block may be written to disk several times in one of the test runs, but not in the other. It is interesting to note that the number of blocks read from the APP1_HDD tablespace’s datafiles is nearly identical to the number of blocks written to the APP1 tablespace’s datafiles in the other AWR report.

http://docs.oracle.com/cd/E14072_01/server.112/e10713/memory.htm#sthref1580

“The database updates data blocks in the cache and stores metadata about the changes in the redo log buffer. After a COMMIT, the database writes the redo buffers to disk but does not immediately write data blocks to disk. Instead, database writer (DBWn) performs lazy writes in the background.”

You may also enjoy reading the following by Jonathan Lewis:
http://jonathanlewis.wordpress.com/2009/06/16/clean-it-up/
http://books.google.com/books?id=r86URDzs1TAC&pg=PA137 (the book “Oracle Core”, see “PL/SQL Optimization” on page 126, and “Database Writer” on pages 137-156)

When testing performance, you need to research what can change that is not under the control of your experiment. For instance, the server resynonchonrizing its time with an outside time source (NTP) could throw off time-delta calculations, non-instance related load placed on the drives or server CPU or physical memory, operating system level caching (seems to be enabled in your test case), automatic processing in the database (such as automatic statistics collection that typically starts around 10:00 PM) or automatic AWR collection, virus scanner activity (both Windows and Linux servers could be affected if a virus scanner is installed on the server), PL/SQL memory usage from a previous test run leaving Oracle Database believing that there is more competition for PGA memory – thus either automatically growing the PGA memory while decreasing the SGA memory (assuming that MEMORY_TARGET is used) or gradually permitting each session to use a smaller percentage of memory for PGA requirements, delayed block cleanouts caused by previous tests (or other activity) that affect later tests, etc.

In the end, it comes down to timing. After performing the test several times, what is the average elapsed time, and how much of the server’s resources were consumed to achieve that average elapsed time? In my experience, SATA drives in servers seem to drive up kernel mode CPU usage (and may not offer as effective native command queuing) when compared to SCSI (or SAS – serial attached SCSI), so that could also be a factor as you take a look at expanding the test to multiple sessions performing concurrent work at the same time.

Sorry, not a simple answer from me (I might be overlooking something painfully obvious). Maybe one of the other blog readers are able to better interpret the output that you provided.

23 05 2012
Mich Talebzadeh

Thanks again Charles.

I ran the SSD test and new AWR report.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      4565 23-May-12 15:10:56        21       1.6
  End Snap:      4566 23-May-12 15:32:08        25       1.6
   Elapsed:               21.20 (mins)
   DB Time:               21.38 (mins)

Segments by Physical Writes               DB/Inst: MYDB/mydb  Snaps: 4565-4566
-> Total Physical Writes:       7,034,915
-> Captured Segments account for   63.9% of Total

           Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type        Writes  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SSDTESTER  APP1       TESTWRITES                      TABLE    4,493,979   63.88
SSDTESTER  APP1       TESTWRITES_PK                   INDEX        3,708     .05
SYS        SYSAUX     WRH$_ACTIVE_SESSION_ 49386_4552 TABLE           16     .00
SYS        SYSAUX     WRH$_PARAMETER_PK    49386_4552 INDEX           14     .00
SYS        SYSTEM     SEG$                            TABLE           13     .00
          -------------------------------------------------------------

Segments by Physical Write Requests       DB/Inst: MYDB/mydb  Snaps: 4565-4566
-> Total Physical Write Requestss:       5,735,831
-> Captured Segments account for   63.9% of Total

           Tablespace                      Subobject  Obj.    Phys Write
Owner         Name    Object Name            Name     Type      Requests  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SSDTESTER  APP1       TESTWRITES                      TABLE    3,664,810   63.89
SSDTESTER  APP1       TESTWRITES_PK                   INDEX        2,144     .04
SYS        SYSTEM     SEG$                            TABLE           13     .00
SYSMAN     SYSAUX     MGMT_SYSTEM_PERFORMA            TABLE           11     .00
SYS        SYSAUX     WRH$_SQL_PLAN                   TABLE           11     .00
          -------------------------------------------------------------


Tablespace
------------------------------
                 Av       Av     Av                       Av     Buffer  Av Buf
         Reads Reads/s  Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
APP1
     2,837,745   2,230     0.3     1.1    3,667,136    2,882          0     0.0
UNDOTBS1
           104       0     2.4     1.0    2,067,857    1,625          0     0.0
SYSTEM
         5,993       5     0.2     1.3          141        0          2     0.0

Compared to the prtevious HDD

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      4417 17-May-12 12:31:47        30       2.7
  End Snap:      4418 17-May-12 13:51:35        32       3.2
   Elapsed:               79.80 (mins)
   DB Time:               81.75 (mins)

Segments by Physical Writes               DB/Inst: MYDB/mydb  Snaps: 4417-4418
-> Total Physical Writes:       6,857,129
-> Captured Segments account for   62.9% of Total

           Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type        Writes  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
TESTER     APP1_HDD   TESTWRITES                      TABLE    4,310,514   62.86
TESTER     APP1_HDD   TESTWRITES_PK                   INDEX        3,730     .05
SYSMAN     SYSAUX     MGMT_METRICS_RAW_PK             INDEX        1,488     .02
SYSMAN     SYSAUX     MGMT_METRICS_1HOUR_P            INDEX          126     .00
SYSMAN     SYSAUX     MGMT_SYSTEM_PERF_LOG            INDEX           66     .00
          -------------------------------------------------------------

Segments by Physical Write Requests       DB/Inst: MYDB/mydb  Snaps: 4417-4418
-> Total Physical Write Requestss:       1,912,076
-> Captured Segments account for   73.1% of Total

           Tablespace                      Subobject  Obj.    Phys Write
Owner         Name    Object Name            Name     Type      Requests  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
TESTER     APP1_HDD   TESTWRITES                      TABLE    1,395,810   73.00
TESTER     APP1_HDD   TESTWRITES_PK                   INDEX          770     .04
SYSMAN     SYSAUX     MGMT_METRICS_RAW_PK             INDEX          681     .04
SYSMAN     SYSAUX     MGMT_SYSTEM_PERF_LOG            INDEX           48     .00
SYSMAN     SYSAUX     MGMT_METRICS_1HOUR_P            INDEX           41     .00
          -------------------------------------------------------------

Tablespace
------------------------------
                 Av       Av     Av                       Av     Buffer  Av Buf
         Reads Reads/s  Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
APP1_HDD
     3,535,159     738     1.2     1.1    1,396,750      292          0     0.0
UNDOTBS1
           104       0     0.2     1.0      513,323      107          0     0.0
SYSTEM
         8,050       2     0.2     1.5          190        0          0     0.0
SYSAUX
         3,736       1     0.2     2.0        1,496        0          0     0.0

Stiil do not know what the issue could be:

Obviously we don’t have any info as to how many writes the OS does when we ask it to write 8k, we just ask it to write 8k and increment our metric stating that we wrote 8k.

Cheers,

Mich

Leave a reply to Mich Talebzadeh Cancel reply