May 20, 2012
An interesting quirk was recently brought to my attention by Mich Talebzadeh. He generated a 10046 trace at level 8 for a session, executed some SQL statements, disabled the trace, and then processed the resulting trace file with TKPROF. His TKPROF output included the following:
UPDATE TESTWRITES SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 4 0.00 0.00 0 0 0 0 Execute 17293 21.62 1245.41 1064667 2411398 6280588 1000000 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 17297 21.62 1245.41 1064667 2411398 6280588 1000000 Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 93 (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 0 UPDATE TESTWRITES (cr=214 pr=427 pw=0 time=0 us) 100 INDEX UNIQUE SCAN TESTWRITES_PK (cr=110 pr=3 pw=0 time=0 us cost=2 size=4007 card=1)(object id 92380) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ Disk file operations I/O 2 0.00 0.00 db file sequential read 2160263 1.57 2304.22 db file scattered read 27769 0.26 99.78
Mich noticed that the reported elapsed time, at 1,245.41 seconds, is less than the sum of the wait event times, at 2,400.00 seconds. Typically, the sum of the wait events plus the CPU time should be equal to or slightly less than the elapsed time (assuming that parallel query is not used, and the CPU reporting granularity did not cause odd side effects, and the session’s process did not spend much time in the CPU run queue). In this case, the elapsed time is about 51.9% of the wait event time for this SQL statement.
How is this possible? Mich stated the following:
“A comment has been it is due to using different clock granularities.”
Think about that for a minute – what could explain the odd inconsistency in the output?
—
—
—
—
—
—
—
—
I recall reading in the book “Optimizing Oracle Performance” about double-counting of statistics due to recursive database calls. Page 92 of that book includes the following quote:
“In Oracle releases through at least Oracle9i Release 2, a database call’s c, e, p, cr, and cu statistics contain an aggregation of the resources consumed by the database call itself and its entire recursive progeny.”
Page 91 of the same book makes the following statement:
“The rule for determining the recursive relationships among database calls is simple:
A database call with dep=n + 1 is the recursive child of the first subsequent dep=n database call listed in the SQL trace data stream.”
Could it be the case that this particular issue with recursive calls was addressed in later releases of TKPROF? Might that fix explain why the elapsed time is 51.9% of the wait event time?
—
—
—
—
—
—
—
—
The OP provided a partial test case script with a description of how the tables were created. I reduced the size of the original source table from 1,700,000 rows to 60,000 rows, and reduced the secondary source table from 1,000,000 rows to 50,000 rows. My test script follows (note that the line that builds the IND_T2_OBJ index is commented out – we will change that later):
DROP TABLE T1 PURGE; DROP TABLE T2 PURGE; CREATE TABLE T1 AS SELECT * FROM ALL_OBJECTS WHERE ROWNUM<=60000; ALTER TABLE T1 ADD ( PADDING1 VARCHAR2(4000), PADDING2 VARCHAR2(4000)); CREATE TABLE T2 AS SELECT * FROM T1 WHERE ROWNUM<=50000; CREATE UNIQUE INDEX IND_T1_OBJ ON T1(OBJECT_ID); --CREATE UNIQUE INDEX IND_T2_OBJ ON T2(OBJECT_ID); ALTER SYSTEM FLUSH SHARED_POOL; ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true, plan_stat=>'ALL_EXECUTIONS' ); DECLARE type ObjIdArray is table of T1.object_id%TYPE index by binary_integer; l_ids objIdArray; CURSOR c IS SELECT object_id FROM T1; BEGIN OPEN c; LOOP BEGIN FETCH c BULK COLLECT INTO l_ids LIMIT 100; FORALL rs in 1 .. l_ids.COUNT UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE object_id = l_ids(rs); commit; FORALL rs in 1 .. l_ids.COUNT DELETE FROM T2 WHERE object_id = l_ids(rs); commit; FORALL rs in 1 .. l_ids.COUNT INSERT INTO T2 SELECT * FROM T1 t WHERE t.object_id = l_ids(rs); commit; EXIT WHEN C%NOTFOUND; EXCEPTION WHEN NO_DATA_FOUND THEN NULL; WHEN OTHERS THEN DBMS_OUTPUT.PUT_LINE('Transaction failed'); END; END LOOP; CLOSE c; END; / EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;
That test case definitely contains calls in the trace file at a dep greater than zero. As a matter of fact, the SQL statement of interest is at a dep of one, resulting in space management (and other calls) at a dep of two or greater:
PARSING IN CURSOR #397285536 len=66 dep=1 uid=64 oct=6 lid=64 tim=1879257875778 hv=3592437181 ad='3edea5858' sqlid='1ahnbczb20gdx' UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 END OF STMT PARSE #397285536:c=0,e=169,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1879257875778
I executed the test case script on 64 bit Windows with Oracle Database 11.2.0.2 with patch 5 installed. I processed the resulting trace file with TKPROF from the 32 bit Windows Oracle 11.2.0.1 client with the ODBC update patch 10155837 installed. My TKPROF output for the UPDATE statement showed the following:
SQL ID: 1ahnbczb20gdx Plan Hash: 2722410703 UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 600 197.38 197.60 452 50299237 426582 50000 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 601 197.38 197.60 452 50299237 426582 50000 Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 64 (recursive depth: 1) Number of plan statistics captured: 1 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 0 0 0 UPDATE T2 (cr=80409 pr=436 pw=0 time=357525 us) 100 100 100 TABLE ACCESS FULL T2 (cr=80182 pr=420 pw=0 time=337295 us cost=116 size=16120 card=8) 0 0 0 UPDATE T2 (cr=83795 pr=0 pw=0 time=344133 us) 100 100 100 TABLE ACCESS FULL T2 (cr=83598 pr=0 pw=0 time=332180 us cost=116 size=16120 card=8) 0 0 0 UPDATE T2 (cr=83790 pr=2 pw=0 time=303134 us) 100 100 100 TABLE ACCESS FULL T2 (cr=83596 pr=0 pw=0 time=288815 us cost=116 size=16120 card=8) 0 0 0 UPDATE T2 (cr=83789 pr=2 pw=0 time=283738 us) 100 100 100 TABLE ACCESS FULL T2 (cr=83595 pr=0 pw=0 time=271190 us cost=116 size=16120 card=8) ... 0 0 0 UPDATE T2 (cr=87900 pr=0 pw=0 time=375297 us) 0 0 0 TABLE ACCESS FULL T2 (cr=87900 pr=0 pw=0 time=375015 us cost=116 size=16120 card=8) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ db file sequential read 36 0.00 0.01 db file scattered read 31 0.00 0.01
I decided to compare the output of TKPROF with the output of my Hyper-Extended Oracle Performance Monitor program, which I know does not attempt to resolve parent and child recursive statistics:
Statement Depth 1 (Trigger Code) Cursor 220 Ver 1 Parse at 0.254283 SQL_ID='1ahnbczb20gdx' (TD Prev 0.000877) Similar Cnt 1 |PARSEs 1|CPU S 0.000000|CLOCK S 0.000169|ROWs 0|PHY RD BLKs 0|CON RD BLKs (Mem) 0|CUR RD BLKs (Mem) 0|SHARED POOL MISs 1| |EXECs 600|CPU S 197.403668|CLOCK S 197.659011|ROWs 50000|PHY RD BLKs 774|CON RD BLKs (Mem) 50299674|CUR RD BLKs (Mem) 426582|SHARED POOL MISs 1| |FETCHs 0|CPU S 0.000000|CLOCK S 0.000000|ROWs 0|PHY RD BLKs 0|CON RD BLKs (Mem) 0|CUR RD BLKs (Mem) 0|SHARED POOL MISs 0| CPU S 25.08% CLOCK S 25.01% | +++++|| +++++| UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 (Rows 0) UPDATE T2 (cr=80409 pr=436 pw=0 time=357525 us) (Rows 100) TABLE ACCESS FULL T2 (cr=80182 pr=420 pw=0 time=337295 us cost=116 size=16120 card=8) (Rows 0) UPDATE T2 (cr=83795 pr=0 pw=0 time=344133 us) (Rows 100) TABLE ACCESS FULL T2 (cr=83598 pr=0 pw=0 time=332180 us cost=116 size=16120 card=8) (Rows 0) UPDATE T2 (cr=83790 pr=2 pw=0 time=303134 us) (Rows 100) TABLE ACCESS FULL T2 (cr=83596 pr=0 pw=0 time=288815 us cost=116 size=16120 card=8) ...
Notice that the TKPROF summary shows 0.059011 fewer elapsed seconds for this SQL statement, 0.023668 fewer CPU seconds, 322 fewer blocks read from disk, 437 fewer consistent gets, and the same number of current mode reads. Why the differences?
My program wrote the wait event details for the UPDATE statement to an Excel spreadsheet. Examination of the wait events indicated that there were 36 db file sequential read waits (thus 36 total blocks read from disk), and 31 db file scattered read waits (416 total blocks read from disk) – the wait event details support the statistics output by TKPROF. So, maybe the double counting of recursive calls was fixed since the release of Oracle Database 9.2. Nice! But we are not done yet, we still do not have the index on table T2, as was described by the OP.
Locate in the script the following line:
--CREATE UNIQUE INDEX IND_T2_OBJ ON T2(OBJECT_ID);
Change that line by removing the — at the start of the line, so that the line appears like this:
CREATE UNIQUE INDEX IND_T2_OBJ ON T2(OBJECT_ID);
Disconnect from the database, then reconnect (so that you will be assigned a different default trace filename). Now, re-execute the script, and process the resulting trace file with TKPROF. My TKPROF output for the UPDATE SQL statement looks like this:
SQL ID: 1ahnbczb20gdx Plan Hash: 1751280057 UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 190 1.35 1.42 348 52552 165435 19000 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 191 1.35 1.42 348 52552 165435 19000 Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 64 (recursive depth: 1) Number of plan statistics captured: 1 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 0 0 0 UPDATE T2 (cr=332 pr=30 pw=0 time=19223 us) 100 100 100 INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=8 pw=0 time=1372 us cost=1 size=2015 card=1)(object id 103855) 0 0 0 UPDATE T2 (cr=288 pr=2 pw=0 time=8629 us) 100 100 100 INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 ... 0 0 0 UPDATE T2 (cr=291 pr=0 pw=0 time=6051 us) 100 100 100 INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 pw=0 time=430 us cost=1 size=2015 card=1)(object id 103855) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ db file scattered read 40 0.00 0.03 db file sequential read 32 0.00 0.01 log file switch completion 2 0.00 0.01
For the sake of comparison, I also processed the trace file using my Hyper-Extended Oracle Performance Monitor program. This is my program’s output:
Statement Depth 1 (Trigger Code) Cursor 910 Ver 1 Parse at 0.187068 SQL_ID='1ahnbczb20gdx' (TD Prev 0.000996) Similar Cnt 1 |PARSEs 1|CPU S 0.000000|CLOCK S 0.000162|ROWs 0|PHY RD BLKs 0|CON RD BLKs (Mem) 0|CUR RD BLKs (Mem) 0|SHARED POOL MISs 1| |EXECs 600|CPU S 3.603618|CLOCK S 3.683344|ROWs 50000|PHY RD BLKs 912|CON RD BLKs (Mem) 144826|CUR RD BLKs (Mem) 426496|SHARED POOL MISs 1| |FETCHs 0|CPU S 0.000000|CLOCK S 0.000000|ROWs 0|PHY RD BLKs 0|CON RD BLKs (Mem) 0|CUR RD BLKs (Mem) 0|SHARED POOL MISs 0| CPU S 16.91% CLOCK S 15.45% | +++|| +++| UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 (Rows 0) UPDATE T2 (cr=332 pr=30 pw=0 time=19223 us) (Rows 100) INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=8 pw=0 time=1372 us cost=1 size=2015 card=1) (Rows 0) UPDATE T2 (cr=288 pr=2 pw=0 time=8629 us) (Rows 100) INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 pw=0 time=430 us cost=1 size=2015 card=1) ... (Rows 0) UPDATE T2 (cr=105 pr=0 pw=0 time=193 us) (Rows 0) INDEX UNIQUE SCAN IND_T2_OBJ (cr=105 pr=0 pw=0 time=153 us cost=1 size=2015 card=1)
Notice that there is now a significant difference in the elapsed time at 2.263344 seconds difference, CPU seconds at a difference of 2.253618 seconds, and the number of blocks read from disk at a difference of 564 blocks.
Let’s take a look at the wait events that my program wrote to Excel. Examination of the wait events indicated that there were 42 db file sequential read waits (thus 42 total blocks read from disk), and 107 db file scattered read waits (842 total blocks read from disk), for a grand total of 884 blocks read from disk that are associated with this UPDATE SQL statement. This time, not only are the TKPROF statistics not supported by the raw wait events, but the wait event summary included in the TKPROF summary also is not supported by the raw wait events. Where did those number from TKPROF originate?
—
—
—
—
—
Well, let’s just process the 10046 trace file using TKPROF from 11.2.0.2 to confirm the output:
SQL ID: 1ahnbczb20gdx Plan Hash: 1751280057 UPDATE T2 SET PADDING1 = RPAD('y',4000,'y') WHERE OBJECT_ID = :B1 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 600 3.57 3.64 884 144451 426496 50000 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 601 3.57 3.64 884 144451 426496 50000 Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 64 (recursive depth: 1) Number of plan statistics captured: 600 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 0 0 0 UPDATE T2 (cr=241 pr=1 pw=0 time=5550 us) 100 83 100 INDEX UNIQUE SCAN IND_T2_OBJ (cr=106 pr=0 pw=0 time=425 us cost=1 size=2015 card=1)(object id 103855) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ db file scattered read 107 0.00 0.06 db file sequential read 42 0.00 0.01 log file switch completion 2 0.00 0.01
–
Those TKPROF numbers appear quite different; translation – it you do not like the numbers produced by TKPROF, just use a different version of TKPROF. 🙂
It appears that TKPROF in 11.2.0.1 potentially lies when reporting on a trace file generated by Oracle Database 11.2.0.2. I wonder if this problem is caused by the large cursor number in 11.2.0.2+, or if TKPROF in 11.2.0.1 also has the same problem with trace files generated by Oracle Database 11.2.0.1?
Is anyone able to reproduce the problem reported by the OP?
—
If you wish to experiment with my trace file and TKPROF summaries, you may download the following file:
When you download the file, rename the download from or1122p_ora_1808-zip.doc to or1122p_ora_1808.zip. The enclosed or1122p_ora_1808.trc file is the raw 10446 trace file, or1122p_ora_1808.txt is the 11.2.0.1 TKPROF generated output file, and or1122p_ora_1808-2.txt is the 11.2.0.2 generated output file.
The problem seems to be multiple statements in PL/SQL black. If I just did the update (thus excluding DELETES AND INSERTS), I get the following output.
which gives Total Waited as 889.9 which looks more reasonable compared to the elasped time of 1070.20 sec. However, it begs the question if there is bug with the version of tkprof I am using (Release 11.2.0.1.0). I guess I can go directly to the trace file and use UNIX sed and awk to extract and sum data. However, that defeats the object. Well the issue we saw before is not a quantization error. Well I do not think one can attibute 49% differences to error in meassurement as in the original case.
Cheers,
Mich
Mich,
Thank you for the follow up.
Is there any way that you can pass the original trace file through tkprof from 11.2.0.2 or 11.2.0.3 so that you can confirm that there was a bug in 11.2.0.1’s tkprof? If you download the zip file at the end of my blog article (https://hoopercharles.files.wordpress.com/2012/05/or1122p_ora_1808-zip.doc ) and process the enclosed .trc file with tkprof, do you obtain the incorrect statistics from 11.2.0.1 shown above, or do you obtain the correct statistics from 11.2.0.2 above?
A search through Metalink (MOS) found an interesting bug fix in Exadata 11.2.0.1 bundle patch 7: “Bug 8328200 SHRD CRSRS: Tkprof does not aggregate STAT lines for ALL_EXECUTIONS”. If you have access to MOS, you can see the bug report here:
https://supporthtml.oracle.com/epmos/faces/ui/km/DocumentDisplay.jspx?_afrLoop=3303558429470000&id=8328200.8
Charles,
This is the processed tkprof file with source from your zipped trc file
Now why am I getting 19000 rows as opposed to 50000 rows?
Mich,
The good news is that what you posted above exactly matches the output that I posted above from the 11.2.0.1 version of tkprof on my system, so there is some consistency. The bad news is that it is consistently bad – the output does NOT match the output from the 11.2.0.2 version of tkprof. The bad news probably extends also to the trace file that you generated and the processed with tkprof, and that probably explains why the figures in the tkprof output from your trace file are not self-consistent.
At this point, I recommend trying to pass your trace file through tkprof from 11.2.0.2 or 11.2.0.3 and compare the output. I have a strong feeling that you are encounting the same bug that you found when you processed my trace file.
It appears that there is a similar topic on the OTN forums. If you find that tkprof from 11.2.0.2 or 11.2.0.3 produces non-self-conflicting output, you might post that information in the OTN thread:
https://forums.oracle.com/forums/thread.jspa?messageID=10343194
Thanks Charles will do that.
Ok that was easy enough. However, I did that test because I wanted to load a typical OLTP simulation for solid state disks versus normal magnetic disks
(SSD vs HDD). I recall a while back you dis some tests on SSD as well. The challenge is whether it is possible for Oracle to write the same volume of data to disk with SSD against HDD.
If I rewrite the code to take AWR snapshots with checkpoints before and after, I should be able to verify the volume of writes.
If we run the script against tables on SSD and again HDD it would be expected to see roughly the same volume of writes. However, is that the case? Look below. The first section is about HDD and tablescape is called APP1_HDD
Now the same stats for SSD tablespace APP1
So I still do not see why the volumes of writes are higher for SSD except perhaps that updates (erase OS blocks) are causing extra writes for SSD.
Mich,
That is an interesting observation. The Oracle Database statistics should NOT be able to see the extra writes to erase blocks on SSD – that should all be hidden in the operating system call to write to disk; Oracle Database would be able to determine the time required for the call to complete, but not the amount of internal disk-level work done.
Is there any chance that one of your test runs could have been performed during the time when an automatic AWR collection was initiated (by default at the start of every hour)?
I think that looking at the number of blocks written by DBWR as shown in the AWR report could cause some confusion. Oracle Database is fundamentally lazy… it procrastinates doing tasks that are time consuming and that do not need to be done immediately. When the commits in the PL/SQL block are executed, the redo log buffer *must* be flushed to disk, but DBWR does not need to flush the commited changes in the blocks to disk immediately. The uncommitted changes in the blocks could also be flushed to disk before a session issues a commit, as additional space is needed in the buffer cache. Throw delayed block cleanout and dynamically resized buffer caches into the mix, and you could find that the same block may be written to disk several times in one of the test runs, but not in the other. It is interesting to note that the number of blocks read from the APP1_HDD tablespace’s datafiles is nearly identical to the number of blocks written to the APP1 tablespace’s datafiles in the other AWR report.
http://docs.oracle.com/cd/E14072_01/server.112/e10713/memory.htm#sthref1580
You may also enjoy reading the following by Jonathan Lewis:
http://jonathanlewis.wordpress.com/2009/06/16/clean-it-up/
http://books.google.com/books?id=r86URDzs1TAC&pg=PA137 (the book “Oracle Core”, see “PL/SQL Optimization” on page 126, and “Database Writer” on pages 137-156)
When testing performance, you need to research what can change that is not under the control of your experiment. For instance, the server resynonchonrizing its time with an outside time source (NTP) could throw off time-delta calculations, non-instance related load placed on the drives or server CPU or physical memory, operating system level caching (seems to be enabled in your test case), automatic processing in the database (such as automatic statistics collection that typically starts around 10:00 PM) or automatic AWR collection, virus scanner activity (both Windows and Linux servers could be affected if a virus scanner is installed on the server), PL/SQL memory usage from a previous test run leaving Oracle Database believing that there is more competition for PGA memory – thus either automatically growing the PGA memory while decreasing the SGA memory (assuming that MEMORY_TARGET is used) or gradually permitting each session to use a smaller percentage of memory for PGA requirements, delayed block cleanouts caused by previous tests (or other activity) that affect later tests, etc.
In the end, it comes down to timing. After performing the test several times, what is the average elapsed time, and how much of the server’s resources were consumed to achieve that average elapsed time? In my experience, SATA drives in servers seem to drive up kernel mode CPU usage (and may not offer as effective native command queuing) when compared to SCSI (or SAS – serial attached SCSI), so that could also be a factor as you take a look at expanding the test to multiple sessions performing concurrent work at the same time.
Sorry, not a simple answer from me (I might be overlooking something painfully obvious). Maybe one of the other blog readers are able to better interpret the output that you provided.
Thanks again Charles.
I ran the SSD test and new AWR report.
Compared to the prtevious HDD
Stiil do not know what the issue could be:
Obviously we don’t have any info as to how many writes the OS does when we ask it to write 8k, we just ask it to write 8k and increment our metric stating that we wrote 8k.
Cheers,
Mich