April 14, 2011
I find interesting details from time to time about Oracle Database. I am in the process of organizing material for a presentation, and I encountered something interesting when testing Oracle Database 11.2.0.2 with patch 3 applied. I traced the execution of an application and then processed the 660MB trace file using a trace file profiler that I created. A portion of my trace file profiler’s output looks like this:
Statement Depth 0 (Application Code)
Cursor 27   Ver 1   Parse at 170.113677   SQL_ID='aw9mpsx4pyta2'   (TD Prev 25.179488)   Similar Cnt 1
|PARSEs      1|CPU S  0.000000|CLOCK S  0.000491|ROWs     0|PHY RD BLKs   0|CON RD BLKs (Mem)      0|CUR RD BLKs (Mem)     0|SHARED POOL MISs 1|
|EXECs   82848|CPU S 25.849376|CLOCK S 26.046921|ROWs 82848|PHY RD BLKs 560|CON RD BLKs (Mem) 248703|CUR RD BLKs (Mem) 84702|SHARED POOL MISs 1|
|FETCHs      0|CPU S  0.000000|CLOCK S  0.000000|ROWs     0|PHY RD BLKs   0|CON RD BLKs (Mem)      0|CUR RD BLKs (Mem)     0|SHARED POOL MISs 0|
  CPU S 20.68%  CLOCK S 20.47%  |    ++++||    ++++|
UPDATE REQUIREMENT SET REQUIRED_DATE = :1 WHERE WORKORDER_TYPE = :2 AND WORKORDER_BASE_ID = :3 AND WORKORDER_SUB_ID = :4 AND WORKORDER_LOT_ID = :5 AND WORKORDER_SPLIT_ID = :6 AND OPERATION_SEQ_NO = :7 AND PIECE_NO = :8

     (Rows 0)   UPDATE REQUIREMENT (cr=3 pr=2 pw=0 time=861 us)
     (Rows 1)    INDEX UNIQUE SCAN SYS_C0025561 (cr=3 pr=2 pw=0 time=754 us cost=2 size=33 card=1)
The above output shows that this SQL statement, which was executed 82,848 times, consumed a total of 25.85 seconds of CPU time and 26.05 seconds of elapsed time. This SQL statement accounted for 20.68% of the total CPU consumption found in the trace file, and 20.47% of the total elapsed time found in the trace file. The execution plan displayed is from the first execution of the SQL statement, which is output by default after the first execution starting with Oracle Database 11.1. Since this SQL statement consumed twice as much CPU and elapsed time as the second highest consumer, I thought that I would take a look in the raw 10046 trace file to see what is happening. Among other things, I was attempting to determine whether a time consuming row-level trigger was firing on each execution, which then caused another row-level trigger to fire.
Here is a portion of that trace file which shows two executions of the SQL statement:
PARSING IN CURSOR #435118472 len=219 dep=0 uid=70 oct=6 lid=70 tim=150927664059 hv=1230988610 ad='466efdfb8' sqlid='aw9mpsx4pyta2'
UPDATE REQUIREMENT SET REQUIRED_DATE = :1 WHERE WORKORDER_TYPE = :2 AND WORKORDER_BASE_ID = :3 AND WORKORDER_SUB_ID = :4 AND WORKORDER_LOT_ID = :5 AND WORKORDER_SPLIT_ID = :6 AND OPERATION_SEQ_NO = :7 AND PIECE_NO = :8
END OF STMT
PARSE #435118472:c=0,e=491,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=150927664059
WAIT #435118472: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=150927664110
WAIT #435118472: nam='SQL*Net message from client' ela= 364 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=150927664488
...
WAIT #435118472: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=81989 tim=150927704272
WAIT #435118472: nam='SQL*Net message from client' ela= 284 driver id=1413697536 #bytes=1 p3=0 obj#=81989 tim=150927704573
BINDS #435118472:
 Bind#0
  oacdty=96 mxl=32(16) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=256 off=0
  kxsbbbfp=0532a410  bln=32  avl=16  flg=05
  value="2011-04-20-0:0:0"
 Bind#1
  oacdty=96 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=32
  kxsbbbfp=0532a430  bln=32  avl=01  flg=01
  value="W"
 Bind#2
  oacdty=96 mxl=32(05) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=64
  kxsbbbfp=0532a450  bln=32  avl=05  flg=01
  value="16750"
 Bind#3
  oacdty=96 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=96
  kxsbbbfp=0532a470  bln=32  avl=03  flg=01
  value="200"
 Bind#4
  oacdty=96 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=128
  kxsbbbfp=0532a490  bln=32  avl=01  flg=01
  value="1"
 Bind#5
  oacdty=96 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=160
  kxsbbbfp=0532a4b0  bln=32  avl=01  flg=01
  value="0"
 Bind#6
  oacdty=96 mxl=32(02) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=192
  kxsbbbfp=0532a4d0  bln=32  avl=02  flg=01
  value="20"
 Bind#7
  oacdty=96 mxl=32(02) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=224
  kxsbbbfp=0532a4f0  bln=32  avl=02  flg=01
  value="30"
EXEC #435118472:c=0,e=300,p=0,cr=3,cu=1,mis=0,r=1,dep=0,og=1,plh=755874150,tim=150927704902
WAIT #435118472: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=81989 tim=150927704977
WAIT #435118472: nam='SQL*Net message from client' ela= 177 driver id=1413697536 #bytes=1 p3=0 obj#=81989 tim=150927705171
BINDS #435118472:
 Bind#0
  oacdty=96 mxl=32(16) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=256 off=0
  kxsbbbfp=0532a410  bln=32  avl=16  flg=05
  value="2011-04-20-0:0:0"
 Bind#1
  oacdty=96 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=32
  kxsbbbfp=0532a430  bln=32  avl=01  flg=01
  value="W"
...
 Bind#7
  oacdty=96 mxl=32(02) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=178 siz=0 off=224
  kxsbbbfp=0532a4f0  bln=32  avl=02  flg=01
  value="40"
EXEC #435118472:c=15600,e=510,p=0,cr=3,cu=1,mis=0,r=1,dep=0,og=1,plh=755874150,tim=150927705707
WAIT #435118472: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=81989 tim=150927705800
WAIT #435118472: nam='SQL*Net message from client' ela= 1217 driver id=1413697536 #bytes=1 p3=0 obj#=81989 tim=150927707038
Looking at the above, you may see a couple of odd items. First, the cursor number is not 27 as was indicated in my trace file profiler's output – it is instead the huge number 435,118,472. Tanel Poder provided an explanation of this change, which was introduced in Oracle Database 11.2.0.2, here – that change initially caused a lot of problems for my trace file profiler, which in hindsight made the silly assumption that no session would have more than 5,000 or 10,000 open cursors (which would thus result in cursor numbers below 10,000).
What I thought that the casual observer might find interesting is this line in the trace file:
EXEC #435118472:c=15600,e=510,p=0,cr=3,cu=1,mis=0,r=1,dep=0,og=1,plh=755874150,tim=150927705707
The above line indicates that the instance consumed 0.015600 seconds of CPU time in a 0.000510 second time period – a bit of an unusual feat to see the CPU consumption 30 times greater than the elapsed time considering that this SQL statement was not executed in parallel. Is there a clue why this happened? Take a look at the trace file line that shows the previous execution:
EXEC #435118472:c=0,e=300,p=0,cr=3,cu=1,mis=0,r=1,dep=0,og=1,plh=755874150,tim=150927704902
This time the CPU time is reported as 0, so the elapsed time is – if you take the division literally – infinitely (300 / 0) longer than the CPU time. So now the brain teaser: if the CPU time can be 30 times (15600 / 510) greater than the elapsed time on one execution, and the elapsed time infinitely (300 / 0) longer than the CPU time on another, how is it possible that the overall CPU time for this SQL statement is roughly the same as the overall elapsed time? (On a side note, you might wonder whether this could compound into odd time reporting errors if different SQL statements are executed in between the executions of this SQL statement.)
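One way the totals could still reconcile is tick-based CPU accounting. The following is a minimal simulation, not Oracle's actual code: it assumes (hypothetically) that the OS charges CPU time in whole 15,600 microsecond timer ticks to whichever call happens to be running when a tick fires, while elapsed time comes from a fine-grained counter, and that each EXEC truly burns about 300 microseconds of CPU.

```python
# Hypothetical sketch: CPU time charged in whole timer ticks, with each
# tick attributed to the call running when the tick fires.
TICK_US = 15_600            # assumed Windows timer tick, matching the trace
TRUE_CPU_PER_EXEC_US = 300  # assumed true CPU cost of one EXEC

def simulate(exec_count):
    reported = []
    since_tick = 0  # CPU consumed since the last tick boundary
    for _ in range(exec_count):
        since_tick += TRUE_CPU_PER_EXEC_US
        if since_tick >= TICK_US:
            reported.append(TICK_US)   # a tick fired: charge a whole tick
            since_tick -= TICK_US
        else:
            reported.append(0)         # no tick fired: c=0 despite real work
    return reported

r = simulate(82_848)
print(sorted(set(r)))                          # individual EXECs show only 0 or 15600
print(sum(r), 82_848 * TRUE_CPU_PER_EXEC_US)   # totals agree to within one tick
```

Under these assumed numbers, most EXEC lines report c=0 and roughly every 52nd EXEC reports c=15600, yet the summed CPU time lands within one tick of the true total – which matches the overall agreement between CPU and elapsed time in the profiler output.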
Just for fun, here is the TKPROF output for this SQL statement:
SQL ID: aw9mpsx4pyta2
Plan Hash: 755874150
UPDATE REQUIREMENT SET REQUIRED_DATE = :1 WHERE WORKORDER_TYPE = :2 AND WORKORDER_BASE_ID = :3 AND WORKORDER_SUB_ID = :4 AND WORKORDER_LOT_ID = :5 AND WORKORDER_SPLIT_ID = :6 AND OPERATION_SEQ_NO = :7 AND PIECE_NO = :8

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  82848     25.84      26.04        556     248550      84702       82848
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    82849     25.84      26.04        556     248550      84702       82848

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 70
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE REQUIREMENT (cr=3 pr=2 pw=0 time=861 us)
         1          1          1   INDEX UNIQUE SCAN SYS_C0025561 (cr=3 pr=2 pw=0 time=754 us cost=2 size=33 card=1)(object id 81989)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   82849        0.00          0.10
  SQL*Net message from client                 82849        0.00         19.33
  db file sequential read                        95        0.00          0.03
  db file scattered read                         95        0.00          0.04
Interesting.
I am not yet aware of all of the changes related to how Oracle writes data to trace files, so I am not sure whether this is specific to 11g. But could it be due to the fact (and I don't know if it still holds in 11g) that CPU time accounts for even tiny amounts of time, whereas elapsed time might not account for time that is too small – which tends to happen when you have lots of tiny operations carried out in a relatively small amount of time? In other words, could this be similar to the cases where one might see CPU time greater than elapsed time in TKPROF, due to the difference in the way the two are measured?
Narendra,
That is one possibility. However, take a look at the following sequential list (except for the portion after … where I skipped several executions) of EXEC calls for this cursor:
0 and 15,600 (roughly 1/64th of a second) appear to be popular values for the CPU time on an individual EXEC line.
The brain teaser continues.
Hi Charles,
Can you please let me know the patch 3 number so I can search for it on MOS?
Thanks,
Dani
Dani,
I used an advanced patch search on Metalink (MOS) and located the following patches:
11.2.0.2 Windows x64 Patch 3: 11731184 released 2011-03-04 — this is the one I used
11.2.0.2 Windows x64 Patch 4: 11896292 released 2011-04-08
Charles —
Any chance that the resolution of the Windows x64 system timer is larger than the amount of time between each execution? I found an interesting article on MSDN that seems to indicate a 15 ms resolution which, coincidentally, is similar to the 15,600 us times you are seeing in the trace.
http://msdn.microsoft.com/en-us/magazine/cc163996.aspx
Centinul,
Interesting article, thank you for posting the link. That article led to other thought processes, which then led to an interesting SysInternals (Microsoft) utility named ClockRes:
http://technet.microsoft.com/en-us/sysinternals/bb897568
The output on this particular server:
The above output seems to match the behavior found in the 10046 trace file, where the CPU time is either reported as 0 or as 15,600. But that also brings me back to a comment that I made at the end of the article:
One might wonder if we could find a situation where the CPU usage is reported at 0, when it is actually much higher after 100+ executions.
——————–
Edit (roughly 10 minutes later):
Here is the ClockRes output from an old 3.8GHz Pentium 4 running Windows XP:
I am digging through trace files from an AIX system running 11gR2. I have two cursors that are logging high amounts of CPU time, but the values both look like awfully round numbers. Here are the trace file lines:
====> EXEC #4575888568:c=318740000,e=352279609,p=0,cr=33982302,cu=0,mis=0,r=0,dep=1,og=1,plh=3504418200,tim=75990584749721
====> EXEC #4575992080:c=319670000,e=353631095,p=0,cr=33983344,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=75990585978417
It really seems unlikely that the measures for CPU time are precise – there appears to be a granularity to how much time is added. This reminds me that Oracle uses the term 'quantum' when referring to waits imposed by Resource Manager. We have that turned on in our database … do you have it turned on in yours, by any chance?
I already checked for a greatest common divisor between my values and your magic 15,600. They are all divisible by 400, but I would guess that the granularity on my system is different … because here is a unique listing of all the c= values from my trace file:
0
10000
20000
30000
40000
50000
60000
70000
90000
100000
360000
760000
318740000
319670000
Looks like I have a granularity of 10,000 and you have a granularity of 15,600. I have plenty of c=0 entries too, by the way, so it appears that Oracle is simply 'rounding down' when the spirit moves it.
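As an illustrative sketch (nothing Oracle provides), a suspected granularity like this can be checked by taking the greatest common divisor of the distinct non-zero c= values listed above:

```python
from functools import reduce
from math import gcd

# Distinct non-zero c= values from the AIX trace file listed above
c_values = [10000, 20000, 30000, 40000, 50000, 60000, 70000,
            90000, 100000, 360000, 760000, 318740000, 319670000]

tick = reduce(gcd, c_values)
print(tick)              # 10000 -> a 0.01 second (centisecond) tick
print(gcd(tick, 15600))  # 400 -> the common divisor with the Windows values
```

The GCD of the AIX values is 10,000, consistent with a centisecond accounting tick; combining that with the Windows value of 15,600 yields the 400 mentioned above.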
Gary
Hi Gary,
Thank you for providing your findings from AIX. It is interesting that your reported CPU usage intervals are in centisecond (0.01 second) increments – now I am a bit curious to know whether that observation partially explains why wait events and various other timing events were originally reported in centiseconds in Oracle Database. Centinul's comment in this article led me to the ClockRes (Windows) utility, which reported the timer resolution on this server as 15.600 ms, and that seems to correspond to the granularity of 15,600 found in the trace file. The ClockRes utility also reported that the minimum timer resolution is 0.500 ms, which would seem to imply that it might be possible to drop the granularity to 500, but I am not sure whether this is controllable by the operating system (maybe it would not be a good idea to change this?).
I am not so sure that the reported values are "rounding down" (but I could very well be wrong). The impression that I received is that CPU time is accumulated in whole clock ticks, with each tick charged to whichever call happens to be executing when the tick fires – so most EXEC calls report c=0, and an occasional EXEC call absorbs a full tick.
Hi Charles,
Very interesting brain teaser indeed! Based on what you and others have written and that MSDN article about granularity of 15 ms on windows (which is what your database’s OS is), my hypothesis is:
Oracle uses the GetSystemTime Windows system call to get the current system time after each EXEC and computes a delta to report c (CPU) values. Oracle uses performance counters to get the current value for elapsed time and computes a delta after each EXEC to report e (elapsed) values. There are multiple EXECs within each 15 msec SYSTIME slot, i.e. the EXEC CPU time is much shorter than 15 msec.
Based on above:
START: SYSTIME = X PERFORMANCE COUNTER TIME = Y
AFTER EXEC1 – SYSTIME = X PERFORMANCE COUNTER TIME = Y+2100 so c=X-X=0 and e=Y+2100-Y=2100
AFTER EXEC2 – SYSTIME = X PERFORMANCE COUNTER TIME = Y+1900 so c=X-X=0 and e=Y+1900-Y=1900
AFTER EXEC3 – SYSTIME = X PERFORMANCE COUNTER TIME = Y+2000 so c=X-X=0 and e=Y+2000-Y=2000
AFTER EXEC4 – SYSTIME = X PERFORMANCE COUNTER TIME = Y+2100 so c=X-X=0 and e=Y+2100-Y=2100
AFTER EXEC5 – SYSTIME = X + 15 msec (15600 to be precise) PERFORMANCE COUNTER TIME = Y+2100 so c=X+15600 -X=15600 and e=Y+2100-Y=2100
Now my question is: why not use the performance counter to measure c also? Maybe someone can shed some light on this.
Something similar might be happening on AIX?
Another possible explanation: I worked on a custom software piece where we had to report code execution times. We figured that the code to measure the systime after each execution cost too much CPU compared to the actual application code, so we took a shortcut and measured the time only after every 'N' executions, reporting c=0 for the intermediate EXECs. The trace was for our internal use only, and we as programmers understood that the final C (reported after N EXECs) had to be divided by N to get c for each EXEC – hope Oracle is not doing this!!!!
Regards,
Paresh
I am not sure that the CPU time rounding situation is specific to Windows. Here is part of a trace file from Oracle Database 11.1.0.6 running on 64 bit Linux:
Above we see the same rounding issue, although the resolution is 0.001 seconds. I have not analyzed the CPU clock resolution on this particular server with an Intel CPU, but it appears that Linux is using “Minimum timer interval: 1.000 ms” as output by the ClockRes program in a comment above.
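For reference, a quick sketch (assuming a Linux system with Python available – this is not derived from the trace itself) that queries the kernel-reported clock resolutions via clock_getres(2):

```python
import time

# Query the resolution the OS reports for a few kernel clocks. Note that the
# reported resolution of the per-process CPU clock may be far finer than the
# granularity actually used for CPU accounting, which on Linux is typically
# one scheduler tick.
for name in ("CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_PROCESS_CPUTIME_ID"):
    print(name, time.clock_getres(getattr(time, name)))
```

This is roughly the information ClockRes reports on Windows; comparing the reported resolution against the smallest non-zero c= increments in a trace file shows whether the two agree.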
A similar trace from 11.2.0.1 on 64 bit Linux:
Again, the above shows a resolution of 1ms, while the Windows trace shows a resolution of about 15ms.
I wonder if Metalink (MOS) Doc ID 436797.1, “Poor Performance with STATISTICS_LEVEL=ALL including Excessive Gettimeofday Calls on LINUX 64 or 32-bit” provides any clues? I will have to see if I can locate a 10.2.0.3 or earlier 10046 trace file from a Linux system.
Untested, but it appears that there is a boot.ini setting on Windows systems that controls the timer resolution:
See:
http://www.pantz.org/software/windows/bootinistartupswitches.html
http://www.windowsnetworking.com/nt/atips/atips136.shtml
http://groups.google.com/group/comp.os.ms-windows.programmer.win32/browse_thread/thread/764f642befe49acb <– also lists a Windows API call
Also untested, an application to adjust the timer resolution:
http://www.lucashale.com/timer-resolution/
Charles,
I posted earlier with some AIX trace file snippets. Digging around later on for an unrelated question, I found Cary Millsap’s paper “Mastering SQL Performance with Extended SQL Trace.” He says on page 9: “For a detailed description of why the CPU consumption statistic is accurate to only +/- 10,000 microseconds, see “Optimizing Oracle Performance” … pp161-165.”
That book says that timing is done by polling and that 10,000 microseconds is a practical limit to how often you should poll for elapsed time.
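A rough analogue of that polling idea (a hypothetical sketch, not the book's Perl script) estimates the CPU accounting tick by busy-polling os.times() and recording the smallest non-zero jump in reported user CPU time:

```python
import os

def observed_cpu_tick(jumps_wanted=20):
    """Estimate the granularity of the OS's CPU-time accounting."""
    jumps = []
    last = os.times().user
    while len(jumps) < jumps_wanted:
        now = os.times().user  # polling itself burns CPU, advancing the clock
        if now != last:
            jumps.append(now - last)
            last = now
    return min(jumps)

print(observed_cpu_tick())  # commonly 0.01 s on Linux, where USER_HZ is 100
```

On a typical Linux build this reports a 0.01 second tick, which lines up with the centisecond c= increments seen in the AIX trace above.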
My guess is that the 11g traces express this lower precision in the reported CPU timings by appending the trailing 0s – an improvement in my book.
Gary
Gary,
Nice find – I have the "Optimizing Oracle Performance" book right next to me. That section of the book seems to exactly answer this brain teaser. The pages seem to suggest that the minimum reported CPU consumption on Linux is c=10000, while one of my comments above shows Oracle Database on Linux reporting c=1000 – so it must be the case that the scheduler resolution in Linux has changed in the years since the book was published (or it could be a feature of the CPU platform). Page 163 of the book includes a Perl script that appears to produce the same information as ClockRes on the Windows platform (I have not tested the script on Linux). A quote from page 164: