Extended SQL Trace – the Case of the Missing 40,000 Seconds

4 02 2010

February 4, 2010

An interesting question recently appeared on Oracle’s OTN forums:
http://forums.oracle.com/forums/thread.jspa?threadID=1021152

The original poster stated that he noticed a slow running job and initiated a 10046 trace using ORADEBUG.  The job ran for roughly 60,000 seconds, but the 10046 trace was disabled after about 45,000 seconds.  Below is the partial output of TKPROF for his trace file that was provided in the thread:

********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------   -------- ---------- ---------- ---------- ----------  ----------
Parse        34      0.07       0.05          0          0          0           0
Execute  949806    924.33    1010.25          0          0          0          34
Fetch    964106    264.38    2540.21     301146   18226522          0     5094481
------- -------  -------- ---------- ---------- ---------- ----------  ----------
total   1913946   1188.78    3550.51     301146   18226522          0     5094515

Misses in library cache during parse: 5
Misses in library cache during execute: 5

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    301151        0.49       1598.40
  SQL*Net message to client                  964141        0.01          1.54
  SQL*Net message from client                964140      973.95       1720.04
  read by other session                      201948        0.98        705.53
  latch: cache buffers chains                 11496        0.01          1.40
  latch free                                     38        0.00          0.02
  cursor: pin S                                  27        0.01          0.19
  SQL*Net more data from client                  87        0.00          0.00
  wait list latch free                            1        0.01          0.01
  latch: object queue header operation            8        0.00          0.00

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        7      0.00       0.00          0          0          0           0
Execute      7      0.00       0.00          0         21          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       14      0.00       0.00          0         21          0           0

Misses in library cache during parse: 0

   45  user  SQL statements in session.
    7  internal SQL statements in session.
   52  SQL statements in session.
********************************************************************************
Trace file: s1p_ora_9761.trc
Trace file compatibility: 10.01.00
Sort options: exeela  fchela  prsela
        1  session in tracefile.
       45  user  SQL statements in trace file.
        7  internal SQL statements in trace file.
       52  SQL statements in trace file.
       46  unique SQL statements in trace file.
 71815238  lines in trace file.
    45080  elapsed seconds in trace file.

Let’s see if we are able to make some sense of what was posted.  3,550.51 elapsed seconds based on the PARSE, EXEC, and FETCH lines in the trace file for SQL submitted by the user’s session; 0.00 elapsed seconds based on the PARSE, EXEC, and FETCH lines in the trace file for recursive SQL (trigger, PL/SQL functions, SQL optimization, and space management calls); 4,027.13 seconds in wait events; 1,188.78 seconds of CPU time; 45,080 elapsed seconds in the trace file (last tim= value found in the trace file minus the first tim= value found in the trace file); OP claims that the trace was enabled for about 45,000 seconds and that the entire job required 60,000 seconds.  So, what could cause the odd timing difference?

The OP was asked to post at least the last 30 lines found in the trace file, and this is what was provided:

WAIT #69: nam='db file sequential read' ela= 10255 file#=253 block#=514924 blocks=1 obj#=211550 tim=52370728986939
WAIT #69: nam='db file sequential read' ela= 23269 file#=260 block#=1202218 blocks=1 obj#=211554 tim=52370729010404
WAIT #69: nam='db file sequential read' ela= 14643 file#=260 block#=1202511 blocks=1 obj#=211554 tim=52370729025179
WAIT #69: nam='db file sequential read' ela= 4114 file#=265 block#=1033501 blocks=1 obj#=211554 tim=52370729029406
WAIT #69: nam='db file sequential read' ela= 13199 file#=253 block#=516876 blocks=1 obj#=211550 tim=52370729042681
WAIT #69: nam='db file sequential read' ela= 15608 file#=253 block#=517214 blocks=1 obj#=211550 tim=52370729058435
WAIT #69: nam='db file sequential read' ela= 11327 file#=253 block#=517552 blocks=1 obj#=211550 tim=52370729069910
WAIT #69: nam='db file sequential read' ela= 3025 file#=265 block#=1033566 blocks=1 obj#=211554 tim=52370729073032
WAIT #69: nam='db file sequential read' ela= 9539 file#=255 block#=981553 blocks=1 obj#=211550 tim=52370729082732
WAIT #69: nam='db file sequential read' ela= 13474 file#=257 block#=1259903 blocks=1 obj#=211550 tim=52370729096346
WAIT #69: nam='db file sequential read' ela= 15618 file#=265 block#=1033723 blocks=1 obj#=211554 tim=52370729112106
WAIT #69: nam='db file sequential read' ela= 7040 file#=265 block#=1033868 blocks=1 obj#=211554 tim=52370729119454
WAIT #69: nam='db file sequential read' ela= 7995 file#=257 block#=1261555 blocks=1 obj#=211550 tim=52370729127527
WAIT #69: nam='db file sequential read' ela= 9090 file#=257 block#=1261616 blocks=1 obj#=211550 tim=52370729136775
WAIT #69: nam='db file sequential read' ela= 6459 file#=257 block#=1260091 blocks=1 obj#=211550 tim=52370729143360
WAIT #69: nam='db file sequential read' ela= 14971 file#=265 block#=1033992 blocks=1 obj#=211554 tim=52370729158426
WAIT #69: nam='db file sequential read' ela= 8528 file#=257 block#=1260350 blocks=1 obj#=211550 tim=52370729167077
WAIT #69: nam='db file sequential read' ela= 17749 file#=257 block#=1262569 blocks=1 obj#=211550 tim=52370729185407
WAIT #69: nam='db file sequential read' ela= 14060 file#=257 block#=1262585 blocks=1 obj#=211550 tim=52370729199789
WAIT #69: nam='db file sequential read' ela= 5944 file#=265 block#=1034201 blocks=1 obj#=211554 tim=52370729205855
WAIT #69: nam='db file sequential read' ela= 7822 file#=265 block#=1034081 blocks=1 obj#=211554 tim=52370729213801
WAIT #69: nam='db file sequential read' ela= 9290 file#=257 block#=1260904 blocks=1 obj#=211550 tim=52370729223162
WAIT #69: nam='db file sequential read' ela= 8096 file#=257 block#=1263630 blocks=1 obj#=211550 tim=52370729231435
WAIT #69: nam='db file sequential read' ela= 17243 file#=266 block#=1164414 blocks=1 obj#=211554 tim=52370729248781
WAIT #69: nam='db file sequential read' ela= 13505 file#=257 block#=1265138 blocks=1 obj#=211550 tim=52370729262412
WAIT #69: nam='db file sequential read' ela= 8196 file#=257 block#=1263679 blocks=1 obj#=211550 tim=52370729270695
WAIT #69: nam='db file sequential read' ela= 10854 file#=266 block#=1164488 blocks=1 obj#=211554 tim=52370729281699
WAIT #69: nam='db file sequential read' ela= 6916 file#=257 block#=1266913 blocks=1 obj#=211550 tim=52370729288757
WAIT #69: nam='db file sequential read' ela= 7534 file#=266 block#=1164629 blocks=1 obj#=211554 tim=52370729296373
WAIT #69: nam='db file sequential read' ela= 11014 file#=262 block#=1215470 blocks=1 obj#=211550 tim=52370729307690
WAIT #69: nam='db file sequential read' ela= 5464 file#=266 block#=1164797 blocks=1 obj#=211554 tim=52370729313292
WAIT #69: nam='db file sequential read' ela= 7087 file#=266 block#=1164693 blocks=1 obj#=211554 tim=52370729320517
WAIT #69: nam='db file sequential read' ela= 9120 file#=266 block#=1164681 blocks=1 obj#=211554 tim=52370729329728
WAIT #69: nam='db file sequential read' ela= 9765 file#=263 block#=897664 blocks=1 obj#=211550 tim=52370729339566
WAIT #69: nam='db file sequential read' ela= 9884 file#=266 block#=1164905 blocks=1 obj#=211554 tim=52370729349619
WAIT #69: nam='db file sequential read' ela= 6704 file#=263 block#=896829 blocks=1 obj#=211550 tim=52370729356392
WAIT #69: nam='db file sequential read' ela= 11685 file#=263 block#=898947 blocks=1 obj#=211550 tim=52370729368192
WAIT #69: nam='db file sequential read' ela= 4374 file#=257 block#=1265707 blocks=1 obj#=211550 tim=52370729372745
WAIT #69: nam='db file sequential read' ela= 10176 file#=266 block#=1164943 blocks=1 obj#=211554 tim=52370729383118
WAIT #69: nam='db file sequential read' ela= 5685 file#=263 block#=898167 blocks=1 obj#=211550 tim=52370729388872
WAIT #69: nam='db file sequential read' ela= 12407 file#=263 block#=898515 blocks=1 obj#=211550 tim=52370729401365
WAIT #69: nam='db file sequential read' ela= 8000 file#=266 block#=1165054 blocks=1 obj#=211554 tim=52370729409495
WAIT #69: nam='db file sequential read' ela= 10842 file#=263 block#=898387 blocks=1 obj#=211550 tim=52370729420558
WAIT #69: nam='db file sequential read' ela= 9459 file#=266 block#=1164884 blocks=1 obj#=211554 tim=52370729430097
WAIT #69: nam='db file sequential read' ela= 9176 file#=263 block#=899680 blocks=1 obj#=211550 tim=52370729439344
WAIT #69: nam='db file sequential read' ela= 9433 file#=266 block#=1165227 blocks=1 obj#=211554 tim=52370729448885
WAIT #69: nam='db file sequential read' ela= 9860 file#=264 block#=1166755 blocks=1 obj#=211550 tim=52370729458836
WAIT #69: nam='db file sequential read' ela= 15433 file#=263 block#=899734 blocks=1 obj#=211550 tim=52370729474390
WAIT #69: nam='db file sequential read' ela= 2525 file#=264 block#=1168289 blocks=1 obj#=211550 tim=52370729477056
WAIT #69: nam='db file sequential read' ela= 14267 file#=264 block#=1166072 blocks=1 obj#=211550 tim=52370729491427
WAIT #69: nam='db file sequential read' ela= 4862 file#=266 block#=1165189 blocks=1 obj#=211554 tim=52370729496443
WAIT #69: nam='db file sequential read' ela= 6263 file#=264 block#=1165451 blocks=1 obj#=211550 tim=52370729502825
WAIT #69: nam='db file sequential read' ela= 7960 file#=266 block#=1165114 blocks=1 obj#=211554 tim=52370729510959
WAIT #69: nam='db file sequential read' ela= 11191 file#=263 block#=899590 blocks=1 obj#=211550 tim=52370729522220
WAIT #69: nam='db file sequential read' ela= 16590 file#=264 block#=1166723 blocks=1 obj#=211550 tim=52370729539121
WAIT #69: nam='db file sequential read' ela= 17177 file#=264 block#=1169520 blocks=1 obj#=211550 tim=52370729556924
WAIT #69: nam='db file sequential read' ela= 10969 file#=265 block#=1042853 blocks=1 obj#=211550 tim=52370729568068
WAIT #69: nam='db file sequential read' ela= 9767 file#=264 block#=1169529 blocks=1 obj#=211550 tim=52370729578053
WAIT #69: nam='db file sequential read' ela= 8938 file#=267 block#=874778 blocks=1 obj#=211554 tim=52370729587146
WAIT #69: nam='db file sequential read' ela= 14940 file#=7 block#=192178 blocks=1 obj#=211550 tim=52370729602405
WAIT #69: nam='db file sequential read' ela= 11551 file#=7 block#=194476 blocks=1 obj#=211550 tim=52370729614295
FETCH #69:c=50000,e=2063972,p=176,cr=1956,cu=0,mis=0,r=489,dep=0,og=1,tim=52370729615480
*** 2010-01-29 03:51:49.192
Received ORADEBUG command 'tracefile_name' from process Unix process pid: 9597, image:
*** 2010-01-29 03:53:38.062
Received ORADEBUG command 'event 10046 trace name context off' from process Unix process pid: 9597, image:
Dump file /oracle/S1P/saptrace/usertrace/s1p_ora_9761.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/S1P/102_64
System name:    HP-UX
Node name:      nlscuxp0
Release:        B.11.23
Version:        U
Machine:        ia64
Instance name:  S1P
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 9761, image: oracle@nlscuxp0

From the above portion of the trace file, we see that the OP used ORADEBUG at 2010-01-29 03:51:49.192 to determine the trace file name for the session, and roughly 1 minute and 49 seconds later the OP used ORADEBUG to disable tracing.  The last fetch call in the trace file completed in roughly 2.06 seconds, required 0.05 seconds of CPU time and 174 physical block reads (possibly read one block at a time), and returned 489 rows to the client.  So, what happened to the missing 40,000 seconds?

  • Problem with a long CPU run queue causing a lot of missing time?
  • Problem with scheduling priority for the operating system processes?
  • Problem with a long running SQL statement that had been running for roughly 40,000 seconds and only performed logical I/O?
  • Problem where the batch process called DBMS_SLEEP?
  • Problem where the server made excessive use of the swap file, causing lost time on many “memory” accesses.
  • Problem with the network when transferring 489 rows (or some other large number) back to the client?
  • Extensive client side processing that lasted 40,000 seconds, but it would seem that Oracle would periodically write the current date and time to the trace file.

It is possibly interesting that a SQL*Net wait event of some sort did not appear after the fetch call (nothing written in the nearly 2 minute time period at the end of the trace file).  It is also possibly interesting that the TKPROF output indicated that the trace file included 45,080 seconds, which probably means that the lost 40,000 seconds did not happen in the last 50 or so lines of the trace file (this can be confirmed by checking the delta values of the tim= entries) – yet the elapsed CPU seconds plus wait events, including client side events, did not add up to the 45,000+ seconds.  Some strange/inconsistent output in the trace file should be expected if tracing is enabled or disabled in the middle of a SQL statement’s execution.

So, what happened to the missing 40,000 seconds?  Another case of a time warp?


Actions

Information

2 responses

4 02 2010
Timur Akhmadeev

>Problem where the batch process called DBMS_SLEEP?
I doubt so. DBMS_LOCK.SLEEP would results in PL/SQL lock timer waits.

Another possibility is SQL trace inaccuracies, for ex. due to system calls overheads. I’ve seen such behavior on a Linux boxes where gettimeofday calls were not so fast and introduced significant load on the CPU – 75-90% instead of 10-15% with disabled SQL trace.

4 02 2010
Charles Hooper

Excellent point regarding the DBMS_LOCK.SLEEP function – thanks for catching the typo with that function name. Good suggestion regarding the gettimeofday calls causing “lost” time.

As I posted in the thread, the delta values of the tim= entries in the trace file section provided by the OP typically showed a variance from the wait event time ranging between roughly 0.000069 seconds and 0.000626 seconds – it might be an odd coincidence that the high end of the variance (0.000626) multiplied by the number of lines in the trace file (71,815,238) would total about 44,956 seconds – a bit more than the missing 40,000 seconds. Of course, not all lines in the trace file would have generated this variance, and the majority of the variances in the section of the trace file posted by the OP were less than 0.0002 seconds.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 141 other followers

%d bloggers like this: