March 3, 2010
Let’s say that you encouter a query that is taking an unexpectedly long time to run, possibly 30 minutes. Maybe the query looks something like this:
SELECT mds.messagepartdata, ms.status, mi.ID, mi.messageguid, mi.channel, ROWNUM AS messagecount FROM pfmq_messagedata md, pfmq_messagedatastorage mds, pfmq_messageinfo mi, pfmq_messagestatus ms WHERE ( mi.queuename = 'CL312911032' AND mi.ID = ms.ID AND mi.ID = md.ID AND mi.ID = mds.ID AND md.ID = mds.ID AND md.messageparttype = mds.messageparttype AND md.messageparttype = 1 AND (ms.statusrevisionnumber = (SELECT MAX (statusrevisionnumber) FROM pfmq_messagestatus ms2 WHERE ms2.ID = ms.ID) ) ) AND ((ms.status = 64) AND (mi.direction = 1) AND mi.messagetype = 0) ORDER BY mi.sequenceordinalnumber, mi.senttime
In the above, the table PFMQ_MESSAGEDATASTORAGE contains a long raw column – the MESSAGEPARTDATA column. You enable a 10046 trace, but only at level 1 – so there are no wait events written to the trace file. The resulting trace file is then processed using TKPROF, and you obtain the following output:
call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ----- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 4321 14.56 580.31 231750 746064 0 64806 ------- ------ -------- ---------- ---------- ---------- ---------- ----- total 4323 14.56 580.31 231750 746064 0 64806 Misses in library cache during parse: 1 Optimizer goal: CHOOSE Parsing user id: 29 Rows Row Source Operation ------- --------------------------------------------------- 64806 SORT ORDER BY (cr=681258 r=184767 w=0 time=403515790 us) 64806 COUNT (cr=681258 r=184767 w=0 time=1729762996 us) 64806 NESTED LOOPS (cr=681258 r=184767 w=0 time=1729717540 us) 64806 NESTED LOOPS (cr=486600 r=92648 w=0 time=901417748 us) 64806 NESTED LOOPS (cr=356748 r=46572 w=0 time=268980743 us) 64820 TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGEINFO (cr=31514 r=23422 w=0 time=44437657 us) 120784 INDEX RANGE SCAN AK1_PFMQ_SEQUENCENUMBER (cr=3117 r=3062 w=0 time=10896605 us)(object id 6511) 64806 TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGESTATUS (cr=325234 r=23150 w=0 time=224278563 us) 64820 INDEX RANGE SCAN XPKPF_MESSAGESTATUS (cr=260414 r=15792 w=0 time=208616639 us)(object id 6515) 64820 SORT AGGREGATE (cr=129644 r=116 w=0 time=1973822 us) 64820 FIRST ROW (cr=129644 r=116 w=0 time=1810738 us) 64820 INDEX RANGE SCAN (MIN/MAX) XPKPF_MESSAGESTATUS (cr=129644 r=116 w=0 time=1756030 us)(object id 6515) 64806 INDEX UNIQUE SCAN XPKPF_MESSAGEDATA (cr=129852 r=46076 w=0 time=632244506 us)(object id 6505) 64806 TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGEDATASTORAGE (cr=194658 r=92119 w=0 time=828055493 us) 64806 INDEX UNIQUE SCAN XPKPF_MESSAGEDATASTORAGE (cr=129852 r=46036 w=0 time=613528422 us)(object id 6507)
How would you troubleshoot this performance problem? What are the problems? What looks good about the above output? What about the above makes you wonder if some detail is missing? You will find the above SQL statement in this comp.databases.oracle.server Usenet thread.
What can be said about the above output?
- The client application is retrieving roughly 15 rows in each fetch request: 64,806/4,321 = 14.998 rows per fetch. Maybe setting the fetch array size to a larger value would help?
- The fetch calls required 14.56 seconds of the server’s CPU time, while the elapsed time for the fetch was 508.31 580.31 seconds. 565.75 seconds were probably spent doing something other than actively burning CPU cycles – such as waiting for the completion of a disk read.
- Most of the indexes accessed were high precision indexes, meaning that few of the rowids returned by the index where eliminated at the table level, with the exception of the AK1_PFMQ_SEQUENCENUMBER index, where 46% of the rows identified by the index were discarded. Those rows were discarded very early in the plan, so the performance impact was likely minimal.
- There was heavy use of nested loop joins – that might be OK, but might not be as efficient as a hash join if a large percentage of the tables were being accessed or if the clustering factor were high.
- There were 231,750 blocks read from disk, and considering the heavy use of indexes and nested loop joins, those blocks were likely read one at a time from disk. If that was the case, the average time to read a block from disk was 0.0024412 seconds (2.4412ms), which is considered to be very fast access times for physical reads from disk.
- Considering the WHERE clause, the execution plan starts with the two tables with the greatest number of predicates – so that is probably a smart starting point.
- The elapsed time reported on the last line of the execution plan is greater than the elapsed time reported on the first line of the execution plan – just an odd inconsistency that the time reported on the last line was 613.5 seconds, yet the elapsed time reported by TKPROF for the entire execution was only 580.31 seconds? The second line of the execution plan shows 1,729.7 seconds (28.8 minutes) on the time= entry, which again is inconsistent with TKPROF’s elapsed time statistic.
- 580.31 seconds is roughly 9.7 minutes – what happened to the other 20.3 minutes in the 30 minute total execution time that was reported by the original poster?
What might be the next couple of steps for troubleshooting this performance problem?
- Generate the 10046 trace at level 8, rather than level 1. That will cause the wait events to be written to the trace file. If the trace file contains a large number of long duration waits on SQL*Net type wait events, check the network with a packet sniffer (Wireshark, for example) and check the client-side activity to make certain that it is not the client application that is the source of the slow performance. If you see large unexplained gaps in the tim= values in the trace file without a corresponding long wait event in between, investigate whether the server’s CPUs are over-burdened.
- Check the statistics on the tables and indexes to make certain that those statistics are reasonably accurate and up to date.
- Review the current optimizer related initialization parameters to make certain that silly parameters are not specified.
- Most of all, determine where the missing 20.3 minutes have gone.

Charles,
Nice analysis. You covered almost everything that I could think of and then asked what else. That is not fair.
Here is what I can think of:
1) To confirm the 3rd point (about row elimination) and 6th point in your analysis (about execution plan starting point), I would check what is the EXPLAIN PLAN outcome. Assuming this is on database version that gives predicate information in plan, the EXPLAIN PLAN will a) prove if expected plan is same as actual plan and b) tell clearly at what stages filter predicates are applied. Also, the ROWS column in EXPLAIN PLAN output shows expected rowcounts (per iteration) for a NESTED LOOP join which can be vital info as ROW SOURCE OPERATION reports row counts for all iterations combined (although in this case, it may not provide any new details)
2) I was actually a bit surprised to see following steps in ROW SOURCE
OPERATION
With the way query is written, I was expecting PFMQ_MESSAGESTATUS to be accessed more than once (for each iteraion). Is this some kind of optimizer improvement? Or is it because XPKPF_MESSAGESTATUS index includes both “statusrevisionnumber” and “ID” columns? Will there be any benifit in rewriting this query (using analytical function) in order to remove the correlated subquery ?
3) Finally, my most important question will be- What is done (by the “application”) with 64806 ordered set of rows that this query produce? In other words, is it absolutely essential to have this query being executed on its own?
I am not saying this is necessarily wrong but I would certainly validate the approach/design.
p.s. BTW, what does the “r=” component in ROW SOURCE OPERATION indicate? Physical Reads? I always seem to get “pr=” in 10.2.0.x versions and no “r=” in TkProf reports.
Narendra,
It is good that you are reading these blog articles somewhat as personal challenges – that is how many of my articles are written. I do not always tell everything that I see to be wrong or right – partially due to time constraints, and partially to encourage other people to offer comments that provide much more complete (and accurate) answers than I could have offered. Fortunately, there are a number of very smart people who are willing to freely share their knowledge.
Good idea to check the estimated number of rows per operation in an execution plan – it could be that statistics have not been gathered in the last 8 years which will cause Oracle to greatly under-estimate the cardinality for the operations, or it could be a problem with non-mutually exclusive predicates (I think that Randolf described this as a potential problem). Randolf’s comment about the timing in the portion of the execution plan agrees with a comment that I thought about making, but his explanation is better than the one I considered offering.
Good point about questioning what would happen with the 64,806 rows – will 64,800 of those rows be thrown away, or will all of the rows end up on a report of some sort?
The Oracle 11.2.0.x documentation describes the “r=” entry in the Row Source Operation section of the TKPROF output. Quoting from the documentation:
Rows Row Source Operation ------- --------------------------------------------------- 0 DELETE (cr=43141 r=266947 w=25854 time=60235565 us) 28144 HASH JOIN ANTI (cr=43057 r=262332 w=25854 time=48830056 us) 51427 TABLE ACCESS FULL STATS$SQLTEXT (cr=3465 r=3463 w=0 time=865083 us) 647529 INDEX FAST FULL SCAN STATS$SQL_SUMMARY_PK (cr=39592 r=39325 w=0 time=10522877 us) (object id 7409)“In this sample TKPROF output, note the following under the Row Source Operation column:
•cr specifies consistent reads performed by the row source
•r specifies physical reads performed by the row source
•w specifies physical writes performed by the row source
•time specifies time in microseconds”
The original poster in the Usenet thread is running Oracle Database 9.2.0.6, so the “r=” entry on the STAT= lines in the trace file (the “Row Source Operation” lines in the TKPROF output) indicate the number of blocks that were physically read from disk. With your help, I think that we just found a documentation bug – something that was not updated in the documentation to reflect changes in Oracle’s tracing behavior. I am guessing that the STAT line trace file content changed starting with Oracle 10.1.0.x, but the documentation was never updated to reflect that change.
This is what the Row Source Operation section looks like in Oracle 10.2.0.x:
Rows Row Source Operation ------- --------------------------------------------------- 1 NESTED LOOPS (cr=7 pr=0 pw=0 time=160 us) 2 TABLE ACCESS BY INDEX ROWID CURRENCY (cr=2 pr=0 pw=0 time=74 us) 2 INDEX FULL SCAN SYS_C0018003 (cr=1 pr=0 pw=0 time=42 us)(object id 28289) 1 TABLE ACCESS BY INDEX ROWID CUST_ORDER_CURR (cr=5 pr=0 pw=0 time=63 us) 1 INDEX UNIQUE SCAN SYS_C0018028 (cr=4 pr=0 pw=0 time=37 us)(object id 28305)Centinul answered your second question about how I calculated the average time for a block read. Interestingly, Randolf came up with entirely different average read times by using the “Row Source Operation” section from the TKPROF output. He calculated that the average read time for an index block was 13 ms, while the average read time for a table block was between 3 and 5 ms. Which calculation is more accurate, I do not know.
Charles,
One question. How did you come to the conclusion the average time to read a block from disk was 0.0024412 seconds (2.4412ms) ?
Narendra –
I believe the computation is as follows:
(total elapsed time – time spent on cpu) / # blocks read from disk
(580.31 – 14.56) / 231750 = 0.0024412 seconds / block
Charles,
I’ve replied to the original thread on Usenet group – there are a couple of noteworthy oddities with this TKPROF output that might explain the issue with the missing 20 minutes.
@Narendra: The PFMQ_MESSAGESTATUS is actually queried for every iteration, since it is a child operation of the inner table of the NESTED LOOP operation. But as you can see from the TKPROF output it does not seem to be relevant in this particular case – it only takes 2 seconds to execute this subquery 64,000 times, so there are other areas to tackle first and no need to replace this subquery with the present execution plan.
See my comments on the original thread for more information.
Randolf
Randolf,
Thanks for stopping by my blog and sharing your great troubleshooting skills. I did not feel comfortable/confident with using the time= values in the Row Source Operation section of the TKPROF output for analysis purposes because of the apparent inconsistencies between the STAT lines and the aggregation of the PARSE, EXEC, and FETCH lines, but that does not mean that the time= values should not be used.
I seem to recall Jonathan Lewis telling me in the first half of 2008 on the OTN forums that the time= values in the STAT lines could be inaccurate. I do not recall if he stated that the TIME= values could not be trusted because not all of the code paths are instrumented, there are rounding errors when a code path is repeatedly executed in a loop (a NESTED LOOP join), or possibly another reason such as time occasionally being allocated to the wrong operation in the plan. Of course, I have been unable to find that thread on OTN, so I might just be remembering his statement incorrectly.
It is an interesting problem, so I hope that the OP returns to the Usenet thread.
Centinul,
Thanks. I was wondering if that was somehow based on elapsed time figures of individual plan steps.
[...] 11-How to interpret tkprof and execution plan ? Charles Hooper-Query Performance Problem, or is Something Else Happening? [...]