Query Performance Problem, or is Something Else Happening?

3 03 2010

March 3, 2010

Let’s say that you encouter a query that is taking an unexpectedly long time to run, possibly 30 minutes.  Maybe the query looks something like this:

SELECT   mds.messagepartdata, ms.status, mi.ID, mi.messageguid, mi.channel,
         ROWNUM AS messagecount
    FROM pfmq_messagedata md,
         pfmq_messagedatastorage mds,
         pfmq_messageinfo mi,
         pfmq_messagestatus ms
   WHERE (    mi.queuename = 'CL312911032'
          AND mi.ID = ms.ID
          AND mi.ID = md.ID
          AND mi.ID = mds.ID
          AND md.ID = mds.ID
          AND md.messageparttype = mds.messageparttype
          AND md.messageparttype = 1
          AND (ms.statusrevisionnumber = (SELECT MAX (statusrevisionnumber)
                                            FROM pfmq_messagestatus ms2
                                           WHERE ms2.ID = ms.ID)
     AND ((ms.status = 64) AND (mi.direction = 1) AND mi.messagetype = 0)
ORDER BY mi.sequenceordinalnumber, mi.senttime

In the above, the table PFMQ_MESSAGEDATASTORAGE contains a long raw column – the MESSAGEPARTDATA column.  You enable a 10046 trace, but only at level 1 – so there are no wait events written to the trace file.  The resulting trace file is then processed using TKPROF, and you obtain the following output:

call     count       cpu    elapsed       disk      query    current  rows
------- ------  -------- ---------- ---------- ---------- ---------- -----
Parse        1      0.00       0.00          0          0          0     0
Execute      1      0.00       0.00          0          0          0     0
Fetch     4321     14.56     580.31     231750     746064          0 64806
------- ------  -------- ---------- ---------- ---------- ---------- -----
total     4323     14.56     580.31     231750     746064          0 64806

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 29

Rows     Row Source Operation
-------  ---------------------------------------------------
  64806  SORT ORDER BY (cr=681258 r=184767 w=0 time=403515790 us)
  64806   COUNT  (cr=681258 r=184767 w=0 time=1729762996 us)
  64806    NESTED LOOPS  (cr=681258 r=184767 w=0 time=1729717540 us)
  64806     NESTED LOOPS  (cr=486600 r=92648 w=0 time=901417748 us)
  64806      NESTED LOOPS  (cr=356748 r=46572 w=0 time=268980743 us)
  64820       TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGEINFO (cr=31514 r=23422 w=0 time=44437657 us)
 120784        INDEX RANGE SCAN AK1_PFMQ_SEQUENCENUMBER (cr=3117 r=3062 w=0 time=10896605 us)(object id 6511)
  64806       TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGESTATUS (cr=325234 r=23150 w=0 time=224278563 us)
  64820        INDEX RANGE SCAN XPKPF_MESSAGESTATUS (cr=260414 r=15792 w=0 time=208616639 us)(object id 6515)
  64820         SORT AGGREGATE (cr=129644 r=116 w=0 time=1973822 us)
  64820          FIRST ROW  (cr=129644 r=116 w=0 time=1810738 us)
  64820           INDEX RANGE SCAN (MIN/MAX) XPKPF_MESSAGESTATUS (cr=129644 r=116 w=0 time=1756030 us)(object id 6515)
  64806      INDEX UNIQUE SCAN XPKPF_MESSAGEDATA (cr=129852 r=46076 w=0 time=632244506 us)(object id 6505)
  64806     TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGEDATASTORAGE (cr=194658 r=92119 w=0 time=828055493 us)
  64806      INDEX UNIQUE SCAN XPKPF_MESSAGEDATASTORAGE (cr=129852 r=46036 w=0 time=613528422 us)(object id 6507)

How would you troubleshoot this performance problem?  What are the problems?  What looks good about the above output?  What about the above makes you wonder if some detail is missing?  You will find the above SQL statement in this comp.databases.oracle.server Usenet thread.

What can be said about the above output?

  • The client application is retrieving roughly 15 rows in each fetch request: 64,806/4,321 = 14.998 rows per fetch.  Maybe setting the fetch array size to a larger value would help?
  • The fetch calls required 14.56 seconds of the server’s CPU time, while the elapsed time for the fetch was 508.31 580.31 seconds.  565.75 seconds were probably spent doing something other than actively burning CPU cycles – such as waiting for the completion of a disk read.
  • Most of the indexes accessed were high precision indexes, meaning that few of the rowids returned by the index where eliminated at the table level, with the exception of the AK1_PFMQ_SEQUENCENUMBER index, where 46% of the rows identified by the index were discarded.  Those rows were discarded very early in the plan, so the performance impact was likely minimal.
  • There was heavy use of nested loop joins – that might be OK, but might not be as efficient as a hash join if a large percentage of the tables were being accessed or if the clustering factor were high.
  • There were 231,750 blocks read from disk, and considering the heavy use of indexes and nested loop joins, those blocks were likely read one at a time from disk.  If that was the case, the average time to read a block from disk was 0.0024412 seconds (2.4412ms), which is considered to be very fast access times for physical reads from disk.
  • Considering the WHERE clause, the execution plan starts with the two tables with the greatest number of predicates – so that is probably a smart starting point.
  • The elapsed time reported on the last line of the execution plan is greater than the elapsed time reported on the first line of the execution plan – just an odd inconsistency that the time reported on the last line was 613.5 seconds, yet the elapsed time reported by TKPROF for the entire execution was only 580.31 seconds?  The second line of the execution plan shows 1,729.7 seconds (28.8 minutes) on the time= entry, which again is inconsistent with TKPROF’s elapsed time statistic.
  • 580.31 seconds is roughly 9.7 minutes – what happened to the other 20.3 minutes in the 30 minute total execution time that was reported by the original poster?

What might be the next couple of steps for troubleshooting this performance problem?

  • Generate the 10046 trace at level 8, rather than level 1.  That will cause the wait events to be written to the trace file.  If the trace file contains a large number of long duration waits on SQL*Net type wait events, check the network with a packet sniffer (Wireshark, for example) and check the client-side activity to make certain that it is not the client application that is the source of the slow performance.  If you see large unexplained gaps in the tim= values in the trace file without a corresponding long wait event in between, investigate whether the server’s CPUs are over-burdened.
  • Check the statistics on the tables and indexes to make certain that those statistics are reasonably accurate and up to date.
  • Review the current optimizer related initialization parameters to make certain that silly parameters are not specified.
  • Most of all, determine where the missing 20.3 minutes have gone.



8 responses

3 03 2010


Nice analysis. You covered almost everything that I could think of and then asked what else. That is not fair. 🙂 Here is what I can think of:
1) To confirm the 3rd point (about row elimination) and 6th point in your analysis (about execution plan starting point), I would check what is the EXPLAIN PLAN outcome. Assuming this is on database version that gives predicate information in plan, the EXPLAIN PLAN will a) prove if expected plan is same as actual plan and b) tell clearly at what stages filter predicates are applied. Also, the ROWS column in EXPLAIN PLAN output shows expected rowcounts (per iteration) for a NESTED LOOP join which can be vital info as ROW SOURCE OPERATION reports row counts for all iterations combined (although in this case, it may not provide any new details)
2) I was actually a bit surprised to see following steps in ROW SOURCE

  64806       TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGESTATUS (cr=325234 r=23150 w=0 time=224278563 us)
  64820        INDEX RANGE SCAN XPKPF_MESSAGESTATUS (cr=260414 r=15792 w=0 time=208616639 us)(object id 6515)
  64820         SORT AGGREGATE (cr=129644 r=116 w=0 time=1973822 us)
  64820          FIRST ROW  (cr=129644 r=116 w=0 time=1810738 us)
  64820           INDEX RANGE SCAN (MIN/MAX) XPKPF_MESSAGESTATUS (cr=129644 r=116 w=0 time=1756030 us)(object id 6515)

With the way query is written, I was expecting PFMQ_MESSAGESTATUS to be accessed more than once (for each iteraion). Is this some kind of optimizer improvement? Or is it because XPKPF_MESSAGESTATUS index includes both “statusrevisionnumber” and “ID” columns? Will there be any benifit in rewriting this query (using analytical function) in order to remove the correlated subquery ?
3) Finally, my most important question will be- What is done (by the “application”) with 64806 ordered set of rows that this query produce? In other words, is it absolutely essential to have this query being executed on its own?
I am not saying this is necessarily wrong but I would certainly validate the approach/design.

p.s. BTW, what does the “r=” component in ROW SOURCE OPERATION indicate? Physical Reads? I always seem to get “pr=” in 10.2.0.x versions and no “r=” in TkProf reports.

3 03 2010
Charles Hooper


It is good that you are reading these blog articles somewhat as personal challenges – that is how many of my articles are written. I do not always tell everything that I see to be wrong or right – partially due to time constraints, and partially to encourage other people to offer comments that provide much more complete (and accurate) answers than I could have offered. Fortunately, there are a number of very smart people who are willing to freely share their knowledge.

Good idea to check the estimated number of rows per operation in an execution plan – it could be that statistics have not been gathered in the last 8 years which will cause Oracle to greatly under-estimate the cardinality for the operations, or it could be a problem with non-mutually exclusive predicates (I think that Randolf described this as a potential problem). Randolf’s comment about the timing in the portion of the execution plan agrees with a comment that I thought about making, but his explanation is better than the one I considered offering.

Good point about questioning what would happen with the 64,806 rows – will 64,800 of those rows be thrown away, or will all of the rows end up on a report of some sort?

The Oracle 11.2.0.x documentation describes the “r=” entry in the Row Source Operation section of the TKPROF output. Quoting from the documentation:

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  DELETE  (cr=43141 r=266947 w=25854 time=60235565 us)
  28144   HASH JOIN ANTI (cr=43057 r=262332 w=25854 time=48830056 us)
  51427    TABLE ACCESS FULL STATS$SQLTEXT (cr=3465 r=3463 w=0 time=865083 us)
                      (cr=39592 r=39325 w=0 time=10522877 us) (object id 7409)

“In this sample TKPROF output, note the following under the Row Source Operation column:

•cr specifies consistent reads performed by the row source
•r specifies physical reads performed by the row source
•w specifies physical writes performed by the row source
•time specifies time in microseconds”

The original poster in the Usenet thread is running Oracle Database, so the “r=” entry on the STAT= lines in the trace file (the “Row Source Operation” lines in the TKPROF output) indicate the number of blocks that were physically read from disk. With your help, I think that we just found a documentation bug – something that was not updated in the documentation to reflect changes in Oracle’s tracing behavior. I am guessing that the STAT line trace file content changed starting with Oracle 10.1.0.x, but the documentation was never updated to reflect that change.

This is what the Row Source Operation section looks like in Oracle 10.2.0.x:

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  NESTED LOOPS  (cr=7 pr=0 pw=0 time=160 us)
      2   TABLE ACCESS BY INDEX ROWID CURRENCY (cr=2 pr=0 pw=0 time=74 us)
      2    INDEX FULL SCAN SYS_C0018003 (cr=1 pr=0 pw=0 time=42 us)(object id 28289)
      1   TABLE ACCESS BY INDEX ROWID CUST_ORDER_CURR (cr=5 pr=0 pw=0 time=63 us)
      1    INDEX UNIQUE SCAN SYS_C0018028 (cr=4 pr=0 pw=0 time=37 us)(object id 28305)

Centinul answered your second question about how I calculated the average time for a block read. Interestingly, Randolf came up with entirely different average read times by using the “Row Source Operation” section from the TKPROF output. He calculated that the average read time for an index block was 13 ms, while the average read time for a table block was between 3 and 5 ms. Which calculation is more accurate, I do not know.

3 03 2010


One question. How did you come to the conclusion the average time to read a block from disk was 0.0024412 seconds (2.4412ms) ?

3 03 2010

Narendra —

I believe the computation is as follows:

(total elapsed time – time spent on cpu) / # blocks read from disk

(580.31 – 14.56) / 231750 = 0.0024412 seconds / block

3 03 2010
Randolf Geist


I’ve replied to the original thread on Usenet group – there are a couple of noteworthy oddities with this TKPROF output that might explain the issue with the missing 20 minutes.

@Narendra: The PFMQ_MESSAGESTATUS is actually queried for every iteration, since it is a child operation of the inner table of the NESTED LOOP operation. But as you can see from the TKPROF output it does not seem to be relevant in this particular case – it only takes 2 seconds to execute this subquery 64,000 times, so there are other areas to tackle first and no need to replace this subquery with the present execution plan.

See my comments on the original thread for more information.


3 03 2010
Charles Hooper


Thanks for stopping by my blog and sharing your great troubleshooting skills. I did not feel comfortable/confident with using the time= values in the Row Source Operation section of the TKPROF output for analysis purposes because of the apparent inconsistencies between the STAT lines and the aggregation of the PARSE, EXEC, and FETCH lines, but that does not mean that the time= values should not be used.

I seem to recall Jonathan Lewis telling me in the first half of 2008 on the OTN forums that the time= values in the STAT lines could be inaccurate. I do not recall if he stated that the TIME= values could not be trusted because not all of the code paths are instrumented, there are rounding errors when a code path is repeatedly executed in a loop (a NESTED LOOP join), or possibly another reason such as time occasionally being allocated to the wrong operation in the plan. Of course, I have been unable to find that thread on OTN, so I might just be remembering his statement incorrectly.

It is an interesting problem, so I hope that the OP returns to the Usenet thread.

3 03 2010


Thanks. I was wondering if that was somehow based on elapsed time figures of individual plan steps.

25 03 2010
Blogroll Report 26/02/2010 – 05/03/2010 « Coskan’s Approach to Oracle

[…] 11-How to interpret tkprof and execution plan ? Charles Hooper-Query Performance Problem, or is Something Else Happening? […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: