Simple Query Generates Complex Execution Plan, the Mysterious 4063.88 Second Single Block Read Wait


January 22, 2010

There was an interesting thread on Oracle’s OTN forums in April 2008:
http://forums.oracle.com/forums/thread.jspa?threadID=642641

In the thread, the original poster wrote:

I have 4 tables. I wrote one query as follows. But it’s taking 45 mins for executing. Can you please write a query for this to improve performance. The tables are as follows:

  1. Case
  2. Item 
  3. Master 
  4. Warehouse

The (table) columns are:

Case: Item, Supply_indicator, Country_Indicator, item_size
Item: Item, location, location_type, ondate, offdate, status, create_date
Master: item, status,  pack_indicator, Item_level(either 1 or 0), Trns_level(either 1 or 0), create_date, Foreind (either Y or N)
Warehouse: Warehose_no, Name, Address

The OP posted the SQL statement and a form of an execution plan:

Comparing the query with the explain plan:

SELECT
  IM.LOCATION,
  IM.ITEM,
  CS.CASE_SIZE,
  IM.ONDATE,
  IM.OFFDATE,
  MAS.STATUS
FROM
  CASE_SIZE CS,
  ITEM IM,
  MASTER MAS,
  WAREHOUSE WH
WHERE
  MAS.PACK_IND = 'N'
  AND MAS.ITEM_LEVEL = MAS.TRNS_LEVEL
  AND MAS.STATUS = 'A'
  AND MAS.FOREIND = 'Y'
  AND MAS.ITEM = IM.ITEM
  AND IM.LOCATION_TYPE = 'S'
  AND MAS.ITEM = CS.ITEM
  AND CS.SUPPLY_INDICATOR = 'Y'
  AND CS.COUNTRY_INDICATOR = 'Y'
  AND IM.LOCATION =WH.WAREHOSE_NO
  AND NVL(WH.CLOSE_DATE,'04-APR-9999')>=TO_DATE(&VERSDATE}, 'YYYYMMDD')

Execution Plan
--------------------------------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=15085 Card=1139512 Bytes=134462416)
   1    0   HASH JOIN (Cost=15085 Card=1139512 Bytes=134462416)
   2    1     HASH JOIN (Cost=580 Card=30772 Bytes=1569372)
   3    2       TABLE ACCESS (FULL) OF 'MASTER' (Cost=63 Card=1306 Bytes=18284)
   4    2       TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'ITEM' (Cost=2 Card=105311 Bytes=3369952)
   5    4         NESTED LOOPS (Cost=86 Card=4296685 Bytes=158977345)
   6    5           TABLE ACCESS (FULL) OF 'WAREHOUSE' (Cost=4 Card=41 Bytes=205)
   7    5           INDEX (RANGE SCAN) OF 'PK_LOCATION' (UNIQUE) (Cost=1 Card=210622)
   8    1     VIEW (Cost=14271 Card=48399 Bytes=3242733)
   9    8       SORT (UNIQUE) (Cost=14271 Card=48399 Bytes=6098274)
  10    9         HASH JOIN (Cost=992 Card=187614 Bytes=23639364)
  11   10           HASH JOIN (Cost=186 Card=7449 Bytes=581022)
  12   11             TABLE ACCESS (FULL) OF 'MASTER' (Cost=63 Card=10451 Bytes=156765)
  13   11             HASH JOIN (Cost=105 Card=12489 Bytes=786807)
  14   13               MERGE JOIN (CARTESIAN) (Cost=40 Card=12489 Bytes=549516)
  15   14                 MERGE JOIN (CARTESIAN) (Cost=6 Card=1 Bytes=13)
  16   15                   TABLE ACCESS (FULL) OF 'SYSTEM' (Cost=3 Card=1 Bytes=3)
  17   15                   BUFFER (SORT) (Cost=3 Card=1 Bytes=10)
  18   17                     TABLE ACCESS (FULL) OF 'SYSTEM' (Cost=3 Card=1 Bytes=10)
  19   14                 BUFFER (SORT) (Cost=37 Card=12489 Bytes=387159)
  20   19                   TABLE ACCESS (FULL) OF 'ITEM_SUPPLIER_COUNTRY' (Cost=34 Card=12489 Bytes=387159)
  21   13               TABLE ACCESS (FULL) OF 'SUPPLIER' (Cost=28 Card=24989 Bytes=474791)
  22   10           VIEW (Cost=536 Card=172449 Bytes=8277552)
  23   22             UNION-ALL
  24   23               INDEX (FAST FULL SCAN) OF 'PK_ITEM_SUPPLIER_COUNTRY' (UNIQUE) (Cost=11 Card=24978 Bytes=324714)
  25   23               TABLE ACCESS (FULL) OF 'ITEM_SUPPLIER_COUNTRY' (Cost=34 Card=24978 Bytes=399648)
  26   23               TABLE ACCESS (FULL) OF 'ITEM_SUPPLIER_COUNTRY' (Cost=34 Card=24978 Bytes=374670)
  27   23               TABLE ACCESS (FULL) OF 'ITEM_SUPPLIER_COUNTRY' (Cost=34 Card=24978 Bytes=499560)
  28   23               VIEW (Cost=141 Card=24179 Bytes=1039697)
  29   28                 SORT (UNIQUE) (Cost=141 Card=24179 Bytes=507759)
  30   29                   INDEX (FAST FULL SCAN) OF 'PK_CASE_UPDATES' (UNIQUE) (Cost=14 Card=24179 Bytes=507759)
  31   23               VIEW (Cost=141 Card=24179 Bytes=1039697)
  32   31                 SORT (UNIQUE) (Cost=141 Card=24179 Bytes=507759)
  33   32                   INDEX (FAST FULL SCAN) OF 'PK_CASE_UPDATES' (UNIQUE) (Cost=14 Card=24179 Bytes=507759)
  34   23               VIEW (Cost=141 Card=24179 Bytes=1039697)
  35   34                 SORT (UNIQUE) (Cost=141 Card=24179 Bytes=507759)
  36   35                   INDEX (FAST FULL SCAN) OF 'PK_CASE_UPDATES' (UNIQUE) (Cost=14 Card=24179 Bytes=507759)

So, how could a query of 4 tables produce the above execution plan, and is there anything wrong with the plan?

The OP also generated a 10046 trace and processed that trace file with TKPROF.  The TKPROF output for this SQL statement is as follows:

(select statement)
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.32       0.29          0        593          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch   819704    874.76    1529.15    3879188   12619996          5    12295541
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   819706    875.08    1529.45    3879188   12620589          5    12295541

Misses in library cache during parse: 1
Optimizer goal: ALL_ROWS
Parsing user id: 315  (DEVEL)

 Rows     Row Source Operation
 -------  ---------------------------------------------------
12295541  HASH JOIN  (cr=12619996 r=3879188 w=2408 time=1503903336 us)
  212315   VIEW  (cr=7553336 r=2408 w=2408 time=82297538 us)
(as per my previous post)

Rows     Execution Plan
-------  ---------------------------------------------------
(as per  my previous post)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                  819704        0.00          0.45
  direct path write                               4        0.00          0.00
  direct path read                               51        0.00          0.07
  db file sequential read                     94884        0.41         60.91
  db file scattered read                     222696     2302.70       4675.10
  SQL*Net message from client                819704     3672.87       6299.98
  latch free                                      2        0.00          0.00

How could a query of 4 tables cause TKPROF to produce the above output, and is there anything wrong with the information provided?  To get you started, what would cause a 2,302.70 second wait for a single multi-block read?
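
Before digging into the raw trace file, it is worth doing a little arithmetic on the TKPROF figures above.  The sketch below simply restates numbers from the TKPROF output; the only assumption is that the elapsed and wait times are reported in seconds, as TKPROF documents:

# Back-of-the-envelope checks on the TKPROF summary shown above
fetch_calls   = 819_704
rows_fetched  = 12_295_541
elapsed_s     = 1529.45      # total elapsed time reported for the statement
scattered_s   = 4675.10      # total waited on db file scattered read
scattered_max = 2302.70      # maximum single db file scattered read wait

print(rows_fetched / fetch_calls)   # roughly 15 rows per fetch (consistent with a default SQL*Plus arraysize of 15)
print(scattered_s - elapsed_s)      # about 3145; the waits on this one event alone exceed the total elapsed time
print(scattered_max * 1_000_000)    # the maximum wait expressed in microseconds: about 2.3 billion

Roughly 15 rows per fetch across 819,704 fetch calls, and a single wait event whose total exceeds the statement's entire elapsed time; those two observations alone hint that the timing data deserves a closer look.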

The OP also posted the following portion of a 10046 trace:

WAIT #3: nam='SQL*Net message to client' ela= 0 p1=1650815232 p2=1 p3=0
FETCH #3:c=0,e=131,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=4,tim=1180445875641857
WAIT #3: nam='SQL*Net message from client' ela= 704 p1=1650815232 p2=1 p3=0
WAIT #3: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
FETCH #3:c=0,e=140,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=4,tim=1180445875642773
WAIT #3: nam='SQL*Net message from client' ela= 668 p1=1650815232 p2=1 p3=0
WAIT #3: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
FETCH #3:c=0,e=140,p=0,cr=2,cu=0,mis=0,r=15,dep=0,og=4,tim=1180445875643631
WAIT #3: nam='SQL*Net message from client' ela= 4179109066 p1=1650815232 p2=1 p3=0
WAIT #3: nam='SQL*Net message to client' ela= 0 p1=1650815232 p2=1 p3=0
FETCH #3:c=10000,e=144,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=4,tim=1180445875643757
WAIT #3: nam='SQL*Net message from client' ela= 660 p1=1650815232 p2=1 p3=0
WAIT #3: nam='SQL*Net message to client' ela= 0 p1=1650815232 p2=1 p3=0
FETCH #3:c=0,e=128,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=4,tim=1180445875644594

Anything wrong in the above?

What if we try really hard to analyze the above trace file snippet, is anything wrong?  I will cheat and use my Toy Project for Performance Tuning to generate the following output.

Fetch at 1180445875.641860 (Parse to Fetch 1180445875.641860),CPU Time 0.000000,Elapsed Time 0.000131,Rows Retrievd 15,Blks from Buff 1,Blks from Disk 0
     0.000704   SQL*Net message from client
     0.000001   SQL*Net message to client
Fetch at 1180445875.642770 (Parse to Fetch 1180445875.642770),CPU Time 0.000000,Elapsed Time 0.000140,Rows Retrievd 15,Blks from Buff 1,Blks from Disk 0
     0.000668   SQL*Net message from client
     0.000001   SQL*Net message to client
Fetch at 1180445875.643630 (Parse to Fetch 1180445875.643630),CPU Time 0.000000,Elapsed Time 0.000140,Rows Retrievd 15,Blks from Buff 2,Blks from Disk 0
  4179.109066   SQL*Net message from client
     0.000000   SQL*Net message to client
Fetch at 1180445875.643760 (Parse to Fetch 1180445875.643760),CPU Time 0.010000,Elapsed Time 0.000144,Rows Retrievd 15,Blks from Buff 1,Blks from Disk 0
     0.000660   SQL*Net message from client
     0.000000   SQL*Net message to client
Fetch at 1180445875.644590 (Parse to Fetch 1180445875.644590),CPU Time 0.000000,Elapsed Time 0.000128,Rows Retrievd 15,Blks from Buff 1,Blks from Disk 0

Don’t worry too much if the above does not make complete sense.  At 1180445875.641860 seconds there was a fetch call that required 0 CPU seconds and 0.000131 elapsed seconds; 15 rows were retrieved, there was one consistent get, and there were 0 physical reads.  The next fetch happened 0.00091 seconds later (5.642770 – 5.641860), with two short duration wait events between the two fetch calls (0.000704 seconds on SQL*Net message from client and 0.000001 seconds on SQL*Net message to client).  Now take a closer look at the waits between the fetch call at 1180445875.643630 seconds and the fetch call at 1180445875.643760 seconds (0.00013 seconds later).
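
For readers who would like to repeat the exercise without the Toy Project, the converted output above can be approximated with a short script.  The sketch below is only a rough stand-in, not the Toy Project itself; it assumes that the c=, e=, ela= and tim= values in the trace are in microseconds (which is what the converted output implies), and the trace file name is made up:

import re
# Rough sketch of a 10046 trace reader (not the Toy Project); assumes microsecond
# c=, e=, ela= and tim= values, and a hypothetical trace file name.
last_fetch_tim = None
with open("ora_10046_example.trc") as trace:
    for line in trace:
        wait_m = re.match(r"WAIT #\d+: nam='([^']+)' ela= (\d+)", line)
        if wait_m:
            # print each wait, converted from microseconds to seconds, under its fetch call
            print(f"    {int(wait_m.group(2)) / 1_000_000:14.6f}   {wait_m.group(1)}")
            continue
        fetch_m = re.match(r"FETCH #\d+:c=(\d+),e=(\d+),p=(\d+),cr=(\d+),.*?,r=(\d+),.*?tim=(\d+)", line)
        if fetch_m:
            cpu, ela, disk, cr, rows, tim = (int(x) for x in fetch_m.groups())
            gap = "" if last_fetch_tim is None else f", {(tim - last_fetch_tim) / 1_000_000:.6f} s after the previous fetch"
            print(f"Fetch at {tim / 1_000_000:.6f}{gap}, CPU Time {cpu / 1_000_000:.6f}, "
                  f"Elapsed Time {ela / 1_000_000:.6f}, Rows Retrieved {rows}, "
                  f"Blks from Buff {cr}, Blks from Disk {disk}")
            last_fetch_tim = tim

Run against the snippet above, a reader like this makes the 4179.109066 second SQL*Net message from client wait stand out immediately, sandwiched between two fetch calls whose tim= values are only 0.00013 seconds apart.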

The OP also posted the following odd snippet from a 10046 trace:

WAIT #3: nam='db file sequential read' ela= 260 p1=354 p2=128875 p3=1
WAIT #3: nam='db file sequential read' ela= 180 p1=354 p2=128878 p3=1
WAIT #3: nam='db file sequential read' ela= 179 p1=354 p2=128877 p3=1
WAIT #3: nam='db file sequential read' ela= 178 p1=354 p2=128880 p3=1
WAIT #3: nam='db file sequential read' ela= 192 p1=354 p2=128879 p3=1
WAIT #3: nam='db file sequential read' ela= 197 p1=354 p2=128882 p3=1
WAIT #3: nam='db file sequential read' ela= 192 p1=354 p2=128881 p3=1
WAIT #3: nam='db file sequential read' ela= 4063882583 p1=354 p2=128884 p3=1
WAIT #3: nam='db file sequential read' ela= 194 p1=354 p2=128883 p3=1
WAIT #3: nam='db file sequential read' ela= 180 p1=354 p2=128885 p3=1
WAIT #3: nam='db file sequential read' ela= 192 p1=354 p2=128887 p3=1
WAIT #3: nam='db file sequential read' ela= 176 p1=354 p2=128886 p3=1
WAIT #3: nam='db file sequential read' ela= 179 p1=354 p2=128889 p3=1
WAIT #3: nam='db file sequential read' ela= 186 p1=354 p2=128888 p3=1

So, what is odd about the above?  Is there an explanation?
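
Spotting this sort of entry in a large trace file is easier with a quick filter.  The sketch below is just one way to do it; the 0.5 second threshold is arbitrary and the file name is, again, made up:

import re
# Flag any wait line whose ela= value (in microseconds) exceeds an arbitrary threshold.
THRESHOLD_US = 500_000          # 0.5 seconds
with open("ora_10046_example.trc") as trace:
    for line_no, line in enumerate(trace, start=1):
        m = re.match(r"WAIT #\d+: nam='([^']+)' ela= (\d+) (.*)", line)
        if m and int(m.group(2)) > THRESHOLD_US:
            # 4063882583 microseconds would be reported here as roughly 4,063.88 seconds
            print(f"line {line_no}: {m.group(1)} waited {int(m.group(2)) / 1_000_000:,.2f} s ({m.group(3).strip()})")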

The thread contains other interesting details, but it does not look like the OP ever received an explanation for why his query required 45 minutes to execute.  My money is still on a faulty RAID 10 floppy array.


6 responses

22 01 2010
Narendra

The only reasons that are preventing me from saying “This is PLAIN WRONG” are
1) It is ORACLE after all ; no shortage of strange bugs
2) Your last statement (the concepts of which are Latin to me 🙂 )

I have not read the forum post but I find it almost impossible to believe that the execution plan is of the query which is based on 4 “tables”. I do not think all of them are tables. At least, MASTER and CASE (or CASE_SIZE) look like (non-mergeable) views or materialized views. How else do you justify the presence of SYSTEM (steps 16 to 18) and ITEM_SUPPLIER_COUNTRY (steps 20, 24 to 27)?
Also, the plan from step 22 onwards looks impossible with plain tables.

22 01 2010
Charles Hooper

When you read the thread, you will find that the claim of there being 4 tables named Case, Item, Master, and Warehouse was quickly determined to be incorrect – for the very reasons that you identified in your comment.

The RAID 10 floppy array was a joke that I wrote in the thread as an explanation for the nearly 68 minute wait for the single block read (DBA 1 to DBA2: “quick, find and mount floppy #1765799 so that the query will finish executing”).

Considering the nearly 68 minute wait for the single block read written to the trace file, how could the OP in the thread explain the 45 minute query execution time? Maybe the server was stuck in a time warp? Maybe an accuracy warp? Maybe someone was constantly re-syncing the clock on the server?

23 01 2010
Narendra

Charles,

After posting my comment, I read the thread. Apart from the technical details, I must say how much I appreciate you and Jonathan for your persistence. If what the OP mentioned in the thread is true, I believe his (or her) organization was sitting on a time bomb. 🙂 First, the OP hardly knew anything about Oracle and he (or she) claimed they do not have a DBA. Now coming to the problem, I would say the biggest root cause is poor design and/or poor capacity planning. The query is retrieving over 12 million records, 15 records at a time. For what? Also, looking at the way the OP was providing trace details, it appears he (or she) was executing some kind of procedure (from SQL*Plus) that involved many DML statements before executing this query. It might be possible that the OP was using some kind of logic to populate/update table data before querying it, which may result in stale statistics. Now, you did mention that the OP might be using what appears to be some standard software, but the OP did not confirm the same. And as Hemant suggested (I guess), if the underlying tables were being updated simultaneously, I must say somebody has messed up the project big time.
I think this thread is a perfect example of how NOT to ask questions and how one can spend endless time, in vain, resolving performance issues due to lack of vital information.

24 01 2010
Gary

“the nearly 68 minute wait for the single block read “.
Haven’t gone through the thread in exact detail, but it appears the initial report of 45 minutes was the first ‘problem’ run and the trace was on a subsequent execution, which could have taken much longer.
Reading the thread, it appears the slow read was on a block in an UNDO tablespace.

Didn’t see any exact description of hardware, but I wonder if there is anything that would cause a process to be completely de-prioritised/suspended by the OS.

24 01 2010
Charles Hooper

Narendra, I think that your final sentence summarizes the entire thread. The OP begins the thread with inaccurate information, and that causes confusion and inconsistencies that are brought up later in the thread. But, in support of the OP, at times it is very difficult to determine what information may be relevant to solving a particular problem. This is the case regardless of whether that problem is “where did I leave my car keys” or “why is it taking a long time to retrieve 12 million rows from the database when I am fetching 15 rows at a time”. I suspect that in many cases prior experience helps a lot when trying to determine what information needs to be provided. Suppose that the DBA on site (or possibly a consultant in the case of the OP) stated quite simply “I have been a DBA for 25 years and have never needed to look at a raw 10046 trace file – just pass the file through TKPROF, or use explain plan.” In such a case, prior experience may make it very difficult to find any answers. Jonathan definitely demonstrated considerable skill in that thread, and I think that I managed to extract a little bit of information from the OP to find some of the performance issues.

Gary, you make a good point that enabling a 10046 trace could have had an impact on the execution time. The OP’s disk sub-system seemed to be heavily loaded, causing long waits for disk reads, and the writing of the 10046 trace file could have further complicated the disk performance problem. I had not thought about the OS suspending the process as a possible cause, but as I understand it, the session checks the server’s current time, and then makes a disk read request – at that point operating system kernel level code is responsible for fetching the block – once the fetch completes the process again checks the server’s current time to calculate the elapsed time of the disk read. In the same response where I made the joke about the RAID 10 floppy array, I stated the following:

“There is a possible explanation for the 4179109066us time for the SQL*Net message from client wait and the 4063882583us time for the db file sequential read. The clue is found on page 148 of “Optimizing Oracle Performance” in the pseudo-code for procedure wevent. What if the value of ela1 is less than ela0 due to a timer resolution problem with gettimeofday? If I recall correctly, I believe that it is possible that the operating system does not check the hardware clock on every time call (on some platforms), so it is theoretically possible that the operating system occasionally corrects its “software” clock with the hardware clock (on some Windows systems, this drift may not be corrected until a reboot is performed, or until the time is synchronized with the domain).

This is just a guess, but a little mathematics with powers of 2, assuming that the OP is using a 32 bit system [note that if the software clock time drift was significant, the long duration wait time reported may be caused by an overflow condition that was caused by a negative value in the unsigned integer value]:
4294967296us = 2^32us = 32 bit unsigned integer max value
4179109066us = OP’s long SQL*Net time
4063882583us = OP’s long single block read time

Browsing through Metalink finds a couple documents that may be of interest (yes these are old, but there may be others), that might identify other causes:
Bug No. 777326 USING POST WAIT THE CPU TIME STATISTICS ARE COMPLETELY WRONG
Bug No. 1704769 V$SESSION_WAIT.SECONDS_IN_WAIT SHOWS IMPOSSIBLY HIGH VALUES
Bug No. 2717586 UNREALISTIC HUGE VALUES IN V$SYSTEM_EVENT AND V$SESSION_EVENT

It is, however, odd that the tim= values were not thrown off in the same manner in that portion of the trace file.”

With the above in mind, the actual wait time may have been very short, but the server’s current time at the end of the wait may have been earlier than the server’s current time at the start of the wait if the server re-synchronized its software clock with the hardware clock (or some external time source) between the time the disk read started and the disk read completed.
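
To put a number on that theory: if the ending timestamp came out lower than the starting timestamp, and the difference were stored as an unsigned 32 bit microsecond value, the reported ela= would be 2^32 minus the size of the backwards jump.  The sketch below only illustrates that arithmetic; the wrap-around mechanism is an assumption, not something the trace file proves:

# Unsigned 32-bit wrap-around arithmetic for the two suspect ela= values.
# Assumption: ela = (end timestamp - start timestamp) truncated to 32 bits,
# with the software clock stepping backwards while the wait was in progress.
WRAP = 2**32                        # 4294967296 microseconds
for reported in (4_179_109_066,     # the long SQL*Net message from client wait
                 4_063_882_583):    # the long db file sequential read wait
    backwards_jump = WRAP - reported
    # a negligible real wait minus a backwards clock jump, reduced modulo 2^32,
    # reproduces the value written to the trace file
    assert (0 - backwards_jump) % WRAP == reported
    print(f"reported ela= {reported} us implies the clock stepped back about "
          f"{backwards_jump / 1_000_000:.2f} seconds")

Under that assumption the two values would correspond to backwards adjustments of roughly 115.86 and 231.08 seconds, which is at least consistent with a clock that was being re-synchronized while the session was waiting.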

29 01 2010
Log Buffer #176: a Carnival of the Vanities for DBAs | The Pythian Blog

[…] From Charles Hooper comes this investigation: Simple Query Generates Complex Execution Plan, the Mysterious 4063.88 Second Single Block Read Wait. […]
