June 27, 2011
I recently received an email asking why different performance is achieved when a FIRST_ROWS hint, FIRST_ROWS(100) hint, and an unhinted version of the query are executed. This seems to be a simple problem, yet it might also be an interesting problem. I thought that it might be helpful to transform my response into a blog article (allowing the input of readers of this blog), rather than provide an email response (sorry, I typically do not respond to email requests). What if (flushing the buffer cache between executions with a single execution of ALTER SYSTEM FLUSH BUFFER_CACHE;):
- The unhinted version of the query completes 2 seconds faster than the FIRST_ROWS(100) hinted version of the query?
- The FIRST_ROWS hinted version of the query completes 22 seconds faster than the unhinted version of the query (thus the FIRST_ROWS hinted version completes 24 seconds faster than then FIRST_ROWS(100) hinted version of the query).
- According to the AUTOTRACE output, the execution plans for the FIRST_ROWS hinted version and the FIRST_ROWS(100) hinted version of the query both perform a descending index range scan on the primary key column for the one table (the only data source) in the query.
- According to the AUTOTRACE output, the unhinted version of the query shows that a full table scan was performed.
- While the FIRST_ROWS(100) hinted execution required longer to complete than the FIRST_ROWS hinted execution, the FIRST_ROWS(100) hinted execution performed 69% as many physical block reads as the FIRST_ROWS hinted execution, while exactly the same number of consistent gets were performed in both cases.
- The queries retrieved 1,042,684 rows with an array fetch size of 15.
- For the FIRST_ROWS(100) and FIRST_ROWS hinted versions of the query, roughly 263.97MB was returned to the client, while only about 163.62MB (62% of the hinted version) was returned in the unhinted version of the query.
The question is how would one explain the results? Here is a slightly modified version of the query, with just the table and column names replaced:
SELECT C1, C2, C3, C4, C5, C6, C7 FROM T1 WHERE C1 < 1042685 ORDER BY C1 DESC;
In the above, note that C1 must be the primary key column… unless of course there are multiple columns in the primary key (or the index displayed in the execution plans really is not the primary key index).
The order of the query executions was not specified, but let’s assume that the OP listed the executions in the same order as executed.
The unhinted execution:
Elapsed: 00:00:53.00 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1042K| 255M| | 120K (1)| 00:24:03 | | 1 | SORT ORDER BY | | 1042K| 255M| 313M| 120K (1)| 00:24:03 | |* 2 | TABLE ACCESS FULL| T1 | 1042K| 255M| | 62490 (1)| 00:12:30 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("C1"<1042685) Statistics ---------------------------------------------------------- 1 recursive calls 1 db block gets 244502 consistent gets 244458 physical reads 2836 redo size 171565915 bytes sent via SQL*Net to client 765155 bytes received via SQL*Net from client 69514 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1042684 rows processed
The FIRST_ROWS hinted execution:
Elapsed: 00:00:31.11 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1042K| 255M| 189K (1)| 00:37:51 | | 1 | TABLE ACCESS BY INDEX ROWID | T1 | 1042K| 255M| 189K (1)| 00:37:51 | |* 2 | INDEX RANGE SCAN DESCENDING| PK_T1 | 1042K| | 2136 (1)| 00:00:26 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("C1"<1042685) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 345027 consistent gets 38750 physical reads 0 redo size 276788545 bytes sent via SQL*Net to client 765156 bytes received via SQL*Net from client 69514 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1042684 rows processed
The FIRST_ROWS(100) hinted execution:
Elapsed: 00:00:55.69 Execution Plan ---------------------------------------------------------- Plan hash value: 2367722674 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 101 | 25957 | 22 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | T1 | 1042K| 255M| 22 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN DESCENDING| PK_T1 | 101 | | 3 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("C1"<1042685) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 345027 consistent gets 26788 physical reads 0 redo size 276788545 bytes sent via SQL*Net to client 765156 bytes received via SQL*Net from client 69514 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1042684 rows processed
There are a number of items that we do not know about the OP’s configuration. Just a couple of random thoughts:
- How might we explain why only 62% as many bytes were transferred over the network in the unhinted version of the query? Could it be a case where SQL*Net compression has helped to strip the repeating column values from adjacent rows in the result set prior to transmission over the network. Could it be the case that the rows were ordered differently if column C1 were the only column in the primary key? What if there were multiple columns in the primary key – could that make a difference (note that the C1 < 1042685 predicate in the WHERE clause combined with the 1,042,684 rows returned suggests that column C1 might be the only column in the primary key)? Does SQL*Net compression not work when rows are retrieved by an index access path to avoid a sort operation? This portion of the result is a bit interesting.
- How might we explain the number 2836 next to redo size in the unhinted execution? Might this be a case of Delayed Block Cleanout? Could that number also partially explain the execution time for that version of the query?
- How might we explain the decrease in the number of physical block reads in the FIRST_ROWS(100) hinted execution compared to the FIRST_ROWS hinted execution? Assuming that the executions were performed in the above order, what if the ALTER SYSTEM FLUSH BUFFER_CACHE; command did not actually flush all of the blocks from the buffer cache – is that possible? Could undo blocks for a consistent read have an impact?
- The operations displayed in the execution plans for the two hinted queries are identical, yet the execution times are significantly different. How might we explain the time difference?
- — Do we know for certain that the runtime execution engine used the execution plan that was displayed?
- — Is it possible that one of the queries was executed with SET AUTOTRACE ON specified in SQL*Plus while the fast execution was performed with SET AUTOTRACE ON STATISTICS specified in SQL*Plus (thus entirely avoiding the client-side formatting of the output)?
- — Is it possible that SET AUTOTRACE ON was specified in both cases, yet the SQL*Plus window was minimized during the FIRST_ROWS hinted execution (thus avoiding the client-side screen refreshes)?
- — Is it possible that there was less network congestion during the FIRST_ROWS hinted execution?
- — Is it possible that the client computer’s CPU (or the server’s CPU) was less busy with other tasks during the FIRST_ROWS hinted execution?
- — Is it possible that operating system caching is having an effect – if the entire execution time was spent in single block read wait events, the average single block read time for the FIRST_ROWS hinted execution was roughly 0.00080 seconds, while the average single block read time for the FIRST_ROWS(100) hinted execution was roughly 0.00208 seconds.
- — Note that the AUTOTRACE output indicates that one recursive call was performed per execution – is it possible that the time difference and the difference in the number of physical block reads could be attributed to the recursive call?
What other items might explain the time difference in the three executions? What advice would you provide to the OP? Should we answer the original question using 10 more questions?
Charles,
interesting questions – to which I have no answers. The minimized window seems to me a plausible option – but I think it’s almost impossible to solve this riddle based on the given data.
I was a little bit surprised to see the big differences in costing for FIRST_ROWS and FIRST_ROWS(100) – it seems that FIRST_ROWS has no impact on the cost, but FIRST_ROWS(100) reduces the cost to approximately (100/number_of_rows_in_table) * initial_cost. With a simple test table I get:
With Hints I get the following cost results:
– first_rows: 780 (same as without a hint)
– first_rows(1): 2
– first_rows(1000): 3
– first_rows(10000): 9
– first_rows(100000): 80
– first_rows(500000): 391
After I made this simple test I found your article https://hoopercharles.wordpress.com/2011/03/10/what-is-the-difference-between-the-first_rows-hint-and-rownum-in-the-where-clause/ – seems that I missed that one…
Regards
Martin
Martin,
Thanks for noticing and pointing out the costing difference between the FIRST_ROWS and FIRST_ROWS(100) – sorry for the delayed response (and thanks for finding the related article). I repeated your test case and obtained similar results on Oracle Database 11.2.0.2:
Then I went through the sequence of FIRST_ROW(n) hints:
These are my results:
– first_rows: 427 (same as without a hint)
– first_rows(1): 2
– first_rows(1000): 2
– first_rows(10000): 6
– first_rows(100000): 44
– first_rows(500000): 216
Now let’s add an index into the test case to see what happens:
Making a small modification to add a WHERE clause:
Interestingly, the cost of the unhinted plan increased by 2 after adding the index – a minor change, but still something that is interesting.
Let’s go through the various FIRST_ROWS and FIRST_ROWS(n) hints:
– first_rows: 1119 (more than twice without a hint)
– first_rows(1): 3
– first_rows(1000): 5
– first_rows(10000): 25
– first_rows(100000): 227
– first_rows(500000): 429 (full table scan)
Note in the above that there is a bit of logic in the FIRST_ROWS(n) optimizer mode, the optimizer switched from an index range scan to a full table scan when it sensed that we *promised* to retrieve all rows in the result set. It *almost* appears that the optimizer switched based on the calculated cost of the access path.
Let’s try again, bumping up the 500,000 number to 1,000,000:
The above suggests that the FIRST_ROWS hint calculates the cost such that we will work all the way through the data, yet the optimizer selects the index access path even though the WHERE clause tells the optimizer that according to the statistics we will be taking all rows in the table.
– first_rows: 2234 (more than twice without a hint)
– first_rows(1): 3
– first_rows(1000): 5
– first_rows(10000): 25
– first_rows(100000): 226
– first_rows(500000): 1118 (still an index range scan)
– first_rows(1000000): 429 (full table scan)
The FIRST_ROWS(n) optimizer mode sensed that we *promised* to take only the specified number of rows. An interesting thing happened when 500000 was specified in the hint – the optimizer did not switch to a full table scan based on a cost comparison (maybe that only happens when the index contains all of the data, thus avoiding the need to directly access the table?).
Just a quick experiment:
As shown by the above, the tipping point in this test case appears to be right at the point where the hint’s n value matches the expected number of rows to be returned.
Charles,
thank you for providing the interesting details: obviously the topic is much more difficult than I thougth … – even for FIRST_ROWS.
I published a sligthly expanded version of my test in my (german) blog and got a comment from (your apress co-author) Randolf Geist, who explained that the costing for FIRST_ROWS and FIRST_ROWS(n) is quite complex and added a link to his UKOUG 2009 (130 page) presentation “Everything You Wanted To Know About FIRST_ROWS_n But Were Afraid To Ask”: http://www.sqltools-plusplus.org:7676/media/FIRST_ROWS_n%20presentation%20UKOUG%202009%20notes.pdf. It’s amazing what you can find out about Oracle and the cbo if you look close enough.
Regards
Martin
Randolf Geist is quite knowledgeable about this topic – I suspect that he probably could have covered the topic in less than 100 slides in his presentation (I have only made it about 1/3 the way through the presentation slides this time). 🙂 Thanks for providing another useful link.
Considering the following in the OP’s email:
* The execution plans were (apparently, but not necessarily) the same
* The FIRST_ROWS hinted version of the query completed 24 seconds faster (1.79 times faster) than then FIRST_ROWS(100) hinted version of the query
* The calculated cost of FIRST_ROWS hinted version of the query is 8591 times greater than the calculated cost of the FIRST_ROWS(100) hinted version of the query
I think that it is safe to say that the FIRST_ROWS hint (and optimizer mode) does not cause something magic to happen in the database instance, where 8591 times as much work is accomplished in 55.86% of the amount of time – just in case someone was confused, and attempting to tune by the calculated cost that appears in execution plans.