January 10, 2011
An interesting hypothetical question was posed on the OTN forums yesterday. Consider this situation. You are using Oracle Database 11.2.0.2 and you find a query that is executing much slower than expected, typically requiring 10 to 15 minutes to execute. You add a RULE hint to the query and find that the query completes in just a couple of seconds. How would you do to address this issue?
—-
To help your thought process, consider the following test case that I included in the OTN thread. The table definition:
CREATE TABLE T5 ( C1 NUMBER, C2 NUMBER, C3 NUMBER, C4 NUMBER, C5 NUMBER, C6 NUMBER, C7 NUMBER, C8 NUMBER, C9 VARCHAR2(50), C10 VARCHAR2(50)); INSERT INTO T5 SELECT ROWNUM, ROWNUM, MOD(ROWNUM,50), MOD(ROWNUM,40), MOD(ROWNUM,100), ROUND(ROWNUM/1000), ROUND(ROWNUM/950), ROUND(ROWNUM/600), RPAD(CHR(65 + MOD(ROWNUM-1,26)),50,CHR(65 + MOD(ROWNUM-1,26))), RPAD(CHR(65 + MOD(ROWNUM-1,26)),50,CHR(65 + MOD(ROWNUM-1,26))) FROM DUAL CONNECT BY LEVEL<=1000000; COMMIT; CREATE INDEX IND_T5_C10_F ON T5(LOWER(C10)); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T5',CASCADE=>TRUE)
Now, let’s put together a test case script that uses the above table:
ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; SET TIMING ON SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN SET ARRAYSIZE 1000 SELECT * FROM T5 WHERE LOWER(C10)=LPAD('a',50,'a'); SELECT /*+ RULE */ * FROM T5 WHERE LOWER(C10)=LPAD('a',50,'a'); SELECT /*+ __FAST=TRUE */ * FROM T5 WHERE LOWER(C10)=LPAD('a',50,'a'); SELECT /*+ INDEX(T5 IND_T5_C10_F) */ * FROM T5 WHERE LOWER(C10)=LPAD('a',50,'a');
In the above, we have an unhinted query, the same query with a RULE hint, the same query with the __FAST=TRUE hint, and the same query with an index hint. Which query will execute the fastest, and why? Let’s execute the test case script to find out (note that your results could be very different from my results):
SQL> SELECT 2 * 3 FROM 4 T5 5 WHERE 6 LOWER(C10)=LPAD('a',50,'a'); 38462 rows selected. Elapsed: 00:00:00.57 Execution Plan ---------------------------------------------------------- Plan hash value: 2002323537 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 38462 | 6911K| 5247 (1)| 00:01:03 | |* 1 | TABLE ACCESS FULL| T5 | 38462 | 6911K| 5247 (1)| 00:01:03 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(LOWER("C10")='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaa') Statistics ---------------------------------------------------------- 99 recursive calls 0 db block gets 19441 consistent gets 19295 physical reads 0 redo size 1178619 bytes sent via SQL*Net to client 937 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 10 sorts (memory) 0 sorts (disk) 38462 rows processed
The unhinted version completed in 0.57 seconds, and used a full table scan (your execution time may be a bit longer).
—-
SQL> SELECT /*+ RULE */ 2 * 3 FROM 4 T5 5 WHERE 6 LOWER(C10)=LPAD('a',50,'a'); 38462 rows selected. Elapsed: 00:00:00.31 Execution Plan ---------------------------------------------------------- Plan hash value: 2002323537 ---------------------------------- | Id | Operation | Name | ---------------------------------- | 0 | SELECT STATEMENT | | |* 1 | TABLE ACCESS FULL| T5 | ---------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(LOWER("C10")='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaa') Note ----- - rule based optimizer used (consider using cbo) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 19327 consistent gets 0 physical reads 0 redo size 1178619 bytes sent via SQL*Net to client 937 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 38462 rows processed
The version that was hinted to use the RULE based optimizer completed in 0.31 seconds (nearly twice as fast as the unhinted version), and also used a full table scan.
—-
SQL> SELECT /*+ __FAST=TRUE */ 2 * 3 FROM 4 T5 5 WHERE 6 LOWER(C10)=LPAD('a',50,'a'); 38462 rows selected. Elapsed: 00:00:00.28 Execution Plan ---------------------------------------------------------- Plan hash value: 2002323537 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 38462 | 6911K| 5247 (1)| 00:01:03 | |* 1 | TABLE ACCESS FULL| T5 | 38462 | 6911K| 5247 (1)| 00:01:03 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(LOWER("C10")='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaa') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 19326 consistent gets 0 physical reads 0 redo size 1178619 bytes sent via SQL*Net to client 937 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 38462 rows processed
The version hinted with __FAST=TRUE (a completely fake hint) completed in 0.28 seconds (0.03 seconds faster than the RULE hinted version) and also used a full table scan.
—-
SQL> SELECT /*+ INDEX(T5 IND_T5_C10_F) */ 2 * 3 FROM 4 T5 5 WHERE 6 LOWER(C10)=LPAD('a',50,'a'); 38462 rows selected. Elapsed: 00:00:00.17 Execution Plan ---------------------------------------------------------- Plan hash value: 1769636183 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 38462 | 6911K| 20458 (1)| 00:04:06 | | 1 | TABLE ACCESS BY INDEX ROWID| T5 | 38462 | 6911K| 20458 (1)| 00:04:06 | |* 2 | INDEX RANGE SCAN | IND_T5_C10_F | 40009 | | 350 (0)| 00:00:05 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(LOWER("C10")='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 19576 consistent gets 352 physical reads 0 redo size 1178628 bytes sent via SQL*Net to client 937 bytes received via SQL*Net from client 40 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 38462 rows processed
The index hinted version of the query completed in 0.17 seconds, which is about twice as fast as the query with the RULE hint and the unhinted version.
—-
With this knowledge, certain questions might come to mind:
- Why was the first query slower than the third query?
- Why did the optimizer not automatically select the index when that access path was faster? What do I need to check to see why the index was not selected?
- Why didn’t the RULE based optimizer select the index access path?
- How might the test script results change if we were to replace the number 50 in the script with either the number 20 or the number 10?
- How might the test script results change if someone had altered the OPTIMIZER_INDEX_COST_ADJ parameter?
—-
The hypothetical question was essentially very simple. How would you guide Oracle’s optimizer to find the optimal execution path?
—-
Edit January 10, 2011: Note that the phrase “How would you do to address this issue” in the initial paragraph is intentionally left undefined, and left for your interpretation. “This issue” could very well have multiple intended meanings, depending on how you read the paragraph.
Charles,
Why on earth do you flush buffer_cache twice? I know about the habit to ‘sync’ os filesystems twice, but that’s something totally different.
1) in the first you have 19295 physical reads, in the 3rd zero (the same to the 2nd).
2) as I did not reproduce your test-case I’m not 100% sure. Based on your question 4) I assume dbms_stats does not store valid low and high value for such long strings (I know about this behaviour in histograms, but again just guess on general statistics) and therefore the optimiser has not good enough informations. Or are there histograms? have to check default_method_opt for 11.2.0.2 and what it means for FBIs and histograms.
3) RULE just does not know what a FBI is. In this case I assumed, the optimiser would just ignore the RULE hint and do the best it can (ALL_ROWS, isn’t it?). Or does it just ignore all objects it does not know until it finishes or stuck (and then ‘fall back to ALL_ROWS’)? => check 10053 to verify that.
4) next guess: as a) you used only single byte characters in C10 and b) the max hist length of a string is approx 15 byte and c) you can not use AL16UTF16 as default character set => with “10” the histograms will be useful and therefore the optimiser seeded with better data.
5) I do not do such cruel things. 😉
Do you thing the 0.03 performance increase of RULE against __FAST=TRUE is just random or any additional performance gain?
To answer “How would you do to address this issue” – it depends on the person who brings it to me! I do not need to discuss the status of RULE with you, also about caches, physical reads, whatever. With others I might have to prepare a good foundation before going into discussions.
The other question: “How would you guide Oracle’s optimizer to find the optimal execution path?” I would try to set the statistics manually so the optimiser have good informations about selectivity of C10.
Martin
Why flush the buffer cache twice? I do not remember the exact reason, but I believe that a trustworthy source stated that one or more buffers could be “pinned” by a session, and therefore could not be flushed from the buffer cache on the first try. The second flush attempts to flush those blocks in memory that were previously pinned by sessions.
Martin,
1) The performance improvement caused by the addition of the RULE hint could just be an illusion, for the reason that you identified.
2) Histograms – I had not even considered histograms. I am guessing that the optimizer did not collect histograms in this case, but the presence of histograms might very well apply to other cases.
3) It is true that the RULE based optimizer does not know what to do with a function based index, although Martin Preiss (in a comment below) found that the RULE based optimizer in Oracle Database 11.1.0.7 *will* use the function based index. So far I have tested Oracle Database 10.2.0.2, 10.2.0.5, 11.1.0.7, 11.2.0.1, and 11.2.0.2 – only 11.1.0.7 used the function based index with the RULE based optimizer. If I set a default parallel degree for the table the cost based (ALL_ROWS) optimizer will be used when the RULE hint is specified.
4) Another very good point. I used a single byte characterset – while it probably would not make a difference in this test case, it might make a difference in production system performance issues.
5) Nice answer. OK, then set the MBRC system statistic to a very low value, such as 1 or 2 (then be certain to change it back afterward).
If I re-executed the script, I might see different results. The result with the “__FAST=TRUE” hint, I believe, was just a lucky coincidence.
Good idea to work with the statistics manually. Any other ideas?
Charles,
when I saw your example, I thought: “physical reads” – and that the rbo would not be able to use a function based index. But then I made the test in 11.1.0.7 and got a strange result:
My first idea was: perhaps autotrace is lying – what says SQL_TRACE:
So I guess that the rbo can use a function based index and indeed Jonathan Lewis brings an example on his website: http://www.jlcomp.demon.co.uk/no_fbi.html. He shows there that the rbo can use an index, if the first column in it is not virtual – but he also explains: “that if the first column is a virtual column the rule-based optimizer won’t use the index.” So I still have no fitting explanation for my result.
For the sake of completeness I also made the test on 11.2 and there I got the expected FTS.
Regards
Martin
Martin,
That is a very interesting result that you found on 11.1.0.7! Tom Kyte’s book say that the RULE based optimizer cannot use function based (single column) indexes:
http://books.google.com/books?id=8_WChqD3nc4C&pg=PA288#v=onepage&q&f=false
http://books.google.com/books?id=TmPoYfpeJAUC&pg=PA456#v=onepage&q&f=false
http://books.google.com/books?id=mV6b43HuEAkC&pg=PA461#v=onepage&q&f=false
I might have to do some testing to see if I can find the cause for the use of the index. I wonder if it is possible that the cost-based optimizer was actually used even though the rule based optimizer note appeared (I assume that this is would be considered a bug).
Martin,
Nice catch. I can confirm that in 10.2.0.5, 11.2.0.1, and 11.2.0.2 the RULE based optimizer cannot use the IND_T5_C10_F function based index, while the RULE based optimizer in 11.1.0.7 is able to use the function based index (this happens when the RULE optimizer is hinted and also when the OPTIMIZER_MODE is set to RULE at the session level.
From a 10053 trace we see something slightly different in the predicate information section:
Note in the above that the RULE based optimizer was able to use the hidden column used by the function based index:
The 10053 trace file from 11.2.0.1 showed essentially the same information as 11.1.0.7 during the hard parse.
Metalink (MOS) offers one more confirmation that the rule based optimizer should not have been able to use the IND_T5_C10_F function based index, as had happened in 11.1.0.7:
Doc ID 66277.1 – “Concepts and Usage of Function Based Indexes”
Doc ID 189702.1 – “Rule Based Optimizer is to be Desupported in Oracle10g” (also lists other features that are not supported by the rule based optimizer)
—-
This Metalink document appears to be just what the OP of the OTN thread wanted to know:
Doc ID 247743.1 – “Guidelines for Tuning: Query under Rule is fast Query under Cost is Slow”
my code tags also don’t work the way I expected …
For some reason this blog accepts code tag in square brackets [] and not in angled brackets .
I’m curious about the effect of running the queues in reverse order would have on them. Such as running the index one first. Would that then still be the fastest one? My first thought (And I am probably wrong) is that since you are asking for the same data 4 times fairly quickly, wouldn’t it already be in a queue with a, I want to say a high priority value, but I’m not sure of the correct terminology at this time.
Jeremy,
Excellent point. The SGA_TARGET was set to 12G, so the order of the queries probably would have made a difference because the previously read blocks would likely already be in the buffer cache and would not need to be re-read from disk. In Oracle Database 11.1 and above there is also the risk that the full table scans will be performed with direct path reads (the Oracle buffer cache is not used in such a case) so there is a chance that the first and third executions could have completed in the same amount of time.
Couldn’t the SELECT STATEMENT as
“…WHERE C10=LPAD(‘a’,50,’a’);” induce the index to be used by the optimizer?
Removing the LOWER function from C10.
Hi Kate,
You are correct that using the LOWER function on an indexed column could (or very likely will unless you do something like I did in this blog article: https://hoopercharles.wordpress.com/2010/06/01/the-instr-function-will-never-use-an-index-i-will-give-you-a-hint/ ) prevent an index on a column from being used.
However, the statement that I used to create the index IND_T5_C10_F is this:
The above syntax tells Oracle to create a function based index on LOWER(C10) (and store the result of that function in the index). When defined that way, the index will only be used when LOWER(C10) appears in the WHERE clause (or in a JOIN clause in an ANSI formatted SQL statement).
It is my understanding that the RULE based optimizer is not supposed to be able to use function based indexes. One of the outcomes that I originally hoped would have happened is that the cost-based optimizer would have automatically selected to use the (potentially much slower) function based index access path, while the RULE based optimizer would have had to use a full table scan. The test server, with the default value for OPTIMIZER_INDEX_COST_ADJ, and its fast random access SSD disk array, probably skewed the outcome a bit.
Hi Charles,
“adjust” the clustering_factor could be a solution (note the order by clause that I added in your test case in the INSERT statement).
Therefore, in this case the FBI is used in all cases (except the one using the hint / * + RULE * /). I can reasonably state that:
1) the execution time of the first query is reduced significantly
2) the consistent gets are reduced
3) the physical reads are reduced
Instead this is what I get from the first query if re-run your test case:
I would say they have solved the problem, but I wonder if those who had the problem can be used
this solution.
In the real world may not be an applicable solution, depends on many factors ..
as the table is used, the queries that are executed for that table
and more…
Regards
In my test case I used the version:
Donatello,
That is a good idea to compare the CLUSTERING_FACTOR, which is a statistic that is not considered by the RULE based optimizer. It could be the case that the CLUSTERING_FACTOR is just poorly calculated (due to the way the statistic is calculated, if Oracle finds that it is jumping back and forth between a small number of table blocks as the index is read, that will cause the CLUSTERING_FACTOR to increase significantly, while the actual performance impact can be quite small) – in which case a manual change to the index’s calculated CLUSTERING_FACTOR statistic might help if the index was used by the RULE based optimizer and not the cost based optimizer (as might be the case when the test case is executed on 11.1.0.7, or if a normal index were used).
I tried your test case (with the ORDER BY 10 in the INSERT INTO SQL statement) on 10.2.0.2 and found that the index was not used automatically (but there is a reason for this). Below are my results for the modified test case:
As you can see from the above, the calculated cost for the full table scan access path is 1040, while the calculated cost for the index access path is 1081 – so a full table scan was still used. Why did this happen? Let’s take a look at the system (CPU) statistics:
Notice in the above that MBRC is set to a value of 32, let’s change that to 16 to see what happens:
Now to rerun the queries, skipping the output from the first query:
So, the RULE based optimizer continues to use a full table scan, how about the cost based optimizer?
The above shows that the cost based optimizer is now using the index access path, because we modified the MBRC system (CPU) statistic.
Let’s try a slight variation of the original test case, this time we will not create a function based index, I have removed the LOWER function from the WHERE clauses, and have reset the MBRC system statistic back to a value of 32 (you can optionally remove the ORDER BY 10 clause in the INSERT INTO statement):
Now let’s take a look at the output:
As shown above, the cost based optimizer will perform a full table scan (likely because of the MBRC system statistic value).
As shown above, the RULE based optimizer will use the index, because it does not care about the CLUSTERING_FACTOR of the index, nor the value of the MBRC system statistic.
I wonder if the comments provided by readers have now covered all of the reasons why a RULE hinted query might complete in seconds while an unhinted query might require considerably longer?
until 10.2 there were still some rule hints in the abysses of the server: http://jonathanlewis.wordpress.com/2010/04/13/rule-rules/. If my memory serves me well I optimized some monitoring queries on v$-views by adding rule hints some years ago (in 10.2). But the only reference I could find now for this “technique” is http://www.dba-oracle.com/t_tuning_oracle_dictionary_sql.htm – and so this remains anecdotal evidence …
SQL Profiles may also cause adding a rule hint to be faster.
If a profile was created that no longer is the optimal plan (say before an index was created), then adding the rule hint will stop the profile being used. Consider (on an 11.2.0.1 database):
NB – this uses Kerry Osbourne’s create_sql_profile.sql script to create the profile from the plan in the shared pool: http://kerryosborne.oracle-guy.com/2009/04/oracle-sql-profiles/
Other hints would also “break” the profile as shown above, of course. I’m not trying to imply that this the only way to stop the profile being used, just that adding it does cause the plan to change (in this case), and therefore is a possible reason for the differences.
Chris,
Thank you for posting this example. I had not considered the presence of a profile as the cause… and that might explain a question like this “why is my query slow when I list the columns in one order, but fast when I swap the order of the first two columns.”
I think that a parallel problem might be optimizing for performance when the data volumes are small (possibly with the help of hints or a profile), and then finding that as the data volumes grow (or the data changes significantly) the original optimizations actually hurt performance. Or, as Donatello demonstrated, if the data is loaded in such a way that a lower clustering factor is achieved, the execution plan is frozen, and then as the data is later inserted into the tables in a more or less random fashion, the clustering factor no longer approximates its original value, and thus causing the original optimizations to actually hinder performance.