January 15, 2011
While reading the “Pro Oracle SQL” book I learned something interesting. Commenting your work can improve database performance. You certainly are aware that thoroughly documenting what you do could prevent hours of headaches that might appear later when trying to investigate problems or add additional features to an existing procedure (I think that was part of the message in Cary Millsap’s recent blog article).
But how can commenting what you do actually improve database performance? To demonstrate, let’s create a simple test case using two tables and a single insert statement to populate the two tables:
CREATE TABLE T1 ( C1 NUMBER, C2 NUMBER, C3 VARCHAR2(200), PRIMARY KEY(C1)); CREATE TABLE T2 ( C1 NUMBER, C2 NUMBER, C3 VARCHAR2(200), PRIMARY KEY(C1)); INSERT ALL WHEN 1=1 THEN INTO T1 WHEN MOD(ROWNUM,2)=1 THEN INTO T2 SELECT ROWNUM C1, ROWNUM C2, LPAD('A',200,'A') C3 FROM DUAL CONNECT BY LEVEL<=1000000; COMMIT; EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE) EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
The tables created by the above script both have primary key indexes, with table T1 having 1,000,000 rows and table T2 having 500,000 rows. Now let’s create a simple SQL statement that joins the two tables and output the execution plan for the SQL statement:
SET LINESIZE 140 SET TRIMSPOOL ON SET PAGESIZE 1000 SET AUTOTRACE TRACEONLY EXPLAIN SELECT T1.C1, T1.C2, T2.C1, T2.C2 FROM T1, T2 WHERE T1.C1=T2.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 2959412835 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 9765K| | 13974 (1)| 00:02:48 | |* 1 | HASH JOIN | | 500K| 9765K| 10M| 13974 (1)| 00:02:48 | | 2 | TABLE ACCESS FULL| T2 | 500K| 4882K| | 4126 (1)| 00:00:50 | | 3 | TABLE ACCESS FULL| T1 | 1000K| 9765K| | 8276 (1)| 00:01:40 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."C1"="T2"."C1")
The execution plan indicates that a hash join will be used to join the tables, with table T2 listed as the first row source below the words “HASH JOIN”. The query is expected to require roughly 2 minutes and 48 seconds to execute. Now let’s try the query again with a hint:
SELECT /*+ LEADING(T1) */ T1.C1, T1.C2, T2.C1, T2.C2 FROM T1, T2 WHERE T1.C1=T2.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 1838229974 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 9765K| | 13973 (1)| 00:02:48 | |* 1 | HASH JOIN | | 500K| 9765K| 20M| 13973 (1)| 00:02:48 | | 2 | TABLE ACCESS FULL| T1 | 1000K| 9765K| | 8276 (1)| 00:01:40 | | 3 | TABLE ACCESS FULL| T2 | 500K| 4882K| | 4126 (1)| 00:00:50 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."C1"="T2"."C1")
The execution plan again indicates that a hash join will be used to join the tables, this time with table T1 listed as the first row source below the words “HASH JOIN”. The query is still expected to require roughly 2 minutes and 48 seconds to execute. Let’s try the query again with a second hint:
SELECT /*+ LEADING(T1) USE_NL(T2) */ T1.C1, T1.C2, T2.C1, T2.C2 FROM T1, T2 WHERE T1.C1=T2.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 685492288 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 9765K| 1508K (1)| 05:01:46 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 500K| 9765K| 1508K (1)| 05:01:46 | | 3 | TABLE ACCESS FULL | T1 | 1000K| 9765K| 8276 (1)| 00:01:40 | |* 4 | INDEX UNIQUE SCAN | SYS_C0010199 | 1 | | 1 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 10 | 2 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."C1"="T2"."C1")
The execution plan this time indicates that two nested loops joins will be used to join the tables, with table T1 listed as the first row source below the words “NESTED LOOPS”. The query is still expected to require roughly 301 minutes and 46 seconds to execute. Now let’s document what we did to alter performance so that the next person to investigate the performance of this SQL statement will know why it performs as it does:
SELECT /*+ comment: I added these hints on Jan 15, 2011 to fix a performance problem LEADING(T1) USE_NL(T2) */ T1.C1, T1.C2, T2.C1, T2.C2 FROM T1, T2 WHERE T1.C1=T2.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 2959412835 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 9765K| | 13974 (1)| 00:02:48 | |* 1 | HASH JOIN | | 500K| 9765K| 10M| 13974 (1)| 00:02:48 | | 2 | TABLE ACCESS FULL| T2 | 500K| 4882K| | 4126 (1)| 00:00:50 | | 3 | TABLE ACCESS FULL| T1 | 1000K| 9765K| | 8276 (1)| 00:01:40 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."C1"="T2"."C1")
As you can see from the AUTOTRACE generated execution plan, simply commenting our changes is sufficient to convince the optimizer that the SQL statement will execute roughly 100 times faster than the version of the SQL statement without the comment that documents our work. The reason for this estimated performance improvement is explained on page 516 of the book. 🙂
I dont have chance to read the book tonight but Looks like some idiosyncrasy behaviour going on.
when I add comment to the end of hints instead of beginning, hints are not ignored.
another good case for why my hints are ignored question
thanks for sharing
Coskan,
I am hoping that the author of chapter 16 can explain what is happening. Strange things happen when you tell the optimizer that you want FAST PERFORMANCE – see my reply to Martin Preiss below.
Charles,
in this case a very beneficial comment – but a simple colon (:) seems to do the same. Another interesting detail is that the query without a hint shows a sligthly higher cost than the first hinted query (13974 to 13973) – not a big difference but still a surprise (for me).
Martin,
That is an interesting observation. Have you tried either the FAST PERFORMANCE, or the subset hints FAST or PERFORMANCE? Allow me to demonstrate on Oracle Database 11.2.0.2:
Obviously, if we misspell the PERFORMANCE hint, our query will take much longer to execute:
That is one of serveral items that I learned from the book.
I completely missed the change in the calculated cost – thanks for pointing that out to me.
Charles,
it seems that even a single letter can disable a following hint – but not every letter:
At first I began to think about vovels and consonants – but that theory died soon…
The hints are also invalidated by other leading signs (,.-;) and by reserved words (or, and, table).
Unfortunately this technique doesn’t seem to work if you comment in French –
Most likely it happens due to change in behavior of the query parser described in the Doc ID 826893.1.
Doc ID 826893.1 is an interesting, and very helpful document – thank you for finding that. Very likely this document is supposed to explain the behavior.
However, item number 2 below the CAUSE heading suggests that these might be SQL keywords based on other comments attached to this article:
* FAST
* PERFORMANCE
* COMMENT (OK, that is a keyword, so it is addressed by item number 2 in the article)
* :
* A
* D
Additionally, based on the comment left by nedoboi, we see that Oracle Database does not yet understand French – maybe that is a bug coming to version 12.1? 🙂
with regard to the single letters I looked for elements in dba_tab_columns and dba_objects with length=1 and found some of them (‘A’, ‘D’) – but the creation of a new table with a column named ‘B’ does not change the result; and of course the signs : ; , . – are not found there.
“Oracle Database does not yet understand French” – perhaps the hints have also to be translated according to the loi toubon: http://en.wikipedia.org/wiki/Toubon_Law 🙂
[…] Thanks to Timur Akhmadeev for telling us about bug 8432870 (you’ll need an account on MOS for the link) in his reply to Charles Hooper’s recent post. […]
Well I think you guys have it all figured out already. Keywords in a hint can cause the valid hints to be ignored. The suggestion in the book was “Just don’t do it”. Put your comments in a a separate comment construct. Like so.
/*+ full(a) */ /* This is where the comment goes … */
Since there are about a billion keywords (some of which are not documented or well known) and since Oracle can add a new one any time they want, It’s just best not to try to mix hints and comments in the same structure. I hadn’t seen that MOS note though so that was nice verification as to what I had observed and assumed to be true.
Kerry,
Thank you for the tip regarding including comments in hint blocks, and the suggested work-around for that problem. By the way, chapter 16 of the book is excellent – one of the best chapters in the book.
[…] SQL code is not only a good practice, comments could also improve SQL performance. What?? No, really – here is a use case. It’s a trick, read on the comments if the rationale […]