Adding Comments to SQL Statements Improves Performance?

15 01 2011

January 15, 2011

While reading the “Pro Oracle SQL” book I learned something interesting. Commenting your work can improve database performance. You certainly are aware that thoroughly documenting what you do could prevent hours of headaches that might appear later when trying to investigate problems or add additional features to an existing procedure (I think that was part of the message in Cary Millsap’s recent blog article).

But how can commenting what you do actually improve database performance? To demonstrate, let’s create a simple test case using two tables and a single insert statement to populate the two tables:

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(200),
  PRIMARY KEY(C1));

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(200),
  PRIMARY KEY(C1));

INSERT ALL
  WHEN 1=1 THEN
    INTO T1
  WHEN MOD(ROWNUM,2)=1 THEN
    INTO T2
SELECT
  ROWNUM C1,
  ROWNUM C2,
  LPAD('A',200,'A') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

The tables created by the above script both have primary key indexes, with table T1 having 1,000,000 rows and table T2 having 500,000 rows. Now let’s create a simple SQL statement that joins the two tables and output the execution plan for the SQL statement:

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13974   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 13974   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

The execution plan indicates that a hash join will be used to join the tables, with table T2 listed as the first row source below the words “HASH JOIN”. The query is expected to require roughly 2 minutes and 48 seconds to execute. Now let’s try the query again with a hint:

SELECT /*+ LEADING(T1) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13973   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    20M| 13973   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
|   3 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

The execution plan again indicates that a hash join will be used to join the tables, this time with table T1 listed as the first row source below the words “HASH JOIN”. The query is still expected to require roughly 2 minutes and 48 seconds to execute. Let’s try the query again with a second hint:

SELECT /*+ LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 685492288

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   500K|  9765K|  1508K  (1)| 05:01:46 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   500K|  9765K|  1508K  (1)| 05:01:46 |
|   3 |    TABLE ACCESS FULL         | T1           |  1000K|  9765K|  8276   (1)| 00:01:40 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0010199 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |    10 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C1"="T2"."C1")

The execution plan this time indicates that two nested loops joins will be used to join the tables, with table T1 listed as the first row source below the words “NESTED LOOPS”. The query is still expected to require roughly 301 minutes and 46 seconds to execute. Now let’s document what we did to alter performance so that the next person to investigate the performance of this SQL statement will know why it performs as it does:

SELECT /*+ comment: I added these hints on Jan 15, 2011 to fix a performance problem LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13974   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 13974   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

As you can see from the AUTOTRACE generated execution plan, simply commenting our changes is sufficient to convince the optimizer that the SQL statement will execute roughly 100 times faster than the version of the SQL statement without the comment that documents our work. The reason for this estimated performance improvement is explained on page 516 of the book. 🙂

Actions

Information

Date : January 15, 2011
Categories : Performance, SQL

13 responses

15 01 2011

coskan gundogar (13:04:15) :

I dont have chance to read the book tonight but Looks like some idiosyncrasy behaviour going on.
when I add comment to the end of hints instead of beginning, hints are not ignored.

another good case for why my hints are ignored question

thanks for sharing

15 01 2011

Charles Hooper (18:15:37) :

Coskan,
I am hoping that the author of chapter 16 can explain what is happening. Strange things happen when you tell the optimizer that you want FAST PERFORMANCE – see my reply to Martin Preiss below.

15 01 2011

Martin Preiss (16:25:55) :

Charles,
in this case a very beneficial comment – but a simple colon (:) seems to do the same. Another interesting detail is that the query without a hint shows a sligthly higher cost than the first hinted query (13974 to 13973) – not a big difference but still a surprise (for me).

15 01 2011

Charles Hooper (18:09:48) :

Martin,
That is an interesting observation. Have you tried either the FAST PERFORMANCE, or the subset hints FAST or PERFORMANCE? Allow me to demonstrate on Oracle Database 11.2.0.2:

SELECT /*+ FAST LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13974   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 13974   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

SELECT /*+ PERFORMANCE LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13974   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 13974   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

Obviously, if we misspell the PERFORMANCE hint, our query will take much longer to execute:

SELECT /*+ PRFORMANCE LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;
  
Execution Plan
----------------------------------------------------------
Plan hash value: 685492288
 
---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   500K|  9765K|  1508K  (1)| 05:01:46 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   500K|  9765K|  1508K  (1)| 05:01:46 |
|   3 |    TABLE ACCESS FULL         | T1           |  1000K|  9765K|  8276   (1)| 00:01:40 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0010199 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |    10 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C1"="T2"."C1")

That is one of serveral items that I learned from the book.

I completely missed the change in the calculated cost – thanks for pointing that out to me.

16 01 2011

Martin Preiss (04:21:03) :

Charles,
it seems that even a single letter can disable a following hint – but not every letter:

-- Results from 11.2.0.1
SQL> SELECT /*+ a LEADING(T1) USE_NL(T2) */
  2    T1.C1,
  3    T1.C2,
  4    T2.C1,
  5    T2.C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1;
Abgelaufen: 00:00:00.00

Ausf³hrungsplan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 14175   (1)| 00:02:51 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 14175   (1)| 00:02:51 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4124   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8481   (1)| 00:01:42 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."C1"="T2"."C1")

SQL> r
  1  SELECT /*+ b LEADING(T1) USE_NL(T2) */
  2    T1.C1,
  3    T1.C2,
  4    T2.C1,
  5    T2.C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10*   T1.C1=T2.C1
Abgelaufen: 00:00:00.00

Ausf³hrungsplan
----------------------------------------------------------
Plan hash value: 2984486645

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   500K|  9765K|  1508K  (1)| 05:01:47 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   500K|  9765K|  1508K  (1)| 05:01:47 |
|   3 |    TABLE ACCESS FULL         | T1           |  1000K|  9765K|  8481   (1)| 00:01:42 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0010667 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |    10 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."C1"="T2"."C1")

SQL> r
  1  SELECT /*+ c LEADING(T1) USE_NL(T2) */
  2    T1.C1,
  3    T1.C2,
  4    T2.C1,
  5    T2.C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10*   T1.C1=T2.C1
Abgelaufen: 00:00:00.00

Ausf³hrungsplan
----------------------------------------------------------
Plan hash value: 2984486645

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   500K|  9765K|  1508K  (1)| 05:01:47 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   500K|  9765K|  1508K  (1)| 05:01:47 |
|   3 |    TABLE ACCESS FULL         | T1           |  1000K|  9765K|  8481   (1)| 00:01:42 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0010667 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |    10 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."C1"="T2"."C1")

SQL> r
  1  SELECT /*+ d LEADING(T1) USE_NL(T2) */
  2    T1.C1,
  3    T1.C2,
  4    T2.C1,
  5    T2.C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10*   T1.C1=T2.C1
Abgelaufen: 00:00:00.00

Ausf³hrungsplan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 14175   (1)| 00:02:51 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 14175   (1)| 00:02:51 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4124   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8481   (1)| 00:01:42 |
-----------------------------------------------------------------------------------

At first I began to think about vovels and consonants – but that theory died soon…
The hints are also invalidated by other leading signs (,.-;) and by reserved words (or, and, table).

15 01 2011

nedoboi (21:24:53) :

Unfortunately this technique doesn’t seem to work if you comment in French –

SELECT /*+ Essayez un nouveau navigateur avec traduction automatique LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 8645535281

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   500K|  9765K|  1508K  (1)| 05:01:47 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   500K|  9765K|  1508K  (1)| 05:01:47 |
|   3 |    TABLE ACCESS FULL         | T1           |  1000K|  9765K|  8279   (1)| 00:01:40 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0011891 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |    10 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

16 01 2011

Timur Akhmadeev (09:44:34) :

Most likely it happens due to change in behavior of the query parser described in the Doc ID 826893.1.

16 01 2011

Charles Hooper (12:41:52) :

Doc ID 826893.1 is an interesting, and very helpful document – thank you for finding that. Very likely this document is supposed to explain the behavior.

However, item number 2 below the CAUSE heading suggests that these might be SQL keywords based on other comments attached to this article:
* FAST
* PERFORMANCE
* COMMENT (OK, that is a keyword, so it is addressed by item number 2 in the article)
* :
* A
* D

Additionally, based on the comment left by nedoboi, we see that Oracle Database does not yet understand French – maybe that is a bug coming to version 12.1? 🙂

16 01 2011

Martin Preiss (13:39:59) :

with regard to the single letters I looked for elements in dba_tab_columns and dba_objects with length=1 and found some of them (‘A’, ‘D’) – but the creation of a new table with a column named ‘B’ does not change the result; and of course the signs : ; , . – are not found there.

“Oracle Database does not yet understand French” – perhaps the hints have also to be translated according to the loi toubon: http://en.wikipedia.org/wiki/Toubon_Law 🙂

16 01 2011

Ignoring hints « Oracle Scratchpad (17:32:19) :

[…] Thanks to Timur Akhmadeev for telling us about bug 8432870 (you’ll need an account on MOS for the link) in his reply to Charles Hooper’s recent post. […]

16 01 2011

Kerry Osborne (22:07:26) :

Well I think you guys have it all figured out already. Keywords in a hint can cause the valid hints to be ignored. The suggestion in the book was “Just don’t do it”. Put your comments in a a separate comment construct. Like so.

/*+ full(a) */ /* This is where the comment goes … */

Since there are about a billion keywords (some of which are not documented or well known) and since Oracle can add a new one any time they want, It’s just best not to try to mix hints and comments in the same structure. I hadn’t seen that MOS note though so that was nice verification as to what I had observed and assumed to be true.

17 01 2011

Charles Hooper (07:23:49) :

Kerry,

Thank you for the tip regarding including comments in hint blocks, and the suggested work-around for that problem. By the way, chapter 16 of the book is excellent – one of the best chapters in the book.

22 01 2011

State of Data Last Week – #32 « Dr Data's Blog (16:43:21) :

[…] SQL code is not only a good practice, comments could also improve SQL performance. What?? No, really – here is a use case. It’s a trick, read on the comments if the rationale […]

	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Huge Presentation – Work…
	Dan on Huge Presentation – Work…
	Charles Hooper on Review of Synology DS415+, How…
	Matthias Clausen on Review of Synology DS415+, How…
	Charles Hooper on _SMALL_TABLE_THRESHOLD Paramet…
	mohd atif on _SMALL_TABLE_THRESHOLD Paramet…
	Charles Hooper on Free ANSI SQL to Oracle Specif…
	Kalin Agrawal on Free ANSI SQL to Oracle Specif…
	Charles Hooper on Large Page Support or Not…
	vaurob on Large Page Support or Not…
	Locating Oracle… on Finding a New Home for a Clien…

Charles Hooper's Oracle Notes