Adding Comments to SQL Statements Improves Performance?

15 01 2011

January 15, 2011

While reading the “Pro Oracle SQL” book I learned something interesting.  Commenting your work can improve database performance.  You certainly are aware that thoroughly documenting what you do could prevent hours of headaches that might appear later when trying to investigate problems or add additional features to an existing procedure (I think that was part of the message in Cary Millsap’s recent blog article). 

But how can commenting what you do actually improve database performance?  To demonstrate, let’s create a simple test case using two tables and a single insert statement to populate the two tables:

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(200),
  PRIMARY KEY(C1));

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(200),
  PRIMARY KEY(C1));

INSERT ALL
  WHEN 1=1 THEN
    INTO T1
  WHEN MOD(ROWNUM,2)=1 THEN
    INTO T2
SELECT
  ROWNUM C1,
  ROWNUM C2,
  LPAD('A',200,'A') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE) 

The tables created by the above script both have primary key indexes, with table T1 having 1,000,000 rows and table T2 having 500,000 rows.  Now let’s create a simple SQL statement that joins the two tables and output the execution plan for the SQL statement:

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13974   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 13974   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1") 

The execution plan indicates that a hash join will be used to join the tables, with table T2 listed as the first row source below the words “HASH JOIN”.  The query is expected to require roughly 2 minutes and 48 seconds to execute.  Now let’s try the query again with a hint:

SELECT /*+ LEADING(T1) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13973   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    20M| 13973   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
|   3 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1") 

The execution plan again indicates that a hash join will be used to join the tables, this time with table T1 listed as the first row source below the words “HASH JOIN”.  The query is still expected to require roughly 2 minutes and 48 seconds to execute.  Let’s try the query again with a second hint: 

SELECT /*+ LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 685492288

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   500K|  9765K|  1508K  (1)| 05:01:46 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   500K|  9765K|  1508K  (1)| 05:01:46 |
|   3 |    TABLE ACCESS FULL         | T1           |  1000K|  9765K|  8276   (1)| 00:01:40 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0010199 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |    10 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C1"="T2"."C1") 

The execution plan this time indicates that two nested loops joins will be used to join the tables, with table T1 listed as the first row source below the words “NESTED LOOPS”.  The query is still expected to require roughly 301 minutes and 46 seconds to execute.  Now let’s document what we did to alter performance so that the next person to investigate the performance of this SQL statement will know why it performs as it does:

SELECT /*+ comment: I added these hints on Jan 15, 2011 to fix a performance problem LEADING(T1) USE_NL(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       | 13974   (1)| 00:02:48 |
|*  1 |  HASH JOIN         |      |   500K|  9765K|    10M| 13974   (1)| 00:02:48 |
|   2 |   TABLE ACCESS FULL| T2   |   500K|  4882K|       |  4126   (1)| 00:00:50 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|  9765K|       |  8276   (1)| 00:01:40 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1") 

As you can see from the AUTOTRACE generated execution plan, simply commenting our changes is sufficient to convince the optimizer that the SQL statement will execute roughly 100 times faster than the version of the SQL statement without the comment that documents our work.  The reason for this estimated performance improvement is explained on page 516 of the book.  :-)








Follow

Get every new post delivered to your Inbox.

Join 139 other followers