Hash Joins – What is Wrong with this Statement?

29 12 2010

December 29, 2010

I started reading the book “Pro Oracle SQL” a couple of days ago, and I am having trouble putting it down.  Other than a couple of minor word substitutions and intended, but not specified, qualifying words, the first 100 pages of the book are fantastic (I suspect that many of these cases are intentional since that book is not advertised as a SQL performance tuning guide – too much detail might cause confusion).  The description of how SQL processing works, and how to understand execution plans is on par with (or maybe even slightly better than) the treatment in the “Troubleshooting Oracle Performance” book (that reminds me, I still need to finish the second read through of the “Troubleshooting Oracle Performance” book).

As I have done with the last four books that I reviewed, I selected an interesting couple of sentences from this book.  What, if anything, is wrong with the following quote from page 94:

“Based on the table and index statistics, the table that is determined to return the fewest rows will be hashed in its entirety into memory. This hash table includes all of the row data for that table and is loaded into hash buckets based on a randomizing function that converts the join key to a hash value. As long as there is enough memory available, this hash table will reside in memory. However, if there is not enough memory available, the hash table may be written to temp disk space.”

Not as easy of an assignment as with some of the previously extracted quotes?

How about a hint?  Create two simple tables, and collect the full statistics (do not estimate the statistics) on those tables:

CREATE TABLE
  T1
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

CREATE TABLE
  T2
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',400,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL) 

Now we should probably create a test script that uses the tables:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING,
  T2.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2; 

How did you do?  Did you find anything wrong with the quote from the book, or is it completely correct?

-

-

-

I hear someone shouting, “But the above script did not even execute a SQL statement!”.  OK, try this:

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH SHARED_POOL;
SET AUTOTRACE TRACEONLY STATISTICS
ALTER SESSION SET STATISTICS_LEVEL='ALL';
ALTER SYSTEM SET PGA_AGGREGATE_TARGET=2000M SCOPE=MEMORY;

Edit Dec 29, 2010: Wait 60 seconds before continuing, just to make certain that the changed PGA_AGGREGATE_TARGET had a chance to take effect (to avoid this trap).  Continuing:

SELECT
  *
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING,
  T2.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SET AUTOTRACE OFF
SET LINESIZE 150
SET PAGESIZE 2000
SET TRIMSPOOL ON
SPOOL TEST_HASH_PLANS.TXT

SELECT /*+ LEADING(S) */
  T.PLAN_TABLE_OUTPUT
FROM
  (SELECT
    SQL_ID,
    CHILD_NUMBER
  FROM
    V$SQL
  WHERE
    SQL_TEXT LIKE 'SELECT%T1.C1=T2.C1%') S,
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(S.SQL_ID,S.CHILD_NUMBER,'ALLSTATS LAST')) T;

SPOOL OFF 

What, if anything, is wrong with the quote from the book?

-

-

-

-

-

Show you the output that I received?  OK:

SELECT
  *
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    60M|       |  1493   (1)| 00:00:12 |
|*  1 |  HASH JOIN         |      |   100K|    60M|    21M|  1493   (1)| 00:00:12 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|    39M|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|   976K|   713   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      |   100K|   976K|   713   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|   488K|   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|   488K|   464   (1)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  3125K|       |   781   (1)| 00:00:07 |
|*  1 |  HASH JOIN         |      |   100K|  3125K|  2736K|   781   (1)| 00:00:07 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|  1562K|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|  1562K|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SELECT
  T1.C1,
  T2.C1,
  T1.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  3125K|       |   781   (1)| 00:00:07 |
|*  1 |  HASH JOIN         |      |   100K|  3125K|  2736K|   781   (1)| 00:00:07 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|  1562K|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|  1562K|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    22M|       |  1018   (1)| 00:00:09 |
|*  1 |  HASH JOIN         |      |   100K|    22M|  2736K|  1018   (1)| 00:00:09 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|  1562K|       |   464   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING,
  T2.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    60M|       |  1493   (1)| 00:00:12 |
|*  1 |  HASH JOIN         |      |   100K|    60M|    21M|  1493   (1)| 00:00:12 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|    39M|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2") 

The output from the second script:

SQL_ID  c2nzwfydj9nft, child number 0
-------------------------------------
SELECT   * FROM   T1,   T2 WHERE   T1.C1=T2.C1

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.24 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.24 |   15298 |    25M|  3830K|   28M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.07 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SQL_ID  1r4up8dp9fc00, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1 FROM   T1,   T2 WHERE   T1.C1=T2.C1

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.16 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.16 |   15298 |  3749K|  1936K| 5635K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.05 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SQL_ID  0rnjh74c5m1dc, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND
T1.C2=T2.C2

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.25 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.25 |   15298 |  4670K|  1438K| 6600K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.09 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID  c691ypk4k096j, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1
AND T1.C2=T2.C2

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.23 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.23 |   15298 |  4670K|  1438K| 6600K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.08 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SQL_ID  dpq7d224rxc6t, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2,   T1.PADDING FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C2=T2.C2

Plan hash value: 2959412835

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.17 |   15482 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.17 |   15482 |  4670K|  1438K| 6600K (0)|
|   2 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |    5892 |       |       |          |
|   3 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.05 |    9590 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SQL_ID  0k4q3ahjzu9nd, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2,   T1.PADDING,   T2.PADDING FROM   T1,
  T2 WHERE   T1.C1=T2.C1   AND T1.C2=T2.C2

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.24 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.24 |   15298 |    25M|  3830K|   28M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.06 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2") 

I think that I forgot the question… was it something about the value of test cases?  :-)








Follow

Get every new post delivered to your Inbox.

Join 143 other followers