Hash Joins – What is Wrong with this Statement?

29 12 2010

December 29, 2010

I started reading the book “Pro Oracle SQL” a couple of days ago, and I am having trouble putting it down.  Other than a couple of minor word substitutions and intended, but not specified, qualifying words, the first 100 pages of the book are fantastic (I suspect that many of these cases are intentional since that book is not advertised as a SQL performance tuning guide – too much detail might cause confusion).  The description of how SQL processing works, and how to understand execution plans is on par with (or maybe even slightly better than) the treatment in the “Troubleshooting Oracle Performance” book (that reminds me, I still need to finish the second read through of the “Troubleshooting Oracle Performance” book).

As I have done with the last four books that I reviewed, I selected an interesting couple of sentences from this book.  What, if anything, is wrong with the following quote from page 94:

“Based on the table and index statistics, the table that is determined to return the fewest rows will be hashed in its entirety into memory. This hash table includes all of the row data for that table and is loaded into hash buckets based on a randomizing function that converts the join key to a hash value. As long as there is enough memory available, this hash table will reside in memory. However, if there is not enough memory available, the hash table may be written to temp disk space.”

Not as easy of an assignment as with some of the previously extracted quotes?

How about a hint?  Create two simple tables, and collect the full statistics (do not estimate the statistics) on those tables:

CREATE TABLE
  T1
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

CREATE TABLE
  T2
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',400,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL) 

Now we should probably create a test script that uses the tables:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING,
  T2.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2; 

How did you do?  Did you find anything wrong with the quote from the book, or is it completely correct?

-

-

-

I hear someone shouting, “But the above script did not even execute a SQL statement!”.  OK, try this:

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH SHARED_POOL;
SET AUTOTRACE TRACEONLY STATISTICS
ALTER SESSION SET STATISTICS_LEVEL='ALL';
ALTER SYSTEM SET PGA_AGGREGATE_TARGET=2000M SCOPE=MEMORY;

Edit Dec 29, 2010: Wait 60 seconds before continuing, just to make certain that the changed PGA_AGGREGATE_TARGET had a chance to take effect (to avoid this trap).  Continuing:

SELECT
  *
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING,
  T2.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

SET AUTOTRACE OFF
SET LINESIZE 150
SET PAGESIZE 2000
SET TRIMSPOOL ON
SPOOL TEST_HASH_PLANS.TXT

SELECT /*+ LEADING(S) */
  T.PLAN_TABLE_OUTPUT
FROM
  (SELECT
    SQL_ID,
    CHILD_NUMBER
  FROM
    V$SQL
  WHERE
    SQL_TEXT LIKE 'SELECT%T1.C1=T2.C1%') S,
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(S.SQL_ID,S.CHILD_NUMBER,'ALLSTATS LAST')) T;

SPOOL OFF 

What, if anything, is wrong with the quote from the book?

-

-

-

-

-

Show you the output that I received?  OK:

SELECT
  *
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    60M|       |  1493   (1)| 00:00:12 |
|*  1 |  HASH JOIN         |      |   100K|    60M|    21M|  1493   (1)| 00:00:12 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|    39M|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|   976K|   713   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      |   100K|   976K|   713   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|   488K|   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|   488K|   464   (1)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  3125K|       |   781   (1)| 00:00:07 |
|*  1 |  HASH JOIN         |      |   100K|  3125K|  2736K|   781   (1)| 00:00:07 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|  1562K|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|  1562K|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SELECT
  T1.C1,
  T2.C1,
  T1.C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  3125K|       |   781   (1)| 00:00:07 |
|*  1 |  HASH JOIN         |      |   100K|  3125K|  2736K|   781   (1)| 00:00:07 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|  1562K|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|  1562K|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    22M|       |  1018   (1)| 00:00:09 |
|*  1 |  HASH JOIN         |      |   100K|    22M|  2736K|  1018   (1)| 00:00:09 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|  1562K|       |   464   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SELECT
  T1.C1,
  T2.C1,
  T1.C2,
  T1.PADDING,
  T2.PADDING
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2=T2.C2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    60M|       |  1493   (1)| 00:00:12 |
|*  1 |  HASH JOIN         |      |   100K|    60M|    21M|  1493   (1)| 00:00:12 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|    39M|       |   464   (1)| 00:00:04 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2") 

The output from the second script:

SQL_ID  c2nzwfydj9nft, child number 0
-------------------------------------
SELECT   * FROM   T1,   T2 WHERE   T1.C1=T2.C1

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.24 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.24 |   15298 |    25M|  3830K|   28M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.07 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SQL_ID  1r4up8dp9fc00, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1 FROM   T1,   T2 WHERE   T1.C1=T2.C1

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.16 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.16 |   15298 |  3749K|  1936K| 5635K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.05 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

-

SQL_ID  0rnjh74c5m1dc, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND
T1.C2=T2.C2

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.25 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.25 |   15298 |  4670K|  1438K| 6600K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.09 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID  c691ypk4k096j, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1
AND T1.C2=T2.C2

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.23 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.23 |   15298 |  4670K|  1438K| 6600K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.08 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SQL_ID  dpq7d224rxc6t, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2,   T1.PADDING FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C2=T2.C2

Plan hash value: 2959412835

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.17 |   15482 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.17 |   15482 |  4670K|  1438K| 6600K (0)|
|   2 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |    5892 |       |       |          |
|   3 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.05 |    9590 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

-

SQL_ID  0k4q3ahjzu9nd, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2,   T1.PADDING,   T2.PADDING FROM   T1,
  T2 WHERE   T1.C1=T2.C1   AND T1.C2=T2.C2

Plan hash value: 1838229974

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    100K|00:00:00.24 |   15298 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    100K|    100K|00:00:00.24 |   15298 |    25M|  3830K|   28M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |    3131 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.06 |   12167 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2") 

I think that I forgot the question… was it something about the value of test cases?  :-)


Actions

Information

7 responses

29 12 2010
Nigel Thomas

I would assume that the expected size in memory of each possible hashed table would considered (accurately or not depending on what statistics were available); applying any possible filters and projecting only the required columns into the hash table could considerably reduce the memory footprint.

In some cases it would be better to have the table with the larger number of rows but smaller size in the hash table; that would take up less memory and require fewer probes into the hash table during the hash join itself as the second table (with fewer/larger rows) is scanned.

Mind you, at the moment that’s based on logic, not observation.

Regards Nigel

29 12 2010
Martin Berger

You gave a hice hint with Chris’ TOP.
And I was quite too lazy to test it myself, but as you used a different size of padding columns, this was the 2nd hint.
So I’d say ‘…return the fewest rows…’ is a good start for general use, but ‘… fewest bytes …’ would be some more precise.
But for the correct wording (‘byte’, ‘data’, ‘size’???) a lot more testing and reading of 10053 traces might be needed.
I’d start to check sum(AVG_COL_LEN [of all columns of interest]) * num_rows. But I can not even imagine where in the cavern of the Optimizer it will end.
It makes totally sense as oracle tries to save as much resources as it can. Therefore it only reads all those columns into memory it will need later – nothing more (but not only those it needs for a join – otherwise it might need to revisit some rows a 2nd time for the ‘other’ columns).

30 12 2010
Charles Hooper

I think that the two earlier comments (by Nigel Thomas and Martin Berger) identified what caused me to stop, think, … and understand when I read this portion of the book. The book (at least the first 160 pages) is so well written, that there is no doubt that the authors had to draw the line at some point and say that enough detail is enough detail. I think that a couple of additional qualifying words in these sentences might have helped, and the original version of the test case that I provided shows why, I think.

Let’s make a couple of changes to the test case script to see what happens (table T1 will have 100,000 rows, while table t2 will have 90,000 rows with an average row length that is nearly double that of table T1):

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;
 
CREATE TABLE
  T1
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
CREATE TABLE
  T2
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',400,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=90000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL)

For the AUTOTRACE TRACEONLY EXPLAIN script I received the following output – compare the position of table T2 in each of the execution plans with the execution plans from the original execution:

SQL> SELECT
  2    *
  3  FROM
  4    T1,
  5    T2
  6  WHERE
  7    T1.C1=T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |    54M|       |  1396   (1)| 00:00:12 |
|*  1 |  HASH JOIN         |      | 90000 |    54M|    21M|  1396   (1)| 00:00:12 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   | 90000 |    35M|       |   418   (1)| 00:00:04 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
 
-
 
SQL> SELECT
  2    T1.C1,
  3    T2.C1
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C1=T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |   878K|   667   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      | 90000 |   878K|   667   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T2   | 90000 |   439K|   417   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|   488K|   248   (1)| 00:00:02 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
 
-
 
SQL> SELECT
  2    T1.C1,
  3    T2.C1
  4  FROM
  5    T1,
  6    T2
  7  WHERE
  8    T1.C1=T2.C1
  9    AND T1.C2=T2.C2;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |  2812K|       |   731   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      | 90000 |  2812K|  2464K|   731   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T2   | 90000 |  1406K|       |   417   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|  1562K|       |   248   (1)| 00:00:02 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")
 
-
 
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    T1.C2
  5  FROM
  6    T1,
  7    T2
  8  WHERE
  9    T1.C1=T2.C1
 10    AND T1.C2=T2.C2;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |  2812K|       |   731   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      | 90000 |  2812K|  2464K|   731   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T2   | 90000 |  1406K|       |   417   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|  1562K|       |   248   (1)| 00:00:02 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")
 
-
 
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    T1.C2,
  5    T1.PADDING
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C2=T2.C2;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |    19M|       |   969   (1)| 00:00:08 |
|*  1 |  HASH JOIN         |      | 90000 |    19M|  2464K|   969   (1)| 00:00:08 |
|   2 |   TABLE ACCESS FULL| T2   | 90000 |  1406K|       |   417   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")
 
-
 
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    T1.C2,
  5    T1.PADDING,
  6    T2.PADDING
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.C1=T2.C1
 12    AND T1.C2=T2.C2;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |    54M|       |  1396   (1)| 00:00:12 |
|*  1 |  HASH JOIN         |      | 90000 |    54M|    21M|  1396   (1)| 00:00:12 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|    20M|       |   248   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T2   | 90000 |    35M|       |   418   (1)| 00:00:04 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

Let’s try again, this time with 1,000,000 rows in table T1 and 1,500,000 rows in table T2:

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;
 
CREATE TABLE
  T1
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;
 
CREATE TABLE
  T2
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',400,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=1500000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL)

This time, I will set the array fetch size to 1000, execute the queries, and retrieve the actual execution plans with the following:

SET AUTOTRACE OFF
SET LINESIZE 150
SET PAGESIZE 2000
SET TRIMSPOOL ON
SPOOL TEST_HASH_PLANS_LARGE2.TXT
 
SELECT /*+ LEADING(S) */
  T.PLAN_TABLE_OUTPUT
FROM
  (SELECT
    SQL_ID,
    CHILD_NUMBER
  FROM
    V$SQL
  WHERE
    SQL_TEXT LIKE 'SELECT%T1.C1=T2.C1%') S,
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(S.SQL_ID,S.CHILD_NUMBER,'ALLSTATS LAST +ROWS +COST +BYTES')) T;
 
SPOOL OFF

I received the following output, again, pay attention to the position of table T2 and the memory utilization:

SQL_ID  c2nzwfydj9nft, child number 0
-------------------------------------
SELECT   * FROM   T1,   T2 WHERE   T1.C1=T2.C1

Plan hash value: 1838229974

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       | 19750 (100)|   1000K|00:00:03.39 |     120K|  44361 |  44361 |       |       |          |         |
|*  1 |  HASH JOIN         |      |      1 |   1000K|   604M|   218M| 19750   (1)|   1000K|00:00:03.39 |     120K|  44361 |  44361 |   244M|    14M|  211M (1)|     364K|
|   2 |   TABLE ACCESS FULL| T1   |      1 |   1000K|   206M|       |  2462   (1)|   1000K|00:00:00.13 |   31263 |      0 |      0 |       |       |          |         |
|   3 |   TABLE ACCESS FULL| T2   |      1 |   1500K|   596M|       |  6938   (1)|   1500K|00:00:00.29 |   88988 |      0 |      0 |       |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
SQL_ID  1r4up8dp9fc00, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1 FROM   T1,   T2 WHERE   T1.C1=T2.C1

Plan hash value: 1838229974

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       |  9938 (100)|   1000K|00:00:01.22 |     120K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   1000K|    10M|    16M|  9938   (1)|   1000K|00:00:01.22 |     120K|    27M|  3871K|   43M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |   1000K|  4882K|       |  2460   (1)|   1000K|00:00:00.10 |   31263 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |   1500K|  8789K|       |  6935   (1)|   1500K|00:00:00.23 |   89195 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
SQL_ID  0rnjh74c5m1dc, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND
T1.C2=T2.C2

Plan hash value: 1838229974

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       | 10266 (100)|   1000K|00:00:01.45 |     120K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   1000K|    31M|    26M| 10266   (1)|   1000K|00:00:01.45 |     120K|    40M|  5750K|   59M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |   1000K|    15M|       |  2461   (1)|   1000K|00:00:00.10 |   31263 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |   1500K|    24M|       |  6936   (1)|   1500K|00:00:00.26 |   89195 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")
SQL_ID  c691ypk4k096j, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1
AND T1.C2=T2.C2

Plan hash value: 1838229974

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       | 10266 (100)|   1000K|00:00:01.52 |     120K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   1000K|    31M|    26M| 10266   (1)|   1000K|00:00:01.52 |     120K|    40M|  5750K|   59M (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |   1000K|    15M|       |  2461   (1)|   1000K|00:00:00.12 |   31263 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |   1500K|    24M|       |  6936   (1)|   1500K|00:00:00.25 |   89195 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")
SQL_ID  dpq7d224rxc6t, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2,   T1.PADDING FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C2=T2.C2

Plan hash value: 2959412835

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       | 12646 (100)|   1000K|00:00:01.54 |     120K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   1000K|   223M|    41M| 12646   (1)|   1000K|00:00:01.54 |     120K|    59M|  5667K|   85M (0)|
|   2 |   TABLE ACCESS FULL| T2   |      1 |   1500K|    24M|       |  6936   (1)|   1500K|00:00:00.20 |   88254 |       |       |          |
|   3 |   TABLE ACCESS FULL| T1   |      1 |   1000K|   206M|       |  2462   (1)|   1000K|00:00:00.15 |   32263 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")
SQL_ID  0k4q3ahjzu9nd, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   T1.C2,   T1.PADDING,   T2.PADDING FROM   T1,
  T2 WHERE   T1.C1=T2.C1   AND T1.C2=T2.C2

Plan hash value: 1838229974

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |       |       | 19750 (100)|   1000K|00:00:03.53 |     120K|  44330 |  44330 |       |       |          |         |
|*  1 |  HASH JOIN         |      |      1 |   1000K|   604M|   218M| 19750   (1)|   1000K|00:00:03.53 |     120K|  44330 |  44330 |   244M|    14M|  211M (1)|     364K|
|   2 |   TABLE ACCESS FULL| T1   |      1 |   1000K|   206M|       |  2462   (1)|   1000K|00:00:00.12 |   31263 |      0 |      0 |       |       |          |         |
|   3 |   TABLE ACCESS FULL| T2   |      1 |   1500K|   596M|       |  6938   (1)|   1500K|00:00:00.28 |   88985 |      0 |      0 |       |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1" AND "T1"."C2"="T2"."C2")

Anything else wrong with the quote?

30 12 2010
Martin Berger

I don’t like it when you make me splitting hairs, but ok, here we go:
(I hope the code-tags works)

CREATE TABLE
  T1
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',10,'A') P1,
  LPAD('A',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

CREATE TABLE
  T2
AS
SELECT
  ROWNUM C1,
  LPAD(TO_CHAR(ROWNUM),10,'0') C2,
  LPAD('A',10,'A') P1,
  LPAD('A',390,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=  x;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL) 

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

(edit: merged comments)

CREATE INDEX
  I1
  ON T1 (P1, C1, C2);
COMMIT;

CREATE INDEX
  I2
  ON T2 (P1, C1, C2);
COMMIT;

and run the select:

SELECT
  T1.C1,
  T2.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

So what am I complaining about?

(edit: WordPress seems to get confused in the comment sections when a < sign appears in a SQL statement, and then an attempt to gather statistics is made. It is a little odd, but use this (without spaces) to put a less than sign in the code – I can then fix it just by editing the comment and immediately saving the comment:
& lt;

Martin, what is the number that should take the place of x in your second CREATE TABLE statement?)

30 12 2010
Charles Hooper

Unfortunately, I am not sure how many rows are supposed to be in table T2 (WordPress was hungry), so I tried 2 tests. With 100,000 rows in table T2 I see this for the output:

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|   976K|   721   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      |   100K|   976K|   721   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T1   |   100K|   488K|   256   (1)| 00:00:03 |
|   3 |   TABLE ACCESS FULL| T2   |   100K|   488K|   464   (1)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
 
-
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2034135612

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   100K|   976K|    89   (4)| 00:00:01 |
|*  1 |  HASH JOIN            |      |   100K|   976K|    89   (4)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| I1   |   100K|   488K|    44   (3)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| I2   |   100K|   488K|    44   (3)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

With 90,000 rows in table T2 I see this for the output:

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 90000 |   878K|   675   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |      | 90000 |   878K|   675   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| T2   | 90000 |   439K|   417   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|   488K|   256   (1)| 00:00:03 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
 
-
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4050175478

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      | 90000 |   878K|    85   (4)| 00:00:01 |
|*  1 |  HASH JOIN            |      | 90000 |   878K|    85   (4)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| I2   | 90000 |   439K|    40   (3)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| I1   |   100K|   488K|    44   (3)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

With 90,000 rows in table T2 and only an index on table T1 I see this for the output:

Execution Plan
----------------------------------------------------------
Plan hash value: 1801612801

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      | 90000 |   878K|   463   (2)| 00:00:04 |
|*  1 |  HASH JOIN            |      | 90000 |   878K|   463   (2)| 00:00:04 |
|   2 |   TABLE ACCESS FULL   | T2   | 90000 |   439K|   417   (1)| 00:00:04 |
|   3 |   INDEX FAST FULL SCAN| I1   |   100K|   488K|    44   (3)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

The last execution plan seems a bit odd, if in fact Oracle would have to hash the entire row contents of table T2 into memory.

30 12 2010
Martin Berger

sorry for the big confusion!
I used 100000 rows in both tables. But that does not make any difference here.
t’s this sentence: ‘… the table that is determined to return the fewest rows will be hashed in its entirety into memory.’
As you showed in some of your test-cases not a single table is used to provide all the rows and columns of interest.
Therefore I’d change this to ‘… the object …’.
Do you want anything more to go on? ;-)

1 01 2011
Martin Berger

something more here:
This hash table includes all of the row data for that table …
I’d like to read something similar to This hash table includes all of the required row data for that object …
Do you agree based on the observations we had in the replies here, or should I provide another demo?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 147 other followers

%d bloggers like this: