First Table is 550MB, Second Table is 26GB – Nested Loops or Full Table Scan?

1 10 2010

October 1, 2010

(Forward to the Next Post in the Series)

A couple of days ago I wrote a blog article about creating test case scripts to test ideas.  In a recent OTN thread the original poster asked about why his value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter is not being honored.  The OP stated that his DB_FILE_MULTIBLOCK_READ_COUNT parameter was set to 128, yet when reviewing a 10046 trace at level 8 he was seeing that a maximum of 16 blocks were read from disk at a time while perfoming full table scans.  So, what might be limiting the multi-block reads to 16 blocks (I believe that one of my previous blog article mentioned the limiting factors)?  This would be a great use for a test case script, although with 26GB and 550MB tables you might want to consider reducing the size of the tables when building the test case script.

Interesting, however the unexpectedly low multi-block read size is not the subject of this blog article.  Someone mentioned in the OTN thread that when joining the 550MB table to the 26GB table, a nested loops join should be used because hashing could take a lot of time.  I was not expecting to see a comment like that, so I started to wonder, could it be true?  I wonder if one of my earlier blog articles touched on this idea?  Maybe I overlooked something in the documentation:

“The database uses hash joins to join large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

11.3.4.1 When the Optimizer Uses Hash Joins
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

  • A large amount of data must be joined.
  • A large fraction of a small table must be joined.”

Does the situation change if there is no index on the join columns in either table?  Does the situation change if there is no index on any of the predicates in the WHERE clause?  Should Oracle Database repeatedly full scan the 550MB or the 26GB table while performing a nested loops join?  What is a large fraction of a small table, 75% of the rows, 50% of the rows, 10% of the rows, 1% of the rows?  Maybe the use of a nested loops join in such a case needs a bit more consideration.  Of course we could create a test case script to determine if this statement in the OTN thread is correct, although we may need to hint the join method in the SELECT statement to test the performance with both join methods.

The same person later suggested that if the first table is small (I guess that the 550MB table is the small one) and the driven table (I suspect that the 26GB table matches this description) contains an index on the join column, Oracle’s optimizer will favor nested loops joins over hash joins.  It might be interesting to construct a test case to see if this is true (Oracle Database 11.1.0.7 used for the test case results displayed below).  To save time I will create tables which are a bit smaller than the original 550MB and 26GB tables, which might throw off the test case a little in favor of nested loops joins, but I will keep an eye on the output for problems.  In my test case script, table T1 will be my “large” table and table T2 will be my “small” table.

CREATE TABLE T1 (
  COL1 NUMBER,
  COL2 NUMBER,
  COL3 VARCHAR2(200),
  PRIMARY KEY(COL1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),200,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

ALTER TABLE T1 MODIFY(COL2 NOT NULL);

CREATE INDEX IND_T1_COL2 ON T1(COL2);

CREATE TABLE T2 (
  COL1 NUMBER,
  COL2 NUMBER,
  COL3 VARCHAR2(200),
  PRIMARY KEY(COL1));

INSERT INTO
  T2
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),200,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100000);

COMMIT;

ALTER TABLE T2 MODIFY(COL2 NOT NULL);

CREATE INDEX IND_T2_COL2 ON T2(COL2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

We now have two simple tables with a column having an ascending number sequence, a column have a descending number sequence, and a 200 byte padding column to discourage full table scans.  Now let’s perform a quick test:

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 140
SET PAGESIZE 1000
SET TRIMSPOOL ON

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       | 28696   (1)| 00:01:57 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M| 28696   (1)| 00:01:57 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   379   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   201M|       | 23406   (1)| 00:01:35 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

Two full table scans and a hash join.  If we retrieve all of the columns and all of the rows, the optimizer apparently will not favor nested loops joins.  Let’s try again with a slightly different query, this time using the column with the descending number sequence as the join column:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       | 28696   (1)| 00:01:57 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M| 28696   (1)| 00:01:57 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   379   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   201M|       | 23406   (1)| 00:01:35 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL2"="T2"."COL2")

Still a hash join and two full table scans.  Again, the optimizer did not favor nested loops joins.  Let’s try another query that specifies a range of values for the column with the ascending values:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1754270684

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   999 |   411K|    72   (2)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |   999 |   411K|    72   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000 |   206K|    37   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0017027 |  1000 |       |     4   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |  1000 |   206K|    34   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0017028 |  1000 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=1000)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=1000)

Still a hash join, although the indexes on both tables were used to reduce the number of rows entering the hash join.  So, in this case when selecting 1% (1,000 / 100,000) of the rows from tables, the optimizer still selected to perform a hash join.  Let’s try another SQL statement, this time putting the restriction on the column with the descending values (note that this query, if executed, will not return any rows because the values in column T2.COL2 do not fall into the range between 1 to 1,000):

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 596667293

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |   422 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     1 |   422 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |     1 |   211 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0017028 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   211 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2">=1 AND "T1"."COL2"<=1000)
   5 - access("T1"."COL1"="T2"."COL1")

As you can probably tell from the above plan, the optimizer believes that 1 or fewer rows will be returned, so the optimizer changed from performing a single hash join to performing two nested loops joins.  Let’s try another query, this time we will not retrieve column T2.COL3 (the large padding column) and change the WHERE clause to pick up all COL1 values less than 10,000:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 218283169

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2167K|   666   (1)| 00:00:03 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2167K|   666   (1)| 00:00:03 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   323   (2)| 00:00:02 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0017028     |  9999 |   107K|    22  (10)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   301   (1)| 00:00:02 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2060K|   343   (1)| 00:00:02 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0017027     |  9999 |       |    21   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

The optimizer again did not select to perform nested loop joins, instead performing two hash joins.

Let’s put the restriction on column C2 and try again:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 596667293

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |   222 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     1 |   222 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |     1 |   211 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0017028 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"<10000)
   5 - access("T1"."COL1"="T2"."COL1")

The optimizer predicted that a single row would be returned, so it decided to use two nested loops joins rather than a hash join.  Let’s try again, but this time fix the SQL statement so that it will return rows if executed:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2167K|       |   804   (2)| 00:00:04 |
|*  1 |  HASH JOIN                   |             |  9999 |  2167K|  2184K|   804   (2)| 00:00:04 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2060K|       |   331   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   379   (2)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

Back to a hash join again, so nothing magic about having a column with a descending sequence of numbers causing the nested loops join.  Let’s try again, this time retrieving only 100 rows, rather than 1,000 or 9,999:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 100;

Plan hash value: 1754270684

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    99 | 41778 |    14   (8)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |    99 | 41778 |    14   (8)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |   100 | 21100 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0017027 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |   100 | 21100 |     6   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0017028 |   100 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=100)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=100)

A hash join again.  What if we decrease the number of rows from 100 to 10?

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 2528765105

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |  3798 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |  3798 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2110 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0017027 |    10 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0017028 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   211 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=10 AND "T2"."COL1">=1)

Apparently, dropping the number of rows to 10 (0.01% of the rows in table T2, if my calculations are correct) was sufficient to make the optimizer switch from a hash join to two nested loops joins.

I will now ask the question, “If the first table is small and the driven table contains an index on the join column, will Oracle’s optimizer favor nested loops joins over hash joins?”  Could someone generate a test case that shows just the opposite of the results found in my test case?  Does the answer change if I were to repeat the test case script on Oracle Database 8.1.7.4 or 11.2.0.2?  What if instead of one 200 byte padding column there were fifty 10 byte padding columns, and the query only selected one of those padding columns from each table?  What if instead of one 200 byte padding column there was only a single 10 byte padding column?  Is it safe to make a generalization at this point?


Actions

Information

6 responses

1 10 2010
Hemant K Chitale

Oracle wouldn’t be considering the sizes of the tables per se, but the sizes of HashAreas.

1 10 2010
Charles Hooper

Hemant,

Thank you for suggesting another item that could affect the test case results, and another item to consider if we were trying to help the original poster in the OTN thread. For the Oracle instance used in this blog article:

SQL> show parameter pga
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
pga_aggregate_target                 big integer 400M

I mentioned the size of the table, or more specifically the column descriptions for a couple of reasons. One of those reasons is how the AVG_ROW_LEN for the table affects the costing of an access path (page 11 seems to show its use: http://www.centrexcc.com/A%20Look%20under%20the%20Hood%20of%20CBO%20-%20the%2010053%20Event.pdf – but there are probably more recent articles) – should we full table scan or index range scan to access the table data. Another reason I mentioned the size of the tables, or more accurately the size of the selected columns, is how much data will be supplied as input to the next operation in the plan – that likely is not an issue demonstrated in my simple test case script.

Getting back to your point, what if the join column was a 100 byte VARCHAR2 column rather than a numeric column, or multiple columns totalling an average of 200 bytes were the join condition. As you state, the amount of memory available for storing the hash table is an important consideration.

1 10 2010
Andreas Buckenhofer

Charles, I could imagine that optimizing for first rows instead of all rows could favour nested loops compared to hash joins.

1 10 2010
Charles Hooper

Andreas,

Another worthy to mention item that could impact the results of the test case, and another one that did not immediately come to mind when I wrote the blog article. For the Oracle instance used in this blog article:

SQL> SHOW PARAMETER OPTIMIZER_MODE
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
optimizer_mode                       string      ALL_ROWS

We could then extend your idea a little to say that introducing ROWNUM into the WHERE clause of the test case script’s SQL statements could also lead to a similar results, see:
http://jonathanlewis.wordpress.com/2010/09/30/rownum-effects-2/
https://hoopercharles.wordpress.com/2009/12/09/sql-%e2%80%93-bad-execution-plan-caused-by-rownum-row_number-is-possible-fix/

Any other ideas regarding what we should test in this test case?

2 10 2010
Jan-Marten Spit

also, the value of sreadtim (random io) and mreadtim (multiblock io) as well as mbrc from sys.aux_stats$ (‘system statistics’) play a role here.

2 10 2010
Charles Hooper

Another great suggestion for what could cause a change from hash joins to nested loops joins (or nested loops joins to hash joins). I might have to see what happens when experimenting with different system (CPU) statistics. Thank you for the suggestion.

Leave a comment