March 21, 2011
I occasionally see discussions about Oracle Database behavior that make me wonder… is it true, can I generate a test case that validates the statement, and just as important, can I generate a test case that refutes the statement. An interesting question was posted the the OTN forums regarding apparently conflicting advice that the original posted had received regarding which table the Oracle Database’s optimizer would select as the driving table in a nested loops join; if there is a large table and a small table to be joined by nested loops, which will be the driving (outer) table? Jonathan Lewis provided a good deal of insight in the thread, indicating that the optimizer’s decision is not as clear cut as suggested by the OP’s sources.
Not as clear cut as the optimizer always selects the smaller table as the driving (outer) table? Maybe the documentation is able to provide a little more insight into the topic:
“The optimizer uses nested loop joins when joining small number of rows, with a good driving condition between the two tables. You drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.
The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.
The inner loop is iterated for every row returned from the outer loop, ideally by an index scan. If the access path for the inner loop is not dependent on the outer loop, then you can end up with a Cartesian product; for every iteration of the outer loop, the inner loop produces the same set of rows. Therefore, you should use other join methods when two independent row sources are joined together.”
I guess that helps a little – the driving table (or row source) is the outer table, and the inner table (or row source) is checked for every row returned by the driving table (after the specified filter predicates are applied to the driving table). However, the documentation does not explicitly state that the outer table must be the smaller table. Let’s check a couple of books:
- “Oracle Internals: Tips, Tricks, and Techniques for DBAs“
- “Oracle 9i Development by Example”
- “Oracle 9i R2 Data Warehousing”
- “Oracle SQL Tuning & CBO Internals”
- “Oracle 9i Performance Tuning Tips & Techniques”
- “Pro Oracle SQL”
The books were mostly in agreement, except for the last two. Since it appears that there might be some disagreement whether the smaller table is selected by the optimizer as the driving table or the larger table is selected by the optimizer as the driving table, we need a test case (as repeatedly requested as evidence by Sean in the OTN thread).
The following noworkload system statistics will be used:
SELECT PNAME, PVAL1 FROM SYS.AUX_STATS$ ORDER BY PNAME; PNAME PVAL1 --------------- ---------- CPUSPEED CPUSPEEDNW 2116.57559 DSTART DSTOP FLAGS 1 IOSEEKTIM 10 IOTFRSPEED 4096 MAXTHR MBRC MREADTIM SLAVETHR SREADTIM STATUS
First the tables. We will start simple, with one table (T3) having 100 rows and another table (T4) having 10 rows:
CREATE TABLE T3 ( C1 NUMBER, C2 NUMBER, C3 NUMBER, C4 VARCHAR2(20), PADDING VARCHAR2(200)); CREATE TABLE T4 ( C1 NUMBER, C2 NUMBER, C3 NUMBER, C4 VARCHAR2(20), PADDING VARCHAR2(200)); INSERT INTO T3 SELECT ROWNUM C1, 1000000-ROWNUM C2, MOD(ROWNUM-1,1000) C3, TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4, LPAD(' ',200,'A') PADDING FROM DUAL CONNECT BY LEVEL<=100; INSERT INTO T4 SELECT ROWNUM C1, 1000000-ROWNUM C2, MOD(ROWNUM-1,1000) C3, TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4, LPAD(' ',200,'A') PADDING FROM DUAL CONNECT BY LEVEL<=10; COMMIT; CREATE INDEX IND_T3_C1 ON T3(C1); CREATE INDEX IND_T3_C2 ON T3(C2); CREATE INDEX IND_T3_C3 ON T3(C3); CREATE INDEX IND_T4_C1 ON T4(C1); CREATE INDEX IND_T4_C2 ON T4(C2); CREATE INDEX IND_T4_C3 ON T4(C3); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL) EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
We are able to easily produce an example where the “smallest” table is selected as the driving table (note that I had to add a hint to specify a nested loop join in several of these examples):
SET AUTOTRACE TRACEONLY EXPLAIN SELECT /*+ USE_NL(T3 T4) */ T3.C1, T3.C2, T3.C3, T3.C4, T4.C1, T4.C2, T4.C3, T4.C4 FROM T3, T4 WHERE T3.C1=T4.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 567778651 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 10 | 420 | 13 (0)| 00:00:01 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 10 | 420 | 13 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL | T4 | 10 | 210 | 3 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IND_T3_C1 | 1 | | 0 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T3"."C1"="T4"."C1")
If we stop at that point, we could declare quite simply that the optimizer selects the smaller table as the driving table. But wait a minute, take a look at this example where the optimizer selected the largest table as the driving table:
SELECT /*+ USE_NL(T3 T4) */ T3.C1, T3.C2, T3.C3, T3.C4, T4.C1, T4.C2, T4.C3, T4.C4 FROM T3, T4 WHERE T3.C1=T4.C1 AND T3.C2=1; Execution Plan ---------------------------------------------------------- Plan hash value: 4214127300 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 42 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 1 | 42 | 3 (0)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 2 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IND_T3_C2 | 1 | | 1 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | IND_T4_C1 | 1 | | 0 (0)| 00:00:01 | | 6 | TABLE ACCESS BY INDEX ROWID | T4 | 1 | 21 | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T3"."C2"=1) 5 - access("T3"."C1"="T4"."C1")
The above execution plans were generated on 11.2.0.2, which sometimes differs a bit from older Oracle Database release versions when nested loops joins are used (note the two nested loops joins), however we are able to hint the optimizer to generate the older style nested loops join:
SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */ T3.C1, T3.C2, T3.C3, T3.C4, T4.C1, T4.C2, T4.C3, T4.C4 FROM T3, T4 WHERE T3.C1=T4.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 2465588182 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 420 | 13 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 1 (0)| 00:00:01 | | 2 | NESTED LOOPS | | 10 | 420 | 13 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL | T4 | 10 | 210 | 3 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IND_T3_C1 | 1 | | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T3"."C1"="T4"."C1")
SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */ T3.C1, T3.C2, T3.C3, T3.C4, T4.C1, T4.C2, T4.C3, T4.C4 FROM T3, T4 WHERE T3.C1=T4.C1 AND T3.C2=1; Execution Plan ---------------------------------------------------------- Plan hash value: 3446668716 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 42 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | T4 | 1 | 21 | 1 (0)| 00:00:01 | | 2 | NESTED LOOPS | | 1 | 42 | 3 (0)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 2 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IND_T3_C2 | 1 | | 1 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | IND_T4_C1 | 1 | | 0 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T3"."C2"=1) 5 - access("T3"."C1"="T4"."C1")
We found one case where the larger table was selected as the driving table, so the books and articles that simply state absolutely that the smallest table will be the driving table are not completely correct. Maybe the larger table is only selected as the driving table when both tables are small? Let’s test that theory by creating a couple of more tables:
SET AUTOTRACE OFF CREATE TABLE T1 ( C1 NUMBER, C2 NUMBER, C3 NUMBER, C4 VARCHAR2(20), PADDING VARCHAR2(200)); CREATE TABLE T2 ( C1 NUMBER, C2 NUMBER, C3 NUMBER, C4 VARCHAR2(20), PADDING VARCHAR2(200)); INSERT INTO T1 SELECT ROWNUM C1, 1000000-ROWNUM C2, MOD(ROWNUM-1,1000) C3, TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4, LPAD(' ',200,'A') PADDING FROM DUAL CONNECT BY LEVEL<=1000000; INSERT INTO T2 SELECT ROWNUM C1, 1000000-ROWNUM C2, MOD(ROWNUM-1,1000) C3, TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4, LPAD(' ',200,'A') PADDING FROM DUAL CONNECT BY LEVEL<=100000; COMMIT; CREATE INDEX IND_T1_C1 ON T1(C1); CREATE INDEX IND_T1_C2 ON T1(C2); CREATE INDEX IND_T1_C3 ON T1(C3); CREATE INDEX IND_T2_C1 ON T2(C1); CREATE INDEX IND_T2_C2 ON T2(C2); CREATE INDEX IND_T2_C3 ON T2(C3); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL) EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
The above script created table T1 with 1,000,000 rows and table T2 with 100,000 rows. We will now use queries that are similar to those that were used with the 100 and 10 row tables.
The smaller table (T2) as the driving table:
SET AUTOTRACE TRACEONLY EXPLAIN SELECT /*+ USE_NL(T1 T2) */ T1.C1, T1.C2, T1.C3, T1.C4, T2.C1, T2.C2, T2.C3, T2.C4 FROM T1, T2 WHERE T1.C1=T2.C1; Execution Plan ---------------------------------------------------------- Plan hash value: 2610346857 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 100K| 4687K| 300K (1)| 01:00:12 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 100K| 4687K| 300K (1)| 01:00:12 | | 3 | TABLE ACCESS FULL | T2 | 100K| 2343K| 889 (1)| 00:00:11 | |* 4 | INDEX RANGE SCAN | IND_T1_C1 | 1 | | 2 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 24 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."C1"="T2"."C1")
The larger table as the driving table:
SELECT /*+ USE_NL(T1 T2) */ T1.C1, T1.C2, T1.C3, T1.C4, T2.C1, T2.C2, T2.C3, T2.C4 FROM T1, T2 WHERE T1.C1=T2.C1 AND T1.C2 BETWEEN 1 AND 10000; Execution Plan ---------------------------------------------------------- Plan hash value: 2331401024 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10001 | 468K| 11353 (1)| 00:02:17 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 10001 | 468K| 11353 (1)| 00:02:17 | | 3 | TABLE ACCESS BY INDEX ROWID| T1 | 10001 | 234K| 348 (0)| 00:00:05 | |* 4 | INDEX RANGE SCAN | IND_T1_C2 | 10001 | | 25 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | IND_T2_C1 | 1 | | 1 (0)| 00:00:01 | | 6 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 24 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."C2">=1 AND "T1"."C2"<=10000) 5 - access("T1"."C1"="T2"."C1")
So, what is happening? Is it simply the case that it is the expected number of rows that will be returned from each table that determines which table will be the driving table? Let’s test:
SET AUTOTRACE OFF SELECT COUNT(*) FROM T1 WHERE T1.C1 BETWEEN 890000 AND 1000000; COUNT(*) ---------- 110001 SELECT COUNT(*) FROM T2 WHERE T2.C2 BETWEEN 900000 AND 1000000; COUNT(*) ---------- 100000
The above shows that if we specify T1.C1 BETWEEN 890000 AND 1000000 in the WHERE clause there will be 110,001 rows from the larger table that match the criteria. If we specify T2.C2 BETWEEN 900000 AND 1000000 in the WHERE clause there will be 100,000 rows from the smaller table that match the criteria. If we execute the following query, which table will be the driving table, the 10 times larger T1 table where we are retrieving 110,001 rows or the smaller T2 table where we are retrieving 100,000 rows?
SET AUTOTRACE TRACEONLY EXPLAIN SELECT /*+ USE_NL(T1 T2) */ T1.C1, T1.C2, T1.C3, T1.C4, T2.C1, T2.C2, T2.C3, T2.C4 FROM T1, T2 WHERE T1.C3=T2.C3 AND T1.C1 BETWEEN 890000 AND 1000000 AND T2.C2 BETWEEN 900000 AND 1000000;
This is the result that I received, which seems to demonstrate that it is not just the size of the tables, nor is it the number of expected rows to be returned from the tables, that determines which table will be the driving table:
------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11M| 503M| 11M (1)| 37:03:27 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 11M| 503M| 11M (1)| 37:03:27 | | 3 | TABLE ACCESS BY INDEX ROWID| T1 | 110K| 2578K| 3799 (1)| 00:00:46 | |* 4 | INDEX RANGE SCAN | IND_T1_C1 | 110K| | 248 (1)| 00:00:03 | |* 5 | INDEX RANGE SCAN | IND_T2_C3 | 100 | | 1 (0)| 00:00:01 | |* 6 | TABLE ACCESS BY INDEX ROWID | T2 | 100 | 2400 | 101 (0)| 00:00:02 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."C1">=890000 AND "T1"."C1"<=1000000) 5 - access("T1"."C3"="T2"."C3") 6 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
110,001 rows from T1 is still somewhat close in number to the 100,000 rows from T2, so let’s try an experiment selecting 992,701 rows from T1:
SELECT /*+ USE_NL(T1 T2) */ T1.C1, T1.C2, T1.C3, T1.C4, T2.C1, T2.C2, T2.C3, T2.C4 FROM T1, T2 WHERE T1.C3=T2.C3 AND T1.C1 BETWEEN 7300 AND 1000000 AND T2.C2 BETWEEN 900000 AND 1000000; Execution Plan ---------------------------------------------------------- Plan hash value: 3718770616 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 99M| 4544M| 100M (1)|334:20:23 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 99M| 4544M| 100M (1)|334:20:23 | |* 3 | TABLE ACCESS FULL | T1 | 992K| 22M| 8835 (1)| 00:01:47 | |* 4 | INDEX RANGE SCAN | IND_T2_C3 | 100 | | 1 (0)| 00:00:01 | |* 5 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 2400 | 101 (0)| 00:00:02 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T1"."C1">=7300 AND "T1"."C1"<=1000000) 4 - access("T1"."C3"="T2"."C3") 5 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
As shown above, table T1 is still the driving table in the nested loops join. Let’s test retrieving 993,001 rows from T1:
SELECT /*+ USE_NL(T1 T2) */ T1.C1, T1.C2, T1.C3, T1.C4, T2.C1, T2.C2, T2.C3, T2.C4 FROM T1, T2 WHERE T1.C3=T2.C3 AND T1.C1 BETWEEN 7000 AND 1000000 AND T2.C2 BETWEEN 900000 AND 1000000; ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 99M| 4545M| 100M (1)|334:26:13 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 99M| 4545M| 100M (1)|334:26:13 | |* 3 | TABLE ACCESS FULL | T2 | 100K| 2343K| 889 (1)| 00:00:11 | |* 4 | INDEX RANGE SCAN | IND_T1_C3 | 1000 | | 3 (0)| 00:00:01 | |* 5 | TABLE ACCESS BY INDEX ROWID| T1 | 993 | 23832 | 1003 (0)| 00:00:13 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000) 4 - access("T1"."C3"="T2"."C3") 5 - filter("T1"."C1">=7000 AND "T1"."C1"<=1000000)
As shown above, table T2 is now the driving table for the nested loops join. So, there must be other factors beyond table (or better worded row source) size and the number of rows that will be retrieved from the tables. You might be wondering if the CLUSTERING_FACTOR of the indexes also plays a role in determining which table is the driving table:
SET AUTOTRACE OFF SELECT TABLE_NAME, INDEX_NAME, CLUSTERING_FACTOR, NUM_ROWS FROM USER_INDEXES WHERE TABLE_NAME IN ('T1','T2') ORDER BY TABLE_NAME, INDEX_NAME; TABLE_NAME INDEX_NAME CLUSTERING_FACTOR NUM_ROWS ---------- ---------- ----------------- ---------- T1 IND_T1_C1 32259 1000000 T1 IND_T1_C2 32259 1000000 T1 IND_T1_C3 1000000 1000000 T2 IND_T2_C1 3226 100000 T2 IND_T2_C2 3226 100000 T2 IND_T2_C3 100000 100000
I suggested (without checking) in the OTN thread that the CLUSTERING_FACTOR of the index on columns C2 would be higher than the CLUSTERING_FACTOR of the index on columns C1 because of the reverse (descending) order in which the C2 column values were inserted into the tables. Surprisingly (at least to me), the optimizer set the CLUSTERING_FACTOR of the C1 and C2 columns to be the same values, and set the CLUSTERING_FACTOR of column C3 to be the same as the number of rows in the table. Maybe one of the readers of this blog article can explain what happened to the CLUSTERING_FACTOR.
So, the answer to the OP’s question is not as simple as “the Smaller Table is the Driving Table” or “the Larger Table is the Driving Table”, but that there are other factors involved. I think that it might be time for a third read through of the book “Cost-Based Oracle Fundamentals”. In the mean time, anyone care to share more insight (yes we could look inside a 10053 trace file, but there must be at least one other tip that can be provided without referencing such a trace file).
Recent Comments