Nested Loops Join – the Smaller Table is the Driving Table, the Larger Table is the Driving Table

21 03 2011

March 21, 2011

I occasionally see discussions about Oracle Database behavior that make me wonder… is it true, can I generate a test case that validates the statement, and just as important, can I generate a test case that refutes the statement.  An interesting question was posted the the OTN forums regarding apparently conflicting advice that the original posted had received regarding which table the Oracle Database’s optimizer would select as the driving table in a nested loops join; if there is a large table and a small table to be joined by nested loops, which will be the driving (outer) table?  Jonathan Lewis provided a good deal of insight in the thread, indicating that the optimizer’s decision is not as clear cut as suggested by the OP’s sources.

Not as clear cut as the optimizer always selects the smaller table as the driving (outer) table?  Maybe the documentation is able to provide a little more insight into the topic:

“The optimizer uses nested loop joins when joining small number of rows, with a good driving condition between the two tables. You drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.

The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.

The inner loop is iterated for every row returned from the outer loop, ideally by an index scan. If the access path for the inner loop is not dependent on the outer loop, then you can end up with a Cartesian product; for every iteration of the outer loop, the inner loop produces the same set of rows. Therefore, you should use other join methods when two independent row sources are joined together.”

I guess that helps a little – the driving table (or row source) is the outer table, and the inner table (or row source) is checked for every row returned by the driving table (after the specified filter predicates are applied to the driving table).  However, the documentation does not explicitly state that the outer table must be the smaller table.  Let’s check a couple of books:

The books were mostly in agreement, except for the last two.  Since it appears that there might be some disagreement whether the smaller table is selected by the optimizer as the driving table or the larger table is selected by the optimizer as the driving table, we need a test case (as repeatedly requested as evidence by Sean in the OTN thread). 

The following noworkload system statistics will be used:

SELECT
  PNAME,
  PVAL1
FROM
  SYS.AUX_STATS$
ORDER BY
  PNAME;

PNAME                PVAL1
--------------- ----------
CPUSPEED
CPUSPEEDNW      2116.57559
DSTART
DSTOP
FLAGS                    1
IOSEEKTIM               10
IOTFRSPEED            4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM
STATUS 

First the tables.  We will start simple, with one table (T3) having 100 rows and another table (T4) having 10 rows:

CREATE TABLE T3 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

CREATE TABLE T4 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

INSERT INTO
  T3
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

INSERT INTO
  T4
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=10;

COMMIT;

CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T3_C2 ON T3(C2);
CREATE INDEX IND_T3_C3 ON T3(C3);

CREATE INDEX IND_T4_C1 ON T4(C1);
CREATE INDEX IND_T4_C2 ON T4(C2);
CREATE INDEX IND_T4_C3 ON T4(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL) 

We are able to easily produce an example where the "smallest" table is selected as the driving table (note that I had to add a hint to specify a nested loop join in several of these examples):

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ USE_NL(T3 T4) */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 567778651

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    10 |   420 |    13   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |    10 |   420 |    13   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T4        |    10 |   210 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | IND_T3_C1 |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T3"."C1"="T4"."C1") 

If we stop at that point, we could declare quite simply that the optimizer selects the smaller table as the driving table.  But wait a minute, take a look at this example where the optimizer selected the largest table as the driving table:

SELECT /*+ USE_NL(T3 T4) */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1
  AND T3.C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 4214127300

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    42 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |           |       |       |            |          |
|   2 |   NESTED LOOPS                |           |     1 |    42 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T3_C2 |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | IND_T4_C1 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T4        |     1 |    21 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T3"."C2"=1)
   5 - access("T3"."C1"="T4"."C1") 

The above execution plans were generated on 11.2.0.2, which sometimes differs a bit from older Oracle Database release versions when nested loops joins are used (note the two nested loops joins), however we are able to hint the optimizer to generate the older style nested loops join:

SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2465588182

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |    10 |   420 |    13   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     1   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |           |    10 |   420 |    13   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL        | T4        |    10 |   210 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN         | IND_T3_C1 |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T3"."C1"="T4"."C1") 

 

SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1
  AND T3.C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3446668716

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    42 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T4        |     1 |    21 |     1   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |           |     1 |    42 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T3_C2 |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | IND_T4_C1 |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T3"."C2"=1)
   5 - access("T3"."C1"="T4"."C1") 

We found one case where the larger table was selected as the driving table, so the books and articles that simply state absolutely that the smallest table will be the driving table are not completely correct.  Maybe the larger table is only selected as the driving table when both tables are small?  Let's test that theory by creating a couple of more tables:

SET AUTOTRACE OFF

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

INSERT INTO
  T1
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

INSERT INTO
  T2
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);
CREATE INDEX IND_T1_C3 ON T1(C3);

CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2);
CREATE INDEX IND_T2_C3 ON T2(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL) 

The above script created table T1 with 1,000,000 rows and table T2 with 100,000 rows.  We will now use queries that are similar to those that were used with the 100 and 10 row tables.

The smaller table (T2) as the driving table:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2610346857

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |   100K|  4687K|   300K  (1)| 01:00:12 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |   100K|  4687K|   300K  (1)| 01:00:12 |
|   3 |    TABLE ACCESS FULL         | T2        |   100K|  2343K|   889   (1)| 00:00:11 |
|*  4 |    INDEX RANGE SCAN          | IND_T1_C1 |     1 |       |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1        |     1 |    24 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C1"="T2"."C1") 

The larger table as the driving table:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2 BETWEEN 1 AND 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2331401024

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           | 10001 |   468K| 11353   (1)| 00:02:17 |
|   1 |  NESTED LOOPS                 |           |       |       |            |          |
|   2 |   NESTED LOOPS                |           | 10001 |   468K| 11353   (1)| 00:02:17 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1        | 10001 |   234K|   348   (0)| 00:00:05 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_C2 | 10001 |       |    25   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | IND_T2_C1 |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2        |     1 |    24 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C2">=1 AND "T1"."C2"<=10000)
   5 - access("T1"."C1"="T2"."C1") 

So, what is happening?  Is it simply the case that it is the expected number of rows that will be returned from each table that determines which table will be the driving table?  Let's test:

SET AUTOTRACE OFF

SELECT
  COUNT(*)
FROM
  T1
WHERE
  T1.C1 BETWEEN 890000 AND 1000000;

  COUNT(*)
----------
    110001

SELECT
  COUNT(*)
FROM
  T2
WHERE
 T2.C2 BETWEEN 900000 AND 1000000;

  COUNT(*)
----------
    100000 

The above shows that if we specify T1.C1 BETWEEN 890000 AND 1000000 in the WHERE clause there will be 110,001 rows from the larger table that match the criteria. If we specify T2.C2 BETWEEN 900000 AND 1000000 in the WHERE clause there will be 100,000 rows from the smaller table that match the criteria. If we execute the following query, which table will be the driving table, the 10 times larger T1 table where we are retrieving 110,001 rows or the smaller T2 table where we are retrieving 100,000 rows?

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C3=T2.C3
  AND T1.C1 BETWEEN 890000 AND 1000000
  AND T2.C2 BETWEEN 900000 AND 1000000; 

This is the result that I received, which seems to demonstrate that it is not just the size of the tables, nor is it the number of expected rows to be returned from the tables, that determines which table will be the driving table:

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |    11M|   503M|    11M  (1)| 37:03:27 |
|   1 |  NESTED LOOPS                 |           |       |       |            |          |
|   2 |   NESTED LOOPS                |           |    11M|   503M|    11M  (1)| 37:03:27 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1        |   110K|  2578K|  3799   (1)| 00:00:46 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_C1 |   110K|       |   248   (1)| 00:00:03 |
|*  5 |    INDEX RANGE SCAN           | IND_T2_C3 |   100 |       |     1   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID | T2        |   100 |  2400 |   101   (0)| 00:00:02 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C1">=890000 AND "T1"."C1"<=1000000)
   5 - access("T1"."C3"="T2"."C3")
   6 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000) 

110,001 rows from T1 is still somewhat close in number to the 100,000 rows from T2, so let's try an experiment selecting 992,701 rows from T1:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C3=T2.C3
  AND T1.C1 BETWEEN 7300 AND 1000000
  AND T2.C2 BETWEEN 900000 AND 1000000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3718770616

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    99M|  4544M|   100M  (1)|334:20:23 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |    99M|  4544M|   100M  (1)|334:20:23 |
|*  3 |    TABLE ACCESS FULL         | T1        |   992K|    22M|  8835   (1)| 00:01:47 |
|*  4 |    INDEX RANGE SCAN          | IND_T2_C3 |   100 |       |     1   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2        |   100 |  2400 |   101   (0)| 00:00:02 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."C1">=7300 AND "T1"."C1"<=1000000)
   4 - access("T1"."C3"="T2"."C3")
   5 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000) 

As shown above, table T1 is still the driving table in the nested loops join.  Let's test retrieving 993,001 rows from T1:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C3=T2.C3
  AND T1.C1 BETWEEN 7000 AND 1000000
  AND T2.C2 BETWEEN 900000 AND 1000000;

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    99M|  4545M|   100M  (1)|334:26:13 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |    99M|  4545M|   100M  (1)|334:26:13 |
|*  3 |    TABLE ACCESS FULL         | T2        |   100K|  2343K|   889   (1)| 00:00:11 |
|*  4 |    INDEX RANGE SCAN          | IND_T1_C3 |  1000 |       |     3   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1        |   993 | 23832 |  1003   (0)| 00:00:13 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
   4 - access("T1"."C3"="T2"."C3")
   5 - filter("T1"."C1">=7000 AND "T1"."C1"<=1000000) 

As shown above, table T2 is now the driving table for the nested loops join.  So, there must be other factors beyond table (or better worded row source) size and the number of rows that will be retrieved from the tables.  You might be wondering if the CLUSTERING_FACTOR of the indexes also plays a role in determining which table is the driving table:

SET AUTOTRACE OFF
 
SELECT
  TABLE_NAME,
  INDEX_NAME,
  CLUSTERING_FACTOR,
  NUM_ROWS
FROM
  USER_INDEXES
WHERE
  TABLE_NAME IN ('T1','T2')
ORDER BY
  TABLE_NAME,
  INDEX_NAME;

TABLE_NAME INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS
---------- ---------- ----------------- ----------
T1         IND_T1_C1              32259    1000000
T1         IND_T1_C2              32259    1000000
T1         IND_T1_C3            1000000    1000000
T2         IND_T2_C1               3226     100000
T2         IND_T2_C2               3226     100000
T2         IND_T2_C3             100000     100000 

I suggested (without checking) in the OTN thread that the CLUSTERING_FACTOR of the index on columns C2 would be higher than the CLUSTERING_FACTOR of the index on columns C1 because of the reverse (descending) order in which the C2 column values were inserted into the tables.  Surprisingly (at least to me), the optimizer set the CLUSTERING_FACTOR of the C1 and C2 columns to be the same values, and set the CLUSTERING_FACTOR of column C3 to be the same as the number of rows in the table.  Maybe one of the readers of this blog article can explain what happened to the CLUSTERING_FACTOR.

So, the answer to the OP's question is not as simple as "the Smaller Table is the Driving Table" or "the Larger Table is the Driving Table", but that there are other factors involved.  I think that it might be time for a third read through of the book "Cost-Based Oracle Fundamentals".  In the mean time, anyone care to share more insight (yes we could look inside a 10053 trace file, but there must be at least one other tip that can be provided without referencing such a trace file).


Actions

Information

16 responses

21 03 2011
Ahmed AANGOUR

Hi Charles,

Thanks for this post.
I knew that it was not the size of the table neither the number of rows that determines the driving table but I always thought that it was the proportion of the table retrieved ….

21 03 2011
Narendra

Charles,

Thanks again for touching yet another “controversial” topic :)
As for your comment on clustering factor I suggested (without checking) in the OTN thread that the CLUSTERING_FACTOR of the index on columns C2 would be higher than the CLUSTERING_FACTOR of the index on columns C1 because of the reverse (descending) order in which the C2 column values were inserted into the tables. Surprisingly (at least to me), the optimizer set the CLUSTERING_FACTOR of the C1 and C2 columns to be the same values, and set the CLUSTERING_FACTOR of column C3 to be the same as the number of rows in the table. Maybe one of the readers of this blog article can explain what happened to the CLUSTERING_FACTOR.
all I can say is I was in the same situation as yours some time back when I was trying out a test case (I don’t exactly remember from where but it was most probably from Richard Foote’s blog post) and I was (then) surprised to find out that the exact reverse order of table data did not affect the clustering factor.
Once again, I believe the “conventional wisdom” (but probably misleading) was good clustering factor of an index indicates table data is stored sorted as per index key values whereas bad clustering factor indicates table data is not stored sorted as per index key values but I believe the more correct statement would be good clustering factor of an index indicates table data is stored clustered as per index key values whereas bad clustering factor indicates table data is not stored clustered as per index key values. See this documentation link.
My understanding is as long as all table rows for every index key are stored close to each other, index will have a good clustering factor. In your example, even though the data for columns C1 and C2 is added to table T2, is added in exact opposite order, the keys for indexes on C1 and C2 (IND_T2_C1 and IND_T2_C2) will still be able to find the corresponding table data, stored close to each other. In your example, it is just one row as the column values are unique but you should be able to achieve same results even if the column values were not unique.

21 03 2011
Dom Brooks

The fact the data is inserted in order – whether asc or desc – is what dictates the low CF of C1 and C2 indexes, in contrast to the “random” distribution of data for C3, no?

If the clustering factor calculation scans the index keeping count of the number of times the table block address changes, then it doesn’t matter if you’re going block 1, block 1, block 1, block 2… etc or block 99, block 99, block 99, block 98.

21 03 2011
Dom Brooks

> then it doesn’t matter if you’re going block 1, block 1, block 1, block 2… etc or block 99, block 99, block 99, block 98

and therefore an indicator that if you use either of these indexes to get to the table data for an appropriate key, then you have to make trips to relatively few table data blocks because the data for that key tends to be well clustered

21 03 2011
Dom Brooks

“key” is a bad choice of term, “indexed value” better…

21 03 2011
Charles Hooper

Dom,
I think that I now see the flaw in my reasoning regarding the clustering factor of column C3. While I was working the example I was thinking… I am inserting these column values in order by adding one less than ROWNUM to the value of SYSDATE, see:

SELECT
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C3
FROM
  DUAL
CONNECT BY
  LEVEL<=14;
 
C3
---------
MONDAY
TUESDAY
WEDNESDAY
THURSDAY
FRIDAY
SATURDAY
SUNDAY
MONDAY
TUESDAY
WEDNESDAY
THURSDAY
FRIDAY
SATURDAY
SUNDAY

Sure, they are in order by date, but not *alphabetical* order. Sometimes I just have to step away from the keyboard for a couple of minutes... and wait for someone to trigger a faint memory of something that I just simply do not think about often enough (where is chapter 5 when you need it).

Narendra,
I need to dig into your test case a little more.

21 03 2011
Dom Brooks

Chapter 5 – exactly.

You’ve said C3 in your comment above, but that seems to be C4 from the original examples.

> a faint memory of something that I just simply do not think about often enough
If it wasn’t for faint memories and vague recollections, my mind would be blank.

21 03 2011
Charles Hooper

Dom,

Nice catch – you are correct regarding the C3/C4 column was populated. I wonder if that is a sign that I need to work on my short-term memory too. I see that I have a bit more experimentation to do before clustering some other factor again. :-)

21 03 2011
Narendra

I have modified your example to make the values of columns C1 and C2 non-unique.

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER);
  
  
INSERT INTO
T2
SELECT C1, C2 FROM (
SELECT
MOD(ROWNUM,1000) C1,
1000 - MOD(ROWNUM,1000) C2
FROM
DUAL
CONNECT BY
LEVEL<=100000)
ORDER BY C1;
 
CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2);
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)

SELECT
  UI.TABLE_NAME,
  UI.INDEX_NAME,
  UI.CLUSTERING_FACTOR,
  UI.NUM_ROWS,
  UT.BLOCKS
FROM
  USER_INDEXES UI,
  USER_TABLES UT
WHERE UI.TABLE_NAME = UT.TABLE_NAME
  AND UI.TABLE_NAME = 'T2'
ORDER BY
  UI.TABLE_NAME,
  UI.INDEX_NAME;

TABLE_NAME		       INDEX_NAME		      CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
------------------------------ ------------------------------ ----------------- ---------- ----------
T2			       IND_T2_C1				    243     100000	  180
T2			       IND_T2_C2				    457     100000	  180

Although, the clustering factors of both indexes are not identical (probably due to non-unique column values), they both would still be considered as "good" clustering factor by the CBO.
Hope this helps.

(CH Edit: Fixed the problem caused by less than and greater than signs)

21 03 2011
Charles Hooper

Narendra,

Nice example! I was expecting to see results similar to those for the test case that I posted in this blog article. I guess that is the reason why results should be tested when the results are inconsistent with what one remembers to be true. These are the results that I received on one of the servers (running 10.2.0.2):

TABLE_NAME INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ---------- ----------------- ---------- ----------
T2         IND_T2_C1                221     100000        180
T2         IND_T2_C2                479     100000        180
21 03 2011
Narendra

Charles,

Thanks for correcting my post. I should have mentioned my test was carried out on 10.2.0.1.
Now, I must admit that I don’t know how oracle calculates clustering factor so I can’t really explain whether (and how) number of non-unique index key values will impact the index clustering factor when the table data is in exact reverse order of the index key but I have a feeling that as long as the “exact reverse order” criteria is met by table data (with corresponding index keys), the clustering factor would remain close to number of blocks (rather than number of rows), which would enable the CBO to use that index to access data when possible.

21 03 2011
Pavan Kumar N

Hi Charles,

As per my knowledge it would be depend on the part of the of data that got inserted and % deletions/modifications carried out on index segment (impacted by table changes). From you example I observe that, index tree would right hand tree for c1 and left hand tree for c2 columns, as the data got inserted in chronological order. Looking into real transaction table, we might not come across. So as the data got inserted newly segment (perhaps no row migrations/chained rows) – inturn directly impacting on index it would be less. From the posted example in the above scenario, we hae 0% changes we could see the “table data is stored clustered as per index key values”.

I hope it would be helpful to suggest your valuable comments, so inturn I can rectify my understanding with concept.

25 03 2011
Donatello Settembrino

Hi Charles,
first of all congratulations, great test case.
From test case like this, it always comes out with some more knowledge.
Read the title of this post and the first to observe your test
I replied:

“I could not give a dry answer, depends on the case and the variables in play”
(I think that this statement is almost always)

But honestly I would not have immediately thought of all the possible combinations.

With regard to the clustering factor, re-running your test, I get
the following result:


SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production


SQL> SELECT a.table_name,
  2         a.index_name,
  3         a.clustering_factor,
  4         a.num_rows,
  5         b.BLOCKS as table_blocks
  6  FROM  user_indexes a
  7  join  user_tables  b
  8  on    a.TABLE_NAME = b.TABLE_NAME
  9  where a.TABLE_NAME = 'T1'
 10  ORDER BY a.TABLE_NAME,
 11           a.INDEX_NAME;

TABLE_NAME  INDEX_NAME  CLUSTERING_FACTOR NUM_ROWS   TABLE_BLOCKS
----------- ----------- ----------------- ---------- ------------
T1          IND_T1_C1   15873             1000000    16267
T1          IND_T1_C2   15873             1000000    16267

I was not surprised by the value of the clustering factor on the columns c1
and c2.

The definition I give to the clustering factor is as follows:

"represents the adjacent index keys
not point to the same data block. "

In your case can not be different .. can not

Narendra,

If I remember correctly, Oracle, calculate the clustering factor
using the function(undocumented) sys_op_countchg.
If I remember correctly, I read this thing on the book by
J. Lewis or somewhere else.

In the case of the index, the table T1 in the test of Charles:


SQL> select /*+ cursor_sharing_exact
  2        dynamic_sampling(0)
  3        no_monitoring
  4        no_expand
  5        index(t1,"IND_T1_C1")
  6        noparallel_index(t1,"IND_T1_C1") */
  7     sys_op_countchg(substrb(t1.rowid,1,15),1) as clustering_factor
  8     from t1 ;

CLUSTERING_FACTOR
-----------------
            15873

In the case of your example (with duplicate values)


SQL> CREATE TABLE T2 (
  2    C1 NUMBER,
  3    C2 NUMBER);

SQL> INSERT INTO
  2  T2
  3  SELECT C1, C2 FROM (
  4  SELECT
  5  MOD(ROWNUM,1000) C1,
  6  1000 - MOD(ROWNUM,1000) C2
  7  FROM
  8  DUAL
  9  CONNECT BY
 10  LEVEL<=100000)
 11  ORDER BY C1;

SQL> commit;


SQL> CREATE INDEX IND_T2_C1 ON T2(C1);

SQL> CREATE INDEX IND_T2_C2 ON T2(C2);

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)


SQL> SELECT a.table_name,
  2         a.index_name,
  3         a.clustering_factor,
  4         a.num_rows,
  5         b.BLOCKS as table_blocks
  6  FROM  user_indexes a
  7  join  user_tables  b
  8  on    a.TABLE_NAME = b.TABLE_NAME
  9  where a.TABLE_NAME = 'T2'
 10  ORDER BY a.TABLE_NAME,
 11           a.INDEX_NAME;

TABLE_NAME  INDEX_NAME   CLUSTERING_FACTOR NUM_ROWS   TABLE_BLOCKS
----------- ------------ ----------------- ---------- ------------
T2          IND_T2_C1    122               100000     121
T2          IND_T2_C2    228               100000     121

how is it calculated?

Reflecting on the definition of clustering factor that I gave earlier:

"How many adjacent index keys
not point to the same data block. "

So, if I think the definition of the clustering factor, I can imagine that Oracle does something like this:

1) for column c1 of table t2


SQL> select count(*) clusf
  2  from (
  3  with dist_key_blk as (
  4             select distinct c1
  5             , dbms_rowid.rowid_block_number(rowid) as current_blk
  6             , lead(dbms_rowid.rowid_block_number(rowid), 1, 99999999) over (ORDER BY c1) as next_blk
  7             from t2
  8             order by 1, 2
  9  )
 10  select c1
 11       , current_blk
 12       , next_blk
 13  from dist_key_blk
 14  where current_blk <> next_blk
 15  )
 16  ;

     CLUSF
----------
       122

2) for column c2 of table t2


SQL> select count(*) clusf
  2  from (
  3  with dist_key_blk as (
  4             select distinct c2
  5             , dbms_rowid.rowid_block_number(rowid) as current_blk
  6             , lead(dbms_rowid.rowid_block_number(rowid), 1, 99999999) over (ORDER BY c2) as next_blk
  7             from t2
  8             order by 1, 2
  9  )
 10  select c2
 11       , current_blk
 12       , next_blk
 13  from dist_key_blk
 14  where current_blk <> next_blk
 15  );

     CLUSF
----------
       228

Obviously, the query is only the fruit of my imagination, how it should work according to the definition ...
I did not say which is right, but I think it's pretty close ...

HTH

Donatello Settembrino

25 03 2011
Pavan Kumar N

Hi Donatello Settembrino,

That’s a pretty good demonstration of clustering factor. Lession learnt for me too..
After going through the test case, I just check across the trace file, while collecting statistics and finally came to information which you stated across. Thanks for sharing across and even Jonathan posted across the information quite long before

http://www.jlcomp.demon.co.uk/index_efficiency.html

25 03 2011
Charles Hooper

Donatello,

Nice demonstrations.

I was suffering from a bit of a mental block a couple of days ago when trying to visualize what the clustering factor should be for the ascending and descending sequence of numbers. I am still a little upset with myself about the second to the last paragraph in this article. Fortunately, people like you, Dom Brooks, and Narendra stepped up to identify the logic flaw in my reasoning.

25 03 2011
Narendra

Donatello,

That is awesome. Did not know that. Thanks.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 139 other followers

%d bloggers like this: