Possibly Interesting Execution Plan

29 10 2010


I recently found what I believe to be an interesting execution plan, but then maybe I am just having difficulty thinking of blog article topics.  Here is the execution plan:

Plan hash value: 904523798

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                           |     1 |    80 |       |  5464   (1)| 00:01:06 |
|   1 |  HASH GROUP BY                 |                           |     1 |    80 |       |  5464   (1)| 00:01:06 |
|   2 |   NESTED LOOPS                 |                           |     1 |    80 |       |  5463   (1)| 00:01:06 |
|*  3 |    HASH JOIN                   |                           |    36 |  2340 |  6376K|  5401   (1)| 00:01:05 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T_DATA_CORP_AUTHORIZATION |   408 |  9792 |       |    75   (0)| 00:00:01 |
|   5 |      NESTED LOOPS              |                           |   116K|  5003K|       |  2535   (1)| 00:00:31 |
|*  6 |       TABLE ACCESS FULL        | T_DATA_FILE_DETAILS       |   286 |  5720 |       |   202   (4)| 00:00:03 |
|*  7 |       INDEX RANGE SCAN         | PK_DATA_CORPAUTHORIZATION |   408 |       |       |     6   (0)| 00:00:01 |
|   8 |     INDEX FAST FULL SCAN       | IDX_FILE_REF_PLOC         |   911K|    18M|       |  1120   (1)| 00:00:14 |
|*  9 |    TABLE ACCESS BY INDEX ROWID | T_DATA_RECORD_DETAILS     |     1 |    15 |       |     2   (0)| 00:00:01 |
|* 10 |     INDEX UNIQUE SCAN          | PK_DATA_RECORD_DETAILS    |     1 |       |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("D"."FILE_ID"="R"."FILE_ID" AND "D"."RECORD_REFERENCE_NO"="R"."RECORD_REFERENCE_NO" AND
              "D"."FILE_ID"="M"."FILE_ID")
   6 - filter(TO_DATE(INTERNAL_FUNCTION("M"."FILE_UPLOADED_ON"))>=TO_DATE('2010-10-01 00:00:00',
              'yyyy-mm-dd hh24:mi:ss') AND TO_DATE(INTERNAL_FUNCTION("M"."FILE_UPLOADED_ON"))<=TO_DATE('2010-10-28
              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
   7 - access("R"."FILE_ID"="M"."FILE_ID")
   9 - filter(("N"."PRINTING_STATUS" IS NULL OR "N"."PRINTING_STATUS"<>'C') AND
              "N"."CORPORATE_AUTHORIZATION_DONE"='Y')
  10 - access("N"."FILE_ID"="R"."FILE_ID" AND "N"."RECORD_REFERENCE_NO"="R"."RECORD_REFERENCE_NO")

What do you think? Perhaps someone will find something more interesting about the execution plan than what I found. Or maybe it is just a boring execution plan that is nothing special, and I am just using this as an excuse for a blog article topic. It might help to see the original SQL statement, but maybe not.



8 responses

29 10 2010
Charles Hooper

A hint, possibly: take a look at the predicted cardinality for line 5. When I saw that prediction I recalled the following blog article series and started wondering whether this could be something other than a nested loops join (or maybe there is a single distinct value in the FILE_ID column):

http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/

“The final join mechanism in my ‘all joins are nested loop joins’ argument is the Merge Join – a join mechanism that depends on both its row sources being pre-sorted on the join columns”

A second hint may be in the Predicate Information section for ID 6. I wonder what that INTERNAL_FUNCTION entry is, and why the date format is not specified for that entry, but is specified for the TO_DATE entry?

You might take a look at the OTN thread where this execution plan was found:

http://forums.oracle.com/forums/thread.jspa?messageID=8456127

30 10 2010
Narendra

Charles,

One of the reasons that I am a regular reader of your blog is that I like the way you use forum posts/questions to describe how Oracle (and the CBO, in particular) works. So my sincere request to you is to keep it up. I personally feel I learn a lot this way instead of through conventional learning methods. Another advantage of this approach is that you get to see a lot of problems that you might not experience in your workplace, and learn from them. I became interested and passionate about Oracle when I started following the AskTom site.
Now, coming back to the question here: first, I must admit I could not figure out much without reading the forum post to find the corresponding query. It appears that this plan is a good example of how CBO decisions can go for a toss when obvious mistakes (like relying on implicit conversions, redundant conversions, etc.) are made when writing the query. When the OP changed his/her query based on your suggestions, the CBO generated the most appropriate execution plan.
However, I am not sure why you think the cardinality of step 5 in the EXPLAIN PLAN holds any hint. My understanding is that it is normal for EXPLAIN PLAN to show the cardinality of a NESTED LOOPS step as the product of the joined row sources. In this case, T_DATA_FILE_DETAILS is chosen as the driving table and is expected to produce 286 rows. For each of the 286 records of the driving table, an index range scan of PK_DATA_CORPAUTHORIZATION is expected to produce 408 rows. Hence the cardinality of the NESTED LOOPS step is reported as 116K (= 286 * 408). So nothing unusual about it (to me, at least). EXPLAIN PLAN is known to report the cardinality of the inner row source for each iteration of the outer row source.
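The arithmetic behind that 116K figure can be checked directly. A quick sketch (the two inputs are simply the Rows estimates from plan lines 6 and 7 above):

```python
# Rough sketch of how the NESTED LOOPS Rows figure is formed (my reading
# of the plan, not Oracle's actual code): the driving row source estimate
# multiplied by the per-iteration estimate of the inner row source.
driving_rows = 286           # T_DATA_FILE_DETAILS estimate (plan line 6)
inner_rows_per_probe = 408   # PK_DATA_CORPAUTHORIZATION estimate (plan line 7)

nested_loops_estimate = driving_rows * inner_rows_per_probe
print(nested_loops_estimate)  # 116688, reported as 116K in the plan
```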
I am a bit surprised/confused about why the CBO decided to go for a NESTED LOOPS join instead of a HASH join. Is it because of some limitation of the CBO in calculating the cost of a HASH join? Considering that the database version is 10.2.0.3, it seems a possibility.
About INTERNAL_FUNCTION, I think it is just the way Oracle reports the predicate details. I can’t remember where I read about it, but when one applies a TO_DATE function to a DATE column, Oracle actually converts the DATE to a CHAR and then applies TO_DATE to it. Why it does not include format details, I have no idea.

30 10 2010
Charles Hooper

Narendra,

Thank you for the compliments. I am not a regular reader of the AskTom site, but I have read several of the AskTom threads from the beginning to the end when they appear in search engine results. I participate in the forums in part for the reason that you mentioned – it is just one way to learn and to maintain the knowledge that you have previously learned. Additionally, I see it as a way to keep inaccurate information from being constantly repeated without being questioned.

Notice in the above execution plan that the nested loops join is expected to produce 116K rows, which is the same as 286 * 408. This nested loops join is actually a Cartesian join based on the optimizer’s cardinality estimates. The execution plan should show the number of expected rows that are being returned from each operation. A quick test case to demonstrate; first we set up the tables and indexes:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20));
 
CREATE TABLE T2 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20));
 
CREATE TABLE T3 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20));
 
CREATE TABLE T4 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20));
 
CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T4_C1 ON T4(C1);
 
INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD(TO_CHAR(ROWNUM),20,'0')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;
 
INSERT INTO
  T2
SELECT
  ROWNUM,
  LPAD(TO_CHAR(ROWNUM),20,'0')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

INSERT INTO
  T3
SELECT
  1,
  LPAD(TO_CHAR(ROWNUM),20,'0')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;
 
INSERT INTO
  T4
SELECT
  1,
  LPAD(TO_CHAR(ROWNUM),20,'0')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE,ESTIMATE_PERCENT=>100)

In the above, tables T1 and T2 are identical, and tables T3 and T4 are identical – but there are differences between tables T1 and T3. Table T1 contains 1,000 rows with 1,000 distinct values in column C1, while table T3 contains 1,000 rows with 1 distinct value in column C1. Now let’s take a look at an execution plan that joins tables T1 and T2 on column C1 (this execution plan was generated on 11.2.0.1, setting OPTIMIZER_FEATURES_ENABLE=’10.2.0.3′ did not change the execution plans):
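That difference in distinct values is what drives the optimizer's estimate for the join. A hedged sketch using the textbook join cardinality formula (rows1 × rows2 / max of the join-column NDVs); this illustrates the commonly cited formula, and is not a claim about what any specific Oracle release computes:

```python
# Textbook CBO join cardinality formula (illustration only):
#   estimated rows = rows1 * rows2 / max(NDV of join column on each side)
def join_cardinality(rows1, rows2, ndv1, ndv2):
    return rows1 * rows2 / max(ndv1, ndv2)

# T1 join T2: 1,000 distinct values in C1 on both sides
print(join_cardinality(1000, 1000, 1000, 1000))  # 1000.0

# T3 join T4: a single distinct value in C1 on both sides
print(join_cardinality(1000, 1000, 1, 1))        # 1000000.0
```

With 1,000 distinct values on each side, the product collapses back to 1,000 rows; with a single distinct value nothing is filtered, and the estimate is the full 1,000,000-row product.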

SELECT
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1 = T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000 | 50000 |     7  (43)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  1000 | 50000 |     7  (43)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  1000 | 25000 |     3  (34)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   |  1000 | 25000 |     3  (34)| 00:00:01 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")

Note that the hash join is expected to return 1,000 rows, rather than 1,000 * 1,000. A hash join is used by default, so I will have to use a hint to force a nested loops join:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1 = T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1967407726
 
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000 | 50000 |   890  (86)| 00:00:01 |
|   1 |  NESTED LOOPS      |      |  1000 | 50000 |   890  (86)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  1000 | 25000 |     3  (34)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    25 |     1 (100)| 00:00:01 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."C1"="T2"."C1")

I achieved a nested loops join in the above, even though full table scans are performed on both tables. Notice that the nested loops join, like the previous hash join, is expected to produce 1,000 rows rather than 1,000 * 1,000. Someone might wonder if this will change if table T2 is accessed through an index:

SELECT /*+ USE_NL(T1 T2) INDEX(T2) */
  T1.C1,
  T1.C2,
  T2.C1,
  T2.C2
FROM
  T1,
  T2
WHERE
  T1.C1 = T2.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2827761549
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |  1000 | 50000 |   897  (85)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |  1000 | 50000 |   897  (85)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2        |  1000 | 25000 |    10  (20)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | IND_T2_C1 |  1000 |       |     4  (25)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | T1        |     1 |    25 |     1 (100)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T1"."C1"="T2"."C1")

I was hoping for an index range scan of table T2; I guess that I should have been more specific with the index hint. :-) Notice that the optimizer is still projecting that 1,000 rows will be returned from the nested loops join.

Now, let’s repeat the test using tables T3 and T4, which have one distinct value in column C1:

SELECT /*+ USE_NL(T3 T4) */
  T3.C1,
  T3.C2,
  T4.C1,
  T4.C2
FROM
  T3,
  T4
WHERE
  T3.C1 = T4.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1907878852

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000K|    45M|   838  (88)| 00:00:01 |
|   1 |  NESTED LOOPS      |      |  1000K|    45M|   838  (88)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T3   |  1000 | 24000 |     3  (34)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T4   |  1000 | 24000 |     1 (100)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T3"."C1"="T4"."C1")
 
SELECT /*+ USE_NL(T3 T4) INDEX(T4) */
  T3.C1,
  T3.C2,
  T4.C1,
  T4.C2
FROM
  T3,
  T4
WHERE
  T3.C1 = T4.C1;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2532358356

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |  1000K|    45M|   844  (87)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |  1000K|    45M|   844  (87)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T4        |  1000 | 24000 |     9  (23)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | IND_T4_C1 |  1000 |       |     4  (25)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | T3        |  1000 | 24000 |     1 (100)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T3"."C1"="T4"."C1")

Notice now that the nested loops join is projected to produce 1000K (1,000 * 1,000) rows – essentially a Cartesian join.

I am beginning to wonder if the original poster in the OTN thread might have forgotten to join on the M.RECORD_REFERENCE_NO column, if such a column exists.

Regarding the cardinality problem on the line in the execution plan identified by ID 6, I recall reading something that Jonathan Lewis wrote some time ago. After reading the article I remember writing something like this: when functions are involved, all possible outcomes of an inequality total 11% of the rows in the table (there is more to the story). A less than inequality will cause the optimizer to estimate that 5% of the rows will be returned, a greater than inequality will cause the optimizer to estimate that 5% of the rows will be returned, and an equality will cause the optimizer to estimate that 1% of the rows will be returned. A demonstration, first doing it correctly:

SELECT
  *
FROM
  T1
WHERE
  C1<1000;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   999 | 24975 |     3  (34)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   999 | 24975 |     3  (34)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1"<1000)

The less than inequality has correctly estimated that 999 rows will be returned.

SELECT
  *
FROM
  T1
WHERE
  C1=1000;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 683303157

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |    25 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |     1 |    25 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_C1 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"=1000)

The equality has caused the optimizer to correctly estimate that 1 row will be returned.

SELECT
  *
FROM
  T1
WHERE
  C1>1000;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    25 |     3  (34)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    25 |     3  (34)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C1">1000)

The optimizer has predicted that one row will be returned by the above since the statistics indicate that 1,000 is the maximum value for the column.

Now let’s try again, this time telling Oracle to convert column C1’s values to a number (that column is already defined as NUMBER):

SELECT
  *
FROM
  T1
WHERE
  TO_NUMBER(C1)<1000;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1250 |     4  (50)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    50 |  1250 |     4  (50)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER(TO_CHAR("C1"))<1000)

SELECT
  *
FROM
  T1
WHERE
  TO_NUMBER(C1)=1000;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    25 |     4  (50)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    25 |     4  (50)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER(TO_CHAR("C1"))=1000)

SELECT
  *
FROM
  T1
WHERE
  TO_NUMBER(C1)>1000;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1250 |     4  (50)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    50 |  1250 |     4  (50)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER(TO_CHAR("C1"))>1000)

In the above, less than is projected to return 5% of the rows, greater than is projected to return 5% of the rows, and equal to is projected to return 0.1% of the rows, making all possible outcomes of the inequality comparison equal to 10.1% of the rows in the table. I think that this might show why I was initially concerned when I saw the TO_NUMBER (edit Nov 1, 2010: TO_DATE) function wrapping one of the OP’s columns.
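Restating the arithmetic as a sketch (the 5% and 0.1% figures are read off the plans above; they are observations from this test, not documented guarantees):

```python
# Fixed selectivity guesses the optimizer appears to fall back on when a
# function hides the column, as observed in the plans above.
table_rows = 1000

less_than_rows = table_rows * 0.05     # 50 rows estimated for "<"
greater_than_rows = table_rows * 0.05  # 50 rows estimated for ">"
equality_rows = 1                      # 1 row estimated for "=", i.e. 0.1%

total_fraction = (less_than_rows + greater_than_rows + equality_rows) / table_rows
print(total_fraction)  # 0.101 -> the 10.1% figure mentioned above
```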

30 10 2010
Narendra

I think that this might show why I was initially concerned when I saw the TO_NUMBER function wrapping one of the OP’s columns
Charles,
I think you meant to say the TO_DATE function in place of TO_NUMBER, right? But thanks for explaining the logic used to derive the estimated cardinality of step 6. Learned something new today.
However, I am still not convinced about the terminology “cartesian join” being used to describe the estimated cardinality. To me, it is just the way EXPLAIN PLAN reports the estimated cardinalities. In the case of a NESTED LOOPS join, the join condition is reported against the inner table access, and hence the corresponding estimated cardinality is the number of rows for each record of the driving table. In the case of a HASH join, the plan output reports the join condition being applied at the HASH JOIN step (and not against any individual row source), and hence the estimated cardinality of the join step is never reported as the product of the two row sources, which happens in the case of a NESTED LOOPS join step. All I am saying is that the estimated cardinality of the JOIN step in this case is not due to any missing join condition, but is simply derived from the estimated cardinalities of the two row sources being joined. I have generally observed MERGE JOIN CARTESIAN being reported by EXPLAIN PLAN when there is a missing join condition (but what do I know :) )
Maybe we are saying the same thing but, as I said, the term “cartesian join” does not seem the right way to describe this.

1 11 2010
Charles Hooper

Narendra,

Thank you for the correction regarding TO_DATE appearing in the OP’s post rather than TO_NUMBER. I suspect that I should have spent a little longer proofreading my previous comment.

Regarding my comment about the nested loops join actually being a Cartesian join, one of Oracle’s definitions is found here:

http://download.oracle.com/docs/cd/E14072_01/server.112/e10821/optimops.htm#BABDGFIE

“A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.”

So, if one table has 286 rows and another table has 408 rows, a Cartesian join of the two tables will produce 116,688 rows. The execution plan at the start of this blog article shows that the optimizer is predicting that 116K (roughly 116,000) rows will be returned by the nested loops operation at the execution plan line identified by ID 5. So, this is a Cartesian join, even though the predicate information section of the execution plan shows that we are driving into the PK_DATA_CORPAUTHORIZATION using a join condition:

7 - access("R"."FILE_ID"="M"."FILE_ID")

I do not believe that this is just the way EXPLAIN PLAN reports estimates for nested loops joins, and the test case in my previous comment in this thread shows why. Compare the predicted number of rows to be returned by the nested loops join in the following two execution plans (from my previous comment in this thread):

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |  1000 | 50000 |   897  (85)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |  1000 | 50000 |   897  (85)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2        |  1000 | 25000 |    10  (20)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | IND_T2_C1 |  1000 |       |     4  (25)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | T1        |     1 |    25 |     1 (100)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T1"."C1"="T2"."C1")

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |  1000K|    45M|   844  (87)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |  1000K|    45M|   844  (87)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T4        |  1000 | 24000 |     9  (23)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | IND_T4_C1 |  1000 |       |     4  (25)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | T3        |  1000 | 24000 |     1 (100)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T3"."C1"="T4"."C1")

Notice that the first execution plan estimates that the nested loops join will return 1,000 rows, while the second is estimated to return 1,000,000 (1000K) rows (tables T1, T2, T3, T4 all contain 1,000 rows).

1 11 2010
Narendra

Charles,

Only because you are saying this have I started doubting myself now ;)
Let me see if I manage to explain myself better.
Both of your examples show that the cardinality of a NESTED LOOPS join step in EXPLAIN PLAN is ALWAYS reported as the estimated cardinality of the driving row source multiplied by the estimated cardinality of the inner row source.
In your first example, the driving row source is T2 (which is accessed using an index) and its estimated cardinality is 1000, whereas the inner row source is T1 and its estimated cardinality is reported as 1. The estimated cardinality of the NESTED LOOPS join step is reported as 1000 (1000 * 1).
In your second example, the driving row source is T4 (which is accessed using an index) and its estimated cardinality is 1000, whereas the inner row source is T3 and its estimated cardinality is reported as 1000. The estimated cardinality of the NESTED LOOPS join step is reported as 1000K (1000 * 1000).
Basically, the NESTED LOOPS join is processed as follows (pseudocode):

For each record in driving row source
Loop
   fetch matching data from inner row source
   Output the row
End Loop
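That pseudocode can be rendered as a minimal Python sketch (illustrative only: it uses plain lists of dictionaries and scans the inner source rather than probing an index):

```python
# Minimal nested loops join: probe the inner row source once per driving
# row and emit each matching pair.
def nested_loops_join(outer, inner, key):
    result = []
    for outer_row in outer:          # for each record in driving row source
        for inner_row in inner:      # fetch matching data from inner row source
            if outer_row[key] == inner_row[key]:
                result.append((outer_row, inner_row))  # output the row
    return result

t1 = [{"c1": n} for n in range(3)]   # 3 rows, 3 distinct join values
t2 = [{"c1": n} for n in range(3)]
print(len(nested_loops_join(t1, t2, "c1")))  # 3
```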

I am absolutely sure that this case does not represent a cartesian join. Basically, in a NESTED LOOPS join, the inner row source is always accessed once for each record of the driving row source. This can be confirmed from the value of the “Starts” column in the DBMS_XPLAN.DISPLAY_CURSOR output.
To me, a cartesian join comes into the picture when two row sources are retrieved independently of each other and each row (from one row source) is joined to each row (of the other row source), which takes place when either:
1) the join condition is missing, OR
2) the revised NESTED LOOPS join mechanism introduced in 9i and used till 10g (also called “Table Prefetch”) comes into effect and the entire join condition is not satisfied by the index, provided the inner row source is accessed using an index, OR
3) both row sources are joined based on an inequality operator, in which case MERGE JOIN CARTESIAN is used.

1 11 2010
Charles Hooper

I agree with the first half of your comment (I probably misunderstood your previous comments).

I will again argue that the second of the above execution plans in my previous comment is in fact showing a Cartesian join. Each table has 1,000 rows. If two 1,000 row tables are joined, and the result is 1,000,000 rows, then we are in fact seeing a Cartesian join, even though the execution plan reports that a nested loops join was performed. As was the case for the OP’s execution plan, where R.FILE_ID=M.FILE_ID was specified, and in my case where T3.C1=T4.C1 was specified, the join condition is *essentially meaningless* because the same value is recorded in that column for all rows in both tables. That means all rows in one table match the join condition for all rows in the second table. That is one of the reasons why I made note of the OP’s execution plan as a possibly interesting execution plan.
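The degenerate case is easy to verify outside the database: when every row carries the same join-column value, the equality predicate filters nothing, and the equi-join produces the full Cartesian product (a sketch mirroring the T3/T4 data):

```python
# Every row in both tables carries the same join value, like T3.C1 and
# T4.C1 above; the join condition matches every pair of rows.
rows_t3 = [{"c1": 1} for _ in range(1000)]  # 1 distinct value in 1,000 rows
rows_t4 = [{"c1": 1} for _ in range(1000)]  # 1 distinct value in 1,000 rows

matches = sum(1 for a in rows_t3 for b in rows_t4 if a["c1"] == b["c1"])
print(matches)  # 1000000 -- the 1000K estimate in the plan
```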

1 11 2010
Narendra

Charles,

Ok. Now I understand what you were trying to say with your second test case. I am too slow :( My sincere apologies for dragging this one out for so long.
I agree that, because of the nature of the data (in the second case), the join effectively becomes a cartesian join. But in that case, I am not sure why the optimizer did not decide to use a MERGE JOIN CARTESIAN or a HASH join. Probably it is due to the way the CBO calculates the costs of the various join mechanisms.
