Which is Most Efficient when Selecting Rows: EXISTS, IN, or a JOIN

29 12 2009

December 29, 2009

This post is a follow up to the previous post that questioned which approach is most efficient when deleting rows.

The Oracle documentation for 10g R2 states the following

“In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.”

“Sometimes, Oracle can rewrite a subquery when used with an IN clause to take advantage of selectivity specified in the subquery. This is most beneficial when the most selective filter appears in the subquery and there are indexes on the join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.”

The above quotes seems to be similar to the quotes provided in the earlier blog article.  Let’s set up the same test tables as were used in the previous blog article:

CREATE TABLE T1 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

CREATE TABLE T2 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD('A',300,'A')
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  T2
SELECT
  ROWNUM*3,
  LPAD('A',300,'A')
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=333) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T2')

Next, we will set up to capture a 10053 trace for three SQL statements:

  1. Uncorrelated IN subquery
  2. Correlated IN subquery
  3. EXISTS subquery
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SELECT_METHODS';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET TIMING ON

SPOOL select_methods.txt

SELECT C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2 WHERE T2.C1=T1.C1);

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT C1 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

SPOOL OFF

ALTER SESSION SET EVENTS '10053 trace name context off';

On Oracle Database 11.1.0.7, I received the following output in the select_methods.txt file (slightly cleaned up):

SQL> SELECT C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

Elapsed: 00:00:43.54

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

SQL> SELECT C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2 WHERE T2.C1=T1.C1);

Elapsed: 00:00:43.18

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

SQL> SELECT C1 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

Elapsed: 00:00:43.33

The above output seems to indicate that the correlated IN subquery completed the fastest, followed by the EXISTS subquery, and then the uncorrelated IN subquery.  Let’s check the 10053 trace file.

******************************************
----- Current SQL Statement for this session (sql_id=19gjmx8y5rcg9) -----
SELECT C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T1"."C1" "C1" FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"
...
============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation              | Name        | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT       |             |       |       |   576 |           |
| 1   |  NESTED LOOPS          |             |  325K | 3252K |   576 |  00:00:07 |
| 2   |   INDEX FAST FULL SCAN | SYS_C0016167|  977K | 4883K |   514 |  00:00:07 |
| 3   |   INDEX UNIQUE SCAN    | SYS_C0016168|     1 |     5 |     0 |           |
---------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
3 - access("C1"="C1")

...
...
...

******************************************
----- Current SQL Statement for this session (sql_id=5hkw5pa3stxvr) -----
SELECT C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2 WHERE T2.C1=T1.C1)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T1"."C1" "C1" FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1" AND "T2"."C1"="T1"."C1"
...
============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation              | Name        | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT       |             |       |       |   576 |           |
| 1   |  NESTED LOOPS          |             |  325K | 3252K |   576 |  00:00:07 |
| 2   |   INDEX FAST FULL SCAN | SYS_C0016167|  977K | 4883K |   514 |  00:00:07 |
| 3   |   INDEX UNIQUE SCAN    | SYS_C0016168|     1 |     5 |     0 |           |
---------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
3 - access("C1"="C1")

...
...
...

******************************************
----- Current SQL Statement for this session (sql_id=4x73yz9t7dazj) -----
SELECT C1 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T1"."C1" "C1" FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T2"."C1"="T1"."C1"
...
============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation              | Name        | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT       |             |       |       |   576 |           |
| 1   |  NESTED LOOPS SEMI     |             |  325K | 3252K |   576 |  00:00:07 |
| 2   |   INDEX FAST FULL SCAN | SYS_C0016167|  977K | 4883K |   514 |  00:00:07 |
| 3   |   INDEX UNIQUE SCAN    | SYS_C0016168|  108K |  541K |     0 |           |
---------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
3 - access("T2"."C1"="T1"."C1")

Notice anything strange about the “Final query after transformations”?  The transformed uncorrelated IN subquery and the transformed EXISTS subquery are identical (if you ignore that T1.C1= T2.C1 is actually listed as T2.C1=T1.C1.  Notice also that the transformed queries are now standard joins rather than the IN query being transformed into an EXISTS query, or vice-versa.  The transformed version of the correlated IN subquery is a bit odd – it seems that the query optimizer in 11.1.0.7 did not automatically eliminate the duplicate T1.C1= T2.C1 (if I recall correctly 10.2.0.4 does remove the duplicate predicate when writing the “Final query”).  OK, the transformed versions of the SQL statements are for all purposes identical.  The plan for the EXISTS query shows that the join is a NESTED LOOPS SEMI, while the other plans show a join of NESTED LOOPS.

Let’s look at the DBMS_XPLAN output (note that STATISTICS_LEVEL was set to the default of TYPICAL):

SET TIMING OFF
SET AUTOTRACE OFF

SET PAGESIZE 2000
SET LINESIZE 140

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SPOOL select_methods2.txt

SELECT /*+ GATHER_PLAN_STATISTICS */ C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ GATHER_PLAN_STATISTICS */ C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2 WHERE T2.C1=T1.C1);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ GATHER_PLAN_STATISTICS */ C1 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SPOOL OFF 

The output of the above is as follows (note that the output has been cleaned up slightly):

SQL_ID  gwd4d6sk75hhk, child number 0                                
-------------------------------------                                
SELECT /*+ GATHER_PLAN_STATISTICS */ C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2)                          
Plan hash value: 319633161        

---------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |      1 |        |    333K|00:00:00.05 |   64418 |   2547 |
|   1 |  NESTED LOOPS         |              |      1 |    333K|    333K|00:00:00.05 |   64418 |   2547 |
|   2 |   INDEX FAST FULL SCAN| SYS_C0016167 |      1 |   1000K|   1000K|00:00:00.02 |   24052 |   1883 |
|*  3 |   INDEX UNIQUE SCAN   | SYS_C0016168 |   1000K|      1 |    333K|00:00:00.84 |   40366 |    664 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):                  
---------------------------------------------------                  
   3 - access("C1"="C1")          

-

SQL_ID  fcm7ptb886bzt, child number 0                                
-------------------------------------                                
SELECT /*+ GATHER_PLAN_STATISTICS */ C1 FROM T1 WHERE C1 IN (SELECT C1 FROM T2 WHERE T2.C1=T1.C1)        

Plan hash value: 319633161        

---------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |      1 |        |    333K|00:00:00.09 |   64418 |   2547 |
|   1 |  NESTED LOOPS         |              |      1 |    333K|    333K|00:00:00.09 |   64418 |   2547 |
|   2 |   INDEX FAST FULL SCAN| SYS_C0016167 |      1 |   1000K|   1000K|00:00:00.03 |   24052 |   1883 |
|*  3 |   INDEX UNIQUE SCAN   | SYS_C0016168 |   1000K|      1 |    333K|00:00:01.06 |   40366 |    664 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):                  
---------------------------------------------------                  
   3 - access("C1"="C1")          

-

SQL_ID  4ym7p8815nquy, child number 0                                
-------------------------------------                                
SELECT /*+ GATHER_PLAN_STATISTICS */ C1 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)        

Plan hash value: 2371405353       

---------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |      1 |        |    333K|00:00:00.06 |   64418 |   2547 |
|   1 |  NESTED LOOPS SEMI    |              |      1 |    333K|    333K|00:00:00.06 |   64418 |   2547 |
|   2 |   INDEX FAST FULL SCAN| SYS_C0016167 |      1 |   1000K|   1000K|00:00:00.03 |   24052 |   1883 |
|*  3 |   INDEX UNIQUE SCAN   | SYS_C0016168 |   1000K|    110K|    333K|00:00:00.94 |   40366 |    664 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):                  
---------------------------------------------------                  
   3 - access("T2"."C1"="T1"."C1")

The output captured during the 10053 trace implied: “The above output seems to indicate that the correlated IN subquery completed the fastest, followed by the EXISTS subquery, and then the uncorrelated IN subquery.”  The output captured during the DBMS_XPLAN script seems to show that the uncorrelated subquery completed the fastest, followed by the EXISTS subquery, and then the correlated subquery.

OK, so the question remains: Which is most efficient when selecting rows: EXISTS, IN, or a JOIN?  It is pretty hard to tell when the optimizer automatically re-writes the queries that we submit.

It might be interesting to recheck the output with a new test case that permits:

  1. NULL values in one or both tables.
  2. Duplicate values in one or both tables.
  3. Larger data sets.




Which is Most Efficient when Deleting Rows: EXISTS, IN, or a VIEW

29 12 2009

December 29, 2009

In the process of reading the book “Oracle SQL Recipes” I encountered a couple interesting statements.  One such statement is found on page 68 in recipe 3-5 Removing Rows Based on Data in Other Tables.  The book states the following:

“Whether you use IN or EXISTS depends on the sizes of the driving table (the outer table referenced in the SELECT, UPDATE, or DELETE) and the size of the result set in the subquery.  Using IN is most likely better if the results of the subquery are small or is a list of constants.  However, using EXISTS may run a lot more efficiently since the implicit JOIN may take advantage of indexes.”

The above comment sounds a bit like this one from the book “Expert Oracle Database 11g Administration” on page 1067 that might be describing just the behavior of SELECT statements, or might also be describing the behavior of DELETE statements:

“Subqueries perform better when you use IN rather than EXISTS.  Oracle recommends using the IN clause if the subquery has the selective WHERE clause.  If the parent query contains the selective WHERE clause, use the EXISTS clause rather than the IN clause.”

Both are interesting statements.  The “Oracle SQL Recipes” book provided three different SQL statements for removing rows in one table based on the contents of another table – those SQL statements used the demo data available for Oracle 11g.  I do not have the demo data loaded, so I wondered – what if I set up a simple test case?

Which method do you think will execute faster using the supplied test case?

The sample data, 1,000,000 rows in table T1 and 333,000 rows in T2:

CREATE TABLE T1 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

CREATE TABLE T2 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD('A',300,'A')
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  T2
SELECT
  ROWNUM*3,
  LPAD('A',300,'A')
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=333) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T2')

The sample DELETE statements:

DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1));

——–

Do not scroll down – you will spoil the follow up.  Which do you think is the most efficient method based on what is written above, and what you know about the behavior of recent Oracle releases (say 10.1 and later)?

Let’s say you ran the following script:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DELETE_METHODS';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET TIMING ON

SPOOL delete_methods.txt

DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1));

ROLLBACK;

SPOOL OFF

ALTER SESSION SET EVENTS '10053 trace name context off';

Now which do you think is most efficient?  Do not scroll down – you will spoil the follow up.

Let’s say that in the Oracle Database 11.1.0.7 10053 trace file created by the above script, you find the following:

******************************************
----- Current SQL Statement for this session (sql_id=7mc4cvm67pns3) -----
DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"
...
============
Plan Table
============
-------------------------------------------+-----------------------------------+
| Id  | Operation            | Name        | Rows  | Bytes | Cost  | Time      |
-------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT     |             |       |       |  2025 |           |
| 1   |  DELETE              | T1          |       |       |       |           |
| 2   |   NESTED LOOPS       |             |  325K | 3252K |  2025 |  00:00:25 |
| 3   |    INDEX FULL SCAN   | SYS_C0030271|  977K | 4883K |  1892 |  00:00:23 |
| 4   |    INDEX UNIQUE SCAN | SYS_C0030272|     1 |     5 |     0 |           |
-------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("C1"="C1")

...
...
...

******************************************
----- Current SQL Statement for this session (sql_id=ahq5s54mw5a2g) -----
DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T2"."C1"="T1"."C1"
...
============
Plan Table
============
-------------------------------------------+-----------------------------------+
| Id  | Operation            | Name        | Rows  | Bytes | Cost  | Time      |
-------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT     |             |       |       |  2025 |           |
| 1   |  DELETE              | T1          |       |       |       |           |
| 2   |   NESTED LOOPS SEMI  |             |  325K | 3252K |  2025 |  00:00:25 |
| 3   |    INDEX FULL SCAN   | SYS_C0030271|  977K | 4883K |  1892 |  00:00:23 |
| 4   |    INDEX UNIQUE SCAN | SYS_C0030272|  108K |  541K |     0 |           |
-------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("T2"."C1"="T1"."C1")

...
...
...

******************************************
----- Current SQL Statement for this session (sql_id=61r3vxah952fu) -----
DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1))
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"
...
============
Plan Table
============
-------------------------------------------+-----------------------------------+
| Id  | Operation            | Name        | Rows  | Bytes | Cost  | Time      |
-------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT     |             |       |       |  2025 |           |
| 1   |  DELETE              | T1          |       |       |       |           |
| 2   |   NESTED LOOPS       |             |  325K | 3252K |  2025 |  00:00:25 |
| 3   |    INDEX FULL SCAN   | SYS_C0030271|  977K | 4883K |  1892 |  00:00:23 |
| 4   |    INDEX UNIQUE SCAN | SYS_C0030272|     1 |     5 |     0 |           |
-------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("T1"."C1"="T2"."C1")

Well, that is odd, three different queries sent in to the optimizer, and each time the optimizer showed the final query after transformation as the following:

SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"

Now which do you think is most efficient? 

The plans written in the 10053 trace file for Oracle 10.2.0.4 were slightly different:

Current SQL statement for this session:
DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2)

============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation               | Name       | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT        |            |       |       |  1132 |           |
| 1   |  DELETE                 | T1         |       |       |       |           |
| 2   |   NESTED LOOPS          |            |  328K | 3283K |  1132 |  00:00:14 |
| 3   |    INDEX FAST FULL SCAN | SYS_C009783|  328K | 1642K |   173 |  00:00:03 |
| 4   |    INDEX UNIQUE SCAN    | SYS_C009782|     1 |     5 |     1 |  00:00:01 |
---------------------------------------------+-----------------------------------+

Predicate Information:
----------------------
4 - access("C1"="C1")

...
...
...
Current SQL statement for this session:
DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)

============
Plan Table
============
----------------------------------------------+-----------------------------------+
| Id  | Operation                | Name       | Rows  | Bytes | Cost  | Time      |
----------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT         |            |       |       |  1687 |           |
| 1   |  DELETE                  | T1         |       |       |       |           |
| 2   |   NESTED LOOPS           |            |  328K | 3283K |  1687 |  00:00:21 |
| 3   |    SORT UNIQUE           |            |  328K | 1642K |   173 |  00:00:03 |
| 4   |     INDEX FAST FULL SCAN | SYS_C009783|  328K | 1642K |   173 |  00:00:03 |
| 5   |    INDEX UNIQUE SCAN     | SYS_C009782|     1 |     5 |     1 |  00:00:01 |
----------------------------------------------+-----------------------------------+

Predicate Information:
----------------------
5 - access("T2"."C1"="T1"."C1")

...
...
...
Current SQL statement for this session:
DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1))

============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation               | Name       | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT        |            |       |       |  1132 |           |
| 1   |  DELETE                 | T1         |       |       |       |           |
| 2   |   NESTED LOOPS          |            |  328K | 3283K |  1132 |  00:00:14 |
| 3   |    INDEX FAST FULL SCAN | SYS_C009783|  328K | 1642K |   173 |  00:00:03 |
| 4   |    INDEX UNIQUE SCAN    | SYS_C009782|     1 |     5 |     1 |  00:00:01 |
---------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("T1"."C1"="T2"."C1")

Do not scroll down – you will spoil the follow up.

Let’s say that you test the performance on a system that uses direct, asynchronous I/O (no OS caching) with DBMS_XPLAN using the following script:

SET TIMING OFF
SET PAGESIZE 2000
SET LINESIZE 140

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SPOOL delete_methods.txt

DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1));

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

SPOOL OFF

Here is the output when I ran the above script (slightly cleaned up):

SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

333000 rows deleted.

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT           
-------------------------------------------------------------------------------------------------------
SQL_ID  3ng6d061qazr1, child number 0
-------------------------------------
DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2)                    

Plan hash value: 1028502382 

-------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |              |      1 |        |      0 |00:00:15.67 |     424K|  46351 |
|   1 |  DELETE             | T1           |      1 |        |      0 |00:00:15.67 |     424K|  46351 |
|   2 |   NESTED LOOPS      |              |      1 |    333K|    333K|00:00:01.10 |    7484 |   2577 |
|   3 |    INDEX FULL SCAN  | SYS_C0016151 |      1 |   1000K|   1000K|00:00:00.07 |    1877 |   1920 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0016152 |   1000K|      1 |    333K|00:00:01.08 |    5607 |    657 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):              
---------------------------------------------------              
   4 - access("C1"="C1")    

SQL> ROLLBACK;

Rollback complete.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

333000 rows deleted.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT           
-------------------------------------------------------------------------------------------------------
SQL_ID  1138j52zs4rqw, child number 0                            
-------------------------------------                            
DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)  

Plan hash value: 4152631912 

-------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |              |      1 |        |      0 |00:00:15.69 |     424K|  46355 |
|   1 |  DELETE             | T1           |      1 |        |      0 |00:00:15.69 |     424K|  46355 |
|   2 |   NESTED LOOPS SEMI |              |      1 |    333K|    333K|00:00:01.38 |    7484 |   2577 |
|   3 |    INDEX FULL SCAN  | SYS_C0016151 |      1 |   1000K|   1000K|00:00:00.02 |    1877 |   1920 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0016152 |   1000K|    110K|    333K|00:00:01.11 |    5607 |    657 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):              
---------------------------------------------------              
   4 - access("T2"."C1"="T1"."C1")                               

SQL> ROLLBACK;

Rollback complete.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1));

333000 rows deleted.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT           
-------------------------------------------------------------------------------------------------------
SQL_ID  c4xz24vawqqbw, child number 0                            
-------------------------------------                            
DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1))                       

Plan hash value: 1028502382 

-------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |              |      1 |        |      0 |00:00:15.90 |     424K|  46355 |
|   1 |  DELETE             | T1           |      1 |        |      0 |00:00:15.90 |     424K|  46355 |
|   2 |   NESTED LOOPS      |              |      1 |    333K|    333K|00:00:03.69 |    7484 |   2577 |
|   3 |    INDEX FULL SCAN  | SYS_C0016151 |      1 |   1000K|   1000K|00:00:00.02 |    1877 |   1920 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0016152 |   1000K|      1 |    333K|00:00:01.13 |    5607 |    657 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):              
---------------------------------------------------              
   4 - access("T1"."C1"="T2"."C1")                               

SQL> ROLLBACK;

Rollback complete.

So, the most efficient execution plan is the first one (by 0.02 seconds), and the least efficient is the last one (0.21 seconds slower than the second SQL statement).  Oh, but there is a problem – compare the steps in the first and last execution plans.  OK, now compare the steps in the first and second execution plans.

OK, so the question remains: Which is most efficient when deleting rows: EXISTS, IN, or a VIEW?  It is pretty hard to tell when the optimizer automatically re-writes the queries that we submit.








Follow

Get every new post delivered to your Inbox.

Join 144 other followers