Which is Most Efficient when Deleting Rows: EXISTS, IN, or a VIEW

29 12 2009

December 29, 2009

In the process of reading the book “Oracle SQL Recipes” I encountered a couple interesting statements.  One such statement is found on page 68 in recipe 3-5 Removing Rows Based on Data in Other Tables.  The book states the following:

“Whether you use IN or EXISTS depends on the sizes of the driving table (the outer table referenced in the SELECT, UPDATE, or DELETE) and the size of the result set in the subquery.  Using IN is most likely better if the results of the subquery are small or is a list of constants.  However, using EXISTS may run a lot more efficiently since the implicit JOIN may take advantage of indexes.”

The above comment sounds a bit like this one from the book “Expert Oracle Database 11g Administration” on page 1067 that might be describing just the behavior of SELECT statements, or might also be describing the behavior of DELETE statements:

“Subqueries perform better when you use IN rather than EXISTS.  Oracle recommends using the IN clause if the subquery has the selective WHERE clause.  If the parent query contains the selective WHERE clause, use the EXISTS clause rather than the IN clause.”

Both are interesting statements.  The “Oracle SQL Recipes” book provided three different SQL statements for removing rows in one table based on the contents of another table – those SQL statements used the demo data available for Oracle 11g.  I do not have the demo data loaded, so I wondered – what if I set up a simple test case?

Which method do you think will execute faster using the supplied test case?

The sample data, 1,000,000 rows in table T1 and 333,000 rows in T2:

CREATE TABLE T1 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

CREATE TABLE T2 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD('A',300,'A')
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  T2
SELECT
  ROWNUM*3,
  LPAD('A',300,'A')
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=333) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T2')

The sample DELETE statements:

DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1));

——–

Do not scroll down – you will spoil the follow up.  Which do you think is the most efficient method based on what is written above, and what you know about the behavior of recent Oracle releases (say 10.1 and later)?

Let’s say you ran the following script:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DELETE_METHODS';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SET TIMING ON

SPOOL delete_methods.txt

DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1));

ROLLBACK;

SPOOL OFF

ALTER SESSION SET EVENTS '10053 trace name context off';

Now which do you think is most efficient?  Do not scroll down - you will spoil the follow up.

Let’s say that in the Oracle Database 11.1.0.7 10053 trace file created by the above script, you find the following:

******************************************
----- Current SQL Statement for this session (sql_id=7mc4cvm67pns3) -----
DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"
...
============
Plan Table
============
-------------------------------------------+-----------------------------------+
| Id  | Operation            | Name        | Rows  | Bytes | Cost  | Time      |
-------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT     |             |       |       |  2025 |           |
| 1   |  DELETE              | T1          |       |       |       |           |
| 2   |   NESTED LOOPS       |             |  325K | 3252K |  2025 |  00:00:25 |
| 3   |    INDEX FULL SCAN   | SYS_C0030271|  977K | 4883K |  1892 |  00:00:23 |
| 4   |    INDEX UNIQUE SCAN | SYS_C0030272|     1 |     5 |     0 |           |
-------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("C1"="C1")

...
...
...

******************************************
----- Current SQL Statement for this session (sql_id=ahq5s54mw5a2g) -----
DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T2"."C1"="T1"."C1"
...
============
Plan Table
============
-------------------------------------------+-----------------------------------+
| Id  | Operation            | Name        | Rows  | Bytes | Cost  | Time      |
-------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT     |             |       |       |  2025 |           |
| 1   |  DELETE              | T1          |       |       |       |           |
| 2   |   NESTED LOOPS SEMI  |             |  325K | 3252K |  2025 |  00:00:25 |
| 3   |    INDEX FULL SCAN   | SYS_C0030271|  977K | 4883K |  1892 |  00:00:23 |
| 4   |    INDEX UNIQUE SCAN | SYS_C0030272|  108K |  541K |     0 |           |
-------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("T2"."C1"="T1"."C1")

...
...
...

******************************************
----- Current SQL Statement for this session (sql_id=61r3vxah952fu) -----
DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1))
*******************************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"
...
============
Plan Table
============
-------------------------------------------+-----------------------------------+
| Id  | Operation            | Name        | Rows  | Bytes | Cost  | Time      |
-------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT     |             |       |       |  2025 |           |
| 1   |  DELETE              | T1          |       |       |       |           |
| 2   |   NESTED LOOPS       |             |  325K | 3252K |  2025 |  00:00:25 |
| 3   |    INDEX FULL SCAN   | SYS_C0030271|  977K | 4883K |  1892 |  00:00:23 |
| 4   |    INDEX UNIQUE SCAN | SYS_C0030272|     1 |     5 |     0 |           |
-------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("T1"."C1"="T2"."C1")

Well, that is odd, three different queries sent in to the optimizer, and each time the optimizer showed the final query after transformation as the following:

SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"

Now which do you think is most efficient? 

The plans written in the 10053 trace file for Oracle 10.2.0.4 were slightly different:

Current SQL statement for this session:
DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2)

============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation               | Name       | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT        |            |       |       |  1132 |           |
| 1   |  DELETE                 | T1         |       |       |       |           |
| 2   |   NESTED LOOPS          |            |  328K | 3283K |  1132 |  00:00:14 |
| 3   |    INDEX FAST FULL SCAN | SYS_C009783|  328K | 1642K |   173 |  00:00:03 |
| 4   |    INDEX UNIQUE SCAN    | SYS_C009782|     1 |     5 |     1 |  00:00:01 |
---------------------------------------------+-----------------------------------+

Predicate Information:
----------------------
4 - access("C1"="C1")

...
...
...
Current SQL statement for this session:
DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)

============
Plan Table
============
----------------------------------------------+-----------------------------------+
| Id  | Operation                | Name       | Rows  | Bytes | Cost  | Time      |
----------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT         |            |       |       |  1687 |           |
| 1   |  DELETE                  | T1         |       |       |       |           |
| 2   |   NESTED LOOPS           |            |  328K | 3283K |  1687 |  00:00:21 |
| 3   |    SORT UNIQUE           |            |  328K | 1642K |   173 |  00:00:03 |
| 4   |     INDEX FAST FULL SCAN | SYS_C009783|  328K | 1642K |   173 |  00:00:03 |
| 5   |    INDEX UNIQUE SCAN     | SYS_C009782|     1 |     5 |     1 |  00:00:01 |
----------------------------------------------+-----------------------------------+

Predicate Information:
----------------------
5 - access("T2"."C1"="T1"."C1")

...
...
...
Current SQL statement for this session:
DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1))

============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation               | Name       | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT        |            |       |       |  1132 |           |
| 1   |  DELETE                 | T1         |       |       |       |           |
| 2   |   NESTED LOOPS          |            |  328K | 3283K |  1132 |  00:00:14 |
| 3   |    INDEX FAST FULL SCAN | SYS_C009783|  328K | 1642K |   173 |  00:00:03 |
| 4   |    INDEX UNIQUE SCAN    | SYS_C009782|     1 |     5 |     1 |  00:00:01 |
---------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
4 - access("T1"."C1"="T2"."C1")

Do not scroll down - you will spoil the follow up.

Let’s say that you test the performance on a system that uses direct, asynchronous I/O (no OS caching) with DBMS_XPLAN using the following script:

SET TIMING OFF
SET PAGESIZE 2000
SET LINESIZE 140

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SPOOL delete_methods.txt

DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1));

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

SPOOL OFF

Here is the output when I ran the above script (slightly cleaned up):

SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

333000 rows deleted.

SQL>
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT           
-------------------------------------------------------------------------------------------------------
SQL_ID  3ng6d061qazr1, child number 0
-------------------------------------
DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2)                    

Plan hash value: 1028502382 

-------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |              |      1 |        |      0 |00:00:15.67 |     424K|  46351 |
|   1 |  DELETE             | T1           |      1 |        |      0 |00:00:15.67 |     424K|  46351 |
|   2 |   NESTED LOOPS      |              |      1 |    333K|    333K|00:00:01.10 |    7484 |   2577 |
|   3 |    INDEX FULL SCAN  | SYS_C0016151 |      1 |   1000K|   1000K|00:00:00.07 |    1877 |   1920 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0016152 |   1000K|      1 |    333K|00:00:01.08 |    5607 |    657 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):              
---------------------------------------------------              
   4 - access("C1"="C1")    

SQL> ROLLBACK;

Rollback complete.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1);

333000 rows deleted.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT           
-------------------------------------------------------------------------------------------------------
SQL_ID  1138j52zs4rqw, child number 0                            
-------------------------------------                            
DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1)  

Plan hash value: 4152631912 

-------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |              |      1 |        |      0 |00:00:15.69 |     424K|  46355 |
|   1 |  DELETE             | T1           |      1 |        |      0 |00:00:15.69 |     424K|  46355 |
|   2 |   NESTED LOOPS SEMI |              |      1 |    333K|    333K|00:00:01.38 |    7484 |   2577 |
|   3 |    INDEX FULL SCAN  | SYS_C0016151 |      1 |   1000K|   1000K|00:00:00.02 |    1877 |   1920 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0016152 |   1000K|    110K|    333K|00:00:01.11 |    5607 |    657 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):              
---------------------------------------------------              
   4 - access("T2"."C1"="T1"."C1")                               

SQL> ROLLBACK;

Rollback complete.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1));

333000 rows deleted.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT           
-------------------------------------------------------------------------------------------------------
SQL_ID  c4xz24vawqqbw, child number 0                            
-------------------------------------                            
DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1))                       

Plan hash value: 1028502382 

-------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |              |      1 |        |      0 |00:00:15.90 |     424K|  46355 |
|   1 |  DELETE             | T1           |      1 |        |      0 |00:00:15.90 |     424K|  46355 |
|   2 |   NESTED LOOPS      |              |      1 |    333K|    333K|00:00:03.69 |    7484 |   2577 |
|   3 |    INDEX FULL SCAN  | SYS_C0016151 |      1 |   1000K|   1000K|00:00:00.02 |    1877 |   1920 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0016152 |   1000K|      1 |    333K|00:00:01.13 |    5607 |    657 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):              
---------------------------------------------------              
   4 - access("T1"."C1"="T2"."C1")                               

SQL> ROLLBACK;

Rollback complete.

So, the most efficient execution plan is the first one (by 0.02 seconds), and the least efficient is the last one (0.21 seconds slower than the second SQL statement).  Oh, but there is a problem – compare the steps in the first and last execution plans.  OK, now compare the steps in the first and second execution plans.

OK, so the question remains: Which is most efficient when deleting rows: EXISTS, IN, or a VIEW?  It is pretty hard to tell when the optimizer automatically re-writes the queries that we submit.


Actions

Information

2 responses

29 01 2010
Raju

Hi charles,

Could you please show me who you got 0.02 secs on the first stmt execution.
What did you total up?

Regards
Raju

29 01 2010
Charles Hooper

It might be a problem with the way I wrote that sentence:
“So, the most efficient execution plan is the first one (by 0.02 seconds), ”

The word “by” in this case was intended to indicate that the first execution completed 0.02 seconds faster than the second execution, based on the A-Time column in the plan output.
I compared 00:00:15.67 to 00:00:15.69 … 15.69 – 15.67 = 0.02 seconds

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 139 other followers

%d bloggers like this: