December 29, 2009
In the process of reading the book “Oracle SQL Recipes” I encountered a couple interesting statements. One such statement is found on page 68 in recipe 3-5 Removing Rows Based on Data in Other Tables. The book states the following:
“Whether you use IN or EXISTS depends on the sizes of the driving table (the outer table referenced in the SELECT, UPDATE, or DELETE) and the size of the result set in the subquery. Using IN is most likely better if the results of the subquery are small or is a list of constants. However, using EXISTS may run a lot more efficiently since the implicit JOIN may take advantage of indexes.”
The above comment sounds a bit like this one from the book “Expert Oracle Database 11g Administration” on page 1067 that might be describing just the behavior of SELECT statements, or might also be describing the behavior of DELETE statements:
“Subqueries perform better when you use IN rather than EXISTS. Oracle recommends using the IN clause if the subquery has the selective WHERE clause. If the parent query contains the selective WHERE clause, use the EXISTS clause rather than the IN clause.”
Both are interesting statements. The “Oracle SQL Recipes” book provided three different SQL statements for removing rows in one table based on the contents of another table – those SQL statements used the demo data available for Oracle 11g. I do not have the demo data loaded, so I wondered – what if I set up a simple test case?
Which method do you think will execute faster using the supplied test case?
The sample data, 1,000,000 rows in table T1 and 333,000 rows in T2:
CREATE TABLE T1 ( C1 NUMBER, FILLER VARCHAR2(300), PRIMARY KEY (C1)); CREATE TABLE T2 ( C1 NUMBER, FILLER VARCHAR2(300), PRIMARY KEY (C1)); INSERT INTO T1 SELECT ROWNUM, LPAD('A',300,'A') FROM (SELECT ROWNUM NR FROM DUAL CONNECT BY LEVEL<=1000) V1, (SELECT ROWNUM NR FROM DUAL CONNECT BY LEVEL<=1000) V2; INSERT INTO T2 SELECT ROWNUM*3, LPAD('A',300,'A') FROM (SELECT ROWNUM NR FROM DUAL CONNECT BY LEVEL<=333) V1, (SELECT ROWNUM NR FROM DUAL CONNECT BY LEVEL<=1000) V2; COMMIT; EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1') EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T2')
The sample DELETE statements:
DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2); DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1); DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1));
——–
Do not scroll down – you will spoil the follow up. Which do you think is the most efficient method based on what is written above, and what you know about the behavior of recent Oracle releases (say 10.1 and later)?
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
Let’s say you ran the following script:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DELETE_METHODS'; ALTER SESSION SET EVENTS '10053 trace name context forever, level 1'; ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; SET TIMING ON SPOOL delete_methods.txt DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2); ROLLBACK; ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1); ROLLBACK; ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1)); ROLLBACK; SPOOL OFF ALTER SESSION SET EVENTS '10053 trace name context off';
Now which do you think is most efficient? Do not scroll down – you will spoil the follow up.
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
Let’s say that in the Oracle Database 11.1.0.7 10053 trace file created by the above script, you find the following:
****************************************** ----- Current SQL Statement for this session (sql_id=7mc4cvm67pns3) ----- DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2) ******************************************* ... Final query after transformations:******* UNPARSED QUERY IS ******* SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1" ... ============ Plan Table ============ -------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------+-----------------------------------+ | 0 | DELETE STATEMENT | | | | 2025 | | | 1 | DELETE | T1 | | | | | | 2 | NESTED LOOPS | | 325K | 3252K | 2025 | 00:00:25 | | 3 | INDEX FULL SCAN | SYS_C0030271| 977K | 4883K | 1892 | 00:00:23 | | 4 | INDEX UNIQUE SCAN | SYS_C0030272| 1 | 5 | 0 | | -------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 4 - access("C1"="C1") ... ... ... ****************************************** ----- Current SQL Statement for this session (sql_id=ahq5s54mw5a2g) ----- DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1) ******************************************* ... Final query after transformations:******* UNPARSED QUERY IS ******* SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T2"."C1"="T1"."C1" ... ============ Plan Table ============ -------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------+-----------------------------------+ | 0 | DELETE STATEMENT | | | | 2025 | | | 1 | DELETE | T1 | | | | | | 2 | NESTED LOOPS SEMI | | 325K | 3252K | 2025 | 00:00:25 | | 3 | INDEX FULL SCAN | SYS_C0030271| 977K | 4883K | 1892 | 00:00:23 | | 4 | INDEX UNIQUE SCAN | SYS_C0030272| 108K | 541K | 0 | | -------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 4 - access("T2"."C1"="T1"."C1") ... ... ... ****************************************** ----- Current SQL Statement for this session (sql_id=61r3vxah952fu) ----- DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1)) ******************************************* ... Final query after transformations:******* UNPARSED QUERY IS ******* SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1" ... ============ Plan Table ============ -------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------+-----------------------------------+ | 0 | DELETE STATEMENT | | | | 2025 | | | 1 | DELETE | T1 | | | | | | 2 | NESTED LOOPS | | 325K | 3252K | 2025 | 00:00:25 | | 3 | INDEX FULL SCAN | SYS_C0030271| 977K | 4883K | 1892 | 00:00:23 | | 4 | INDEX UNIQUE SCAN | SYS_C0030272| 1 | 5 | 0 | | -------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 4 - access("T1"."C1"="T2"."C1")
Well, that is odd, three different queries sent in to the optimizer, and each time the optimizer showed the final query after transformation as the following:
SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."C1"="T2"."C1"
Now which do you think is most efficient?
The plans written in the 10053 trace file for Oracle 10.2.0.4 were slightly different:
Current SQL statement for this session: DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2) ============ Plan Table ============ ---------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------------+-----------------------------------+ | 0 | DELETE STATEMENT | | | | 1132 | | | 1 | DELETE | T1 | | | | | | 2 | NESTED LOOPS | | 328K | 3283K | 1132 | 00:00:14 | | 3 | INDEX FAST FULL SCAN | SYS_C009783| 328K | 1642K | 173 | 00:00:03 | | 4 | INDEX UNIQUE SCAN | SYS_C009782| 1 | 5 | 1 | 00:00:01 | ---------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 4 - access("C1"="C1") ... ... ... Current SQL statement for this session: DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1) ============ Plan Table ============ ----------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------+-----------------------------------+ | 0 | DELETE STATEMENT | | | | 1687 | | | 1 | DELETE | T1 | | | | | | 2 | NESTED LOOPS | | 328K | 3283K | 1687 | 00:00:21 | | 3 | SORT UNIQUE | | 328K | 1642K | 173 | 00:00:03 | | 4 | INDEX FAST FULL SCAN | SYS_C009783| 328K | 1642K | 173 | 00:00:03 | | 5 | INDEX UNIQUE SCAN | SYS_C009782| 1 | 5 | 1 | 00:00:01 | ----------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 5 - access("T2"."C1"="T1"."C1") ... ... ... Current SQL statement for this session: DELETE (SELECT C1 FROM T1 JOIN T2 USING (C1)) ============ Plan Table ============ ---------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------------+-----------------------------------+ | 0 | DELETE STATEMENT | | | | 1132 | | | 1 | DELETE | T1 | | | | | | 2 | NESTED LOOPS | | 328K | 3283K | 1132 | 00:00:14 | | 3 | INDEX FAST FULL SCAN | SYS_C009783| 328K | 1642K | 173 | 00:00:03 | | 4 | INDEX UNIQUE SCAN | SYS_C009782| 1 | 5 | 1 | 00:00:01 | ---------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 4 - access("T1"."C1"="T2"."C1")
Do not scroll down – you will spoil the follow up.
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
Let’s say that you test the performance on a system that uses direct, asynchronous I/O (no OS caching) with DBMS_XPLAN using the following script:
SET TIMING OFF SET PAGESIZE 2000 SET LINESIZE 140 ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; SPOOL delete_methods.txt DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2); SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); ROLLBACK; ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1); SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); ROLLBACK; ALTER SYSTEM FLUSH BUFFER_CACHE; ALTER SYSTEM FLUSH BUFFER_CACHE; DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1)); SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); ROLLBACK; SPOOL OFF
Here is the output when I ran the above script (slightly cleaned up):
SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2); 333000 rows deleted. SQL> SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------- SQL_ID 3ng6d061qazr1, child number 0 ------------------------------------- DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE C1 IN (SELECT C1 FROM T2) Plan hash value: 1028502382 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------- | 0 | DELETE STATEMENT | | 1 | | 0 |00:00:15.67 | 424K| 46351 | | 1 | DELETE | T1 | 1 | | 0 |00:00:15.67 | 424K| 46351 | | 2 | NESTED LOOPS | | 1 | 333K| 333K|00:00:01.10 | 7484 | 2577 | | 3 | INDEX FULL SCAN | SYS_C0016151 | 1 | 1000K| 1000K|00:00:00.07 | 1877 | 1920 | |* 4 | INDEX UNIQUE SCAN| SYS_C0016152 | 1000K| 1 | 333K|00:00:01.08 | 5607 | 657 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("C1"="C1") SQL> ROLLBACK; Rollback complete. SQL> ALTER SYSTEM FLUSH BUFFER_CACHE; System altered. SQL> ALTER SYSTEM FLUSH BUFFER_CACHE; System altered. SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1); 333000 rows deleted. SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------- SQL_ID 1138j52zs4rqw, child number 0 ------------------------------------- DELETE /*+ GATHER_PLAN_STATISTICS */ FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C1=T1.C1) Plan hash value: 4152631912 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------- | 0 | DELETE STATEMENT | | 1 | | 0 |00:00:15.69 | 424K| 46355 | | 1 | DELETE | T1 | 1 | | 0 |00:00:15.69 | 424K| 46355 | | 2 | NESTED LOOPS SEMI | | 1 | 333K| 333K|00:00:01.38 | 7484 | 2577 | | 3 | INDEX FULL SCAN | SYS_C0016151 | 1 | 1000K| 1000K|00:00:00.02 | 1877 | 1920 | |* 4 | INDEX UNIQUE SCAN| SYS_C0016152 | 1000K| 110K| 333K|00:00:01.11 | 5607 | 657 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T2"."C1"="T1"."C1") SQL> ROLLBACK; Rollback complete. SQL> ALTER SYSTEM FLUSH BUFFER_CACHE; System altered. SQL> ALTER SYSTEM FLUSH BUFFER_CACHE; System altered. SQL> DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1)); 333000 rows deleted. SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------- SQL_ID c4xz24vawqqbw, child number 0 ------------------------------------- DELETE /*+ GATHER_PLAN_STATISTICS */ (SELECT C1 FROM T1 JOIN T2 USING (C1)) Plan hash value: 1028502382 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------- | 0 | DELETE STATEMENT | | 1 | | 0 |00:00:15.90 | 424K| 46355 | | 1 | DELETE | T1 | 1 | | 0 |00:00:15.90 | 424K| 46355 | | 2 | NESTED LOOPS | | 1 | 333K| 333K|00:00:03.69 | 7484 | 2577 | | 3 | INDEX FULL SCAN | SYS_C0016151 | 1 | 1000K| 1000K|00:00:00.02 | 1877 | 1920 | |* 4 | INDEX UNIQUE SCAN| SYS_C0016152 | 1000K| 1 | 333K|00:00:01.13 | 5607 | 657 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."C1"="T2"."C1") SQL> ROLLBACK; Rollback complete.
So, the most efficient execution plan is the first one (by 0.02 seconds), and the least efficient is the last one (0.21 seconds slower than the second SQL statement). Oh, but there is a problem – compare the steps in the first and last execution plans. OK, now compare the steps in the first and second execution plans.
OK, so the question remains: Which is most efficient when deleting rows: EXISTS, IN, or a VIEW? It is pretty hard to tell when the optimizer automatically re-writes the queries that we submit.
Hi charles,
Could you please show me who you got 0.02 secs on the first stmt execution.
What did you total up?
Regards
Raju
It might be a problem with the way I wrote that sentence:
“So, the most efficient execution plan is the first one (by 0.02 seconds), ”
The word “by” in this case was intended to indicate that the first execution completed 0.02 seconds faster than the second execution, based on the A-Time column in the plan output.
I compared 00:00:15.67 to 00:00:15.69 … 15.69 – 15.67 = 0.02 seconds