The Effects of Potential NULL Values in Row Sources During Updates Using an IN Subquery

10 01 2010

January 10, 2010

A couple of years ago the following question appeared in the comp.databases.oracle.server Usenet group:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/f3c3dea939bf9c12

I’m doing the following update (in Oracle Database 9.2.0.5):

UPDATE table1 t1
SET field1 = NULL
WHERE field2 NOT IN (SELECT /*+ HASH_AJ */ t2.field2 FROM table2 t2)

Instead of doing the ANTIJOIN, The database is performing a FILTER on
table1 by reading table2. Why doesn’t the hint work? For several
tables other that table1 the hint does work.

I did not mention it at the time, but I am not sure if that hint is valid inside the subquery – I think that it needs to appear immediately after the UPDATE keyword.  Taking a guess at the original poster’s problem, I set up a simple test case using Oracle 10.2.0.2:

CREATE TABLE T1 (FIELD1 NUMBER(12), FIELD2 NUMBER(12) NOT NULL);
CREATE TABLE T2 (FIELD1 NUMBER(12), FIELD2 NUMBER(12) NOT NULL);

INSERT INTO
  T1
SELECT
  100,
  ROWNUM*3
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

INSERT INTO
  T2
SELECT
  100,
  ROWNUM*9
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1');
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2');

I enabled a 10053 trace and then executed the following SQL statement:

UPDATE t1
SET field1 = NULL
WHERE field2 NOT IN (SELECT /*+ HASH_AJ */ t2.field2 FROM t2)

The 10053 trace file showed the following plan:

============
Plan Table
============
-----------------------------------------+-----------------------------------+
| Id  | Operation              | Name    | Rows  | Bytes | Cost  | Time      |
-----------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT       |         |       |       |   307 |           |
| 1   |  UPDATE                | T1      |       |       |       |           |
| 2   |   HASH JOIN RIGHT ANTI |         |   208 |  2496 |   307 |  00:00:04 |
| 3   |    TABLE ACCESS FULL   | T2      |   98K |  488K |    44 |  00:00:01 |
| 4   |    TABLE ACCESS FULL   | T1      |   98K |  684K |    86 |  00:00:02 |
-----------------------------------------+-----------------------------------+

That looks like the plan that the OP would like to see.  Now, repeat the test with the following table definitions:

CREATE TABLE T1 (FIELD1 NUMBER(12), FIELD2 NUMBER(12));
CREATE TABLE T2 (FIELD1 NUMBER(12), FIELD2 NUMBER(12));

The 10053 trace file showed the following execution plan when executing the same SQL statement:

============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT     |         |       |       | 3886K |           |
| 1   |  UPDATE              | T1      |       |       |       |           |
| 2   |   FILTER             |         |       |       |       |           |
| 3   |    TABLE ACCESS FULL | T1      |   98K |  684K |    44 |  00:00:01 |
| 4   |    TABLE ACCESS FULL | T2      |     1 |     5 |    45 |  00:00:01 |
---------------------------------------+-----------------------------------+

My last comment in the post suggested that the problem might be that even though there might not be any NULL values in the columns, the column definitions might permit NULL values, and that alone might restrict the options that are available to the optimizer for re-writing the SQL statement into a more efficient form.

Let’s try another test case.  Let’s create two tables and see how a similar update statement performs with a larger dataset, for instance two tables with 10,000,000 rows each.  The table creation script follows:

CREATE TABLE T1 (COL1 NUMBER(12), COL2 NUMBER(12));
CREATE TABLE T2 (COL1 NUMBER(12), COL2 NUMBER(12));

INSERT INTO
  T1
SELECT
  100,
  ROWNUM*3
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  T2
SELECT
  100,
  ROWNUM*9
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL)

The test begins on Oracle 11.1.0.7 and later shifts to Oracle 10.2.0.4.  In the test script, we enable a 10053 trace, execute the UPDATE statement, display the execution plan with runtime statistics, move the hint and re-execute the UPDATE statement, remove the hint and specify COL2 IS NOT NULL in the WHERE clause, modify the table columns to add a NOT NULL constraint and try again with no HASH_AJ hint.  The script follows:

SPOOL ANTIJOIN_TEST11.TXT
SET LINESIZE 150
SET PAGESIZE 2000

UPDATE /*+ GATHER_PLAN_STATISTICS */
  T1
SET
  COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+ HASH_AJ */
      T2.COL2
    FROM
      T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST2';

UPDATE /*+ HASH_AJ GATHER_PLAN_STATISTICS */
  T1
SET
  COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST3';

UPDATE /*+ GATHER_PLAN_STATISTICS */
  T1
SET
  T1.COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2
    WHERE
      T2.COL2 IS NOT NULL)
  AND T1.COL2 IS NOT NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER TABLE T1 MODIFY COL2 NOT NULL;
ALTER TABLE T2 MODIFY COL2 NOT NULL;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST4';

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL,NO_INVALIDATE=>FALSE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL,NO_INVALIDATE=>FALSE)

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST5';
 
UPDATE /*+ GATHER_PLAN_STATISTICS */
  T1
SET
  COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2);
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));
 
ROLLBACK;

How did we do on Oracle 11.1.0.7?

The first update statement had an execution plan like this:

UPDATE /*+ GATHER_PLAN_STATISTICS */   T1 SET   COL1 = NULL WHERE                                                                          
COL2 NOT IN (     SELECT /*+ HASH_AJ */       T2.COL2     FROM       T2)                                                                   

Plan hash value: 875068713   

--------------------------------------------------------------------------------------------------------------------------------------------------      
| Id  | Operation                | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|      
--------------------------------------------------------------------------------------------------------------------------------------------------      
|   0 | UPDATE STATEMENT         |      |      1 |        |      0 |00:04:07.39 |    6898K|  85474 |  36766 |       |       |          |         |      
|   1 |  UPDATE                  | T1   |      1 |        |      0 |00:04:07.39 |    6898K|  85474 |  36766 |       |       |          |         |      
|*  2 |   HASH JOIN RIGHT ANTI NA|      |      1 |   9999K|   6666K|00:00:30.58 |   38576 |  75353 |  36766 |   196M|    10M|   46M (1)|     298K|      
|   3 |    TABLE ACCESS FULL     | T2   |      1 |     10M|     10M|00:00:10.03 |   19288 |  19278 |      0 |       |       |          |         |      
|   4 |    TABLE ACCESS FULL     | T1   |      1 |     10M|     10M|00:00:00.12 |   19288 |  19278 |      0 |       |       |          |         |      
--------------------------------------------------------------------------------------------------------------------------------------------------      

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL2"="T2"."COL2")                 

According to the DBMS_XPLAN output, even without the table columns declared as NOT NULL, Oracle was able to use a special form of a NULL aware hash join anti – the query completed in just over 4 minutes.  According to the 10053 trace, only the GATHER_PLAN_STATISTICS hint was recognized.  The final query after transformation follows:

SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."COL2"="T2"."COL2"

Moving the HASH_AJ hint so that it is next to the GATHER_PLAN_STATISTICS hint caused the optimizer to recognize the hint, but the 10053 trace file showed that the hint was not used.  The execution plan was identical to the execution plan above, although the actual time and number of consistent/current mode gets differed slightly.

Specifying the IS NOT NULL restriction in the WHERE clause did change the execution plan to a standard HASH JOIN ANTI:

UPDATE /*+ GATHER_PLAN_STATISTICS */   T1 SET   T1.COL1 = NULL WHERE                                                                       
COL2 NOT IN (     SELECT /*+  */       T2.COL2     FROM       T2                                                                           
WHERE       T2.COL2 IS NOT NULL)   AND T1.COL2 IS NOT NULL                                                                                 

Plan hash value: 2180616727  

-----------------------------------------------------------------------------------------------------------------------------------------------         
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|         
-----------------------------------------------------------------------------------------------------------------------------------------------         
|   0 | UPDATE STATEMENT      |      |      1 |        |      0 |00:04:41.32 |    6926K|  84797 |  36766 |       |       |          |         |         
|   1 |  UPDATE               | T1   |      1 |        |      0 |00:04:41.32 |    6926K|  84797 |  36766 |       |       |          |         |         
|*  2 |   HASH JOIN RIGHT ANTI|      |      1 |   9999K|   6666K|00:00:57.74 |   38576 |  75353 |  36766 |   196M|    10M|   46M (1)|     298K|         
|*  3 |    TABLE ACCESS FULL  | T2   |      1 |     10M|     10M|00:00:00.09 |   19288 |  19278 |      0 |       |       |          |         |         
|*  4 |    TABLE ACCESS FULL  | T1   |      1 |     10M|     10M|00:00:00.12 |   19288 |  19278 |      0 |       |       |          |         |         
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL2"="T2"."COL2")                 
   3 - filter("T2"."COL2" IS NOT NULL)            
   4 - filter("T1"."COL2" IS NOT NULL)

The number of consistent gets increased slightly, and so did the actual run time.

Let’s take a look at the statistics from the plan when the table columns have the NOT NULL constraint:

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |      |      1 |        |      0 |00:04:47.51 |    6930K|  84435 |  36766 |       |       |          |         |
|   1 |  UPDATE               | T1   |      1 |        |      0 |00:04:47.51 |    6930K|  84435 |  36766 |       |       |          |         |
|*  2 |   HASH JOIN RIGHT ANTI|      |      1 |   9999K|   6666K|00:00:29.61 |   38576 |  75353 |  36766 |   196M|    10M|   46M (1)|     298K|
|   3 |    TABLE ACCESS FULL  | T2   |      1 |     10M|     10M|00:00:00.01 |   19288 |  19278 |      0 |       |       |          |         |
|   4 |    TABLE ACCESS FULL  | T1   |      1 |     10M|     10M|00:00:00.12 |   19288 |  19278 |      0 |       |       |          |         |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL2"="T2"."COL2")      

The same execution plan was used again, but the number of consistent gets and actual time increased even more.  In case you are wondering, the 10053 trace file shows that the final query after transformation is this:

SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."COL2"="T2"."COL2"

—-

So, Oracle 11.1.0.7 is able to take advantage of the NULL aware “HASH JOIN RIGHT ANTI NA” operation.  Does this help?  Well, let’s try the test again on Oracle 10.2.0.4 running on the same computer.  The execution plan from the 10053 trace might look something like this for the first SQL statement with the hint in the wrong location:

UPDATE /*+ GATHER_PLAN_STATISTICS */   T1 SET   COL1 = NULL WHERE   COL2 NOT
IN (     SELECT /*+ HASH_AJ */       T2.COL2     FROM       T2)

============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT     |         |       |       |   50G |           |
| 1   |  UPDATE              | T1      |       |       |       |           |
| 2   |   FILTER             |         |       |       |       |           |
| 3   |    TABLE ACCESS FULL | T1      | 9766K |   76M |  5332 |  00:01:04 |
| 4   |    TABLE ACCESS FULL | T2      |     1 |     6 |  5348 |  00:01:05 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter( IS NULL)
4 - filter(LNNVL("T2"."COL2"<>:B1))

Note that the execution plan includes a FILTER operation on ID 2 of the plan – that is what the OP was seeing.  The final query after transformation looked something like this:

SELECT 0 FROM "TESTUSER"."T1" "SYS_ALIAS_1" WHERE  NOT EXISTS (SELECT /*+ HASH_AJ */ 0 FROM "TESTUSER"."T2" "T2" WHERE LNNVL("T2"."COL2"<>"SYS_ALIAS_1"."COL2"))

So, what did the execution plan look like with the statistics?  I don’t know – I killed it after a couple hours.  When I killed it, the SQL statement had processed 114,905,284 consistent or current mode gets and burned through 9,763.26 seconds of CPU time.

That just means we need a more powerful server, right?  OK, let’s try another experiment on a more powerful server.  This time, we will use a 10046 trace at level 8 using the query with the IS NOT NULL condition in the WHERE clause and gradually decrease the OPTIMIZER_FEATURES_ENABLED parameter from 11.1.0.7 to 10.2.0.1 to 9.2.0 to 8.1.7.

11.1.0.7:

Cursor 7   Ver 1   Parse at 0.000000  Similar Cnt 1
|PARSEs       1|CPU S    0.000000|CLOCK S    0.000000|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      1|
|EXECs        1|CPU S   49.077915|CLOCK S   64.402548|ROWs  6666667|PHY RD BLKs     27342|CON RD BLKs (Mem)     39170|CUR RD BLKs (Mem)   6816007|SHARED POOL MISs      0|
|FETCHs       0|CPU S    0.000000|CLOCK S    0.000000|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      0|

UPDATE
  T1
SET
  T1.COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2
    WHERE
      T2.COL2 IS NOT NULL)
  AND T1.COL2 IS NOT NULL

       (Rows 0)   UPDATE  T1 (cr=39170 pr=27342 pw=27311 time=0 us)
 (Rows 6666667)    HASH JOIN RIGHT ANTI (cr=38576 pr=27342 pw=27311 time=93577 us cost=28657 size=28 card=2)
(Rows 10000000)     TABLE ACCESS FULL T2 (cr=19288 pr=0 pw=0 time=0 us cost=5290 size=60000000 card=10000000)
(Rows 10000000)     TABLE ACCESS FULL T1 (cr=19288 pr=0 pw=0 time=15717 us cost=5290 size=80000000 card=10000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path write temp                        881        0.14          1.60
  direct path read temp                         882        0.40         12.03
  latch: checkpoint queue latch                   4        0.00          0.00
  log file switch completion                      8        0.54          1.01
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00

64.4 seconds of total run time with 49.1 seconds of CPU time – I guess that is a bit faster.  Quite a few consistent gets and current mode gets.

10.2.0.1 and 9.2.0.1 generated the same execution plan as the above, so the statistics should also be about the same.

8.1.7:

Well, we ran into a problem with this one – I killed it after a bit over an hour.  The TYPICAL execution plan looked like this:

SQL_ID  11apc6grf5fph, child number 0
-------------------------------------
UPDATE   T1 SET   T1.COL1 = NULL WHERE   COL2 NOT IN (     SELECT /*+ 
*/       T2.COL2     FROM       T2     WHERE       T2.COL2 IS NOT NULL)
  AND T1.COL2 IS NOT NULL

Plan hash value: 3288325718

---------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  | Inst   |
---------------------------------------------------------------------
|   0 | UPDATE STATEMENT    |      |       |       |  2926 |        |
|   1 |  UPDATE             | T1   |       |       |       |   OR11 |
|*  2 |   FILTER            |      |       |       |       |        |
|*  3 |    TABLE ACCESS FULL| T1   |   500K|  3906K|  2926 |   OR11 |
|*  4 |    TABLE ACCESS FULL| T2   |     1 |     6 |  2926 |   OR11 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NULL)
   3 - filter("T1"."COL2" IS NOT NULL)
   4 - filter(("T2"."COL2"=:B1 AND "T2"."COL2" IS NOT NULL))

That darn FILTER on ID 2 is back again, like what the OP saw.  What were the execution statistics at that point?  4853.41 seconds of CPU time and 212,520,861 consistent or current mode gets.  I know, let’s throw some more hardware at it, because that is the cheap and completely safe, risk-free solution, rather than doing a root cause analysis of the problem (that has to be true, I read it on the Internet – more than once 🙂 ).