Improving Performance by Using a Cartesian Join 2


May 18, 2010

(Back to the Previous Post in the Series)

For a while I have wondered why Cartesian joins might happen when all necessary join conditions are provided in a SQL statement.  Sure, an explain plan will show that a Cartesian join is used when Oracle’s optimizer believes that one of the row sources will return a single row, as seen in this example from a discussion forum:

SELECT
  TBLPERSON.SNAME,
  TBLPERSON.FNAME,
  TO_CHAR(TBLWARDSHIFTS.SHIFTSTART,'HH24:MI') AS REQSTART,
  TO_CHAR(TBLWARDSHIFTS.SHIFTEND,'HH24:MI') AS REQEND,
  TBLREQUIREMENTS.RDATE
FROM
  TBLAVAILABILITY,
  TBLREQUIREMENTS,
  TBLWARDSHIFTS,
  TBLPERSON
WHERE
  TBLAVAILABILITY.ADATE = TBLREQUIREMENTS.RDATE
  AND TBLAVAILABILITY.PERSONID = TBLPERSON.PERSONID
  AND TBLREQUIREMENTS.WARDSHIFTID = TBLWARDSHIFTS.WARDSHIFTID
  AND TO_CHAR(TBLREQUIREMENTS.RDATE,'MM')=TO_CHAR(SYSDATE,'MM')
  AND TO_NUMBER(TO_CHAR(TBLWARDSHIFTS.SHIFTSTART,'HH24'))>=15
  AND TO_NUMBER(TO_CHAR(TBLWARDSHIFTS.SHIFTSTART,'HH24'))<20
  AND TBLAVAILABILITY.ANYLATE=1;

---------------------------------------------------------------------------------------------------------
| Id | Operation                      | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)|Time      |
---------------------------------------------------------------------------------------------------------
|  1 |    NESTED LOOPS                |                 |       |       |       |            |          |
|  2 |     NESTED LOOPS               |                 |   233 | 15611 |       |  1469   (2)| 00:00:18 |
|* 3 |      HASH JOIN                 |                 |   233 | 10951 |       |  1003   (3)| 00:00:13 |
|* 4 |       TABLE ACCESS FULL        | TBLREQUIREMENTS |  5031 | 60372 |       |   980   (3)| 00:00:12 |
|  5 |       MERGE JOIN CARTESIAN     |                 |  3061 |   104K|       |    22   (0)| 00:00:01 |
|* 6 |        TABLE ACCESS FULL       | TBLWARDSHIFTS   |     1 |    20 |       |     3   (0)| 00:00:01 |
|  7 |        BUFFER SORT             |                 |  4639 | 69585 |       |    19   (0)| 00:00:01 |
|* 8 |         TABLE ACCESS FULL      | TBLAVAILABILITY |  4639 | 69585 |       |    19   (0)| 00:00:01 |
|* 9 |      INDEX UNIQUE SCAN         | PK5             |     1 |       |       |     1   (0)| 00:00:01 |
| 10 |     TABLE ACCESS BY INDEX ROWID| TBLPERSON       |     1 |    20 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
  3 - access("TBLAVAILABILITY"."ADATE"="TBLREQUIREMENTS"."RDATE" AND
      "TBLREQUIREMENTS"."WARDSHIFTID"="TBLWARDSHIFTS"."WARDSHIFTID")
  4 - filter(TO_CHAR(INTERNAL_FUNCTION("TBLREQUIREMENTS"."RDATE"),'MM')=TO_CHAR(SYSDATE@!,'MM'))
  6 - filter(TO_NUMBER(TO_CHAR(INTERNAL_FUNCTION("TBLWARDSHIFTS"."SHIFTSTART"),'HH24'))>=15 AND
      TO_NUMBER(TO_CHAR(INTERNAL_FUNCTION("TBLWARDSHIFTS"."SHIFTSTART"),'HH24'))<20)
  8 - filter("TBLAVAILABILITY"."ANYLATE"=1)
  9 - access("TBLAVAILABILITY"."PERSONID"="TBLPERSON"."PERSONID")

We can see that all of the tables are linked:

TBLAVAILABILITY ~ TBLREQUIREMENTS
TBLAVAILABILITY ~ TBLPERSON
TBLREQUIREMENTS ~ TBLWARDSHIFTS

So, what could be leading Oracle to calculate that the predicates applied to TBLWARDSHIFTS will cause only a single row to be returned?  Might it be the functions that are applied to the TBLWARDSHIFTS.SHIFTSTART column and/or the two inequalities applied to that column in the WHERE clause?  What about out of date statistics?  Certainly, a 10053 trace might help solve part of the mystery.  An interesting comment by Tanel Poder in this blog article suggests that a MERGE JOIN CARTESIAN is similar to a NESTED LOOP operation, just without filtering, so maybe Cartesian joins are not all that bad.
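As a rough starting point for that kind of investigation, here is a minimal sketch (the table name, column name, and predicates are taken from the forum query above; the trace file identifier is arbitrary) that generates a 10053 optimizer trace for a hard parse of just the TBLWARDSHIFTS predicates and displays the optimizer’s cardinality estimate for those predicates in isolation.  Without extended statistics, the optimizer typically falls back on default selectivities for predicates built on a function of a column, which on a small table can easily work out to an estimate of a single row:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'WARDSHIFTS_10053';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

-- EXPLAIN PLAN forces an optimization pass, so the 10053 trace is written
-- and the estimated row count for these predicates lands in PLAN_TABLE
EXPLAIN PLAN FOR
SELECT
  *
FROM
  TBLWARDSHIFTS
WHERE
  TO_NUMBER(TO_CHAR(SHIFTSTART,'HH24'))>=15
  AND TO_NUMBER(TO_CHAR(SHIFTSTART,'HH24'))<20;

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

The Rows column of the displayed plan shows whether the optimizer really does expect a single row from TBLWARDSHIFTS for those predicates, and the 10053 trace file shows how that estimate was calculated.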

The above example shows a case where a Cartesian join may hurt performance, but can it also help performance?  In this blog article comment I provided a test case that included unhinted and hinted (forced) execution plans with and without Cartesian joins.  For example, in one 11.1.0.7 database, the following SQL statement, which uses the tables T1, T2, T3, and T4 from my previous blog article comment:

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2 IN (
    SELECT
      C1
    FROM
      T3) 
  AND T2.C2 IN (
    SELECT
      C1
    FROM
      T4);

produced this execution plan:

Plan hash value: 348785823

---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  2000K|   427M|       | 24447   (2)| 00:01:39 |
|*  1 |  HASH JOIN             |      |  2000K|   427M|  2488K| 24447   (2)| 00:01:39 |
|*  2 |   HASH JOIN            |      | 20000 |  2246K|       | 10984   (2)| 00:00:45 |
|   3 |    MERGE JOIN CARTESIAN|      |     2 |    12 |       |     5  (20)| 00:00:01 |
|   4 |     SORT UNIQUE        |      |     2 |     6 |       |     2   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL | T4   |     2 |     6 |       |     2   (0)| 00:00:01 |
|   6 |     BUFFER SORT        |      |     2 |     6 |       |            |          |
|   7 |      SORT UNIQUE       |      |     2 |     6 |       |     2   (0)| 00:00:01 |
|   8 |       TABLE ACCESS FULL| T3   |     2 |     6 |       |     2   (0)| 00:00:01 |
|   9 |    TABLE ACCESS FULL   | T2   |  1000K|   103M|       | 10967   (1)| 00:00:45 |
|  10 |   TABLE ACCESS FULL    | T1   |  1000K|   103M|       | 10967   (1)| 00:00:45 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T1"."C2"="C1")
   2 - access("T2"."C2"="C1")

Interesting: a MERGE JOIN CARTESIAN operation between two unrelated tables, T3 and T4, which are the tables referenced in the subqueries applied to T1 and T2.  Another Oracle 11.1.0.7 database produced this execution plan for the same unhinted SQL statement:

Plan hash value: 1754840566

----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes |TempSpc|Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |  4000K|   854M|       |42042   (4)| 00:08:25 |
|*  1 |  HASH JOIN RIGHT SEMI          |           |  4000K|   854M|       |42042   (4)| 00:08:25 |
|   2 |   TABLE ACCESS FULL            | T3        |     2 |     6 |       |    3   (0)| 00:00:01 |
|*  3 |   HASH JOIN                    |           |   200M|    41G|  2424K|41385   (2)| 00:08:17 |
|   4 |    NESTED LOOPS                |           |       |       |       |           |          |
|   5 |     NESTED LOOPS               |           | 20000 |  2187K|       |10026   (1)| 00:02:01 |
|   6 |      SORT UNIQUE               |           |     2 |     6 |       |    3   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL        | T4        |     2 |     6 |       |    3   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | IND_T2_C2 | 10000 |       |       |   20   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID| T2        | 10000 |  1064K|       |10022   (1)| 00:02:01 |
|  10 |    TABLE ACCESS FULL           | T1        |  1000K|   103M|       |24864   (1)| 00:04:59 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="C1")
   3 - access("T1"."C2"="T2"."C2")
   8 - access("T2"."C2"="C1")

Notice that the MERGE JOIN CARTESIAN operation did not appear this time.  Despite the output of the Time column, the second server is considerably faster than the first.  To obtain the first server’s execution plan (the one containing the Cartesian join) on the second server, I had to supply the following hint:

/*+ LEADING(T4, T3, T2, T1) USE_HASH(T1) USE_HASH(T2) */

With the above hint added to the SQL statement (requesting a join order of T4, T3, T2, T1, with hash joins used when joining to T2 and to T1), the execution plan on the second server looked like this:

Plan hash value: 348785823

--------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  4000K|   854M|       | 55716   (1)|00:11:09 |
|*  1 |  HASH JOIN             |      |  4000K|   854M|  4968K| 55716   (1)|00:11:09 |
|*  2 |   HASH JOIN            |      | 40000 |  4492K|       | 24874   (1)|00:04:59 |
|   3 |    MERGE JOIN CARTESIAN|      |     4 |    24 |       |     7  (15)|00:00:01 |
|   4 |     SORT UNIQUE        |      |     2 |     6 |       |     3   (0)|00:00:01 |
|   5 |      TABLE ACCESS FULL | T4   |     2 |     6 |       |     3   (0)|00:00:01 |
|   6 |     BUFFER SORT        |      |     2 |     6 |       |     4  (25)|00:00:01 |
|   7 |      SORT UNIQUE       |      |     2 |     6 |       |     3   (0)|00:00:01 |
|   8 |       TABLE ACCESS FULL| T3   |     2 |     6 |       |     3   (0)|00:00:01 |
|   9 |    TABLE ACCESS FULL   | T2   |  1000K|   103M|       | 24864   (1)|00:04:59 |
|  10 |   TABLE ACCESS FULL    | T1   |  1000K|   103M|       | 24864   (1)|00:04:59 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T1"."C2"="C1")
   2 - access("T2"."C2"="C1")

As you can see, the predicted execution time (00:11:09) is longer than that of the unhinted plan on this server (00:08:25).  But how accurate is the time prediction?  To make this a little more interesting, let’s try a couple more hints on the second server:

/*+ LEADING(T1, T3, T4) */

Execution Plan
----------------------------------------------------------
Plan hash value: 3921125646

--------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |  2020K|   431M|       | 72263   (1)|00:14:28 |
|*  1 |  HASH JOIN            |      |  2020K|   431M|  2512K| 72263   (1)|00:14:28 |
|   2 |   MERGE JOIN CARTESIAN|      | 20202 |  2268K|       | 41547   (1)|00:08:19 |
|*  3 |    HASH JOIN SEMI     |      | 10101 |  1104K|   115M| 30593   (1)|00:06:08 |
|   4 |     TABLE ACCESS FULL | T1   |  1000K|   103M|       | 24864   (1)|00:04:59 |
|   5 |     TABLE ACCESS FULL | T3   |     2 |     6 |       |     3   (0)|00:00:01 |
|   6 |    BUFFER SORT        |      |     2 |     6 |       | 41544   (1)|00:08:19 |
|   7 |     SORT UNIQUE       |      |     2 |     6 |       |     1   (0)|00:00:01 |
|   8 |      TABLE ACCESS FULL| T4   |     2 |     6 |       |     1   (0)|00:00:01 |
|   9 |   TABLE ACCESS FULL   | T2   |  1000K|   103M|       | 24864   (1)|00:04:59 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T2"."C2"="C1")
   3 - access("T1"."C2"="C1")

/*+ LEADING(T1, T3, T2) */

Plan hash value: 1368374064

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2020K|   431M|       | 56120   (2)|00:11:14 |
|*  1 |  HASH JOIN RIGHT SEMI|      |  2020K|   431M|       | 56120   (2)|00:11:14 |
|   2 |   TABLE ACCESS FULL  | T4   |     2 |     6 |       |     3   (0)|00:00:01 |
|*  3 |   HASH JOIN          |      |   101M|    20G|       | 55787   (1)|00:11:10 |
|*  4 |    HASH JOIN SEMI    |      | 10101 |  1104K|   115M| 30593   (1)|00:06:08 |
|   5 |     TABLE ACCESS FULL| T1   |  1000K|   103M|       | 24864   (1)|00:04:59 |
|   6 |     TABLE ACCESS FULL| T3   |     2 |     6 |       |     3   (0)|00:00:01 |
|   7 |    TABLE ACCESS FULL | T2   |  1000K|   103M|       | 24864   (1)|00:04:59 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."C2"="C1")
   3 - access("T1"."C2"="T2"."C2")
   4 - access("T1"."C2"="C1")

Note that there was no Cartesian join in the above execution plan; that was the first of the hinted execution plans on the second server that did not yield a Cartesian join.

/*+ LEADING(T3, T4) */

Execution Plan
----------------------------------------------------------
Plan hash value: 2581882832

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc|Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |  2000K|   427M|       |50767   (1)| 00:10:10 |
|*  1 |  HASH JOIN                    |           |  2000K|   427M|  2488K|50767   (1)| 00:10:10 |
|   2 |   NESTED LOOPS                |           |       |       |       |           |          |
|   3 |    NESTED LOOPS               |           | 20000 |  2246K|       |20052   (1)| 00:04:01 |
|   4 |     MERGE JOIN CARTESIAN      |           |     2 |    12 |       |    7  (15)| 00:00:01 |
|   5 |      SORT UNIQUE              |           |     2 |     6 |       |    3   (0)| 00:00:01 |
|   6 |       TABLE ACCESS FULL       | T3        |     2 |     6 |       |    3   (0)| 00:00:01 |
|   7 |      BUFFER SORT              |           |     2 |     6 |       |           |          |
|   8 |       SORT UNIQUE             |           |     2 |     6 |       |    3   (0)| 00:00:01 |
|   9 |        TABLE ACCESS FULL      | T4        |     2 |     6 |       |    3   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN          | IND_T2_C2 | 10000 |       |       |   20   (0)| 00:00:01 |
|  11 |    TABLE ACCESS BY INDEX ROWID| T2        | 10000 |  1064K|       |10022   (1)| 00:02:01 |
|  12 |   TABLE ACCESS FULL           | T1        |  1000K|   103M|       |24864   (1)| 00:04:59 |
--------------------------------------------------------------------------------------------------

Unfortunately, for the last execution plan I did not capture the predicate information section.

We need a script to determine which plan is fastest – this script will be run directly on the second server so that we eliminate the effects of network traffic.

SET AUTOTRACE TRACEONLY STATISTICS
SET ARRAYSIZE 1000
ALTER SESSION SET STATISTICS_LEVEL='ALL';
SET TIMING ON

SPOOL SMALLEST_TEST.TXT
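-- Flush the buffer cache before each test query (done twice each time) so
-- that every execution starts with a mostly cold cache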

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2 IN (
    SELECT
      C1
    FROM
      T3)
  AND T2.C2 IN (
    SELECT
      C1
    FROM
      T4);

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ LEADING(T1, T3, T4) */
  T1.C1,
  T2.C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2 IN (
    SELECT
      C1
    FROM
      T3)
  AND T2.C2 IN (
    SELECT
      C1
    FROM
      T4);

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ LEADING(T1, T3, T2) */
  T1.C1,
  T2.C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2 IN (
    SELECT
      C1
    FROM
      T3)
  AND T2.C2 IN (
    SELECT
      C1
    FROM
      T4);

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ LEADING(T3, T4) */
  T1.C1,
  T2.C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2 IN (
    SELECT
      C1
    FROM
      T3)
  AND T2.C2 IN (
    SELECT
      C1
    FROM
      T4);

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ LEADING(T4, T3, T2, T1) USE_HASH(T1) USE_HASH(T2) */
  T1.C1,
  T2.C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2 IN (
    SELECT
      C1
    FROM
      T3)
  AND T2.C2 IN (
    SELECT
      C1
    FROM
      T4);

SPOOL OFF

So, what was the output of the script?

SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 00:09:15.90

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     129202  consistent gets
     101906  physical reads
          0  redo size
 3032560885  bytes sent via SQL*Net to client
    2200489  bytes received via SQL*Net from client
     200001  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
  200000000  rows processed

---

SQL> SELECT /*+ LEADING(T1, T3, T4) */
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 00:09:59.15

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     200126  consistent gets
     181906  physical reads
          0  redo size
 3032560885  bytes sent via SQL*Net to client
    2200489  bytes received via SQL*Net from client
     200001  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
  200000000  rows processed

---

SQL> SELECT /*+ LEADING(T1, T3, T2) */
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 00:10:18.07

---

SQL> SELECT /*+ LEADING(T3, T4) */
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 00:09:51.72

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     149245  consistent gets
     101906  physical reads
          0  redo size
 3032560885  bytes sent via SQL*Net to client
    2200489  bytes received via SQL*Net from client
     200001  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
  200000000  rows processed

---

SQL> SELECT /*+ LEADING(T4, T3, T2, T1) USE_HASH(T1) USE_HASH(T2) */
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 00:08:47.07

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     200126  consistent gets
     181906  physical reads
          0  redo size
 3032560885  bytes sent via SQL*Net to client
    2200489  bytes received via SQL*Net from client
     200001  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
  200000000  rows processed

The result?  The heavily hinted execution plan that reproduced the execution plan containing the Cartesian join found on the first server completed roughly 29 seconds faster than the unhinted execution plan that did not use a Cartesian join.  Not all Cartesian joins are bad.

You are probably wondering what the output from the first server looked like.  I think that the best answer is that the results are “in-doubt”.  The first server was queried over the network with the default array fetch size of 15, which by itself forces roughly 200,000,000 / 15 ≈ 13,333,334 fetch calls (matching the reported 13,333,335 SQL*Net roundtrips), compared with the roughly 200,001 roundtrips seen on the second server with an array fetch size of 1,000.  The results follow (note that the execution plans may be different from those found on the second server):

SQL> SELECT /*+ LEADING(T1, T3, T4) */
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 01:02:34.25

Statistics
----------------------------------------------------------
        114  recursive calls
          0  db block gets
     200110  consistent gets
     189480  physical reads
          0  redo size
SP2-0642: SQL*Plus internal error state 1075, context 1:5:4294967295
Unsafe to proceed
  146667044  bytes received via SQL*Net from client
   13333335  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
  200000000  rows processed

That output does not look good.  Should we continue?

SQL> SELECT /*+ LEADING(T1, T3, T2) */
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C3,1,10) T1_C3,
  5    T2.C3 T2_C3
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C2=T2.C2
 11    AND T1.C2 IN (
 12      SELECT
 13        C1
 14      FROM
 15        T3)
 16    AND T2.C2 IN (
 17      SELECT
 18        C1
 19      FROM
 20        T4);

200000000 rows selected.

Elapsed: 01:04:18.54

Statistics
----------------------------------------------------------
        115  recursive calls
          0  db block gets
     200110  consistent gets
     189480  physical reads
          0  redo size
SP2-0642: SQL*Plus internal error state 1075, context 1:5:4294967295
Unsafe to proceed
  146667044  bytes received via SQL*Net from client
   13333335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
  200000000  rows processed

Unsafe to proceed?  OK, enough pain.  Let’s check the execution plans from the first server:

SET AUTOTRACE OFF
SET PAGESIZE 1000

SET LINESIZE 160

SELECT /*+ LEADING(S) */
  T.PLAN_TABLE_OUTPUT
FROM
  (SELECT
    SQL_ID,
    CHILD_NUMBER
  FROM
    V$SQL
  WHERE
    SQL_TEXT LIKE 'SELECT%'
    AND SQL_TEXT LIKE '%T1.C2=T2.C2%'
    AND SQL_TEXT LIKE '%T4)') S,
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(S.SQL_ID,S.CHILD_NUMBER,'ALLSTATS LAST +COST')) T;

The output (note that the STATISTICS_LEVEL was changed from TYPICAL to ALL when the queries were actually executed, hence the multiple child cursors for some of the SQL statements, and the warning that appears in the Note section for some of the execution plans):

SQL_ID  b0pk35tnkh80n, child number 0
-------------------------------------
SELECT /*+ LEADING(T1, T3, T2) */   T1.C1,   T2.C1,   SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3 FROM   T1,   T2 WHERE   T1.C2=T2.C2   AND T1.C2 IN (
SELECT       C1     FROM       T3)   AND T2.C2 IN (     SELECT       C1
    FROM       T4)

Plan hash value: 1368374064

--------------------------------------------------------------------------------------
| Id  | Operation            | Name | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |        | 29186 (100)|       |       |          |
|*  1 |  HASH JOIN RIGHT SEMI|      |   2020K| 29186   (9)|   935K|   935K|          |
|   2 |   TABLE ACCESS FULL  | T4   |      2 |     2   (0)|       |       |          |
|*  3 |   HASH JOIN          |      |    101M| 27997   (6)|  1949K|   945K|          |
|*  4 |    HASH JOIN SEMI    |      |  10101 | 13397   (1)|   133M|  7573K|          |
|   5 |     TABLE ACCESS FULL| T1   |   1000K| 10967   (1)|       |       |          |
|   6 |     TABLE ACCESS FULL| T3   |      2 |     2   (0)|       |       |          |
|   7 |    TABLE ACCESS FULL | T2   |   1000K| 10967   (1)|       |       |          |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."C2"="C1")
   3 - access("T1"."C2"="T2"."C2")
   4 - access("T1"."C2"="C1")

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

SQL_ID  bkdkvvppuhta3, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C3,1,10) T1_C3,   T2.C3 T2_C3 FROM   T1,
T2 WHERE   T1.C2=T2.C2   AND T1.C2 IN (     SELECT       C1     FROM
   T3)   AND T2.C2 IN (     SELECT       C1     FROM       T4)

Plan hash value: 348785823

----------------------------------------------------------------------------------------
| Id  | Operation              | Name | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |        | 24447 (100)|       |       |          |
|*  1 |  HASH JOIN             |      |   2000K| 24447   (2)|  3361K|   944K|          |
|*  2 |   HASH JOIN            |      |  20000 | 10984   (2)|   878K|   878K|          |
|   3 |    MERGE JOIN CARTESIAN|      |      2 |     5  (20)|       |       |          |
|   4 |     SORT UNIQUE        |      |      2 |     2   (0)| 73728 | 73728 |          |
|   5 |      TABLE ACCESS FULL | T4   |      2 |     2   (0)|       |       |          |
|   6 |     BUFFER SORT        |      |      2 |            | 73728 | 73728 |          |
|   7 |      SORT UNIQUE       |      |      2 |     2   (0)| 73728 | 73728 |          |
|   8 |       TABLE ACCESS FULL| T3   |      2 |     2   (0)|       |       |          |
|   9 |    TABLE ACCESS FULL   | T2   |   1000K| 10967   (1)|       |       |          |
|  10 |   TABLE ACCESS FULL    | T1   |   1000K| 10967   (1)|       |       |          |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T1"."C2"="C1")
   2 - access("T2"."C2"="C1")

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

SQL_ID  bkdkvvppuhta3, child number 1
-------------------------------------
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C3,1,10) T1_C3,   T2.C3 T2_C3 FROM   T1,
T2 WHERE   T1.C2=T2.C2   AND T1.C2 IN (     SELECT       C1     FROM
   T3)   AND T2.C2 IN (     SELECT       C1     FROM       T4)

Plan hash value: 348785823

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        | 24447 (100)|    200M|00:00:17.51 |     200K|    181K|       |       |          |
|*  1 |  HASH JOIN             |      |      1 |   2000K| 24447   (2)|    200M|00:00:17.51 |     200K|    181K|  6223K|  1897K| 6684K (0)|
|*  2 |   HASH JOIN            |      |      1 |  20000 | 10984   (2)|  40000 |00:00:15.50 |   90967 |  90960 |   968K|   968K|  464K (0)|
|   3 |    MERGE JOIN CARTESIAN|      |      1 |      2 |     5  (20)|      4 |00:00:00.02 |       6 |      4 |       |       |          |
|   4 |     SORT UNIQUE        |      |      1 |      2 |     2   (0)|      2 |00:00:00.02 |       3 |      2 |  2048 |  2048 | 2048  (0)|
|   5 |      TABLE ACCESS FULL | T4   |      1 |      2 |     2   (0)|      2 |00:00:00.02 |       3 |      2 |       |       |          |
|   6 |     BUFFER SORT        |      |      2 |      2 |            |      4 |00:00:00.01 |       3 |      2 |  2048 |  2048 | 2048  (0)|
|   7 |      SORT UNIQUE       |      |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       3 |      2 |  2048 |  2048 | 2048  (0)|
|   8 |       TABLE ACCESS FULL| T3   |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       3 |      2 |       |       |          |
|   9 |    TABLE ACCESS FULL   | T2   |      1 |   1000K| 10967   (1)|   1000K|00:00:15.02 |   90961 |  90956 |       |       |          |
|  10 |   TABLE ACCESS FULL    | T1   |      1 |   1000K| 10967   (1)|   1000K|00:00:01.02 |     109K|  90956 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T1"."C2"="C1")
   2 - access("T2"."C2"="C1")

SQL_ID  2drtx01hd21jb, child number 0
-------------------------------------
SELECT /*+ LEADING(T1, T3, T4) */   T1.C1,   T2.C1,   SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3 FROM   T1,   T2 WHERE   T1.C2=T2.C2   AND T1.C2 IN (
SELECT       C1     FROM       T3)   AND T2.C2 IN (     SELECT       C1
    FROM       T4)

Plan hash value: 3921125646

---------------------------------------------------------------------------------------
| Id  | Operation             | Name | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |        | 31686 (100)|       |       |          |
|*  1 |  HASH JOIN            |      |   2020K| 31686   (1)|  3420K|   944K|          |
|   2 |   MERGE JOIN CARTESIAN|      |  20202 | 18221   (1)|       |       |          |
|*  3 |    HASH JOIN SEMI     |      |  10101 | 13397   (1)|   133M|  7573K|          |
|   4 |     TABLE ACCESS FULL | T1   |   1000K| 10967   (1)|       |       |          |
|   5 |     TABLE ACCESS FULL | T3   |      2 |     2   (0)|       |       |          |
|   6 |    BUFFER SORT        |      |      2 | 18219   (1)| 73728 | 73728 |          |
|   7 |     SORT UNIQUE       |      |      2 |     0   (0)| 73728 | 73728 |          |
|   8 |      TABLE ACCESS FULL| T4   |      2 |     0   (0)|       |       |          |
|   9 |   TABLE ACCESS FULL   | T2   |   1000K| 10967   (1)|       |       |          |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T2"."C2"="C1")
   3 - access("T1"."C2"="C1")

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

SQL_ID  2drtx01hd21jb, child number 1
-------------------------------------
SELECT /*+ LEADING(T1, T3, T4) */   T1.C1,   T2.C1,   SUBSTR(T1.C3,1,10) T1_C3,
  T2.C3 T2_C3 FROM   T1,   T2 WHERE   T1.C2=T2.C2   AND T1.C2 IN (
SELECT       C1     FROM       T3)   AND T2.C2 IN (     SELECT       C1
    FROM       T4)

Plan hash value: 3921125646

------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |      1 |        | 31686 (100)|    200M|00:00:29.65 |     200K|    189K|  13702 |       |       |          |         |
|*  1 |  HASH JOIN            |      |      1 |   2020K| 31686   (1)|    200M|00:00:29.65 |     200K|    189K|  13702 |  6223K|  1897K| 6659K (0)|         |
|   2 |   MERGE JOIN CARTESIAN|      |      1 |  20202 | 18221   (1)|  40000 |00:00:27.35 |   90967 |  98524 |  13702 |       |       |          |         |
|*  3 |    HASH JOIN SEMI     |      |      1 |  10101 | 13397   (1)|  20000 |00:00:27.31 |   90964 |  98522 |  13702 |   126M|  7603K|   83  (1)|     114K|
|   4 |     TABLE ACCESS FULL | T1   |      1 |   1000K| 10967   (1)|   1000K|00:00:17.00 |   90961 |  90956 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL | T3   |      1 |      2 |     2   (0)|      2 |00:00:00.16 |       3 |      2 |      0 |       |       |          |         |
|   6 |    BUFFER SORT        |      |  20000 |      2 | 18219   (1)|  40000 |00:00:00.04 |       3 |      2 |      0 |  2048 |  2048 | 2048  (0)|         |
|   7 |     SORT UNIQUE       |      |      1 |      2 |     0   (0)|      2 |00:00:00.04 |       3 |      2 |      0 |  2048 |  2048 | 2048  (0)|         |
|   8 |      TABLE ACCESS FULL| T4   |      1 |      2 |     0   (0)|      2 |00:00:00.04 |       3 |      2 |      0 |       |       |          |         |
|   9 |   TABLE ACCESS FULL   | T2   |      1 |   1000K| 10967   (1)|   1000K|00:00:01.02 |     109K|  90956 |      0 |       |       |          |         |
------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T2"."C2"="C1")
   3 - access("T1"."C2"="C1")

SQL_ID  91wazrzfau2wq, child number 0
-------------------------------------
SELECT /*+ LEADING(T3, T4) */   T1.C1,   T2.C1,   SUBSTR(T1.C3,1,10) T1_C3,
T2.C3 T2_C3 FROM   T1,   T2 WHERE   T1.C2=T2.C2   AND T1.C2 IN (
SELECT       C1     FROM       T3)   AND T2.C2 IN (     SELECT       C1
    FROM       T4)

Plan hash value: 331368001

----------------------------------------------------------------------------------------
| Id  | Operation              | Name | E-Rows | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |        | 24447 (100)|       |       |          |
|*  1 |  HASH JOIN             |      |   2000K| 24447   (2)|  3361K|   944K|          |
|*  2 |   HASH JOIN            |      |  20000 | 10984   (2)|   878K|   878K|          |
|   3 |    MERGE JOIN CARTESIAN|      |      2 |     5  (20)|       |       |          |
|   4 |     SORT UNIQUE        |      |      2 |     2   (0)| 73728 | 73728 |          |
|   5 |      TABLE ACCESS FULL | T3   |      2 |     2   (0)|       |       |          |
|   6 |     BUFFER SORT        |      |      2 |            | 73728 | 73728 |          |
|   7 |      SORT UNIQUE       |      |      2 |     2   (0)| 73728 | 73728 |          |
|   8 |       TABLE ACCESS FULL| T4   |      2 |     2   (0)|       |       |          |
|   9 |    TABLE ACCESS FULL   | T2   |   1000K| 10967   (1)|       |       |          |
|  10 |   TABLE ACCESS FULL    | T1   |   1000K| 10967   (1)|       |       |          |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C2"="T2"."C2" AND "T1"."C2"="C1")
   2 - access("T2"."C2"="C1")

Note
-----
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level
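As a side note, V$SQL_SHARED_CURSOR can be used to confirm why the additional child cursors were created.  A minimal sketch (the SQL_ID values are the two shown above with multiple children; when the STATISTICS_LEVEL is switched between TYPICAL and ALL, the OPTIMIZER_MISMATCH column is typically the one flagged with a Y):

SELECT
  SQL_ID,
  CHILD_NUMBER,
  OPTIMIZER_MISMATCH
FROM
  V$SQL_SHARED_CURSOR
WHERE
  SQL_ID IN ('bkdkvvppuhta3','2drtx01hd21jb');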

I will repeat – not all Cartesian joins are bad for performance.  But, of course, without testing it is not always obvious which Cartesian joins are good, which are bad, and which are just OK.





Query Costs


May 10, 2010

It is a bit curious to see the number of message threads discussing calculated query costs that appeared last week on the OTN forums:

  • Oracle Selects a Plan with Higher Cost? – an interesting thread where Jonathan Lewis demonstrates that an index hint in a query is actually a directive, and not just a suggestion to the query optimizer as would be implied by the common English definition of the term “hint”.  Jonathan’s test case showed that using the hint did not lower the calculated cost of the index access (as had been stated in a 2001 thread on asktom.oracle.com), but instead restricted the access paths that were available to the optimizer, assuming that the index hint was valid.
  • Cost in explain plan vs elapsed time - the calculated cost for accessing a partitioned table is higher than that for an equivalent unpartitioned table, yet the ELAPSED_TIME in V$SQLAREA is less for the query accessing the partitioned table.
  • Different cost value ,but have the same execute time – changing the CPU cost and IO cost for a function decreased the cost of a query from 141 to 115, yet the query still takes just as long to execute.
  • About cost in tkprof – I offered a couple of test cases that showed why one should not bother to determine the calculated query cost once a TKPROF summary is generated from a 10046 trace of the query execution.

There have been other OTN threads that discussed query costs, and one such OTN thread led to an earlier blog article on this site.

My responses in the About cost in tkprof thread follow – maybe someone will find the information to be helpful:

Oracle Database 11.1.0.6 and higher will output the calculated cost for a line in the execution plan when the STAT lines are written to the 10046 trace file. These STAT lines are then read by TKPROF and output in the “Row Source Operation” section of the output. In the case of 11.1.0.6 and greater, this is the estimated cost for the actual execution plan used by the query. Autotrace shows the estimated cost for what might be the execution plan that will be used by the query (see this blog article). Prior to Oracle Database 11.1.0.6, the calculated, estimated cost was not written to the trace file. The hash value and SQL ID (starting with 11.1.0.6) will be written to the raw 10046 trace file – you can use that information to pull the actual execution plan with cost figures from the library cache along with estimated costs using DBMS_XPLAN.DISPLAY_CURSOR (assuming that you are running Oracle Database 10g R1 or higher). See this three part series for details.

The estimated cost figure has a very limited value outside of its intended use by Oracle’s query optimizer as a means to derive the individual steps within a specific execution plan. You should not use the estimated cost figures as a means to determine if one query is more efficient than another query.

There are several reasons why I did not suggest using the EXPLAIN parameter of TKPROF. The main reason is that it may show a very different execution plan from the one that was actually used. Second, it apparently does not include any COST values for Oracle 10.2.0.x. Third, I recommended against using the calculated cost values for comparisons. A demonstration is necessary:

CREATE TABLE T1(
  C1 NUMBER,
  C2 VARCHAR2(255),
  PRIMARY KEY (C1));

CREATE TABLE T2(
  C1 NUMBER,
  C2 VARCHAR2(255),
  PRIMARY KEY (C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD('A',255,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) V2;
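-- 10,000 x 30 = 300,000 rows generated for T1 (the T2 insert below does the same)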

INSERT INTO
  T2
SELECT
  ROWNUM,
  LPAD('A',255,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

We now have two tables, each with 300,000 rows – each of the tables has a primary key index. The test case continues:

VARIABLE N1 NUMBER
VARIABLE N2 NUMBER

EXEC :N1:=1
EXEC :N2:=100000

SET AUTOTRACE TRACEONLY STATISTICS

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 4';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'EXPLAIN_PLAN_TEST';

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N2:=10

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

The output of the above is the following:

100000 rows selected.

Statistics
---------------------------------------------------
      45  recursive calls
       0  db block gets
   24673  consistent gets
   26419  physical reads
       0  redo size
 2051555  bytes sent via SQL*Net to client
   73660  bytes received via SQL*Net from client
    6668  SQL*Net roundtrips to/from client
       0  sorts (memory)
       0  sorts (disk)
  100000  rows processed

10 rows selected.

Statistics
---------------------------------------------------
     0  recursive calls
     0  db block gets
 22257  consistent gets
 20896  physical reads
     0  redo size
   605  bytes sent via SQL*Net to client
   334  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
    10  rows processed

Now that we have a 10046 trace file, move it back to the client PC and process it with TKPROF, using the EXPLAIN option (not recommended for typical use).
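If the directory that received the trace file is not known, a quick check before copying the file might look like this (a minimal sketch; the USER_DUMP_DEST parameter should point at the trace directory, and the TRACEFILE_IDENTIFIER value set earlier makes the file easy to spot by name):

SELECT
  VALUE
FROM
  V$PARAMETER
WHERE
  NAME='user_dump_dest';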

exit

TKPROF test_ora_2392_explain_plan_test.trc test_ora_2392_explain_plan_test.txt EXPLAIN=testuser/testpass@test

So, what does the resulting TKPROF summary file show?

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     6670      2.06       3.09      47315      46930          0      100010
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     6674      2.06       3.09      47315      46930          0      100010

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 31  (TESTUSER)

Rows     Row Source Operation
-------  ---------------------------------------------------
 100000  FILTER  (cr=24673 pr=26419 pw=5239 time=41377235 us)
 100000   HASH JOIN  (cr=24673 pr=26419 pw=5239 time=41177148 us)
 100000    TABLE ACCESS FULL T1 (cr=11128 pr=11113 pw=0 time=495 us)
 100000    TABLE ACCESS FULL T2 (cr=13545 pr=10067 pw=0 time=100493 us)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
 100000   FILTER
 100000    HASH JOIN
 100000     TABLE ACCESS   MODE: ANALYZED (BY INDEX ROWID) OF 'T1' (TABLE)
 100000      INDEX   MODE: ANALYZED (RANGE SCAN) OF 'SYS_C0020592' (INDEX (UNIQUE))
      0     TABLE ACCESS   MODE: ANALYZED (BY INDEX ROWID) OF 'T2' (TABLE)
      0      INDEX   MODE: ANALYZED (RANGE SCAN) OF 'SYS_C0020593' (INDEX (UNIQUE))

Hopefully, the above helps you understand why I did not suggest the use of the EXPLAIN parameter.

The calculated cost may be very misleading. A large number of factors are considered when the costs are derived, and some of those factors may mislead the optimizer into either understating or overstating the true cost to the end user (typically considered to be execution time). To the query optimizer, cost is supposed to represent time, but as I stated, you should NOT compare the calculated cost of one query with the calculated cost of another query.

It would probably be beneficial for you to experiment with a few things:

  • 10046 trace at either level 8 or 12
  • SET TIMING ON in SQL*Plus
  • DBMS_XPLAN.DISPLAY_CURSOR

Ignore the costs.
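Here is a minimal sketch that combines those three suggestions into a single pass (it assumes that the bind variables :N1 and :N2 created earlier are still defined and that autotrace is currently off, so that DBMS_XPLAN.DISPLAY_CURSOR picks up the query that was just executed; level 8 of the 10046 event captures wait events, while level 12 captures wait events and bind variable values).  The fuller test script that follows takes a similar approach:

SET TIMING ON

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'COST_IGNORE_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

SELECT
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';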

For fun, here is another test script, using the same tables that were created earlier:

VARIABLE N1 NUMBER
VARIABLE N2 NUMBER

EXEC :N1:=1
EXEC :N2:=100000

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
SET PAGESIZE 1000
SET LINESIZE 140

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'EXPLAIN_PLAN_TEST2';

SELECT
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N2:=10

SELECT
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

I actually executed the script a couple of times, changing the TRACEFILE_IDENTIFIER each time, because the first execution would require a hard parse and would likely require more physical reads – I wanted to factor those out of the run time. The following was displayed when the script ran:

100000 rows selected.

Elapsed: 00:00:03.82

Execution Plan
----------------------------------------------------------
Plan hash value: 77308553

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   749 |   380K|   111   (1)| 00:00:01 |
|*  1 |  FILTER                       |              |       |       |            |          |
|*  2 |   HASH JOIN                   |              |   749 |   380K|   111   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |   750 |   190K|    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0020592 |  1350 |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           |   750 |   190K|    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0020593 |  1350 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))
   2 - access("T1"."C1"="T2"."C1")
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))

Statistics
------------------------------------------------
      0  recursive calls
      0  db block gets
  28923  consistent gets
  21711  physical reads
      0  redo size
2051555  bytes sent via SQL*Net to client
  73660  bytes received via SQL*Net from client
   6668  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
 100000  rows processed

10 rows selected.

Elapsed: 00:00:01.04

Execution Plan
----------------------------------------------------------
Plan hash value: 77308553

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   749 |   380K|   111   (1)| 00:00:01 |
|*  1 |  FILTER                       |              |       |       |            |          |
|*  2 |   HASH JOIN                   |              |   749 |   380K|   111   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |   750 |   190K|    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0020592 |  1350 |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           |   750 |   190K|    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0020593 |  1350 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))
   2 - access("T1"."C1"="T2"."C1")
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))

Statistics
---------------------------------------------
    0  recursive calls
    0  db block gets
22257  consistent gets
21716  physical reads
    0  redo size
  605  bytes sent via SQL*Net to client
  334  bytes received via SQL*Net from client
    2  SQL*Net roundtrips to/from client
    0  sorts (memory)
    0  sorts (disk)
   10  rows processed

In the above, notice that the first execution required 3.82 seconds and had a calculated cost of 111.  The second execution required 1.04 seconds, and also had a calculated cost of 111.  Note also that the displayed execution plan is incorrect: autotrace produces an explain-plan style of output that does not peek at the bind variable values, so it shows index range scans even though the actual execution (as the row source operations later confirm) used full table scans and a hash join.  The elapsed time was output automatically because the script included SET TIMING ON.  Keep in mind the effects of the buffer cache: the first time a SQL statement is executed, it will probably take longer to execute than it does on the second execution, and that is the reason why I executed the SQL statement a couple of extra times before I started looking at the performance.  The above statistics indicate that 21,711 physical block reads and 28,923 consistent gets were performed on the first execution.  On the second execution the number of physical block reads increased slightly to 21,716, while the number of consistent gets decreased to 22,257.

If we run the 10046 trace file through TKPROF:

TKPROF test_ora_1608_explain_plan_test2.trc test_ora_1608_explain_plan_test2.txt

We now see the following in the resulting output:

SELECT
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     6670      2.35       2.60      43427      51180          0      100010
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     6674      2.35       2.60      43427      51180          0      100010

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 31 

Rows     Row Source Operation
-------  ---------------------------------------------------
 100000  FILTER  (cr=28923 pr=21711 pw=0 time=1299979 us)
 100000   HASH JOIN  (cr=28923 pr=21711 pw=0 time=1199904 us)
 100000    TABLE ACCESS FULL T1 (cr=11128 pr=11028 pw=0 time=596 us)
 100000    TABLE ACCESS FULL T2 (cr=17795 pr=10683 pw=0 time=100562 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    6670        0.00          0.00
  db file scattered read                        436        0.01          0.97
  db file sequential read                       755        0.00          0.05
  SQL*Net message from client                  6670        0.01          1.86

The wait events are critical to the understanding of why the query required as long as it did to execute.

Now, let’s jump into the 10046 raw trace file. We see the following:

PARSING IN CURSOR #6 len=145 dep=0 uid=31 oct=3 lid=31 tim=3828906982 hv=631402459 ad='a68ddf18'
SELECT
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2
END OF STMT
PARSE #6:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=3828906975
BINDS #6:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=13cb6528  bln=22  avl=02  flg=05
  value=1
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=13cb6540  bln=22  avl=02  flg=01
  value=100000
...
STAT #6 id=1 cnt=100000 pid=0 pos=1 obj=0 op='FILTER  (cr=28923 pr=21711 pw=0 time=1299979 us)'
STAT #6 id=2 cnt=100000 pid=1 pos=1 obj=0 op='HASH JOIN  (cr=28923 pr=21711 pw=0 time=1199904 us)'
STAT #6 id=3 cnt=100000 pid=2 pos=1 obj=114319 op='TABLE ACCESS FULL T1 (cr=11128 pr=11028 pw=0 time=596 us)'
STAT #6 id=4 cnt=100000 pid=2 pos=2 obj=114321 op='TABLE ACCESS FULL T2 (cr=17795 pr=10683 pw=0 time=100562 us)'

There is the hash value: hv=631402459 (we should probably also do something with the address because of the possibility of multiple unrelated SQL statements potentially having the same hash value).  Now, back in SQL*Plus, we can do the following:

SET AUTOTRACE OFF

SELECT
  SQL_ID
FROM
  V$SQL
WHERE
  HASH_VALUE=631402459;

SQL_ID
-------------
5pf2mzwku4vyv

There is the SQL_ID that we need to pull the actual execution plan from the library cache. If we integrate that SQL statement into another that pulls back the actual execution plan, this is what we see:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
    (SELECT
      SQL_ID
    FROM
      V$SQL
    WHERE
      HASH_VALUE=631402459),
    NULL,
    'TYPICAL +PEEKED_BINDS'));

SQL_ID  5pf2mzwku4vyv, child number 0
-------------------------------------
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND :N2

Plan hash value: 487071653

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |       |  1809 (100)|          |
|*  1 |  FILTER             |      |       |       |       |            |          |
|*  2 |   HASH JOIN         |      |   100K|    49M|    25M|  1809   (4)| 00:00:09 |
|*  3 |    TABLE ACCESS FULL| T1   |   100K|    24M|       |   568   (5)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| T2   |   100K|    24M|       |   568   (5)| 00:00:03 |
------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------
   1 - :N1 (NUMBER): 1
   2 - :N2 (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")
   3 - filter(("T1"."C1"<=:N2 AND "T1"."C1">=:N1))
   4 - filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))

So, the above shows the calculated cost (1809) for the actual execution plan used for the query when it was executed and the peeked bind variables that were submitted during the hard parse.

The next example changes the STATISTICS_LEVEL at the session level to ALL, which has a tendency to increase the execution time of queries (a GATHER_PLAN_STATISTICS hint can instead be added to the query so that changing the STATISTICS_LEVEL is not necessary):

VARIABLE N1 NUMBER
VARIABLE N2 NUMBER

EXEC :N1:=1
EXEC :N2:=100000

ALTER SESSION SET STATISTICS_LEVEL='ALL';

SELECT
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

OK, let’s see the execution plan for the query that was just executed:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST +PEEKED_BINDS +PROJECTION +ALIAS +PREDICATE +COST +BYTES'));

Plan hash value: 487071653

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem|  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  FILTER             |      |      1 |        |       |       |            |    100K|00:00:04.56 |   23056 |  27980 |   6231 |      |       |          |         |
|*  2 |   HASH JOIN         |      |      1 |    100K|    49M|    25M|  1809   (4)|    100K|00:00:04.45 |   23056 |  27980 |   6231 |   29M|  3792K|   11M (1)|   52224 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    100K|    24M|       |   568   (5)|    100K|00:00:01.99 |   11128 |  11054 |      0 |      |       |          |         |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    24M|       |   568   (5)|    100K|00:00:01.51 |   11928 |  10695 |      0 |      |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   3 - SEL$1 / T1@SEL$1
   4 - SEL$1 / T2@SEL$1

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 1
   2 - (NUMBER): 100000

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")
   3 - filter(("T1"."C1"<=:N2 AND "T1"."C1">=:N1))
   4 - filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "T1"."C1"[NUMBER,22], "T2"."C1"[NUMBER,22], "T1"."C2"[VARCHAR2,255], "T2"."C2"[VARCHAR2,255]
   2 - (#keys=1) "T1"."C1"[NUMBER,22], "T2"."C1"[NUMBER,22], "T1"."C2"[VARCHAR2,255], "T2"."C2"[VARCHAR2,255]
   3 - "T1"."C1"[NUMBER,22], "T1"."C2"[VARCHAR2,255]
   4 - "T2"."C1"[NUMBER,22], "T2"."C2"[VARCHAR2,255]

The above shows the estimated costs as well as the actual execution statistics.
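
As mentioned earlier, rather than changing the STATISTICS_LEVEL parameter at the session level, a GATHER_PLAN_STATISTICS hint may be embedded in the SQL statement so that the row-level execution statistics are collected for just that statement.  A minimal sketch using the same test query and bind variables:

SELECT /*+ GATHER_PLAN_STATISTICS */
  T1.C1,   T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));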

This blog has a number of related articles. For example, this one on DBMS_XPLAN.





Finding the Right Index for this SQL Statement

1 04 2010

April 1, 2010

We have a SQL statement in our ERP package that looks like the following:

SELECT TRANSACTION_ID , WORKORDER_BASE_ID , WORKORDER_LOT_ID , WORKORDER_SPLIT_ID , WORKORDER_SUB_ID ,
OPERATION_SEQ_NO , RESOURCE_ID , TYPE , INDIRECT_ID , CLOCK_IN , TRUNC(TRANSACTION_DATE) , BREAK_CLOCKIN ,
HOURS_BREAK , HOURS_BREAK_IND , SHIFT_DATE , HOURS_PREVIOUS , HOURS_OVERALL , CLOCK_OUT , HOURS_WORKED ,
USER_ID , CREATE_DATE , STARTING_TRANS , UNADJ_CLOCK_IN , PRD_INSP_PLAN_ID , GOOD_QTY , BAD_QTY , DEVIATION_ID 
FROM LABOR_TICKET WHERE  EMPLOYEE_ID = :1  AND HOURS_WORKED IS NULL  ORDER BY EMPLOYEE_ID, TRANSACTION_ID DESC

This SQL statement is fired at the database many times a day, as employees on the production floor start and end production run batches (and also when they “clock-in” and “clock-out” for the day).  The ERP system has an index on the EMPLOYEE_ID column; while that index works well for a recently hired employee, it is inefficient for employees with 15+ years of transaction history in the database.

One of the suggestions from the ERP user’s group back in 2001 was to create the following index to improve performance, which would hopefully eliminate the need to sort the rows retrieved by the query:

CREATE INDEX VT_LABOR_TICKET_1 ON LABOR_TICKET(EMPLOYEE_ID, TRANSACTION_ID DESC);

In case you are wondering, that descending index specification causes Oracle to create a function based index on recent Oracle releases, while that specification was silently ignored on older Oracle releases (prior to Oracle Database 8i for Enterprise Edition, if I recall correctly, and prior to Oracle Database 9i for Standard Edition).
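
As a quick check of that behavior, the data dictionary may be queried to see how the descending column was implemented.  A minimal sketch (on a release that honors the DESC specification, the TRANSACTION_ID column typically appears as a hidden SYS_NC…$ column with DESCEND set to DESC, and the expression appears in USER_IND_EXPRESSIONS):

SELECT
  IC.COLUMN_NAME,
  IC.DESCEND,
  IE.COLUMN_EXPRESSION
FROM
  USER_IND_COLUMNS IC,
  USER_IND_EXPRESSIONS IE
WHERE
  IC.INDEX_NAME='VT_LABOR_TICKET_1'
  AND IE.INDEX_NAME(+)=IC.INDEX_NAME
  AND IE.COLUMN_POSITION(+)=IC.COLUMN_POSITION
ORDER BY
  IC.COLUMN_POSITION;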

Did the index help?  Maybe, maybe not.  Consider what would probably happen with a recent Oracle release.  The nightly automatic statistics collection process calculates that the clustering factor for this index is very high, and automatically generates a histogram on the column due to the uneven distribution of the column values.  With recent Oracle Database releases, bind variable peeking is enabled by default.  The employee with 15+ years of transaction history in the system is the first to sign in for the day – and a new child cursor is created for the SQL statement due to the statistics collection 5+ hours earlier.  Assuming that this entire table is in the KEEP buffer pool, the execution plan might look something like this:

Plan hash value: 2750877200

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY     |              |      1 |      1 |      0 |00:00:00.42 |   35610 |  1024 |  1024 |          |
|*  2 |   TABLE ACCESS FULL| LABOR_TICKET |      1 |      1 |      0 |00:00:00.42 |   35610 |       |       |          |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("HOURS_WORKED" IS NULL AND "EMPLOYEE_ID"=:1))

A 10046 trace at level 12 for the query looks like this:

PARSING IN CURSOR #2 len=538 dep=0 uid=30 oct=3 lid=30 tim=402571814 hv=1657901498 ad='8af31240'
SELECT TRANSACTION_ID , WORKORDER_BASE_ID , WORKORDER_LOT_ID , WORKORDER_SPLIT_ID , WORKORDER_SUB_ID , OPERATION_SEQ_NO , RESOURCE_ID , TYPE , INDIRECT_ID , CLOCK_IN , TRUNC(TRANSACTION_DATE) , BREAK_CLOCKIN , HOURS_BREAK , HOURS_BREAK_IND , SHIFT_DATE , HOURS_PREVIOUS , HOURS_OVERALL , CLOCK_OUT , HOURS_WORKED , USER_ID , CREATE_DATE , STARTING_TRANS , UNADJ_CLOCK_IN , PRD_INSP_PLAN_ID , GOOD_QTY , BAD_QTY , DEVIATION_ID  FROM LABOR_TICKET WHERE  EMPLOYEE_ID = :1
AND HOURS_WORKED IS NULL
ORDER BY EMPLOYEE_ID, TRANSACTION_ID DESC
END OF STMT
PARSE #2:c=0,e=1198,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=402571806
BINDS #2:
kkscoacd
 Bind#0
  oacdty=96 mxl=32(12) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=2ee851f8  bln=32  avl=06  flg=05
  value="EMP103"
EXEC #2:c=0,e=5512,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=402577873
FETCH #2:c=406250,e=415234,p=0,cr=35610,cu=0,mis=0,r=0,dep=0,og=1,tim=402993364
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=35610 pr=0 pw=0 time=415238 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=12347 op='TABLE ACCESS FULL LABOR_TICKET (cr=35610 pr=0 pw=0 time=415190 us)'

The production floor employee is happy to get back to work with only a 1/2 second delay.  But what would happen if the table were not fully cached in the buffer cache, or if a new employee made his way to the computer first after the statistics collection?  Either the VT_LABOR_TICKET_1 index or the index on just the EMPLOYEE_ID column might have been used for the execution plan, and our long-term employee might have had to endure a much longer wait to get back to work – certainly not leaving the computer in a happy mood.

If we could modify the hard-coded SQL statement, we might try forcing the use of the VT_LABOR_TICKET_1 index, and we might even restrict the query to transactions created in the last couple of days.  Let’s just try a hint to use the index, which we could implement using this approach:

SELECT /*+ INDEX(LABOR_TICKET VT_LABOR_TICKET_1) */ TRANSACTION_ID , WORKORDER_BASE_ID ,
WORKORDER_LOT_ID , WORKORDER_SPLIT_ID , WORKORDER_SUB_ID , OPERATION_SEQ_NO , RESOURCE_ID , TYPE ,
INDIRECT_ID , CLOCK_IN , TRUNC(TRANSACTION_DATE) , BREAK_CLOCKIN , HOURS_BREAK , HOURS_BREAK_IND ,
SHIFT_DATE , HOURS_PREVIOUS , HOURS_OVERALL , CLOCK_OUT , HOURS_WORKED , USER_ID , CREATE_DATE ,
STARTING_TRANS , UNADJ_CLOCK_IN , PRD_INSP_PLAN_ID , GOOD_QTY , BAD_QTY , DEVIATION_ID  FROM
LABOR_TICKET WHERE  EMPLOYEE_ID = :1  AND HOURS_WORKED IS NULL  ORDER BY EMPLOYEE_ID,
TRANSACTION_ID DESC;

For one of the employees with 15+ years of transaction history, the execution plan might look something like this:

Plan hash value: 1422135358

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|*  1 |  TABLE ACCESS BY INDEX ROWID| LABOR_TICKET      |      1 |      1 |      0 |00:00:00.05 |    8684 |
|*  2 |   INDEX RANGE SCAN          | VT_LABOR_TICKET_1 |      1 |  11719 |  16376 |00:00:00.01 |      66 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("HOURS_WORKED" IS NULL)
   2 - access("EMPLOYEE_ID"=:1)
       filter("EMPLOYEE_ID"=:1)

The execution completed 10 times faster than the full table scan, which was certainly helped by careful use of the KEEP buffer pool to limit the impact of the index’s high clustering factor.  The 10046 trace at level 12 might look something like this:

PARSING IN CURSOR #13 len=583 dep=0 uid=30 oct=3 lid=30 tim=2180285132 hv=989309132 ad='7c9e2b58'
SELECT /*+ INDEX(LABOR_TICKET VT_LABOR_TICKET_1) */ TRANSACTION_ID , WORKORDER_BASE_ID , WORKORDER_LOT_ID , WORKORDER_SPLIT_ID , WORKORDER_SUB_ID , OPERATION_SEQ_NO , RESOURCE_ID , TYPE , INDIRECT_ID , CLOCK_IN , TRUNC(TRANSACTION_DATE) , BREAK_CLOCKIN , HOURS_BREAK , HOURS_BREAK_IND , SHIFT_DATE , HOURS_PREVIOUS , HOURS_OVERALL , CLOCK_OUT , HOURS_WORKED , USER_ID , CREATE_DATE , STARTING_TRANS , UNADJ_CLOCK_IN , PRD_INSP_PLAN_ID , GOOD_QTY , BAD_QTY , DEVIATION_ID  FROM LABOR_TICKET WHERE  EMPLOYEE_ID = :1
AND HOURS_WORKED IS NULL
ORDER BY EMPLOYEE_ID, TRANSACTION_ID DESC
END OF STMT
PARSE #13:c=0,e=1218,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=2180285124
BINDS #13:
kkscoacd
 Bind#0
  oacdty=96 mxl=32(12) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=2db72048  bln=32  avl=06  flg=05
  value="EMP103"
EXEC #13:c=15625,e=5540,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=2180291322
WAIT #13: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=2180291466
FETCH #13:c=46875,e=54726,p=0,cr=8684,cu=0,mis=0,r=0,dep=0,og=1,tim=2180346313
STAT #13 id=1 cnt=0 pid=0 pos=1 obj=12347 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=8684 pr=0 pw=0 time=54719 us)'
STAT #13 id=2 cnt=16376 pid=1 pos=1 obj=14555 op='INDEX RANGE SCAN VT_LABOR_TICKET_1 (cr=66 pr=0 pw=0 time=51 us)'

If you have read many of Richard Foote’s blog articles, you are probably well aware that B*Tree indexes do not store entries for rows in which all of the indexed column values are NULL… so a single-column index on HOURS_WORKED would not help here, at least without using fancy tricks as described on his blog… but a multi-column index will store an entry as long as at least one of the column values is not NULL.  So, we create a new index:

CREATE INDEX IND_LT_EMPLOYEE_HW ON LABOR_TICKET(EMPLOYEE_ID,HOURS_WORKED);
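
For reference, one of the fancy tricks mentioned above for indexing NULLs is to append a constant to the index definition so that every row receives an index entry even when the nullable column is NULL – a sketch with a hypothetical index name:

CREATE INDEX IND_LT_HOURS_WORKED ON LABOR_TICKET(HOURS_WORKED, 0);

For this particular SQL statement, though, the two-column IND_LT_EMPLOYEE_HW index is the better fit, because its leading EMPLOYEE_ID column also supports the equality predicate on EMPLOYEE_ID.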

The execution plan might look like the following for the employee with the 15+ years of transaction history:

SELECT /*+ INDEX(LABOR_TICKET IND_LT_EMPLOYEE_HW) */ TRANSACTION_ID , WORKORDER_BASE_ID , WORKORDER_LOT_ID , WORKORDER_SPLIT_ID
, WORKORDER_SUB_ID , OPERATION_SEQ_NO , RESOURCE_ID , TYPE , INDIRECT_ID , CLOCK_IN , TRUNC(TRANSACTION_DATE) , BREAK_CLOCKIN ,
HOURS_BREAK , HOURS_BREAK_IND , SHIFT_DATE , HOURS_PREVIOUS , HOURS_OVERALL , CLOCK_OUT , HOURS_WORKED , USER_ID , CREATE_DATE
, STARTING_TRANS , UNADJ_CLOCK_IN , PRD_INSP_PLAN_ID , GOOD_QTY , BAD_QTY , DEVIATION_ID  FROM LABOR_TICKET WHERE  EMPLOYEE_ID
= :1  AND HOURS_WORKED IS NULL  ORDER BY EMPLOYEE_ID, TRANSACTION_ID DESC

Plan hash value: 2790784761

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY               |                    |      1 |      1 |      0 |00:00:00.01 |       3 |  1024 |  1024 |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| LABOR_TICKET       |      1 |      1 |      0 |00:00:00.01 |       3 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_LT_EMPLOYEE_HW |      1 |      1 |      0 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("EMPLOYEE_ID"=:1 AND "HOURS_WORKED" IS NULL)

And a 10046 trace at level 12 might look like this:

PARSING IN CURSOR #6 len=584 dep=0 uid=30 oct=3 lid=30 tim=2919660608 hv=4195357598 ad='66272978'
SELECT /*+ INDEX(LABOR_TICKET IND_LT_EMPLOYEE_HW) */ TRANSACTION_ID , WORKORDER_BASE_ID , WORKORDER_LOT_ID , WORKORDER_SPLIT_ID , WORKORDER_SUB_ID , OPERATION_SEQ_NO , RESOURCE_ID , TYPE , INDIRECT_ID , CLOCK_IN , TRUNC(TRANSACTION_DATE) , BREAK_CLOCKIN , HOURS_BREAK , HOURS_BREAK_IND , SHIFT_DATE , HOURS_PREVIOUS , HOURS_OVERALL , CLOCK_OUT , HOURS_WORKED , USER_ID , CREATE_DATE , STARTING_TRANS , UNADJ_CLOCK_IN , PRD_INSP_PLAN_ID , GOOD_QTY , BAD_QTY , DEVIATION_ID  FROM LABOR_TICKET WHERE  EMPLOYEE_ID = :1
AND HOURS_WORKED IS NULL
ORDER BY EMPLOYEE_ID, TRANSACTION_ID DESC
END OF STMT
PARSE #6:c=0,e=1230,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=2919660600
BINDS #6:
kkscoacd
 Bind#0
  oacdty=96 mxl=32(12) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=2db7d620  bln=32  avl=06  flg=05
  value="EMP103"
EXEC #6:c=0,e=5468,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=2919666706
FETCH #6:c=0,e=103,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,tim=2919667106
STAT #6 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=114 us)'
STAT #6 id=2 cnt=0 pid=1 pos=1 obj=12347 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=3 pr=0 pw=0 time=72 us)'
STAT #6 id=3 cnt=0 pid=2 pos=1 obj=102065 op='INDEX RANGE SCAN IND_LT_EMPLOYEE_HW (cr=3 pr=0 pw=0 time=64 us)'

Five times faster than using the VT_LABOR_TICKET_1 index?  No, not if you look at the STAT lines in the 10046 trace.  When the VT_LABOR_TICKET_1 index was used, the top STAT line shows time=54719 us, while the top STAT line when the IND_LT_EMPLOYEE_HW index was used shows time=72 us – that is about 760 times faster.  As you might imagine, the employee was overjoyed to return to the production floor 0.054647 seconds faster.  :-)

But this whole exercise is not about doing something 0.054647 seconds faster – it is about eliminating unnecessary work.  We should be looking at what else executing simultaneously against the database is now able to complete faster due to reduced competition for CPU and I/O from the needless work that this SQL statement was causing.





SQL – Overlapping Transactions, How Many Hours have We Worked?

19 03 2010

March 19, 2010

A couple of years ago I received an email from an ERP mailing list that requested assistance with the task of determining how many hours an employee was in a building.  The ERP software allows employees to record in the computer system that, for example, job A was in-process between 8:00 AM and 10:00 AM and job B was in-process from 10:00 AM to noon; in that case it is a simple task of subtracting each starting time from the corresponding ending time and summing the results.  But what happens if the employee is responsible for monitoring automated processes, where the employee may monitor several simultaneous in-process jobs?  For example, job A was in-process between 8:00 AM and 10:00 AM, job B was in-process from 8:30 AM to 9:25 AM, job C was in-process from 9:00 AM to 11:00 AM, and job D was in-process from 10:30 AM to noon.  With the concurrent in-process jobs it becomes much more difficult to show that the employee worked for 4 hours, just as in the first example.

How could we solve this problem?  Paint a picture – there are 1440 minutes in a day, so we could just lay down each of the transactions on top of each of those minutes to see how many of the minutes are occupied.  If we start with the following, we obtain the number of minutes past midnight of the shift date for the clock in and clock out, rounded to the nearest minute:

SELECT
  EMPLOYEE_ID,
  SHIFT_DATE,
  ROUND((CLOCK_IN-SHIFT_DATE)*1440) CLOCK_IN,
  ROUND((CLOCK_OUT-SHIFT_DATE)*1440) CLOCK_OUT
FROM
  LABOR_TICKET
WHERE
  SHIFT_DATE=TRUNC(SYSDATE-1);

If we had a way of stepping through all of the minutes in the day, and potentially two days in the event that the shift crosses midnight, we could count the number of minutes in which there was an active labor ticket for each employee.  To do this, we need a counter:

SELECT
  ROWNUM COUNTER
FROM
  DUAL
CONNECT BY
  LEVEL<=2880;

Now, if we combine the two SQL statements together, placing each in an inline view, we can count the number of distinct minutes for which there was a labor ticket:

SELECT
  LT.EMPLOYEE_ID,
  LT.SHIFT_DATE,
  MIN(CLOCK_IN) CLOCK_IN,
  MAX(CLOCK_OUT) CLOCK_OUT,
  COUNT(DISTINCT C.COUNTER)/60 HOURS
FROM
  (SELECT
    EMPLOYEE_ID,
    SHIFT_DATE,
    ROUND((ROUND(CLOCK_IN,'MI')-SHIFT_DATE)*1440) CLOCK_IN,
    ROUND((ROUND(CLOCK_OUT,'MI')-SHIFT_DATE)*1440) CLOCK_OUT
  FROM
    LABOR_TICKET
  WHERE
    SHIFT_DATE=TRUNC(SYSDATE-1)) LT,
  (SELECT
    ROWNUM COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=2880) C
WHERE
  C.COUNTER>LT.CLOCK_IN
  AND C.COUNTER<=LT.CLOCK_OUT
GROUP BY
  LT.EMPLOYEE_ID,
  LT.SHIFT_DATE;

We have a couple of potential problems with the above:

  1. It could bury/peg the CPU in the database server due to the CONNECT BY.
  2. The CLOCK_IN and CLOCK_OUT times are not in a friendly format.

First, we create a counter table:

CREATE TABLE
  MY_COUNTER
AS
SELECT
  ROWNUM COUNTER
FROM
  DUAL
CONNECT BY
  LEVEL<=2880;

COMMIT;

Now, plugging in the counter table and cleaning up the CLOCK_IN and CLOCK_OUT times:

SELECT
  LT.EMPLOYEE_ID,
  LT.SHIFT_DATE,
  MIN(CLOCK_IN)/1440+SHIFT_DATE CLOCK_IN,
  MAX(CLOCK_OUT)/1440+SHIFT_DATE CLOCK_OUT,
  ROUND(COUNT(DISTINCT C.COUNTER)/60,2) HOURS
FROM
  (SELECT
    EMPLOYEE_ID,
    SHIFT_DATE,
    ROUND((ROUND(CLOCK_IN,'MI')-SHIFT_DATE)*1440) CLOCK_IN,
    ROUND((ROUND(CLOCK_OUT,'MI')-SHIFT_DATE)*1440) CLOCK_OUT
  FROM
    LABOR_TICKET
  WHERE
    SHIFT_DATE=TRUNC(SYSDATE-1)) LT,
  (SELECT
    COUNTER
  FROM
    MY_COUNTER) C
WHERE
  C.COUNTER>LT.CLOCK_IN
  AND C.COUNTER<=LT.CLOCK_OUT
GROUP BY
  LT.EMPLOYEE_ID,
  LT.SHIFT_DATE;

This solution was just as easy as painting by number.  It is just a matter of breaking a complicated task into solvable problems.
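
As a quick sanity check of the minute-counting approach, here is a self-contained sketch that inlines the four overlapping jobs from the opening example (no LABOR_TICKET table required) – it should report 4 hours:

SELECT
  ROUND(COUNT(DISTINCT C.COUNTER)/60,2) HOURS
FROM
  (SELECT
    ROUND((CLOCK_IN-TRUNC(CLOCK_IN))*1440) CLOCK_IN,
    ROUND((CLOCK_OUT-TRUNC(CLOCK_OUT))*1440) CLOCK_OUT
  FROM
    (SELECT TO_DATE('08:00','HH24:MI') CLOCK_IN, TO_DATE('10:00','HH24:MI') CLOCK_OUT FROM DUAL UNION ALL
     SELECT TO_DATE('08:30','HH24:MI'), TO_DATE('09:25','HH24:MI') FROM DUAL UNION ALL
     SELECT TO_DATE('09:00','HH24:MI'), TO_DATE('11:00','HH24:MI') FROM DUAL UNION ALL
     SELECT TO_DATE('10:30','HH24:MI'), TO_DATE('12:00','HH24:MI') FROM DUAL)) LT,
  (SELECT
    ROWNUM COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1440) C
WHERE
  C.COUNTER>LT.CLOCK_IN
  AND C.COUNTER<=LT.CLOCK_OUT;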





Improving Performance by Using a Cartesian Join

18 03 2010

March 18, 2010

(Forward to the Next Post in the Series)

This example is based on a demonstration that I gave during a presentation last year.  I did not go into great detail about how the code worked, but I demonstrated that a carefully constructed Cartesian join can be a helpful and efficient solution to certain types of problems.  Assume that you have a table named APPLICATION_LIST that lists all of the modules belonging to an application, another table named USER_LIST that lists each Oracle username that has access to the application, and a third table named USER_PROGRAM_PERMISSION that lists each username that is denied access to one of the application modules.  The table construction may seem a little odd, but this is based on an actual example found in a commercial product.  The goal is to produce a cross-tab style report that shows all users’ permissions to all of the application modules, and have that cross-tab report appear in Excel.  The table definitions for our test tables look like this:

CREATE TABLE APPLICATION_LIST(
  PROGRAM_ID VARCHAR2(30),
  MENU_STRING VARCHAR2(30),
  PRIMARY KEY (PROGRAM_ID));

CREATE TABLE USER_LIST(
  NAME VARCHAR2(30),
  TYPE NUMBER,
  PRIMARY KEY(NAME));

CREATE TABLE USER_PROGRAM_PERMISSION(
  USER_ID VARCHAR2(30),
  PROGRAM_ID VARCHAR2(30),
  PERMISSION CHAR(1),
  PROGRAM_COMPONENT VARCHAR(20),
  PRIMARY KEY(USER_ID,PROGRAM_ID));

We will populate the test tables with the following script:

INSERT INTO
  APPLICATION_LIST
SELECT
  DBMS_RANDOM.STRING('Z',10) PROGRAM_ID,
  DBMS_RANDOM.STRING('A',20) MENU_STRING
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

INSERT INTO
  USER_LIST
SELECT
  'USER'||TO_CHAR(ROWNUM) USER_ID,
  1 TYPE
FROM
  DUAL
CONNECT BY
  LEVEL<=300;

INSERT INTO
  USER_PROGRAM_PERMISSION
SELECT
  USER_ID,
  PROGRAM_COMPONENT,
  PERMISSION,
  'PROGRAM'
FROM
  (SELECT
    UL.NAME USER_ID,
    AL.PROGRAM_ID PROGRAM_COMPONENT,
    'N' PERMISSION
  FROM
    USER_LIST UL,
    APPLICATION_LIST AL
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=27000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'APPLICATION_LIST',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'USER_LIST',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'USER_PROGRAM_PERMISSION',CASCADE=>TRUE)
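
Before running the scripts, a quick sanity check of the row counts produced by the population script (100 application modules, 300 users, and 27,000 denied-permission rows are expected):

SELECT 'APPLICATION_LIST' TABLE_NAME, COUNT(*) NUM_ROWS FROM APPLICATION_LIST
UNION ALL
SELECT 'USER_LIST', COUNT(*) FROM USER_LIST
UNION ALL
SELECT 'USER_PROGRAM_PERMISSION', COUNT(*) FROM USER_PROGRAM_PERMISSION;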

The first VBS script will not use a Cartesian Merge join – instead it will retrieve a list of all users and all application modules, and then probe the USER_PROGRAM_PERMISSION table once for each USER_ID. (IntentionalCartesian1-NoCartesian.VBS - save as IntentionalCartesian1-NoCartesian.VBS)

Const adVarChar = 200
Const adCmdText = 1
Const adCmdStoredProc = 4
Const adParamInput = 1

Dim i
Dim j
Dim strSQL
Dim strLastColumn
Dim strProgramName
Dim strUsername
Dim strPassword
Dim strDatabase
Dim strPermission
Dim snpData
Dim comData
Dim dbDatabase
Dim objExcel

Set dbDatabase = CreateObject("ADODB.Connection")
Set snpData = CreateObject("ADODB.Recordset")
Set comData = CreateObject("ADODB.Command")

On Error Resume Next

strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

Set snpData = CreateObject("adodb.recordset")

'Create an Excel connection
Set objExcel = CreateObject("Excel.Application")

With objExcel
    .Workbooks.Add
    .ActiveWorkbook.Sheets.Add
    .ActiveSheet.Name = "Application Permissions"

    'Remove the three default worksheets
    For i = 1 To .ActiveWorkbook.Sheets.Count
        If .ActiveWorkbook.Sheets(i).Name = "Sheet1" Then
            .Sheets("Sheet1").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
        If .ActiveWorkbook.Sheets(i).Name = "Sheet2" Then
            .Sheets("Sheet2").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
        If .ActiveWorkbook.Sheets(i).Name = "Sheet3" Then
            .Sheets("Sheet3").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
    Next

    .Visible = True
End With

strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  PROGRAM_ID," & vbCrLf
strSQL = strSQL & "  MENU_STRING" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  APPLICATION_LIST" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  PROGRAM_ID NOT IN ('.SEPARATOR')" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  MENU_STRING"

snpData.Open strSQL, dbDatabase

If snpData.State = 1 Then
    strProgramName = snpData.GetRows(30000)
    snpData.Close
End If

'Set the number of elements in the strPermission array to match the number of application module names
ReDim strPermission(UBound(strProgramName, 2))

'Copy the module names into Excel
For j = 0 To UBound(strPermission)
    strPermission(j) = strProgramName(1, j) ' & " - " & strProgramName(0, j)
Next
With objExcel
    .Application.ScreenUpdating = False
    .ActiveSheet.Range(.ActiveSheet.Cells(1, 2), .ActiveSheet.Cells(1, 1 + UBound(strProgramName, 2))) = strPermission
End With

'Retrieve the list of users
strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  NAME" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  USER_LIST" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  TYPE=1" & vbCrLf
strSQL = strSQL & "  AND NAME NOT LIKE '%#'" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  NAME"

snpData.Open strSQL, dbDatabase

If snpData.State = 1 Then
    strUsername = snpData.GetRows(30000)
    snpData.Close
End If

'Set the SQL statement to use to retrieve permissions, ? is a bind variable placeholder
strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  PROGRAM_ID," & vbCrLf
strSQL = strSQL & "  PERMISSION" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  USER_PROGRAM_PERMISSION" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  PROGRAM_COMPONENT='PROGRAM'" & vbCrLf
strSQL = strSQL & "  AND USER_ID= ?" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  PROGRAM_ID"

With comData
    'Set up the command properties
    .CommandText = strSQL
    .CommandType = adCmdText
    .CommandTimeout = 30
    .ActiveConnection = dbDatabase

    .Parameters.Append .CreateParameter("user_id", adVarChar, adParamInput, 30, "")
End With

'Loop through each user
For i = 0 To UBound(strUsername, 2)
    'Reset the permissions for the next user
    For j = 0 To UBound(strPermission)
        strPermission(j) = "Y"
    Next

    comData("user_id") = strUsername(0, i)
    Set snpData = comData.Execute

    If snpData.State = 1 Then
        Do While Not (snpData.EOF)
            For j = 0 To UBound(strProgramName, 2)
                If strProgramName(0, j) = snpData("program_id") Then
                    strPermission(j) = snpData("permission")
                    Exit For
                End If
            Next
            snpData.MoveNext
        Loop
        snpData.Close
    End If

    With objExcel
        .ActiveSheet.Cells(i + 2, 1) = strUsername(0, i)
        .ActiveSheet.Range(.ActiveSheet.Cells(i + 2, 2), .ActiveSheet.Cells(i + 2, 1 + UBound(strProgramName, 2))) = strPermission
    End With
Next

'Convert the number of columns into letter notation
strLastColumn = Chr(64 + Int((UBound(strProgramName, 2)) / 26)) & Chr(64 + ((UBound(strProgramName, 2)) Mod 26 + 1))

'Final cleanup
With objExcel
    .ActiveSheet.Range(.ActiveSheet.Cells(1, 2), .ActiveSheet.Cells(1, 1 + UBound(strProgramName, 2))).Orientation = 90
    .ActiveSheet.Columns("A:" & strLastColumn).AutoFit
    .Application.ScreenUpdating = True
    .ActiveSheet.Range("B2").Select
    .ActiveWindow.FreezePanes = True
End With

dbDatabase.Close

Set snpData = Nothing
Set comData = Nothing
Set objExcel = Nothing

If you ignore the fact that the above script redefines the meaning of the strUsername variable, the script works.  The problem with the script is that it repeatedly sends queries to the database, and it probably should be optimized to remove the repeated queries (the number of repeated round trips to the database server could have been much worse).  Let’s take a look at version 2 of the script (IntentionalCartesian2-NoCartesian.VBS – save as IntentionalCartesian2-NoCartesian.VBS)

Dim i
Dim j
Dim strSQL
Dim strLastColumn
Dim strProgramName
Dim strEmployeename
Dim strUsername
Dim strPassword
Dim strDatabase
Dim strPermission
Dim snpData
Dim dbDatabase
Dim objExcel

Set dbDatabase = CreateObject("ADODB.Connection")
Set snpData = CreateObject("ADODB.Recordset")

On Error Resume Next

strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

Set snpData = CreateObject("adodb.recordset")

'Create an Excel connection
Set objExcel = CreateObject("Excel.Application")

With objExcel
    .Workbooks.Add
    .ActiveWorkbook.Sheets.Add
    .ActiveSheet.Name = "Visual Permissions"

    'Remove the three default worksheets
    For i = 1 To .ActiveWorkbook.Sheets.Count
        If .ActiveWorkbook.Sheets(i).Name = "Sheet1" Then
            .Sheets("Sheet1").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
        If .ActiveWorkbook.Sheets(i).Name = "Sheet2" Then
            .Sheets("Sheet2").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
        If .ActiveWorkbook.Sheets(i).Name = "Sheet3" Then
            .Sheets("Sheet3").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
    Next

    .Visible = True
End With

strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  PROGRAM_ID," & vbCrLf
strSQL = strSQL & "  MENU_STRING" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  APPLICATION_LIST" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  PROGRAM_ID NOT IN ('.SEPARATOR')" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  MENU_STRING"

snpData.Open strSQL, dbDatabase

If snpData.State = 1 Then
    strProgramName = snpData.GetRows(30000)
    snpData.Close
End If

'Set the number of elements in the strPermission array to match the number of application module names
ReDim strPermission(UBound(strProgramName, 2))

'Copy the module names into Excel
For j = 0 To UBound(strPermission)
    strPermission(j) = strProgramName(1, j) ' & " - " & strProgramName(0, j)
Next
With objExcel
    .Application.ScreenUpdating = False
    .ActiveSheet.Range(.ActiveSheet.Cells(1, 2), .ActiveSheet.Cells(1, 1 + UBound(strProgramName, 2))) = strPermission
End With

'Retrieve the list of users
strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  NAME" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  USER_LIST" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  TYPE=1" & vbCrLf
strSQL = strSQL & "  AND NAME NOT LIKE '%#'" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  NAME"

snpData.Open strSQL, dbDatabase

If snpData.State = 1 Then
    strEmployeename = snpData.GetRows(30000)
    snpData.Close
End If

'Set the SQL statement to use to retrieve permissions, ? is a bind variable placeholder
strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  USER_ID," & vbCrLf
strSQL = strSQL & "  PROGRAM_ID," & vbCrLf
strSQL = strSQL & "  PERMISSION" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  USER_PROGRAM_PERMISSION" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  PROGRAM_COMPONENT='PROGRAM'" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  USER_ID," & vbCrLf
strSQL = strSQL & "  PROGRAM_ID"

snpData.Open strSQL, dbDatabase

If snpData.State = 1 Then
    strAllPermission = snpData.GetRows(600000)
    snpData.Close
End If

strLastUser = ""
intLastPermission = 0

'Loop through all users
For i = 0 To UBound(strEmployeename, 2)
    'Reset the permissions for the next user
    For j = 0 To UBound(strPermission)
        strPermission(j) = "Y"
    Next

    For j = intLastPermission To UBound(strAllPermission, 2)
        If strAllPermission(0, j) = strEmployeename(0, i) Then
            'Examine the permissions for this user
            For k = 0 To UBound(strProgramName, 2)
                If strProgramName(0, k) = strAllPermission(1, j) Then
                    strPermission(k) = strAllPermission(2, j)
                    Exit For
                End If
            Next

        End If

        'Record the loop position so that we do not start at 0 for the next user
        intLastPermission = j

        If strAllPermission(0, j) > strEmployeename(0, i) Then
            'We have passed the last permission for this user, exit the For loop
            Exit For
        End If
    Next

    With objExcel
        .ActiveSheet.Cells(i + 2, 1) = strEmployeename(0, i)
        .ActiveSheet.Range(.ActiveSheet.Cells(i + 2, 2), .ActiveSheet.Cells(i + 2, 1 + UBound(strProgramName, 2))) = strPermission
    End With
Next

'Convert the number of columns into letter notation
strLastColumn = Chr(64 + Int((UBound(strProgramName, 2)) / 26)) & Chr(64 + ((UBound(strProgramName, 2)) Mod 26 + 1))

'Final cleanup
With objExcel
    .ActiveSheet.Range(.ActiveSheet.Cells(1, 2), .ActiveSheet.Cells(1, 1 + UBound(strProgramName, 2))).Orientation = 90
    .ActiveSheet.Columns("A:" & strLastColumn).AutoFit
    .Application.ScreenUpdating = True
    .ActiveSheet.Range("B2").Select
    .ActiveWindow.FreezePanes = True
End With

dbDatabase.Close

Set snpData = Nothing
Set dbDatabase = Nothing
Set objExcel = Nothing

While the second version of the script is better than the first, we are still sending three SQL statements to the server.  We can improve that with a Cartesian join.  Let’s take a look at version 3 of the script (IntentionalCartesian3-Cartesian.VBS – save as IntentionalCartesian3-Cartesian.VBS)

Dim i
Dim intUserCount
Dim intPermissionCount
Dim strLastColumn
Dim strLastUser
Dim strSQL
Dim strPermission(500)
Dim strModuleName(500)

Dim strUsername
Dim strPassword
Dim strDatabase
Dim snpData
Dim dbDatabase
Dim objExcel

Set dbDatabase = CreateObject("ADODB.Connection")
Set snpData = CreateObject("ADODB.Recordset")

On Error Resume Next

strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

Set snpData = CreateObject("adodb.recordset")

'Create an Excel connection
Set objExcel = CreateObject("Excel.Application")

With objExcel
    .Workbooks.Add
    .ActiveWorkbook.Sheets.Add
    .ActiveSheet.Name = "Application Permissions"

    'Remove the three default worksheets
    For i = 1 To .ActiveWorkbook.Sheets.Count
        If .ActiveWorkbook.Sheets(i).Name = "Sheet1" Then
            .Sheets("Sheet1").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
        If .ActiveWorkbook.Sheets(i).Name = "Sheet2" Then
            .Sheets("Sheet2").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
        If .ActiveWorkbook.Sheets(i).Name = "Sheet3" Then
            .Sheets("Sheet3").Select
            .ActiveWindow.SelectedSheets.Delete
        End If
    Next

    .Visible = True
End With

strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  AUP.NAME USERNAME," & vbCrLf
strSQL = strSQL & "  AUP.MENU_STRING MODULE," & vbCrLf
strSQL = strSQL & "  NVL(UGA.PERMISSION,AUP.DEFAULT_PERMISSION) PERMISSION" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  (SELECT" & vbCrLf
strSQL = strSQL & "    UL.NAME NAME," & vbCrLf
strSQL = strSQL & "    A.PROGRAM_ID," & vbCrLf
strSQL = strSQL & "    A.MENU_STRING," & vbCrLf
strSQL = strSQL & "    'Y' DEFAULT_PERMISSION" & vbCrLf
strSQL = strSQL & "  FROM" & vbCrLf
strSQL = strSQL & "    APPLICATION_LIST A," & vbCrLf
strSQL = strSQL & "    USER_LIST UL" & vbCrLf
strSQL = strSQL & "  WHERE" & vbCrLf
strSQL = strSQL & "    A.PROGRAM_ID NOT IN ('.SEPARATOR')" & vbCrLf
strSQL = strSQL & "    AND UL.NAME NOT LIKE '%#') AUP," & vbCrLf
strSQL = strSQL & "  (SELECT" & vbCrLf
strSQL = strSQL & "    USER_ID," & vbCrLf
strSQL = strSQL & "    PROGRAM_ID," & vbCrLf
strSQL = strSQL & "    PERMISSION" & vbCrLf
strSQL = strSQL & "  FROM" & vbCrLf
strSQL = strSQL & "    USER_PROGRAM_PERMISSION" & vbCrLf
strSQL = strSQL & "  WHERE" & vbCrLf
strSQL = strSQL & "    PROGRAM_COMPONENT='PROGRAM') UGA" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  AUP.NAME=UGA.USER_ID(+)" & vbCrLf
strSQL = strSQL & "  AND AUP.PROGRAM_ID=UGA.PROGRAM_ID(+)" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  AUP.NAME," & vbCrLf
strSQL = strSQL & "  AUP.MENU_STRING"

snpData.Open strSQL, dbDatabase

If snpData.State = 1 Then
  With objExcel
    .Application.ScreenUpdating = False
    strLastUser = ""
    intUserCount = 0
    Do While Not snpData.EOF
        If strLastUser <> snpData("username") Then
            If strLastUser <> "" Then
                'Write out the permissions for the previous user
                .ActiveSheet.Range(.ActiveSheet.Cells(intUserCount + 1, 1), .ActiveSheet.Cells(intUserCount + 1, 1 + intPermissionCount)) = strPermission
            End If
            If intUserCount = 1 Then
                'Write out the module names
                .ActiveSheet.Range(.ActiveSheet.Cells(1, 1), .ActiveSheet.Cells(1, 1 + intPermissionCount)) = strModuleName
            End If

            strPermission(0) = snpData("username")
            intPermissionCount = 0
            intUserCount = intUserCount + 1
        End If
        intPermissionCount = intPermissionCount + 1
        strPermission(intPermissionCount) = snpData("permission")
        strLastUser = snpData("username")

        If intUserCount = 1 Then
            'Record the module names
            strModuleName(intPermissionCount) = snpData("module")
        End If

        snpData.MoveNext
    Loop
    If strLastUser <> "" Then
        'Write out the permissions for the last user
        .ActiveSheet.Range(.ActiveSheet.Cells(intUserCount + 1, 1), .ActiveSheet.Cells(intUserCount + 1, 1 + intPermissionCount)) = strPermission
    End If

    strLastColumn = Chr(64 + Int((intPermissionCount) / 26)) & Chr(64 + ((intPermissionCount) Mod 26 + 1))
    .ActiveSheet.Range(.ActiveSheet.Cells(1, 2), .ActiveSheet.Cells(1, 1 + intPermissionCount)).Orientation = 90
    .ActiveSheet.Columns("A:" & strLastColumn).AutoFit
    .Application.ScreenUpdating = True
    .ActiveWindow.FreezePanes = False
    .ActiveSheet.Range("B2").Select
    .ActiveWindow.FreezePanes = True

    .Application.ScreenUpdating = True
  End With
End If

snpData.Close
dbDatabase.Close
Set snpData = Nothing
Set dbDatabase = Nothing
Set objExcel = Nothing
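
For reference, the single combined SQL statement that the script assembles piece by piece in the strSQL variable looks like this when written out directly (an outer join to the USER_PROGRAM_PERMISSION rows is layered on top of the intentional Cartesian join of APPLICATION_LIST and USER_LIST):

SELECT
  AUP.NAME USERNAME,
  AUP.MENU_STRING MODULE,
  NVL(UGA.PERMISSION,AUP.DEFAULT_PERMISSION) PERMISSION
FROM
  (SELECT
    UL.NAME NAME,
    A.PROGRAM_ID,
    A.MENU_STRING,
    'Y' DEFAULT_PERMISSION
  FROM
    APPLICATION_LIST A,
    USER_LIST UL
  WHERE
    A.PROGRAM_ID NOT IN ('.SEPARATOR')
    AND UL.NAME NOT LIKE '%#') AUP,
  (SELECT
    USER_ID,
    PROGRAM_ID,
    PERMISSION
  FROM
    USER_PROGRAM_PERMISSION
  WHERE
    PROGRAM_COMPONENT='PROGRAM') UGA
WHERE
  AUP.NAME=UGA.USER_ID(+)
  AND AUP.PROGRAM_ID=UGA.PROGRAM_ID(+)
ORDER BY
  AUP.NAME,
  AUP.MENU_STRING;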

Notice in the above that the client-side code is much smaller, and we have collapsed the three SQL statements into a single SQL statement with the help of a Cartesian join between the APPLICATION_LIST and USER_LIST tables.  The end result looks like this:





Impact of the TRUNC Function on an Indexed Date Column

8 03 2010

March 8, 2010

A recent email from an ERP user’s group mailing list reminded me of a small problem in that ERP program’s modules related to the DATE columns in several of its tables.  In DATE columns that should only contain a date component, rows will occasionally be inserted by one of the ERP program’s modules with a date and time component, for example '08-MAR-2010 13:01:13' rather than just '08-MAR-2010 00:00:00'.  This bug, or feature, leads to unexpected performance issues when normal B*Tree indexes are present on that date column.  To work around the time component in the DATE type columns, the ERP program modules frequently use a technique like this to perform comparisons on only the date component of a DATE type column:

SELECT
  *
FROM
  T3
WHERE
  TRUNC(DATE_COLUMN) = :d1;

In the above D1 is a bind variable.  On occasion, the ERP program will instead pass in the date value as a constant/literal rather than as a bind variable.  What is wrong with the above syntax?  Would the above syntax be considered a bug if the DATE_COLUMN column had a normal B*Tree index?  Is there a better way to retrieve the required rows?  Incidentally, I started re-reading the book “Troubleshooting Oracle Performance” and I encountered a couple of interesting sentences on page 7 that seem to address this issue:

“For all intents and purposes, an application experiencing poor performance is no worse [should probably state no better] than an application failing to fulfill its functional requirements. In both situations, the application is useless.”

Let’s try a couple of experiments to see why the above SQL statement requires improvement.

First, we will create table T2, which will serve as a nearly sequentially ordered row source with a small amount of randomization introduced into the data by the DBMS_RANDOM package.  This row source will be used to help duplicate the essentially random arrival rate of transactions into our T3 test table:

CREATE TABLE T2 AS
SELECT
  DBMS_RANDOM.VALUE(0,0.55555)+ROWNUM/10000 DAYS
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V2;

The date column in our T3 table is derived from SYSDATE, so ideally the rows should be in order by the DAYS column in table T2.  In a production environment, on rare occasions someone will slip in a row that is out of date sequence by editing the DATE column of a row, so we should be able to simulate that slight randomness by creating another table from table T2 before generating table T3 (the rows will fill the table blocks with an occasional row that is out of date sequence):

CREATE TABLE T2_ORDERED NOLOGGING AS
SELECT
  DAYS
FROM
  T2
ORDER BY
  DAYS;

For our simulation, we have a final problem that needs to be addressed.  The volume of data entered on a Saturday in the production database is less than that for a Monday through Friday, and the volume of data entered on a Sunday is less than that entered on a Saturday.  To add just a little more randomness, we will insert the rows into table T3 based on the following criteria:

  • 90% chance of a row from T2_ORDERED being included if the date falls on a Monday through Friday
  • 20% chance of a row from T2_ORDERED being included if the date falls on a Saturday
  • 10% chance of a row from T2_ORDERED being included if the date falls on a Sunday

The SQL statement to build table T3 follows:

CREATE TABLE T3 NOLOGGING AS
SELECT
  DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY') C1,
  DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY') C2,
  LPAD('A',255,'A') C3
FROM
  T2_ORDERED
WHERE
  DECODE(TO_CHAR(DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY'),'D'),'1',0.9,'7',0.8,0.1)<DBMS_RANDOM.VALUE(0,1);

CREATE INDEX IND_T3_C2 ON T3(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)

ALTER TABLE T3 MODIFY C1 NOT NULL;
ALTER TABLE T3 MODIFY C2 NOT NULL;

Let’s check the data distribution:

SELECT
  COUNT(*) NUM_ROWS,
  SUM(DECODE(TO_CHAR(C1,'D'),'6',1,0)) FRIDAYS,
  SUM(DECODE(TO_CHAR(C1,'D'),'7',1,0)) SATURDAYS,
  SUM(DECODE(TO_CHAR(C1,'D'),'1',1,0)) SUNDAYS
FROM
  T3;

  NUM_ROWS    FRIDAYS  SATURDAYS    SUNDAYS
---------- ---------- ---------- ----------
68,579,287 12,858,100  2,855,164  1,428,569

From the above we are able to determine that roughly 18.7% of the rows have a date that is on a Friday, roughly 4.2% of the rows have a date that is on a Saturday, and 2.1% of the rows are on a Sunday.
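
The percentages are simple arithmetic on the counts just returned – a quick check, which should report 18.7, 4.2, and 2.1:

SELECT
  ROUND(12858100/68579287*100,1) FRIDAY_PCT,
  ROUND(2855164/68579287*100,1) SATURDAY_PCT,
  ROUND(1428569/68579287*100,1) SUNDAY_PCT
FROM
  DUAL;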

This test will be performed on Oracle Database 11.2.0.1 with the __DB_CACHE_SIZE hidden parameter floating at roughly 7,381,975,040 (6.875GB).  I will use my toy project for performance tuning to submit the SQL statements and retrieve the DBMS_XPLAN output, but the same could be accomplished with just SQL*Plus (most tests can also be performed using my Automated DBMS_XPLAN tool).

Let’s start with a simple SQL statement to retrieve the rows with today’s date (March 8, 2010) using literals against the indexed column.  I will execute each SQL statement twice to take advantage of any previously cached blocks in the buffer cache and to eliminate the time consumed by the hard parse:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY');

SQL_ID  3us49wsdzdun3, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')

Plan hash value: 4161002650

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   9114 |00:02:31.93 |    2743K|   2743K|
|*  1 |  TABLE ACCESS FULL| T3   |      1 |   9114 |   9114 |00:02:31.93 |    2743K|   2743K|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TRUNC(INTERNAL_FUNCTION("C2"))=TO_DATE(' 2010-03-08 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

Roughly 2 minutes and 32 seconds.  Notice the Note at the bottom of the DBMS_XPLAN output: cardinality feedback (apparently not documented) is a new feature in Oracle Database 11.2.0.1 (see here for a related blog article).  The first execution required 2 minutes and 33 seconds, and a predicted cardinality of 685,000 rows (1% of the total) was shown for that execution.  The second execution generated a second child cursor with a corrected cardinality estimate based on the actual number of rows returned during the first execution.  2 minutes and 32 seconds is not bad, unless this is an OLTP application and an end user is waiting for the application to return the rows.
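
One way to confirm that the second child cursor was created because of cardinality feedback is to check V$SQL_SHARED_CURSOR (the USE_FEEDBACK_STATS column appears starting with the 11.2 release used here) – a minimal sketch using the SQL_ID shown above:

SELECT
  SQL_ID,
  CHILD_NUMBER,
  USE_FEEDBACK_STATS
FROM
  V$SQL_SHARED_CURSOR
WHERE
  SQL_ID='3us49wsdzdun3';

A value of Y against the original child cursor should indicate that the cursor could not be shared because the optimizer chose to re-optimize the statement with the feedback statistics, which is what produced the second child cursor.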

Let’s try again with a modified, equivalent SQL statement, again executing the SQL statement twice:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2 < TO_DATE('08-MAR-2010','DD-MON-YYYY')+1;

SQL_ID  c7jfpa0rpt95a, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >=
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2 <
TO_DATE('08-MAR-2010','DD-MON-YYYY')+1

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6859 |   9114 |00:00:00.02 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6859 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

0.02 seconds compared to 2 minutes and 32 seconds.  You will notice that the estimated number of rows, while not exact, is reasonably close even without a cardinality feedback adjustment.  Also notice that the optimizer adjusted the date calculation that was in the WHERE clause of the SQL statement.

Let’s try again with a second modified, equivalent SQL statement, again executing the SQL statement twice:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

SQL_ID  7xthpspukrbtv, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 BETWEEN
TO_DATE('08-MAR-2010','DD-MON-YYYY')      AND
TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6860 |   9114 |00:00:00.02 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6860 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<=TO_DATE(' 2010-03-08 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

0.02 seconds again, and the estimated number of rows is roughly the same as we achieved with the previous SQL statement.  By checking the Predicate Information section of the DBMS_XPLAN output we see that the optimizer has transformed the BETWEEN syntax into roughly the same syntax as was used in the previous SQL statement.

Let’s try again with bind variables (the bind variable names are automatically changed by ADO into generic names, and that is why the bind variable appears in the execution plan as :1 rather than :d1):

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = :d1;

SQL_ID  cub25jm7y8zck, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) = :1

Plan hash value: 4161002650

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   9114 |00:02:33.37 |    2743K|   2743K|
|*  1 |  TABLE ACCESS FULL| T3   |      1 |    685K|   9114 |00:02:33.37 |    2743K|   2743K|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TRUNC(INTERNAL_FUNCTION("C2"))=:1)

Notice that our friendly note about Cardinality Feedback did not appear this time, and that the cardinality estimate was not corrected when the SQL statement was executed for the second time, even though bind variable peeking did happen.  The incorrect cardinality estimate would not have changed the execution plan for this SQL statement, but could impact the execution plan if table T3 were joined to another table.

Let’s try the other equivalent SQL statement with bind variables:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= :d1
  AND C2 < :d2 +1;

SQL_ID  9j2a54zbzb9cz, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >= :1    AND C2 <
:2 +1

Plan hash value: 3025660695

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |   9114 |00:00:00.02 |     575 |
|*  1 |  FILTER                      |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6859 |   9114 |00:00:00.02 |     575 |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6859 |   9114 |00:00:00.01 |     118 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:1<:2+1)
   3 - access("C2">=:1 AND "C2"<:2+1)

The optimizer estimated that 6,859 rows would be returned, just as it did when we used literals in the SQL statement, because of bind variable peeking.  In case you are wondering, the same estimated row cardinality was returned when the D2 bind variable was set to '09-MAR-2010' in the application and the +1 was removed from the SQL statement.

Quite the problem we caused by pretending to not understand the impact of using a function in the WHERE clause on an indexed column.  We could create a function based index to work around the problem of the application programmers not knowing how to specify a specific date without using the TRUNC function on a DATE column:

CREATE INDEX IND_T3_C2_FBI ON T3(TRUNC(C2));

ALTER SYSTEM FLUSH SHARED_POOL;

But is creating a function based index the best approach, or have we just created another problem rather than attacking the root cause of the original problem?  We now have two indexes on the same column that need to be updated every time a row is inserted or deleted in table T3, and also maintained every time that column is updated (even when updated with the same value).  Let's experiment with the function based index.

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY');

SQL_ID  3us49wsdzdun3, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')

Plan hash value: 3662266936

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |      1 |        |   9114 |00:00:00.01 |     576 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3            |      1 |   9114 |   9114 |00:00:00.01 |     576 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   9114 |   9114 |00:00:00.01 |     119 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

Cardinality feedback again helps out the cardinality estimate on the second execution, but look at the Predicate Information section of the execution plan.  The access predicate now references the hidden column SYS_NC00004$ rather than C2, which makes it more difficult to walk through a complicated execution plan with the help of the Predicate Information section to see how the predicates in the WHERE clause are applied.  Not so bad, right?  What happens if this column C2 is joined to a column in another table, or even specified as being equal to column C1 in this table?  Let's take a look:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2 < TO_DATE('08-MAR-2010','DD-MON-YYYY')+1
  AND C2=C1;

SQL_ID  27rqhg1mpmzt9, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >=
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2 <
TO_DATE('08-MAR-2010','DD-MON-YYYY')+1    AND C2=C1

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.01 |     575 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   9114 |   9114 |00:00:00.01 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   9114 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"="C1" AND "C1">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "C1"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

On the first execution the E-Rows column for plan ID 1 showed that the cardinality estimate was 1 row, and on the second execution the cardinality estimate was corrected to 9114.  Notice that transitive closure took place - the filter operation on plan ID 1 shows the same restrictions for column C1 as were applied to column C2 in the WHERE clause.

Let's try again with the SQL statement using the TRUNC function - this SQL statement will use the function based index:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2=C1;

SQL_ID  ftu92j3z99ppr, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2=C1

Plan hash value: 3662266936

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |      1 |        |   9114 |00:00:00.01 |     576 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T3            |      1 |   9114 |   9114 |00:00:00.01 |     576 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   9114 |   9114 |00:00:00.01 |     119 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"="C1")
   2 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

The cardinality estimate is again correct because of cardinality feedback, but notice what is missing from the Predicate Information section of the execution plan (transitive closure did not happen).
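
As a side note, if the SYS_NC00004$ name in the Predicate Information sections looks cryptic, the hidden virtual column that Oracle created for the function based index can be mapped back to its defining expression with a quick data dictionary query - a sketch, relying only on the standard HIDDEN_COLUMN and DATA_DEFAULT columns of USER_TAB_COLS:

SELECT
  COLUMN_NAME,
  DATA_DEFAULT
FROM
  USER_TAB_COLS
WHERE
  TABLE_NAME='T3'
  AND HIDDEN_COLUMN='YES';

For this test table the query should return a single row pairing SYS_NC00004$ with the expression TRUNC("C2").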

So, does the use of TRUNC(DATE_COLUMN) without the presence of a function based index qualify as an application bug?  What if the function based index is present - is it still a bug?

Something possibly interesting, but unrelated.  I executed the following commands:

ALTER INDEX IND_T3_C2_FBI UNUSABLE;

(perform a little more testing)

ALTER INDEX IND_T3_C2_FBI REBUILD;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)

I received the following after several minutes:

BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE);
END;

*
ERROR at line 1:
ORA-00600: internal error code, arguments: [15851], [3], [2], [1], [1], [], [],
[], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 20337
ORA-06512: at "SYS.DBMS_STATS", line 20360
ORA-06512: at line 1

The same error appeared without the CASCADE option, but calls to collect statistics on the indexes for the table, as well as on other tables, completed successfully.  I may look at this problem again later.
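
For reference, the index level statistics collection that did complete successfully was of this general form - a sketch using the standard DBMS_STATS.GATHER_INDEX_STATS procedure with the index names from this test case:

EXEC DBMS_STATS.GATHER_INDEX_STATS(OWNNAME=>USER,INDNAME=>'IND_T3_C2')

EXEC DBMS_STATS.GATHER_INDEX_STATS(OWNNAME=>USER,INDNAME=>'IND_T3_C2_FBI')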

Continuing, we will create another table:

CREATE TABLE T4 NOLOGGING AS
SELECT
  *
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('01-JAN-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

CREATE INDEX IND_T4_C2 ON T4(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

Before we start, let's take a look at the disk space used by the objects and the automatically allocated extent sizes:

SELECT
  SEGMENT_NAME SEGMENT,
  (SUM(BYTES))/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T3_C2','IND_T3_C2_FBI','T3','T4','IND_T4_C2')
GROUP BY
  SEGMENT_NAME
ORDER BY
  SEGMENT_NAME;

SEGMENT           TOTAL_MB
--------------- ----------
IND_T3_C2             1469
IND_T3_C2_FBI         1472
IND_T4_C2               10
T3                   21480
T4                     144  

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T3_C2','IND_T3_C2_FBI','T3','T4','IND_T4_C2')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT            EXTENTS EXT_SIZE_KB   TOTAL_MB
--------------- ---------- ----------- ----------
IND_T3_C2               16          64          1
IND_T3_C2               63        1024         63
IND_T3_C2              120        8192        960
IND_T3_C2                1       27648         27
IND_T3_C2                1       34816         34
IND_T3_C2                6       65536        384
IND_T3_C2_FBI           16          64          1
IND_T3_C2_FBI           63        1024         63
IND_T3_C2_FBI          120        8192        960
IND_T3_C2_FBI            7       65536        448
IND_T4_C2               16          64          1
IND_T4_C2                9        1024          9
T3                      16          64          1
T3                      63        1024         63
T3                     120        8192        960
T3                       1       19456         19
T3                       1       43008         42
T3                       1       44032         43
T3                     318       65536      20352
T4                      16          64          1
T4                      63        1024         63
T4                      10        8192         80

Table T3 is using about 21GB of space while table T4 is using about 144MB.  We occasionally received an extent size that is not a power of 2 - a bit unexpected.  Let's try a couple of SQL statements that access the two tables:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  f2v7cf7w2bwqq, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')     AND T3.C2=T4.C2

Plan hash value: 1631978485

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |  10044 |00:00:00.38 |   18622 |     25 |       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |   7095 |  10044 |00:00:00.38 |   18622 |     25 |  1223K|  1223K| 1593K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |   6857 |   9114 |00:00:00.04 |     394 |     25 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   6857 |   9114 |00:00:00.03 |      28 |     25 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |    452K|    452K|00:00:00.12 |   18228 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The above used a full table scan on table T4, and you will notice that a filter predicate is not applied to table T4 to reduce the number of rows entering the hash join - transitive closure did not take place.  Let's try again with the other syntax, which uses neither the TRUNC function nor the function based index:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

SQL_ID  5swqbjak147vk, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')       AND
TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 3991319422

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  10044 |00:00:00.06 |     983 |      1 |       |       |          |
|*  1 |  HASH JOIN                   |           |      1 |   6761 |  10044 |00:00:00.06 |     983 |      1 |  1223K|  1223K| 1618K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6860 |   9114 |00:00:00.01 |     393 |      0 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6860 |   9114 |00:00:00.01 |      27 |      0 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T4        |      1 |   6762 |   9114 |00:00:00.03 |     590 |      1 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T4_C2 |      1 |   6762 |   9114 |00:00:00.01 |     127 |      1 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2010-03-08
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("T4"."C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2010-03-08
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Notice this time that transitive closure happened, allowing the optimizer to take advantage of the IND_T4_C2 index on table T4.

You are probably thinking that we must also need a function based index on the C2 column of table T4 to allow transitive closure to happen.  Let's try:

CREATE INDEX IND_T4_C2_FBI ON T4(TRUNC(C2));

ALTER SYSTEM FLUSH SHARED_POOL;

Now our SQL statement again:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  f2v7cf7w2bwqq, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')     AND T3.C2=T4.C2

Plan hash value: 1631978485

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |  10044 |00:00:00.33 |   18622 |       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |   7095 |  10044 |00:00:00.33 |   18622 |  1223K|  1223K| 1584K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |   6857 |   9114 |00:00:00.01 |     394 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   6857 |   9114 |00:00:00.01 |      28 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |    452K|    452K|00:00:00.11 |   18228 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As expected, the function based index on column C2 of table T4 was not used because transitive closure did not happen.  Do we still want to do it the wrong way?  The execution time could have been much longer than 0.33 seconds, of course, if table T4 were much larger and a large number of physical reads were required.  Let's try again using a larger table T4:

DROP TABLE T4 PURGE;

CREATE TABLE T4 NOLOGGING AS
SELECT
  *
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('01-JAN-2000','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

CREATE INDEX IND_T4_C2 ON T4(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

Table T4 now requires about 7.9GB of disk space.  Now a range scan that accesses tables T3 and T4 (each SQL statement is executed twice, with the last execution plan reported):

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1631978485

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |    874K|00:00:33.24 |    1062K|    302K|       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |    849K|    874K|00:00:33.24 |    1062K|    302K|    33M|  5591K|   50M (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |    802K|    795K|00:00:00.56 |   33957 |      0 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |    802K|    795K|00:00:00.20 |    2115 |      0 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |     25M|     25M|00:00:17.13 |    1028K|    302K|       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."SYS_NC00004$"<=TO_DATE('
              2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Notice the full table scan of table T4.  Now the other SQL statement:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

SQL_ID  539d93k50ruz3, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 1243183227

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |      1 |        |    874K|00:00:05.80 |   85051 |    574 |       |       |          |
|   1 |  MERGE JOIN                   |           |      1 |    795K|    874K|00:00:05.80 |   85051 |    574 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T4        |      1 |    795K|    795K|00:00:02.43 |   51097 |    574 |       |       |          |
|*  3 |    INDEX RANGE SCAN           | IND_T4_C2 |      1 |    795K|    795K|00:00:00.41 |   10841 |      0 |       |       |          |
|*  4 |   SORT JOIN                   |           |    795K|    795K|    874K|00:00:02.00 |   33954 |      0 |    30M|  1977K|   26M (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.50 |   33954 |      0 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.17 |    2114 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("T3"."C2"="T4"."C2")
       filter("T3"."C2"="T4"."C2")
   6 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Notice that the above used the index on table T4, but performed a sort-merge join between the two tables.  We are able to force a hash join, as was used with the other SQL statement, by applying a hint:

SQL_ID  b9q6tf6p6x2m0, child number 0
-------------------------------------
SELECT /*+ USE_HASH (T3 T4) */    T3.C1,    T3.C2,    T4.C3  FROM   
T3,    T4  WHERE    T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY') 
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND
T3.C2=T4.C2

Plan hash value: 3991319422

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    874K|00:00:03.60 |   85051 |       |       |          |
|*  1 |  HASH JOIN                   |           |      1 |    795K|    874K|00:00:03.60 |   85051 |    33M|  5591K|   50M (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.54 |   33954 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.19 |    2114 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T4        |      1 |    795K|    795K|00:00:01.44 |   51097 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T4_C2 |      1 |    795K|    795K|00:00:00.40 |   10841 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

The hinted version runs 9.2 times faster (the unhinted version just 5.7 times faster) than the version using the TRUNC function and function based index combination.  Are we able to just agree to do it the right way, or should I continue?  Without the function based index we receive an execution plan like this:

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1396201636

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    874K|00:03:05.88 |    3771K|   3013K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    849K|    874K|00:03:05.88 |    3771K|   3013K|    33M|  5591K|   52M (0)|
|*  2 |   TABLE ACCESS FULL| T3   |      1 |    802K|    795K|00:02:34.36 |    2743K|   2743K|       |       |          |
|   3 |   TABLE ACCESS FULL| T4   |      1 |     25M|     25M|00:00:15.72 |    1028K|    270K|       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   2 - filter((TRUNC(INTERNAL_FUNCTION("C2"))>=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("C2"))<=TO_DATE(' 2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

A full table scan of a 21GB table and a 7.9GB table, with 795,000 rows from the large table and 25,000,000 rows from the small table entering the hash join - probably not too good for performance.  Fix the performance bug in the application and let the user get back to counting the pencils in the pencil jar 9.2 times faster (or 51.6 times faster if there is no function based index).

While you might not frequently join two tables on a DATE column as I have done in this demonstration, how common is it to store numeric data in a VARCHAR2 column, and then need to compare those values with numbers stored in NUMBER columns, with numeric literals, or with numeric bind variables?
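
A minimal sketch of that situation follows (the table and column names are hypothetical).  The implicit TO_NUMBER conversion that Oracle applies to the VARCHAR2 column plays the same role that the TRUNC function played above - it prevents a plain index on the column from being used for the equality predicate, and it risks ORA-01722 errors if any row contains a non-numeric value:

CREATE TABLE T_PART (
  PART_ID     VARCHAR2(20),    -- numeric values stored in a VARCHAR2 column
  DESCRIPTION VARCHAR2(60));

CREATE INDEX IND_T_PART_ID ON T_PART(PART_ID);

SELECT
  DESCRIPTION
FROM
  T_PART
WHERE
  PART_ID = 1234;    -- silently treated as TO_NUMBER("PART_ID")=1234, so IND_T_PART_ID cannot be used for this predicate

Writing the predicate as PART_ID = '1234' (or better, storing the values in a NUMBER column in the first place) avoids the implicit conversion.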





Query Performance Problem, or is Something Else Happening?

3 03 2010

March 3, 2010

Let’s say that you encounter a query that is taking an unexpectedly long time to run, possibly 30 minutes.  Maybe the query looks something like this:

SELECT   mds.messagepartdata, ms.status, mi.ID, mi.messageguid, mi.channel,
         ROWNUM AS messagecount
    FROM pfmq_messagedata md,
         pfmq_messagedatastorage mds,
         pfmq_messageinfo mi,
         pfmq_messagestatus ms
   WHERE (    mi.queuename = 'CL312911032'
          AND mi.ID = ms.ID
          AND mi.ID = md.ID
          AND mi.ID = mds.ID
          AND md.ID = mds.ID
          AND md.messageparttype = mds.messageparttype
          AND md.messageparttype = 1
          AND (ms.statusrevisionnumber = (SELECT MAX (statusrevisionnumber)
                                            FROM pfmq_messagestatus ms2
                                           WHERE ms2.ID = ms.ID)
              )
         )
     AND ((ms.status = 64) AND (mi.direction = 1) AND mi.messagetype = 0)
ORDER BY mi.sequenceordinalnumber, mi.senttime

In the above, the table PFMQ_MESSAGEDATASTORAGE contains a LONG RAW column – the MESSAGEPARTDATA column.  You enable a 10046 trace, but only at level 1 – so there are no wait events written to the trace file.  The resulting trace file is then processed using TKPROF, and you obtain the following output:

call     count       cpu    elapsed       disk      query    current  rows
------- ------  -------- ---------- ---------- ---------- ---------- -----
Parse        1      0.00       0.00          0          0          0     0
Execute      1      0.00       0.00          0          0          0     0
Fetch     4321     14.56     580.31     231750     746064          0 64806
------- ------  -------- ---------- ---------- ---------- ---------- -----
total     4323     14.56     580.31     231750     746064          0 64806

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 29

Rows     Row Source Operation
-------  ---------------------------------------------------
  64806  SORT ORDER BY (cr=681258 r=184767 w=0 time=403515790 us)
  64806   COUNT  (cr=681258 r=184767 w=0 time=1729762996 us)
  64806    NESTED LOOPS  (cr=681258 r=184767 w=0 time=1729717540 us)
  64806     NESTED LOOPS  (cr=486600 r=92648 w=0 time=901417748 us)
  64806      NESTED LOOPS  (cr=356748 r=46572 w=0 time=268980743 us)
  64820       TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGEINFO (cr=31514 r=23422 w=0 time=44437657 us)
 120784        INDEX RANGE SCAN AK1_PFMQ_SEQUENCENUMBER (cr=3117 r=3062 w=0 time=10896605 us)(object id 6511)
  64806       TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGESTATUS (cr=325234 r=23150 w=0 time=224278563 us)
  64820        INDEX RANGE SCAN XPKPF_MESSAGESTATUS (cr=260414 r=15792 w=0 time=208616639 us)(object id 6515)
  64820         SORT AGGREGATE (cr=129644 r=116 w=0 time=1973822 us)
  64820          FIRST ROW  (cr=129644 r=116 w=0 time=1810738 us)
  64820           INDEX RANGE SCAN (MIN/MAX) XPKPF_MESSAGESTATUS (cr=129644 r=116 w=0 time=1756030 us)(object id 6515)
  64806      INDEX UNIQUE SCAN XPKPF_MESSAGEDATA (cr=129852 r=46076 w=0 time=632244506 us)(object id 6505)
  64806     TABLE ACCESS BY INDEX ROWID PFMQ_MESSAGEDATASTORAGE (cr=194658 r=92119 w=0 time=828055493 us)
  64806      INDEX UNIQUE SCAN XPKPF_MESSAGEDATASTORAGE (cr=129852 r=46036 w=0 time=613528422 us)(object id 6507)

How would you troubleshoot this performance problem?  What are the problems?  What looks good about the above output?  What about the above makes you wonder if some detail is missing?  You will find the above SQL statement in this comp.databases.oracle.server Usenet thread.

What can be said about the above output?

  • The client application is retrieving roughly 15 rows in each fetch request: 64,806/4,321 = 14.998 rows per fetch.  Maybe setting the fetch array size to a larger value would help (see the sketch following this list)?
  • The fetch calls required 14.56 seconds of the server’s CPU time, while the elapsed time for the fetch calls was 580.31 seconds.  Roughly 565.75 seconds were therefore spent doing something other than actively burning CPU cycles – such as waiting for the completion of disk reads.
  • Most of the indexes accessed were high precision indexes, meaning that few of the rowids returned by the index were eliminated at the table level, with the exception of the AK1_PFMQ_SEQUENCENUMBER index, where 46% of the rows identified by the index were discarded.  Those rows were discarded very early in the plan, so the performance impact was likely minimal.
  • There was heavy use of nested loop joins – that might be OK, but might not be as efficient as a hash join if a large percentage of the tables were being accessed or if the clustering factor were high.
  • There were 231,750 blocks read from disk, and considering the heavy use of indexes and nested loop joins, those blocks were likely read one at a time from disk.  If that was the case, the average time to read a block from disk was 0.0024412 seconds (2.4412ms), which would be a very fast access time for single block physical reads from disk.
  • Considering the WHERE clause, the execution plan starts with the two tables with the greatest number of predicates – so that is probably a smart starting point.
  • The elapsed time reported on the last line of the execution plan is greater than the elapsed time reported on the first line of the execution plan – an odd inconsistency: the time reported on the last line was 613.5 seconds, yet the elapsed time reported by TKPROF for the entire execution was only 580.31 seconds.  The second line of the execution plan shows 1,729.7 seconds (28.8 minutes) on the time= entry, which again is inconsistent with TKPROF’s elapsed time statistic.
  • 580.31 seconds is roughly 9.7 minutes – what happened to the other 20.3 minutes in the 30 minute total execution time that was reported by the original poster?
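
As a quick illustration of the fetch array size suggestion from the first bullet point - a sketch, since the mechanism depends on the client program (in SQL*Plus it is the ARRAYSIZE setting, and the value 100 is arbitrary):

SET ARRAYSIZE 100

Re-executing the query would then require roughly 650 fetch calls (64,806 / 100) rather than 4,321.  Incidentally, the observed 15 rows per fetch happens to match the default SQL*Plus ARRAYSIZE of 15, which hints (but does not prove) that the client simply left its fetch size at a default value.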

What might be the next couple of steps for troubleshooting this performance problem?

  • Generate the 10046 trace at level 8, rather than level 1, so that the wait events are written to the trace file (sample statements for enabling such a trace appear after this list).  If the trace file contains a large number of long duration waits on SQL*Net type wait events, check the network with a packet sniffer (Wireshark, for example) and check the client-side activity to make certain that it is not the client application that is the source of the slow performance.  If you see large unexplained gaps in the tim= values in the trace file without a corresponding long wait event in between, investigate whether the server’s CPUs are over-burdened.
  • Check the statistics on the tables and indexes to make certain that those statistics are reasonably accurate and up to date.
  • Review the current optimizer related initialization parameters to make certain that silly parameters are not specified.
  • Most of all, determine where the missing 20.3 minutes have gone.
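
The level 8 trace mentioned in the first bullet point might be enabled like this, assuming the query can be re-executed from a session that we control (the TRACEFILE_IDENTIFIER value is arbitrary):

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'QUERY_PERFORMANCE_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

(execute the slow query)

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';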




Why Doesn’t this SQL Work?

2 03 2010

March 2, 2010

I read a lot of computer books – a fair number of those are on the topic of Oracle, and a portion of those are specific to writing SQL that executes on Oracle Database.  I also spend time browsing the Internet looking for interesting articles.  I found an interesting SQL statement in a couple of places on the Internet, so I thought that I would share the SQL statement:

SELECT
  BOOK_KEY
FROM
  BOOK
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM SALES);

The SQL statement can be found here:
http://books.google.com/books?id=xJ0fLjQFUFcC&pg=PA105#v=onepage&q=&f=false

And here (as well as a half-dozen other places on the Internet):
http://deepthinking99.wordpress.com/2009/11/20/rewriting-sql-for-faster-performance/

Deep thinking… something is wrong with that SQL statement.  Maybe we need a test script to see the problem?

CREATE TABLE T5 AS
SELECT
  ROWNUM BOOK_KEY
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

CREATE TABLE T6 AS
SELECT
  ROWNUM*2 BOOK_KEY
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

Let’s pretend that table T5 is the table BOOK and table T6 is the table SALES.  The SQL statement would look like this using our test tables:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM T6);

Both of the above links go on to suggest that a transformed and faster version of the above SQL statement would look like this:

SELECT
  B.BOOK_KEY
FROM
  T5 B,
  T6 S
WHERE
  B.BOOK_KEY=S.BOOK_KEY(+)
  AND S.BOOK_KEY IS NULL;

I suggest that both of the above links (and the 6+ other links found through a Google search) are clearly wrong – the first SQL statement is obviously faster.  Don’t believe me?  Put 1,000,000 rows in each table and time how long it takes to transfer the results back to the client computer.  How confident am I?  Take a look:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM T6);

no rows selected
--
SELECT
  B.BOOK_KEY
FROM
  T5 B,
  T6 S
WHERE
  B.BOOK_KEY=S.BOOK_KEY(+)
  AND S.BOOK_KEY IS NULL;

  BOOK_KEY
----------
         5
         3
        15
        19
        17
         7
         9
        13
         1
        11

So, if each table contained 1,000,000 rows, which SQL statement would return the result set to the client the fastest?
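
For comparison, a correlated NOT EXISTS - a sketch of what the published example was presumably intended to contain - returns the same ten rows as the outer join version:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (
    SELECT
      1
    FROM
      T6
    WHERE
      T6.BOOK_KEY=T5.BOOK_KEY);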

Lesson 1: if you plan to publish something, whether in book form or on the Internet, make certain that what you publish actually works (or at least looks like you put some effort into it).

Lesson 2: if you plan to copy someone else’s work and post it on your website/blog make certain that what you copy and pass off as your own actually works.

Lesson 3: don’t trust everything that you read on the Internet or in a book without first verifying that the information is correct, even if you find the information on your favorite website.

Makes you wonder if someone would suggest replacing a pure SQL solution with a combined SQL and PL/SQL solution for the purpose of improving performance.  No, that would be silly.  Pardon me while I go re-sequence the 64 bits to keep them from chattering as they pass through the router… maybe I should try to oil the bits or use a bigger router.  On second thought, I’ll just use a hammer (putting down the 28oz framing hammer to grab the small hammer, those bit break too easily).





CPU Wait? LAG to the Rescue

26 02 2010

February 26, 2010

A question in a recent OTN thread appeared as follows:

I’m in RAC database (10gR2/Redhat4). I need to store the real CPU wait every 1 minute in a table for a month. For that I thing to query the GV$SYS_TIME_MODEL [for the "DB CPU" statistic].

This is a very difficult question to answer.  Why (additional reference)?  Technically, in Oracle when a session is “on the CPU”, that session is not in a wait event, and therefore not officially waiting for anything. The “DB CPU” statistic captures the accumulated time spent on the CPU by foreground processes – the user sessions. This CPU time does not include the time that the background processes (DBWR, LGWR, PMON, etc.) spend consuming CPU time. Additionally, the “DB CPU” statistic does not consider/accumulate CPU time consumed by processes that are not related to the database instance.  It could also be said that the “DB CPU” time does not account for time that the session spends waiting for its turn to execute on the CPUs.

With the above in mind, let’s see if we are able to calculate the amount of CPU time consumed by the sessions and the background processes in one minute intervals.  First, we need a logging table.  The following SQL statement builds the SYS_TIME_MODEL_CPU table using a couple of the column definitions from the GV$SYS_TIME_MODEL view so that I do not need to explicitly state the column data types (notice that the SQL statement is collapsing data from two source rows into a single row):

CREATE TABLE SYS_TIME_MODEL_CPU AS
SELECT
  SYSDATE CHK_ID,
  INST_ID,
  SUM(DECODE(STAT_NAME,'DB CPU',VALUE,NULL)) DB_CPU,
  SUM(DECODE(STAT_NAME,'background cpu time',VALUE,NULL)) BACKGROUND_CPU
FROM
  GV$SYS_TIME_MODEL
WHERE
  0=1
GROUP BY
  INST_ID;

If we are able to find a way to schedule the following SQL statement to execute once a minute, we will be able to store the current values of the “DB CPU” and “background cpu time” statistics with the following SQL statement (note that executing this SQL statement will also consume CPU time, the very thing we are trying to measure):

INSERT INTO SYS_TIME_MODEL_CPU
SELECT
  SYSDATE CHK_ID,
  INST_ID,
  SUM(DECODE(STAT_NAME,'DB CPU',VALUE,NULL)) DB_CPU,
  SUM(DECODE(STAT_NAME,'background cpu time',VALUE,NULL)) BACKGROUND_CPU
FROM
  GV$SYS_TIME_MODEL
WHERE
  STAT_NAME IN ('DB CPU','background cpu time')
GROUP BY
  INST_ID;

One way to schedule the SQL statement to execute once a minute is to use the DBMS_LOCK.SLEEP function in a loop.  Unfortunately, on some platforms the function may not wait exactly the specified number of seconds (it may wait slightly longer), and it may cause the “PL/SQL lock timer” wait event to steal a position in the top 5 wait events list in a Statspack or AWR report.  For testing purposes, the following anonymous PL/SQL script might be used:

DECLARE
  STime DATE := SYSDATE;
BEGIN
  WHILE (SYSDATE - STime) < 32 LOOP
    INSERT INTO SYS_TIME_MODEL_CPU
      SELECT
        SYSDATE CHK_ID,
        INST_ID,
        SUM(DECODE(STAT_NAME,'DB CPU',VALUE,NULL)) DB_CPU,
        SUM(DECODE(STAT_NAME,'background cpu time',VALUE,NULL)) BACKGROUND_CPU
      FROM
        GV$SYS_TIME_MODEL
      WHERE
        STAT_NAME IN ('DB CPU','background cpu time')
      GROUP BY
        INST_ID;

      COMMIT;
      DBMS_LOCK.SLEEP(60);
  End Loop;
End;
/

If we allow the script to run for a couple of minutes (rather than 31 days), we are able to determine how much CPU time was consumed every minute by using the LAG analytic function, as shown below:

SELECT
  TO_CHAR(CHK_ID,'YYYY-MM-DD HH24:MI') CHK_ID,
  INST_ID,
  DB_CPU-LAG(DB_CPU,1) OVER (PARTITION BY INST_ID ORDER BY CHK_ID) DB_CPU,
  BACKGROUND_CPU-LAG(BACKGROUND_CPU,1) OVER (PARTITION BY INST_ID ORDER BY CHK_ID) BACKGROUND_CPU
FROM
  SYS_TIME_MODEL_CPU
ORDER BY
  CHK_ID;

CHK_ID              INST_ID     DB_CPU BACKGROUND_CPU
---------------- ---------- ---------- --------------
2010-02-24 07:18          1
2010-02-24 07:19          1   59990544          66070
2010-02-24 07:20          1   59951475          66724
2010-02-24 07:21          1   59985268          71768
2010-02-24 07:22          1   60000569          63694
2010-02-24 07:23          1   60002938          71639
2010-02-24 07:24          1   59978651          63770
2010-02-24 07:25          1   61487141          62785
2010-02-24 07:26          1      24194          76990

To determine the number of seconds of CPU time consumed, the values shown in the DB_CPU and BACKGROUND_CPU columns should be divided by 1,000,000.
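
For example, the reporting query might perform that conversion directly - a sketch that simply divides the deltas from the previous query by 1,000,000 and rounds to two decimal places:

SELECT
  TO_CHAR(CHK_ID,'YYYY-MM-DD HH24:MI') CHK_ID,
  INST_ID,
  ROUND((DB_CPU-LAG(DB_CPU,1) OVER (PARTITION BY INST_ID ORDER BY CHK_ID))/1000000,2) DB_CPU_SEC,
  ROUND((BACKGROUND_CPU-LAG(BACKGROUND_CPU,1) OVER (PARTITION BY INST_ID ORDER BY CHK_ID))/1000000,2) BACKGROUND_CPU_SEC
FROM
  SYS_TIME_MODEL_CPU
ORDER BY
  CHK_ID;

With that change, the roughly 60,000,000 microsecond DB_CPU deltas in the sample output above would display as roughly 60 seconds of foreground CPU time consumed per one minute interval.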

Why not just use AWR data to obtain this information?  Just because AWR is built in does not mean that the features of AWR are free to use (as I attempted to argue in this OTN thread) – a fact that is often glossed over by various books, blog articles, “news” articles, and even the Oracle Database documentation when it states that AWR reports are the replacement for Statspack reports.





Transitive Closure Causing an Execution Plan Short-Circuit

20 02 2010

February 20, 2010

A recent thread on OTN asked the following question:

Sometimes predicates can be generated due to Transitive Closure, but is it possible to see below mentioned behaviour.  Is it possible to create test case for observing this behaviour?

You can even have cases like this one: if n1 > 10 and n1 < 0, then 0 > 10, which is always false, and therefore can short-circuit an entire branch of an execution plan.

In the OTN thread, Jonathan Lewis directly answered the question posted by the original poster, and I provided an answer to a similar question that involved two columns in a test table.  The setup for the test case included the following SQL statements:

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  ROWNUM C2
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

(Note that submitting the test case using my VBS tool makes short work of temporarily changing the STATISTICS_LEVEL and enabling a 10046 trace while generating a DBMS_XPLAN for a SQL statement: Automated DBMS_XPLAN, Trace, and Send to Excel )

The SQL to generate the test case follows:

ALTER SESSION SET STATISTICS_LEVEL='ALL';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SQL_SHORT_CIRCUIT';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT
  *
FROM
  T1
WHERE
  C1<=100
  AND C2>=10000
  AND C1>C2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  3uk9dajd5cn74, child number 0
-------------------------------------
SELECT T1.* FROM T1 WHERE     C1<=100    AND C2>=10000    AND C1>C2

Plan hash value: 3332582666

---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|*  1 |  FILTER            |      |      1 |        |      0 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL| T1   |      0 |      1 |      0 |00:00:00.01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)
   2 - filter(("C1">"C2" AND "C2"<100 AND "C1"<=100 AND "C1">10000 AND
              "C2">=10000))

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

Notice in the execution plan that the Starts column for the TABLE ACCESS FULL operation is 0 – that line in the execution plan was never executed.  By reviewing the 10046 trace file, you could further confirm that there were no physical reads when the SQL statement executed, which confirms that the full table scan operation was never performed:

PARSING IN CURSOR #2 len=68 dep=0 uid=31 oct=3 lid=31 tim=3008717127 hv=1515606244 ad='a775f1c8'
SELECT T1.* FROM T1 WHERE 
  C1<=100
  AND C2>=10000
  AND C1>C2
END OF STMT
PARSE #2:c=0,e=2320,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=3008717120

EXEC #2:c=0,e=123,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=3008717834
WAIT #2: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3008717930
FETCH #2:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=3008718038
WAIT #2: nam='SQL*Net message from client' ela= 1777 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3008719953
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='FILTER  (cr=0 pr=0 pw=0 time=5 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=114196 op='TABLE ACCESS FULL T1 (cr=0 pr=0 pw=0 time=0 us)'




What is the Meaning of the %CPU Column in an Explain Plan?

19 02 2010

February 19, 2010

(Forward to the Next Post in the Series)

A question recently appeared on the OTN forums asking what %CPU means in an explain plan output.  I did not see a clear definition of that column in the documentation, so I set up a test case.  We will use the test table from this blog article.  Let’s try creating an explain plan on Oracle 11.2.0.1 for a query:

EXPLAIN PLAN FOR
SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1,
  (SELECT
    C1,
    C2
  FROM
    T1
  WHERE
    MOD(C1,3)=0) V
WHERE
  T1.C1=V.C1(+)
  AND V.C1 IS NULL
ORDER BY
  T1.C1 DESC;

The above command wrote a couple of rows into the PLAN_TABLE table.  At this point, we should probably consult the documentation to understand the columns in the PLAN_TABLE table.

COST: Cost of the operation as estimated by the optimizer’s query approach. Cost is not determined for table access operations. The value of this column does not have any particular unit of measurement; it is merely a weighted value used to compare costs of execution plans. The value of this column is a function of the CPU_COST and IO_COST columns.

IO_COST: I/O cost of the operation as estimated by the query optimizer’s approach. The value of this column is proportional to the number of data blocks read by the operation. For statements that use the rule-based approach, this column is null.

CPU_COST: CPU cost of the operation as estimated by the query optimizer’s approach. The value of this column is proportional to the number of machine cycles required for the operation. For statements that use the rule-based approach, this column is null.

We found a couple of interesting columns in the PLAN_TABLE table, so let's query the table:

SELECT
  ID,
  COST,
  IO_COST,
  CPU_COST
FROM
  PLAN_TABLE;

 ID  COST  IO_COST   CPU_COST
--- ----- -------- ----------
  0  1482     1467  364928495
  1  1482     1467  364928495
  2   898      887  257272866
  3   889      887   42272866
  4     0        0       2150

Now let’s display the execution plan:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY);

Plan hash value: 1923834833

--------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              | 99000 |  1836K|       |  1482   (2)| 00:00:18 |
|   1 |  SORT ORDER BY      |              | 99000 |  1836K|  2736K|  1482   (2)| 00:00:18 |
|   2 |   NESTED LOOPS ANTI |              | 99000 |  1836K|       |   898   (2)| 00:00:11 |
|   3 |    TABLE ACCESS FULL| T1           |   100K|  1367K|       |   889   (1)| 00:00:11 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0018049 |    10 |    50 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."C1"="C1")
       filter(MOD("C1",3)=0)

The %CPU is 2 for ID 0, 1, and 2, and the %CPU is 1 for ID 3.  Let’s return to the query of the PLAN_TABLE table and perform a couple of calculations:

SELECT
  ID,
  COST,
  IO_COST,
  COST-IO_COST DIFF,
  CEIL(DECODE(COST,0,0,(COST-IO_COST)/COST)*100) PER_CPU,
  CPU_COST
FROM
  PLAN_TABLE;

 ID  COST  IO_COST  DIFF  PER_CPU   CPU_COST
--- ----- -------- ----- -------- ----------
  0  1482     1467    15        2  364928495
  1  1482     1467    15        2  364928495
  2   898      887    11        2  257272866
  3   889      887     2        1   42272866
  4     0        0     0        0       2150

In the above, I subtracted the IO_COST column from the COST column to derive the DIFF column.  I then divided the value in the DIFF column by the COST column, multiplied the result by 100 to convert the number to a percent, and then rounded up the result to derive the PER_CPU column.  The PER_CPU column seems to match the %CPU column in the DBMS_XPLAN output.  Let’s try another SQL statement:

DELETE FROM PLAN_TABLE;

EXPLAIN PLAN FOR
SELECT
  C1
FROM
  T1
WHERE
  'A'||C1 LIKE 'A%';

Now let’s run the query against the PLAN_TABLE table to see if we are able to predict the values that will appear in the %CPU column of the DBMS_XPLAN output:

SELECT
  ID,
  COST,
  IO_COST,
  COST-IO_COST DIFF,
  CEIL(DECODE(COST,0,0,(COST-IO_COST)/COST)*100) PER_CPU,
  CPU_COST
FROM
  PLAN_TABLE;

 ID  COST  IO_COST  DIFF  PER_CPU   CPU_COST
--- ----- -------- ----- -------- ----------
  0    54       52     2        4   43331709
  1    54       52     2        4   43331709

The above indicates that the %CPU column should show the number 4 on both rows of the execution plan.
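
As a quick check of that prediction, working the arithmetic by hand for either row: CEIL((54 - 52) / 54 * 100) = CEIL(3.7) = 4.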

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY);

Plan hash value: 2950179127

-------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes | Cost (%CPU)|  Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |  5000 | 25000 |    54   (4)| 00:00:01 |
|*  1 |  INDEX FAST FULL SCAN| SYS_C0018049 |  5000 | 25000 |    54   (4)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('A'||TO_CHAR("C1") LIKE 'A%')

One of my previous blog articles showed the following execution plan – this was the actual plan displayed by DBMS_XPLAN.DISPLAY_CURSOR after the SQL statement executed:

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |       |       |   247 (100)|          |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 | 10000 |  2236K|   247   (1)| 00:00:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          | 10000 |  2236K|   247   (1)| 00:00:03 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       | 10000 |  2236K|   247   (1)| 00:00:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C1"<=10000 AND "C1">=1)) 

Is there anything strange about the %CPU column in the above plan?

Incidentally, a query of SYS.AUX_STATS$ shows the following output (values are used to determine the impact of the CPU_COST column that is displayed in the PLAN_TABLE table):

SELECT
  PNAME,
  PVAL1
FROM
  SYS.AUX_STATS$
WHERE
  PNAME IN ('CPUSPEED','CPUSPEEDNW');

PNAME           PVAL1
---------- ----------
CPUSPEEDNW   2031.271
CPUSPEED 
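
As a rough, assumption-laden sketch of how the CPU_COST value feeds into the cost: with noworkload statistics the optimizer derives the single block read time as SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE/IOTFRSPEED.  Assuming the default IOSEEKTIM of 10ms, the default IOTFRSPEED of 4096 bytes/ms, and an 8KB block size (none of which are confirmed by the output above), that works out to 10 + 8192/4096 = 12ms.  The CPU time for plan ID 0 of the first example is then roughly CPU_COST / (CPUSPEEDNW * 1000) = 364,928,495 / (2,031.271 * 1,000) = 179.7ms, and 179.7 / 12 = 14.97 - which matches the 15 cost units separating COST (1482) from IO_COST (1467).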




SQL – Determining if Resources are in Use in 2 Hour Time Periods

10 02 2010

February 10, 2010

A question appeared in the comp.databases.oracle.misc Usenet group a couple of years ago that caught my attention, and not just because the subject line read “need urgent help“.  The original poster supplied the following request:

I am creating attendance sheet software for in-house use.

my data is like this:

-----------------------------------------------------------------------
| name     |   login time                   |   logout time           |
-----------------------------------------------------------------------
|  a       |  2007-11-10 12:00:00           |  2007-11-10 16:00:00    |
-----------------------------------------------------------------------
|  b       |  2007-11-10 15:00:00           |  2007-11-10 18:00:00    |
-----------------------------------------------------------------------

My requirement:

I want to generate an hourly report like this:

----------------------------------------------------------
 date             time range        total people logged in
----------------------------------------------------------
 2007-11-10          0 -2                      0
----------------------------------------------------------
 2007-12-10          2-4                       0
----------------------------------------------------------
.
.
----------------------------------------------------------
 2007-11-10         12-14                      1
----------------------------------------------------------
 2007-11-10         14-16                      2
----------------------------------------------------------
 2007-11-10         16-18                      1
----------------------------------------------------------
.
.
----------------------------------------------------------
 2007-11-10         22-24                      0
----------------------------------------------------------

This is what I want to create, but I don’t know how can I generate such kind of report.

Ed Prochak offered the following advice in the thread, advice that is easily forgotten when one is confronted by a simple problem that seems impossible to solve.

HINT: Try resolving the problem in steps.

The key to this is understanding that the result set of a SELECT can be considered a table. You may already have the first step you need, but basically try to think of the problem in parts.

If we were to apply Ed’s suggestions, we might start off by trying to simplify the problem with a test table, something like this:

CREATE TABLE T1 (
  USERNAME VARCHAR2(15),
  LOGIN_TIME DATE,
  LOGOUT_TIME DATE);

INSERT INTO
  T1
VALUES(
  'a',
  TO_DATE('2007-11-10 12:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-10 16:00','YYYY-MM-DD HH24:MI'));

INSERT INTO
  T1
VALUES(
  'b',
  TO_DATE('2007-11-10 15:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-10 18:00','YYYY-MM-DD HH24:MI'));

COMMIT;

At this point, we might start thinking about what potential problems we could encounter.  One of the challenges that we will face is the need to generate up to 12 rows (one for each of the possible two hour long time periods) for each row in the source table.  A second problem is how to handle logins that occur before midnight with a corresponding logout that occurs after midnight.  If it were known that there would be no time periods that cross midnight, we might try to build a solution like this using our test table:

SELECT
  TRUNC(LOGIN_TIME) CHECK_DATE,
  TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
  TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
FROM
  T1;

CHECK_DAT LOGIN_HOUR LOGOUT_HOUR
--------- ---------- -----------
10-NOV-07         12          16
10-NOV-07         15          18

The above just simplifies the input table into dates, login hour and logout hour.

Next, we need a way to generate 12 rows.  We could just use an existing table and return all rows where ROWNUM<=12 (a sketch of that approach appears after the output below), but we will use CONNECT BY LEVEL, which could result in greater CPU consumption but would likely be more portable:

SELECT
  (LEVEL-1)*2 LOGIN_COUNTER,
  (LEVEL-1)*2+2 LOGOUT_COUNTER
FROM
  DUAL
CONNECT BY
  LEVEL<=12;

LOGIN_COUNTER LOGOUT_COUNTER
------------- --------------
            0              2
            2              4
            4              6
            6              8
            8             10
           10             12
           12             14
           14             16
           16             18
           18             20
           20             22
           22             24
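
For comparison, the ROWNUM approach mentioned above might look like the following - a sketch in which ALL_OBJECTS is simply an arbitrary existing row source assumed to contain at least 12 rows:

SELECT
  (ROWNUM-1)*2 LOGIN_COUNTER,
  (ROWNUM-1)*2+2 LOGOUT_COUNTER
FROM
  ALL_OBJECTS
WHERE
  ROWNUM<=12;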

Now that we have the two simplified data sets, we just need to find where the two data sets intersect.  First, let’s find those records where the numbers from the counter fall between the LOGIN_HOUR and the LOGOUT_HOUR:

SELECT
  T.CHECK_DATE,
  T.LOGIN_HOUR,
  T.LOGOUT_HOUR,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99') TIME_RANGE
FROM
  (SELECT
    TRUNC(LOGIN_TIME) CHECK_DATE,
    TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
    TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
  FROM
    T1) T,
  (SELECT
    (LEVEL-1)*2 LOGIN_COUNTER,
    (LEVEL-1)*2+2 LOGOUT_COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  C.LOGIN_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
  AND C.LOGOUT_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
ORDER BY
  1,
  4,
  2;

CHECK_DAT LOGIN_HOUR LOGOUT_HOUR TIME_RA
--------- ---------- ----------- -------
10-NOV-07         12          16  12- 14
10-NOV-07         12          16  14- 16
10-NOV-07         15          18  16- 18

You may notice that the above output is missing one row.  Let’s see if we can find a way to include the missing row:

SELECT
  T.CHECK_DATE,
  T.LOGIN_HOUR,
  T.LOGOUT_HOUR,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99') TIME_RANGE
FROM
  (SELECT
    TRUNC(LOGIN_TIME) CHECK_DATE,
    TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
    TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
  FROM
    T1) T,
  (SELECT
    (LEVEL-1)*2 LOGIN_COUNTER,
    (LEVEL-1)*2+2 LOGOUT_COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  (C.LOGIN_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
    AND C.LOGOUT_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR)
  OR T.LOGIN_HOUR BETWEEN C.LOGIN_COUNTER AND C.LOGOUT_COUNTER-1
  OR T.LOGOUT_HOUR BETWEEN C.LOGIN_COUNTER+1 AND C.LOGOUT_COUNTER
ORDER BY
  1,
  4,
  2;

CHECK_DAT LOGIN_HOUR LOGOUT_HOUR TIME_RA
--------- ---------- ----------- -------
10-NOV-07         12          16  12- 14
10-NOV-07         12          16  14- 16
10-NOV-07         15          18  14- 16
10-NOV-07         15          18  16- 18

By also allowing the LOGIN_HOUR to fall between the LOGIN_COUNTER and LOGOUT_COUNTER, or the LOGOUT_HOUR to fall between the LOGIN_COUNTER and LOGOUT_COUNTER (with a slight adjustment), we pick up the missing row.  Now, it is a simple matter to find the total number in each time period by sliding the above into an inline view:

SELECT
  T.CHECK_DATE,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99') TIME_RANGE,
  COUNT(*) TOTAL_PEOPLE
FROM
  (SELECT
    TRUNC(LOGIN_TIME) CHECK_DATE,
    TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
    TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
  FROM
    T1) T,
  (SELECT
    (LEVEL-1)*2 LOGIN_COUNTER,
    (LEVEL-1)*2+2 LOGOUT_COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  (C.LOGIN_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
    AND C.LOGOUT_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR)
  OR T.LOGIN_HOUR BETWEEN C.LOGIN_COUNTER AND C.LOGOUT_COUNTER-1
  OR T.LOGOUT_HOUR BETWEEN C.LOGIN_COUNTER+1 AND C.LOGOUT_COUNTER
GROUP BY
  T.CHECK_DATE,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99')
ORDER BY
  1,
  2;

CHECK_DAT TIME_RA TOTAL_PEOPLE
--------- ------- ------------
10-NOV-07  12- 14            1
10-NOV-07  14- 16            2
10-NOV-07  16- 18            1

The above SQL statement is likely not the only solution to the problem.  Let's take another look at the requirements.  What if the time intervals need to cross midnight?  We need to make a couple of adjustments.  First, let's add another row to the table for variety:

INSERT INTO
  T1
VALUES(
  'c',
  TO_DATE('2007-11-10 13:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-10 19:00','YYYY-MM-DD HH24:MI'));

We can start with the SELECT statement we used earlier:

SELECT
  LOGIN_TIME,
  TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
  TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
FROM
  T1;

LOGIN_TIM LOGIN_HOUR LOGOUT_HOUR
--------- ---------- -----------
10-NOV-07         12          16
10-NOV-07         15          18
10-NOV-07         13          19

We will now modify the SQL statement above to produce the same output a little more efficiently:

SELECT
  LOGIN_TIME,
  (LOGIN_TIME-TRUNC(LOGIN_TIME))*24 LOGIN_HOUR,
  (LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24 LOGOUT_HOUR
FROM
  T1;

LOGIN_TIM LOGIN_HOUR LOGOUT_HOUR
--------- ---------- -----------
10-NOV-07         12          16
10-NOV-07         15          18
10-NOV-07         13          19

Now we need to round the clock-in and clock-out times to two-hour intervals – note that the LOGOUT_HOUR_A value of the last row was rounded up:

SELECT
  LOGIN_TIME,
  (LOGIN_TIME-TRUNC(LOGIN_TIME))*24 LOGIN_HOUR,
  (LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24 LOGOUT_HOUR,
  FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2 LOGIN_HOUR_A,
  CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2 LOGOUT_HOUR_A
FROM
  T1;

LOGIN_TIM LOGIN_HOUR LOGOUT_HOUR LOGIN_HOUR_A LOGOUT_HOUR_A
--------- ---------- ----------- ------------ -------------
10-NOV-07         12          16           12            16
10-NOV-07         15          18           14            18
10-NOV-07         13          19           12            20

Let’s take the above hours and translate them back into date/time values and determine the number of intervals between the adjusted LOGIN_HOUR_A and LOGOUT_HOUR_A:

SELECT
  LOGIN_TIME,
  LOGOUT_TIME,
  TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24 LOGIN_HOUR_A,
  TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24 LOGOUT_HOUR_A,
  ((TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24)
    - (TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24))*12 H
FROM
  T1;

LOGIN_TIME           LOGOUT_TIME          LOGIN_HOUR_A         LOGOUT_HOUR_A        H
-------------------- -------------------- -------------------- -------------------- -
10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 2
10-NOV-2007 15:00:00 10-NOV-2007 18:00:00 10-NOV-2007 14:00:00 10-NOV-2007 18:00:00 2
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 12:00:00 10-NOV-2007 20:00:00 4

We then combine the above with a simple counter that counts from 1 up to 12, only joining those rows from the counter that are less than or equal to the calculated number of intervals.  By adding the number of hours determined by the counter to the adjusted LOGIN_HOUR_A values, we obtain the time intervals:

SELECT
  T.LOGIN_TIME,
  T.LOGOUT_TIME,
  T.LOGIN_HOUR_A+(C.COUNTER*2-2)/24 TIME_START,
  T.LOGIN_HOUR_A+(C.COUNTER*2)/24 TIME_END
FROM
  (SELECT
    LOGIN_TIME,
    LOGOUT_TIME,
    TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24 LOGIN_HOUR_A,
    TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24 LOGOUT_HOUR_A,
    ((TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24)
     - (TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24))*12 H
  FROM
    T1) T,
  (SELECT
    LEVEL COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  T.H>=C.COUNTER;

LOGIN_TIME           LOGOUT_TIME          TIME_START           TIME_END
==================== ==================== ==================== ====================
10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 10-NOV-2007 12:00:00 10-NOV-2007 14:00:00
10-NOV-2007 15:00:00 10-NOV-2007 18:00:00 10-NOV-2007 14:00:00 10-NOV-2007 16:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 12:00:00 10-NOV-2007 14:00:00
10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 10-NOV-2007 14:00:00 10-NOV-2007 16:00:00
10-NOV-2007 15:00:00 10-NOV-2007 18:00:00 10-NOV-2007 16:00:00 10-NOV-2007 18:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 14:00:00 10-NOV-2007 16:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 16:00:00 10-NOV-2007 18:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 18:00:00 10-NOV-2007 20:00:00

The final step is to perform a group by:

SELECT
  CHECK_DATE,
  TO_CHAR(TIME_START,'HH24')||'-'||TO_CHAR(TIME_END,'HH24') TIME_RANGE,
  COUNT(*) TOTAL_PEOPLE
FROM
(SELECT
  TRUNC(T.LOGIN_HOUR_A+(C.COUNTER*2-2)/24) CHECK_DATE,
  T.LOGIN_HOUR_A+(C.COUNTER*2-2)/24 TIME_START,
  T.LOGIN_HOUR_A+(C.COUNTER*2)/24 TIME_END
FROM
  (SELECT
    LOGIN_TIME,
    LOGOUT_TIME,
    TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24 LOGIN_HOUR_A,
    TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24 LOGOUT_HOUR_A,
    ((TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24)
     - (TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24))*12 H
  FROM
    T1) T,
  (SELECT
    LEVEL COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  T.H>=C.COUNTER)
GROUP BY
  CHECK_DATE,
  TO_CHAR(TIME_START,'HH24')||'-'||TO_CHAR(TIME_END,'HH24')
ORDER BY
  1,
  2;

CHECK_DAT TIME_ TOTAL_PEOPLE
--------- ----- ------------
10-NOV-07 12-14            2
10-NOV-07 14-16            3
10-NOV-07 16-18            2
10-NOV-07 18-20            1

Now that we know that the above approach provides the desired results with the existing data, let’s add a row to the table where the values cross midnight:

INSERT INTO
  T1
VALUES(
  'c',
  TO_DATE('2007-11-10 19:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-11 04:00','YYYY-MM-DD HH24:MI'));

The results of our final SQL statement now look like the following:

CHECK_DAT TIME_ TOTAL_PEOPLE
--------- ----- ------------
10-NOV-07 12-14            2
10-NOV-07 14-16            3
10-NOV-07 16-18            2
10-NOV-07 18-20            2
10-NOV-07 20-22            1
10-NOV-07 22-00            1
11-NOV-07 00-02            1
11-NOV-07 02-04            1

I don't think that anyone mentioned it in the Usenet thread, but Ed Prochak's suggestion was correct.  It does not matter much whether someone is trying to solve an algebra problem (or even manually working through a long division problem), a performance tuning problem, or a SQL-related problem – what is required is a logical, step-by-step approach, with each step moving closer to the end result.





Excel – Charting the Results of Oracle Analytic Functions

6 02 2010

February 6, 2010

This is a somewhat complicated example that builds a couple of sample tables, uses a SQL statement with the Oracle analytic function LEAD submitted through ADO in an Excel macro, and then presents the information on an Excel worksheet.  When the user clicks one of three buttons on the worksheet, an Excel macro executes that builds charts using disconnected row sources – a disconnected ADO recordset is used to sort the data categories before pushing that data into the charts that are built on the fly.

To start, we need to build the sample tables.  The first two tables follow: a part list table and a vendor list table populated with random data:

CREATE TABLE PART_LIST (
  PART_ID VARCHAR2(30),
  PRODUCT_CODE VARCHAR2(30),
  COMMODITY_CODE VARCHAR2(30),
  PURCHASED CHAR(1),
  PRIMARY KEY (PART_ID));

INSERT INTO
  PART_LIST
SELECT
  DBMS_RANDOM.STRING('Z',10),
  DBMS_RANDOM.STRING('Z',1),
  DBMS_RANDOM.STRING('Z',1),
  DECODE(ROUND(DBMS_RANDOM.VALUE(1,2)),1,'Y','N')
FROM
  DUAL
CONNECT BY
  LEVEL<=50000;

COMMIT;

CREATE TABLE VENDOR_LIST (
  VENDOR_ID VARCHAR2(30),
  PRIMARY KEY (VENDOR_ID));

INSERT INTO
  VENDOR_LIST
SELECT
  DBMS_RANDOM.STRING('Z',10)
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

COMMIT;

Next, we need to build a purchase transaction history table, allowing a single part to be purchased from 10 randomly selected vendors out of the 100 vendors.  This is actually a Cartesian join, but we need to force it to be handled as a nested loops join so that we will have a different set of 10 vendors for each PART_ID:

CREATE TABLE PURCHASE_HISTORY (
  TRANSACTION_ID NUMBER,
  VENDOR_ID VARCHAR2(30),
  PART_ID VARCHAR2(30),
  UNIT_PRICE NUMBER(12,2),
  PURCHASE_DATE DATE,
  PRIMARY KEY (TRANSACTION_ID));

INSERT INTO
  PURCHASE_HISTORY
SELECT /*+ ORDERED USE_NL(PL VL) */
  ROWNUM,
  VL.VENDOR_ID,
  PL.PART_ID,
  VL.UNIT_PRICE,
  VL.PURCHASE_DATE
FROM
  PART_LIST PL,
  (SELECT
     'A' MIN_PART,
     'ZZZZZZZZZZZ' MAX_PART,
     VENDOR_ID,
     UNIT_PRICE,
     PURCHASE_DATE,
     ROWNUM RN
  FROM
    (SELECT
      VENDOR_ID,
      ROUND(DBMS_RANDOM.VALUE(0,10000),2) UNIT_PRICE,
      TRUNC(SYSDATE) - ROUND(DBMS_RANDOM.VALUE(0,5000)) PURCHASE_DATE
    FROM
      VENDOR_LIST
    ORDER BY
      DBMS_RANDOM.VALUE)) VL
WHERE
  PL.PURCHASED='Y'
  AND VL.RN<=10
  AND PL.PART_ID BETWEEN VL.MIN_PART AND VL.MAX_PART;

COMMIT;
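
As a quick sanity check (my own addition, not part of the original example), we might confirm that each purchased part received exactly 10 transactions and that, if the nested loops join re-evaluated the randomly ordered inline view per part as intended, far more than 10 distinct vendors appear in the table overall:

SELECT
  COUNT(DISTINCT VENDOR_ID) DISTINCT_VENDORS,      -- should be close to 100 if the randomization repeated per part
  COUNT(*)/COUNT(DISTINCT PART_ID) ROWS_PER_PART   -- should be exactly 10
FROM
  PURCHASE_HISTORY;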

Before we start working in Excel, we need to put together a SQL statement so that we are able to determine by how much the price of a part fluctuates over time.  We will use the LEAD analytic function to allow us to compare the current row values with the next row values, and only output the row when either the VENDOR_ID changes or the UNIT_PRICE changes.  While the sample data potentially includes dates up to 5,000 days ago, we only want to consider dates up to 720 days ago for this example:

SELECT /*+ ORDERED */
  PH.PART_ID,
  PH.VENDOR_ID,
  PH.UNIT_PRICE,
  PH.LAST_VENDOR_ID,
  PH.LAST_UNIT_PRICE,
  PL.PRODUCT_CODE,
  PL.COMMODITY_CODE
FROM
  (SELECT
    PH.PART_ID,
    PH.VENDOR_ID,
    PH.UNIT_PRICE,
    PH.PURCHASE_DATE,
    LEAD(PH.VENDOR_ID,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_VENDOR_ID,
    LEAD(PH.UNIT_PRICE,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_UNIT_PRICE
  FROM
    PURCHASE_HISTORY PH
  WHERE
    PH.PURCHASE_DATE>=TRUNC(SYSDATE-720)) PH,
  PART_LIST PL
WHERE
  PH.PART_ID=PL.PART_ID
  AND (PH.VENDOR_ID<>NVL(PH.LAST_VENDOR_ID,'-')
    OR PH.UNIT_PRICE<>NVL(PH.LAST_UNIT_PRICE,-1))
ORDER BY
  PH.PART_ID,
  PH.PURCHASE_DATE DESC;

The output of the above SQL statement might look something like this:

PART_ID    VENDOR_ID  UNIT_PRICE LAST_VENDO LAST_UNIT_PRICE P C
---------- ---------- ---------- ---------- --------------- - -
AAAFWXDGOR HHJAWQCYIV    1773.67 RPKWXSTFDS         5841.37 I T
AAAFWXDGOR RPKWXSTFDS    5841.37                            I T
AABDVNQJBS BBOSDBKYBR    4034.07                            D J
AABNDOOTTV HQBZXICKQM    2932.36                            C G
AABPRKFTLG NKYJQJXGJN     242.18 HHJAWQCYIV         1997.01 F I
AABPRKFTLG HHJAWQCYIV    1997.01                            F I
AACHFXHCDC SZWNZCRUWZ    3562.43                            P G
AACNAAOZWE JEYKZFIKJU    4290.12                            L N
AAEAYOLWMN DNDYVXUZVZ    4431.63                            K T
AAFLKRJTCO QPXIDOEDTI    8613.52                            Q G
AAGDNYXQGW BZFMNYJVBP     911.06 RPKWXSTFDS         2813.39 B L
AAGDNYXQGW RPKWXSTFDS    2813.39                            B L
AAGMKTQITK RAGVQSBHKW    9221.90 BCIRRDLHAN         8541.34 S W
AAGMKTQITK BCIRRDLHAN    8541.34 CWQNPITMBE         5611.73 S W
AAGMKTQITK CWQNPITMBE    5611.73                            S W
AAINVDSSWC CQXRSIWOIL    2690.31 BBOSDBKYBR         1707.15 K R
AAINVDSSWC BBOSDBKYBR    1707.15 QFPGRYTYUM         9158.98 K R
AAINVDSSWC QFPGRYTYUM    9158.98                            K R
AALCTODILL NKYJQJXGJN    2116.94                            K M
AAMAUJIWLF LPMSAUJGHR    6294.19 CNHZFDEWIH         4666.58 L P
AAMAUJIWLF CNHZFDEWIH    4666.58 SZWNZCRUWZ         2096.59 L P
AAMAUJIWLF SZWNZCRUWZ    2096.59                            L P
AAMYBVKFQC GLVKOCSHSF     265.63 PNGVEEYGKA         5869.67 X Z
AAMYBVKFQC PNGVEEYGKA    5869.67                            X Z
AANVGRNFEX NFHOKCKLDN    3961.42                            Q O
...

Now we need to switch over to Excel.  Create four ActiveX command buttons named cmdInitialize, cmdComparePC, cmdCompareCC, and cmdCompareVendorID.  Name the worksheet OracleAnalyticTest, as shown below:

Right-click the OracleAnalyticTest worksheet and select View Code.  See this blog article to determine how to enable macros in Excel 2007 (if you have not already turned on this feature) and add a reference to the Microsoft ActiveX Data Objects 2.8 (or 6.0) Library.  We will also need to add a reference to the Microsoft ActiveX Data Objects Recordset 2.8 (or 6.0) Library.  Next, we add the code to the cmdInitialize button:

Option Explicit 'Forces all variables to be declared

Dim dbDatabase As New ADODB.Connection
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String

Private Function ConnectDatabase() As Integer
    Dim intResult As Integer

    On Error Resume Next

    If dbDatabase.State <> 1 Then
        'Connection to the database is closed, specify the database name, a username, and password
        strDatabase = "MyDB"
        strUserName = "MyUser"
        strPassword = "MyPassword"

        'Connect to the database
        'Oracle connection string
        dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"

        dbDatabase.ConnectionTimeout = 40
        dbDatabase.CursorLocation = adUseClient
        dbDatabase.Open

        If (dbDatabase.State <> 1) Or (Err <> 0) Then
            intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Excel Demo")

            ConnectDatabase = False
        Else
            ConnectDatabase = True
        End If
    Else
        ConnectDatabase = True
    End If
End Function

Private Sub cmdInitialize_Click()
    Dim i As Integer
    Dim intResult As Integer
    Dim lngRow As Long
    Dim strSQL As String
    Dim snpData As ADODB.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Rows("4:10000").Delete Shift:=xlUp

    intResult = ConnectDatabase

    If intResult = True Then
        Set snpData = New ADODB.Recordset

        strSQL = "SELECT /*+ ORDERED */" & vbCrLf
        strSQL = strSQL & "  PH.PART_ID," & vbCrLf
        strSQL = strSQL & "  PH.VENDOR_ID," & vbCrLf
        strSQL = strSQL & "  PH.UNIT_PRICE," & vbCrLf
        strSQL = strSQL & "  PH.LAST_VENDOR_ID," & vbCrLf
        strSQL = strSQL & "  PH.LAST_UNIT_PRICE," & vbCrLf
        strSQL = strSQL & "  PL.PRODUCT_CODE," & vbCrLf
        strSQL = strSQL & "  PL.COMMODITY_CODE" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & "  (SELECT" & vbCrLf
        strSQL = strSQL & "    PH.PART_ID," & vbCrLf
        strSQL = strSQL & "    PH.VENDOR_ID," & vbCrLf
        strSQL = strSQL & "    PH.UNIT_PRICE," & vbCrLf
        strSQL = strSQL & "    PH.PURCHASE_DATE," & vbCrLf
        strSQL = strSQL & "    LEAD(PH.VENDOR_ID,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_VENDOR_ID," & vbCrLf
        strSQL = strSQL & "    LEAD(PH.UNIT_PRICE,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_UNIT_PRICE" & vbCrLf
        strSQL = strSQL & "  FROM" & vbCrLf
        strSQL = strSQL & "    PURCHASE_HISTORY PH" & vbCrLf
        strSQL = strSQL & "  WHERE" & vbCrLf
        strSQL = strSQL & "    PH.PURCHASE_DATE>=TRUNC(SYSDATE-720)) PH," & vbCrLf
        strSQL = strSQL & "  PART_LIST PL" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & "  PH.PART_ID=PL.PART_ID" & vbCrLf
        strSQL = strSQL & "  AND (PH.VENDOR_ID<>NVL(PH.LAST_VENDOR_ID,'-')" & vbCrLf
        strSQL = strSQL & "    OR PH.UNIT_PRICE<>NVL(PH.LAST_UNIT_PRICE,-1))" & vbCrLf
        strSQL = strSQL & "ORDER BY" & vbCrLf
        strSQL = strSQL & "  PH.PART_ID," & vbCrLf
        strSQL = strSQL & "  PH.PURCHASE_DATE DESC"
        snpData.Open strSQL, dbDatabase

        If snpData.State = 1 Then
            Application.ScreenUpdating = False

            For i = 0 To snpData.Fields.Count - 1
                ActiveSheet.Cells(3, i + 1).Value = snpData.Fields(i).Name
            Next i
            ActiveSheet.Range(ActiveSheet.Cells(3, 1), ActiveSheet.Cells(3, snpData.Fields.Count)).Font.Bold = True

            ActiveSheet.Range("A4").CopyFromRecordset snpData

            'Auto-fit up to 26 columns
            ActiveSheet.Columns("A:" & Chr(64 + snpData.Fields.Count)).AutoFit
            ActiveSheet.Range("A4").Select
            ActiveWindow.FreezePanes = True

            'Remove duplicate rows with the same PART ID
            lngRow = 4
            Do While lngRow < Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2
                If Sheets("OracleAnalyticTest").Cells(lngRow, 1).FormulaR1C1 = "" Then
                    'Past the end of the rows
                    Exit Do
                End If
                If Sheets("OracleAnalyticTest").Cells(lngRow - 1, 1).FormulaR1C1 = Sheets("OracleAnalyticTest").Cells(lngRow, 1).FormulaR1C1 Then
                    'Found a duplicate row, delete it
                    Sheets("OracleAnalyticTest").Rows(lngRow).Delete Shift:=xlUp
                Else
                    lngRow = lngRow + 1
                End If
            Loop
            snpData.Close

            Application.ScreenUpdating = True
        End If
    End If

    Set snpData = Nothing
End Sub

The cmdInitialize_Click subroutine retrieves the data from the database using the supplied SQL statement and writes that information to the worksheet.  The macro then eliminates subsequent rows if the part ID is identical to the previous part ID (this step would not have been required if we had modified the SQL statement to use the ROW_NUMBER analytic function and then eliminated all rows where the ROW_NUMBER value is not 1 – a sketch of that approach appears below).  Once you add the above code, you should be able to switch back to the Excel worksheet, turn off Design Mode, and click the Initialize button.
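
For reference, a sketch of that ROW_NUMBER variation might look like the following – this is not the article's tested code, just one way to keep only the most recent price-change row for each part on the database side rather than in the macro:

SELECT
  PART_ID,
  VENDOR_ID,
  UNIT_PRICE,
  LAST_VENDOR_ID,
  LAST_UNIT_PRICE,
  PRODUCT_CODE,
  COMMODITY_CODE
FROM
  (SELECT
    PH.PART_ID,
    PH.VENDOR_ID,
    PH.UNIT_PRICE,
    PH.LAST_VENDOR_ID,
    PH.LAST_UNIT_PRICE,
    PL.PRODUCT_CODE,
    PL.COMMODITY_CODE,
    ROW_NUMBER() OVER (PARTITION BY PH.PART_ID ORDER BY PH.PURCHASE_DATE DESC) RN
  FROM
    (SELECT
      PH.PART_ID,
      PH.VENDOR_ID,
      PH.UNIT_PRICE,
      PH.PURCHASE_DATE,
      LEAD(PH.VENDOR_ID,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_VENDOR_ID,
      LEAD(PH.UNIT_PRICE,1,NULL) OVER (PARTITION BY PART_ID ORDER BY PURCHASE_DATE DESC) LAST_UNIT_PRICE
    FROM
      PURCHASE_HISTORY PH
    WHERE
      PH.PURCHASE_DATE>=TRUNC(SYSDATE-720)) PH,
    PART_LIST PL
  WHERE
    PH.PART_ID=PL.PART_ID
    AND (PH.VENDOR_ID<>NVL(PH.LAST_VENDOR_ID,'-')
      OR PH.UNIT_PRICE<>NVL(PH.LAST_UNIT_PRICE,-1)))
WHERE
  RN=1
ORDER BY
  PART_ID;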

Unfortunately, this example will retrieve too many rows with too little variation in the PRODUCT_CODE and COMMODITY_CODE columns (just 26 distinct values), so it might be a good idea to delete all rows below row 1004.  Now we need to switch back to the Microsoft Visual Basic editor and add the code for the other three buttons.  Note that this code takes advantage of gradient shading in Excel 2007 charts, so some modification might be necessary on Excel 2003 and earlier.

Private Sub cmdCompareCC_Click()
    Dim i As Long
    Dim intCount As Integer
    Dim intChartNumber As Integer
    Dim lngRows As Long
    Dim dblValues() As Double
    Dim strValueNames() As String
    Dim snpDataList As ADOR.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Cells(4, 1).Select
    lngRows = Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2

    'Set up to use ADOR to automatically sort the commodity codes
    Set snpDataList = New ADOR.Recordset
    snpDataList.Fields.Append "commodity_code", adVarChar, 30
    snpDataList.Open

    'Pick up a distinct list of commodity codes
    For i = 4 To lngRows
        'Only include those commodity codes with price changes
        If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
            If snpDataList.RecordCount > 0 Then
                snpDataList.MoveFirst
            End If
            snpDataList.Find ("commodity_code = '" & Sheets("OracleAnalyticTest").Cells(i, 7) & "'")
            If snpDataList.EOF Then
                'Did not find a matching record
                snpDataList.AddNew
                snpDataList("commodity_code") = Sheets("OracleAnalyticTest").Cells(i, 7).Value
                snpDataList.Update
            End If
        End If
    Next i
    snpDataList.Sort = "commodity_code"

    'Find the matching rows for each commodity code
    snpDataList.MoveFirst
    Do While Not snpDataList.EOF
        intCount = 0
        ReDim dblValues(250)
        ReDim strValueNames(250)
        For i = 4 To lngRows
            If intCount >= 250 Then
                'Excel charts only permit about 250 data points when created with this method
                Exit For
            End If
            If Sheets("OracleAnalyticTest").Cells(i, 7).Value = snpDataList("commodity_code") Then
                'Found a row with this commodity code
                If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
                    'Price change was found
                    dblValues(intCount) = Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2)
                    strValueNames(intCount) = Sheets("OracleAnalyticTest").Cells(i, 1).FormulaR1C1
                    intCount = intCount + 1
                End If
            End If
        Next i

        'Set the arrays to the exact number of elements, first element at position 0
        ReDim Preserve dblValues(intCount - 1)
        ReDim Preserve strValueNames(intCount - 1)

        intChartNumber = intChartNumber + 1
        With Sheets("OracleAnalyticTest").ChartObjects.Add(10 * intChartNumber, 60 + 10 * intChartNumber, 400, 300)
            .Chart.SeriesCollection.NewSeries
            .Chart.SeriesCollection(1).Values = dblValues
            .Chart.SeriesCollection(1).XValues = strValueNames
            .Chart.Axes(1).CategoryType = 2
            .Chart.HasLegend = False

            .Chart.HasTitle = True
            .Chart.ChartTitle.Text = "Price Changes by Commodity Code: " & snpDataList("commodity_code")

            .Chart.Axes(xlCategory, xlPrimary).HasTitle = True
            .Chart.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Part ID"
            .Chart.Axes(xlValue, xlPrimary).HasTitle = True
            .Chart.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Unit Cost Change"

            .Chart.SeriesCollection(1).HasDataLabels = True
            .Chart.SeriesCollection(1).HasLeaderLines = True

            With .Chart.PlotArea.Border
                .ColorIndex = 16
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.PlotArea.Fill.OneColorGradient Style:=msoGradientHorizontal, Variant:=2, Degree:=0.756847486076142
            .Chart.PlotArea.Fill.ForeColor.SchemeColor = 23
            .Chart.PlotArea.Fill.Visible = True
            With .Chart.PlotArea.Border
                .ColorIndex = 57
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.SeriesCollection(1).Fill.OneColorGradient Style:=msoGradientVertical, Variant:=4, Degree:=0.2
            .Chart.SeriesCollection(1).Fill.Visible = True
            .Chart.SeriesCollection(1).Fill.ForeColor.SchemeColor = 4

            .Chart.Axes(xlValue).MajorGridlines.Border.ColorIndex = 2
            With .Chart.SeriesCollection(1).DataLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.Axes(xlCategory).TickLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.ChartTitle.Font
                .Name = "Arial"
                .FontStyle = "Bold"
                .Size = 16
                .Color = RGB(0, 0, 255)
            End With
        End With
        snpDataList.MoveNext
    Loop

    Set snpDataList = Nothing
End Sub

Private Sub cmdComparePC_Click()
    Dim i As Long
    Dim intCount As Integer
    Dim intChartNumber As Integer
    Dim lngRows As Long
    Dim dblValues() As Double
    Dim strValueNames() As String
    Dim snpDataList As ADOR.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Cells(4, 1).Select
    lngRows = Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2

    'Set up to use ADOR to automatically sort the product codes
    Set snpDataList = New ADOR.Recordset
    snpDataList.Fields.Append "product_code", adVarChar, 30
    snpDataList.Open

    'Pick up a distinct list of product codes
    For i = 4 To lngRows
        'Only include those product codes with price changes
        If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
            If snpDataList.RecordCount > 0 Then
                snpDataList.MoveFirst
            End If
            snpDataList.Find ("product_code = '" & Sheets("OracleAnalyticTest").Cells(i, 6) & "'")
            If snpDataList.EOF Then
                'Did not find a matching record
                snpDataList.AddNew
                snpDataList("product_code") = Sheets("OracleAnalyticTest").Cells(i, 6).Value
                snpDataList.Update
            End If
        End If
    Next i
    snpDataList.Sort = "product_code"

    'Find the matching rows for each product code
    snpDataList.MoveFirst
    Do While Not snpDataList.EOF
        intCount = 0
        ReDim dblValues(250)
        ReDim strValueNames(250)
        For i = 4 To lngRows
            If intCount >= 250 Then
                'Excel charts only permit about 250 data points when created with this method
                Exit For
            End If
            If Sheets("OracleAnalyticTest").Cells(i, 6).Value = snpDataList("product_code") Then
                'Found a row with this product code
                If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
                    'Price change was found
                    dblValues(intCount) = Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2)
                    strValueNames(intCount) = Sheets("OracleAnalyticTest").Cells(i, 1).FormulaR1C1
                    intCount = intCount + 1
                End If
            End If
        Next i

        'Set the arrays to the exact number of elements, first element at position 0
        ReDim Preserve dblValues(intCount - 1)
        ReDim Preserve strValueNames(intCount - 1)

        intChartNumber = intChartNumber + 1

        With Sheets("OracleAnalyticTest").ChartObjects.Add(10 * intChartNumber, 60 + 10 * intChartNumber, 400, 300)
            .Chart.SeriesCollection.NewSeries
            .Chart.SeriesCollection(1).Values = dblValues
            .Chart.SeriesCollection(1).XValues = strValueNames
            .Chart.Axes(1).CategoryType = 2
            .Chart.HasLegend = False

            .Chart.HasTitle = True
            .Chart.ChartTitle.Text = "Price Changes by Product Code: " & snpDataList("product_code")

            .Chart.Axes(xlCategory, xlPrimary).HasTitle = True
            .Chart.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Part ID"
            .Chart.Axes(xlValue, xlPrimary).HasTitle = True
            .Chart.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Unit Cost Change"

            .Chart.SeriesCollection(1).HasDataLabels = True
            .Chart.SeriesCollection(1).HasLeaderLines = True

            With .Chart.PlotArea.Border
                .ColorIndex = 16
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.PlotArea.Fill.OneColorGradient Style:=msoGradientHorizontal, Variant:=2, Degree:=0.756847486076142
            .Chart.PlotArea.Fill.ForeColor.SchemeColor = 23
            .Chart.PlotArea.Fill.Visible = True
            With .Chart.PlotArea.Border
                .ColorIndex = 57
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.SeriesCollection(1).Fill.OneColorGradient Style:=msoGradientVertical, Variant:=4, Degree:=0.2
            .Chart.SeriesCollection(1).Fill.Visible = True
            .Chart.SeriesCollection(1).Fill.ForeColor.SchemeColor = 5

            .Chart.Axes(xlValue).MajorGridlines.Border.ColorIndex = 2
            With .Chart.SeriesCollection(1).DataLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.Axes(xlCategory).TickLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.ChartTitle.Font
                .Name = "Arial"
                .FontStyle = "Bold"
                .Size = 16
                .Color = RGB(0, 0, 255)
            End With
        End With

        snpDataList.MoveNext
    Loop

    Set snpDataList = Nothing
End Sub

Private Sub cmdCompareVendorID_Click()
    Dim i As Long
    Dim intCount As Integer
    Dim intChartNumber As Integer
    Dim lngRows As Long
    Dim dblValues() As Double
    Dim strValueNames() As String
    Dim snpDataList As ADOR.Recordset

    On Error Resume Next

    Sheets("OracleAnalyticTest").ChartObjects.Delete
    Sheets("OracleAnalyticTest").Cells(4, 1).Select
    lngRows = Sheets("OracleAnalyticTest").UsedRange.Rows.Count + 2

    'Set up to use ADOR to automatically sort the vendor IDs
    Set snpDataList = New ADOR.Recordset
    snpDataList.Fields.Append "vendor_id", adVarChar, 30
    snpDataList.Open

    'Pick up a distinct list of vendor IDs
    For i = 4 To lngRows
        'Only include those vendor IDs with price changes
        If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
            If snpDataList.RecordCount > 0 Then
                snpDataList.MoveFirst
            End If
            snpDataList.Find ("vendor_id = '" & Sheets("OracleAnalyticTest").Cells(i, 2) & "'")
            If snpDataList.EOF Then
                'Did not find a matching record
                snpDataList.AddNew
                snpDataList("vendor_id") = Sheets("OracleAnalyticTest").Cells(i, 2).Value
                snpDataList.Update
            End If
        End If
    Next i
    snpDataList.Sort = "vendor_id"

    'Find the matching rows for each vendor ID
    snpDataList.MoveFirst
    Do While Not snpDataList.EOF
        intCount = 0
        ReDim dblValues(250)
        ReDim strValueNames(250)
        For i = 4 To lngRows
            If intCount >= 250 Then
                'Excel charts only permit about 250 data points when created with this method
                Exit For
            End If
            If Sheets("OracleAnalyticTest").Cells(i, 2).Value = snpDataList("vendor_id") Then
                'Found a row with this vendor ID
                If (Sheets("OracleAnalyticTest").Cells(i, 5).Value <> 0) And (Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2) <> 0) Then
                    'Price change was found
                    dblValues(intCount) = Round(Sheets("OracleAnalyticTest").Cells(i, 3).Value - Sheets("OracleAnalyticTest").Cells(i, 5).Value, 2)
                    strValueNames(intCount) = Sheets("OracleAnalyticTest").Cells(i, 1).FormulaR1C1
                    intCount = intCount + 1
                End If
            End If
        Next i

        'Set the arrays to the exact number of elements, first element at position 0
        ReDim Preserve dblValues(intCount - 1)
        ReDim Preserve strValueNames(intCount - 1)

        intChartNumber = intChartNumber + 1

        With Sheets("OracleAnalyticTest").ChartObjects.Add(10 * intChartNumber, 60 + 10 * intChartNumber, 400, 300)
            .Chart.SeriesCollection.NewSeries
            .Chart.SeriesCollection(1).Values = dblValues
            .Chart.SeriesCollection(1).XValues = strValueNames
            .Chart.Axes(1).CategoryType = 2
            .Chart.HasLegend = False

            .Chart.HasTitle = True
            .Chart.ChartTitle.Text = "Price Changes by Vendor: " & snpDataList("vendor_id")

            .Chart.Axes(xlCategory, xlPrimary).HasTitle = True
            .Chart.Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Part ID"
            .Chart.Axes(xlValue, xlPrimary).HasTitle = True
            .Chart.Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Unit Cost Change"

            .Chart.SeriesCollection(1).HasDataLabels = True
            .Chart.SeriesCollection(1).HasLeaderLines = True

            With .Chart.PlotArea.Border
                .ColorIndex = 16
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.PlotArea.Fill.OneColorGradient Style:=msoGradientHorizontal, Variant:=2, Degree:=0.756847486076142
            .Chart.PlotArea.Fill.ForeColor.SchemeColor = 23
            .Chart.PlotArea.Fill.Visible = True
            With .Chart.PlotArea.Border
                .ColorIndex = 57
                .Weight = xlThin
                .LineStyle = xlContinuous
            End With

            .Chart.SeriesCollection(1).Fill.OneColorGradient Style:=msoGradientVertical, Variant:=4, Degree:=0.2
            .Chart.SeriesCollection(1).Fill.Visible = True
            .Chart.SeriesCollection(1).Fill.ForeColor.SchemeColor = 45

            .Chart.Axes(xlValue).MajorGridlines.Border.ColorIndex = 2
            With .Chart.SeriesCollection(1).DataLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.Axes(xlCategory).TickLabels.Font
                .Name = "Arial"
                .FontStyle = "Regular"
                .Size = 8
                .Color = RGB(255, 255, 255)
            End With
            With .Chart.ChartTitle.Font
                .Name = "Arial"
                .FontStyle = "Bold"
                .Size = 16
                .Color = RGB(0, 0, 255)
            End With
        End With
        snpDataList.MoveNext
    Loop

    Set snpDataList = Nothing
End Sub

If we switch back to the Excel worksheet, the remaining three buttons should now work.  Clicking each button causes Excel to examine the data in the worksheet to locate all of the unique values for PRODUCT_CODE, COMMODITY_CODE, or VENDOR_ID, sort that list in alphabetical order, and then build a chart for each value showing the part IDs that fall into that category.  The results of my test run of each button look like the following three pictures.

You can, of course, adapt the code to work with other SQL statements and modify the chart generating code to alter the chart type, colors, and fonts.





SQL Basics – Working with ERP Data

24 01 2010

January 24, 2010

This blog article is based on a portion of a presentation that I gave at a regional ERP user’s group meeting.  While some of the information is specific to that particular ERP platform, the concepts should be general enough that the material may be applied to other environments.

Typically, a language called Structured Query Language (SQL) is used to directly communicate with the database.  As with all languages, there are syntax rules that must be followed.  In general, data is stored in a series of tables, which may be thought of as if they were worksheets in an Excel spreadsheet.  The various tables may be joined together to provide greater detail, but great care must be taken to correctly join the tables together.  The correct table joining conditions may be partially determined by examining the primary and foreign key relationships between the tables, and we will talk about that more later in the presentation.

Tips:

Relationships between tables containing related information may be determined by:

  • Primary (parent) and foreign (child) relationships defined in the database (see Data Dict Foreign Keys worksheet; a data dictionary query is sketched after this list).
  • Primary key columns are often named ID, and the foreign key columns are often named table_ID, for example: ACCOUNT.ID = ACCOUNT_BALANCE.ACCOUNT_ID
  • Relationships may be discovered by searching for other tables in the database containing the same column names (see Data Dict Tables worksheet).
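
A sketch of one way to query the Oracle data dictionary for the parent/child relationships mentioned in the first bullet point above (the column aliases are arbitrary names of my choosing):

SELECT
  AC.TABLE_NAME CHILD_TABLE,
  ACC.COLUMN_NAME CHILD_COLUMN,
  AC2.TABLE_NAME PARENT_TABLE,
  ACC2.COLUMN_NAME PARENT_COLUMN
FROM
  ALL_CONSTRAINTS AC,
  ALL_CONS_COLUMNS ACC,
  ALL_CONSTRAINTS AC2,
  ALL_CONS_COLUMNS ACC2
WHERE
  AC.CONSTRAINT_TYPE='R'                         -- foreign key constraints only
  AND AC.OWNER=ACC.OWNER
  AND AC.CONSTRAINT_NAME=ACC.CONSTRAINT_NAME
  AND AC.R_OWNER=AC2.OWNER                       -- referenced (parent) constraint
  AND AC.R_CONSTRAINT_NAME=AC2.CONSTRAINT_NAME
  AND AC2.OWNER=ACC2.OWNER
  AND AC2.CONSTRAINT_NAME=ACC2.CONSTRAINT_NAME
  AND ACC.POSITION=ACC2.POSITION                 -- keeps multi-column keys aligned
ORDER BY
  AC.TABLE_NAME,
  AC.CONSTRAINT_NAME,
  ACC.POSITION;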

SQL Basics:

Indexes on table columns may allow a query to execute faster, but it is important that the leading columns of the index are used (don't forget the TYPE column when retrieving information from the WORK_ORDER table, or the WORKORDER_TYPE column when accessing the OPERATION table).  While indexes usually help when a small amount of information is needed from a table, other methods (a full table scan, for instance) are sometimes more appropriate.

Indexes usually cannot be used for a column in the WHERE clause if the column appears inside a function – an index on the column will not be used for   TRUNC(LABOR_TICKET.TRANSACTION_DATE) =  – unless a function-based index is created for that function and column combination.
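
For example, an index similar to the following (a sketch – the index name is arbitrary, and creating it requires the usual index privileges) would allow the TRUNC(LABOR_TICKET.TRANSACTION_DATE) predicate above to use an index:

CREATE INDEX IND_LT_TRUNC_TRANS_DATE ON
  LABOR_TICKET(TRUNC(TRANSACTION_DATE));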

When multiple tables must be accessed, each column retrieved should be prefixed with the table name (or an aliased name for the table) containing the column.  Prefixing the columns improves the readability of the SQL statement and prevents errors that happen when two tables contain columns with the same names.

In a WHERE clause, character type data should appear in single quotes ( ' ), and number type data should not appear in single quotes.  Dates should not rely on implicit data type conversion – don't use '24-JAN-2010' as there is a chance that the implicit conversion will fail in certain environments.
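
For example, a date restriction might be written with an explicit TO_DATE conversion and format mask – a sketch reusing the LABOR_TICKET table from the earlier tip; the YYYY-MM-DD mask also avoids any dependence on month-name language settings, and the range comparison avoids wrapping the column in a function:

SELECT
  *
FROM
  LABOR_TICKET
WHERE
  TRANSACTION_DATE >= TO_DATE('2010-01-24','YYYY-MM-DD')
  AND TRANSACTION_DATE < TO_DATE('2010-01-25','YYYY-MM-DD');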

Information retrieved from the database using a SQL statement may be grouped to summarize the data.

Executing SQL:

Assume that we are new to SQL and just start typing a SQL statement, hoping that the database will be able to help us make a correct request – since that kind of works in Microsoft Access.

SELECT DISTINCT
  *
FROM
  WORK_ORDER,
  OPERATION,
  REQUIREMENT;

When we execute this SQL statement, the database server spins and spins (not the formal meaning of a spin), until the SQL statement finally falls over and dies (to the uninitiated, this is not supposed to happen when a query executes).

Mon Jan 07 11:01:32 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:12:31 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:15:11 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:25:28 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:28:13 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA

Depending on the database engine and the database administrator, it might be that the database is down for a long time, or that just the query tool that submitted the SQL statement crashes after forcing the CPUs on the server to spin excessively.  Be careful about who has access to a query tool that accesses the database.

Simple SQL - Retrieve the part ID, description, product code, and quantity on hand for all parts:

The following is a simple SQL statement which will retrieve four columns from the PART table for all parts, essentially in random order.  You may notice that my SQL statement is formatted in a very specific way – the reason for this formatting will become clearer later.  Essentially, standardized formats help improve database performance (by reducing the number of hard parses) – for ad hoc SQL statements (those created for one-time use) the performance difference probably will not be noticed, but when the SQL statements are placed into applications that execute them repeatedly, the performance difference will be very clear.

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART;

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code of AAAA:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE = 'AAAA';

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code of AAAA with more than 10 on hand:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE = 'AAAA'
  AND QTY_ON_HAND > 10

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code beginning with  A  with 10 to 100 on hand:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'A%'
  AND QTY_ON_HAND BETWEEN 10 AND 100;

Retrieve the part ID, description, product code, and quantity on hand sorted by product code, then part ID – Fixing the Random Order:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'A%'
  AND QTY_ON_HAND BETWEEN 10 AND 100
ORDER BY
  PRODUCT_CODE,
  ID;

Retrieve the product code, and total quantity on hand by product code, sorted by product code:

SELECT
  PRODUCT_CODE,
  SUM(QTY_ON_HAND) AS TOTAL_QTY
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'F%'
GROUP BY
  PRODUCT_CODE
ORDER BY
  PRODUCT_CODE;

The above example changed the previous example quite a bit – this time only those parts with a commodity code beginning with F are returned, and the goal is to determine the total quantity on hand by product code (labeled TOTAL_QTY) for those parts.  In addition to the ORDER BY clause, a GROUP BY clause was also needed.  The columns that must be listed in the GROUP BY clause are those columns in the SELECT clause which are not inside a SUM(), AVG(), MIN(), MAX(), or similar function.

Retrieve the product code, and total quantity on hand by product code, return only those with a total quantity on hand more than 100, sorted by product code:

SELECT
  PRODUCT_CODE,
  SUM(QTY_ON_HAND) AS TOTAL_QTY
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'F%'
GROUP BY
  PRODUCT_CODE
HAVING
  SUM(QTY_ON_HAND) > 100
ORDER BY
  PRODUCT_CODE;

Retrieve the top level part ID produced by all unreleased, firmed, and released work orders, include the work order, lot, part description, and quantity on hand:

Now that we know how to work with data stored in a single table, let's take a look at an example with two tables.  Each column returned from the tables should be prefixed with the table name – primarily for cases where the same column name appears in both tables, but doing this also makes it easier to troubleshoot problems with the SQL statement at a later time.  The following SQL statement retrieves a list of all parts produced by non-closed and non-canceled work orders that are in the system (status is unreleased, firmed, or released).

SELECT
  WORK_ORDER.BASE_ID,
  WORK_ORDER.LOT_ID,
  WORK_ORDER.SPLIT_ID,
  WORK_ORDER.PART_ID,
  PART.DESCRIPTION,
  PART.QTY_ON_HAND,
  WORK_ORDER.DESIRED_QTY,
  WORK_ORDER.RECEIVED_QTY
FROM
  WORK_ORDER,
  PART
WHERE
  WORK_ORDER.TYPE = 'W'
  AND WORK_ORDER.SUB_ID='0'
  AND WORK_ORDER.PART_ID=PART.ID
  AND WORK_ORDER.DESIRED_QTY > WORK_ORDER.RECEIVED_QTY
  AND WORK_ORDER.STATUS IN ('U', 'F', 'R')
ORDER BY
  WORK_ORDER.PART_ID,
  WORK_ORDER.BASE_ID,
  WORK_ORDER.LOT_ID,
  WORK_ORDER.SPLIT_ID;

The following SQL statement is essentially the same SQL statement as the last, just with table aliases (or short-names) which significantly reduce the amount of typing.

SELECT
  WO.BASE_ID,
  WO.LOT_ID,
  WO.SPLIT_ID,
  WO.PART_ID,
  P.DESCRIPTION,
  P.QTY_ON_HAND,
  WO.DESIRED_QTY,
  WO.RECEIVED_QTY
FROM
  WORK_ORDER WO,
  PART P
WHERE
  WO.TYPE = 'W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID=P.ID
  AND WO.DESIRED_QTY > WO.RECEIVED_QTY
  AND WO.STATUS IN ('U', 'F', 'R')
ORDER BY
  WO.PART_ID,
  WO.BASE_ID,
  WO.LOT_ID,
  WO.SPLIT_ID;

Retrieve the engineering master information for a part:

Back to the original example which brought down the database server (or at least filled the temp tablespace to its maximum size), this time adding in two references to the PART table, each with a different alias name.  This SQL statement will retrieve the main header card, all operations, and all material requirements for a specific fabricated part.  But, there is a catch.  Operations without material requirements are excluded from the output.  Fixing that problem requires the use of an outer join, which on Oracle is indicated by a (+) following the column names that are permitted to be NULL, and in the older SQL Server syntax by an * on the side of the equality that is NOT permitted to be NULL.  (There are also ANSI style inner and outer joins – a small sketch follows.)
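
For reference, the optional OPERATION-to-REQUIREMENT relationship could also be expressed with ANSI join syntax – a small two-table sketch of my own, not the full statement:

SELECT
  O.SEQUENCE_NO,
  O.RESOURCE_ID,
  R.PIECE_NO,
  R.PART_ID,
  R.CALC_QTY
FROM
  OPERATION O
LEFT OUTER JOIN
  REQUIREMENT R
ON (O.WORKORDER_TYPE = R.WORKORDER_TYPE
  AND O.WORKORDER_BASE_ID = R.WORKORDER_BASE_ID
  AND O.WORKORDER_LOT_ID = R.WORKORDER_LOT_ID
  AND O.WORKORDER_SPLIT_ID = R.WORKORDER_SPLIT_ID
  AND O.WORKORDER_SUB_ID = R.WORKORDER_SUB_ID
  AND O.SEQUENCE_NO = R.OPERATION_SEQ_NO);

The full engineering master statement, using Oracle's (+) outer join syntax throughout, follows: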

SELECT
  WO.BASE_ID || DECODE(O.WORKORDER_SUB_ID, '0', '/', '-' || O.WORKORDER_SUB_ID  || '/') || WO.LOT_ID AS WORK_ORDER,
  WO.DESIRED_QTY - WO.RECEIVED_QTY AS REMAINING_QTY,
  WO.PART_ID AS WO_PART_ID,
  P.DESCRIPTION AS WO_PART_DESC,
  O.SEQUENCE_NO AS OP,
  O.RESOURCE_ID,
  SR.DESCRIPTION AS RESOURCE_DESC,
  O.SETUP_HRS,
  O.RUN_HRS,
  O.CALC_END_QTY,
  R.PIECE_NO,
  R.PART_ID AS REQ_PART_ID,
  P2.DESCRIPTION AS REQ_PART_DESC,
  R.CALC_QTY
FROM
  WORK_ORDER WO,
  PART P,
  OPERATION O,
  SHOP_RESOURCE SR,
  REQUIREMENT R,
  PART P2
WHERE
  WO.TYPE = 'M'
  AND P.ID = 'ABC123'
  AND P.ID = WO.BASE_ID
  AND P.ENGINEERING_MSTR = WO.LOT_ID
  AND WO.SPLIT_ID = '0'
  AND WO.SUB_ID = '0'
  AND WO.TYPE = O.WORKORDER_TYPE
  AND WO.BASE_ID = O.WORKORDER_BASE_ID
  AND WO.LOT_ID = O.WORKORDER_LOT_ID
  AND WO.SPLIT_ID = O.WORKORDER_SPLIT_ID
  AND O.RESOURCE_ID = SR.ID(+)
  AND O.WORKORDER_TYPE = R.WORKORDER_TYPE(+)
  AND O.WORKORDER_BASE_ID = R.WORKORDER_BASE_ID(+)
  AND O.WORKORDER_LOT_ID = R.WORKORDER_LOT_ID(+)
  AND O.WORKORDER_SPLIT_ID = R.WORKORDER_SPLIT_ID(+)
  AND O.WORKORDER_SUB_ID = R.WORKORDER_SUB_ID(+)
  AND O.SEQUENCE_NO = R.OPERATION_SEQ_NO(+)
  AND R.PART_ID = P2.ID(+)
ORDER BY
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO,
  R.PART_ID;

Analyze the UNIT_MATERIAL_COST column in the PART table.  For each part, find the relative cost (high to low) ranking, average cost, smallest cost, highest cost, and the total number in each group, when the parts are grouped individually by product code, commodity code, and also preferred vendor (all parts without a preferred vendor are grouped together):

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  COMMODITY_CODE,
  UNIT_MATERIAL_COST,
  ROW_NUMBER() OVER (PARTITION BY PRODUCT_CODE ORDER BY COMMODITY_CODE,ID) PART_WITHIN_PC,
  COUNT(1) OVER (PARTITION BY PRODUCT_CODE ORDER BY COMMODITY_CODE,ID) PART_WITHIN_PC2,
  RANK() OVER (PARTITION BY PRODUCT_CODE ORDER BY UNIT_MATERIAL_COST DESC NULLS LAST) RANK_PC_COST,
  AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) AVG_PC_COST,
  MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) MIN_PC_COST,
  MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) MAX_PC_COST,
  COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) COUNT_PC,
  RANK() OVER (PARTITION BY COMMODITY_CODE ORDER BY UNIT_MATERIAL_COST DESC NULLS LAST) RANK_CC_COST,
  AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) AVG_CC_COST,
  MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) MIN_CC_COST,
  MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) MAX_CC_COST,
  COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) COUNT_CC,
  RANK() OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB') ORDER BY UNIT_MATERIAL_COST
    DESC NULLS LAST) RANK_VENDOR_COST,
  AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) AVG_VENDOR_COST,
  MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) MIN_VENDOR_COST,
  MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) MAX_VENDOR_COST,
  COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY PREF_VENDOR_ID) COUNT_VENDOR
FROM
  PART
ORDER BY
  ID;

On Oracle, there are also analytic functions which allow information to be grouped together without the need for a GROUP BY clause, and each column returned could potentially be grouped using different criteria.  There are several interesting analytic functions that make otherwise difficult comparisons both easy to accomplish and efficient to execute.  Many of the analytic functions allow data to be summarized by groups without losing the detail contained in each row of the data – for instance, we are able to select the ID, DESCRIPTION, and UNIT_MATERIAL_COST columns without grouping on those columns.  PARTITION BY may be thought of as behaving like GROUP BY.  The inclusion of ORDER BY within the OVER clause means that only those rows encountered up to that point, when sorted in the specified order, will be considered.

Show the sum of the hours worked for each employee by shift date, along with the previous five days and the next five days, and the next Monday after the shift date – looking at previous and next rows in the data, using an inline view:

SELECT
  EMPLOYEE_ID,
  SHIFT_DATE,
  NEXT_DAY(SHIFT_DATE,'MONDAY') PAYROLL_PREPARE_DATE,
  LAG(HOURS_WORKED,5,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV5_HOURS,
  LAG(HOURS_WORKED,4,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV4_HOURS,
  LAG(HOURS_WORKED,3,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV3_HOURS,
  LAG(HOURS_WORKED,2,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV2_HOURS,
  LAG(HOURS_WORKED,1,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV_HOURS,
  HOURS_WORKED,
  LEAD(HOURS_WORKED,1,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT_HOURS,
  LEAD(HOURS_WORKED,2,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT2_HOURS,
  LEAD(HOURS_WORKED,3,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT3_HOURS,
  LEAD(HOURS_WORKED,4,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT4_HOURS,
  LEAD(HOURS_WORKED,5,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT5_HOURS
FROM
  (SELECT
    EMPLOYEE_ID,
    SHIFT_DATE,
    SUM(HOURS_WORKED) HOURS_WORKED
  FROM
    LABOR_TICKET
  WHERE
    SHIFT_DATE>=TRUNC(SYSDATE-14)
  GROUP BY
    EMPLOYEE_ID,
    SHIFT_DATE
  ORDER BY
    EMPLOYEE_ID,
    SHIFT_DATE);

LAG and LEAD are interesting functions which permit looking at previous and next rows, when sorted in the specified order.

SQL coding is not hard to understand – as long as you build out from a simple SQL statement to the SQL statement that returns the desired output.





PGA Memory – The Developer’s Secret Weapon for Stealing All of the Memory in the Server 2

19 01 2010

January 19, 2010

This article is a follow-up to the earlier article – just how much PGA memory can a SQL statement with two NOT IN clauses and an ORDER BY clause consume?  As we saw in the previous post, DBMS_XPLAN.DISPLAY_CURSOR may be a bit misleading due to the scale of the Used-Tmp column, and the fact that not all of the memory listed in the Used-Mem column is necessarily used at the same time.

So, let’s try three experiments where we modify the SQL statement in the script to have one of the following:

AND T1.C1 BETWEEN 1 AND 500000
AND T1.C1 BETWEEN 1 AND 1000000
AND T1.C1 BETWEEN 1 AND 1400000

So, for the first test, the PGAMemoryFill2.sql script will look like this:

DECLARE
CURSOR C_MEMORY_FILL IS
SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1
WHERE
  T1.C1 NOT IN (
    SELECT
      C1
    FROM
      T2)
  AND T1.C2 NOT IN (
    SELECT
      C2
    FROM
      T3)
  AND T1.C1 BETWEEN 1 AND 500000
ORDER BY
  T1.C2 DESC,
  T1.C1 DESC;

TYPE TYPE_MEMORY_FILL IS TABLE OF C_MEMORY_FILL%ROWTYPE
INDEX BY BINARY_INTEGER;

T_MEMORY_FILL  TYPE_MEMORY_FILL;

BEGIN
  OPEN C_MEMORY_FILL;
  LOOP
    FETCH C_MEMORY_FILL BULK COLLECT INTO  T_MEMORY_FILL LIMIT  10000000;

    EXIT WHEN T_MEMORY_FILL.COUNT = 0;

    FOR I IN T_MEMORY_FILL.FIRST..T_MEMORY_FILL.LAST LOOP
      NULL;
    END LOOP;

    DBMS_LOCK.SLEEP(20);
  END LOOP;
END;
/

(You two DBAs who are about to stand and clap, sit back down – didn't you learn anything from the previous article that used bulk collect?)  We will use just two sessions, and make a small adjustment to the query of V$SQL_WORKAREA_ACTIVE so that we will be able to match the memory allocation to a specific step in the execution plan.  Additionally, that view will be queried approximately once every 10 seconds.

Session 1:

SELECT SID FROM V$MYSTAT WHERE ROWNUM<=1;

       SID
----------
       303

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1
WHERE
  T1.C1 NOT IN (
    SELECT
      C1
    FROM
      T2)
  AND T1.C2 NOT IN (
    SELECT
      C2
    FROM
      T3)
  AND T1.C1 BETWEEN 1 AND 500000
ORDER BY
  T1.C2 DESC,
  T1.C1 DESC;

Plan hash value: 3251203018

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|  Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   497K|    82M|       |   825K  (1)| 02:45:02 |
|   1 |  SORT ORDER BY       |      |   497K|    82M|    86M|   825K  (1)| 02:45:02 |
|*  2 |   HASH JOIN ANTI NA  |      |   497K|    82M|    73M|   806K  (1)| 02:41:14 |
|*  3 |    HASH JOIN ANTI NA |      |   499K|    68M|    71M|   668K  (1)| 02:13:46 |
|*  4 |     TABLE ACCESS FULL| T1   |   500K|    65M|       |   543K  (1)| 01:48:42 |
|   5 |     TABLE ACCESS FULL| T2   |    10M|    57M|       |   113K  (1)| 00:22:39 |
|   6 |    TABLE ACCESS FULL | T3   |    10M|   295M|       |   113K  (1)| 00:22:39 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="C2")
   3 - access("T1"."C1"="C1")
   4 - filter("T1"."C1"<=500000 AND "T1"."C1">=1)

SET AUTOTRACE OFF

ALTER SESSION SET STATISTICS_LEVEL=ALL;

@PGAMemoryFill2.sql

Session 2:

SET PAGESIZE 2000
SET LINESIZE 150

COLUMN ID FORMAT 99
COLUMN PASSES FORMAT 999999
COLUMN OPERATION_TYPE FORMAT A12
COLUMN WA_SIZE FORMAT 9999999990
SPOOL SQL_WORKAREA.TXT

SELECT
  SQL_ID,
  OPERATION_ID ID,
  OPERATION_TYPE,
  WORK_AREA_SIZE WA_SIZE,
  ACTUAL_MEM_USED,
  NUMBER_PASSES PASSES,
  TEMPSEG_SIZE
FROM
  V$SQL_WORKAREA_ACTIVE
ORDER BY
  SQL_ID,
  OPERATION_ID;

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

So far, the hash join at ID 2 is consuming about 4.04MB (4,239,360 / 1,048,576) and the hash join at ID 3 is consuming about 76.64MB (80,363,520 / 1,048,576).  Now we repeat the query of V$SQL_WORKAREA_ACTIVE roughly every 10 seconds:

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95463424        80363520       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95771648        97603584       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95771648        97603584       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       98526208         4239360       0
57vx5p5xq42jq   3 HASH-JOIN       95771648        97603584       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       87232512        89805824       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
57vx5p5xq42jq   2 HASH-JOIN       87232512        89805824       0

As we can see from the above, the hash join at ID 2 continued to consume 4.04MB, while the hash join at ID 3 increased to 93.09MB.  When the hash join at ID 3 disappeared, the hash join at ID 2 consumed roughly 85.65MB.  The two hash joins and the sort operation completed in-memory, without spilling to the TEMP tablespace.
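
Converting those byte counts to MB by hand quickly becomes tedious – the arithmetic could instead be pushed into the monitoring query.  A minimal sketch, simply dividing by 1048576 and rounding (the columns otherwise match the query shown above):

SELECT
  SQL_ID,
  OPERATION_ID ID,
  OPERATION_TYPE,
  ROUND(WORK_AREA_SIZE/1048576,2) WA_MB,
  ROUND(ACTUAL_MEM_USED/1048576,2) MEM_MB,
  NUMBER_PASSES PASSES,
  ROUND(TEMPSEG_SIZE/1048576,2) TEMP_MB
FROM
  V$SQL_WORKAREA_ACTIVE
ORDER BY
  SQL_ID,
  OPERATION_ID;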

Two executions of the following query against V$SESSTAT show that the total PGA memory consumed by the session jumped to a high of 207.40MB, dropped back to 133.03MB, and eventually fell to 8.03MB when the script ended:

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.SID=303
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE '%pga%';

NAME                          VALUE
----------------------- -----------
session pga memory      139,489,936
session pga memory max  217,477,776

NAME                          VALUE
----------------------- -----------
session pga memory        8,417,936
session pga memory max  217,477,776

Let’s check the DBMS_XPLAN output:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('57vx5p5xq42jq',0,'ALLSTATS LAST'));

Plan hash value: 3251203018

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |    400K|00:02:33.07 |    2833K|   2833K|       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |    497K|    400K|00:02:33.07 |    2833K|   2833K|    68M|  2873K|   61M (0)|
|*  2 |   HASH JOIN ANTI NA  |      |      1 |    497K|    400K|00:02:32.75 |    2833K|   2833K|    74M|  7919K|   85M (0)|
|*  3 |    HASH JOIN ANTI NA |      |      1 |    499K|    450K|00:02:09.73 |    2416K|   2416K|    82M|  7919K|   93M (0)|
|*  4 |     TABLE ACCESS FULL| T1   |      1 |    500K|    500K|00:01:46.14 |    2000K|   1999K|       |       |          |
|   5 |     TABLE ACCESS FULL| T2   |      1 |     10M|     10M|00:00:20.03 |     416K|    416K|       |       |          |
|   6 |    TABLE ACCESS FULL | T3   |      1 |     10M|     10M|00:00:20.03 |     416K|    416K|       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="C2")
   3 - access("T1"."C1"="C1")
   4 - filter(("T1"."C1"<=500000 AND "T1"."C1">=1))

The DBMS_XPLAN output indicates that all three workarea executions were optimal, with the sort consuming 61MB, the hash join at ID 2 consuming 85MB, and the hash join at ID 3 consuming 93MB – but remember that the memory was not all used at the same time.
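
As a cross-check after the statement completes, V$SQL_WORKAREA (note: without the _ACTIVE suffix) retains the last memory used and the number of optimal, one pass, and multi-pass executions for each workarea.  A minimal sketch for this SQL_ID:

SELECT
  OPERATION_ID ID,
  OPERATION_TYPE,
  LAST_EXECUTION,
  LAST_MEMORY_USED,
  OPTIMAL_EXECUTIONS,
  ONEPASS_EXECUTIONS,
  MULTIPASSES_EXECUTIONS
FROM
  V$SQL_WORKAREA
WHERE
  SQL_ID='57vx5p5xq42jq'
ORDER BY
  OPERATION_ID;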

Let’s repeat the test with a larger number range to see if we are able to locate the tipping point.

Session 1:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1
WHERE
  T1.C1 NOT IN (
    SELECT
      C1
    FROM
      T2)
  AND T1.C2 NOT IN (
    SELECT
      C2
    FROM
      T3)
  AND T1.C1 BETWEEN 1 AND 1000000
ORDER BY
  T1.C2 DESC,
  T1.C1 DESC;

Plan hash value: 3251203018

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|  Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   995K|   165M|       |   851K  (1)| 02:50:16 |
|   1 |  SORT ORDER BY       |      |   995K|   165M|   172M|   851K  (1)| 02:50:16 |
|*  2 |   HASH JOIN ANTI NA  |      |   995K|   165M|   147M|   813K  (1)| 02:42:40 |
|*  3 |    HASH JOIN ANTI NA |      |   999K|   136M|   142M|   672K  (1)| 02:14:28 |
|*  4 |     TABLE ACCESS FULL| T1   |  1000K|   130M|       |   543K  (1)| 01:48:42 |
|   5 |     TABLE ACCESS FULL| T2   |    10M|    57M|       |   113K  (1)| 00:22:39 |
|   6 |    TABLE ACCESS FULL | T3   |    10M|   295M|       |   113K  (1)| 00:22:39 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="C2")
   3 - access("T1"."C1"="C1")
   4 - filter("T1"."C1"<=1000000 AND "T1"."C1">=1)

SET AUTOTRACE OFF

@PGAMemoryFill2.sql

Session 2:

SELECT
  SQL_ID,
  OPERATION_ID ID,
  OPERATION_TYPE,
  WORK_AREA_SIZE WA_SIZE,
  ACTUAL_MEM_USED,
  NUMBER_PASSES PASSES,
  TEMPSEG_SIZE
FROM
  V$SQL_WORKAREA_ACTIVE
ORDER BY
  SQL_ID,
  OPERATION_ID;

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0 

We start off with the hash join at ID 2 consuming 8.06MB and the hash join at ID 3 consuming 154.07MB.  Now we continue executing that query roughly every 10 seconds:

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      181816320       161557504       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      188286976       215750656       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      187907072         8454144       0
7wy7nqhbn5v7g   3 HASH-JOIN      188286976       215750656       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      167822336       194778112       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      167822336       194778112       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   2 HASH-JOIN      167822336       194778112       0

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
7wy7nqhbn5v7g   1 SORT (v2)        1245184          486400       1    117440512

As we are able to see from the above, the hash join at ID 2 continued consuming 8.06MB of memory while the hash join at ID 3 grew to 205.76MB.  Once the hash join at ID 3 disappeared, the hash join at ID 2 grew to 185.75MB – both of the hash joins completed using an optimal, in-memory execution.  We saw in the earlier test that the SORT operation at ID 1 required about 24MB less PGA memory than the hash join at ID 2, yet this time the sort operation spilled to disk, using 112MB of space in the TEMP tablespace and just 0.46MB of PGA memory.  (There must be a reason why the hash join completed in memory while the SORT operation that needed less memory spilled to disk, but it escapes me at the moment – the old rule, before the PGA_AGGREGATE_TARGET parameter was introduced, was that HASH_AREA_SIZE defaulted to twice the value of SORT_AREA_SIZE – I wonder if some of that logic is still present.)
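
One place to look when chasing that tipping point is V$PGASTAT, which exposes the global memory bound that the automatic workarea sizing uses to cap an individual workarea.  A minimal sketch, with the caveat that the statistic names below are as they appear in the 11.1 release used for these tests:

SELECT
  NAME,
  VALUE,
  UNIT
FROM
  V$PGASTAT
WHERE
  NAME IN ('aggregate PGA target parameter','aggregate PGA auto target','global memory bound');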

So, what about the PGA memory usage?

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.SID=303
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE '%pga%';

NAME                          VALUE
----------------------- -----------
session pga memory      287,994,512
session pga memory max  390,558,352

NAME                          VALUE
----------------------- -----------
session pga memory        8,549,008
session pga memory max  390,558,352

The PGA memory usage hit a high of 372.47MB and dropped down to 8.15MB when the script completed.  Let’s check the DBMS_XPLAN output:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('7wy7nqhbn5v7g',0,'ALLSTATS LAST'));

Plan hash value: 3251203018

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |    800K|00:02:37.04 |    2833K|   2847K|  14286 |       |       |          |         |
|   1 |  SORT ORDER BY       |      |      1 |    995K|    800K|00:02:37.04 |    2833K|   2847K|  14286 |   126M|  3808K|  116M (1)|     112K|
|*  2 |   HASH JOIN ANTI NA  |      |      1 |    995K|    800K|00:02:33.47 |    2833K|   2833K|      0 |   145M|  7919K|  185M (0)|         |
|*  3 |    HASH JOIN ANTI NA |      |      1 |    999K|    900K|00:02:09.85 |    2416K|   2416K|      0 |   161M|  7919K|  205M (0)|         |
|*  4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:01:45.86 |    2000K|   1999K|      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |     10M|     10M|00:00:20.00 |     416K|    416K|      0 |       |       |          |         |
|   6 |    TABLE ACCESS FULL | T3   |      1 |     10M|     10M|00:00:10.03 |     416K|    416K|      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="C2")
   3 - access("T1"."C1"="C1")
   4 - filter(("T1"."C1"<=1000000 AND "T1"."C1">=1))

The above seems to indicate that the SORT operation at ID 1 at one point consumed 116MB of memory (with an estimated optimal memory requirement of 126MB), and must have then spilled to disk, reducing the memory usage to the 0.46MB value that we saw with the earlier query of V$SQL_WORKAREA_ACTIVE.  This output confirms that the SORT operation performed a 1 pass workarea execution, while the two hash joins performed an optimal workarea execution.
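
The one pass execution should also be reflected in the session’s statistics.  A minimal sketch, assuming the test session is still SID 303 (the counters are cumulative for the life of the session):

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.SID=303
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE 'workarea executions%';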

Let’s repeat the test a final time with a larger number range to see if we are able to locate the tipping point.

Session 1:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1
WHERE
  T1.C1 NOT IN (
    SELECT
      C1
    FROM
      T2)
  AND T1.C2 NOT IN (
    SELECT
      C2
    FROM
      T3)
  AND T1.C1 BETWEEN 1 AND 1400000
ORDER BY
  T1.C2 DESC,
  T1.C1 DESC;

Plan hash value: 1147745168

------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |  1393K|   231M|       |   872K  (1)|02:54:27 |
|   1 |  SORT ORDER BY            |      |  1393K|   231M|   242M|   872K  (1)|02:54:27 |
|*  2 |   HASH JOIN ANTI NA       |      |  1393K|   231M|   206M|   819K  (1)|02:43:49 |
|*  3 |    HASH JOIN RIGHT ANTI NA|      |  1399K|   190M|   171M|   675K  (1)|02:15:02 |
|   4 |     TABLE ACCESS FULL     | T2   |    10M|    57M|       |   113K  (1)|00:22:39 |
|*  5 |     TABLE ACCESS FULL     | T1   |  1400K|   182M|       |   543K  (1)|01:48:42 |
|   6 |    TABLE ACCESS FULL      | T3   |    10M|   295M|       |   113K  (1)|00:22:39 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="C2")
   3 - access("T1"."C1"="C1")
   5 - filter("T1"."C1"<=1400000 AND "T1"."C1">=1)

SET AUTOTRACE OFF

Session 2:

SELECT
  SQL_ID,
  OPERATION_ID ID,
  OPERATION_TYPE,
  WORK_AREA_SIZE WA_SIZE,
  ACTUAL_MEM_USED,
  NUMBER_PASSES PASSES,
  TEMPSEG_SIZE
FROM
  V$SQL_WORKAREA_ACTIVE
ORDER BY
  SQL_ID,
  OPERATION_ID;

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968         8454144       0
a6yfcryfux22j   3 HASH-JOIN       29298688        20733952       0     19922944

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968         8454144       0
a6yfcryfux22j   3 HASH-JOIN       29298688        20733952       0     57671680

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968         8454144       0
a6yfcryfux22j   3 HASH-JOIN       29298688        20733952       0     96468992

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968         8454144       0
a6yfcryfux22j   3 HASH-JOIN      132551680        97767424       1    130023424

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN       21715968        20730880       0    173015040
a6yfcryfux22j   3 HASH-JOIN      145126400       151730176       1    169869312

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN      190238720       105683968       1    189792256

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN      202813440       204740608       1    199229440

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN      202813440       204740608       1    220200960

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   2 HASH-JOIN      202813440       204740608       1    242221056

SQL_ID         ID OPERATION_TY     WA_SIZE ACTUAL_MEM_USED  PASSES TEMPSEG_SIZE
------------- --- ------------ ----------- --------------- ------- ------------
a6yfcryfux22j   1 SORT (v2)       32429056        25973760       1    137363456
a6yfcryfux22j   2 HASH-JOIN       10075136         8312832       1    251658240

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.SID=303
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE '%pga%';

NAME                          VALUE
----------------------- -----------
session pga memory      377,975,440
session pga memory max  390,558,352

NAME                          VALUE
----------------------- -----------
session pga memory        8,549,008
session pga memory max  390,558,352

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('a6yfcryfux22j',0,'ALLSTATS LAST'));

Plan hash value: 1147745168

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |   1120K|00:03:03.88 |    2833K|   2902K|  68818 |       |       |          |         |
|   1 |  SORT ORDER BY            |      |      1 |   1393K|   1120K|00:03:03.88 |    2833K|   2902K|  68818 |   177M|  4474K|  116M (1)|     158K|
|*  2 |   HASH JOIN ANTI NA       |      |      1 |   1393K|   1120K|00:02:57.80 |    2833K|   2882K|  48701 |   202M|  7914K|  195M (1)|     240K|
|*  3 |    HASH JOIN RIGHT ANTI NA|      |      1 |   1399K|   1260K|00:02:23.18 |    2416K|   2436K|  19840 |   269M|    14M|  144M (1)|     162K|
|   4 |     TABLE ACCESS FULL     | T2   |      1 |     10M|     10M|00:00:20.03 |     416K|    416K|      0 |       |       |          |         |
|*  5 |     TABLE ACCESS FULL     | T1   |      1 |   1400K|   1400K|00:01:48.32 |    2000K|   1999K|      0 |       |       |          |         |
|   6 |    TABLE ACCESS FULL      | T3   |      1 |     10M|     10M|00:00:20.03 |     416K|    416K|      0 |       |       |          |         |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="C2")
   3 - access("T1"."C1"="C1")
   5 - filter(("T1"."C1"<=1400000 AND "T1"."C1">=1))

All three of the workarea executions became 1 pass executions, but look at the Used-Mem and the Used-Tmp columns.  If you had not seen the previous test cases, you might take a look at the DBMS_XPLAN output and remark how silly Oracle is to consume 116M of PGA memory during a SORT operation and spill just 158KB to the TEMP tablespace, or how silly it is that Oracle would consume 195MB in the hash join at ID 2 and spill just 240KB to the TEMP tablespace.  It should now be obvious that this is not what is happening – so much for relying on the DBMS_XPLAN output with ALLSTATS LAST specified as the format parameter and STATISTICS_LEVEL set to ALL.  Your results could be different with a different Oracle release (the above test results are from 11.1.0.7), a different value for PGA_AGGREGATE_TARGET, or with different levels of concurrent activity in the database.
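
If you would rather watch the instance-wide balance of optimal, one pass, and multi-pass workarea executions while experimenting with different number ranges, V$SQL_WORKAREA_HISTOGRAM breaks the counts down by workarea size.  A minimal sketch (the counts are cumulative since instance startup, so interpret them with that in mind):

SELECT
  LOW_OPTIMAL_SIZE,
  HIGH_OPTIMAL_SIZE,
  OPTIMAL_EXECUTIONS,
  ONEPASS_EXECUTIONS,
  MULTIPASSES_EXECUTIONS
FROM
  V$SQL_WORKAREA_HISTOGRAM
WHERE
  TOTAL_EXECUTIONS>0
ORDER BY
  LOW_OPTIMAL_SIZE;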





PGA Memory – The Developer’s Secret Weapon for Stealing All of the Memory in the Server

18 01 2010

January 18, 2010

(Forward to the Follow-Up Post)

Here is a fun test where you might be able to bring down the server in one of several ways (warning, you might not want to try this with anything less than Oracle Database 11.1.0.6 – if you want to try the test with Oracle 9i or 10g, add NOT NULL constraints to the columns C1 and C2 in each table):

  • Filling up the last bit of available space in the datafiles.
  • Causing the Temp tablespace to madly expand until it reaches its maximum size.
  • Stealing all of the memory on the server – so much for setting the PGA_AGGREGATE_TARGET parameter.
  • Swamping the disk subsystem.

We start out with three innocent looking tables created by the following script:

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  RPAD('R'||TO_CHAR(ROWNUM),30,'B') C2,
  RPAD('A',100,'A') C3
FROM
  (SELECT
    ROWNUM C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V2;

CREATE TABLE T2 AS
SELECT
  ROWNUM*10 C1,
  RPAD('R'||TO_CHAR(ROWNUM*10),30,'B') C2,
  RPAD('A',255,'A') C3
FROM
  (SELECT
    ROWNUM C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

CREATE TABLE T3 AS
SELECT
  (ROWNUM*10)+2 C1,
  RPAD('R'||TO_CHAR((ROWNUM*10)+2),30,'B') C2,
  RPAD('A',255,'A') C3
FROM
  (SELECT
    ROWNUM C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3')

We can see how much disk space is in use by the three tables with the following SQL statement:

SELECT
  SEGMENT_NAME SEGMENT,
  SUM(BYTES/1048576) TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('T1','T2','T3')
GROUP BY
  SEGMENT_NAME
ORDER BY
  SEGMENT_NAME;

SEGMENT   TOTAL_MB
------- ----------
T1           15684
T2            3269
T3            3266

Looks like about 21.7GB is in use by the three tables.  Next, we need a script that we will name PGAMemoryFill.sql:

DECLARE
CURSOR C_MEMORY_FILL IS
SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1
WHERE
  T1.C1 NOT IN (
    SELECT
      C1
    FROM
      T2)
  AND T1.C2 NOT IN (
    SELECT
      C2
    FROM
      T3)
ORDER BY
  T1.C2 DESC,
  T1.C1 DESC;

TYPE TYPE_MEMORY_FILL IS TABLE OF C_MEMORY_FILL%ROWTYPE
INDEX BY BINARY_INTEGER;

T_MEMORY_FILL  TYPE_MEMORY_FILL;

BEGIN
  OPEN C_MEMORY_FILL;
  LOOP
    FETCH C_MEMORY_FILL BULK COLLECT INTO  T_MEMORY_FILL  LIMIT 10000000;

    EXIT WHEN T_MEMORY_FILL.COUNT = 0;

    FOR I IN T_MEMORY_FILL.FIRST..T_MEMORY_FILL.LAST LOOP
      NULL;
    END LOOP;

    DBMS_LOCK.SLEEP(20);
  END LOOP;
END;
/

Yes, the script is performing bulk collection (2 DBAs stand up and clap, the rest start shaking their heads side to side).

Let’s check the PGA_AGGREGATE_TARGET:

SHOW PARAMETER PGA_AGGREGATE_TARGET

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
pga_aggregate_target                 big integer 1800M

OK, the PGA_AGGREGATE_TARGET is just less than 1.8GB, and the server has 12GB of memory.

Now for the test, we will need two sessions, session 1 will be the session that executes the above script, and session 2 will execute various queries to see what is happening in the database.

Session 1:

SELECT SID FROM V$MYSTAT WHERE ROWNUM<=1;

  SID
-----
  335

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.C1,
  T1.C2,
  T1.C3
FROM
  T1
WHERE
  T1.C1 NOT IN (
    SELECT
      C1
    FROM
      T2)
  AND T1.C2 NOT IN (
    SELECT
      C2
    FROM
      T3)
ORDER BY
  T1.C2 DESC,
  T1.C1 DESC;

Plan hash value: 2719846691

------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|Time      |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     1 |   174 |       |  2351K  (1)| 07:50:19 |
|   1 |  SORT ORDER BY            |      |     1 |   174 |       |  2351K  (1)| 07:50:19 |
|*  2 |   HASH JOIN RIGHT ANTI NA |      |     1 |   174 |   171M|  2351K  (1)| 07:50:19 |
|   3 |    TABLE ACCESS FULL      | T2   |    10M|    57M|       |   113K  (1)| 00:22:39 |
|*  4 |    HASH JOIN RIGHT ANTI NA|      |    99M|    15G|   410M|  1382K  (1)| 04:36:25 |
|   5 |     TABLE ACCESS FULL     | T3   |    10M|   295M|       |   113K  (1)| 00:22:39 |
|   6 |     TABLE ACCESS FULL     | T1   |   100M|    12G|       |   543K  (1)| 01:48:42 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C1"="C1")
   4 - access("T1"."C2"="C2")

SET AUTOTRACE OFF

SELECT 1 FROM DUAL;

From the above, Oracle is planning to perform a NULL aware hash join between tables T1 and T3 (predicted to consume 410MB of space in the TEMP tablespace… is that the true unit of measurement?  Keep reading), and then join that row source to table T2 using another NULL aware hash join (Oracle 10.2.0.4 and lower will not use a NULL aware hash join – you have been warned).  The SQL statement involving tables T1, T2, and T3 is the SQL statement that will be executed in the PGAMemoryFill.sql script.
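
The NULL aware anti join transformation is controlled by a hidden parameter (believed to be named _optimizer_null_aware_antijoin in the 11.1 release used here – hidden parameters are undocumented and subject to change between releases).  For the curious, a minimal sketch of checking its current value while connected as SYS, using the same X$KSPPI and X$KSPPSV views that are referenced later in this article:

SELECT
  I.KSPPINM NAME,
  V.KSPPSTVL VALUE
FROM
  X$KSPPI I,
  X$KSPPSV V
WHERE
  I.INDX=V.INDX
  AND I.KSPPINM='_optimizer_null_aware_antijoin';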

In session 2 we will take a look at the optimizer parameters in effect for the last SQL statement executed by Session 1:

SET PAGESIZE 1000
COLUMN CN FORMAT 99
COLUMN NAME FORMAT A37
COLUMN VALUE FORMAT A14
COLUMN DEF FORMAT A3

SELECT
  CHILD_NUMBER CN,
  NAME,
  VALUE,
  ISDEFAULT DEF
FROM
  V$SQL_OPTIMIZER_ENV SOE,
  V$SESSION S
WHERE
  SOE.SQL_ID=S.SQL_ID
  AND SOE.CHILD_NUMBER=S.SQL_CHILD_NUMBER
  AND S.SID=335
ORDER BY
  NAME;

 CN NAME                                  VALUE          DEF
--- ------------------------------------- -------------- ---
  0 _pga_max_size                         368640 KB      NO
  0 active_instance_count                 1              YES
  0 bitmap_merge_area_size                1048576        YES
  0 cell_offload_compaction               ADAPTIVE       YES
  0 cell_offload_plan_display             AUTO           YES
  0 cell_offload_processing               true           YES
  0 cpu_count                             8              YES
  0 cursor_sharing                        exact          YES
  0 db_file_multiblock_read_count         128            YES
  0 hash_area_size                        131072         YES
  0 is_recur_flags                        0              YES
  0 optimizer_capture_sql_plan_baselines  false          YES
  0 optimizer_dynamic_sampling            2              YES
  0 optimizer_features_enable             11.1.0.7       YES
  0 optimizer_index_caching               0              YES
  0 optimizer_index_cost_adj              100            YES
  0 optimizer_mode                        all_rows       YES
  0 optimizer_secure_view_merging         true           YES
  0 optimizer_use_invisible_indexes       false          YES
  0 optimizer_use_pending_statistics      false          YES
  0 optimizer_use_sql_plan_baselines      true           YES
  0 parallel_ddl_mode                     enabled        YES
  0 parallel_degree                       0              YES
  0 parallel_dml_mode                     disabled       YES
  0 parallel_execution_enabled            true           YES
  0 parallel_query_default_dop            0              YES
  0 parallel_query_mode                   enabled        YES
  0 pga_aggregate_target                  1843200 KB     YES
  0 query_rewrite_enabled                 true           YES
  0 query_rewrite_integrity               enforced       YES
  0 result_cache_mode                     MANUAL         YES
  0 skip_unusable_indexes                 true           YES
  0 sort_area_retained_size               0              YES
  0 sort_area_size                        65536          YES
  0 star_transformation_enabled           false          YES
  0 statistics_level                      typical        YES
  0 transaction_isolation_level           read_commited  YES
  0 workarea_size_policy                  auto           YES 

Notice in the above that _pga_max_size was set to 368640KB (360MB – 20% of the PGA_AGGREGATE_TARGET – note that this value does not seem to decrease as hard parses are forced when a lot of PGA memory is in use), and even though the ISDEFAULT column shows that this is not the default value, the value was set automatically based on the PGA_AGGREGATE_TARGET value.  To further demonstrate that _pga_max_size has not been adjusted, here are two screen shots from one of my programs that show all of the initialization parameters that are in effect; note that Is Default is set to TRUE for this parameter (to output all of the hidden parameter values, see http://www.jlcomp.demon.co.uk/params.html):

The 368640 KB value reported for the _PGA_MAX_SIZE in the V$SQL_OPTIMIZER_ENV view exactly matches the value for _PGA_MAX_SIZE returned by the query of  X$KSPPI and X$KSPPSV.

Before we start causing damage, let’s check the documentation for the V$SQL_WORKAREA_ACTIVE view:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/memory.htm#i48705
http://download-west.oracle.com/docs/cd/B28359_01/server.111/b28320/dynviews_3058.htm

The second of the above links defines the columns in the view.  A couple of those column definitions follow:

  • OPERATION_TYPE: Type of operation using the work area (SORT, HASH JOIN, GROUP BY, BUFFERING, BITMAP MERGE, or BITMAP CREATE)
  • WORK_AREA_SIZE: Maximum size (in bytes) of the work area as it is currently used by the operation
  • ACTUAL_MEM_USED: Amount of PGA memory (in bytes) currently allocated on behalf of this work area. This value should range between 0 and WORK_AREA_SIZE.
  • NUMBER_PASSES: Number of passes corresponding to this work area (0 if running in OPTIMAL mode)
  • TEMPSEG_SIZE: Size (in bytes) of the temporary segment used on behalf of this work area.  This column is NULL if this work area has not (yet) spilled to disk.

While session 1 is busy executing the PGAMemoryFill.sql script, session 2 will periodically query the V$SQL_WORKAREA_ACTIVE view to see what is happening.

In Session 1:

ALTER SESSION SET STATISTICS_LEVEL=ALL;

@PGAMemoryFill.sql

Session 2 then starts repeatedly executing the following SQL statement after a short delay (note that I could have selected the OPERATION_ID column to make it easy to tie the memory used to a specific operation in the execution plan that was displayed earlier):

SELECT
  SQL_ID,
  OPERATION_TYPE,
  WORK_AREA_SIZE,
  ACTUAL_MEM_USED,
  NUMBER_PASSES,
  TEMPSEG_SIZE
FROM
  V$SQL_WORKAREA_ACTIVE;

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv HASH-JOIN                  29298688        20712448             0    121634816

So, Session 1 is using about 19.75MB of PGA memory for a hash join, and according to the definition of the NUMBER_PASSES column, the hash join is currently an optimal execution – yet that seems to conflict with the definition of the TEMPSEG_SIZE column and the non-NULL value displayed in that column.  Session 2 will continue to re-execute the above SQL statement, pausing after each execution:

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv HASH-JOIN                  40738816        33011712             0     45088768
0k5pr4rx072sv HASH-JOIN                 189427712        20740096             1    130023424

Now there are two hash joins active for the SQL statement with a total of 51.26MB of PGA memory in use.  One of the hash joins is still an optimal execution, while the second has become a 1 pass execution.

 
SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  78364672        28554240             1   1293942784
0k5pr4rx072sv HASH-JOIN                 147055616       148588544             1    814743552
0k5pr4rx072sv HASH-JOIN                 129864704       110271488             1    470810624

Now both of the hash joins are reporting a 1 pass execution.  A V2 sort operation has joined the output, and it too is executing as a 1 pass operation.  The session is now using just over 274MB of PGA memory based on the output of this view.

 
SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  78304256        43396096             1   1749024768
0k5pr4rx072sv HASH-JOIN                 147055616       148588544             1    968884224
0k5pr4rx072sv HASH-JOIN                 129864704       110271488             1    591396864

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  80089088         5712896             1   5366611968
0k5pr4rx072sv HASH-JOIN                 147055616       148588544             1   2097152000
0k5pr4rx072sv HASH-JOIN                 129864704       110271488             1   1509949440

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  73031680        67003392             1   8383365120
0k5pr4rx072sv HASH-JOIN                 147055616       148588544             1   3050307584
0k5pr4rx072sv HASH-JOIN                 129864704       110271488             1   2283798528

The session has made it up to 310.77MB of PGA memory, and the TEMPSEG_SIZE column values continue to grow.
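
Those growing TEMPSEG_SIZE values can be cross-checked against the temporary segment space actually allocated to the session, by way of V$TEMPSEG_USAGE.  A minimal sketch, assuming the session is still SID 335 (the COLUMN command simply keeps large byte counts from collapsing into scientific notation, as will happen shortly with the TEMPSEG_SIZE column):

COLUMN TEMP_BYTES FORMAT 999,999,999,990

SELECT
  S.SID,
  SU.TABLESPACE,
  SU.SEGTYPE,
  SUM(SU.BLOCKS*TBS.BLOCK_SIZE) TEMP_BYTES
FROM
  V$TEMPSEG_USAGE SU,
  V$SESSION S,
  DBA_TABLESPACES TBS
WHERE
  SU.SESSION_ADDR=S.SADDR
  AND SU.TABLESPACE=TBS.TABLESPACE_NAME
  AND S.SID=335
GROUP BY
  S.SID,
  SU.TABLESPACE,
  SU.SEGTYPE;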


SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  67550208        37590016             1   9634316288
0k5pr4rx072sv HASH-JOIN                  23760896        13456384             1   3338665984
0k5pr4rx072sv HASH-JOIN                 129864704       110271488             1   2607808512

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  67550208        47077376             1   1.0252E+10
0k5pr4rx072sv HASH-JOIN                  23760896        13456384             1   3338665984
0k5pr4rx072sv HASH-JOIN                 129864704       110271488             1   2770337792

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                 188743680       188285952             1   1.1250E+10
0k5pr4rx072sv HASH-JOIN                  18951168        17658880             1   2839543808

One of the hash join operations has completed – the query must be about done now.

SQL_ID        OPERATION_TYPE       WORK_AREA_SIZE ACTUAL_MEM_USED NUMBER_PASSES TEMPSEG_SIZE
------------- -------------------- -------------- --------------- ------------- ------------
0k5pr4rx072sv SORT (v2)                  90914816        91714560             1   1.1966E+10

The final hash join finished, and the TEMPSEG_SIZE is now 1.1966E+10, which indicates that the temporary segment size in the TEMP tablespace is about 11.14GB.  That is kind of big – remember that number.  When just the sort operation is returned by the above query, session 2 executes this SQL statement:

COLUMN VALUE FORMAT 999,999,999,990

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.SID=335
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE '%pga%';

NAME                             VALUE
------------------------ -------------
session pga memory       3,391,500,272
session pga memory max   3,391,500,272

Based on the above, Session 1 is not consuming about 90MB of PGA memory, but instead roughly 3234.39MB of PGA memory (the 2 DBAs still standing and clapping should sit down now).  Let’s hope that the DBA responsible for this database did not treat the 1800MB value of the PGA_AGGREGATE_TARGET parameter as a hard upper limit and size the other memory parameters to take full advantage of the 12GB of memory in the server.
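
The session level statistics can also be compared with what the server process has actually allocated, by way of the PGA columns in V$PROCESS.  A minimal sketch, again assuming SID 335:

SELECT
  S.SID,
  P.PGA_USED_MEM,
  P.PGA_ALLOC_MEM,
  P.PGA_FREEABLE_MEM,
  P.PGA_MAX_MEM
FROM
  V$SESSION S,
  V$PROCESS P
WHERE
  S.PADDR=P.ADDR
  AND S.SID=335;

The PGA_FREEABLE_MEM column is the interesting one here, since it shows how much of the allocation Oracle considers eligible to be released back to the operating system.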

Once the script ends, the above SQL statement returns the following values:

NAME                             VALUE
------------------------ -------------
session pga memory          10,039,280
session pga memory max   3,391,500,272 

The session is still consuming 9.57MB of PGA memory just sitting idle – remember this number.

Now just to make sure that 0k5pr4rx072sv, as output by the query of the V$SQL_WORKAREA_ACTIVE view, is the SQL_ID for our SQL statement:

SELECT
  SQL_TEXT
FROM
  V$SQL
WHERE
  SQL_ID='0k5pr4rx072sv';

SQL_TEXT
--------------------------------------------------------------------------------
SELECT T1.C1, T1.C2, T1.C3 FROM T1 WHERE T1.C1 NOT IN ( SELECT C1 FROM T2) AND T
1.C2 NOT IN ( SELECT C2 FROM T3) ORDER BY T1.C2 DESC, T1.C1 DESC

Good, now let’s check the execution plan for the SQL statement:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0k5pr4rx072sv',0,'ALLSTATS LAST'));

SQL_ID  0k5pr4rx072sv, child number 0
-------------------------------------
SELECT T1.C1, T1.C2, T1.C3 FROM T1 WHERE T1.C1 NOT IN ( SELECT C1 FROM
T2) AND T1.C2 NOT IN ( SELECT C2 FROM T3) ORDER BY T1.C2 DESC, T1.C1
DESC

Plan hash value: 2719846691

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |     80M|00:24:55.23 |    2833K|   5024K|   2190K|       |       |          |         |
|   1 |  SORT ORDER BY            |      |      1 |      1 |     80M|00:24:55.23 |    2833K|   5024K|   2190K|    12G|    35M|  179M (1)|      11M|
|*  2 |   HASH JOIN RIGHT ANTI NA |      |      1 |      1 |     80M|00:12:08.70 |    2833K|   3563K|    730K|   269M|    14M|  105M (1)|    2708K|
|   3 |    TABLE ACCESS FULL      | T2   |      1 |     10M|     10M|00:00:20.03 |     416K|    416K|      0 |       |       |          |         |
|*  4 |    HASH JOIN RIGHT ANTI NA|      |      1 |     99M|     90M|00:08:25.72 |    2416K|   2811K|    394K|   521M|    19M|  141M (1)|    3184K|
|   5 |     TABLE ACCESS FULL     | T3   |      1 |     10M|     10M|00:00:20.12 |     416K|    416K|      0 |       |       |          |         |
|   6 |     TABLE ACCESS FULL     | T1   |      1 |    100M|    100M|00:03:20.37 |    2000K|   1999K|      0 |       |       |          |         |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C1"="C1")
   4 - access("T1"."C2"="C2") 

Let’s see: almost 25 minutes to execute the SQL statement; a total of roughly 425MB of memory was used during the one pass workarea executions (but, from our earlier output, not all of that memory was in use at the same time); and the SORT ORDER BY operation used 11M of TEMP tablespace space… but is that 11MB, or is it 11 million KB, or is it 11,534,336 KB (2^20 * 11 KB)?  Remember that earlier we found “that the temporary segment size in the TEMP tablespace is about 11.14GB”, so that 11M means 11,534,336 KB, or about 11GB.  OK, that was slightly confusing, but we are not done yet.  (Side note: the author of the book “Troubleshooting Oracle Performance” commented on the Used-Tmp column here.)
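
Another way to remove the ambiguity of the Used-Tmp column is to query V$SQL_WORKAREA after the execution completes – the MAX_TEMPSEG_SIZE and LAST_TEMPSEG_SIZE columns are documented as being reported in bytes.  A minimal sketch for this SQL_ID:

SELECT
  OPERATION_ID ID,
  OPERATION_TYPE,
  LAST_MEMORY_USED,
  MAX_TEMPSEG_SIZE,
  LAST_TEMPSEG_SIZE
FROM
  V$SQL_WORKAREA
WHERE
  SQL_ID='0k5pr4rx072sv'
ORDER BY
  OPERATION_ID;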

Let’s have some fun and burn memory (or have some fun until something breaks).  Now, we will run the PGAMemoryFill.sql script in 4 sessions, with a fifth session to monitor the progress (if you want to have more fun, modify the WHERE clause on the T1 table so that all workarea executions are optimal, rather than spilling to disk in a one-pass or multi-pass operation).  In 4 sessions, execute the script:

@PGAMemoryFill.sql

After a short pause, session 5 (the monitoring session) should periodically submit the following query:

SELECT
  SN.NAME,
  SUM(SS.VALUE) VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME='session pga memory'
GROUP BY
  SN.NAME;

NAME                            VALUE
-------------------- ----------------
session pga memory        144,351,120

Roughly 137.66MB of PGA memory in use – I wonder if we will hit 3,391,500,272 * 4 = 12.63GB of PGA memory in use (4 times the value seen for the single session).  Well, let’s keep executing the above query with brief pauses between each execution:

NAME                            VALUE
-------------------- ----------------
session pga memory        144,351,120

NAME                            VALUE
-------------------- ----------------
session pga memory        285,902,000

NAME                            VALUE
-------------------- ----------------
session pga memory      1,191,920,144

NAME                            VALUE
-------------------- ----------------
session pga memory      1,296,843,280

NAME                            VALUE
-------------------- ----------------
session pga memory      1,379,306,720

NAME                            VALUE
-------------------- ----------------
session pga memory      1,401,504,272

NAME                            VALUE
-------------------- ----------------
session pga memory      1,465,467,408

NAME                            VALUE
-------------------- ----------------
session pga memory      1,473,207,536

NAME                            VALUE
-------------------- ----------------
session pga memory      1,484,283,120

Let’s check one of the sessions to see how it is doing:

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.SID=335
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE '%pga%';

NAME                              VALUE
---------------------- ----------------
session pga memory          357,904,368
session pga memory max      357,904,368

This one session is using roughly 341.32MB of PGA memory, now back to the other query:

NAME                            VALUE
-------------------- ----------------
session pga memory      1,476,418,800

NAME                            VALUE
-------------------- ----------------
session pga memory      4,517,252,992

NAME                            VALUE
-------------------- ----------------
session pga memory      4,518,556,832

The PGA memory usage seems to have stabilized at 4,309.23MB (4.21GB), so we didn’t bring down the server by exceeding the 12GB of memory in the server, but this is 2.4 times the value of the PGA_AGGREGATE_TARGET parameter.  Let’s check on the progress of our 4 sessions:

SELECT
  SS.SID,
  SN.NAME,
  SS.VALUE
FROM
  V$STATNAME SN,
  V$SESSTAT SS
WHERE
  SS.VALUE>=300*1024*1024
  AND SS.STATISTIC#=SN.STATISTIC#
  AND SN.NAME LIKE '%pga%'
ORDER BY
  SS.SID,
  SN.NAME;

SID NAME                              VALUE
--- ---------------------- ----------------
297 session pga memory        3,322,442,384
297 session pga memory max    3,384,832,656
304 session pga memory          368,734,864
304 session pga memory max      373,518,992
305 session pga memory          368,734,864
305 session pga memory max      368,734,864
335 session pga memory          357,904,368
335 session pga memory max      357,904,368

The above seems to show that one of the sessions is still using 3,168.53MB of PGA memory, while the other three have each retreated to roughly 352MB of PGA memory.  Let’s check on the sessions…  The script in 3 of the 4 sessions crashed with this error:

ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
ORA-06512: at line 33

Quick math time: is 11.14GB * 4 greater than the maximum size of a SMALLFILE datafile in a database with an 8KB block size?  Yes – a SMALLFILE datafile is limited to roughly 4 million (2^22) blocks, or about 32GB with an 8KB block size, while 11.14GB * 4 is roughly 44.6GB.  (The 128 in the error message refers to blocks: 128 * 8KB = 1MB, which is the extent size in the TEMP tablespace.)  OK, if the script crashed in 3 of the 4 sessions, why is each of those sessions still consuming about 352MB of PGA memory when it is just sitting there waiting for the next SQL statement?  This would certainly drive someone mad trying to figure out what Jimmy the Developer has done.  So, how do you get the memory back from the sessions so that it can be returned to the operating system?  You must execute this specially crafted SQL statement in each session:

SELECT
  42
FROM
  DUAL
WHERE
  1=2;

OK, it does not need to be that SQL statement, but until another SQL statement is executed, the 352MB acquired by each of the three sessions cannot be used for anything else.  And that, my friends, is the developer’s secret weapon for stealing all of the memory in the server.  Now try to modify the SQL statement in the PGAMemoryFill.sql script so that all three workarea executions are optimal executions to see how high the memory usage can be pushed while executing the SQL statement.
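
While running that experiment, the instance-wide pressure can be watched through V$PGASTAT, where the over allocation count statistic records how often Oracle has been forced to allocate beyond the PGA_AGGREGATE_TARGET.  A minimal sketch (the statistic names are as they appear in 11.1):

SELECT
  NAME,
  VALUE,
  UNIT
FROM
  V$PGASTAT
WHERE
  NAME IN ('aggregate PGA target parameter','total PGA allocated','maximum PGA allocated','over allocation count');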





Explain Plan Lies, Autotrace Lies, TKPROF Lies, What is the Plan?

11 01 2010

January 11, 2010

As some of you might be aware, EXPLAIN PLAN will occasionally show the wrong execution plan for a SQL statement.  SQL*Plus’ AUTOTRACE feature experiences a similar problem with generating accurate plans from time to time, especially when the SQL statement uses bind variables.  Did you know that TKPROF may also show the wrong execution plan from time to time?  I set up a simple test case to demonstrate this behavior.

The test case tables:

CREATE TABLE T1(
  C1 NUMBER,
  C2 VARCHAR2(255),
  PRIMARY KEY (C1));

CREATE TABLE T2(
  C1 NUMBER,
  C2 VARCHAR2(255),
  PRIMARY KEY (C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD('A',255,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) V2;

INSERT INTO
  T2
SELECT
  ROWNUM,
  LPAD('A',255,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

The above creates two tables, each containing 300,000 rows, with a primary key column (thus a unique index exists for that column) and a second column that acts as padding to make the rows longer (to discourage full table scans).
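
A quick sanity check (not part of the original script) to confirm the row counts and that the primary key indexes were created might look like this:

SELECT COUNT(*) T1_ROWS FROM T1;

SELECT COUNT(*) T2_ROWS FROM T2;

SELECT
  TABLE_NAME,
  INDEX_NAME,
  UNIQUENESS
FROM
  USER_INDEXES
WHERE
  TABLE_NAME IN ('T1','T2');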

The test script that I built looks like this:

ALTER SESSION SET STATISTICS_LEVEL='ALL';

VARIABLE N1 NUMBER
VARIABLE N2 NUMBER

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 150
SET PAGESIZE 2000
SET ARRAYSIZE 100

SPOOL DIFF_EXPLAIN_PLAN_TEST.TXT

EXEC :N1:=1
EXEC :N2:=2

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=10

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=100

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=1000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=10000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=100000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=300000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

/* --------------------------------------------------- */

ALTER SYSTEM FLUSH SHARED_POOL;

SET AUTOTRACE OFF

EXEC :N1:=1
EXEC :N2:=2

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1
EXEC :N2:=10

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1
EXEC :N2:=100

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1
EXEC :N2:=1000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1
EXEC :N2:=10000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1
EXEC :N2:=100000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1
EXEC :N2:=300000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

/* ------------------------------ */

SET AUTOTRACE TRACEONLY STATISTICS
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DIFF_EXPLAIN_PLAN_TEST';

EXEC :N1:=1
EXEC :N2:=2

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=10
SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=100

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=1000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=10000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=100000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

EXEC :N1:=1
EXEC :N2:=300000

SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

SET AUTOTRACE OFF

SPOOL OFF 

There are three sections in the script:

  1. Display the AUTOTRACE generated execution plans without executing the queries, adjusting the bind variable values before each execution.
  2. Execute the SQL statement and display the actual execution plan using DBMS_XPLAN, adjusting the bind variable values before each execution (note that the flush of the shared pool seems to be required; the SQL statement is executed a couple of times with each set of bind variable values to trigger adaptive cursor sharing in Oracle 11.1.0.6 and above; one way to verify that adaptive cursor sharing actually triggered is sketched after this list).
  3. Execute the SQL statement displaying only the execution statistics, with a 10046 trace at level 8 enabled.
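
Since the second section depends on adaptive cursor sharing creating additional child cursors, it can be worth verifying that the feature actually triggered.  One way to do that (a sketch, not part of the original script; requires Oracle 11.1 or later) is to check the bind sensitivity columns in V$SQL for the SQL_ID that appears later in the output:

SELECT
  SQL_ID,
  CHILD_NUMBER,
  PLAN_HASH_VALUE,
  IS_BIND_SENSITIVE,
  IS_BIND_AWARE,
  EXECUTIONS
FROM
  V$SQL
WHERE
  SQL_ID='bgjafhqwjt1zt'
ORDER BY
  CHILD_NUMBER;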

What were the results on Oracle 11.1.0.7 (you may see different results on 11.1.0.6 and 11.2.0.1)?  Section 1 of the output (slightly cleaned up) follows – keep an eye on the predicted number of rows and the Predicate Information section:

SQL> EXEC :N1:=1
SQL> EXEC :N2:=2
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))        

SQL>
SQL> EXEC :N1:=1
SQL> EXEC :N2:=10
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))        

SQL>
SQL> EXEC :N1:=1
SQL> EXEC :N2:=100
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))        

SQL>
SQL> EXEC :N1:=1
SQL> EXEC :N2:=1000
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))        

SQL>
SQL> EXEC :N1:=1
SQL> EXEC :N2:=10000
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))        

SQL>
SQL> EXEC :N1:=1
SQL> EXEC :N2:=100000
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))        

SQL>
SQL> EXEC :N1:=1
SQL> EXEC :N2:=300000
SQL>
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

Execution Plan
----------------------------------------------------------                    
Plan hash value: 2267210268   

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              | 15000 |  7617K|       |   498   (1)| 00:00:06 |
|*  1 |  FILTER                       |              |       |       |       |            |          |
|*  2 |   HASH JOIN                   |              | 15000 |  7617K|  3992K|   498   (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1350 |       |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 15000 |  3808K|       |    55   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1350 |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(TO_NUMBER(:N1)<=TO_NUMBER(:N2))        
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=TO_NUMBER(:N1) AND "T1"."C1"<=TO_NUMBER(:N2))        
   6 - access("T2"."C1">=TO_NUMBER(:N1) AND "T2"."C1"<=TO_NUMBER(:N2))

Notice that the optimizer is estimating that each table will return 15,000 of the 300,000 rows, 5% of the rows in the tables.  Also notice in the Predicate Information section that AUTOTRACE is treating each of the bind variables as if it were defined as a VARCHAR2, rather than a NUMBER.
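
The implicit conversion is not specific to AUTOTRACE: a plain EXPLAIN PLAN produces the same TO_NUMBER() wrappers, because EXPLAIN PLAN also assumes that every bind variable is a VARCHAR2 (AUTOTRACE TRACEONLY EXPLAIN simply runs an EXPLAIN PLAN behind the scenes).  A quick way to see this for yourself, as a sketch that is not part of the original test script:

EXPLAIN PLAN FOR
SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);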

Next, we will move on to the second section of the script to see the actual execution plans.  Keep an eye on the E-Rows column, the execution plan, and the order of the query results.

SQL> EXEC :N1:=1
SQL> EXEC :N2:=2
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

        C1         C1 T T     
---------- ---------- - -     
         1          1 A A     
         2          2 A A     

SQL_ID  bgjafhqwjt1zt, child number 0
Plan hash value: 4256353061   
---------------------------------------------------------------------------------------------------------                     
| Id  | Operation                      | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                     
---------------------------------------------------------------------------------------------------------                     
|   0 | SELECT STATEMENT               |              |      1 |        |      2 |00:00:00.01 |      11 |                     
|*  1 |  FILTER                        |              |      1 |        |      2 |00:00:00.01 |      11 |                     
|   2 |   NESTED LOOPS                 |              |      1 |        |      2 |00:00:00.01 |      11 |                     
|   3 |    NESTED LOOPS                |              |      1 |      1 |      2 |00:00:00.01 |       9 |                     
|   4 |     TABLE ACCESS BY INDEX ROWID| T1           |      1 |      2 |      2 |00:00:00.01 |       5 |                     
|*  5 |      INDEX RANGE SCAN          | SYS_C0030339 |      1 |      2 |      2 |00:00:00.01 |       3 |                     
|*  6 |     INDEX UNIQUE SCAN          | SYS_C0030340 |      2 |      1 |      2 |00:00:00.01 |       4 |                     
|   7 |    TABLE ACCESS BY INDEX ROWID | T2           |      2 |      1 |      2 |00:00:00.01 |       2 |                     
---------------------------------------------------------------------------------------------------------                     

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   5 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)     
   6 - access("T1"."C1"="T2"."C1")                   
       filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))   

SQL> EXEC :N1:=1
SQL> EXEC :N2:=10
SQL> SELECT
  2    T1.C1,
  3    T2.C1,
  4    SUBSTR(T1.C2,1,1) T1_C2,
  5    SUBSTR(T2.C2,1,1) T2_C2
  6  FROM
  7    T1,
  8    T2
  9  WHERE
 10    T1.C1=T2.C1
 11    AND T1.C1 BETWEEN :N1 AND :N2;

        C1         C1 T T     
---------- ---------- - -     
         1          1 A A     
         2          2 A A     
         3          3 A A     
         4          4 A A     
         5          5 A A     
         6          6 A A     
         7          7 A A     
         8          8 A A     
         9          9 A A     
        10         10 A A     

SQL_ID  bgjafhqwjt1zt, child number 0
Plan hash value: 4256353061   
---------------------------------------------------------------------------------------------------------                     
| Id  | Operation                      | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                     
---------------------------------------------------------------------------------------------------------                     
|   0 | SELECT STATEMENT               |              |      1 |        |     10 |00:00:00.01 |      24 |                     
|*  1 |  FILTER                        |              |      1 |        |     10 |00:00:00.01 |      24 |                     
|   2 |   NESTED LOOPS                 |              |      1 |        |     10 |00:00:00.01 |      24 |                     
|   3 |    NESTED LOOPS                |              |      1 |      1 |     10 |00:00:00.01 |      14 |                     
|   4 |     TABLE ACCESS BY INDEX ROWID| T1           |      1 |      2 |     10 |00:00:00.01 |       5 |                     
|*  5 |      INDEX RANGE SCAN          | SYS_C0030339 |      1 |      2 |     10 |00:00:00.01 |       3 |                     
|*  6 |     INDEX UNIQUE SCAN          | SYS_C0030340 |     10 |      1 |     10 |00:00:00.01 |       9 |                     
|   7 |    TABLE ACCESS BY INDEX ROWID | T2           |     10 |      1 |     10 |00:00:00.01 |      10 |                     
---------------------------------------------------------------------------------------------------------                     

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   5 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)     
   6 - access("T1"."C1"="T2"."C1")                   
       filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))    

SQL_ID  bgjafhqwjt1zt, child number 0                
Plan hash value: 4256353061   
---------------------------------------------------------------------------------------------------------                     
| Id  | Operation                      | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                     
---------------------------------------------------------------------------------------------------------                     
|   0 | SELECT STATEMENT               |              |      1 |        |    100 |00:00:00.01 |     117 |                     
|*  1 |  FILTER                        |              |      1 |        |    100 |00:00:00.01 |     117 |                     
|   2 |   NESTED LOOPS                 |              |      1 |        |    100 |00:00:00.01 |     117 |                     
|   3 |    NESTED LOOPS                |              |      1 |      1 |    100 |00:00:00.01 |      17 |                     
|   4 |     TABLE ACCESS BY INDEX ROWID| T1           |      1 |      2 |    100 |00:00:00.01 |       8 |                     
|*  5 |      INDEX RANGE SCAN          | SYS_C0030339 |      1 |      2 |    100 |00:00:00.01 |       3 |                     
|*  6 |     INDEX UNIQUE SCAN          | SYS_C0030340 |    100 |      1 |    100 |00:00:00.01 |       9 |                     
|   7 |    TABLE ACCESS BY INDEX ROWID | T2           |    100 |      1 |    100 |00:00:00.01 |     100 |                     
---------------------------------------------------------------------------------------------------------                     

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   5 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)     
   6 - access("T1"."C1"="T2"."C1")                   
       filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))    

SQL> EXEC :N1:=1
SQL> EXEC :N2:=1000

        C1         C1 T T     
---------- ---------- - -     
         1          1 A A     
         2          2 A A     
         3          3 A A     
         4          4 A A     
         5          5 A A     
...
       997        997 A A     
       998        998 A A     
       999        999 A A     
      1000       1000 A A     

SQL_ID  bgjafhqwjt1zt, child number 0                
Plan hash value: 4256353061   
---------------------------------------------------------------------------------------------------------                     
| Id  | Operation                      | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                     
---------------------------------------------------------------------------------------------------------                     
|   0 | SELECT STATEMENT               |              |      1 |        |   1000 |00:00:00.01 |    1087 |                     
|*  1 |  FILTER                        |              |      1 |        |   1000 |00:00:00.01 |    1087 |                     
|   2 |   NESTED LOOPS                 |              |      1 |        |   1000 |00:00:00.01 |    1087 |                     
|   3 |    NESTED LOOPS                |              |      1 |      1 |   1000 |00:00:00.01 |      87 |                     
|   4 |     TABLE ACCESS BY INDEX ROWID| T1           |      1 |      2 |   1000 |00:00:00.01 |      61 |                     
|*  5 |      INDEX RANGE SCAN          | SYS_C0030339 |      1 |      2 |   1000 |00:00:00.01 |      13 |                     
|*  6 |     INDEX UNIQUE SCAN          | SYS_C0030340 |   1000 |      1 |   1000 |00:00:00.01 |      26 |                     
|   7 |    TABLE ACCESS BY INDEX ROWID | T2           |   1000 |      1 |   1000 |00:00:00.01 |    1000 |                     
---------------------------------------------------------------------------------------------------------                     

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   5 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)     
   6 - access("T1"."C1"="T2"."C1")                   
       filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))   

SQL> EXEC :N1:=1
SQL> EXEC :N2:=10000

        C1         C1 T T     
---------- ---------- - -     
         1          1 A A     
         2          2 A A     
         3          3 A A     
         4          4 A A     
         5          5 A A     
...
      9997       9997 A A     
      9998       9998 A A     
      9999       9999 A A     
     10000      10000 A A     

SQL_ID  bgjafhqwjt1zt, child number 1                
Plan hash value: 2267210268   
-----------------------------------------------------------------------------------------------------------------------------------                  
| Id  | Operation                     | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |                  
-----------------------------------------------------------------------------------------------------------------------------------                  
|   0 | SELECT STATEMENT              |              |      1 |        |  10000 |00:00:00.03 |     975 |       |       |          |                  
|*  1 |  FILTER                       |              |      1 |        |  10000 |00:00:00.03 |     975 |       |       |          |                  
|*  2 |   HASH JOIN                   |              |      1 |   9999 |  10000 |00:00:00.03 |     975 |  3337K|   921K| 3324K (0)|                  
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |      1 |  10000 |  10000 |00:00:00.01 |     390 |       |       |          |                  
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |      1 |  10000 |  10000 |00:00:00.01 |      19 |       |       |          |                  
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           |      1 |  10000 |  10000 |00:00:00.01 |     585 |       |       |          |                  
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |      1 |  10000 |  10000 |00:00:00.01 |     118 |       |       |          |                  
-----------------------------------------------------------------------------------------------------------------------------------                  

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                   
   4 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)     
   6 - access("T2"."C1">=:N1 AND "T2"."C1"<=:N2)     

SQL> EXEC :N1:=1
SQL> EXEC :N2:=100000

        C1         C1 T T     
---------- ---------- - -     
        28         28 A A     
        29         29 A A     
        30         30 A A     
        31         31 A A     
        32         32 A A     
...
     98763      98763 A A     
     98764      98764 A A     
     98765      98765 A A     
     98766      98766 A A     

SQL_ID  bgjafhqwjt1zt, child number 2                
Plan hash value: 487071653    
-----------------------------------------------------------------------------------------------------------------             
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |             
-----------------------------------------------------------------------------------------------------------------             
|   0 | SELECT STATEMENT    |      |      1 |        |    100K|00:00:00.29 |   23219 |       |       |          |             
|*  1 |  FILTER             |      |      1 |        |    100K|00:00:00.29 |   23219 |       |       |          |             
|*  2 |   HASH JOIN         |      |      1 |    100K|    100K|00:00:00.29 |   23219 |    28M|  3683K|   29M (0)|             
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.03 |   11128 |       |       |          |             
|*  4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.03 |   12091 |       |       |          |             
-----------------------------------------------------------------------------------------------------------------             

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                   
   3 - filter(("T1"."C1"<=:N2 AND "T1"."C1">=:N1))   
   4 - filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))   

SQL> EXEC :N1:=1
SQL> EXEC :N2:=300000

        C1         C1 T T     
---------- ---------- - -     
        31         31 A A     
        34         34 A A     
        35         35 A A     
        37         37 A A     
        44         44 A A     
...
    276467     276467 A A     
    276910     276910 A A     
    277750     277750 A A     
    277771     277771 A A     

SQL_ID  bgjafhqwjt1zt, child number 3                
Plan hash value: 487071653    
---------------------------------------------------------------------------------------------------------------------------------------------        
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|        
---------------------------------------------------------------------------------------------------------------------------------------------        
|   0 | SELECT STATEMENT    |      |      1 |        |    300K|00:00:06.67 |   23351 |  15996 |  15996 |       |       |          |         |        
|*  1 |  FILTER             |      |      1 |        |    300K|00:00:06.67 |   23351 |  15996 |  15996 |       |       |          |         |        
|*  2 |   HASH JOIN         |      |      1 |    300K|    300K|00:00:06.37 |   23351 |  15996 |  15996 |    85M|  7366K|   43M (1)|     132K|        
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    300K|    300K|00:00:00.01 |   11128 |      0 |      0 |       |       |          |         |        
|*  4 |    TABLE ACCESS FULL| T2   |      1 |    300K|    300K|00:00:00.01 |   12223 |      0 |      0 |       |       |          |         |        
---------------------------------------------------------------------------------------------------------------------------------------------        

Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                   
   3 - filter(("T1"."C1">=:N1 AND "T1"."C1"<=:N2))   
   4 - filter(("T2"."C1">=:N1 AND "T2"."C1"<=:N2))   

Next, we will move on to the third section of the script to see the output of TKPROF.  The Oracle 11.1.0.6 client home happened to be first in my path before any other Oracle home.  I retrieved the trace file from the server, renamed it, and processed it with TKPROF:

tkprof DIFF_EXPLAIN_PLAN_TEST.trc DIFF_EXPLAIN_PLAN_TEST_TRC.txt

What was in the DIFF_EXPLAIN_PLAN_TEST_TRC.txt file generated by TKPROF?

********************************************************************************
SELECT
  T1.C1,
  T2.C1,
  SUBSTR(T1.C2,1,1) T1_C2,
  SUBSTR(T2.C2,1,1) T2_C2
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN :N1 AND :N2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        7      0.00       0.00          0          0          0           0
Execute      7      0.01       0.03          0          0          0           0
Fetch     4120      3.52      10.91      18216      47844          0      411112
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     4134      3.54      10.95      18216      47844          0      411112

Misses in library cache during parse: 0
Misses in library cache during execute: 4
Optimizer mode: ALL_ROWS
Parsing user id: 518 

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  FILTER  (cr=11 pr=0 pw=0 time=0 us)
      2   NESTED LOOPS  (cr=11 pr=0 pw=0 time=0 us)
      2    NESTED LOOPS  (cr=9 pr=0 pw=0 time=0 us cost=5 size=520 card=1)
      2     TABLE ACCESS BY INDEX ROWID T1 (cr=5 pr=0 pw=0 time=0 us cost=3 size=520 card=2)
      2      INDEX RANGE SCAN SYS_C0030337 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=2)(object id 87187)
      2     INDEX UNIQUE SCAN SYS_C0030338 (cr=4 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 87189)
      2    TABLE ACCESS BY INDEX ROWID T2 (cr=2 pr=0 pw=0 time=0 us cost=1 size=260 card=1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    4120        0.00          0.02
  SQL*Net message from client                  4120        0.01          3.21
  direct path write temp                       1128        0.99          2.06
  direct path read temp                        1128        0.17          5.10
********************************************************************************

Interesting: according to TKPROF, all seven executions with different bind variable values used the same execution plan.  Does that mean that the following command lied?

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Wait a minute, DBMS_XPLAN.DISPLAY_CURSOR is not in the title of this article.  If only I knew how to read a raw 10046 extended SQL trace, I could read this:

STAT #8 id=1 cnt=2 pid=0 pos=1 obj=0 op='FILTER  (cr=11 pr=0 pw=0 time=0 us)'
STAT #8 id=2 cnt=2 pid=1 pos=1 obj=0 op='NESTED LOOPS  (cr=11 pr=0 pw=0 time=0 us)'
STAT #8 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS  (cr=9 pr=0 pw=0 time=0 us cost=5 size=520 card=1)'
STAT #8 id=4 cnt=2 pid=3 pos=1 obj=87264 op='TABLE ACCESS BY INDEX ROWID T1 (cr=5 pr=0 pw=0 time=0 us cost=3 size=520 card=2)'
STAT #8 id=5 cnt=2 pid=4 pos=1 obj=87265 op='INDEX RANGE SCAN SYS_C0030339 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=2)'
STAT #8 id=6 cnt=2 pid=3 pos=2 obj=87267 op='INDEX UNIQUE SCAN SYS_C0030340 (cr=4 pr=0 pw=0 time=0 us cost=0 size=0 card=1)'
STAT #8 id=7 cnt=2 pid=2 pos=2 obj=87266 op='TABLE ACCESS BY INDEX ROWID T2 (cr=2 pr=0 pw=0 time=0 us cost=1 size=260 card=1)'
...
STAT #5 id=1 cnt=300000 pid=0 pos=1 obj=0 op='FILTER  (cr=23515 pr=15705 pw=15705 time=4741368 us)'
STAT #5 id=2 cnt=300000 pid=1 pos=1 obj=0 op='HASH JOIN  (cr=23515 pr=15705 pw=15705 time=4741368 us cost=13774 size=156000000 card=300000)'
STAT #5 id=3 cnt=300000 pid=2 pos=1 obj=87264 op='TABLE ACCESS FULL T1 (cr=11128 pr=0 pw=0 time=31126 us cost=3024 size=78000000 card=300000)'
STAT #5 id=4 cnt=300000 pid=2 pos=2 obj=87266 op='TABLE ACCESS FULL T2 (cr=12387 pr=0 pw=0 time=155851 us cost=3024 size=78000000 card=300000)'

If you know how to read the STAT lines in a 10046 extended SQL trace, you can see that the execution plan definitely changed between the first and the last execution while the 10046 extended SQL trace was enabled.

We might be inclined to do some checking, like this:

SELECT
  CHILD_NUMBER,
  EXECUTIONS,
  ROWS_PROCESSED,
  BUFFER_GETS,
  BIND_SET_HASH_VALUE
FROM
  V$SQL_CS_STATISTICS
WHERE
  SQL_ID='bgjafhqwjt1zt'
ORDER BY
  CHILD_NUMBER;

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED BUFFER_GETS BIND_SET_HASH_VALUE
------------ ---------- -------------- ----------- -------------------
           0          1          12001        1087           722894381
           0          1             25         124          2702357211
           1          1         180000         975           982654583
           2          1        1000000       23219          2772294946
           3          1        3000000       23351          1545127490
           4          1             25          11          2702357211
           5          1            140           8             6258482
           6          1           1400          14          4281096765
           7          1          18000         102           722894381
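
The other adaptive cursor sharing views can also help explain what happened; for example, V$SQL_CS_SELECTIVITY shows the selectivity ranges for which each bind-aware child cursor is considered valid (a sketch, not part of the original test; requires Oracle 11.1 or later):

SELECT
  CHILD_NUMBER,
  PREDICATE,
  RANGE_ID,
  LOW,
  HIGH
FROM
  V$SQL_CS_SELECTIVITY
WHERE
  SQL_ID='bgjafhqwjt1zt'
ORDER BY
  CHILD_NUMBER,
  RANGE_ID;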

Or, we might try doing something like this:

SPOOL 'DIFF_EXPLAIN_PLANS_ADAPT.TXT'

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('bgjafhqwjt1zt',NULL,'TYPICAL'));

SPOOL OFF

PLAN_TABLE_OUTPUT      
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bgjafhqwjt1zt, child number 0                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 4256353061       
-----------------------------------------------------------------------------------------------        
| Id  | Operation                      | Name         | Rows  | Bytes | Cost (%CPU)| Time     |        
-----------------------------------------------------------------------------------------------        
|   0 | SELECT STATEMENT               |              |       |       |     5 (100)|          |        
|*  1 |  FILTER                        |              |       |       |            |          |        
|   2 |   NESTED LOOPS                 |              |       |       |            |          |        
|   3 |    NESTED LOOPS                |              |     1 |   520 |     5   (0)| 00:00:01 |        
|   4 |     TABLE ACCESS BY INDEX ROWID| T1           |     2 |   520 |     3   (0)| 00:00:01 |        
|*  5 |      INDEX RANGE SCAN          | SYS_C0030339 |     2 |       |     2   (0)| 00:00:01 |        
|*  6 |     INDEX UNIQUE SCAN          | SYS_C0030340 |     1 |       |     0   (0)|          |        
|   7 |    TABLE ACCESS BY INDEX ROWID | T2           |     1 |   260 |     1   (0)| 00:00:01 |        
-----------------------------------------------------------------------------------------------        

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   5 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)         
   6 - access("T1"."C1"="T2"."C1")                       
       filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))       

SQL_ID  bgjafhqwjt1zt, child number 1                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 2267210268       
------------------------------------------------------------------------------------------------------ 
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
------------------------------------------------------------------------------------------------------ 
|   0 | SELECT STATEMENT              |              |       |       |       |  1042 (100)|          | 
|*  1 |  FILTER                       |              |       |       |       |            |          | 
|*  2 |   HASH JOIN                   |              |  9999 |  5077K|  2664K|  1042   (1)| 00:00:13 | 
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           | 10000 |  2539K|       |   391   (0)| 00:00:05 | 
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 | 10000 |       |       |    20   (0)| 00:00:01 | 
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           | 10000 |  2539K|       |   391   (0)| 00:00:05 | 
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 | 10000 |       |       |    20   (0)| 00:00:01 | 
------------------------------------------------------------------------------------------------------ 

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                       
   4 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)         
   6 - access("T2"."C1">=:N1 AND "T2"."C1"<=:N2)         

SQL_ID  bgjafhqwjt1zt, child number 2                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 487071653        
------------------------------------------------------------------------------------                   
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |                   
------------------------------------------------------------------------------------                   
|   0 | SELECT STATEMENT    |      |       |       |       |  8623 (100)|          |                   
|*  1 |  FILTER             |      |       |       |       |            |          |                   
|*  2 |   HASH JOIN         |      |   100K|    49M|    25M|  8623   (1)| 00:01:44 |                   
|*  3 |    TABLE ACCESS FULL| T1   |   100K|    24M|       |  3023   (1)| 00:00:37 |                   
|*  4 |    TABLE ACCESS FULL| T2   |   100K|    24M|       |  3023   (1)| 00:00:37 |                   
------------------------------------------------------------------------------------                   

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                       
   3 - filter(("T1"."C1"<=:N2 AND "T1"."C1">=:N1))       
   4 - filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))       

SQL_ID  bgjafhqwjt1zt, child number 3                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 487071653        
------------------------------------------------------------------------------------                   
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |                   
------------------------------------------------------------------------------------                   
|   0 | SELECT STATEMENT    |      |       |       |       | 13774 (100)|          |                   
|*  1 |  FILTER             |      |       |       |       |            |          |                   
|*  2 |   HASH JOIN         |      |   300K|   148M|    77M| 13774   (1)| 00:02:46 |                   
|*  3 |    TABLE ACCESS FULL| T1   |   300K|    74M|       |  3024   (1)| 00:00:37 |                   
|*  4 |    TABLE ACCESS FULL| T2   |   300K|    74M|       |  3024   (1)| 00:00:37 |                   
------------------------------------------------------------------------------------                   

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                       
   3 - filter(("T1"."C1">=:N1 AND "T1"."C1"<=:N2))       
   4 - filter(("T2"."C1">=:N1 AND "T2"."C1"<=:N2))       

SQL_ID  bgjafhqwjt1zt, child number 4                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 4256353061       
-----------------------------------------------------------------------------------------------        
| Id  | Operation                      | Name         | Rows  | Bytes | Cost (%CPU)| Time     |        
-----------------------------------------------------------------------------------------------        
|   0 | SELECT STATEMENT               |              |       |       |     5 (100)|          |        
|*  1 |  FILTER                        |              |       |       |            |          |        
|   2 |   NESTED LOOPS                 |              |       |       |            |          |        
|   3 |    NESTED LOOPS                |              |     1 |   520 |     5   (0)| 00:00:01 |        
|   4 |     TABLE ACCESS BY INDEX ROWID| T1           |     2 |   520 |     3   (0)| 00:00:01 |        
|*  5 |      INDEX RANGE SCAN          | SYS_C0030339 |     2 |       |     2   (0)| 00:00:01 |        
|*  6 |     INDEX UNIQUE SCAN          | SYS_C0030340 |     1 |       |     0   (0)|          |        
|   7 |    TABLE ACCESS BY INDEX ROWID | T2           |     1 |   260 |     1   (0)| 00:00:01 |        
-----------------------------------------------------------------------------------------------        

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   5 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)         
   6 - access("T1"."C1"="T2"."C1")                       
       filter(("T2"."C1"<=:N2 AND "T2"."C1">=:N1))       

SQL_ID  bgjafhqwjt1zt, child number 5                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 2267210268       
----------------------------------------------------------------------------------------------         
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |         
----------------------------------------------------------------------------------------------         
|   0 | SELECT STATEMENT              |              |       |       |     7 (100)|          |         
|*  1 |  FILTER                       |              |       |       |            |          |         
|*  2 |   HASH JOIN                   |              |     9 |  4680 |     7  (15)| 00:00:01 |         
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2600 |     3   (0)| 00:00:01 |         
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |    10 |       |     2   (0)| 00:00:01 |         
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           |    10 |  2600 |     3   (0)| 00:00:01 |         
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |    10 |       |     2   (0)| 00:00:01 |         
----------------------------------------------------------------------------------------------         

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                       
   4 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)         
   6 - access("T2"."C1">=:N1 AND "T2"."C1"<=:N2)         

SQL_ID  bgjafhqwjt1zt, child number 6                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 2267210268       
----------------------------------------------------------------------------------------------         
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |         
----------------------------------------------------------------------------------------------         
|   0 | SELECT STATEMENT              |              |       |       |    13 (100)|          |         
|*  1 |  FILTER                       |              |       |       |            |          |         
|*  2 |   HASH JOIN                   |              |    99 | 51480 |    13   (8)| 00:00:01 |         
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |   100 | 26000 |     6   (0)| 00:00:01 |         
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |   100 |       |     2   (0)| 00:00:01 |         
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           |   100 | 26000 |     6   (0)| 00:00:01 |         
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |   100 |       |     2   (0)| 00:00:01 |         
----------------------------------------------------------------------------------------------         

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                       
   4 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)         
   6 - access("T2"."C1">=:N1 AND "T2"."C1"<=:N2)         

SQL_ID  bgjafhqwjt1zt, child number 7                    
-------------------------------------                    
SELECT   T1.C1,   T2.C1,   SUBSTR(T1.C2,1,1) T1_C2,   SUBSTR(T2.C2,1,1)         
T2_C2 FROM   T1,   T2 WHERE   T1.C1=T2.C1   AND T1.C1 BETWEEN :N1 AND           
:N2                    

Plan hash value: 2267210268       
----------------------------------------------------------------------------------------------         
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |         
----------------------------------------------------------------------------------------------         
|   0 | SELECT STATEMENT              |              |       |       |    83 (100)|          |         
|*  1 |  FILTER                       |              |       |       |            |          |         
|*  2 |   HASH JOIN                   |              |   999 |   507K|    83   (2)| 00:00:01 |         
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |  1000 |   253K|    41   (0)| 00:00:01 |         
|*  4 |     INDEX RANGE SCAN          | SYS_C0030339 |  1000 |       |     3   (0)| 00:00:01 |         
|   5 |    TABLE ACCESS BY INDEX ROWID| T2           |  1000 |   253K|    41   (0)| 00:00:01 |         
|*  6 |     INDEX RANGE SCAN          | SYS_C0030340 |  1000 |       |     3   (0)| 00:00:01 |         
----------------------------------------------------------------------------------------------         

Predicate Information (identified by operation id):      
---------------------------------------------------      
   1 - filter(:N1<=:N2)
   2 - access("T1"."C1"="T2"."C1")                       
   4 - access("T1"."C1">=:N1 AND "T1"."C1"<=:N2)         
   6 - access("T2"."C1">=:N1 AND "T2"."C1"<=:N2)         

You might be wondering whether TKPROF in 11.1.0.7 or 11.2.0.1 lies about the execution plan captured in an extended 10046 SQL trace.  I would tell you, but this blog article is now too long. :-)





The Effects of Potential NULL Values in Row Sources During Updates Using an IN Subquery

10 01 2010

January 10, 2010

A couple of years ago the following question appeared in the comp.databases.oracle.server Usenet group:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/f3c3dea939bf9c12

I’m doing the following update (in Oracle Database 9.2.0.5):

UPDATE table1 t1
SET field1 = NULL
WHERE field2 NOT IN (SELECT /*+ HASH_AJ */ t2.field2 FROM table2 t2)

Instead of doing the ANTIJOIN, the database is performing a FILTER on
table1 by reading table2. Why doesn’t the hint work? For several
tables other than table1 the hint does work.

I did not mention it at the time, but I am not sure if that hint is valid inside the subquery – I think that it needs to appear immediately after the UPDATE keyword.  Taking a guess at the original poster’s problem, I set up a simple test case using Oracle 10.2.0.2:

CREATE TABLE T1 (FIELD1 NUMBER(12), FIELD2 NUMBER(12) NOT NULL);
CREATE TABLE T2 (FIELD1 NUMBER(12), FIELD2 NUMBER(12) NOT NULL);

INSERT INTO
  T1
SELECT
  100,
  ROWNUM*3
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

INSERT INTO
  T2
SELECT
  100,
  ROWNUM*9
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1');
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2');

I enabled a 10053 trace and then executed the following SQL statement:

UPDATE t1
SET field1 = NULL
WHERE field2 NOT IN (SELECT /*+ HASH_AJ */ t2.field2 FROM t2)

The 10053 trace file showed the following plan:

============
Plan Table
============
-----------------------------------------+-----------------------------------+
| Id  | Operation              | Name    | Rows  | Bytes | Cost  | Time      |
-----------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT       |         |       |       |   307 |           |
| 1   |  UPDATE                | T1      |       |       |       |           |
| 2   |   HASH JOIN RIGHT ANTI |         |   208 |  2496 |   307 |  00:00:04 |
| 3   |    TABLE ACCESS FULL   | T2      |   98K |  488K |    44 |  00:00:01 |
| 4   |    TABLE ACCESS FULL   | T1      |   98K |  684K |    86 |  00:00:02 |
-----------------------------------------+-----------------------------------+

That looks like the plan that the OP would like to see.  Now, repeat the test with the following table definitions:

CREATE TABLE T1 (FIELD1 NUMBER(12), FIELD2 NUMBER(12));
CREATE TABLE T2 (FIELD1 NUMBER(12), FIELD2 NUMBER(12));

The 10053 trace file showed the following execution plan when executing the same SQL statement:

============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT     |         |       |       | 3886K |           |
| 1   |  UPDATE              | T1      |       |       |       |           |
| 2   |   FILTER             |         |       |       |       |           |
| 3   |    TABLE ACCESS FULL | T1      |   98K |  684K |    44 |  00:00:01 |
| 4   |    TABLE ACCESS FULL | T2      |     1 |     5 |    45 |  00:00:01 |
---------------------------------------+-----------------------------------+

My last comment in the post suggested a possible cause: even if the columns contain no NULL values, the column definitions still permit NULL values, and that alone might restrict the options available to the optimizer for rewriting the SQL statement into a more efficient form.
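To see why merely allowing NULLs matters, here is a minimal sketch of the NOT IN semantics (the literal values are purely illustrative and not part of the test case): if the subquery is able to return even a single NULL, the NOT IN predicate can never evaluate to TRUE, so the optimizer cannot blindly transform the statement into a simple anti-join.

-- No NULL possible in the subquery: 1<>2 is TRUE, so the row is returned
SELECT 'KEPT' FROM DUAL WHERE 1 NOT IN (SELECT 2 FROM DUAL);

-- A possible NULL in the subquery: 1<>2 AND 1<>NULL evaluates to NULL (not TRUE), so no rows are returned
SELECT 'KEPT' FROM DUAL WHERE 1 NOT IN (
  SELECT 2 FROM DUAL
  UNION ALL
  SELECT TO_NUMBER(NULL) FROM DUAL);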

Let’s try another test case: create two tables with 10,000,000 rows each and see how a similar UPDATE statement performs against a larger data set.  The table creation script follows:

CREATE TABLE T1 (COL1 NUMBER(12), COL2 NUMBER(12));
CREATE TABLE T2 (COL1 NUMBER(12), COL2 NUMBER(12));

INSERT INTO
  T1
SELECT
  100,
  ROWNUM*3
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  T2
SELECT
  100,
  ROWNUM*9
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL)

The test begins on Oracle 11.1.0.7 and later shifts to Oracle 10.2.0.4.  The test script enables a 10053 trace, executes the UPDATE statement, and displays the execution plan with runtime statistics; it then moves the hint and re-executes the UPDATE statement, removes the hint and specifies COL2 IS NOT NULL in the WHERE clause, and finally modifies the table columns to add NOT NULL constraints and tries again with no HASH_AJ hint.  The script follows:

SPOOL ANTIJOIN_TEST11.TXT
SET LINESIZE 150
SET PAGESIZE 2000

UPDATE /*+ GATHER_PLAN_STATISTICS */
  T1
SET
  COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+ HASH_AJ */
      T2.COL2
    FROM
      T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST2';

UPDATE /*+ HASH_AJ GATHER_PLAN_STATISTICS */
  T1
SET
  COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST3';

UPDATE /*+ GATHER_PLAN_STATISTICS */
  T1
SET
  T1.COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2
    WHERE
      T2.COL2 IS NOT NULL)
  AND T1.COL2 IS NOT NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ROLLBACK;

ALTER TABLE T1 MODIFY COL2 NOT NULL;
ALTER TABLE T2 MODIFY COL2 NOT NULL;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST4';

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL,NO_INVALIDATE=>FALSE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',ESTIMATE_PERCENT=>NULL,NO_INVALIDATE=>FALSE)

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANTIJOIN_TEST5';
 
UPDATE /*+ GATHER_PLAN_STATISTICS */
  T1
SET
  COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2);
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));
 
ROLLBACK;

How did we do on Oracle 11.1.0.7?

The first update statement had an execution plan like this:

UPDATE /*+ GATHER_PLAN_STATISTICS */   T1 SET   COL1 = NULL WHERE                                                                          
COL2 NOT IN (     SELECT /*+ HASH_AJ */       T2.COL2     FROM       T2)                                                                   

Plan hash value: 875068713   

--------------------------------------------------------------------------------------------------------------------------------------------------      
| Id  | Operation                | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|      
--------------------------------------------------------------------------------------------------------------------------------------------------      
|   0 | UPDATE STATEMENT         |      |      1 |        |      0 |00:04:07.39 |    6898K|  85474 |  36766 |       |       |          |         |      
|   1 |  UPDATE                  | T1   |      1 |        |      0 |00:04:07.39 |    6898K|  85474 |  36766 |       |       |          |         |      
|*  2 |   HASH JOIN RIGHT ANTI NA|      |      1 |   9999K|   6666K|00:00:30.58 |   38576 |  75353 |  36766 |   196M|    10M|   46M (1)|     298K|      
|   3 |    TABLE ACCESS FULL     | T2   |      1 |     10M|     10M|00:00:10.03 |   19288 |  19278 |      0 |       |       |          |         |      
|   4 |    TABLE ACCESS FULL     | T1   |      1 |     10M|     10M|00:00:00.12 |   19288 |  19278 |      0 |       |       |          |         |      
--------------------------------------------------------------------------------------------------------------------------------------------------      

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL2"="T2"."COL2")                 

According to the DBMS_XPLAN output, even without the table columns declared as NOT NULL, Oracle was able to use a special null-aware form of the anti hash join (the HASH JOIN RIGHT ANTI NA operation) – the query completed in just over 4 minutes.  According to the 10053 trace, only the GATHER_PLAN_STATISTICS hint was recognized.  The final query after transformation follows:

SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."COL2"="T2"."COL2"

Moving the HASH_AJ hint so that it is next to the GATHER_PLAN_STATISTICS hint caused the optimizer to recognize the hint, but the 10053 trace file showed that the hint was not used.  The execution plan was identical to the execution plan above, although the actual time and number of consistent/current mode gets differed slightly.

Specifying the IS NOT NULL restriction in the WHERE clause did change the execution plan to a standard HASH JOIN ANTI:

UPDATE /*+ GATHER_PLAN_STATISTICS */   T1 SET   T1.COL1 = NULL WHERE                                                                       
COL2 NOT IN (     SELECT /*+  */       T2.COL2     FROM       T2                                                                           
WHERE       T2.COL2 IS NOT NULL)   AND T1.COL2 IS NOT NULL                                                                                 

Plan hash value: 2180616727  

-----------------------------------------------------------------------------------------------------------------------------------------------         
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|         
-----------------------------------------------------------------------------------------------------------------------------------------------         
|   0 | UPDATE STATEMENT      |      |      1 |        |      0 |00:04:41.32 |    6926K|  84797 |  36766 |       |       |          |         |         
|   1 |  UPDATE               | T1   |      1 |        |      0 |00:04:41.32 |    6926K|  84797 |  36766 |       |       |          |         |         
|*  2 |   HASH JOIN RIGHT ANTI|      |      1 |   9999K|   6666K|00:00:57.74 |   38576 |  75353 |  36766 |   196M|    10M|   46M (1)|     298K|         
|*  3 |    TABLE ACCESS FULL  | T2   |      1 |     10M|     10M|00:00:00.09 |   19288 |  19278 |      0 |       |       |          |         |         
|*  4 |    TABLE ACCESS FULL  | T1   |      1 |     10M|     10M|00:00:00.12 |   19288 |  19278 |      0 |       |       |          |         |         
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL2"="T2"."COL2")                 
   3 - filter("T2"."COL2" IS NOT NULL)            
   4 - filter("T1"."COL2" IS NOT NULL)

The number of consistent gets increased slightly, and so did the actual run time.

Let’s take a look at the statistics from the plan when the table columns have the NOT NULL constraint:

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |      |      1 |        |      0 |00:04:47.51 |    6930K|  84435 |  36766 |       |       |          |         |
|   1 |  UPDATE               | T1   |      1 |        |      0 |00:04:47.51 |    6930K|  84435 |  36766 |       |       |          |         |
|*  2 |   HASH JOIN RIGHT ANTI|      |      1 |   9999K|   6666K|00:00:29.61 |   38576 |  75353 |  36766 |   196M|    10M|   46M (1)|     298K|
|   3 |    TABLE ACCESS FULL  | T2   |      1 |     10M|     10M|00:00:00.01 |   19288 |  19278 |      0 |       |       |          |         |
|   4 |    TABLE ACCESS FULL  | T1   |      1 |     10M|     10M|00:00:00.12 |   19288 |  19278 |      0 |       |       |          |         |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COL2"="T2"."COL2")      

The same execution plan was used again, but the number of consistent gets and actual time increased even more.  In case you are wondering, the 10053 trace file shows that the final query after transformation is this:

SELECT 0 FROM "TESTUSER"."T2" "T2","TESTUSER"."T1" "T1" WHERE "T1"."COL2"="T2"."COL2"

—-

So, Oracle 11.1.0.7 is able to take advantage of the null-aware "HASH JOIN RIGHT ANTI NA" operation.  Does this help?  Well, let's try the test again on Oracle 10.2.0.4 running on the same computer.  For the first SQL statement, with the hint in the wrong location, the execution plan from the 10053 trace looked something like this:

UPDATE /*+ GATHER_PLAN_STATISTICS */   T1 SET   COL1 = NULL WHERE   COL2 NOT
IN (     SELECT /*+ HASH_AJ */       T2.COL2     FROM       T2)

============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT     |         |       |       |   50G |           |
| 1   |  UPDATE              | T1      |       |       |       |           |
| 2   |   FILTER             |         |       |       |       |           |
| 3   |    TABLE ACCESS FULL | T1      | 9766K |   76M |  5332 |  00:01:04 |
| 4   |    TABLE ACCESS FULL | T2      |     1 |     6 |  5348 |  00:01:05 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter( IS NULL)
4 - filter(LNNVL("T2"."COL2"<>:B1))

Note that the execution plan includes a FILTER operation on ID 2 of the plan – that is what the OP was seeing.  The final query after transformation looked something like this:

SELECT 0 FROM "TESTUSER"."T1" "SYS_ALIAS_1" WHERE  NOT EXISTS (SELECT /*+ HASH_AJ */ 0 FROM "TESTUSER"."T2" "T2" WHERE LNNVL("T2"."COL2"<>"SYS_ALIAS_1"."COL2"))

So, what did the execution plan look like with the statistics?  I don’t know – I killed it after a couple of hours.  When I killed it, the SQL statement had processed 114,905,284 consistent or current mode gets and burned through 9,763.26 seconds of CPU time.

That just means we need a more powerful server, right?  OK, let’s try another experiment on a more powerful server.  This time, we will enable a 10046 trace at level 8, run the query with the IS NOT NULL conditions in the WHERE clause, and gradually decrease the OPTIMIZER_FEATURES_ENABLE parameter from 11.1.0.7 to 10.2.0.1 to 9.2.0 to 8.1.7.
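The per-pass session setup might look something like the following sketch (the TRACEFILE_IDENTIFIER value is made up, and the OPTIMIZER_FEATURES_ENABLE value changes for each pass):

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE='9.2.0';
ALTER SESSION SET TRACEFILE_IDENTIFIER='ANTIJOIN_OFE_920';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

-- (execute the UPDATE with the IS NOT NULL conditions here, then ROLLBACK)

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';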

11.1.0.7:

Cursor 7   Ver 1   Parse at 0.000000  Similar Cnt 1
|PARSEs       1|CPU S    0.000000|CLOCK S    0.000000|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      1|
|EXECs        1|CPU S   49.077915|CLOCK S   64.402548|ROWs  6666667|PHY RD BLKs     27342|CON RD BLKs (Mem)     39170|CUR RD BLKs (Mem)   6816007|SHARED POOL MISs      0|
|FETCHs       0|CPU S    0.000000|CLOCK S    0.000000|ROWs        0|PHY RD BLKs         0|CON RD BLKs (Mem)         0|CUR RD BLKs (Mem)         0|SHARED POOL MISs      0|

UPDATE
  T1
SET
  T1.COL1 = NULL
WHERE
  COL2 NOT IN (
    SELECT /*+  */
      T2.COL2
    FROM
      T2
    WHERE
      T2.COL2 IS NOT NULL)
  AND T1.COL2 IS NOT NULL

       (Rows 0)   UPDATE  T1 (cr=39170 pr=27342 pw=27311 time=0 us)
 (Rows 6666667)    HASH JOIN RIGHT ANTI (cr=38576 pr=27342 pw=27311 time=93577 us cost=28657 size=28 card=2)
(Rows 10000000)     TABLE ACCESS FULL T2 (cr=19288 pr=0 pw=0 time=0 us cost=5290 size=60000000 card=10000000)
(Rows 10000000)     TABLE ACCESS FULL T1 (cr=19288 pr=0 pw=0 time=15717 us cost=5290 size=80000000 card=10000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path write temp                        881        0.14          1.60
  direct path read temp                         882        0.40         12.03
  latch: checkpoint queue latch                   4        0.00          0.00
  log file switch completion                      8        0.54          1.01
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00

64.4 seconds of total run time with 49.1 seconds of CPU time – I guess that is a bit faster.  Quite a few consistent gets and current mode gets.

10.2.0.1 and 9.2.0 generated the same execution plan as the above, so the statistics should also be about the same.

8.1.7:

Well, we ran into a problem with this one – I killed it after a bit over an hour.  The execution plan (TYPICAL format) looked like this:

SQL_ID  11apc6grf5fph, child number 0
-------------------------------------
UPDATE   T1 SET   T1.COL1 = NULL WHERE   COL2 NOT IN (     SELECT /*+ 
*/       T2.COL2     FROM       T2     WHERE       T2.COL2 IS NOT NULL)
  AND T1.COL2 IS NOT NULL

Plan hash value: 3288325718

---------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  | Inst   |
---------------------------------------------------------------------
|   0 | UPDATE STATEMENT    |      |       |       |  2926 |        |
|   1 |  UPDATE             | T1   |       |       |       |   OR11 |
|*  2 |   FILTER            |      |       |       |       |        |
|*  3 |    TABLE ACCESS FULL| T1   |   500K|  3906K|  2926 |   OR11 |
|*  4 |    TABLE ACCESS FULL| T2   |     1 |     6 |  2926 |   OR11 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( IS NULL)
   3 - filter("T1"."COL2" IS NOT NULL)
   4 - filter(("T2"."COL2"=:B1 AND "T2"."COL2" IS NOT NULL))

That darn FILTER on ID 2 is back again, like what the OP saw.  What were the execution statistics at that point?  4853.41 seconds of CPU time and 212,520,861 consistent or current mode gets.  I know, let’s throw some more hardware at it, because that is the cheap and completely safe, risk-free solution, rather than doing a root cause analysis of the problem (that has to be true, I read it on the Internet – more than once :-) ).
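(Incidentally, in-flight figures like the CPU seconds and consistent/current mode gets quoted above can be sampled while the statement is still running with a query something like the following sketch against V$SQL; this is just one possible approach, using the SQL_ID shown in the plan, and V$SESSTAT for the executing session is another option.)

SELECT
  EXECUTIONS,
  BUFFER_GETS,
  ROUND(CPU_TIME/1000000,2) CPU_SECONDS,
  ROUND(ELAPSED_TIME/1000000,2) ELAPSED_SECONDS
FROM
  V$SQL
WHERE
  SQL_ID='11apc6grf5fph';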





DATE Datatype Or NUMBER Datatype – Which Should be Used?

6 01 2010

January 6, 2010

In a recent discussion thread on the Oracle forums, the following question was asked:
http://forums.oracle.com/forums/thread.jspa?threadID=1007653

I have a scenario where I need to store the data in the format YYYYMM (e.g. 201001 which means January, 2010).  I am trying to evaluate what is the most appropriate datatype to store this kind of data. I am comparing 2 options, NUMBER and DATE.  As the data is essentially a component of the Oracle date datatype, and experts like Tom Kyte have proved (with examples) that using the right datatype is better for the optimizer, I was expecting that using the DATE datatype will yield (at least) similar (if not better) cardinality estimates than using the NUMBER datatype. However, my tests show that when using DATE the cardinality estimates are way off from actuals, whereas using NUMBER the cardinality estimates are much closer to actuals.
My questions are:
1) What should be the most appropriate datatype used to store YYYYMM data?
2) Why does using DATE datatype yield estimates that are way off from actuals than using NUMBER datatype?

Test case (update Jan 7, 2010: there was a copy-paste error in the line for collecting statistics on table B – the original version of the script posted here collected statistics on table A twice):

create table a nologging as select to_number(to_char(add_months(to_date('200101','YYYYMM'),level - 1), 'YYYYMM')) id from dual connect by level <= 289;

create table b (id number) ;

begin
  for i in 1..8192
  loop
    insert into b select * from a ;
  end loop;
  commit;
end;
/ 

alter table a add dt date;

alter table b add dt date;

update a set dt = to_date(id, 'YYYYMM');

update b set dt = to_date(id, 'YYYYMM');

commit;

exec dbms_stats.gather_table_stats(user, 'A', estimate_percent=>NULL);

exec dbms_stats.gather_table_stats(user, 'B', estimate_percent=>NULL);

explain plan for select count(*) from b where id between 200810 and 200903;
select * from table(dbms_xplan.display);

explain plan for select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');
select * from table(dbms_xplan.display);

This is an interesting problem: why would using the NUMBER datatype yield better cardinality estimates than using the DATE datatype?  When the NUMBER datatype was used, the optimizer predicted that the full table scan operation would return 46,604 rows, while it predicted that the full table scan would return 5,919 rows when the DATE datatype was used – the actual number of rows returned is 49,152.

The person who posted the above test case later stated that he believes the DATE datatype is the correct choice, but that he would have a difficult time justifying that opinion when confronted by someone suggesting the use of the NUMBER datatype.

I posted the test results from my run with Oracle 11.1.0.7:

SQL> set autotrace traceonly explain
SQL> select count(*) from b where id between 200810 and 200903 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |  4715   (1)| 00:00:57 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| B    |   108K|   527K|  4715   (1)| 00:00:57 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<=200903 AND "ID">=200810)

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |  4718   (2)| 00:00:57 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 57166 |   446K|  4718   (2)| 00:00:57 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

SQL> set autotrace off
SQL> select count(*) from b where id between 200810 and 200903 ;

  COUNT(*)
----------
     49152

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

  COUNT(*)
----------
     49152 

Well, it seems that Oracle 11.1.0.7 predicted that when the NUMBER datatype was used, the full table scan would return roughly 108,000 rows.  Oracle 11.1.0.7 predicted that when the DATE datatype was used, the full table scan would return 57,166 rows – significantly closer to the actual number of 49,152.  If there were an index on that column, how would the different cardinality estimates affect the possibility that the optimizer might choose to use that index rather than a full table scan?  What if the data volume were increased by a factor of, say, 1,000 or 1,000,000?

I also captured a 10053 trace during the test run, and found this in the trace file:

******************************************
----- Current SQL Statement for this session (sql_id=7uk18xj0z9uxf) -----
select count(*) from b where id between 200810 and 200903
*******************************************
...
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for B[B]

  Table: B  Alias: B
    Card: Original: 2367488.000000  Rounded: 108124  Computed: 108124.16  Non Adjusted: 108124.16
  Access Path: TableScan
    Cost:  4714.53  Resp: 4714.53  Degree: 0
      Cost_io: 4670.00  Cost_cpu: 636216240
      Resp_io: 4670.00  Resp_cpu: 636216240
  Best:: AccessPath: TableScan
         Cost: 4714.53  Degree: 1  Resp: 4714.53  Card: 108124.16  Bytes: 0

***************************************
...
...
...
******************************************
----- Current SQL Statement for this session (sql_id=2ac0k15zjdg5x) -----
select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM')
*******************************************
...

***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for B[B]

  Table: B  Alias: B
    Card: Original: 2367488.000000  Rounded: 57166  Computed: 57165.51  Non Adjusted: 57165.51
  Access Path: TableScan
    Cost:  4717.89  Resp: 4717.89  Degree: 0
      Cost_io: 4670.00  Cost_cpu: 684264079
      Resp_io: 4670.00  Resp_cpu: 684264079
  Best:: AccessPath: TableScan
         Cost: 4717.89  Degree: 1  Resp: 4717.89  Card: 57165.51  Bytes: 0

***************************************

Notice in the above that Oracle’s statistics gathering process did not create histograms when I collected statistics for the tables.  The calculated cost is nearly the same for either datatype, but what would happen if that table were then joined to another table?  Is the optimizer seeing histograms in some of the original poster’s test cases?
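One quick way for the original poster to check that last question would be to look at the HISTOGRAM column for table B in the data dictionary (a sketch; any of the *_TAB_COL_STATISTICS views could be used):

SELECT
  COLUMN_NAME,
  NUM_DISTINCT,
  NUM_BUCKETS,
  HISTOGRAM
FROM
  USER_TAB_COL_STATISTICS
WHERE
  TABLE_NAME='B';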

The original poster is running Oracle Database 10.2.0.1, so I ran a test on Oracle 10.2.0.2 with OPTIMIZER_FEATURES_ENABLE set to 10.2.0.1, using the data created by the OP’s data creation script:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DateTest';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

select count(*) from b where id between 200810 and 200903 ;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

ALTER SESSION SET EVENTS '10053 trace name context off';

set autotrace traceonly explain

select count(*) from b where id between 200810 and 200903 ;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

set autotrace off

select count(*) from b where id between 200810 and 200903;

The output from the above script, when run on Oracle 10.2.0.2 follows:

SQL> select count(*) from b where id between 200810 and 200903 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |   919  (11)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| B    |   108K|   527K|   919  (11)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<=200903 AND "ID">=200810)

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |   926  (12)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 57166 |   446K|   926  (12)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

SQL> set autotrace off

SQL> select count(*) from b where id between 200810 and 200903 ;

  COUNT(*)
----------
     49152 

The estimated cardinalities appear to be identical to those of Oracle 11.1.0.7, so why was the original poster seeing different cardinality estimates?  Here is the output from the 10053 trace file:

******************************************
Current SQL statement for this session:
select count(*) from b where id between 200810 and 200903
*******************************************
...
  PARAMETERS WITH ALTERED VALUES
  ******************************
  optimizer_features_enable           = 10.2.0.1
  *********************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#1): ID(NUMBER)
    AvgLen: 5.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 200101 Max: 202501
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 108124  Computed: 108124.16  Non Adjusted: 108124.16
  Access Path: TableScan
    Cost:  918.67  Resp: 918.67  Degree: 0
      Cost_io: 819.00  Cost_cpu: 632570063
      Resp_io: 819.00  Resp_cpu: 632570063
  Best:: AccessPath: TableScan
         Cost: 918.67  Degree: 1  Resp: 918.67  Card: 108124.16  Bytes: 0
***************************************
...
...
...
******************************************
Current SQL statement for this session:
select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM')
*******************************************
...
  *************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
  optimizer_features_enable           = 10.2.0.1
  *********************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): DT(DATE)
    AvgLen: 8.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 2451911 Max: 2460677
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 57166  Computed: 57165.51  Non Adjusted: 57165.51
  Access Path: TableScan
    Cost:  926.24  Resp: 926.24  Degree: 0
      Cost_io: 819.00  Cost_cpu: 680617902
      Resp_io: 819.00  Resp_cpu: 680617902
  Best:: AccessPath: TableScan
         Cost: 926.24  Degree: 1  Resp: 926.24  Card: 57165.51  Bytes: 0

Notice that, based on the 10053 trace file, no histograms were collected for either column.
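As a cross-check, the commonly described range selectivity formula for a column with no histogram, selectivity = (predicate high value - predicate low value) / (column max - column min) + 2/NDV for a closed BETWEEN range, reproduces both estimates above when fed the column statistics from the trace.  The following sketch simply performs that arithmetic (the DATE version works in days):

SELECT
  ROUND(((200903-200810)/(202501-200101) + 2/289) * 2367488) NUMBER_ESTIMATE,
  ROUND(((TO_DATE('200903','YYYYMM')-TO_DATE('200810','YYYYMM'))
        /(TO_DATE('202501','YYYYMM')-TO_DATE('200101','YYYYMM')) + 2/289) * 2367488) DATE_ESTIMATE
FROM
  DUAL;

-- NUMBER_ESTIMATE is roughly 108,124 and DATE_ESTIMATE is roughly 57,166, matching the trace

This also hints at why the two estimates differ: the numeric range 200810 to 200903 covers 93 of the 2,400 units between the column minimum and maximum (the jump from 200812 to 200901 alone accounts for 89 of them), even though only 6 of the 289 month values actually fall inside the range, while the DATE version measures the same range in real days.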

Now for a second test.  This time we will instruct Oracle to create histograms, and also force the optimizer to hard parse the SQL statements that reference table B when those SQL statements are re-executed:

exec dbms_stats.gather_table_stats(user, 'B', estimate_percent=>NULL,method_opt=>'FOR ALL COLUMNS SIZE 254',no_invalidate=>false);

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DateTest2';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

select count(*) from b where id between 200810 and 200903 ;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');

ALTER SESSION SET EVENTS '10053 trace name context off';

set autotrace traceonly explain

select count(*) from b where id between 200810 and 200903;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');

set autotrace off

So, what is the output of the above?

SQL> select count(*) from b where id between 200810 and 200903;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |   919  (11)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 46604 |   227K|   919  (11)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<=200903 AND "ID">=200810)

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |   926  (12)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 46604 |   364K|   926  (12)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss')) 

Interesting: both queries now estimate that the full table scan operation will return 46,604 rows.  That cardinality estimate exactly matches the estimate in the OP’s plan for the SQL statement that accessed the NUMBER datatype…

For fun, let’s look in the 10053 trace file:

******************************************
Current SQL statement for this session:
select count(*) from b where id between 200810 and 200903
*******************************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#1): ID(NUMBER)
    AvgLen: 5.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 200101 Max: 202501
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 46604  Computed: 46604.09  Non Adjusted: 46604.09
  Access Path: TableScan
    Cost:  918.76  Resp: 918.76  Degree: 0
      Cost_io: 819.00  Cost_cpu: 633149246
      Resp_io: 819.00  Resp_cpu: 633149246
  Best:: AccessPath: TableScan
         Cost: 918.76  Degree: 1  Resp: 918.76  Card: 46604.09  Bytes: 0
...
...
...
******************************************
Current SQL statement for this session:
select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM')
*******************************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): DT(DATE)
    AvgLen: 8.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 2451911 Max: 2460677
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 46604  Computed: 46604.09  Non Adjusted: 46604.09
  Access Path: TableScan
    Cost:  926.22  Resp: 926.22  Degree: 0
      Cost_io: 819.00  Cost_cpu: 680499006
      Resp_io: 819.00  Resp_cpu: 680499006
  Best:: AccessPath: TableScan
         Cost: 926.22  Degree: 1  Resp: 926.22  Card: 46604.09  Bytes: 0

The 10053 trace file shows that in both cases a height-balanced histogram with 254 buckets was created.  But how accurate would the estimate be if there were 1,000 or 1,000,000 times as many rows?  What if the time interval were changed to something else?  What if each of the 289 distinct values for the ID and DT columns did not have an equal distribution of values?

So, why select a DATE datatype rather than a NUMBER datatype?  These are the reasons that I proposed in the discussion thread:

One of the problems with putting date values in number columns is this – if you select the range from 200810 to 200903, the optimizer will likely make the assumption that 200810 is just as likely of a number as 200808, 200812, 200814, 200816, 200820, 200890, 200900, etc. Some of those year/month combinations are simply not possible. In such a case, the optimizer should over-estimate the number of rows returned from that range when the column data type is NUMBER, and should be reasonably close when the column data type is DATE, since the optimizer knows that 200814 (14/1/2008), 200816 (16/1/2008), 200820 (20/1/2008), 200890 (90/1/2008), 200900 (0/1/2009), etc. could never be dates (and would be completely out of the serial sequence of dates). By putting the date type data into a DATE column, you have essentially added a constraint to the database to prevent invalid dates from being added. Additionally, date math, such as finding the number of days between 200802 and 200803 (compared to 200702 and 200703) is very simple – the answer is not 1 in both cases, but rather 29 and 28, respectively.
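The date math point is easy to demonstrate (a quick sketch using the example values from the paragraph above):

SELECT
  TO_DATE('200803','YYYYMM') - TO_DATE('200802','YYYYMM') DAYS_2008,
  TO_DATE('200703','YYYYMM') - TO_DATE('200702','YYYYMM') DAYS_2007,
  200803 - 200802 NUMBER_DIFFERENCE
FROM
  DUAL;

-- DAYS_2008 = 29 (2008 is a leap year), DAYS_2007 = 28, NUMBER_DIFFERENCE = 1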

Any other comments?

OK, enough guessing; let’s try a couple of tests.  Here is the test table, containing 10,000,000 rows with an uneven distribution of rows for each value:

DROP TABLE B PURGE;

CREATE TABLE B AS
SELECT
  TO_NUMBER(TO_CHAR(TRUNC(TO_DATE('200101','YYYYDD')+SQRT(ROWNUM),'MM'),'YYYYMM')) ID,
  TRUNC(TO_DATE('200101','YYYYDD')+SQRT(ROWNUM),'MM') DT
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V2;

CREATE INDEX IND_B_ID ON B(ID);
CREATE INDEX IND_B_DT ON B(DT);

SET LINESIZE 130
SET PAGESIZE 1000
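Before moving on to the test scripts, a quick aggregate (not part of the original test, just a sanity check of the data generation) confirms the uneven distribution; because the number of ROWNUM values that map to a given day grows with the day offset under the SQRT transformation, the later months receive far more rows than the early months:

SELECT
  ID,
  COUNT(*) ROW_COUNT
FROM
  B
GROUP BY
  ID
ORDER BY
  ID;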

The first test script gathers statistics with no histograms, then supplies various year-month ranges while capturing execution statistics and displaying the actual execution plans:

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'B',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',NO_INVALIDATE=>FALSE);

SPOOL HISTOGRAMTEST.TXT

SELECT COUNT(*) FROM B;

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200810 AND 200903;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200810,'YYYYMM') AND TO_DATE(200903,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200110 AND 200203;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200110,'YYYYMM') AND TO_DATE(200203,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200210 AND 200303;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200210,'YYYYMM') AND TO_DATE(200303,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200812 AND 200901;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200812,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200112 AND 200201;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200112,'YYYYMM') AND TO_DATE(200201,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200612 AND 200901;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SPOOL OFF

The trimmed output (from Oracle 11.1.0.7) of the first date range follows:

SQL> SELECT COUNT(*) FROM B;

  COUNT(*)                                                                                                                       
----------                                                                                                                       
  10000000                                                                                                                       

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200810 AND 200903;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.06 |    2371 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.06 |    2371 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1310K|   1063K|00:00:00.01 |    2371 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200810 AND "ID"<=200903)                                                                                     

SQL>                                                                                                                                   
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200810,'YYYYMM') AND TO_DATE(200903,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.06 |    2816 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.06 |    2816 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    674K|   1063K|00:00:00.01 |    2816 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                               

In the above, the estimated number of rows with the comparison on the numeric column is about 250,000 rows above the actual, while the estimate with the comparison on the date column is about 400,000 rows below the actual – could this be enough of a difference to change the execution plan if the clustering factor of the indexes were high?  What if the tables had a larger average row length?  What if this table were joined with another table?  Note that the number of logical blocks accessed is lower with the index on the numeric column.

The trimmed output of the second date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200110 AND 200203;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     299 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     299 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1344K|    132K|00:00:00.01 |     299 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200110 AND "ID"<=200203)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200110,'YYYYMM') AND TO_DATE(200203,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     353 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     353 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    674K|    132K|00:00:00.01 |     353 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2001-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2002-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                               

In the above, the estimated number of rows with the comparison on the numeric column is about 10 times as high as the actual number of rows, while the estimated number of rows with the comparison on the date column is about 5 times as high.  Could this be enough to trigger a different execution plan for the queries – where one uses an index access, while the other uses a full table scan?

The trimmed output of the third date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200210 AND 200303;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     594 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     594 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1344K|    265K|00:00:00.01 |     594 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200210 AND "ID"<=200303)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200210,'YYYYMM') AND TO_DATE(200303,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     705 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     705 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    674K|    265K|00:00:00.01 |     705 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2002-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2003-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                               

In this case the estimates are about the same as the previous test, but the actual number of rows has doubled.  The optimizer’s estimates are again in favor of the date datatype.

The trimmed output of the fourth date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200812 AND 200901;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     810 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     810 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1285K|    362K|00:00:00.01 |     810 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200812 AND "ID"<=200901)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200812,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     962 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     962 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    291K|    362K|00:00:00.01 |     962 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2008-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                               

This time, the optimizer’s estimate when the date datatype was used is very close, while the optimizer’s estimate when the numeric datatype was used is roughly 3.5 times the actual number of rows.

The trimmed output of the fifth date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200112 AND 200201;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     104 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     104 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1295K|  45260 |00:00:00.01 |     104 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200112 AND "ID"<=200201)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200112,'YYYYMM') AND TO_DATE(200201,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     122 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     122 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    291K|  45260 |00:00:00.01 |     122 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2001-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                               

This time with the numeric datatype the optimizer estimates 1,295,000 rows when in fact only 45,260 are returned by the index range scan.  The estimate with the date datatype (291K rows) is also far too high, but it is roughly 4.5 times lower, and therefore closer to the actual row count, than the estimate for the numeric datatype.

Now, the final trimmed output:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200612 AND 200901;

--------------------------------------------------------------------------------------------                                     
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                     
--------------------------------------------------------------------------------------------                                     
|   0 | SELECT STATEMENT      |          |      1 |        |      1 |00:00:00.78 |   22345 |                                     
|   1 |  SORT AGGREGATE       |          |      1 |      1 |      1 |00:00:00.78 |   22345 |                                     
|*  2 |   INDEX FAST FULL SCAN| IND_B_ID |      1 |   3764K|   4054K|00:00:00.22 |   22345 |                                     
--------------------------------------------------------------------------------------------                                     

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - filter(("ID">=200612 AND "ID"<=200901))                                                                                   

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

-------------------------------------------------------------------------------------                                            
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                            
-------------------------------------------------------------------------------------                                            
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.47 |   24950 |                                            
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.47 |   24950 |                                            
|*  2 |   TABLE ACCESS FULL| B    |      1 |   2623K|   4054K|00:00:00.12 |   24950 |                                            
-------------------------------------------------------------------------------------                                            

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - filter(("DT">=TO_DATE(' 2006-12-01 00:00:00', 'syyyy-mm-dd                                                                
              hh24:mi:ss') AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd                                                
              hh24:mi:ss'))) 

Here, with the wider date range, the optimizer’s estimate is closer with the numeric datatype, and it selected an index fast full scan, which would use multi-block reads if disk accesses were required.  Note that the optimizer selected a full table scan when the date datatype was used, even though the estimated number of rows was lower.  Note too that this in-memory full table scan completed in about 0.47 seconds, noticeably faster than the 0.78 seconds required by the in-memory index fast full scan.
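Out of curiosity, the optimizer’s choice of the full table scan for the date predicate could be compared with a hinted index fast full scan of IND_B_DT.  The following is just a sketch of such a comparison (the INDEX_FFS hint and the DBMS_XPLAN.DISPLAY_CURSOR call are standard features, but this test was not part of the original script):

-- force an index fast full scan of the date index for comparison
SELECT /*+ GATHER_PLAN_STATISTICS INDEX_FFS(B IND_B_DT) */
  COUNT(*)
FROM
  B
WHERE
  DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

-- display the actual execution statistics for the statement just executed
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

Comparing the A-Time and Buffers columns of the two plans would show whether the optimizer’s full table scan decision was the right one on this system.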

Now let’s take a look at what happens when a histogram is present on each of the columns.  The script is identical to the previous script, except for the first two lines:

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'B',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254',NO_INVALIDATE=>FALSE);

SPOOL HISTOGRAMTEST2.TXT

Below are the results from this test run:

SQL> SELECT COUNT(*) FROM B;

  COUNT(*)     
----------     
  10000000     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200810 AND 200903;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.09 |    2371 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.09 |    2371 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1083K|   1063K|00:00:00.01 |    2371 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200810 AND "ID"<=200903)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200810,'YYYYMM') AND TO_DATE(200903,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.09 |    2816 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.09 |    2816 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |   1083K|   1063K|00:00:00.01 |    2816 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200110 AND 200203;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     299 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     299 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |    130K|    132K|00:00:00.01 |     299 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200110 AND "ID"<=200203)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200110,'YYYYMM') AND TO_DATE(200203,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     353 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     353 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    130K|    132K|00:00:00.01 |     353 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2001-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2002-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200210 AND 200303;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     594 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     594 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |    284K|    265K|00:00:00.01 |     594 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200210 AND "ID"<=200303)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200210,'YYYYMM') AND TO_DATE(200303,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     705 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     705 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    284K|    265K|00:00:00.01 |     705 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2002-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2003-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200812 AND 200901;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     810 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     810 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |    383K|    362K|00:00:00.01 |     810 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200812 AND "ID"<=200901)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200812,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     962 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     962 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    383K|    362K|00:00:00.01 |     962 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2008-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200112 AND 200201;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     104 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     104 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |  39391 |  45260 |00:00:00.01 |     104 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200112 AND "ID"<=200201)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200112,'YYYYMM') AND TO_DATE(200201,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     122 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     122 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |  39391 |  45260 |00:00:00.01 |     122 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2001-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200612 AND 200901;

--------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |      1 |00:00:00.75 |   22345 |
|   1 |  SORT AGGREGATE       |          |      1 |      1 |      1 |00:00:00.75 |   22345 |
|*  2 |   INDEX FAST FULL SCAN| IND_B_ID |      1 |   4053K|   4054K|00:00:00.22 |   22345 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("ID">=200612 AND "ID"<=200901))       

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

-------------------------------------------------------------------------------------      
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |      
-------------------------------------------------------------------------------------      
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.47 |   24950 |      
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.47 |   24950 |      
|*  2 |   TABLE ACCESS FULL| B    |      1 |   4053K|   4054K|00:00:00.12 |   24950 |      
-------------------------------------------------------------------------------------      

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("DT">=TO_DATE(' 2006-12-01 00:00:00', 'syyyy-mm-dd                          
              hh24:mi:ss') AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd          
              hh24:mi:ss')))                         

As the above output indicates, with a 254 bucket histogram on each of the columns the optimizer calculates estimated row counts that are typically very close to the actual row counts for both datatypes; essentially the only remaining difference is the number of logical reads.  So, adding the histogram helps, but what happens if the OP follows good coding practice and uses bind variables rather than constants (literals)?
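A minimal sketch of how one of the above tests might be repeated with bind variables in SQL*Plus follows (the VARIABLE and EXEC commands and the DBMS_XPLAN.DISPLAY_CURSOR call are standard, but this bind variable test was not part of the original script):

VARIABLE START_ID NUMBER
VARIABLE END_ID NUMBER

EXEC :START_ID := 200812
EXEC :END_ID := 200901

-- the literals are replaced by bind variables, so the optimizer must rely on bind peeking
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  B
WHERE
  ID BETWEEN :START_ID AND :END_ID;

-- display the estimated and actual row counts for the execution
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

With bind variables, the first hard parse peeks at the supplied values, and later executions of the shared cursor may reuse a plan and cardinality estimate that were produced for a very different date range, so the benefit of the histogram is no longer guaranteed.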

So, which datatype would you choose, and why?





Eliminate Rows Having a Letter and Number Combination

3 01 2010

January 3, 2010

(Back to the Previous Post in the Series)

In a recent message thread on the comp.databases.oracle.misc group, the following question was asked:

I am currently using Oracle9i Enterprise Edition Release 9.2.0.4.0. I have a table with the following data

Table 1 (Sample data)

a12345
A123423
g13452
G452323
h34423
r34323
b23232
n232323

I am currently using this as a subquery in one of my queries. As per a new request I now have to exclude all values which start with h, b or n followed by numeric values. So the end result the subquery should give me is

Table 1 (Sample data)

a12345
A123423
g13452
G452323
r34323

I am a little stumped on this for now. I could not get it right in my query. Can anyone please advise here? Let me know if any more information is needed from my side.

Note: The starting character in all values can sometimes be in “lower case” and sometimes in “upper case”.

Interesting problem, although it would have been helpful had the OP provided the DDL and DML to create the test case.  Let’s see if there is a hard way to solve this problem:

CREATE TABLE T10(HOMEWORK VARCHAR2(20));

INSERT INTO T10 VALUES ('a12345');
INSERT INTO T10 VALUES ('A123423');
INSERT INTO T10 VALUES ('g13452');
INSERT INTO T10 VALUES ('G452323');
INSERT INTO T10 VALUES ('h34423');
INSERT INTO T10 VALUES ('r34323');
INSERT INTO T10 VALUES ('b23232');
INSERT INTO T10 VALUES ('n232323');
INSERT INTO T10 VALUES ('NB151517');
INSERT INTO T10 VALUES ('C0151517');
INSERT INTO T10 VALUES ('f9151517');
INSERT INTO T10 VALUES ('HE4423');

COMMIT;

Note that I added a couple of extra rows just for fun (actually to help with testing).

Let’s look at the ASCII values of the first and second characters:

SELECT
  HOMEWORK,
  ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
  ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2
FROM
  T10;

HOMEWORK     ASC_VAL1   ASC_VAL2
---------- ---------- ----------
a12345             97         49
A123423            65         49
g13452            103         49
G452323            71         52
h34423            104         51
r34323            114         51
b23232             98         50
n232323           110         50
NB151517           78         66
C0151517           67         48
f9151517          102         57
HE4423             72         69

OK, I see the ones that we want to exclude; let’s build a matrix:

SELECT
  HOMEWORK,
  ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
  ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2,
  DECODE(ASCII(SUBSTR(HOMEWORK,1,1)),104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1,
  DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-47),1,DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-58),-1,1,0),0) IS_EXC2
FROM
  T10;

HOMEWORK     ASC_VAL1   ASC_VAL2    IS_EXC1    IS_EXC2
---------- ---------- ---------- ---------- ----------
a12345             97         49          0          1
A123423            65         49          0          1
g13452            103         49          0          1
G452323            71         52          0          1
h34423            104         51          1          1
r34323            114         51          0          1
b23232             98         50          1          1
n232323           110         50          1          1
NB151517           78         66          1          0
C0151517           67         48          0          1
f9151517          102         57          0          1
HE4423             72         69          1          0

If there is a 1 in both of the right-most columns, then the row should be eliminated.  What is the easiest way to tell if there is a 1 in both columns?  Multiply the column values together, and if we receive a product of 1 then the row should be excluded:

SELECT
  *
FROM
  (SELECT
    HOMEWORK,
    ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
    ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2,
    DECODE(ASCII(SUBSTR(HOMEWORK,1,1)), 104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1,
    DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-47),1,DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-58),-1,1,0),0) IS_EXC2
  FROM
    T10)
WHERE
  IS_EXC1*IS_EXC2<>1;

HOMEWORK     ASC_VAL1   ASC_VAL2    IS_EXC1    IS_EXC2
---------- ---------- ---------- ---------- ----------
a12345             97         49          0          1
A123423            65         49          0          1
g13452            103         49          0          1
G452323            71         52          0          1
r34323            114         51          0          1
NB151517           78         66          1          0
C0151517           67         48          0          1
f9151517          102         57          0          1
HE4423             72         69          1          0

An explanation of the IS_EXC2 column follows:

The numbers 0 through 9 have ASCII values ranging from 48 to 57.

  1. Obtain the second character in the column: SUBSTR(HOMEWORK,2,1)
  2. Use the ASCII function to find the ASCII value of that second character
  3. Subtract 47 from the ASCII value; a SIGN result of 1 means the difference is positive, so the ASCII value is at least 48
  4. In that case, subtract 58 from the same ASCII value
  5. A SIGN result of -1 for that second difference means the ASCII value is at most 57, so the value falls between 48 and 57 and the second character must be a number: return 1; in every other case return 0
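To see those steps in isolation, the intermediate SIGN values can be displayed for a digit and for a letter (a quick check against DUAL, not part of the original post):

SELECT
  C,
  ASCII(C) ASC_VAL,
  SIGN(ASCII(C)-47) SIGN_47,   -- 1 when the ASCII value is 48 or greater
  SIGN(ASCII(C)-58) SIGN_58,   -- -1 when the ASCII value is 57 or less
  DECODE(SIGN(ASCII(C)-47),1,DECODE(SIGN(ASCII(C)-58),-1,1,0),0) IS_EXC2
FROM
  (SELECT '3' C FROM DUAL UNION ALL
   SELECT 'E' FROM DUAL);

For ‘3’ (ASCII 51) both tests pass and the expression returns 1; for ‘E’ (ASCII 69) the second SIGN returns 1 rather than -1, so the expression returns 0.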

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko offered the following solution to the problem, which uses the RTRIM function:

Just to mention another approach regarding your question:

SQL> with t as (
   2   select 'a12345' c from dual  union all
   3   select 'A123423' from dual  union all
   4   select 'g13452' from dual  union all
   5   select 'G452323' from dual  union all
   6   select 'h34423' from dual  union all
   7   select 'r34323' from dual  union all
   8   select 'b23232' from dual  union all
   9   select 'n' from dual union all
  10   select 'n232323' from dual
  11  )
  12  -- End test data
  13  select c
  14  from t
  15  where not lower(rtrim(c,'0123456789')) in ('h','b','n')
  16  /

C
-------
a12345
A123423
g13452
G452323
r34323

Maxim’s solution is quite impressive.  Here is an explanation of his solution:

SELECT
  *
FROM
  T10;

HOMEWORK
--------
a12345
A123423
g13452
G452323
h34423
r34323
b23232
n232323
NB151517
C0151517
f9151517
HE4423

The demo table has 12 rows.

The first part of his solution does this:

SELECT
  HOMEWORK,
  RTRIM(HOMEWORK,'0123456789') TEST
FROM
  T10;

HOMEWORK   TEST
---------- ----
a12345     a
A123423    A
g13452     g
G452323    G
h34423     h
r34323     r
b23232     b
n232323    n
NB151517   NB
C0151517   C
f9151517   f
HE4423     HE

Notice in the above that the TEST column shows that the RTRIM function removed every trailing digit from the right end of each value; because this data set consists of a letter prefix followed only by digits, that strips off everything from the first digit onward.  Then his solution simply determines whether what is left (in the TEST column), converted to lower case, is one of h, b, or n, and if it is, the row is eliminated.

The output of Maxim’s solution:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  NOT LOWER(RTRIM(HOMEWORK,'0123456789')) IN ('h','b','n');

HOMEWORK
---------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we have seen the hard way to solve the problem and a very clever way to solve it, are there any other ways?

A CASE structure could be used rather than the cumbersome nested DECODE and SIGN function calls.  A CASE structure will be easier to maintain:

SELECT
  CASE WHEN ASCII(SUBSTR(HOMEWORK,2,1)) >= 48
        AND ASCII(SUBSTR(HOMEWORK,2,1)) <= 57 THEN 1
    ELSE 0 END IS_EXC2
FROM
  T10;

You could transform this section to a CASE structure also:

DECODE(ASCII(SUBSTR(HOMEWORK,1,1)),104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1

 

SELECT
  CASE ASCII(SUBSTR(HOMEWORK,1,1))
    WHEN 104 THEN 1
    WHEN 72 THEN 1
    WHEN 66 THEN 1
    WHEN 98 THEN 1
    WHEN 78 THEN 1
    WHEN 110 THEN 1
    ELSE 0 END IS_EXC1
FROM
  T10;

Finally, you could combine the two CASE structures in the WHERE clause:

SELECT
  HOMEWORK,
  ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
  ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2
FROM
  T10
WHERE
  (CASE ASCII(SUBSTR(HOMEWORK,1,1))
    WHEN 104 THEN 1
    WHEN 72 THEN 1
    WHEN 66 THEN 1
    WHEN 98 THEN 1
    WHEN 78 THEN 1
    WHEN 110 THEN 1
    ELSE 0 END) *
  (CASE WHEN ASCII(SUBSTR(HOMEWORK,2,1)) >= 48
        AND ASCII(SUBSTR(HOMEWORK,2,1)) <= 57 THEN 1
    ELSE 0 END) = 0;

HOMEWORK     ASC_VAL1   ASC_VAL2
---------- ---------- ----------
a12345             97         49
A123423            65         49
g13452            103         49
G452323            71         52
r34323            114         51
NB151517           78         66
C0151517           67         48
f9151517          102         57
HE4423             72         69

Here are a couple more solutions:
The silly way with a MINUS operation:

SELECT
  HOMEWORK
FROM
  T10
MINUS
SELECT
  HOMEWORK
FROM
  T10
WHERE
  UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
  AND SUBSTR(HOMEWORK,2,1) IN ('1','2','3','4','5','6','7','8','9','0');

HOMEWORK
--------
A123423
C0151517
G452323
HE4423
NB151517
a12345
f9151517
g13452
r34323

The neat solution with MINUS:

SELECT
  HOMEWORK
FROM
  T10
MINUS
SELECT
  HOMEWORK
FROM
  T10
WHERE
  UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
  AND SUBSTR(HOMEWORK,2,1) IN (
    SELECT
      TO_CHAR(ROWNUM-1)
    FROM
      DUAL
    CONNECT BY
      LEVEL<=10);

HOMEWORK
--------
A123423
C0151517
G452323
HE4423
NB151517
a12345
f9151517
g13452
r34323

The NOT method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  NOT(UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
    AND SUBSTR(HOMEWORK,2,1) IN
('1','2','3','4','5','6','7','8','9','0'));

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

The neat solution with NOT:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  NOT(UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
    AND SUBSTR(HOMEWORK,2,1) IN (
      SELECT
        TO_CHAR(ROWNUM-1)
      FROM
        DUAL
      CONNECT BY
        LEVEL<=10));

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

The left outer join method:

SELECT
  T10.HOMEWORK
FROM
  T10,
  (SELECT
    HOMEWORK
  FROM
    T10
  WHERE
    (UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N'))
    AND (SUBSTR(HOMEWORK,2,1) IN (
      SELECT
        TO_CHAR(ROWNUM-1)
      FROM
        DUAL
      CONNECT BY
        LEVEL<=10))) NT10
WHERE
  T10.HOMEWORK=NT10.HOMEWORK(+)
  AND NT10.HOMEWORK IS NULL;

HOMEWORK
--------
A123423
C0151517
r34323
HE4423
g13452
f9151517
a12345
G452323
NB151517

The Cartesian join method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  UPPER(SUBSTR(HOMEWORK,1,2)) NOT IN
(SELECT
  L||N
FROM
  (SELECT
    DECODE(ROWNUM,1,'H',2,'B',3,'N') L
  FROM
    DUAL
  CONNECT BY
    LEVEL<=3),
  (SELECT
    TO_CHAR(ROWNUM-1) N
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10));

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mark Powell offered the following method using the TRANSLATE function:

Here is a solution that uses a translate function.  My results vary because I could not remember the actual starting letters specified by the OP, as I do not have access to Oracle and the forum at the same time.  I made my solution case sensitive and used “b, g, and h”.  I added two rows to ensure that at least one row that started with one of the exclude letters, but was not followed by digits, would appear in the output.

1 > select * from t10
  2  where homework not in (
  3    select homework
  4    from t10
  5    where ( substr(homework,1,1) in ('b','g','h')
  6    and instr(translate(homework,'012345678','999999999'),'9') > 0 ))
  7  /

HOMEWORK
--------------------
a12345
A123423
G452323
r34323
n232323
NB151517
C0151517
f9151517
HE4423
hxxxxxxx          -- added
gabcdefg          -- added

11 rows selected.

The above assumes that all of the data is of the form letter || digits, and that there is no data with mixed letters and digits where the presence of the letters should cause the data to not be excluded.  The following would handle data with those rules, using something like h123x as a test case.

  5    where ( substr(homework,1,1) in ('b','g','h')
  6    and       replace(translate(substr(homework,2,length (homework)),
  7            '012345678','999999999'),'9','') is null

Using an upper or lower rtrim, depending on the case sensitivity desired, as Maxim demonstrated does seem like a much slicker solution.
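Adapted to the OP’s original h/b/n starting letters (case-insensitive, per the OP’s note about upper and lower case) and to the T10 test table created earlier, Mark’s refinement might look something like the following; this is my adaptation, offered only as a sketch:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  HOMEWORK NOT IN
    (SELECT
      HOMEWORK
    FROM
      T10
    WHERE
      LOWER(SUBSTR(HOMEWORK,1,1)) IN ('h','b','n')
      AND SUBSTR(HOMEWORK,2) IS NOT NULL
      -- TRANSLATE maps every digit to 9; REPLACE then removes the 9s, leaving NULL when positions 2 onward contain only digits
      AND REPLACE(TRANSLATE(SUBSTR(HOMEWORK,2),'0123456789','9999999999'),'9','') IS NULL);

The SUBSTR(HOMEWORK,2) IS NOT NULL test simply keeps single-letter values such as ‘n’ in the result, since those values are not followed by any digits.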

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the OP were running Oracle 10g R1 or later the following would also work:
REGEXP_INSTR method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  REGEXP_INSTR(UPPER(HOMEWORK),'[HBN][0123456789]')<>1;

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

Shortened version of the above:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  REGEXP_INSTR(UPPER(HOMEWORK),'[HBN][0-9]')<>1;

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

REGEXP_REPLACE method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  REGEXP_REPLACE(SUBSTR(UPPER(HOMEWORK),1,2),'[HBN][0123456789]',NULL) IS NOT NULL;

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423






