Impact of the TRUNC Function on an Indexed Date Column

8 03 2010

March 8, 2010

A recent email from an ERP user’s group mailing list reminded me of a small problem in that ERP program’s modules related to the DATE columns in several of its tables.  In DATE columns that should only contain a date component, rows will occasionally be inserted by one of the ERP program’s modules with a date and time component, for example ’08-MAR-2010 13:01:13′ rather than just ’08-MAR-2010 00:00:00′.  This bug, or feature, leads to unexpected performance issues when normal B*Tree indexes are present on that date column.  To work around the time component in the DATE type columns, the ERP program modules frequently uses a technique like this to perform a comparisons on only the date component of a DATE type columns:

SELECT
  *
FROM
  T3
WHERE
  TRUNC(DATE_COLUMN) = :d1;

In the above D1 is a bind variable.  On occasion, the ERP program will instead pass in the date value as a constant/literal rather than as a bind variable.  What is wrong with the above syntax?  Would the above syntax be considered a bug if the DATE_COLUMN column had a normal B*Tree index?  Is there a better way to retrieve the required rows?  Incidentally, I started re-reading the book “Troubleshooting Oracle Performance” and I encountered a couple of interesting sentences on page 7 that seem to address this issue:

“For all intents and purposes, an application experiencing poor performance is no worse [should probably state no better] than an application failing to fulfill its functional requirements. In both situations, the application is useless.”

Let’s try a couple of experiments to see why the above SQL statement requires improvement.

First, we will create table T2 that will serve as a nearly sequential ordered row source with a small amount of randomization introduced into the data by the DBMS_RANDOM function.  This row source will be used to help duplicate the essentially random arrival rate of transactions into our T3 test table:

CREATE TABLE T2 AS
SELECT
  DBMS_RANDOM.VALUE(0,0.55555)+ROWNUM/10000 DAYS
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V2;

The date column in our T3 table is derived from SYSDATE, so ideally the rows should be in order by the DAYS column in table T2.  In a production environment, on rare occasion someone will slip in a row that is not in sequential order through an edit of the DATE column for a row, so we should be able to simulate that slight randomness by creating another table from table T2 before generating table T3 (the rows will fill the table blocks with an occasional row that is out of date sequence):

CREATE TABLE T2_ORDERED NOLOGGING AS
SELECT
  DAYS
FROM
  T2
ORDER BY
  DAYS;

For our simulation, we have a final problem that needs to be addressed.  The volume of data entered on a Saturday in the production database is less than that for a Monday through Friday, and the volume of data entered on a Sunday is less than that entered on a Saturday.  To add just a little more randomness, we will insert the rows into table T3 based on the following criteria:

  • 90% chance of a row from T2_ORDERED being included if the date falls on a Monday through Friday
  • 20% chance of a row from T2_ORDERED being included if the date falls on a Saturday
  • 10% chance of a row from T2_ORDERED being included if the date falls on a Sunday

The SQL statement to build table T3 follows:

CREATE TABLE T3 NOLOGGING AS
SELECT
  DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY') C1,
  DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY') C2,
  LPAD('A',255,'A') C3
FROM
  T2_ORDERED
WHERE
  DECODE(TO_CHAR(DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY'),'D'),'1',0.9,'7',0.8,0.1)<DBMS_RANDOM.VALUE(0,1);

CREATE INDEX IND_T3_C2 ON T3(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)

ALTER TABLE T3 MODIFY C1 NOT NULL;
ALTER TABLE T3 MODIFY C2 NOT NULL;

Let’s check the data distribution:

SELECT
  COUNT(*) NUM_ROWS,
  SUM(DECODE(TO_CHAR(C1,'D'),'6',1,0)) FRIDAYS,
  SUM(DECODE(TO_CHAR(C1,'D'),'7',1,0)) SATURDAYS,
  SUM(DECODE(TO_CHAR(C1,'D'),'1',1,0)) SUNDAYS
FROM
  T3;

  NUM_ROWS    FRIDAYS  SATURDAYS    SUNDAYS
---------- ---------- ---------- ----------
68,579,287 12,858,100  2,855,164  1,428,569

From the above we are able to determine that roughly 18.7% of the rows have a date that is on a Friday, roughly 4.2% of the rows have a date that is on a Saturday, and 2.1% of the rows are on a Sunday.

This test will be performed on Oracle Database 11.2.0.1 with the __DB_CACHE_SIZE hidden parameter floating at roughly 7,381,975,040 (6.875GB).  I will use my toy project for performance tuning to submit the SQL statements and retrieve the DBMS_XPLAN output, but the same could be accomplished with just SQL*Plus (most tests can also be performed using my Automated DBMS_XPLAN tool).

Let’s start simple, we will start with a simple SQL statement to retrieve the rows with today’s date (March 8, 2010) using literals against the indexed column.  I will execute each SQL statement twice to take advantage of any previously cached blocks in the buffer cache, and eliminate the time consumed by the hard parse:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY');

SQL_ID  3us49wsdzdun3, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')

Plan hash value: 4161002650

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   9114 |00:02:31.93 |    2743K|   2743K|
|*  1 |  TABLE ACCESS FULL| T3   |      1 |   9114 |   9114 |00:02:31.93 |    2743K|   2743K|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TRUNC(INTERNAL_FUNCTION("C2"))=TO_DATE(' 2010-03-08 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

Roughly 2 minutes and 32 seconds.  Notice the Note at the bottom of the DBMS_XPLAN output, cardinality feedback (apparently not documented) is a new feature in Oracle Database 11.2.0.1 (see here for a related blog article).  The first execution required 2 minutes and 33 seconds, but a predicted cardinality of 685,000 rows (1% of the total) was returned for the first execution.  The second execution generated a second child cursor with a corrected cardinality estimate based on the actual number of rows returned during the first execution.  2 minutes and 32 seconds is not bad, unless this is an OLTP application and an end user is waiting for the application to return the rows.

Let’s try again with a modified, equivalent SQL statement, again executing the SQL statement twice:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2 < TO_DATE('08-MAR-2010','DD-MON-YYYY')+1;

SQL_ID  c7jfpa0rpt95a, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >=
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2 <
TO_DATE('08-MAR-2010','DD-MON-YYYY')+1

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6859 |   9114 |00:00:00.02 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6859 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

0.02 seconds compared to 2 minutes and 32 seconds.  You will notice that the estimated number of rows, while not exact, is reasonably close even without a cardinality feedback adjustment.  Also notice that the optimizer adjusted the date calculation that was in the WHERE clause of the SQL statement.

Let’s try again with a second modified, equivalent SQL statement, again executing the SQL statement twice:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

SQL_ID  7xthpspukrbtv, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 BETWEEN
TO_DATE('08-MAR-2010','DD-MON-YYYY')      AND
TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6860 |   9114 |00:00:00.02 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6860 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<=TO_DATE(' 2010-03-08 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

0.02 seconds again, and the estimated number of rows is roughly the same as we achieved with the previous SQL statement.  By checking the Predicate Information section of the DBMS_XPLAN output we see that the optimizer has transformed the BETWEEN syntax into roughly the same syntax as was used in the previous SQL statement.

Let’s try again with bind variables (the bind variable names are automatically changed by ADO into generic names, and that is why the bind variable appears in the execution plan as :1 rather than :d1):

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = :d1;

SQL_ID  cub25jm7y8zck, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) = :1

Plan hash value: 4161002650

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   9114 |00:02:33.37 |    2743K|   2743K|
|*  1 |  TABLE ACCESS FULL| T3   |      1 |    685K|   9114 |00:02:33.37 |    2743K|   2743K|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TRUNC(INTERNAL_FUNCTION("C2"))=:1)

Notice that our friendly note about Cardinality Feedback did not appear this time, and that the cardinality estimate was not corrected when the SQL statement was executed for the second time, even though bind variable peeking did happen.  The incorrect cardinality estimate would not have changed the execution plan for this SQL statement, but could impact the execution plan if table T3 were joined to another table.

Let’s try the other equivalent SQL statement with bind variables:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= :d1
  AND C2 < :d2 +1;

SQL_ID  9j2a54zbzb9cz, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >= :1    AND C2 <
:2 +1

Plan hash value: 3025660695

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |   9114 |00:00:00.02 |     575 |
|*  1 |  FILTER                      |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6859 |   9114 |00:00:00.02 |     575 |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6859 |   9114 |00:00:00.01 |     118 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:1<:2+1)
   3 - access("C2">=:1 AND "C2"<:2+1)

The optimizer estimated that 6,859 rows would be returned, just as it did when we used literals in the SQL statement  because of bind variable peeking.  In case you are wondering, the same estimated row cardinality was returned when the D2 bind variable was set to ’09-MAR-2010′ in the application and the +1 was removed from the SQL statement.

Quite the problem we caused by pretending to not understand the impact of using a function in the WHERE clause on an indexed column.  We could create a function based index to work around the problem of the application programmers not knowing how to specify a specific date without using the TRUNC function on a DATE column:

CREATE INDEX IND_T3_C2_FBI ON T3(TRUNC(C2));

ALTER SYSTEM FLUSH SHARED_POOL;

But is creating a function based index the best approach, or have we just created another problem rather than attacking the root cause of the original problem?  We now have two indexes on the same column that need to be updated every time a row is inserted or deleted in table T3, and also maintained every time that column is updated (even when updated with the same value).  Let's experiment with the function based index.

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY');

SQL_ID  3us49wsdzdun3, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')

Plan hash value: 3662266936

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |      1 |        |   9114 |00:00:00.01 |     576 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3            |      1 |   9114 |   9114 |00:00:00.01 |     576 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   9114 |   9114 |00:00:00.01 |     119 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

Cardinality feedback again helps out the cardinality estimate on the second execution, but look at the Predicate Information section of the execution plan.  We have now increased the difficulty of walking through a complicated execution plan with the help of the Predicate Information section of the execution plan to see how the predicates in the WHERE clause are applied to the execution plan.  Not so bad, right?  What happens if this column C2 is joined to a column in another table, or even specified as being equal to column C1 in this table?  Let's take a look:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2 < TO_DATE('08-MAR-2010','DD-MON-YYYY')+1
  AND C2=C1;

SQL_ID  27rqhg1mpmzt9, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >=
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2 <
TO_DATE('08-MAR-2010','DD-MON-YYYY')+1    AND C2=C1

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.01 |     575 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   9114 |   9114 |00:00:00.01 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   9114 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"="C1" AND "C1">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "C1"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

On the first execution the E-Rows column for plan ID 1 showed that the cardinality estimate was 1 row, and on the second execution the cardinality estimate was corrected to 9114.  Notice that transitive closure took place - the filter operation on plan ID 1 shows the same restrictions for column C1 as had applied to column C2 in the WHERE clause.

Let's try again with the SQL statement using the TRUNC function - this SQL statement will use the function based index:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2=C1;

SQL_ID  ftu92j3z99ppr, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2=C1

Plan hash value: 3662266936

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |      1 |        |   9114 |00:00:00.01 |     576 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T3            |      1 |   9114 |   9114 |00:00:00.01 |     576 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   9114 |   9114 |00:00:00.01 |     119 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"="C1")
   2 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

The cardinality estimate is again correct because of cardinality feedback, but notice what is missing from the Predicate Information section of the execution plan (transitive closure did not happen).

So, does the use of TRUNC(DATE_COLUMN) without the presence of a function based index qualify as an application bug?  What if a the function based index is present - is it still a bug?

Something possibly interesting, but unrelated.  I executed the following commands:

ALTER INDEX IND_T3_C2_FBI UNUSABLE;

(perform a little more testing)

ALTER INDEX IND_T3_C2_FBI REBUILD;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)

I received the following after several minutes:

BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE);
END;

*
ERROR at line 1:
ORA-00600: internal error code, arguments: [15851], [3], [2], [1], [1], [], [],
[], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 20337
ORA-06512: at "SYS.DBMS_STATS", line 20360
ORA-06512: at line 1

The same error appeared without the CASCADE option, but a call to collect statistics on the indexes for the table, as well as other tables completes successfully.  I may look at this problem again later.

Continuing, we will create another table:

CREATE TABLE T4 NOLOGGING AS
SELECT
  *
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('01-JAN-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

CREATE INDEX IND_T4_C2 ON T4(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

Before we start, let's take a look at the disk space used by the objects and the automatically allocated extent sizes:

SELECT
  SEGMENT_NAME SEGMENT,
  (SUM(BYTES))/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T3_C2','IND_T3_C2_FBI','T3','T4','IND_T4_C2')
GROUP BY
  SEGMENT_NAME
ORDER BY
  SEGMENT_NAME;

SEGMENT           TOTAL_MB
--------------- ----------
IND_T3_C2             1469
IND_T3_C2_FBI         1472
IND_T4_C2               10
T3                   21480
T4                     144  

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T3_C2','IND_T3_C2_FBI','T3','T4','IND_T4_C2')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT            EXTENTS EXT_SIZE_KB   TOTAL_MB
--------------- ---------- ----------- ----------
IND_T3_C2               16          64          1
IND_T3_C2               63        1024         63
IND_T3_C2              120        8192        960
IND_T3_C2                1       27648         27
IND_T3_C2                1       34816         34
IND_T3_C2                6       65536        384
IND_T3_C2_FBI           16          64          1
IND_T3_C2_FBI           63        1024         63
IND_T3_C2_FBI          120        8192        960
IND_T3_C2_FBI            7       65536        448
IND_T4_C2               16          64          1
IND_T4_C2                9        1024          9
T3                      16          64          1
T3                      63        1024         63
T3                     120        8192        960
T3                       1       19456         19
T3                       1       43008         42
T3                       1       44032         43
T3                     318       65536      20352
T4                      16          64          1
T4                      63        1024         63
T4                      10        8192         80

Table T3 is using about 21GB of space while table T4 is using about 144MB of space.  We occasionally received an extent size that is not a power of 2 in size - a bit unexpected.  Let's try a couple of SQL statements that access the two tables:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  f2v7cf7w2bwqq, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')     AND T3.C2=T4.C2

Plan hash value: 1631978485

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |  10044 |00:00:00.38 |   18622 |     25 |       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |   7095 |  10044 |00:00:00.38 |   18622 |     25 |  1223K|  1223K| 1593K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |   6857 |   9114 |00:00:00.04 |     394 |     25 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   6857 |   9114 |00:00:00.03 |      28 |     25 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |    452K|    452K|00:00:00.12 |   18228 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The above used a full table scan on table T4, and you will notice that a filter predicate is not applied to table T4 to reduce the number of rows entering the hash join.  Transitive closure did not take place.  Let's try again with the SQL statement with the other syntax that does not use the TRUNC function, nor the function based index:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

SQL_ID  5swqbjak147vk, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')       AND
TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 3991319422

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  10044 |00:00:00.06 |     983 |      1 |       |       |          |
|*  1 |  HASH JOIN                   |           |      1 |   6761 |  10044 |00:00:00.06 |     983 |      1 |  1223K|  1223K| 1618K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6860 |   9114 |00:00:00.01 |     393 |      0 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6860 |   9114 |00:00:00.01 |      27 |      0 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T4        |      1 |   6762 |   9114 |00:00:00.03 |     590 |      1 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T4_C2 |      1 |   6762 |   9114 |00:00:00.01 |     127 |      1 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2010-03-08
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("T4"."C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2010-03-08
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Notice this time that transitive closure happened, allowing the optimizer to take advantage of the IND_T4_C2 index on table T4.

You are probably thinking, it must be that we need a function based index on the C2 column of table T4 also to allow transitive closure to happen.  Let's try:

CREATE INDEX IND_T4_C2_FBI ON T4(TRUNC(C2));

ALTER SYSTEM FLUSH SHARED_POOL;

Now our SQL statement again:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  f2v7cf7w2bwqq, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')     AND T3.C2=T4.C2

Plan hash value: 1631978485

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |  10044 |00:00:00.33 |   18622 |       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |   7095 |  10044 |00:00:00.33 |   18622 |  1223K|  1223K| 1584K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |   6857 |   9114 |00:00:00.01 |     394 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   6857 |   9114 |00:00:00.01 |      28 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |    452K|    452K|00:00:00.11 |   18228 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As expected, the function based index on column C4 of table T4 was not used because transitive closure did not happen.  Do we still want to do it the wrong way?  The execution time could have been much longer than 0.33 seconds, of course, if table T4 were much larger and a large number of physical reads were required.  Try again using a larger table T4:

DROP TABLE T4 PURGE;

CREATE TABLE T4 NOLOGGING AS
SELECT
  *
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('01-JAN-2000','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

CREATE INDEX IND_T4_C2 ON T4(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

Table T4 now requires about 7.9GB of disk space.  Now a range scan that accesses tables T3 and T4 (each SQL statement is executed twice, with the last execution plan reported):

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1631978485

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |    874K|00:00:33.24 |    1062K|    302K|       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |    849K|    874K|00:00:33.24 |    1062K|    302K|    33M|  5591K|   50M (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |    802K|    795K|00:00:00.56 |   33957 |      0 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |    802K|    795K|00:00:00.20 |    2115 |      0 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |     25M|     25M|00:00:17.13 |    1028K|    302K|       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."SYS_NC00004$"<=TO_DATE('
              2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Notice the full table scan of table T4.  Now the other SQL statement:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

SQL_ID  539d93k50ruz3, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 1243183227

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |      1 |        |    874K|00:00:05.80 |   85051 |    574 |       |       |          |
|   1 |  MERGE JOIN                   |           |      1 |    795K|    874K|00:00:05.80 |   85051 |    574 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T4        |      1 |    795K|    795K|00:00:02.43 |   51097 |    574 |       |       |          |
|*  3 |    INDEX RANGE SCAN           | IND_T4_C2 |      1 |    795K|    795K|00:00:00.41 |   10841 |      0 |       |       |          |
|*  4 |   SORT JOIN                   |           |    795K|    795K|    874K|00:00:02.00 |   33954 |      0 |    30M|  1977K|   26M (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.50 |   33954 |      0 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.17 |    2114 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("T3"."C2"="T4"."C2")
       filter("T3"."C2"="T4"."C2")
   6 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Notice that the above used the index on table T4, but performed a sort-merge join between the two tables.  We are able to force a hash join, as was used with the other SQL statement, by applying a hint:

SQL_ID  b9q6tf6p6x2m0, child number 0
-------------------------------------
SELECT /*+ USE_HASH (T3 T4) */    T3.C1,    T3.C2,    T4.C3  FROM   
T3,    T4  WHERE    T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY') 
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND
T3.C2=T4.C2

Plan hash value: 3991319422

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    874K|00:00:03.60 |   85051 |       |       |          |
|*  1 |  HASH JOIN                   |           |      1 |    795K|    874K|00:00:03.60 |   85051 |    33M|  5591K|   50M (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.54 |   33954 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.19 |    2114 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T4        |      1 |    795K|    795K|00:00:01.44 |   51097 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T4_C2 |      1 |    795K|    795K|00:00:00.40 |   10841 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

9.2 times faster (just 5.7 times faster without the hint) by not using the TRUNC function and function-based index combination.  Are we able to just agree to do it the right way, or should I continue?  Without the function based index we receive an execution plan like this:

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1396201636

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    874K|00:03:05.88 |    3771K|   3013K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    849K|    874K|00:03:05.88 |    3771K|   3013K|    33M|  5591K|   52M (0)|
|*  2 |   TABLE ACCESS FULL| T3   |      1 |    802K|    795K|00:02:34.36 |    2743K|   2743K|       |       |          |
|   3 |   TABLE ACCESS FULL| T4   |      1 |     25M|     25M|00:00:15.72 |    1028K|    270K|       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   2 - filter((TRUNC(INTERNAL_FUNCTION("C2"))>=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("C2"))<=TO_DATE(' 2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

A full table scan of a 21GB table and 7.9GB table, with 795,000 rows from the large table and 25,000,000 rows from the small table entering the hash join - probably not too good for performance.  Fix the performance bug in the application and let the user get back to counting the pencils in the pencil jar 9.2 times faster (or 51.6 times faster if there is no function based index).

While you might not frequently join two tables on a DATE column as I have done in this demonstration, how common is it to store numeric data in a VARCHAR2 column, and then need to be able to compare those values with numbers stored in NUMBER type columns, with literals, or numeric bind variables?


Actions

Information

One response

22 04 2010
Blogroll Report 05/02/2010 – 12/03/2010 « Coskan’s Approach to Oracle

[...] 16-What is the impact of trunc function on an indexed date column? Charles Hooper-Impact of the TRUNC Function on an Indexed Date Column [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 142 other followers

%d bloggers like this: