06 | January | 2010 | Charles Hooper's Oracle Notes

DATE Datatype Or NUMBER Datatype – Which Should be Used?

6 01 2010

January 6, 2010

In a recent discussion thread on the Oracle forums, the following question was asked:
http://forums.oracle.com/forums/thread.jspa?threadID=1007653

I have a scenario where I need to store the data in the format YYYYMM (e.g. 201001 which means January, 2010). I am trying to evaluate what is the most appropriate datatype to store this kind of data. I am comparing 2 options, NUMBER and DATE. As the data is essentially a component of oracle date datatype and experts like Tom Kyte have proved (with examples) that using right datatype is better for optimizer. So I was expecting that using DATE datatype will yield (at least) similar (if not better) cardinality estimates than using NUMBER datatype. However, my tests show that when using DATE the cardinality estimates are way off from actuals whereas sing NUMBER the cardinality estimates are much closer to actuals.
My questions are:
1) What should be the most appropriate datatype used to store YYYYMM data?
2) Why does using DATE datatype yield estimates that are way off from actuals than using NUMBER datatype?

Test case (update Jan 7, 2010 : there was a copy-paste error in the line for collecting statistics on table B – the original version of the script posted here collected statistics on table A twice):
create table a nologging as select to_number(to_char(add_months(to_date('200101','YYYYMM'),level - 1), 'YYYYMM')) id from dual connect by level <= 289;

create table b (id number) ;

begin
  for i in 1..8192
  loop
    insert into b select * from a ;
  end loop;
  commit;
end;
/ 

alter table a add dt date;

alter table b add dt date;

update a set dt = to_date(id, 'YYYYMM');

update b set dt = to_date(id, 'YYYYMM');

commit;

exec dbms_stats.gather_table_stats(user, 'A', estimate_percent=>NULL);

exec dbms_stats.gather_table_stats(user, 'B', estimate_percent=>NULL);

explain plan for select count(*) from b where id between 200810 and 200903;
select * from table(dbms_xplan.display);

explain plan for select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');
select * from table(dbms_xplan.display);

This is an interesting problem, why would using the NUMBER datatype yield better cardinality estimates than the example with the DATE datatype? When the NUMBER datatype was used, the optimizer predicted that the full table scan operation would return 46,604 rows, while the optimizer predicted that the full table scan would return 5,919 rows when the DATE datatype was used – the actual number of rows returned is 49,152.

The person who posted the above test case later stated that he believes that the DATE datatype is the correct choice, but he would have a difficult time justifying that opinion when confronted by someone suggesting the use of the NUMBER data type.

I posted the test results from my run with Oracle 11.1.0.7:

SQL> set autotrace traceonly explain
SQL> select count(*) from b where id between 200810 and 200903 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |  4715   (1)| 00:00:57 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| B    |   108K|   527K|  4715   (1)| 00:00:57 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<=200903 AND "ID">=200810)

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |  4718   (2)| 00:00:57 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 57166 |   446K|  4718   (2)| 00:00:57 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

SQL> set autotrace off
SQL> select count(*) from b where id between 200810 and 200903 ;

  COUNT(*)
----------
     49152

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

  COUNT(*)
----------
     49152

Well, it seems that Oracle 11.1.0.7 predicted that when the NUMBER datatype was used, the full table scan would return roughly 108,000 rows. Oracle 11.1.0.7 predicted that when the DATE datatype was used, the full table scan would return 57,166 rows – significantly closer to the actual number of 49,152. If there were an index on that column, how would the different cardinality estimates affect the possibility that the optimizer might select to use that index rather than a full table scan? What if the data volume were increased by a factor of, say 1,000 or 1,000,000?

I also captured a 10053 trace during the test run, and found this in the trace file:

******************************************
----- Current SQL Statement for this session (sql_id=7uk18xj0z9uxf) -----
select count(*) from b where id between 200810 and 200903
*******************************************
...
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for B[B]

  Table: B  Alias: B
    Card: Original: 2367488.000000  Rounded: 108124  Computed: 108124.16  Non Adjusted: 108124.16
  Access Path: TableScan
    Cost:  4714.53  Resp: 4714.53  Degree: 0
      Cost_io: 4670.00  Cost_cpu: 636216240
      Resp_io: 4670.00  Resp_cpu: 636216240
  Best:: AccessPath: TableScan
         Cost: 4714.53  Degree: 1  Resp: 4714.53  Card: 108124.16  Bytes: 0

***************************************
...
...
...
******************************************
----- Current SQL Statement for this session (sql_id=2ac0k15zjdg5x) -----
select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM')
*******************************************
...

***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for B[B]

  Table: B  Alias: B
    Card: Original: 2367488.000000  Rounded: 57166  Computed: 57165.51  Non Adjusted: 57165.51
  Access Path: TableScan
    Cost:  4717.89  Resp: 4717.89  Degree: 0
      Cost_io: 4670.00  Cost_cpu: 684264079
      Resp_io: 4670.00  Resp_cpu: 684264079
  Best:: AccessPath: TableScan
         Cost: 4717.89  Degree: 1  Resp: 4717.89  Card: 57165.51  Bytes: 0

***************************************

Notice in the above that Oracle’s statistics gathering process did not create histograms when I collected statistics for the tables. The calculated cost is the same for either datatype, but what would happen if that table were then joined to another table? Is the optimizer seeing histograms in some of the original poster’s test cases?

The original poster is running Oracle Database 10.2.0.1, so I ran a test on Oracle 10.2.0.2 with OPTIMIZER_FEATURES_ENABLE set to 10.2.0.1, using the data created by the OP’s data creation script:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DateTest';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

select count(*) from b where id between 200810 and 200903 ;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

ALTER SESSION SET EVENTS '10053 trace name context off';

set autotrace traceonly explain

select count(*) from b where id between 200810 and 200903 ;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

set autotrace off

select count(*) from b where id between 200810 and 200903;

The output from the above script, when run on Oracle 10.2.0.2 follows:

SQL> select count(*) from b where id between 200810 and 200903 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |   919  (11)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| B    |   108K|   527K|   919  (11)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<=200903 AND "ID">=200810)

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |   926  (12)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 57166 |   446K|   926  (12)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

SQL> set autotrace off

SQL> select count(*) from b where id between 200810 and 200903 ;

  COUNT(*)
----------
     49152

The estimated cardinalities appear to be identical to that of Oracle 11.1.0.7, so why was the original poster seeing different cardinality estimates? Here is the output from the 10053 trace file:

******************************************
Current SQL statement for this session:
select count(*) from b where id between 200810 and 200903
*******************************************
...
  PARAMETERS WITH ALTERED VALUES
  ******************************
  optimizer_features_enable           = 10.2.0.1
  *********************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#1): ID(NUMBER)
    AvgLen: 5.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 200101 Max: 202501
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 108124  Computed: 108124.16  Non Adjusted: 108124.16
  Access Path: TableScan
    Cost:  918.67  Resp: 918.67  Degree: 0
      Cost_io: 819.00  Cost_cpu: 632570063
      Resp_io: 819.00  Resp_cpu: 632570063
  Best:: AccessPath: TableScan
         Cost: 918.67  Degree: 1  Resp: 918.67  Card: 108124.16  Bytes: 0
***************************************
...
...
...
******************************************
Current SQL statement for this session:
select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM')
*******************************************
...
  *************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
  optimizer_features_enable           = 10.2.0.1
  *********************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): DT(DATE)
    AvgLen: 8.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 2451911 Max: 2460677
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 57166  Computed: 57165.51  Non Adjusted: 57165.51
  Access Path: TableScan
    Cost:  926.24  Resp: 926.24  Degree: 0
      Cost_io: 819.00  Cost_cpu: 680617902
      Resp_io: 819.00  Resp_cpu: 680617902
  Best:: AccessPath: TableScan
         Cost: 926.24  Degree: 1  Resp: 926.24  Card: 57165.51  Bytes: 0

Notice that no histograms were collected based on the 10053 trace file.

Now a second test, this time we will instruct Oracle to create histograms, and also force the optimizer to hard parse the SQL statements that reference table B when those SQL statements are re-executed:

exec dbms_stats.gather_table_stats(user, 'B', estimate_percent=>NULL,method_opt=>'FOR ALL COLUMNS SIZE 254',no_invalidate=>false);

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DateTest2';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

select count(*) from b where id between 200810 and 200903 ;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');

ALTER SESSION SET EVENTS '10053 trace name context off';

set autotrace traceonly explain

select count(*) from b where id between 200810 and 200903;

select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');

set autotrace off

So, what is the output of the above?

SQL> select count(*) from b where id between 200810 and 200903;

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |   919  (11)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 46604 |   227K|   919  (11)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<=200903 AND "ID">=200810)

SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM');

Execution Plan
----------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |   926  (12)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| B    | 46604 |   364K|   926  (12)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

Interesting, both queries estimate that the full table scan operation will return 46,604 rows – interesting. That cardinality estimate exactly matches the cardinality estimate in the OP’s plan for the SQL statement that accessed the NUMBER datatype…

For fun, let’s look in the 10053 trace file:

******************************************
Current SQL statement for this session:
select count(*) from b where id between 200810 and 200903
*******************************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#1): ID(NUMBER)
    AvgLen: 5.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 200101 Max: 202501
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 46604  Computed: 46604.09  Non Adjusted: 46604.09
  Access Path: TableScan
    Cost:  918.76  Resp: 918.76  Degree: 0
      Cost_io: 819.00  Cost_cpu: 633149246
      Resp_io: 819.00  Resp_cpu: 633149246
  Best:: AccessPath: TableScan
         Cost: 918.76  Degree: 1  Resp: 918.76  Card: 46604.09  Bytes: 0
...
...
...
******************************************
Current SQL statement for this session:
select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM')
*******************************************
...
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table:  B  Alias:  B
    #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): DT(DATE)
    AvgLen: 8.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 2451911 Max: 2460677
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
  Table:  B  Alias: B    
    Card: Original: 2367488  Rounded: 46604  Computed: 46604.09  Non Adjusted: 46604.09
  Access Path: TableScan
    Cost:  926.22  Resp: 926.22  Degree: 0
      Cost_io: 819.00  Cost_cpu: 680499006
      Resp_io: 819.00  Resp_cpu: 680499006
  Best:: AccessPath: TableScan
         Cost: 926.22  Degree: 1  Resp: 926.22  Card: 46604.09  Bytes: 0

The 10053 trace file shows that in both cases a height balanced histogram with 254 buckets was created. But, how accurate would the estimate be if there were 1,000 or 1,000,000 times as many rows? What if the time interval were changed to something else? What if each of the 289 distinct values for the ID and DT columns did not have an equal distribution of values?

So, why select a DATE datatype rather than a NUMBER datatype? These are the reasons that I proposed in the discussion thread:

One of the problems with putting date values in number columns is this – if you select the range from 200810 to 200903, the optimizer will likely make the assumption that 200810 is just as likely of a number as 200808, 200812, 200814, 200816, 200820, 200890, 200900, etc. Some of those year/month combinations are simply not possible. In such a case, the optimizer should over-estimate the number of rows returned from that range when the column data type is NUMBER, and should be reasonably close when the column data type is DATE, since the optimizer knows that 200814 (14/1/2008), 200816 (16/1/2008), 200820 (20/1/2008), 200890 (90/1/2008), 200900 (0/1/2009), etc. could never be dates (and would be completely out of the serial sequence of dates). By putting the date type data into a DATE column, you have essentially added a constraint to the database to prevent invalid dates from being added. Additionally, date math, such as finding the number of days between 200802 and 200803 (compared to 200702 and 200703) is very simple – the answer is not 1 in both cases, but rather 29 and 28, respectively.

Any other comments?

OK, enough guessing, let’s try a couple tests. Here is the test table, with 10,000,000 rows with an uneven distribution of rows for each value:

DROP TABLE B PURGE;

CREATE TABLE B AS
SELECT
  TO_NUMBER(TO_CHAR(TRUNC(TO_DATE('200101','YYYYDD')+SQRT(ROWNUM),'MM'),'YYYYMM')) ID,
  TRUNC(TO_DATE('200101','YYYYDD')+SQRT(ROWNUM),'MM') DT
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V2;

CREATE INDEX IND_B_ID ON B(ID);
CREATE INDEX IND_B_DT ON B(DT);

SET LINESIZE 130
SET PAGESIZE 1000

The first test script with no histograms, supplying various ranges for year-month combinations while gathering execution statistics, and displaying the actual execution plans:

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'B',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',NO_INVALIDATE=>FALSE);

SPOOL HISTOGRAMTEST.TXT

SELECT COUNT(*) FROM B;

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200810 AND 200903;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200810,'YYYYMM') AND TO_DATE(200903,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200110 AND 200203;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200110,'YYYYMM') AND TO_DATE(200203,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200210 AND 200303;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200210,'YYYYMM') AND TO_DATE(200303,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200812 AND 200901;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200812,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200112 AND 200201;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200112,'YYYYMM') AND TO_DATE(200201,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200612 AND 200901;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SPOOL OFF

The trimmed output (from Oracle 11.1.0.7) of the first date range follows:

SQL> SELECT COUNT(*) FROM B;

  COUNT(*)                                                                                                                       
----------                                                                                                                       
  10000000                                                                                                                       

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200810 AND 200903;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.06 |    2371 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.06 |    2371 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1310K|   1063K|00:00:00.01 |    2371 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200810 AND "ID"<=200903)                                                                                     

SQL>                                                                                                                                   
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200810,'YYYYMM') AND TO_DATE(200903,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.06 |    2816 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.06 |    2816 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    674K|   1063K|00:00:00.01 |    2816 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

In the above, the estimated number of rows with the comparison on the numeric column is about 250,000 rows above the actual, while the comparison on the date column is about 400,000 rows below the actual – could this be enough of a difference to change the execution plan if the clustering factor of the indexes were high? What if the tables had a larger average row length? What if this table were joined with another table? Note that the number of logical blocks accessed is less with the index on the numeric column.

The trimmed output of the second date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200110 AND 200203;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     299 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     299 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1344K|    132K|00:00:00.01 |     299 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200110 AND "ID"<=200203)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200110,'YYYYMM') AND TO_DATE(200203,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     353 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     353 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    674K|    132K|00:00:00.01 |     353 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2001-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2002-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

In the above, the estimated number of rows with the comparison on the numeric column is about 10 times as high as the actual number of rows, while the estimated number of rows with the comparison on the date column is about 5 times as high. Could this be enough to trigger a different execution plan for the queries – where one uses an index access, while the other uses a full table scan?

The trimmed output of the third date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200210 AND 200303;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     594 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     594 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1344K|    265K|00:00:00.01 |     594 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200210 AND "ID"<=200303)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200210,'YYYYMM') AND TO_DATE(200303,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     705 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     705 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    674K|    265K|00:00:00.01 |     705 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2002-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2003-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

In this case the estimates are about the same as the previous test, but the actual number of rows has doubled. The optimizer’s estimates are again in favor of the date datatype.

The trimmed output of the fourth date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200812 AND 200901;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     810 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     810 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1285K|    362K|00:00:00.01 |     810 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200812 AND "ID"<=200901)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200812,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     962 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     962 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    291K|    362K|00:00:00.01 |     962 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2008-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

This time, the optimizer’s estimate when the date datatype was used is very close, while the optimizer’s estimate when the numeric datatype was used is about 4 times greater than the actual number of rows.

The trimmed output of the fifth date range follows:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200112 AND 200201;

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     104 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     104 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1295K|  45260 |00:00:00.01 |     104 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("ID">=200112 AND "ID"<=200201)                                                                                     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200112,'YYYYMM') AND TO_DATE(200201,'YYYYMM');

----------------------------------------------------------------------------------------                                         
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                         
----------------------------------------------------------------------------------------                                         
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     122 |                                         
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     122 |                                         
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    291K|  45260 |00:00:00.01 |     122 |                                         
----------------------------------------------------------------------------------------                                         

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - access("DT">=TO_DATE(' 2001-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                                    
              AND "DT"<=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

This time with the numeric datatype, the optimizer is estimating 1,295,000 rows when in fact only 45,260 are returned during the index range scan. The estimate with the date datatype is also quite high, but it is 4 to 5 times lower (thus closer to the actual) than with the numeric datatype.

Now, the final trimmed output:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200612 AND 200901;

--------------------------------------------------------------------------------------------                                     
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                     
--------------------------------------------------------------------------------------------                                     
|   0 | SELECT STATEMENT      |          |      1 |        |      1 |00:00:00.78 |   22345 |                                     
|   1 |  SORT AGGREGATE       |          |      1 |      1 |      1 |00:00:00.78 |   22345 |                                     
|*  2 |   INDEX FAST FULL SCAN| IND_B_ID |      1 |   3764K|   4054K|00:00:00.22 |   22345 |                                     
--------------------------------------------------------------------------------------------                                     

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - filter(("ID">=200612 AND "ID"<=200901))                                                                                   

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

-------------------------------------------------------------------------------------                                            
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                            
-------------------------------------------------------------------------------------                                            
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.47 |   24950 |                                            
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.47 |   24950 |                                            
|*  2 |   TABLE ACCESS FULL| B    |      1 |   2623K|   4054K|00:00:00.12 |   24950 |                                            
-------------------------------------------------------------------------------------                                            

Predicate Information (identified by operation id):                                                                              
---------------------------------------------------                                                                              
   2 - filter(("DT">=TO_DATE(' 2006-12-01 00:00:00', 'syyyy-mm-dd                                                                
              hh24:mi:ss') AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd                                                
              hh24:mi:ss')))

Here, with the wider date range, the optimizer is closer with the numeric data type, and has selected to perform a fast full scan of the index, which would use multi-block reads if disk accesses were required. Note that the optimizer selected to perform a full table scan when the date datatype was used, even though the estimated number of rows was less. Note too that this in-memory operation completed about twice as fast as the in-memory index fast full scan operation.

Now let’s take a look at what happens when a histogram is present on each of the columns. The script is identical to the previous script, except for the first two lines:

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'B',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 254',NO_INVALIDATE=>FALSE);

SPOOL HISTOGRAMTEST2.TXT

Below are the results from this test run:

SQL> SELECT COUNT(*) FROM B;

  COUNT(*)     
----------     
  10000000     

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200810 AND 200903;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.09 |    2371 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.09 |    2371 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |   1083K|   1063K|00:00:00.01 |    2371 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200810 AND "ID"<=200903)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200810,'YYYYMM') AND TO_DATE(200903,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.09 |    2816 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.09 |    2816 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |   1083K|   1063K|00:00:00.01 |    2816 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200110 AND 200203;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     299 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     299 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |    130K|    132K|00:00:00.01 |     299 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200110 AND "ID"<=200203)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200110,'YYYYMM') AND TO_DATE(200203,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     353 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     353 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    130K|    132K|00:00:00.01 |     353 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2001-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2002-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200210 AND 200303;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     594 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     594 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |    284K|    265K|00:00:00.01 |     594 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200210 AND "ID"<=200303)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200210,'YYYYMM') AND TO_DATE(200303,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     705 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     705 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    284K|    265K|00:00:00.01 |     705 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2002-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2003-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200812 AND 200901;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     810 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     810 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |    383K|    362K|00:00:00.01 |     810 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200812 AND "ID"<=200901)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200812,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.03 |     962 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.03 |     962 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |    383K|    362K|00:00:00.01 |     962 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2008-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200112 AND 200201;

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     104 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     104 |   
|*  2 |   INDEX RANGE SCAN| IND_B_ID |      1 |  39391 |  45260 |00:00:00.01 |     104 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("ID">=200112 AND "ID"<=200201)         

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200112,'YYYYMM') AND TO_DATE(200201,'YYYYMM');

----------------------------------------------------------------------------------------   
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |     122 |   
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |     122 |   
|*  2 |   INDEX RANGE SCAN| IND_B_DT |      1 |  39391 |  45260 |00:00:00.01 |     122 |   
----------------------------------------------------------------------------------------   

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - access("DT">=TO_DATE(' 2001-12-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')              
              AND "DT"<=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))         

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE ID BETWEEN 200612 AND 200901;

--------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |      1 |00:00:00.75 |   22345 |
|   1 |  SORT AGGREGATE       |          |      1 |      1 |      1 |00:00:00.75 |   22345 |
|*  2 |   INDEX FAST FULL SCAN| IND_B_ID |      1 |   4053K|   4054K|00:00:00.22 |   22345 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("ID">=200612 AND "ID"<=200901))       

SQL>
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM B WHERE DT BETWEEN TO_DATE(200612,'YYYYMM') AND TO_DATE(200901,'YYYYMM');

-------------------------------------------------------------------------------------      
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |      
-------------------------------------------------------------------------------------      
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.47 |   24950 |      
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.47 |   24950 |      
|*  2 |   TABLE ACCESS FULL| B    |      1 |   4053K|   4054K|00:00:00.12 |   24950 |      
-------------------------------------------------------------------------------------      

Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("DT">=TO_DATE(' 2006-12-01 00:00:00', 'syyyy-mm-dd                          
              hh24:mi:ss') AND "DT"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd          
              hh24:mi:ss')))

As the above indicates, with a 254 bucket histogram on both of the columns the optimizer calculates estimated row counts that are typically very close to the actual row counts for both datatypes – essentially the only difference is the number of logical reads. So, adding the histogram helps, but then what if the OP implements good coding standards and uses bind variables rather than constants (literals)?

—

So, which datatype would you choose, and why?

Comments : 7 Comments »

Categories : Histogram, SQL

	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Huge Presentation – Work…
	Dan on Huge Presentation – Work…
	Charles Hooper on Review of Synology DS415+, How…
	Matthias Clausen on Review of Synology DS415+, How…
	Charles Hooper on _SMALL_TABLE_THRESHOLD Paramet…
	mohd atif on _SMALL_TABLE_THRESHOLD Paramet…
	Charles Hooper on Free ANSI SQL to Oracle Specif…
	Kalin Agrawal on Free ANSI SQL to Oracle Specif…
	Charles Hooper on Large Page Support or Not…
	vaurob on Large Page Support or Not…
	Locating Oracle… on Finding a New Home for a Clien…

Charles Hooper's Oracle Notes

DATE Datatype Or NUMBER Datatype – Which Should be Used?

Categories

Recent Articles

Misc. Articles

Popular Articles

Recent Comments

Blogroll

Hints for Posting Code Sections in Comments