Are You an Invalid, or Do You Just Not Want to Work?


December 1, 2010

Another recent blog article forced me to Stop, Think, … and just about understand (in case you are wondering about this article's title, it refers to the definition of the word invalid).

Consider the following table definition:

CREATE TABLE T3(
  V1 VARCHAR2(10),
  D2 DATE,
  N3 NUMBER);

INSERT INTO T3 VALUES(
  CHR(65),
  TRUNC(SYSDATE)-65,
  65);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3')

SELECT
  *
FROM
  T3;

V1         D2                N3
---------- --------- ----------
A          27-SEP-10         65 

Which of the following are valid SQL statements for the above table (E is not an option)?

A:

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1; 

B:

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=TO_NUMBER('A'); 

C:

SELECT
  *
FROM
  T3 WHERE TO_NUMBER(V1)=1
  AND ROWNUM=2; 

D:

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1
  AND 1=2; 

E:

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1
  AND 1=2; 

F:

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '27-SEP-2010'; 

G:

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '32-SEP-2010'; 

H:

SELECT
  *
FROM
  T3
WHERE
  D2 = '32-SEP-2010'
  AND 1=2; 

I:

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3; 

J:

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3
  AND TO_CHAR(1)='2'; 

K:

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3
  AND 1=2;

L:

SELECT
  *
FROM
  T3
WHERE
  N3 = 'A'; 

M:

SELECT
  *
FROM
  T3
WHERE
  N3 = '27-SEP-2010'; 

N:

SELECT
  *
FROM
  T3
WHERE
  N3 = '32-SEP-2010'; 

——————-

Stop and think about it for a moment: which of the above are valid SQL statements?

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

We could use the AUTOTRACE feature in SQL*Plus to tell which are valid, and which are not:

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1;

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=TO_NUMBER('A');

SELECT
  *
FROM
  T3 WHERE TO_NUMBER(V1)=1
  AND ROWNUM=2;

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1
  AND 1=2;

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '27-SEP-2010';

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '32-SEP-2010';

SELECT
  *
FROM
  T3
WHERE
  D2 = '32-SEP-2010'
  AND 1=2;

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3;

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3
  AND TO_CHAR(1)='2';

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3
  AND 1=2;

SELECT
  *
FROM
  T3
WHERE
  N3 = 'A';

SELECT
  *
FROM
  T3
WHERE
  N3 = '27-SEP-2010';

SELECT
  *
FROM
  T3
WHERE
  N3 = '32-SEP-2010'; 

The results might look like this:

A:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_NUMBER(V1)=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER("V1")=1) 

B:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_NUMBER(V1)=TO_NUMBER('A');

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER("V1")=TO_NUMBER('A'))

C:

SQL> SELECT
  2    *
  3  FROM
  4    T3 WHERE TO_NUMBER(V1)=1
  5    AND ROWNUM=2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1538339754

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  COUNT              |      |       |       |            |          |
|*  2 |   FILTER            |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM=2)
   3 - filter(TO_NUMBER("V1")=1) 

D:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_NUMBER(V1)=1
  7    AND 1=2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3859223164

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     0   (0)|          |
|*  1 |  FILTER            |      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)
   2 - filter(TO_NUMBER("V1")=1)  

F:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = '27-SEP-2010';

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_DATE(INTERNAL_FUNCTION("D2"))=TO_DATE(' 2010-09-27
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 

G:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = '32-SEP-2010';

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_DATE(INTERNAL_FUNCTION("D2"))='32-SEP-2010') 

H:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    D2 = '32-SEP-2010'
  7    AND 1=2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3859223164

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |     0   (0)|          |
|*  1 |  FILTER            |      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)
   2 - filter("D2"='32-SEP-2010') 

I:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = N3;
  TO_DATE(D2) = N3
              *
ERROR at line 6:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER 

J:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = N3
  7    AND TO_CHAR(1)='2';
  TO_DATE(D2) = N3
              *
ERROR at line 6:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER 

K:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = N3
  7    AND 1=2;
  TO_DATE(D2) = N3
              *
ERROR at line 6:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER 

L:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    N3 = 'A';

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N3"=TO_NUMBER('A')) 

M:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    N3 = '27-SEP-2010';

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N3"=TO_NUMBER('27-SEP-2010')) 

N:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    N3 = '32-SEP-2010';

Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |     1 |    13 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N3"=TO_NUMBER('32-SEP-2010')) 

So, which SQL statements are valid, and which are not?  Did you guess correctly?  Are you surprised, or still not sure?
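Keep in mind that SET AUTOTRACE TRACEONLY EXPLAIN never actually executes a SELECT statement: SQL*Plus simply parses the statement and runs EXPLAIN PLAN for it behind the scenes.  A minimal sketch of the equivalent check without AUTOTRACE (assuming the standard PLAN_TABLE is available):

EXPLAIN PLAN FOR
SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

A plan is produced for every statement that survives the parse, which tells us nothing about what will happen when rows are actually fetched.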

How about if we do this just to confirm:

SET AUTOTRACE OFF
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST_SQL_OK';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1;

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=TO_NUMBER('A');

SELECT
  *
FROM
  T3 WHERE TO_NUMBER(V1)=1
  AND ROWNUM=2;

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1
  AND 1=2;

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '27-SEP-2010';

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '32-SEP-2010';

SELECT
  *
FROM
  T3
WHERE
  D2 = '32-SEP-2010'
  AND 1=2;

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3;

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3
  AND TO_CHAR(1)='2';

SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = N3
  AND 1=2;

SELECT
  *
FROM
  T3
WHERE
  N3 = 'A';

SELECT
  *
FROM
  T3
WHERE
  N3 = '27-SEP-2010';

SELECT
  *
FROM
  T3
WHERE
  N3 = '32-SEP-2010';

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'; 

The output looks like this:

A:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_NUMBER(V1)=1;
  TO_NUMBER(V1)=1
  *
ERROR at line 6:
ORA-01722: invalid number 

B:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_NUMBER(V1)=TO_NUMBER('A');
  TO_NUMBER(V1)=TO_NUMBER('A')
  *
ERROR at line 6:
ORA-01722: invalid number 

C:

SQL> SELECT
  2    *
  3  FROM
  4    T3 WHERE TO_NUMBER(V1)=1
  5    AND ROWNUM=2;
  T3 WHERE TO_NUMBER(V1)=1
           *
ERROR at line 4:
ORA-01722: invalid number 

D:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_NUMBER(V1)=1
  7    AND 1=2;

no rows selected 

F:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = '27-SEP-2010';

V1         D2                N3
---------- --------- ----------
A          27-SEP-10         65 

G:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = '32-SEP-2010';
  TO_DATE(D2) = '32-SEP-2010'
                *
ERROR at line 6:
ORA-01847: day of month must be between 1 and last day of month 

H:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    D2 = '32-SEP-2010'
  7    AND 1=2;

no rows selected 

I:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = N3;
  TO_DATE(D2) = N3
              *
ERROR at line 6:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER 

J:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = N3
  7    AND TO_CHAR(1)='2';
  TO_DATE(D2) = N3
              *
ERROR at line 6:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER 

K:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    TO_DATE(D2) = N3
  7    AND 1=2;
  TO_DATE(D2) = N3
              *
ERROR at line 6:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER 

L:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    N3 = 'A';
  N3 = 'A'
       *
ERROR at line 6:
ORA-01722: invalid number 

M:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    N3 = '27-SEP-2010';
  N3 = '27-SEP-2010'
       *
ERROR at line 6:
ORA-01722: invalid number 

N:

SQL> SELECT
  2    *
  3  FROM
  4    T3
  5  WHERE
  6    N3 = '32-SEP-2010';
  N3 = '32-SEP-2010'
       *
ERROR at line 6:
ORA-01722: invalid number 

So, which SQL statements are invalid?  Last chance.

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

—-

Good idea; let’s take a look inside the 10046 trace file to see if it provides any critical clues:

WAIT #6: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208311587
WAIT #6: nam='SQL*Net message from client' ela= 409 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208313312
CLOSE #6:c=0,e=0,dep=0,type=1,tim=106208312194
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208313394
WAIT #0: nam='SQL*Net message from client' ela= 1137 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208314551
=====================
PARSING IN CURSOR #2 len=44 dep=0 uid=185 oct=3 lid=185 tim=106208312194 hv=2056319931 ad='1e367f68' sqlid='94uqsttx91wxv'
SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208314830
FETCH #2:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #2: nam='SQL*Net break/reset to client' ela= 3 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208315213
WAIT #2: nam='SQL*Net break/reset to client' ela= 124 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208315356
CLOSE #2:c=0,e=0,dep=0,type=0,tim=106208312194
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208315926
WAIT #0: nam='SQL*Net message from client' ela= 1229 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208317175
=====================
PARSING IN CURSOR #4 len=57 dep=0 uid=185 oct=3 lid=185 tim=106208312194 hv=4036914934 ad='1e3647e4' sqlid='b34gs4zs9wvrq'
SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=TO_NUMBER('A')
END OF STMT
PARSE #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
WAIT #4: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208317422
FETCH #4:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #4: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208317617
WAIT #4: nam='SQL*Net break/reset to client' ela= 125 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208317761
CLOSE #4:c=0,e=0,dep=0,type=0,tim=106208312194
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208318310
WAIT #0: nam='SQL*Net message from client' ela= 1084 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208319413
=====================
PARSING IN CURSOR #8 len=57 dep=0 uid=185 oct=3 lid=185 tim=106208312194 hv=3419871603 ad='1e3641f4' sqlid='2g02uyv5xf6bm'
SELECT
  *
FROM
  T3 WHERE TO_NUMBER(V1)=1
  AND ROWNUM=2
END OF STMT
PARSE #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1538339754,tim=106208312194
EXEC #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1538339754,tim=106208312194
WAIT #8: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208319705
FETCH #8:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=1538339754,tim=106208312194
STAT #8 id=1 cnt=0 pid=0 pos=1 obj=0 op='COUNT  (cr=0 pr=0 pw=0 time=0 us)'
STAT #8 id=2 cnt=0 pid=1 pos=1 obj=0 op='FILTER  (cr=0 pr=0 pw=0 time=0 us)'
STAT #8 id=3 cnt=0 pid=2 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #8: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208319939
WAIT #8: nam='SQL*Net break/reset to client' ela= 112 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208320070
CLOSE #8:c=0,e=0,dep=0,type=0,tim=106208312194
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208320807
WAIT #0: nam='SQL*Net message from client' ela= 1240 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208322066
=====================
PARSING IN CURSOR #3 len=54 dep=0 uid=185 oct=3 lid=185 tim=106208312194 hv=1521821591 ad='1e363c04' sqlid='3ty5saddba9wr'
SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(V1)=1
  AND 1=2
END OF STMT
PARSE #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3859223164,tim=106208312194
EXEC #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3859223164,tim=106208312194
WAIT #3: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208322296
FETCH #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3859223164,tim=106208312194
STAT #3 id=1 cnt=0 pid=0 pos=1 obj=0 op='FILTER  (cr=0 pr=0 pw=0 time=0 us)'
STAT #3 id=2 cnt=0 pid=1 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #3: nam='SQL*Net message from client' ela= 366 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208322774
CLOSE #3:c=0,e=0,dep=0,type=1,tim=106208312194
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208322830
WAIT #0: nam='SQL*Net message from client' ela= 1249 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208324098
=====================
PARSING IN CURSOR #2 len=56 dep=0 uid=185 oct=3 lid=185 tim=106208312194 hv=1963953631 ad='1e3634ec' sqlid='du7hngjuhz3fz'
SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '27-SEP-2010'
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208326129
FETCH #2:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=1,dep=0,og=1,plh=4161002650,tim=106208312194
WAIT #2: nam='SQL*Net message from client' ela= 182 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208326423
FETCH #2:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208312194
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=7 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208326523
WAIT #2: nam='SQL*Net message from client' ela= 411 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208327021
CLOSE #2:c=0,e=0,dep=0,type=0,tim=106208312194
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208327083
WAIT #0: nam='SQL*Net message from client' ela= 28084 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208355185
=====================
PARSING IN CURSOR #4 len=56 dep=0 uid=185 oct=3 lid=185 tim=106208339668 hv=1226802804 ad='1e362dd4' sqlid='1hqqd0x4jz1mn'
SELECT
  *
FROM
  T3
WHERE
  TO_DATE(D2) = '32-SEP-2010'
END OF STMT
PARSE #4:c=15625,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208339668
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208339668
WAIT #4: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208355455
FETCH #4:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208339668
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #4: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208355669
WAIT #4: nam='SQL*Net break/reset to client' ela= 134 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208355822
CLOSE #4:c=0,e=0,dep=0,type=0,tim=106208339668
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208356681
WAIT #0: nam='SQL*Net message from client' ela= 1259 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208357960
=====================
PARSING IN CURSOR #8 len=57 dep=0 uid=185 oct=3 lid=185 tim=106208339668 hv=2831431159 ad='1e3627e4' sqlid='3u7tx3knc8dgr'
SELECT
  *
FROM
  T3
WHERE
  D2 = '32-SEP-2010'
  AND 1=2
END OF STMT
PARSE #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3859223164,tim=106208339668
EXEC #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3859223164,tim=106208339668
WAIT #8: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208358205
FETCH #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3859223164,tim=106208339668
STAT #8 id=1 cnt=0 pid=0 pos=1 obj=0 op='FILTER  (cr=0 pr=0 pw=0 time=0 us)'
STAT #8 id=2 cnt=0 pid=1 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #8: nam='SQL*Net message from client' ela= 439 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208358756
CLOSE #8:c=0,e=0,dep=0,type=0,tim=106208339668
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208358815
WAIT #0: nam='SQL*Net message from client' ela= 1489 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208360322
WAIT #3: nam='SQL*Net break/reset to client' ela= 5 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208363231
WAIT #3: nam='SQL*Net break/reset to client' ela= 133 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208363398
WAIT #3: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208363421
WAIT #3: nam='SQL*Net message from client' ela= 701 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208364163
CLOSE #3:c=0,e=0,dep=0,type=0,tim=106208339668
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208364233
WAIT #0: nam='SQL*Net message from client' ela= 1322 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208365574
WAIT #6: nam='SQL*Net break/reset to client' ela= 3 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208366129
WAIT #6: nam='SQL*Net break/reset to client' ela= 126 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208366285
WAIT #6: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208366308
WAIT #6: nam='SQL*Net message from client' ela= 679 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208367019
CLOSE #6:c=0,e=0,dep=0,type=0,tim=106208339668
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208367086
WAIT #0: nam='SQL*Net message from client' ela= 1229 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208368340
WAIT #2: nam='SQL*Net break/reset to client' ela= 4 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208368838
WAIT #2: nam='SQL*Net break/reset to client' ela= 122 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208368989
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208369011
WAIT #2: nam='SQL*Net message from client' ela= 679 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208369721
CLOSE #2:c=0,e=0,dep=0,type=0,tim=106208339668
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208369786
WAIT #0: nam='SQL*Net message from client' ela= 1010 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208370816
=====================
PARSING IN CURSOR #4 len=37 dep=0 uid=185 oct=3 lid=185 tim=106208339668 hv=1569184524 ad='1e361d0c' sqlid='54544w5fsgqsc'
SELECT
  *
FROM
  T3
WHERE
  N3 = 'A'
END OF STMT
PARSE #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208339668
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
WAIT #4: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208371121
FETCH #4:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #4: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208371337
WAIT #4: nam='SQL*Net break/reset to client' ela= 108 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208371463
CLOSE #4:c=0,e=0,dep=0,type=0,tim=106208370919
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208372094
WAIT #0: nam='SQL*Net message from client' ela= 1072 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208373186
=====================
PARSING IN CURSOR #8 len=47 dep=0 uid=185 oct=3 lid=185 tim=106208370919 hv=2630877256 ad='1e36173c' sqlid='5xks2pufd0028'
SELECT
  *
FROM
  T3
WHERE
  N3 = '27-SEP-2010'
END OF STMT
PARSE #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
EXEC #8:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
WAIT #8: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208373423
FETCH #8:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
STAT #8 id=1 cnt=0 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #8: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208373612
WAIT #8: nam='SQL*Net break/reset to client' ela= 102 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208373732
CLOSE #8:c=0,e=0,dep=0,type=0,tim=106208370919
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208374372
WAIT #0: nam='SQL*Net message from client' ela= 1083 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208375474
=====================
PARSING IN CURSOR #3 len=47 dep=0 uid=185 oct=3 lid=185 tim=106208370919 hv=1567527007 ad='1e36116c' sqlid='c5w7qdxfqx42z'
SELECT
  *
FROM
  T3
WHERE
  N3 = '32-SEP-2010'
END OF STMT
PARSE #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
EXEC #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
WAIT #3: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208375702
FETCH #3:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=0,dep=0,og=1,plh=4161002650,tim=106208370919
STAT #3 id=1 cnt=0 pid=0 pos=1 obj=104348 op='TABLE ACCESS FULL T3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=13 card=1)'
WAIT #3: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=106208375886
WAIT #3: nam='SQL*Net break/reset to client' ela= 102 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=106208376006
CLOSE #3:c=0,e=0,dep=0,type=0,tim=106208370919
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208376714
WAIT #0: nam='SQL*Net message from client' ela= 719 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=106208377453
PARSE #6:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=106208370919
EXEC #6:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=106208370919 

It is a little painful that, for some of the SQL statements, not even the PARSE call completed; as a result, those SQL statements were never written to the 10046 trace file.
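One way to confirm that statements I, J, and K fail during the parse phase itself, rather than during execution or fetch, is to hand one of them to DBMS_SQL.PARSE; a minimal sketch (assuming SET SERVEROUTPUT ON so the error text is displayed):

DECLARE
  C INTEGER := DBMS_SQL.OPEN_CURSOR;
BEGIN
  /* The ORA-00932 error should be raised here, before any EXECUTE or FETCH call */
  DBMS_SQL.PARSE(C, 'SELECT * FROM T3 WHERE TO_DATE(D2) = N3', DBMS_SQL.NATIVE);
  DBMS_SQL.CLOSE_CURSOR(C);
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE(SQLERRM);
    DBMS_SQL.CLOSE_CURSOR(C);
END;
/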

Maybe I should have clarified what I was asking for?  When we say that a SQL statement is invalid, do we mean:

  • It generates an error message when it is executed.
  • It generates an error message when it is parsed.
  • It generates no error message, yet is logically invalid due to impossible values: 1=2; September 32, 2010; TO_NUMBER('A').
  • Something else?
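Whichever definition applies, statements like A only run cleanly when a predicate such as 1=2 short-circuits the evaluation, or when no offending row is examined.  If a VARCHAR2 column like V1 may contain non-numeric values, one defensive rewrite (a sketch, not the only option, requiring the regular expression support added in 10.1) guards the conversion so that TO_NUMBER only ever sees digits:

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(CASE WHEN REGEXP_LIKE(V1, '^[0-9]+$') THEN V1 END) = 1;

The CASE expression returns NULL for non-numeric values of V1, so the comparison quietly evaluates to NULL for those rows instead of raising ORA-01722.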




I ORDERED a Hint – Why Won’t You Listen to Me?


November 29, 2010 (Updated November 30, 2010)

I recently read a blog article that forced me to Stop, Think, … Understand (which happens to be the subtitle of this blog, so it is worth mentioning).  The blog article discussed an error found in the Oracle Database documentation regarding how to read an execution plan – more specifically, how to find the first operation in the execution plan.  A couple of months ago I had the opportunity to attend one of Tanel Poder’s presentations, which I believe was titled “Back to Basics – Choosing the Starting Point of Performance Tuning and Troubleshooting Wisely.”  In the presentation Tanel remarked that there is an error in the documentation (quoted in the referenced blog article), which essentially states that the first operation executed in an execution plan is the operation that is indented the most to the right.  Tanel claimed (paraphrasing, of course) that the first operation executed is actually the first operation, reading from the top of the execution plan, that has no child operations.  An interesting claim, one that I had no doubt was correct, but one that I had never verified.  I had that chance when reviewing the referenced blog article, where one of the comments smartly suggested testing with the ORDERED hint and examining the execution plan.  That seemed like a very good idea, and I was initially shocked to find that Tanel’s claim appeared to be incorrect.  But wait, we can’t stop yet.  What else can we test to see if Tanel is right (he is, of course – see the referenced blog article for the reason)?  How about a 10046 trace with the ORDERED hint?  But then I started seeing something strange.  Below is a quick script that uses the test tables that were created in an earlier blog article:

SET AUTOTRACE TRACEONLY STATISTICS
SET ARRAYSIZE 1000

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ORDERED_POL_PO_P';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT /*+ ORDERED */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PO_LINE POL,
  PO_HEADER PO,
  PARTS P
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ORDERED_POL_P_PO';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT /*+ ORDERED */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PO_LINE POL,
  PARTS P,
  PO_HEADER PO
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ORDERED_P_POL_PO';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT /*+ ORDERED */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PARTS P,
  PO_LINE POL,
  PO_HEADER PO
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ORDERED_P_PO_POL';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT /*+ ORDERED */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PARTS P,
  PO_HEADER PO,
  PO_LINE POL
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM;

SELECT SYSDATE FROM DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'; 

So, what is strange about the 10046 trace file?  Let’s take a look at the TKPROF output that was generated by Oracle Database 11.2.0.1 for the first SQL statement (color coded to help see the tables in the execution plan):

SELECT /*+ ORDERED */
FROM
  PO_LINE POL,
  PO_HEADER PO,
  PARTS P

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
      1855       1855       1855  HASH GROUP BY (cr=267225 pr=267202 pw=0 time=397 us cost=21234 size=105735 card=1855)
      5935       5935       5935   HASH JOIN  (cr=267225 pr=267202 pw=0 time=757 us cost=21233 size=338295 card=5935)
      5935       5935       5935    VIEW  VW_GBC_9 (cr=262229 pr=262213 pw=0 time=1641 us cost=20837 size=213660 card=5935)
      5935       5935       5935     HASH GROUP BY (cr=262229 pr=262213 pw=0 time=757 us cost=20837 size=308620 card=5935)
    329789     329789     329789      FILTER  (cr=262229 pr=262213 pw=0 time=4290315 us)
    329789     329789     329789       HASH JOIN  (cr=262229 pr=262213 pw=0 time=4248980 us cost=20823 size=17880824 card=343862)
     13890      13890      13890        TABLE ACCESS FULL PO_HEADER (cr=13173 pr=13163 pw=0 time=552501 us cost=1068 size=388920 card=13890)
  12205347   12205347   12205347        TABLE ACCESS FULL PO_LINE (cr=249056 pr=249050 pw=0 time=2448109 us cost=19697 size=292928328 card=12205347)
     99694      99694      99694    TABLE ACCESS FULL PARTS (cr=4996 pr=4989 pw=0 time=43511 us cost=394 size=2093574 card=99694) 

In the above, Oracle builds an in-memory hash table from the rows of the PO_HEADER table, and that hash table is then probed by the 12.2 million rows from the PO_LINE table; eventually the 5,935 rows resulting from the join of PO_HEADER and PO_LINE are joined to the PARTS table.  So, other than the order of PO_HEADER and PO_LINE being swapped, the ORDERED hint seems to have controlled the order in which the tables were accessed.
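As a side note, rather than decoding the indentation by hand, the join order that the optimizer actually settled on can also be read from the outline data for the cursor; a sketch, assuming the query was just executed in the same session and that this release accepts the +OUTLINE format modifier of DBMS_XPLAN:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'TYPICAL +OUTLINE'));

The LEADING entry in the Outline Data section spells out the join order without any guesswork.  Let’s take a look at the next query: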

SELECT /*+ ORDERED */
FROM
  PO_LINE POL,
  PARTS P,
  PO_HEADER PO

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
      1855       1855       1855  HASH GROUP BY (cr=267225 pr=317918 pw=50685 time=529 us cost=26617 size=135415 card=1855)
    329789     329789     329789   FILTER  (cr=267225 pr=317918 pw=50685 time=3703300 us)
    329789     329789     329789    HASH JOIN  (cr=267225 pr=317918 pw=50685 time=3669899 us cost=26602 size=25101926 card=343862)
     13890      13890      13890     TABLE ACCESS FULL PO_HEADER (cr=13173 pr=13163 pw=0 time=546640 us cost=1068 size=388920 card=13890)
  12205347   12205347   12205347     HASH JOIN  (cr=254052 pr=304755 pw=50685 time=3243495 us cost=25476 size=549240615 card=12205347)
  12205347   12205347   12205347      TABLE ACCESS FULL PO_LINE (cr=249056 pr=249050 pw=0 time=1831025 us cost=19697 size=292928328 card=12205347)
     99694      99694      99694      TABLE ACCESS FULL PARTS (cr=4996 pr=4989 pw=0 time=50678 us cost=394 size=2093574 card=99694) 

The above might give us a little trouble.  It looks like Oracle builds a hash table from the 12.2 million rows in the PO_LINE table (note the 50,685 blocks written to temp), and that hash table is then probed by the nearly 100,000 rows from the PARTS table, with the result hash joined to the PO_HEADER table.  So, the ORDERED hint seems to have exactly controlled the order in which the tables were accessed (but wait, something is wrong, if only I knew how to read the raw 10046 trace file).  Let’s check the next query’s TKPROF output:

SELECT /*+ ORDERED */
FROM
  PARTS P,
  PO_LINE POL,
  PO_HEADER PO

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
      1855       1855       1855  HASH GROUP BY (cr=267225 pr=267202 pw=0 time=397 us cost=26588 size=135415 card=1855)
    329789     329789     329789   FILTER  (cr=267225 pr=267202 pw=0 time=6561463 us)
    329789     329789     329789    HASH JOIN  (cr=267225 pr=267202 pw=0 time=6525886 us cost=26574 size=25101926 card=343862)
     13890      13890      13890     TABLE ACCESS FULL PO_HEADER (cr=13173 pr=13163 pw=0 time=532624 us cost=1068 size=388920 card=13890)
  12205347   12205347   12205347     HASH JOIN  (cr=254052 pr=254039 pw=0 time=6233040 us cost=25447 size=549240615 card=12205347)
     99694      99694      99694      TABLE ACCESS FULL PARTS (cr=4996 pr=4989 pw=0 time=42487 us cost=394 size=2093574 card=99694)
  12205347   12205347   12205347      TABLE ACCESS FULL PO_LINE (cr=249056 pr=249050 pw=0 time=1623411 us cost=19697 size=292928328 card=12205347)

The join order is PARTS -> PO_LINE -> PO_HEADER, just like what we ORDERED (but wait, what does that 10046 trace file show again?).  Let’s take a look at the final query’s TKPROF output:

SELECT /*+ ORDERED */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PARTS P,
  PO_HEADER PO,
  PO_LINE POL

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
      1855       1855       1855  HASH GROUP BY (cr=267225 pr=284314 pw=17112 time=794 us cost=78881 size=109445 card=1855)
    103059     103059     103059   MERGE JOIN  (cr=267225 pr=284314 pw=17112 time=2531825 us cost=78877 size=6031275 card=102225)
   3628420    3628420    3628420    SORT JOIN (cr=254052 pr=271151 pw=17112 time=930143 us cost=77807 size=112482384 card=3628464)
   3628464    3628464    3628464     VIEW  VW_GBC_9 (cr=254052 pr=257294 pw=3255 time=1181413 us cost=77807 size=112482384 card=3628464)
   3628464    3628464    3628464      HASH GROUP BY (cr=254052 pr=257294 pw=3255 time=702832 us cost=77807 size=163280880 card=3628464)
  12205347   12205347   12205347       FILTER  (cr=254052 pr=254039 pw=0 time=7536710 us)
  12205347   12205347   12205347        HASH JOIN  (cr=254052 pr=254039 pw=0 time=6162000 us cost=25447 size=549240615 card=12205347)
     99694      99694      99694         TABLE ACCESS FULL PARTS (cr=4996 pr=4989 pw=0 time=45175 us cost=394 size=2093574 card=99694)
  12205347   12205347   12205347         TABLE ACCESS FULL PO_LINE (cr=249056 pr=249050 pw=0 time=1711986 us cost=19697 size=292928328 card=12205347)
    103059     103059     103059    SORT JOIN (cr=13173 pr=13163 pw=0 time=0 us cost=1070 size=388920 card=13890)
     13890      13890      13890     TABLE ACCESS FULL PO_HEADER (cr=13173 pr=13163 pw=0 time=550208 us cost=1068 size=388920 card=13890) 

The join order is PARTS -> PO_LINE -> PO_HEADER.  What?  I thought that I requested, no DEMANDED (hints are directives after all) that we join in this order:
PARTS -> PO_HEADER -> PO_LINE

I guess that I need to check the documentation for the ORDERED hint: 

“The ORDERED hint instructs Oracle to join tables in the order in which they appear in the FROM clause. Oracle recommends that you use the LEADING hint, which is more versatile than the ORDERED hint.

When you omit the ORDERED hint from a SQL statement requiring a join, the optimizer chooses the order in which to join the tables. You might want to use the ORDERED hint to specify a join order if you know something that the optimizer does not know about the number of rows selected from each table. Such information lets you choose an inner and outer table better than the optimizer could.”

OK, so the documentation confirms that the ORDERED hint controls the join order, not the order in which the tables are accessed.  So, how can we explain why the optimizer did not join the tables in the order that we specified?  We could create a 10053 trace file for the query:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ORDERED_P_PO_POL_FIND_ME';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /*+ ORDERED */ /* FIND_ME */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PARTS P,
  PO_HEADER PO,
  PO_LINE POL
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM;
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

Taking a look inside the 10053 trace file, we find the execution plan with the Predicate Information section:

============
Plan Table
============
---------------------------------------------+-----------------------------------+
| Id  | Operation                 | Name     | Rows  | Bytes | Cost  | Time      |
---------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT          |          |       |       |  125K |           |
| 1   |  HASH GROUP BY            |          |  168K | 9936K |  125K |  00:17:04 |
| 2   |   MERGE JOIN              |          |  548K |   32M |  122K |  00:17:39 |
| 3   |    SORT JOIN              |          | 9629K |  292M |  121K |  00:16:29 |
| 4   |     VIEW                  | VW_GBC_9 | 9629K |  292M |  121K |  00:16:29 |
| 5   |      HASH GROUP BY        |          | 9629K |  423M |  121K |  00:16:29 |
| 6   |       FILTER              |          |       |       |       |           |
| 7   |        HASH JOIN          |          |   12M |  524M |   25K |  00:03:24 |
| 8   |         TABLE ACCESS FULL | PARTS    |   97K | 2045K |   394 |  00:00:04 |
| 9   |         TABLE ACCESS FULL | PO_LINE  |   12M |  279M |   19K |  00:03:38 |
| 10  |    SORT JOIN              |          |   27K |  767K |  1251 |  00:00:11 |
| 11  |     TABLE ACCESS FULL     | PO_HEADER|   27K |  767K |  1068 |  00:00:09 |
---------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
6 - filter(TRUNC(SYSDATE@!-90)<=TRUNC(SYSDATE@!))
7 - access("POL"."PART_ID"="P"."PART_ID")
10 - access("PO"."PURC_ORDER_ID"="ITEM_1")
10 - filter("PO"."PURC_ORDER_ID"="ITEM_1")
11 - filter(("PO"."ORDER_DATE"<=TRUNC(SYSDATE@!) AND "PO"."ORDER_DATE">=TRUNC(SYSDATE@!-90))) 

ITEM_1?  Someone’s been rewriting my SQL statement?  Let’s look a little further up the trace file (the query text printed in the 10053 trace file has been reformatted with extra line breaks and spaces):

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT /*+ ORDERED */
  "PO"."VENDOR_ID" "VENDOR_ID",
  "VW_GBC_9"."ITEM_3" "PRODUCT_CODE",
  "VW_GBC_9"."ITEM_4" "STOCK_UM",
  SUM("VW_GBC_9"."ITEM_2") "ORDER_QTY"
FROM
  (SELECT
     "POL"."PURC_ORDER_ID" "ITEM_1",
     SUM("POL"."ORDER_QTY") "ITEM_2",
     "P"."PRODUCT_CODE" "ITEM_3",
     "P"."STOCK_UM" "ITEM_4"
   FROM
     "TESTUSER"."PO_LINE" "POL",
     "TESTUSER"."PARTS" "P"
   WHERE
     "POL"."PART_ID"="P"."PART_ID"
      AND TRUNC(SYSDATE@!-90)<=TRUNC(SYSDATE@!)
   GROUP BY
    "POL"."PURC_ORDER_ID","P"."PRODUCT_CODE","P"."STOCK_UM") "VW_GBC_9",
  "TESTUSER"."PO_HEADER" "PO"
WHERE
  "PO"."ORDER_DATE">=TRUNC(SYSDATE@!-90)
  AND "PO"."ORDER_DATE"<=TRUNC(SYSDATE@!)
  AND "PO"."PURC_ORDER_ID"="VW_GBC_9"."ITEM_1"
GROUP BY
  "PO"."VENDOR_ID",
  "VW_GBC_9"."ITEM_3",
  "VW_GBC_9"."ITEM_4"
kkoqbc: optimizing query block SEL$6E4193EC (#2) 

Well, I guess that explains why my ORDERED hint did not behave as expected.  Darn optimizer completely rewrote the SQL statement before the ORDERED hint was applied.  Now I am starting to wonder how this query might perform in Oracle Database 10.2.0.5, since the hinted query was affected by an automatic transformation performed by the optimizer.
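As the documentation suggests, the LEADING hint is the more versatile way to request a join order, since it names the row sources explicitly rather than relying on the FROM clause of whatever statement the optimizer ends up optimizing.  A hedged sketch of the last test query rewritten with LEADING (whether this version survives the group-by placement transformation is summarized in the comments below):

SELECT /*+ LEADING(P PO POL) */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PARTS P,
  PO_HEADER PO,
  PO_LINE POL
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM;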

————-

Now, back to demonstrate that Tanel’s claim is correct.  First, let’s find the OBJECT_IDs for the tables involved in the query:

SQL> SELECT OBJECT_NAME FROM DBA_OBJECTS WHERE OBJECT_ID=82052;

OBJECT_NAME
---------------------------------------------------------------
PARTS

SQL> SELECT OBJECT_NAME FROM DBA_OBJECTS WHERE OBJECT_ID=82062;

OBJECT_NAME
---------------------------------------------------------------
PO_HEADER

SQL> SELECT OBJECT_NAME FROM DBA_OBJECTS WHERE OBJECT_ID=82069;

OBJECT_NAME
---------------------------------------------------------------
PO_LINE 

From the above, we see that PARTS=82052, PO_HEADER=82062, and PO_LINE=82069.  Now what?  Let’s check the raw 10046 trace file for the second of our test queries that generated the following TKPROF output:

SELECT /*+ ORDERED */
FROM
  PO_LINE POL,
  PARTS P,
  PO_HEADER PO

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
      1855       1855       1855  HASH GROUP BY (cr=267225 pr=317918 pw=50685 time=529 us cost=26617 size=135415 card=1855)
    329789     329789     329789   FILTER  (cr=267225 pr=317918 pw=50685 time=3703300 us)
    329789     329789     329789    HASH JOIN  (cr=267225 pr=317918 pw=50685 time=3669899 us cost=26602 size=25101926 card=343862)
     13890      13890      13890     TABLE ACCESS FULL PO_HEADER (cr=13173 pr=13163 pw=0 time=546640 us cost=1068 size=388920 card=13890)
  12205347   12205347   12205347     HASH JOIN  (cr=254052 pr=304755 pw=50685 time=3243495 us cost=25476 size=549240615 card=12205347)
  12205347   12205347   12205347      TABLE ACCESS FULL PO_LINE (cr=249056 pr=249050 pw=0 time=1831025 us cost=19697 size=292928328 card=12205347)
     99694      99694      99694      TABLE ACCESS FULL PARTS (cr=4996 pr=4989 pw=0 time=50678 us cost=394 size=2093574 card=99694)  

------

PARSING IN CURSOR #3 len=338 dep=0 uid=286 oct=3 lid=286 tim=4471668209 hv=2860598711 ad='469666838' sqlid='2fhf2v2p82jdr'
SELECT /*+ ORDERED */
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM,
  SUM(POL.ORDER_QTY) ORDER_QTY
FROM
  PO_LINE POL,
  PARTS P,
  PO_HEADER PO
WHERE
  PO.ORDER_DATE BETWEEN TRUNC(SYSDATE-90) AND TRUNC(SYSDATE)
  AND PO.PURC_ORDER_ID=POL.PURC_ORDER_ID
  AND POL.PART_ID=P.PART_ID
GROUP BY
  PO.VENDOR_ID,
  P.PRODUCT_CODE,
  P.STOCK_UM
END OF STMT
PARSE #3:c=0,e=43,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4284856579,tim=4471668208
EXEC #3:c=0,e=47,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4284856579,tim=4471668337
WAIT #3: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=4471668363
WAIT #3: nam='db file sequential read' ela= 359 file#=1 block#=536704 blocks=1 obj#=82062 tim=4471668787
WAIT #3: nam='db file scattered read' ela= 596 file#=1 block#=536705 blocks=7 obj#=82062 tim=4471669513
WAIT #3: nam='db file scattered read' ela= 564 file#=1 block#=536752 blocks=8 obj#=82062 tim=4471670343
WAIT #3: nam='db file scattered read' ela= 470 file#=1 block#=536760 blocks=8 obj#=82062 tim=4471671092
WAIT #3: nam='db file scattered read' ela= 639 file#=1 block#=536768 blocks=8 obj#=82062 tim=4471671970
...
WAIT #3: nam='db file scattered read' ela= 408 file#=1 block#=558336 blocks=128 obj#=82062 tim=4472034957
WAIT #3: nam='db file scattered read' ela= 334 file#=1 block#=558464 blocks=107 obj#=82062 tim=4472036641
WAIT #3: nam='db file sequential read' ela= 357 file#=1 block#=96384 blocks=1 obj#=82069 tim=4472040086
WAIT #3: nam='direct path read' ela= 385 file number=1 first dba=96385 block cnt=7 obj#=82069 tim=4472040755
WAIT #3: nam='direct path read' ela= 186 file number=1 first dba=96456 block cnt=8 obj#=82069 tim=4472045786
WAIT #3: nam='direct path read' ela= 611 file number=1 first dba=96496 block cnt=24 obj#=82069 tim=4472046904
...
WAIT #3: nam='direct path read' ela= 1104 file number=1 first dba=1342848 block cnt=128 obj#=82069 tim=4478002620
WAIT #3: nam='direct path write temp' ela= 1601 file number=201 first dba=245981 block cnt=31 obj#=82069 tim=4478004794
WAIT #3: nam='direct path write temp' ela= 1985 file number=201 first dba=246016 block cnt=31 obj#=82069 tim=4478017466
WAIT #3: nam='direct path write temp' ela= 2423 file number=201 first dba=245725 block cnt=31 obj#=82069 tim=4478023049
WAIT #3: nam='direct path write temp' ela= 441 file number=201 first dba=245791 block cnt=31 obj#=82069 tim=4478027519
WAIT #3: nam='direct path write temp' ela= 2985 file number=201 first dba=245950 block cnt=31 obj#=82069 tim=4478030913
WAIT #3: nam='asynch descriptor resize' ela= 4 outstanding #aio=1 current aio limit=4294967295 new aio limit=511 obj#=82069 tim=4478044778
WAIT #3: nam='db file sequential read' ela= 28370 file#=1 block#=515584 blocks=1 obj#=82052 tim=4478077091
WAIT #3: nam='db file scattered read' ela= 503 file#=1 block#=515585 blocks=7 obj#=82052 tim=4478077765
WAIT #3: nam='direct path read temp' ela= 806 file number=201 first dba=288671 block cnt=31 obj#=82052 tim=4478097288
WAIT #3: nam='direct path read temp' ela= 467 file number=201 first dba=288702 block cnt=31 obj#=82052 tim=4478098147
WAIT #3: nam='direct path read temp' ela= 435 file number=201 first dba=288733 block cnt=31 obj#=82052 tim=4478099041
... 

Repeating what we found earlier: PARTS=82052, PO_HEADER=82062, and PO_LINE=82069.  We see that Oracle Database's runtime engine accessed the PO_HEADER table first, then the PO_LINE table, and finally the PARTS table, just as Tanel had claimed would happen.

But what is that 'asynch descriptor resize' wait event?  Could Tanel Poder have already answered that question?

--------------

Edit November 30, 2010:

Summarizing the comments so far:

  • When unnesting a subquery, the optimizer performs the unnest operation *before* applying the ORDERED hint. (Tanel Poder)
  • Outline directives use the LEADING hint and not the ORDERED hint. (Greg Rahn)
  • The LEADING hint should be used when possible rather than the ORDERED hint; the differences between the two hints may not be exposed until an upgrade is performed.
  • On Oracle Database 10.2.0.5 the execution plans are identical for the ORDERED and LEADING hint versions of the four SQL statements, including the one with the intentional Cartesian join (the GBP entries that appear to be associated with Cost-Based Group-By/Distinct Placement do not appear in 10053 trace files on 10.2.0.5).
  • On Oracle Database 11.1.0.7 the execution plans are identical for the ORDERED and LEADING hint versions of the first three SQL statements (and identical to 10.2.0.5), while the ORDERED version of the last SQL statement (with the intentional Cartesian join) was transformed into a different SQL statement in a section of the 10053 trace file called "Cost-Based Group By Placement".
  • On Oracle Database 11.2.0.1 the execution plans are identical for the ORDERED and LEADING hint versions of the first three SQL statements (and identical to 10.2.0.5), while the ORDERED version of the last SQL statement (with the intentional Cartesian join) was transformed into a different SQL statement (identical to 11.1.0.7) in a section of the 10053 trace file called "Cost-Based Group-By/Distinct Placement".
  • On Oracle Database 11.2.0.1 the execution plans for the SQL statements could potentially change on future executions due to the effects of Cardinality Feedback.
  • On Oracle Database 11.2.0.1 when the execution plans for the SQL statements change due to Cardinality Feedback, SET AUTOTRACE TRACEONLY EXPLAIN continues to show the original execution plan even after 10046 trace files show that the runtime engine is using the execution plan developed as a result of Cardinality Feedback.
  • For execution plans affected by Cardinality Feedback, the "Content of other_xml column" portion of a 10053 trace (just below the printed execution plan) will show "cardinality_feedback: yes" (a query to spot such child cursors without a trace is sketched below).
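The same flag is recorded in the OTHER_XML column of V$SQL_PLAN, so the affected child cursors can be located without generating a 10053 trace; a sketch, assuming access to V$SQL_PLAN (the exact tag text matched in OTHER_XML is an assumption based on the 10053 output described above):

SELECT
  SQL_ID,
  CHILD_NUMBER
FROM
  V$SQL_PLAN
WHERE
  OTHER_XML LIKE '%cardinality_feedback%yes%';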

Just wondering if anyone noticed that on my laptop (used for this blog article) I apparently created the objects in the SYSTEM tablespace while connected as the TESTUSER (I did not set the default tablespace for the TESTUSER, and apparently did not set a system-wide default).  How would I know that from the above blog article alone?





Consistent Gets During a Hard Parse – a Test Case to See One Possible Cause


October 7, 2010

A recent OTN thread caught my attention.  The original poster noticed that, when generating a TKPROF summary from a 10046 trace file, the parse call showed a significant number of consistent gets when compared to the fetch and execute calls.  The TKPROF summary looked something like this:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.85       1.07          0       3599          0           0
Execute      4      0.00       0.00          0          0          0           0
Fetch        4      0.01       0.05         14         36          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       12      0.86       1.13         14       3635          0           4

Misses in library cache during parse: 4
Optimizer mode: ALL_ROWS
Parsing user id: 72 

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  NESTED LOOPS  (cr=9 pr=8 pw=0 time=33821 us)
      1   TABLE ACCESS BY INDEX ROWID STATUS (cr=2 pr=2 pw=0 time=21787 us)
      1    INDEX UNIQUE SCAN STATUS_CON21 (cr=1 pr=1 pw=0 time=20957 us)(object id 58618)
      1   TABLE ACCESS FULL USER_ACCOUNT (cr=7 pr=6 pw=0 time=11978 us)

The TKPROF summary showed that there were 3,599 consistent gets during the four times that this SQL statement was hard parsed, while there were only 36 consistent gets performed during the four executions and fetches.  The displayed Row Source Operation execution plan shows that 9 consistent gets were performed, and because that number is 1/4 of the value displayed by the TKPROF  summary (the SQL statement was executed four times), that likely indicates that the OP is running Oracle Database 11.1.0.6 or greater, which by default outputs the STAT lines (TKPROF Row Source Operation lines) to the trace file after every execution.
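As an aside, on 11g the frequency of that STAT line output can be selected when the trace is enabled through DBMS_MONITOR; a sketch for the current session (the PLAN_STAT parameter accepts 'FIRST_EXECUTION', 'ALL_EXECUTIONS', or 'NEVER'):

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(WAITS=>TRUE, BINDS=>FALSE, PLAN_STAT=>'ALL_EXECUTIONS')

-- execute the SQL statements to be traced, then:

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE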

How could this SQL statement perform an average of 900 consistent gets per hard parse, yet only an average of 9 consistent gets to retrieve the rows from the two tables listed in the SQL statement?  This is probably a good excuse to build a test case to try out a couple of ideas.  A couple of good suggestions were offered in the OTN thread regarding what may cause consistent gets during a hard parse, but 900 consistent gets?  The OP mentioned that this problem is happening in a development database, and that may be a key clue.  What if the OP creates a couple of tables with a couple of indexes, and then loads data into the tables just before executing his SQL statement?  If the OPTIMIZER_MODE is set to one of the deprecated values RULE or CHOOSE, we could very well see a different number of consistent gets than when the OPTIMIZER_MODE is set to a non-deprecated value.  What if the OP does not collect statistics on the tables and indexes, and those tables and indexes either do not survive until 10 PM, or the DBA has disabled the automatic stale statistics collection job that typically starts around 10 PM?  What if the memory allocated to the SGA is much smaller than what is needed?  What if… (fill in the blank)?
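
Before building anything, a couple of quick checks might narrow the list of suspects (a sketch; STATUS and USER_ACCOUNT are the table names shown in the OP's Row Source Operation output, and the STALE_STATS column assumes Oracle Database 10g or later):

SHOW PARAMETER OPTIMIZER_MODE

SELECT
  TABLE_NAME,
  LAST_ANALYZED,
  STALE_STATS
FROM
  USER_TAB_STATISTICS
WHERE
  TABLE_NAME IN ('STATUS','USER_ACCOUNT');

A NULL LAST_ANALYZED value would suggest that the optimizer must resort to dynamic sampling during every hard parse of SQL statements that reference those tables.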

Let’s build a quick test case.  First, we will make certain that no one has adjusted the default value for dynamic sampling:

SHOW PARAMETER DYNAMIC_SAMPLING

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
optimizer_dynamic_sampling           integer     2

Still the default, so we will continue building the test case on Oracle Database 11.1.0.7:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20) NOT NULL,
  C3 VARCHAR2(100));

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);

CREATE TABLE T2 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20) NOT NULL,
  C3 VARCHAR2(100));

CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2); 

The above creates two simple tables, each with two indexes.  Note that the indexes are created before the tables contain any rows, so statistics are not automatically collected for the indexes when they are created.  Now to insert the rows:

INSERT INTO
  T1
SELECT
  ROWNUM,
  RPAD(TO_CHAR(ROWNUM),20,'A'),
  RPAD(TO_CHAR(ROWNUM),100,'B')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

INSERT INTO
  T2
SELECT
  *
FROM
  T1;

COMMIT;

Just to verify that the indexes do not have statistics:

COLUMN INDEX_NAME FORMAT A10

SELECT
  INDEX_NAME,
  BLEVEL,
  NUM_ROWS,
  LEAF_BLOCKS
FROM
  USER_INDEXES
WHERE
  INDEX_NAME IN ('IND_T1_C1','IND_T2_C1','IND_T1_C2','IND_T2_C2');

INDEX_NAME     BLEVEL   NUM_ROWS LEAF_BLOCKS
---------- ---------- ---------- -----------
IND_T1_C1
IND_T1_C2
IND_T2_C1
IND_T2_C2 

Now for the experiment:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'PARSE_TEST1';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SET AUTOTRACE TRACEONLY STATISTICS

SELECT /* PARSE_TEST1 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1<=2000;

Statistics
---------------------------------------------------
         17  recursive calls
          0  db block gets
        514  consistent gets
         15  physical reads
          0  redo size
     113831  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there were 514 consistent gets to retrieve 2,000 rows (15 rows at a time because of the default array fetch size in SQL*Plus).
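
As a side note (not tested in this article), increasing the SQL*Plus array fetch size should reduce both the consistent gets and the SQL*Net round trips reported for the fetch calls, because each fetch call typically revisits the current block:

SET ARRAYSIZE 100

Continuing with the test: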

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'PARSE_TEST2';

SELECT /* PARSE_TEST2 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 2001 AND 4000;

Statistics
---------------------------------------------------
         17  recursive calls
          0  db block gets
        560  consistent gets
         79  physical reads
       3276  redo size
     113930  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there were 560 consistent gets to retrieve 2,000 rows.  Continuing with the test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'PARSE_TEST3';

SELECT /* PARSE_TEST3 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 4001 AND 6000;

Statistics
---------------------------------------------------
         13  recursive calls
          0  db block gets
        568  consistent gets
        102  physical reads
       3564  redo size
     113930  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there were 568 consistent gets to retrieve 2,000 rows.  Continuing with the test:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

SELECT /* PARSE_TEST3 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 4001 AND 6000;

Statistics
---------------------------------------------------
         17  recursive calls
          0  db block gets
        520  consistent gets
          8  physical reads
          0  redo size
     113930  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there was another hard parse, even though the same PARSE_TEST3 SQL statement was executed just moments earlier.
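
One way to confirm the additional hard parses without opening the raw trace files is to check the parse and execution counts in V$SQL (a sketch, assuming sufficient privileges to query V$SQL):

SELECT
  SQL_ID,
  CHILD_NUMBER,
  PARSE_CALLS,
  EXECUTIONS
FROM
  V$SQL
WHERE
  SQL_TEXT LIKE 'SELECT /* PARSE_TEST%';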

Now let’s take a look in the trace files (see this three part blog article series for help with decoding 10046 trace files), starting with the PARSE_TEST1 trace file (note that I manually line wrapped the dep=1 SQL statements):

...
=====================
PARSING IN CURSOR #4 len=619 dep=1 uid=60 oct=3 lid=60 tim=517854252962 hv=1052836258 ad='224c0738' sqlid='dwa1q5szc20d2'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled',
 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE
 NO_PARALLEL("T1") FULL("T1") NO_PARALLEL_INDEX("T1") */ :"SYS_B_05" AS C1, CASE WHEN "T1"."C1"<=:"SYS_B_06" THEN :"SYS_B_07"
 ELSE :"SYS_B_08" END AS C2, "T1"."C1" AS C3 FROM "T1" SAMPLE BLOCK (:"SYS_B_09" , :"SYS_B_10") SEED (:"SYS_B_11") "T1") SAMPLESUB
END OF STMT
PARSE #4:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854252962
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=3906273573,tim=517854252962
FETCH #4:c=0,e=0,p=0,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=3906273573,tim=517854252962
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=0 pw=0 time=0 us)'
STAT #4 id=2 cnt=3410 pid=1 pos=1 obj=102003 op='TABLE ACCESS SAMPLE T1 (cr=71 pr=0 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #4:c=0,e=0,dep=1,type=0,tim=517854252962
=====================
PARSING IN CURSOR #2 len=445 dep=1 uid=60 oct=3 lid=60 tim=517854252962 hv=840679737 ad='285dce70' sqlid='2r4a0g8t1rh9t'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB)
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM
 (SELECT /*+ NO_PARALLEL("T1") INDEX("T1" IND_T1_C1) NO_PARALLEL_INDEX("T1") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3
  FROM "T1" "T1" WHERE "T1"."C1"<=:"SYS_B_6" AND ROWNUM <= :"SYS_B_7") SAMPLESUB
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854252962
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=311010773,tim=517854252962
FETCH #2:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=1,dep=1,og=1,plh=311010773,tim=517854252962
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=5 pr=0 pw=0 time=0 us)'
STAT #2 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=5 pr=0 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #2 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=5 pr=0 pw=0 time=0 us)'
STAT #2 id=4 cnt=2000 pid=3 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=5 pr=0 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #2:c=0,e=0,dep=1,type=0,tim=517854252962
=====================
PARSING IN CURSOR #6 len=619 dep=1 uid=60 oct=3 lid=60 tim=517854252962 hv=3874662801 ad='224bb3a0' sqlid='0qdcdrrmg5acj'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T2")
 FULL("T2") NO_PARALLEL_INDEX("T2") */ :"SYS_B_05" AS C1, CASE WHEN "T2"."C1"<=:"SYS_B_06" THEN :"SYS_B_07" ELSE :"SYS_B_08" END AS C2,
 "T2"."C1" AS C3 FROM "T2" SAMPLE BLOCK (:"SYS_B_09" , :"SYS_B_10") SEED (:"SYS_B_11") "T2") SAMPLESUB
END OF STMT
PARSE #6:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854252962
EXEC #6:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=724732973,tim=517854252962
FETCH #6:c=0,e=31245,p=15,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=724732973,tim=517854284207
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=15 pw=0 time=0 us)'
STAT #6 id=2 cnt=3420 pid=1 pos=1 obj=102006 op='TABLE ACCESS SAMPLE T2 (cr=71 pr=15 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #6:c=0,e=0,dep=1,type=0,tim=517854284207
=====================
PARSING IN CURSOR #5 len=445 dep=1 uid=60 oct=3 lid=60 tim=517854284207 hv=2421073910 ad='1f4973fc' sqlid='b84x6ja84x9zq'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB)
 NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM (SELECT /*+ NO_PARALLEL("T2")
 INDEX("T2" IND_T2_C1) NO_PARALLEL_INDEX("T2") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3  FROM "T2" "T2" WHERE
 "T2"."C1"<=:"SYS_B_6" AND ROWNUM <= :"SYS_B_7") SAMPLESUB
END OF STMT
PARSE #5:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854284207
EXEC #5:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1821698049,tim=517854284207
FETCH #5:c=0,e=0,p=0,cr=8,cu=0,mis=0,r=1,dep=1,og=1,plh=1821698049,tim=517854284207
STAT #5 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=8 pr=0 pw=0 time=0 us)'
STAT #5 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=8 pr=0 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #5 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=8 pr=0 pw=0 time=0 us)'
STAT #5 id=4 cnt=2000 pid=3 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=8 pr=0 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #5:c=0,e=0,dep=1,type=0,tim=517854284207
=====================
PARSING IN CURSOR #3 len=163 dep=0 uid=60 oct=3 lid=60 tim=517854284207 hv=701515739 ad='1f4849dc' sqlid='dr1pvrsnx0jyv'
SELECT /* PARSE_TEST1 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1<=2000
END OF STMT
PARSE #3:c=0,e=31245,p=15,cr=159,cu=0,mis=1,r=0,dep=0,og=1,plh=169351222,tim=517854284207
EXEC #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=169351222,tim=517854284207
FETCH #3:c=15625,e=31263,p=0,cr=45,cu=0,mis=0,r=1,dep=0,og=1,plh=169351222,tim=517854315470
FETCH #3:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=15,dep=0,og=1,plh=169351222,tim=517854315470
...
FETCH #3:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=4,dep=0,og=1,plh=169351222,tim=517854346714
STAT #3 id=1 cnt=2000 pid=0 pos=1 obj=0 op='HASH JOIN  (cr=355 pr=0 pw=0 time=0 us cost=48 size=284000 card=2000)'
STAT #3 id=2 cnt=2000 pid=1 pos=1 obj=102003 op='TABLE ACCESS BY INDEX ROWID T1 (cr=42 pr=0 pw=0 time=0 us cost=22 size=130000 card=2000)'
STAT #3 id=3 cnt=2000 pid=2 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=5 pr=0 pw=0 time=0 us cost=6 size=0 card=2000)'
STAT #3 id=4 cnt=2000 pid=1 pos=2 obj=102006 op='TABLE ACCESS BY INDEX ROWID T2 (cr=313 pr=0 pw=0 time=0 us cost=25 size=154000 card=2000)'
STAT #3 id=5 cnt=2000 pid=4 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=142 pr=0 pw=0 time=0 us cost=9 size=0 card=2000)' 

In the above, you will notice that the parse call for our SQL statement performed 159 consistent gets.  If you add up the number of consistent gets performed by the dep=1 SQL statements that immediately precede our SQL statement (71 + 5 + 71 + 8), you can see where 155 of those consistent gets were performed during the hard parse.  The first STAT line shows that the SQL statement itself actually required 355 consistent gets and no physical reads (SQL*Plus showed that 514 consistent gets and 15 physical reads were performed, and if you look closely at the dep=1 SQL statements you can see where the 15 physical block reads were performed).  355 + 155 = 510, which is just less than the 514 consistent gets reported by SQL*Plus, so we could look further up in the trace file to find the remaining 4 consistent gets.

Let’s take a look at the PARSE_TEST3 trace file:

...
=====================
PARSING IN CURSOR #3 len=646 dep=1 uid=60 oct=3 lid=60 tim=518053123679 hv=1391468433 ad='2264adec' sqlid='16htvx19g07wj'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T1")
 FULL("T1") NO_PARALLEL_INDEX("T1") */ :"SYS_B_05" AS C1, CASE WHEN "T1"."C1">=:"SYS_B_06" AND "T1"."C1"<=:"SYS_B_07" THEN :"SYS_B_08"
 ELSE :"SYS_B_09" END AS C2, "T1"."C1" AS C3 FROM "T1" SAMPLE BLOCK (:"SYS_B_10" , :"SYS_B_11") SEED (:"SYS_B_12") "T1") SAMPLESUB
END OF STMT
PARSE #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3906273573,tim=518053123679
EXEC #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3906273573,tim=518053123679
FETCH #3:c=15625,e=0,p=0,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=3906273573,tim=518053123679
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=0 pw=0 time=0 us)'
STAT #3 id=2 cnt=3410 pid=1 pos=1 obj=102003 op='TABLE ACCESS SAMPLE T1 (cr=71 pr=0 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #3:c=0,e=0,dep=1,type=0,tim=518053123679
=====================
PARSING IN CURSOR #4 len=471 dep=1 uid=60 oct=3 lid=60 tim=518053123679 hv=1212909034 ad='1f5d5b28' sqlid='c5sn2v544r1ga'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB)
 NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM (SELECT /*+ NO_PARALLEL("T1") INDEX("T1" IND_T1_C1)
 NO_PARALLEL_INDEX("T1") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3  FROM "T1" "T1" WHERE "T1"."C1">=:"SYS_B_6"
 AND "T1"."C1"<=:"SYS_B_7" AND ROWNUM <= :"SYS_B_8") SAMPLESUB
END OF STMT
PARSE #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=41534134,tim=518053123679
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=41534134,tim=518053123679
FETCH #4:c=0,e=0,p=4,cr=10,cu=0,mis=0,r=1,dep=1,og=1,plh=41534134,tim=518053123679
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=10 pr=4 pw=0 time=0 us)'
STAT #4 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=10 pr=4 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #4 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=10 pr=4 pw=0 time=0 us)'
STAT #4 id=4 cnt=2000 pid=3 pos=1 obj=0 op='FILTER  (cr=10 pr=4 pw=0 time=0 us)'
STAT #4 id=5 cnt=2000 pid=4 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=10 pr=4 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #4:c=0,e=0,dep=1,type=0,tim=518053123679
=====================
PARSING IN CURSOR #2 len=646 dep=1 uid=60 oct=3 lid=60 tim=518053123679 hv=2627297926 ad='1d2877e4' sqlid='0u23vn6f9ksn6'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T2") FULL("T2")
 NO_PARALLEL_INDEX("T2") */ :"SYS_B_05" AS C1, CASE WHEN "T2"."C1">=:"SYS_B_06" AND "T2"."C1"<=:"SYS_B_07" THEN :"SYS_B_08" ELSE :"SYS_B_09" END AS C2,
 "T2"."C1" AS C3 FROM "T2" SAMPLE BLOCK (:"SYS_B_10" , :"SYS_B_11") SEED (:"SYS_B_12") "T2") SAMPLESUB
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=724732973,tim=518053123679
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=724732973,tim=518053123679
FETCH #2:c=0,e=31256,p=19,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=724732973,tim=518053154935
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=19 pw=0 time=0 us)'
STAT #2 id=2 cnt=3420 pid=1 pos=1 obj=102006 op='TABLE ACCESS SAMPLE T2 (cr=71 pr=19 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #2:c=0,e=0,dep=1,type=0,tim=518053154935
=====================
PARSING IN CURSOR #6 len=471 dep=1 uid=60 oct=3 lid=60 tim=518053154935 hv=2390595667 ad='1f59c3c4' sqlid='5xf39uf77v62m'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */
 NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM (SELECT /*+ NO_PARALLEL("T2") INDEX("T2" IND_T2_C1)
 NO_PARALLEL_INDEX("T2") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3  FROM "T2" "T2" WHERE "T2"."C1">=:"SYS_B_6" AND "T2"."C1"<=:"SYS_B_7"
 AND ROWNUM <= :"SYS_B_8") SAMPLESUB
END OF STMT
PARSE #6:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=4155399029,tim=518053154935
EXEC #6:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=4155399029,tim=518053154935
FETCH #6:c=0,e=31251,p=7,cr=16,cu=0,mis=0,r=1,dep=1,og=1,plh=4155399029,tim=518053186186
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=16 pr=7 pw=0 time=0 us)'
STAT #6 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=16 pr=7 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #6 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=16 pr=7 pw=0 time=0 us)'
STAT #6 id=4 cnt=2000 pid=3 pos=1 obj=0 op='FILTER  (cr=16 pr=7 pw=0 time=0 us)'
STAT #6 id=5 cnt=2000 pid=4 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=16 pr=7 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #6:c=0,e=0,dep=1,type=0,tim=518053186186
=====================
PARSING IN CURSOR #5 len=179 dep=0 uid=60 oct=3 lid=60 tim=518053186186 hv=3811374996 ad='28520570' sqlid='0m07kq3jktxwn'
SELECT /* PARSE_TEST3 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 4001 AND 6000
END OF STMT
PARSE #5:c=15625,e=62507,p=30,cr=172,cu=0,mis=1,r=0,dep=0,og=1,plh=169351222,tim=518053186186
EXEC #5:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=169351222,tim=518053186186
...
FETCH #5:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=4,dep=0,og=1,plh=169351222,tim=518053279932
STAT #5 id=1 cnt=2000 pid=0 pos=1 obj=0 op='HASH JOIN  (cr=396 pr=72 pw=0 time=0 us cost=48 size=284000 card=2000)'
STAT #5 id=2 cnt=2000 pid=1 pos=1 obj=102003 op='TABLE ACCESS BY INDEX ROWID T1 (cr=44 pr=35 pw=0 time=0 us cost=22 size=130000 card=2000)'
STAT #5 id=3 cnt=2000 pid=2 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=6 pr=0 pw=0 time=0 us cost=6 size=0 card=2000)'
STAT #5 id=4 cnt=2000 pid=1 pos=2 obj=102006 op='TABLE ACCESS BY INDEX ROWID T2 (cr=352 pr=37 pw=0 time=0 us cost=25 size=154000 card=2000)'
STAT #5 id=5 cnt=2000 pid=4 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=143 pr=0 pw=0 time=0 us cost=9 size=0 card=2000)' 

This time you can see that 172 consistent gets were performed during the hard parse.  If we add up the consistent gets performed just before our SQL statement appeared in the trace file (71 + 10 + 71 + 16), we can account for 168 of the 172 consistent gets during the parse of our SQL statement.  Note that the SQL_ID for the SQL statement appears in the trace file ('0m07kq3jktxwn' for the last trace file), so we could do something like this to quickly confirm that dynamic sampling happened during the hard parse without looking in the 10046 trace file:

SET LINESIZE 140
SET PAGESIZE 1000
SET TRIMSPOOL ON
SET AUTOTRACE OFF

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0m07kq3jktxwn',NULL,'TYPICAL'));

SQL_ID  0m07kq3jktxwn, child number 0
-------------------------------------
SELECT /* PARSE_TEST3 */   T1.C1 T1_C1,   SUBSTR(T1.C3,1,10) T1_C3,
T2.C2 T2_C2,   SUBSTR(T2.C3,1,10) T2_C3 FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C1 BETWEEN 4001 AND 6000

Plan hash value: 169351222

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |    48 (100)|          |
|*  1 |  HASH JOIN                   |           |  2000 |   277K|    48   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |  2000 |   126K|    22   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_C1 |  2000 |       |     6   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2        |  2000 |   150K|    25   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | IND_T2_C1 |  2000 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
   3 - access("T1"."C1">=4001 AND "T1"."C1"<=6000)
   5 - access("T2"."C1">=4001 AND "T2"."C1"<=6000)

Note
-----
   - dynamic sampling used for this statement

SQL_ID  0m07kq3jktxwn, child number 1
-------------------------------------
SELECT /* PARSE_TEST3 */   T1.C1 T1_C1,   SUBSTR(T1.C3,1,10) T1_C3,
T2.C2 T2_C2,   SUBSTR(T2.C3,1,10) T2_C3 FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C1 BETWEEN 4001 AND 6000

Plan hash value: 169351222

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |    48 (100)|          |
|*  1 |  HASH JOIN                   |           |  2000 |   277K|    48   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |  2000 |   126K|    22   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_C1 |  2000 |       |     6   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2        |  2000 |   150K|    25   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | IND_T2_C1 |  2000 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
   3 - access("T1"."C1">=4001 AND "T1"."C1"<=6000)
   5 - access("T2"."C1">=4001 AND "T2"."C1"<=6000)

Note
-----
   - dynamic sampling used for this statement 

The note at the bottom of each execution plan shows that dynamic sampling happened during the hard parse.  Note also that there are two child cursors with the same execution plan.  One was created when the 10046 trace was active, and the other was created after the 10046 trace was disabled.
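
If we wanted to confirm that the second child cursor was created because enabling and disabling the 10046 trace changed the optimizer environment, we might compare the optimizer environment hash values of the two children (a sketch, assuming access to V$SQL):

SELECT
  SQL_ID,
  CHILD_NUMBER,
  OPTIMIZER_ENV_HASH_VALUE
FROM
  V$SQL
WHERE
  SQL_ID='0m07kq3jktxwn';

Different OPTIMIZER_ENV_HASH_VALUE values for the two child cursors would support that explanation.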

We could then extend this test case by collecting table AND index statistics, and then re-execute the test script to compare the results.  So, what are the other possible causes for consistent gets during a hard parse?  Start with my test case and see if you are able to show the source of the consistent gets that are output by SQL*Plus or a TKPROF summary.
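
For the statistics-collection variation mentioned above, something like the following (matching the statistics collection style used elsewhere in this article) could be executed before re-running the test script:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

With table and index statistics in place, the dynamic sampling queries should disappear from the 10046 trace files, and the consistent gets performed during the hard parses should drop accordingly.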





Test Case Showing Oracle Database 11.2.0.1 Completely Ignoring an Index Even when Hinted

22 09 2010

September 22, 2010

In a recent OTN thread a person asked an interesting question: why isn’t my index being used?  A query of a table with 8,000,000 rows should quickly return exactly 3 rows when an available index is used, and that index is used when the WHERE clause is simply:

WHERE
  C200000020 LIKE 'BOSS' || '%' 

However, the application is submitting a WHERE clause that includes an impossible condition in an OR clause, like the following, which is not much different from stating OR 1=2:

WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = ''

That constant1 = constant2 predicate, at least on Oracle Database 10.1 and above, is sufficient to keep the index from being used, thus the query performs a full table scan.  But why?
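
Part of the answer is simple Oracle semantics: a zero-length character string is treated as NULL, so 'BOSS' = '' evaluates to NULL (unknown) rather than TRUE and can never contribute rows.  A quick demonstration:

SELECT
  COUNT(*)
FROM
  DUAL
WHERE
  'BOSS' = '';

  COUNT(*)
----------
         0

Yet, as we are about to see, the predicate still changes the optimizer's arithmetic.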

I think that we need a test case to see what is happening.  First, we will create a simple table with our column of interest and a large column that should help to discourage full table scans:

CREATE TABLE T1 (
  C200000020 VARCHAR2(20),
  PADDING VARCHAR2(250));

Next, we will insert 10,000,000 rows into the table such that an index built on the column C200000020 will have a very high clustering factor, and 3 rows will have a value that begins with BOSS (as a result of the DECODE function):

INSERT INTO
  T1
SELECT
  DECODE(MOD(ROWNUM,3000000),0,'BOSS'||ROWNUM,
  CHR(90-MOD(ROWNUM-1,26))||
  CHR(75+MOD(ROWNUM,10))||
  CHR(80+MOD(ROWNUM,5))||
  'S'||ROWNUM) C200000020,
  RPAD('A',200,'A') PADDING
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;
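
As a quick sanity check of the data load (not part of the original test flow), we can confirm that exactly three of the 10,000,000 rows begin with BOSS:

SELECT
  COUNT(*)
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS%';

  COUNT(*)
----------
         3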

Now to create the index and collect statistics:

CREATE INDEX IND_T1 ON T1(C200000020);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

Let’s take a look at the execution plans:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%';

Plan hash value: 634656657

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |   213 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   213 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C200000020" LIKE 'BOSS%')
       filter("C200000020" LIKE 'BOSS%')

An index access, just like we had hoped.  The optimizer is predicting a single row to be retrieved.  Let’s try the other query:

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('BOSS'='' OR "C200000020" LIKE 'BOSS%')

A full table scan, just like the original poster in the OTN thread experienced.  Notice that the optimizer is now predicting that 100,000 rows (1% of the rows) will be retrieved.  Repeating, 1% of the rows and a full table scan.  Let’s generate a 10053 trace for the SQL statement:

ALTER SYSTEM FLUSH SHARED_POOL;

SET AUTOTRACE OFF

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'T1_10053';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

Inside the 10053 trace, my 11.2.0.1 test database produced the following:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000000  #Blks:  303031  AvgRowLen:  213.00
Index Stats::
  Index: IND_T1  Col#: 1
    LVLS: 2  #LB: 32323  #DK: 9939968  LB/K: 1.00  DB/K: 1.00  CLUF: 10120176.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000000.000000  Rounded: 100001  Computed: 100001.02  Non Adjusted: 100001.02
  Access Path: TableScan
    Cost:  82352.89  Resp: 82352.89  Degree: 0
      Cost_io: 82073.00  Cost_cpu: 7150017105
      Resp_io: 82073.00  Resp_cpu: 7150017105
  ****** trying bitmap/domain indexes ******
  ****** finished trying bitmap/domain indexes ******
  Best:: AccessPath: TableScan
         Cost: 82352.89  Degree: 1  Resp: 82352.89  Card: 100001.02  Bytes: 0

***************************************

OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
***********************
Best so far:  Table#: 0  cost: 82352.8851  card: 100001.0195  bytes: 21300213
***********************
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:2000

*********************************
Number of join permutations tried: 1
*********************************

The unknown result of the constant in the WHERE clause ('BOSS' = '') caused Oracle to predict that the cardinality will be (1 row) + (1% of the rows) = 100,001.  With a clustering factor of 10,120,176 the optimizer is (possibly) convinced that it will need to perform single block physical reads of a large number of table blocks to read the 100,001 rows that it expects to retrieve, so it decided that a full table scan would complete faster.  But the situation is worse than that – it did not even consider an index access path.  As a demonstration, I will manually set the index's clustering factor to a low value and check the execution plan again:

EXEC DBMS_STATS.SET_INDEX_STATS(OWNNAME=>USER,INDNAME=>'IND_T1',CLSTFCT=>10000)

SET AUTOTRACE TRACEONLY EXPLAIN

ALTER SYSTEM FLUSH SHARED_POOL;

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('BOSS'='' OR "C200000020" LIKE 'BOSS%')

Still a full table scan.  If we had generated a 10053 trace, we would see that the clustering factor for the index was indeed adjusted from what we saw earlier:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000000  #Blks:  303031  AvgRowLen:  213.00
Index Stats::
  Index: IND_T1  Col#: 1
    LVLS: 2  #LB: 32323  #DK: 9939968  LB/K: 1.00  DB/K: 1.00  CLUF: 10000.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000000.000000  Rounded: 100001  Computed: 100001.02  Non Adjusted: 100001.02
  Access Path: TableScan
    Cost:  82352.89  Resp: 82352.89  Degree: 0
      Cost_io: 82073.00  Cost_cpu: 7150017105
      Resp_io: 82073.00  Resp_cpu: 7150017105
  ****** trying bitmap/domain indexes ******
  ****** finished trying bitmap/domain indexes ******
  Best:: AccessPath: TableScan
         Cost: 82352.89  Degree: 1  Resp: 82352.89  Card: 100001.02  Bytes: 0

Let’s force the execution plan with an index hint to see what happens:

SELECT /*+ INDEX(T1 IND_T1) */
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter('BOSS'='' OR "C200000020" LIKE 'BOSS%')

Note that the optimizer did not (or could not) obey the hint.  It decided to apply the 'BOSS'='' predicate first, so maybe that is the problem.  Let's try a hint to force the optimizer to apply the predicates in order, rather than based on calculated cost:

SELECT /*+ ORDERED_PREDICATES INDEX(T1 IND_T1) */
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = '';

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    20M| 82353   (1)| 00:16:29 |
|*  1 |  TABLE ACCESS FULL| T1   |   100K|    20M| 82353   (1)| 00:16:29 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C200000020" LIKE 'BOSS%' OR 'BOSS'='')

The predicate section of the execution plan changed, but the optimizer still will not consider an index access path for the SQL statement.  There is a chance that the OP could do something to force the index access path by hacking a stored outline for the query, but my guess is that the restriction on the C200000020 column changes from time to time, so an outline likely will not work.  The OP could try to file an Oracle bug report because the optimizer completely ignored the index access paths (as shown in the 10053 trace file), but a better course of action would be to have the application submitting the SQL statement fixed.
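
If fixing the application permits rewriting the SQL statement, one possible manual OR-expansion (a sketch, not tested in this article) preserves the original semantics while giving the optimizer an index-friendly branch; the LNNVL function prevents double-counting rows that would satisfy both predicates:

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
UNION ALL
SELECT
  *
FROM
  T1
WHERE
  'BOSS' = ''
  AND LNNVL(C200000020 LIKE 'BOSS' || '%');

The first branch is free to use the IND_T1 index, while the second branch can never return rows because 'BOSS' = '' is never true.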

Let’s try a small variation on the original test SQL statement.  Let’s see what happens when we add a space between the two characters:

SET AUTOTRACE TRACEONLY EXPLAIN

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'T1_10053-3';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT
  *
FROM
  T1
WHERE
  C200000020 LIKE 'BOSS' || '%'
  OR 'BOSS' = ' ';

Execution Plan
----------------------------------------------------------
Plan hash value: 634656657
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |   213 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   213 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C200000020" LIKE 'BOSS%')
       filter("C200000020" LIKE 'BOSS%')

ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

Notice by looking at the Predicate Information section of the plan that Oracle removed the nonsensical OR clause.  The 10053 trace file showed this:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000000  #Blks:  303031  AvgRowLen:  213.00
Index Stats::
  Index: IND_T1  Col#: 1
    LVLS: 2  #LB: 32323  #DK: 9939968  LB/K: 1.00  DB/K: 1.00  CLUF: 10000.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000000.000000  Rounded: 1  Computed: 1.03  Non Adjusted: 1.03
  Access Path: TableScan
    Cost:  82255.34  Resp: 82255.34  Degree: 0
      Cost_io: 82073.00  Cost_cpu: 4658017105
      Resp_io: 82073.00  Resp_cpu: 4658017105
kkofmx: index filter:"T1"."C200000020" LIKE 'BOSS%'
  Access Path: index (RangeScan)
    Index: IND_T1
    resc_io: 4.00  resc_cpu: 29226
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000
    Cost: 4.00  Resp: 4.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IND_T1
         Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.03  Bytes: 0

So, it appears that if the optimizer is presented with a zero-length VARCHAR being compared with another VARCHAR in the WHERE clause, there could be unexpected cases where index access paths will not be used, even when hinted.

Toon Koppelaars mentioned in the OTN thread that the WHERE clause should be using bind variables, and suggested the following for the WHERE clause:

( (T131.C200000020 LIKE (:B0 || '%')) OR (:B0 IS NULL))

I agree with Toon regarding the use of bind variables.  Unfortunately, it does not look like bind variables improve the situation, at least in my test case.

I cannot use AUTOTRACE here because of the risk that it will display an incorrect execution plan when bind variables are involved, so I will use DBMS_XPLAN.DISPLAY_CURSOR along with a GATHER_PLAN_STATISTICS hint in the SQL statement.  First the statistics collection (to correct the manual adjustment to the index's clustering factor that was performed earlier) and the bind variable setup:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

VARIABLE B0 VARCHAR2(50)
VARIABLE B1 VARCHAR2(50)
VARIABLE B2 VARCHAR2(50)

EXEC :B0 := 'BOSS'
EXEC :B1 := 'BOSS'
EXEC :B2 := ''

SET AUTOTRACE OFF

Since we do not know the intention of the developer, I will try a couple of combinations to see what happens:

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B1 = :B2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  b47qzqbb6wymu, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C200000020
LIKE :B0 || '%'   OR :B1 = :B2

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    100K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B1=:B2 OR "C200000020" LIKE :B0||'%'))

The above resulted in the same full table scan that we saw earlier.

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B0 = :B2;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  1vdyc7t6wazhz, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C200000020
LIKE :B0 || '%'   OR :B0 = :B2

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    100K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B0=:B2 OR "C200000020" LIKE :B0||'%'))

Specifying the B0 bind variable twice did not help.

SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B0 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  8n1bg0z9j0001, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   * FROM   T1 WHERE   C200000020
LIKE :B0 || '%'   OR :B0 IS NULL

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    500K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B0 IS NULL OR "C200000020" LIKE :B0||'%'))

Specifying that the B0 bind variable IS NULL did not help either, but notice the change in the predicted cardinality (the 100,000 rows increased to 500,000 rows).

Let’s try an index hint:

SELECT /*+ GATHER_PLAN_STATISTICS INDEX(T1 IND_T1) */
  *
FROM
  T1
WHERE
  C200000020 LIKE :B0 || '%'
  OR :B0 IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

SQL_ID  a241dy7mvudtk, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS INDEX(T1 IND_T1) */   * FROM   T1
WHERE   C200000020 LIKE :B0 || '%'   OR :B0 IS NULL

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      3 |00:00:00.34 |     303K|
|*  1 |  TABLE ACCESS FULL| T1   |      1 |    500K|      3 |00:00:00.34 |     303K|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter((:B0 IS NULL OR "C200000020" LIKE :B0||'%'))

The optimizer still did not use the index.

—————–

Cases where Oracle's optimizer ignores index hints are typically indications of bugs in the optimizer – as we saw, the optimizer did not even consider (generate a calculated cost for) an index access path when no space appeared between the two single quote characters (a zero-length string) in the original SQL statement.  Other cases of Oracle's optimizer ignoring hints may be found here: Demonstration of Oracle "Ignoring" an Index Hint.





SQL Challenge – Submit Update Statements, Updated Values are Reversed on Successful Commit

21 09 2010

September 21, 2010

Is it possible to construct an Oracle Database test case such that:

  1. An UPDATE statement is issued to change a DATE column value from NULL to the current date (SYSDATE) for two rows.
  2. The data source is queried and the result set returned by the database shows that the DATE column is set to the current date for the two rows.
  3. A COMMIT is successfully executed.
  4. The data source is queried and the result set returned by the database shows that the DATE column is again NULL for the two rows.

—————–

For example, something like this:

CREATE TABLE T1 (
  C1 NUMBER,
  C2 DATE,
  C3 DATE,
  CONSTRAINT "CHECK_DATE" CHECK (
    NVL(C2,TO_DATE('01-JAN-2000','DD-MON-YYYY')) < NVL(C3,TO_DATE('01-JAN-2000','DD-MON-YYYY')))
    INITIALLY DEFERRED DEFERRABLE);

INSERT INTO T1 VALUES (1,NULL,TO_DATE('31-DEC-2000','DD-MON-YYYY'));
INSERT INTO T1 VALUES (2,NULL,TO_DATE('31-DEC-2000','DD-MON-YYYY'));
INSERT INTO T1 VALUES (3,NULL,TO_DATE('31-DEC-2000','DD-MON-YYYY'));

COMMIT;

There are now three rows in the table T1:

SELECT
  *
FROM
  T1;

 C1 C2        C3
--- --------- ---------
  1           31-DEC-00
  2           31-DEC-00
  3           31-DEC-00

We are able to update the column C2 to the current date (SYSDATE) with the following SQL statement:

UPDATE
  T1
SET
  C2=SYSDATE
WHERE
  C1 IN (1,2);

2 rows updated.

The update was successful, and we are able to confirm that the update was successful:

SELECT
  *
FROM
  T1;

C1 C2        C3
-- --------- ---------
 1 21-SEP-10 31-DEC-00
 2 21-SEP-10 31-DEC-00
 3           31-DEC-00

Now we will issue a COMMIT:

COMMIT;

Then query the table again to find the original value of column C2 was restored:

SELECT
  *
FROM
  T1;

 C1 C2        C3
--- --------- ---------
  1           31-DEC-00
  2           31-DEC-00
  3           31-DEC-00

The only catch is that “Commit complete.” must appear after the COMMIT, rather than something like the following:

COMMIT
*
ERROR at line 1:
ORA-02091: transaction rolled back
ORA-02290: check constraint (TESTUSER.CHECK_DATE) violated
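
As an aside, a session can surface a deferred constraint violation before issuing the COMMIT:

SET CONSTRAINT CHECK_DATE IMMEDIATE;

If the pending changes violate the constraint, this statement immediately raises ORA-02290, which would allow the application to correct the data rather than lose the entire transaction at COMMIT time.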

Be creative.  How is it possible to update a set of rows, select the rows to prove that the rows were updated, COMMIT, receive a confirmation that the COMMIT was successful, and then find that the original values were restored?  In case you are wondering, this OTN thread was the inspiration for this blog article (but don't look at the forum thread yet).

Is the data source that makes this possible a table, or is it something else?  Is there possibly a corrupt index involved?  Is there an ON COMMIT trigger involved (as far as I am aware, no such trigger exists in Oracle Database)?  VPD (virtual private database) tricks?  View tricks?  Magic?





The SQL to the Orbiting Ball

12 09 2010

September 12, 2010

Several years ago I developed a somewhat simple program in 16-bit Borland C (last compiled in 1994, so it pre-dates graphical web pages) that was optimized for 386 and 486 computers sporting VGA graphics, but that was also able to work with EGA graphics cards.  After creating a series of images on the fly that simulated a rotating basketball (one image per 18 degrees of rotation), the program generated X and Y coordinates using a specially crafted formula that was capable of producing X and Y coordinates in a circular pattern.  With slightly different inputs the same formula produced a slightly out-of-round circular pattern that eventually might have degenerated into straight line patterns.  With a slight adjustment to the inputs the same formula produced W patterns within a bounded box.  Keyboard input allowed the user to adjust the inputs as the program executed.  The C program looked like this:

#include <graphics.h>               // for graphics functions
#include <conio.h>                  // for kbhit()
#include <math.h>
#include <dos.h>
#define LEFT      100               // boundary of rectangle
#define TOP       100
#define RADIUS    20                // radius of ball
#define pi        3.14159276
#define L_ARROW   75
#define R_ARROW   77
#define U_ARROW   72
#define D_ARROW   80
#define ESC       27

void main(void)
   {
   int driver, mode;                // for initgraph()
   float x, y, dx, dy, i, sped, temp, x1, x2, y1, y2, sta1, sta2, ena1, ena2, rad;    // ball coordinates
   int imgnum, tim, del;
   char key = 0;                    // initialize: key is compared with ESC below before the first keypress
   unsigned char ballbuff[10][5000];    // buffer for ball image

   driver = DETECT;                 // auto detect
        // set graphics mode
   initgraph(&driver, &mode, "c:\\bc4\\bgi");

   x = LEFT + RADIUS;          // designate center where ball is created
   y = TOP  + RADIUS;
   for (i =0; i <180; i = i + 18) // create basketball rotation images
       {
 setcolor(RED);
 setfillstyle(SOLID_FILL, RED);
 circle(x, y, RADIUS);
 floodfill(x, y, RED);
 setcolor(BLACK);
 rad = i / (2 * pi);
 x1 = x + 30 * cos(rad);
 x2 = x - 30 * cos(rad);
 y1 = y + 30 * sin(rad);
 y2 = y - 30 * sin(rad);
 sta1 = (180 + i -62);
 ena1 = (180 + i + 42);
 sta2 = (i -62);
 ena2 = (i + 42);
 if ((i/36) != (int)(i/36))  // must be included to swap arcs (C-style cast; int(...) is a C++-ism)
    {
     temp = sta1;
     sta1 = sta2;
     sta2 = temp;
     temp = ena1;
     ena1 = ena2;
     ena2 = temp;
    }
 ellipse (x1, y1, sta1, ena1, RADIUS , RADIUS);
 ellipse (x2, y2, sta2, ena2, RADIUS , RADIUS);
 line (x - cos(rad + pi/2) * RADIUS, y - sin(rad + pi/2) * RADIUS, x + cos(rad + pi/2) * RADIUS, y + sin(rad + pi/2) * RADIUS);
        // pickup image
 getimage(x-RADIUS-20, y-RADIUS-20, x+RADIUS+20, y+RADIUS+20, ballbuff[(int)(i/18)]);  // array subscript must be an integer
        // clear screen
 setcolor(WHITE);
 rectangle(-1,-1, 640, 480);
 setfillstyle(SOLID_FILL, BLACK);
 floodfill(100,100, WHITE);
       }

   imgnum = 10;                         // load first position + 1
   tim = 0;                             // set delay to zero
   dx = .1;                             // set constant in x equation
   dy = .1;
   sped = 1;
   del = 1;
   while ( key != ESC )             
      {
      if (kbhit())
      {
       key = getch();
       if (key == 61) del++;            // = key pressed
       if (key == 43) del++;            // + key pressed
       if (key == 45) del--;            // - key pressed
       if (key == 47) sped = sped +.1;  // / key pressed
       if (key == 92) sped = sped -.1;  // \ key pressed
       if (key == 0)                    // place image on screen
 switch(getch())
 {
  case L_ARROW:
     putimage(x-RADIUS, y-RADIUS, ballbuff[imgnum], XOR_PUT);
     dx = dx -.01;
     break;
  case R_ARROW:
     putimage(x-RADIUS, y-RADIUS, ballbuff[imgnum], XOR_PUT);
     dx = dx + .01;
     break;
  case U_ARROW:
     putimage(x-RADIUS, y-RADIUS, ballbuff[imgnum], XOR_PUT);
     dy = dy + .01;
     break;
  case D_ARROW:
     putimage(x-RADIUS, y-RADIUS, ballbuff[imgnum], XOR_PUT);
     dy = dy -.01;
     break;
  case ESC:
     key = 27;
     break;
        }
      }
      tim = tim + sped;
      x = (sin(dx * tim)*100) + 300;
      y = (cos(dy * tim)* 100) + 300;
      imgnum--;                   // cycle through images
      if (imgnum == -1) imgnum = 9;
      putimage(x-RADIUS, y-RADIUS, ballbuff[imgnum], COPY_PUT);

      // move ball across screen
      // to make ball move rapidly increase +
      // set height on screen
      delay(del);                 // make delay smaller for slow computers
      }
   getch();

   closegraph();
   }

If you still have a 16-bit Borland C compiler (my copy is in a huge box on the bottom shelf of my bookcase), feel free to compile the above program to see it in action.  The original compiled C program, last compiled in 1994, may be downloaded here: OrbitBall2.zip (save as OrbitBall.zip and extract the files – download might not work).  The original program is confirmed to work on 32-bit Windows 95 through Windows XP (in DOS full screen mode), but definitely will not work on 64-bit Windows, even in a virtual machine (it also failed to run on a 32-bit Windows 7 netbook).

You are probably thinking, "Neat, but what does that have to do with Oracle?"  It might be interesting to produce a modernized version of the above program.  We could use a simple SQL statement like the following to generate 3,600 X and Y coordinates, much like the X and Y coordinates that were generated by the above C program (note that the COS function (mathematical cosine) is typically used to derive X values and the SIN function is typically used to derive Y values – the functions were swapped here so that the X and Y coordinates start at the bottom-center of the screen).  The SQL follows:

SELECT
  ROUND((SIN(DX * (SPEED * COUNTER)) * WIDTH/2) + WIDTH/2) X,
  ROUND((COS(DY * (SPEED * COUNTER)) * HEIGHT/2) + HEIGHT/2) Y
FROM
  (SELECT
    0.1 DX,
    0.1 DY,
    1 SPEED,
    1 DELAY,
    300 WIDTH,
    300 HEIGHT,
    LEVEL COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=3600);

Now we have a slight problem, how do we present the X and Y coordinate points produced by the above SQL statement?  We need some sort of object to track the X and Y coordinate pairs.  Drawn basketballs might work, but instead I will use these pictures (created with Microsoft Power Point 2010):


To display the round oak pictures, we will put together a VBS script that builds a web page on the fly, cycling through the above eight pictures (every two X,Y coordinate pairs cause the displayed picture to change).  Much like the original program, we will allow the user to control the input parameters as the program runs.  Each time the parameters are adjusted, 3,600 new X,Y coordinate points are retrieved from the database into an array – this allows the VBS script to continue at the next X,Y coordinate pair, rather than restarting from the beginning.

Option Explicit

Const adCmdText = 1
Const adCmdStoredProc = 4
Const adParamInput = 1
Const adVarNumeric = 139
Const adBigInt = 20
Const adDecimal = 14
Const adDouble = 5
Const adInteger = 3
Const adLongVarBinary = 205
Const adNumeric = 131
Const adSingle = 4
Const adSmallInt = 2
Const adTinyInt = 16
Const adUnsignedBigInt = 21
Const adUnsignedInt = 19
Const adUnsignedSmallInt = 18
Const adUnsignedTinyInt = 17
Const adDate = 7
Const adDBDate = 133
Const adDBTimeStamp = 135
Const adDBTime = 134
Const adVarChar = 200
Const adUseClient = 3

Dim dbDatabase
Dim snpData
Dim comData
Dim varData
Dim objIE

Dim strUsername             'Username
Dim strPassword             'Password
Dim strDatabase             'SID name from tnsnames.ora

startup

Sub startup()
    Dim strSQL
    Dim strHTML
    Dim objOrbitBall
    Dim objOrbitBallPic
    Dim objCommand
    Dim objSettings
    Dim i
    Dim intQuit

    'Fire up Internet Explorer
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Left = 0
    objIE.Top = 0
    objIE.Width = 930
    objIE.Height = 820
    objIE.StatusBar = True
    objIE.MenuBar = False
    objIE.Toolbar = False
    objIE.Navigate "about:blank"
    objIE.Document.Title = "The SQL to the Orbiting Ball"
    objIE.Visible = True

    'The data entry area
    strHTML = "<div style=""position: absolute;width: 180px; height: 200px;top: 10px;left: 710px;"">" & vbCrLf
    strHTML = strHTML & "<input type=text id=txtCommand value="""" size=""1""><br>" & vbCrLf
    strHTML = strHTML & "<font size=1><b>+&nbsp;&nbsp;&nbsp;Increase Delay (Not Used)<br>" & vbCrLf
    strHTML = strHTML & "-&nbsp;&nbsp;&nbsp;Decrease Delay (Not Used)<br>" & vbCrLf
    strHTML = strHTML & "/&nbsp;&nbsp;&nbsp;Increase Rotation Speed<br>" & vbCrLf
    strHTML = strHTML & "\&nbsp;&nbsp;&nbsp;Decrease Rotation Speed<br>" & vbCrLf
    strHTML = strHTML & "D&nbsp;&nbsp;&nbsp;Increase Rotation Speed X Axis<br>" & vbCrLf
    strHTML = strHTML & "A&nbsp;&nbsp;&nbsp;Decrease Rotation Speed X Axis<br>" & vbCrLf
    strHTML = strHTML & "W&nbsp;&nbsp;&nbsp;Increase Rotation Speed Y Axis<br>" & vbCrLf
    strHTML = strHTML & "S&nbsp;&nbsp;&nbsp;Decrease Rotation Speed Y Axis<br>" & vbCrLf
    strHTML = strHTML & "L&nbsp;&nbsp;&nbsp;Increase Width X Axis<br>" & vbCrLf
    strHTML = strHTML & "J&nbsp;&nbsp;&nbsp;Decrease Width X Axis<br>" & vbCrLf
    strHTML = strHTML & "I&nbsp;&nbsp;&nbsp;Increase Height Y Axis<br>" & vbCrLf
    strHTML = strHTML & "K&nbsp;&nbsp;&nbsp;Decrease Height Y Axis<br>" & vbCrLf
    strHTML = strHTML & "(space)&nbsp;&nbsp;&nbsp;Restart at 0<br>" & vbCrLf
    strHTML = strHTML & "X&nbsp;&nbsp;&nbsp;Exit</b></font>" & vbCrLf
    strHTML = strHTML & "</div>" & vbCrLf

    'The current orbit information
    strHTML = strHTML & "<div id=""Settings"" style=""position: absolute;width: 180px; height: 100px;top: 600px;left: 710px;""> </div>"
    strHTML = strHTML & "<IMG ID=""picOrbitBall"" style=""position: absolute;"" src=""http://hoopercharles.files.wordpress.com/2010/09/sqlorbitingball0.png"">" & vbCrLf
    objIE.Document.Body.InnerHTML = strHTML

    'The sleep here is only necessary if the database connections happen very quickly
    'Wscript.Sleep 500

    Set dbDatabase = CreateObject("ADODB.Connection")
    Set snpData = CreateObject("ADODB.Recordset")
    Set comData = CreateObject("ADODB.Command")

    'Database configuration
    strUsername = "MyUsername"
    strPassword = "MyPassword"
    strDatabase = "MyDB"

    On Error Resume Next

    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
    dbDatabase.open

    'Should verify that the connection attempt was successful, but I will leave that for someone else to code

    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  ROUND((SIN(DX * (SPEED * COUNTER)) * WIDTH/2) + WIDTH/2) X," & vbCrLf
    strSQL = strSQL & "  ROUND((COS(DY * (SPEED * COUNTER)) * HEIGHT/2) + HEIGHT/2) Y" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  (SELECT" & vbCrLf
    strSQL = strSQL & "    ? DX," & vbCrLf
    strSQL = strSQL & "    ? DY," & vbCrLf
    strSQL = strSQL & "    ? SPEED," & vbCrLf
    strSQL = strSQL & "    ? DELAY," & vbCrLf
    strSQL = strSQL & "    ? WIDTH," & vbCrLf
    strSQL = strSQL & "    ? HEIGHT," & vbCrLf
    strSQL = strSQL & "    LEVEL COUNTER" & vbCrLf
    strSQL = strSQL & "  FROM" & vbCrLf
    strSQL = strSQL & "    DUAL" & vbCrLf
    strSQL = strSQL & "  CONNECT BY" & vbCrLf
    strSQL = strSQL & "    LEVEL<=3600)"

    With comData
        'Set up the command properties
        .CommandText = strSQL
        .CommandType = adCmdText
        .CommandTimeout = 30

        .ActiveConnection = dbDatabase

        'Add the bind variables
        .Parameters.Append .CreateParameter("dx", adDouble, adParamInput, 30)
        .Parameters.Append .CreateParameter("dy", adDouble, adParamInput, 30)
        .Parameters.Append .CreateParameter("speed", adDouble, adParamInput, 30)
        .Parameters.Append .CreateParameter("delay", adDouble, adParamInput, 30)
        .Parameters.Append .CreateParameter("width", adDouble, adParamInput, 30)
        .Parameters.Append .CreateParameter("height", adDouble, adParamInput, 30)
    End With

    comData("dx") = 0.1
    comData("dy") = 0.1
    comData("speed") = 1
    comData("delay") = 1
    comData("width") = 600
    comData("height") = 600

    Set snpData = comData.Execute

    'Retrieve up to 10,000 data points from Oracle
    varData = snpData.GetRows(10000)
    snpData.Close
    Set snpData = Nothing

    'Allow faster access to these objects when executing in the loop
    Set objOrbitBall = objIE.Document.getElementById("picOrbitBall").Style
    Set objOrbitBallPic = objIE.Document.getElementById("picOrbitBall")
    Set objCommand = objIE.Document.All.txtCommand
    Set objSettings = objIE.Document.getElementById("Settings")

    'Write out the current settings for the orbit
    objSettings.InnerText = "DX: " & comData("dx") & Chr(10) & "DY: " & comData("dy") & Chr(10) & _
                            "Speed: " & comData("speed") &Chr(10) & _
                            "Width: " & comData("width") & Chr(10) & "Height: " & comData("height")

    intQuit = False

    Do While intQuit = False
        For i = 0 To UBound(varData, 2)
            objOrbitBall.Left = CInt(varData(0, i))
            objOrbitBall.Top = CInt(varData(1, i))
            objOrbitBallPic.Src = "http://hoopercharles.files.wordpress.com/2010/09/sqlorbitingball" & cStr(i/2 Mod 8 ) & ".png"

            Wscript.Sleep 50
            Select Case Left(objCommand.Value, 1)
                Case "=", "+"
                    comData("delay") = comData("delay") + 1
                Case "-"
                    comData("delay") = comData("delay") - 1
                Case "/"
                    comData("speed") = comData("speed") + 0.1
                Case "\"
                    comData("speed") = comData("speed") - 0.1
                Case "W", "w"
                    comData("dy") = comData("dy") + 0.0005
                Case "S", "s"
                    comData("dy") = comData("dy") - 0.0005
                Case "D", "d"
                    comData("dx") = comData("dx") + 0.0005
                Case "A", "a"
                    comData("dx") = comData("dx") - 0.0005
                Case "I", "i"
                    comData("height") = comData("height") + 5
                Case "K", "k"
                    comData("height") = comData("height") - 5
                Case "L", "l"
                    comData("width") = comData("width") + 5
                Case "J", "j"
                    comData("width") = comData("width") - 5
                Case "X", "x"
                    intQuit = True
                    Exit For
                Case " "
                     'Reset the loop from the beginning
                     objCommand.Value = ""
                     Exit For
            End Select

            If objCommand.Value <> "" Then
                objCommand.Value = ""

                Set snpData = comData.Execute

                'Retrieve up to 10,000 data points from Oracle
                varData = snpData.GetRows(10000)

                snpData.Close
                Set snpData = Nothing

                'Write out the current settings for the orbit
                objSettings.InnerText = "DX: " & comData("dx") & Chr(10) & "DY: " & comData("dy") & Chr(10) & _
                                        "Speed: " & comData("speed") &Chr(10) & _
                                        "Width: " & comData("width") & Chr(10) & "Height: " & comData("height")
            End If
        Next
    Loop

    objIE.quit
    dbDatabase.Close
    Set dbDatabase = Nothing
    Set objIE = Nothing
End Sub

You may download the above script here: SQLOrbitingBall.vbs (save as SQLOrbitingBall.vbs).

A Circular Orbit:

————-

An Orbit that Changes from a Circular Orbit to a Straight Line:

————-

An Orbit where the Ball Bounces Between the Top and Bottom of the Window:

—————————

As written, the script assumes a minimum of 930 x 820 resolution (1080p resolution or greater should work without any problems).  Adjust the script as necessary for lower resolution screens.  The original program written in C certainly is shorter than the modernized version of the program, and it had a bit more wow factor prior to the widespread use of Windows and other graphical user interfaces.





Graphical Work Center Utilization – Creating the Demo Data and Active Server Page

1 09 2010

September 1, 2010

Today’s blog article provides a graphical view of production work areas on a factory floor, with visual feedback indicating when each production area is in use.  The blog article includes a classic ASP web page which uses VBScript to write the web page on the fly to the browser on the client computer.  The web page written to the client browser automatically refreshes itself every 30 seconds, advancing the view time by 15 minutes and displaying the production work areas that were in use at the new view time (the time displayed at the top of the page).

The ASP web page is the simple part of today’s blog article (although enabling classic ASP support may be a little challenging), while creating the demo data is actually the challenging portion of the article.  First, we need a table that will define the production work areas and the locations of those areas on the factory floor:

CREATE TABLE RESOURCE_LOCATION_DEMO (
  RESOURCE_ID VARCHAR2(15),
  DESCRIPTION VARCHAR2(30),
  LOCATION_LEFT NUMBER(12,4),
  LOCATION_TOP NUMBER(12,4),
  LOCATION_WIDTH NUMBER(12,4),
  LOCATION_HEIGHT NUMBER(12,4),
  PRIMARY KEY (RESOURCE_ID));

To keep things interesting, I do not want to just place the first production work area next to the second, the second next to the third, etc.  Instead, I want to randomly locate the production work areas on the factory floor, making certain that no two work areas overlap.  We can accomplish this by creating a list of numbers and ordering the numbers in a random sequence, like this:

SELECT
  ROWNUM RN
FROM
  DUAL
CONNECT BY
  LEVEL<=200
ORDER BY
  DBMS_RANDOM.VALUE;

  RN
----
 191
 165
 122
  12
  48
  27
 104
...
 198
 168
 150

200 rows selected.

Now, to locate the production work areas, imagine that we permitted 10 work areas horizontally (along the X axis) across the factory floor.  We can use the above number sequence along with the MOD function to determine the horizontal location of the work areas, and the FLOOR function to determine the vertical location of the work areas (note that each time we run this SQL statement we will receive different results):

SELECT
  'MA'||TO_CHAR(ROWNUM) RESOURCE_ID,
  'Shop Floor Machine '||TO_CHAR(ROWNUM) DESCRIPTION,
  MOD(RN,10) BOX_LEFT,
  FLOOR(RN/10) BOX_TOP
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=200
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=50;

RESOURCE_ID DESCRIPTION                 BOX_LEFT    BOX_TOP
----------- ------------------------- ---------- ----------
MA1         Shop Floor Machine 1               4         14
MA2         Shop Floor Machine 2               9          6
MA3         Shop Floor Machine 3               5          2
MA4         Shop Floor Machine 4               5          9
MA5         Shop Floor Machine 5               7         18
MA6         Shop Floor Machine 6               9          4
MA7         Shop Floor Machine 7               0          8
MA8         Shop Floor Machine 8               6          6
MA9         Shop Floor Machine 9               5          5
MA10        Shop Floor Machine 10              7         15
...
MA49        Shop Floor Machine 49              2         11
MA50        Shop Floor Machine 50              8          7

It would be too boring to assume that each of the production work areas is exactly the same width and height, so we will add a little more randomization.  Additionally, I want each production area to be up to 1.5 units wide and up to 1.0 units tall, both offset 0.25 units from the top-left (we are dealing with screen coordinates here, where positive Y is the same as mathematical -Y).  While there are up to 200 locations on the factory floor for work areas, we will only define 50 work areas (controlled by the ROWNUM<=50 predicate at the end of the SQL statement):

SELECT
  'MA'||TO_CHAR(ROWNUM) RESOURCE_ID,
  'Shop Floor Machine '||TO_CHAR(ROWNUM) DESCRIPTION,
  (MOD(RN,10))*1.5 + 0.25 LOCATION_LEFT,
  (FLOOR(RN/10))*1.0 + 0.25 LOCATION_TOP,
  ROUND(1.5*DBMS_RANDOM.VALUE,4) LOCATION_WIDTH,
  ROUND(1.0*DBMS_RANDOM.VALUE,4) LOCATION_HEIGHT
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=200
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=50;

RESOURCE_ID DESCRIPTION               LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT
----------- ------------------------- ------------- ------------ -------------- ---------------
MA1         Shop Floor Machine 1               3.25        18.25         1.2386           .7948
MA2         Shop Floor Machine 2               4.75        11.25          .6078           .9578
MA3         Shop Floor Machine 3               1.75        12.25          .5318            .457
MA4         Shop Floor Machine 4               3.25        13.25         1.2908           .9813
MA5         Shop Floor Machine 5                .25        16.25          .3245           .4644
MA6         Shop Floor Machine 6              12.25        15.25           .239           .3822
MA7         Shop Floor Machine 7               1.75        18.25          .0159           .8868
MA8         Shop Floor Machine 8               1.75        16.25          .3948           .8511
MA9         Shop Floor Machine 9              12.25         6.25          .4856           .3356
MA10        Shop Floor Machine 10             13.75        11.25         1.2619           .6124
...
MA49        Shop Floor Machine 49              7.75        16.25          .6664           .6938
MA50        Shop Floor Machine 50              9.25        15.25         1.3449           .6606

Now that we have tested the results, let’s insert a new set of similar random values into the RESOURCE_LOCATION_DEMO table, and display some of the inserted rows:

INSERT INTO
  RESOURCE_LOCATION_DEMO 
SELECT
  'MA'||TO_CHAR(ROWNUM) RESOURCE_ID,
  'Shop Floor Machine '||TO_CHAR(ROWNUM) DESCRIPTION,
  (MOD(RN,10))*1.5 + 0.25 LOCATION_LEFT,
  (FLOOR(RN/10))*1.0 + 0.25 LOCATION_TOP,
  ROUND(1.5*DBMS_RANDOM.VALUE,4) LOCATION_WIDTH,
  ROUND(1.0*DBMS_RANDOM.VALUE,4) LOCATION_HEIGHT
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=200
  ORDER BY
    DBMS_RANDOM.VALUE)
WHERE
  ROWNUM<=50;

COMMIT;

SELECT
  *
FROM
  RESOURCE_LOCATION_DEMO;

RESOURCE_ID DESCRIPTION               LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT
----------- ------------------------- ------------- ------------ -------------- ---------------
MA1         Shop Floor Machine 1              10.75        13.25           .104           .2165
MA2         Shop Floor Machine 2               7.75        18.25         1.2291            .478
MA3         Shop Floor Machine 3               9.25        16.25          .3431           .4364
MA4         Shop Floor Machine 4               1.75        15.25          .3665           .7278
MA5         Shop Floor Machine 5               4.75        15.25          .6842           .4507
MA6         Shop Floor Machine 6               4.75        18.25          .1384           .6434
MA7         Shop Floor Machine 7               4.75         7.25          .7448           .2178
MA8         Shop Floor Machine 8               7.75          .25          .3756            .499
MA9         Shop Floor Machine 9               1.75        18.25         1.0155           .0769
MA10        Shop Floor Machine 10              7.75        13.25         1.1892           .7518
...
MA49        Shop Floor Machine 49              6.25         3.25           .278           .6513
MA50        Shop Floor Machine 50               .25        15.25          .5216           .9607

To translate the above storage units (maybe in inch scale) into screen units, we will multiply the storage units by 96 (96 dots per inch) and then multiply by the zoom percent (75% = 0.75):

SELECT
  RESOURCE_ID,
  ROUND(LOCATION_LEFT*96 *0.75) LOCATION_LEFT,
  ROUND(LOCATION_TOP*96 *0.75) LOCATION_TOP,
  ROUND(LOCATION_WIDTH*96 *0.75) LOCATION_WIDTH,
  ROUND(LOCATION_HEIGHT*96 *0.75) LOCATION_HEIGHT
FROM
  RESOURCE_LOCATION_DEMO;

RESOURCE_ID LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT
----------- ------------- ------------ -------------- ---------------
MA1                   774          954              7              16
MA2                   558         1314             88              34
MA3                   666         1170             25              31
MA4                   126         1098             26              52
MA5                   342         1098             49              32
MA6                   342         1314             10              46
MA7                   342          522             54              16
MA8                   558           18             27              36
MA9                   126         1314             73               6
MA10                  558          954             86              54
...

Next, we need a table to maintain the time periods in which each of the production work areas was in use, and by whom the work areas were used:

CREATE TABLE LABOR_TICKET_DEMO (
  TRANSACTION_ID NUMBER,
  RESOURCE_ID VARCHAR2(15),
  EMPLOYEE_ID VARCHAR2(15),
  CLOCK_IN DATE,
  CLOCK_OUT DATE,
  PRIMARY KEY (TRANSACTION_ID));

Let’s see if we are able to generate some random data for the table:

ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI';

SELECT
  ROWNUM TRANSACTION_ID,
  'MA'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,50))) RESOURCE_ID,
  'EMP'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,300))) EMPLOYEE_ID,
  TRUNC(SYSDATE)+DBMS_RANDOM.VALUE CLOCK_IN
FROM
  DUAL
CONNECT BY
  LEVEL<=10;

TRANSACTION_ID RESOURCE_ID EMPLOYEE_ID CLOCK_IN
-------------- ----------- ----------- -----------------
             1 MA29        EMP50       01-SEP-2010 01:56
             2 MA35        EMP181      01-SEP-2010 10:06
             3 MA13        EMP172      01-SEP-2010 17:40
             4 MA21        EMP182      01-SEP-2010 09:00
             5 MA14        EMP80       01-SEP-2010 09:53
             6 MA4         EMP80       01-SEP-2010 19:04
             7 MA7         EMP110      01-SEP-2010 14:34
             8 MA45        EMP19       01-SEP-2010 22:05
             9 MA10        EMP207      01-SEP-2010 21:51
            10 MA46        EMP127      01-SEP-2010 16:49

That worked, but note that we did not generate a CLOCK_OUT time – we want to make certain that the CLOCK_OUT time is after the CLOCK_IN time, but we simply cannot do that with the above SQL statement as written.  We slide the above into an inline view and then set the CLOCK_OUT time to be up to 12 hours after the CLOCK_IN time (DBMS_RANDOM.VALUE by default returns a value between 0 and 1, so if we divide that value by 2, we can add up to 1/2 of a day to the CLOCK_IN date and time):

INSERT INTO
  LABOR_TICKET_DEMO
SELECT
  TRANSACTION_ID,
  RESOURCE_ID,
  EMPLOYEE_ID,
  CLOCK_IN,
  CLOCK_IN + (DBMS_RANDOM.VALUE/2) CLOCK_OUT
FROM
  (SELECT
    ROWNUM TRANSACTION_ID,
    'MA'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,50))) RESOURCE_ID,
    'EMP'||TO_CHAR(ROUND(DBMS_RANDOM.VALUE(1,300))) EMPLOYEE_ID,
    TRUNC(SYSDATE)+DBMS_RANDOM.VALUE CLOCK_IN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000);

COMMIT;

Let’s take a look at what made it into the table (your results will be different):

SELECT
  *
FROM
  LABOR_TICKET_DEMO
ORDER BY
  TRANSACTION_ID; 

TRANSACTION_ID RESOURCE_ID     EMPLOYEE_ID     CLOCK_IN          CLOCK_OUT
-------------- --------------- --------------- ----------------- -----------------
             1 MA34            EMP49           01-SEP-2010 20:32 02-SEP-2010 08:18
             2 MA47            EMP230          01-SEP-2010 20:08 02-SEP-2010 03:06
             3 MA20            EMP257          01-SEP-2010 02:17 01-SEP-2010 05:44
             4 MA21            EMP129          01-SEP-2010 09:37 01-SEP-2010 15:41
             5 MA18            EMP214          01-SEP-2010 18:57 01-SEP-2010 20:57
             6 MA46            EMP173          01-SEP-2010 05:51 01-SEP-2010 15:32
             7 MA34            EMP224          01-SEP-2010 20:23 02-SEP-2010 08:17
             8 MA31            EMP8            01-SEP-2010 02:56 01-SEP-2010 14:02
             9 MA37            EMP178          01-SEP-2010 09:28 01-SEP-2010 16:03
            10 MA8             EMP31           01-SEP-2010 20:17 02-SEP-2010 05:51
...
           999 MA43            EMP138          01-SEP-2010 05:07 01-SEP-2010 05:23
          1000 MA2             EMP235          01-SEP-2010 05:29 01-SEP-2010 13:28

We now have 1000 transactions scattered across the 50 work areas (RESOURCE_ID column).  What we next want to determine is which of the work areas was in use at a specific time of the day.  Because we eventually will want only one row per unique RESOURCE_ID value, we will use the ROW_NUMBER analytic function to number each of the rows within each unique RESOURCE_ID value:

SELECT
  RESOURCE_ID,
  EMPLOYEE_ID,
  TRUNC(SYSDATE) + (1.25)/24 CHECK_TIME,
  CLOCK_IN,
  CLOCK_OUT,
  ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN
FROM
  LABOR_TICKET_DEMO
WHERE
  CLOCK_IN<=TRUNC(SYSDATE) + (1.25)/24
  AND CLOCK_OUT>TRUNC(SYSDATE) + (1.25)/24
ORDER BY
  RESOURCE_ID,
  CLOCK_IN;

RESOURCE_ID  EMPLOYEE_ID  CHECK_TIME        CLOCK_IN          CLOCK_OUT         RN
------------ ------------ ----------------- ----------------- ----------------- --
MA10         EMP192       01-SEP-2010 01:15 01-SEP-2010 00:21 01-SEP-2010 05:44  1
MA10         EMP233       01-SEP-2010 01:15 01-SEP-2010 00:23 01-SEP-2010 02:42  2
MA16         EMP114       01-SEP-2010 01:15 01-SEP-2010 00:21 01-SEP-2010 04:48  1
MA18         EMP261       01-SEP-2010 01:15 01-SEP-2010 00:18 01-SEP-2010 07:02  1
MA18         EMP133       01-SEP-2010 01:15 01-SEP-2010 00:32 01-SEP-2010 04:35  2
MA2          EMP62        01-SEP-2010 01:15 01-SEP-2010 01:14 01-SEP-2010 12:03  1
MA21         EMP235       01-SEP-2010 01:15 01-SEP-2010 00:05 01-SEP-2010 10:42  1
MA22         EMP4         01-SEP-2010 01:15 01-SEP-2010 00:01 01-SEP-2010 06:27  1
MA22         EMP300       01-SEP-2010 01:15 01-SEP-2010 01:12 01-SEP-2010 11:50  2
MA23         EMP135       01-SEP-2010 01:15 01-SEP-2010 00:35 01-SEP-2010 05:19  1
MA25         EMP35        01-SEP-2010 01:15 01-SEP-2010 00:20 01-SEP-2010 06:58  1
MA28         EMP298       01-SEP-2010 01:15 01-SEP-2010 00:52 01-SEP-2010 06:27  1
MA30         EMP72        01-SEP-2010 01:15 01-SEP-2010 00:56 01-SEP-2010 07:28  1
MA32         EMP84        01-SEP-2010 01:15 01-SEP-2010 01:00 01-SEP-2010 05:25  1
MA34         EMP299       01-SEP-2010 01:15 01-SEP-2010 00:31 01-SEP-2010 12:04  1
MA38         EMP268       01-SEP-2010 01:15 01-SEP-2010 00:31 01-SEP-2010 04:15  1
MA38         EMP278       01-SEP-2010 01:15 01-SEP-2010 00:32 01-SEP-2010 04:50  2
MA38         EMP176       01-SEP-2010 01:15 01-SEP-2010 01:01 01-SEP-2010 04:01  3
MA4          EMP257       01-SEP-2010 01:15 01-SEP-2010 00:10 01-SEP-2010 10:45  1
MA40         EMP231       01-SEP-2010 01:15 01-SEP-2010 00:58 01-SEP-2010 11:01  1
MA43         EMP65        01-SEP-2010 01:15 01-SEP-2010 00:54 01-SEP-2010 09:29  1
MA44         EMP18        01-SEP-2010 01:15 01-SEP-2010 00:07 01-SEP-2010 03:30  1
MA46         EMP36        01-SEP-2010 01:15 01-SEP-2010 00:40 01-SEP-2010 04:57  1
MA48         EMP61        01-SEP-2010 01:15 01-SEP-2010 00:27 01-SEP-2010 10:20  1
MA48         EMP169       01-SEP-2010 01:15 01-SEP-2010 00:44 01-SEP-2010 01:27  2
MA5          EMP147       01-SEP-2010 01:15 01-SEP-2010 00:02 01-SEP-2010 04:48  1
MA6          EMP132       01-SEP-2010 01:15 01-SEP-2010 00:34 01-SEP-2010 09:42  1

27 rows selected.

In some cases we have up to three employees working in a work area at 1:15AM (the time of day is indicated by the 1.25 value in the SQL statement).  Now, let’s eliminate the duplicate rows, leaving just the rows where the calculated RN column is equal to 1.  We will join the above SQL statement in an inline view to the RESOURCE_LOCATION_DEMO table and convert the production work area coordinates to screen coordinates, in this case 96 pixels per unit (inches) at a 75% zoom percent (we made this same screen coordinate conversion in a previous SQL statement above):

SELECT
  RL.RESOURCE_ID,
  RL.DESCRIPTION,
  ROUND(RL.LOCATION_LEFT*96 *0.75) LOCATION_LEFT,
  ROUND(RL.LOCATION_TOP*96 *0.75) LOCATION_TOP,
  ROUND(RL.LOCATION_WIDTH*96 *0.75) LOCATION_WIDTH,
  ROUND(RL.LOCATION_HEIGHT*96 *0.75) LOCATION_HEIGHT,
  LT.EMPLOYEE_ID,
  LT.CLOCK_IN,
  LT.CLOCK_OUT
FROM
  RESOURCE_LOCATION_DEMO RL,
  (SELECT
    RESOURCE_ID,
    EMPLOYEE_ID,
    CLOCK_IN,
    CLOCK_OUT,
    ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN
  FROM
    LABOR_TICKET_DEMO
  WHERE
    CLOCK_IN<=TRUNC(SYSDATE) + (1.25)/24
    AND CLOCK_OUT>TRUNC(SYSDATE) + (1.25)/24) LT
WHERE
  RL.RESOURCE_ID=LT.RESOURCE_ID(+)
  AND NVL(LT.RN,1)=1
ORDER BY
  RL.RESOURCE_ID;

RESOURCE_ID  DESCRIPTION                    LOCATION_LEFT LOCATION_TOP LOCATION_WIDTH LOCATION_HEIGHT EMPLOYEE_ID  CLOCK_IN          CLOCK_OUT
------------ ------------------------------ ------------- ------------ -------------- --------------- ------------ ----------------- -----------------
MA1          Shop Floor Machine 1                     774          954              7              16
MA10         Shop Floor Machine 10                    558          954             86              54 EMP192       01-SEP-2010 00:21 01-SEP-2010 05:44
MA11         Shop Floor Machine 11                    882         1098             29               1
MA12         Shop Floor Machine 12                    234          378             51              51
MA13         Shop Floor Machine 13                    882         1314             83              62
MA14         Shop Floor Machine 14                    558          378             38              61
MA15         Shop Floor Machine 15                    774          522             63              64
MA16         Shop Floor Machine 16                    126          666            103              55 EMP114       01-SEP-2010 00:21 01-SEP-2010 04:48
MA17         Shop Floor Machine 17                    558          234             94              30
MA18         Shop Floor Machine 18                    342          450             85              21 EMP261       01-SEP-2010 00:18 01-SEP-2010 07:02
MA19         Shop Floor Machine 19                    342          666             94              20
MA2          Shop Floor Machine 2                     558         1314             88              34 EMP62        01-SEP-2010 01:14 01-SEP-2010 12:03
MA20         Shop Floor Machine 20                    666          162             33              33
MA21         Shop Floor Machine 21                    774          306             66              22 EMP235       01-SEP-2010 00:05 01-SEP-2010 10:42
MA22         Shop Floor Machine 22                    990          378             78              71 EMP4         01-SEP-2010 00:01 01-SEP-2010 06:27
MA23         Shop Floor Machine 23                    666          666             50              37 EMP135       01-SEP-2010 00:35 01-SEP-2010 05:19
MA24         Shop Floor Machine 24                    990          810            107              45
...

From the above output, we know the screen coordinates of each production work area (RESOURCE_ID) and the first employee to use the production work area (and the employee was still using it) at 1:15AM.

For the next portion of this blog article, the portion that requires an ASP enabled web server, we need a couple of pictures (these were created using Microsoft Power Point):

Representing a production work center that is in use:

Representing a production work center that is idle:

The factory floor – the background area:

——

Now we need the classic ASP page code – note that the code syntax is very similar to that of the previous VBScript examples.  The script uses Response.Write to write information to the web browser on the client computer, and embedded JavaScript to submit the embedded HTML form, advancing the display as-of time every 30 seconds (yes, I should have used bind variables in the SQL statement, but that would have required an extra 120 seconds to code and would have left you with nothing to improve – see the sketch following the script):

<html>

<head>
<meta http-equiv="refresh" content="600">
<title>Graphical Work Center Utilization Animated</title>
</head>

<body>
    <%
    Dim strSQL

    Dim sglOriginLeft
    Dim sglOriginTop

    Dim sglZoom
    Dim strZoom

    Dim i
    Dim intWidth
    Dim intHeight
    Dim strOffset
    Dim sglOffset
    Dim varDateTime

    Dim snpData
    Dim dbDatabase

    Set dbDatabase = Server.CreateObject("ADODB.Connection")

    'Database configuration
    strUsername = "MyUsername"
    strPassword = "MyPassword"
    strDatabase = "MyDB"

    On Error Resume Next

    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
    dbDatabase.Open

    'Should verify that the connection attempt was successful, but I will leave that for someone else to code

    Set snpData = Server.CreateObject("ADODB.Recordset")

    'Retrieve the last value for the time offset and the selected zoom percent
    strOffset = Request.Form("txtOffset")
    strZoom = Request.Form("cboZoom")

    'Convert back to a number
    sglOffset = CSng(strOffset)
    sglZoom = CSng(strZoom) / 100

    'Advance to the next 0.25 hour
    if (sglOffset = 0) or (sglOffset = 24) then
        sglOffset = 0.25
    else
        sglOffset = sglOffset + 0.25
    end if

    'Set the zoom percent, if not already set
    If sglZoom = 0 Then
        sglZoom = 0.5 '50% zoom
    End If

    varDateTime = DateAdd("n", sglOffset*60, Date) 'Create a printable version of the view date and time

    Response.Write "<form name=" & chr(34) & "reportform" & chr(34) & " method=" & chr(34) & "POST" & chr(34) & " action=" & chr(34) & "GraphicalWorkCenterUtilization.asp" & chr(34) & ">" & vbcrlf
    Response.Write varDateTime & "  " & vbCrLf
    Response.Write "&nbsp;&nbsp;&nbsp;Zoom Percent <select size=""1"" id=""cboZoom"" name=""cboZoom"" style=""width:50"">" & vbCrLf
    For i = 10 to 200 Step 10
        If sglZoom = i / 100 Then
            Response.Write "<option selected value=""" & cStr(i) & """>" & cStr(i) & "</option>"
        Else
            Response.Write "<option value=""" & cStr(i) & """>" & cStr(i) & "</option>"
        End If
    Next
    Response.Write "</select><br>"
    Response.Write "  <input type=" & chr(34) & "submit" & chr(34) & " value=" & chr(34) & "Update View" & chr(34) & " name=" & chr(34) & "cmdViewReport" & chr(34) & ">" & vbcrlf
    Response.Write "  <input type=" & chr(34) & "hidden" & chr(34) & " name=" & chr(34) & "txtOffset" & chr(34) & " size=" & chr(34) & "5" & chr(34) & " value=" & chr(34) & cStr(sglOffset) & chr(34) & ">" & vbcrlf
    Response.Write "</form>" & vbcrlf

    'The background image
    intWidth = Round(16 * 96 * sglZoom)
    intHeight = Round(20 * 96 * sglZoom)
    Response.Write "<img src=" & chr(34) & "http://hoopercharles.files.wordpress.com/2010/09/graphicalworkcenterutilizationbackground.jpg" & chr(34) & " width=" & chr(34) & cstr(intWidth) & chr(34) & " height=" & chr(34) & cstr(intheight) & chr(34) & " style=" & chr(34) & "position:absolute;top:50px;left:0px;z-index:-1" & chr(34) & "/>" & vbcrlf

    'The SQL statement developed earlier
    strSQL = "SELECT" & VBCrLf
    strSQL = strSQL & "  RL.RESOURCE_ID," & VBCrLf
    strSQL = strSQL & "  RL.DESCRIPTION," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_LEFT*96 *" & cStr(sglZoom) & ") LOCATION_LEFT," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_TOP*96 *" & cStr(sglZoom) & ") LOCATION_TOP," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_WIDTH*96 *" & cStr(sglZoom) & ") LOCATION_WIDTH," & VBCrLf
    strSQL = strSQL & "  ROUND(RL.LOCATION_HEIGHT*96 *" & cStr(sglZoom) & ") LOCATION_HEIGHT," & VBCrLf
    strSQL = strSQL & "  LT.EMPLOYEE_ID," & VBCrLf
    strSQL = strSQL & "  LT.CLOCK_IN," & VBCrLf
    strSQL = strSQL & "  LT.CLOCK_OUT" & VBCrLf
    strSQL = strSQL & "FROM" & VBCrLf
    strSQL = strSQL & "  RESOURCE_LOCATION_DEMO RL," & VBCrLf
    strSQL = strSQL & "  (SELECT" & VBCrLf
    strSQL = strSQL & "    RESOURCE_ID," & VBCrLf
    strSQL = strSQL & "    EMPLOYEE_ID," & VBCrLf
    strSQL = strSQL & "    CLOCK_IN," & VBCrLf
    strSQL = strSQL & "    CLOCK_OUT," & VBCrLf
    strSQL = strSQL & "    ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN" & VBCrLf
    strSQL = strSQL & "  FROM" & VBCrLf
    strSQL = strSQL & "    LABOR_TICKET_DEMO" & VBCrLf
    strSQL = strSQL & "  WHERE" & VBCrLf
    strSQL = strSQL & "    CLOCK_IN<=TRUNC(SYSDATE) + (" & cStr(sglOffset) & ")/24" & VBCrLf
    strSQL = strSQL & "    AND CLOCK_OUT>TRUNC(SYSDATE) + (" & cStr(sglOffset) & ")/24) LT" & VBCrLf
    strSQL = strSQL & "WHERE" & VBCrLf
    strSQL = strSQL & "  RL.RESOURCE_ID=LT.RESOURCE_ID(+)" & VBCrLf
    strSQL = strSQL & "  AND NVL(LT.RN,1)=1" & VBCrLf
    strSQL = strSQL & "ORDER BY" & VBCrLf
    strSQL = strSQL & "  RL.RESOURCE_ID"

    snpData.open strSQL, dbDatabase

    If snpData.State = 1 then
        Response.Write "<B><font color=" & chr(34) & "#0000FF" & chr(34) & "><p style=" & chr(34) & "position:absolute;top:15px;left:500px" & chr(34) & ">" & FormatDateTime(cdate(snpData("START_TIME")),2) & " " & FormatDateTime(cdate(snpData("START_TIME")),4) & " - " & FormatDateTime(cdate(snpData("END_TIME")),4) & "</p></font></b>" & vbcrlf

        Do While Not snpData.EOF
            If Not(IsNull(snpData("employee_id"))) Then
                'A labor ticket was in process during this time period
                Response.Write "<img alt=" & Chr(34) & snpData("resource_id") & "  " & snpData("description") & vbCrlf & snpData("employee_id") & "(" & snpData("clock_in") & " - " & snpData("clock_out") & ")" & Chr(34) & " src=" & chr(34) & "http://hoopercharles.files.wordpress.com/2010/09/graphicalworkcenterutilizationrunning.jpg" & chr(34) & " width=" & chr(34) & cStr(snpData("location_width")) & chr(34) & " height=" & chr(34) & cStr(snpData("location_height")) & chr(34) & " style=" & chr(34) & "position:absolute;top:" & cStr(cLng(snpData("location_top")) + 40) & "px;left:" & cStr(snpData("location_left")) & "px" & chr(34) & "/>" & vbcrlf
                'Write the title down 1 pixel
                Response.Write "<B><font color=" & chr(34) & "#00FFFF" & chr(34) & "><p style=" & chr(34) & "position:absolute;top:" & cStr(Round(41 + CSng(snpData("location_top")))) & "px;left:" & cStr(Round(CSng(snpData("location_left")))) & "px" & chr(34) & ">" & snpData("resource_id") & "</p></font></b>" & vbcrlf
            Else
                'No labor ticket was in process during this time period
                Response.Write "<img alt=" & Chr(34) & snpData("resource_id") & "  " & snpData("description") & Chr(34) & " src=" & chr(34) & "http://hoopercharles.files.wordpress.com/2010/09/graphicalworkcenterutilizationstopped.jpg" & chr(34) & " width=" & chr(34) & cStr(snpData("location_width")) & chr(34) & " height=" & chr(34) & cStr(snpData("location_height")) & chr(34) & " style=" & chr(34) & "position:absolute;top:" & cStr(cLng(snpData("location_top")) + 40) & "px;left:" & cStr(snpData("location_left")) & "px" & chr(34) & "/>" & vbcrlf
                'Write the title down 1 pixel
                Response.Write "<B><font color=" & chr(34) & "#FF0000" & chr(34) & "><p style=" & chr(34) & "position:absolute;top:" & cStr(Round(41 + CSng(snpData("location_top")))) & "px;left:" & cStr(Round(CSng(snpData("location_left")))) & "px" & chr(34) & ">" & snpData("resource_id") & "</p></font></b>" & vbcrlf
            End If
            snpData.movenext
        Loop
    End if

    snpData.Close
    dbDatabase.Close

    Set snpData = Nothing
    Set dbDatabase = Nothing
    %>

    <script language="JavaScript1.2">
    function NextInterval(){
      reportform.submit();
    }

      setTimeout("NextInterval()",30000)
    </script>
</body>

</html>

GraphicalWorkCenterUtilization.asp (save as GraphicalWorkCenterUtilization.asp on a web server that supports classic ASP pages)
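
Regarding the bind variable comment above, here is a minimal sketch of the same SQL statement rewritten with bind variables (the :ZOOM and :OFFSET placeholder names are my own; with ADO the placeholders would be ? characters bound through an ADODB.Command object, just as in the SQLOrbitingBall.vbs script earlier in this article):

SELECT
  RL.RESOURCE_ID,
  RL.DESCRIPTION,
  ROUND(RL.LOCATION_LEFT*96 * :ZOOM) LOCATION_LEFT,
  ROUND(RL.LOCATION_TOP*96 * :ZOOM) LOCATION_TOP,
  ROUND(RL.LOCATION_WIDTH*96 * :ZOOM) LOCATION_WIDTH,
  ROUND(RL.LOCATION_HEIGHT*96 * :ZOOM) LOCATION_HEIGHT,
  LT.EMPLOYEE_ID,
  LT.CLOCK_IN,
  LT.CLOCK_OUT
FROM
  RESOURCE_LOCATION_DEMO RL,
  (SELECT
    RESOURCE_ID,
    EMPLOYEE_ID,
    CLOCK_IN,
    CLOCK_OUT,
    ROW_NUMBER() OVER (PARTITION BY RESOURCE_ID ORDER BY CLOCK_IN) RN
  FROM
    LABOR_TICKET_DEMO
  WHERE
    CLOCK_IN<=TRUNC(SYSDATE) + (:OFFSET)/24
    AND CLOCK_OUT>TRUNC(SYSDATE) + (:OFFSET)/24) LT
WHERE
  RL.RESOURCE_ID=LT.RESOURCE_ID(+)
  AND NVL(LT.RN,1)=1
ORDER BY
  RL.RESOURCE_ID;

Besides eliminating the string concatenation, this approach permits Oracle Database to reuse a single cached execution plan as the zoom percent and time offset change.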

Below are samples of the output as the display as-of time advanced – the zoom percent was set to 50.  Notice that the work centers go online and offline as the time progresses (click a picture to see a larger version of that picture):

The final example demonstrates how the display changed when the zoom percent was changed from 50% to 130%:

-

As the mouse pointer is moved over the boxes representing the work centers, a pop-up tooltip appears that describes the work center, as well as employee ID, clock in time, and clock out time for the first labor ticket at that work center in the time period.

——-

Hopefully, you have found this example to be interesting.  Your assignment is to now connect proximity switches to the devices in your office and surrounding areas, recording their location in the RESOURCE_LOCATION_DEMO table.  Then log the proximity switch status to the LABOR_TICKET_DEMO table so that you are able to determine when the water cooler, coffee machine, chairs, keyboards, and computer mice are in use.  Use the data to determine which employees are the hardest working, and which employees have determined how to think smarter rather than working harder.  :-)
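
To help start that assignment, here is a sketch of a simple utilization summary that totals the number of labor tickets and the in-use hours per work area for the current day (using the demo table and columns created above):

SELECT
  RESOURCE_ID,
  COUNT(*) TICKETS,
  ROUND(SUM(CLOCK_OUT-CLOCK_IN)*24,2) HOURS_IN_USE
FROM
  LABOR_TICKET_DEMO
WHERE
  CLOCK_IN>=TRUNC(SYSDATE)
GROUP BY
  RESOURCE_ID
ORDER BY
  HOURS_IN_USE DESC;

Subtracting two DATE values yields a difference in days, so multiplying by 24 converts the total to hours.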





Down for the Count – Multiple Choice Quiz

8 08 2010

August 8, 2010 (Modified August 8, 2010, August 10, 2010)

I am not much of a supporter of True or False type questions, nor do I much care for multiple choice type questions.  It seems that essay questions are usually the only appropriate type of questions on exams.  Take, for example, the following question that might appear on a SQL exam:

Three, and only three, user sessions are connected to the database.  Session 1 creates 2 tables using the following commands:

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;
CREATE TABLE T1(COL1 NUMBER);
CREATE TABLE T2(COL1 NUMBER);

These are the only tables in the database named T1 and T2, as shown by the following:

SELECT
  TABLE_NAME
FROM
  DBA_TABLES
WHERE
  TABLE_NAME IN ('T1','T2');

TABLE_NAME
------------------------------
T1
T2

The following operations are performed in order:

In Session 1:

INSERT INTO T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

COMMIT;

INSERT INTO T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

In Session 2:

INSERT INTO T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

INSERT INTO T2
SELECT
  *
FROM
  T1;

COMMIT;

INSERT INTO T2
SELECT
  *
FROM
  T2;

In Session 3:

SELECT
  COUNT(*)
FROM
  T2;

What value is displayed when session 3 executes its query against table T2?
a.  600
b.  200
c.  100
d.  0
e.  All of the above
f.  None of the above

After you answer the multiple choice question, explain why your answer is correct.

————–
Edit, roughly 4.5 hours after the initial publishing of this blog article: 7 people, including the first person to comment (Sean), saw a value of 1600 for answer a. – that value was changed to 600 within minutes of the first comment appearing in this article.  The number 1600 was either a typo or an answer that I thought no one would ever select.  For fun, let’s add the following to the list of possible answers for the question… could it be the correct answer?:
a2.  1600

—————-

Edit August 10, 2010: The Test Answers:

As I stated a couple of days ago, I intend to reveal the correct answer to the question posed by this blog article.  As mentioned by Gary, just because a person (or certification board) designs a test question, that by default does not mean that the person knows the correct answer to the question.

Within roughly 15 minutes of this blog post appearing on Sunday, Sean provided a very good answer with strong justification.  If I only had 60 seconds to answer the question, I would hope to be able to work out the same solution.  That said, the WordPress blog article category for this article is “Quiz – Whose Answer is it Anyway?” and answers E and F seem to be a bit of a clue that something is a bit strange about this question.  I had more than 60 seconds to think about the answer, so I will pick one of the other answers.

I suspect that several readers were a bit suspicious about the question in this blog article that just might appear on some sort of standardized test (I designed the question, so I hope that it does not).  Gary, Joel Garry, and Martin Berger stepped up to the challenge and offered suggestions regarding how there *might* be more than one answer to the provided question.  As you might guess, the numbers selected for the answers are not random.  I thought that I would share with you the thoughts that I had when putting together the potential test answers.

——

b. 200 – assumes that we are using SQL*Plus or some other tool that does not auto-commit after every DML call, that we have not modified the environment of the tool, and that we did not close the tool after the script for each session was executed – SQL*Plus commits by default when exiting.  This is probably the right answer.
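
The tool’s commit behavior is easy to check and change in SQL*Plus (a sketch – AUTOCOMMIT is OFF by default):

SQL> SHOW AUTOCOMMIT
autocommit OFF

SQL> SET AUTOCOMMIT ON

With AUTOCOMMIT ON in all three sessions, session 1’s second insert and session 2’s final insert would be committed immediately (see answer a. below); EXIT performs a COMMIT by default, which is why closing the tool also changes the outcome.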

——

a. 600 – assumes that we are using a tool that auto-commits after every DML statement by default, or possibly using a database (other than Oracle) that auto-commits after every DML statement.  This could be a Visual Basic (or VBScript) program using ODBC or OLEDB (I might have stated ADO before) that does not start a transaction with a controllable end-point unless the BeginTrans method of the connection object is called first – or it could just as easily be a Java program using JDBC that does not call the setAutoCommit method of the connection object with a parameter of false.  Gary mentioned another possibility.

——

d. 0 – the question states that there are three and exactly three sessions connected to the database, so it probably is not reasonable to connect or disconnect from the database.  This is an instance where a pre-existing condition in one of the sessions might cause problems.  For example, assume that these three sessions had been connected for a while and one of the previous questions in the test asked about the serializable isolation level.  So, at some point in session 3 the command ALTER SESSION SET ISOLATION_LEVEL=SERIALIZABLE; was executed and a transaction started – maybe something as simple as: SELECT * FROM TABLE2 WHERE 0=1 FOR UPDATE;  With this isolation level, once a transaction begins, the answers to all SQL statements are as of the start time for the transaction.  Session 3 will therefore see a count of 0 for table T2.
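
A sketch of the command sequence in session 3 that produces this result (the SELECT … FOR UPDATE is simply a convenient way to start the transaction):

ALTER SESSION SET ISOLATION_LEVEL=SERIALIZABLE;

SELECT * FROM TABLE2 WHERE 0=1 FOR UPDATE;

-- sessions 1 and 2 now execute their INSERT and COMMIT statements

SELECT COUNT(*) FROM T2;

Because the serializable transaction sees the database as of its start time, the final COUNT(*) returns 0 regardless of the inserts committed by the other sessions.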

Another possibility is that either session 2 or (mutually exclusive) session 3 is connected as a different user.  The problem is that the test showed that there is only one table T1 and one table T2 in the database.  Suppose that one of the earlier test questions asked about synonyms and a table named TABLE2 was created in session 1 with a public synonym name of T2.  If session 2 is connected as a different user, it will actually insert into TABLE2 when attempting to insert into table T2, and session 3 will report the number of rows in the real table T2.  Note that the output does not show whether or not the inserts were successful, so it is possible that session 2 could not resolve the name T2 and returned an error.  If session 3 is connected as a different user, it will report the number of rows in table TABLE2, rather than T2.

Another possibility is that either session 2 or session 3 is connected as a different user and a public synonym points to a view created on table T2 that includes a WHERE clause of COL1=0.  Note that the output does not show whether or not the inserts were successful, so the view could have been created WITH CHECK OPTION.
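
A sketch of that last scenario (the VIEW_T2_CHECK name is hypothetical; session 2 or session 3 would connect as a user that resolves T2 through the public synonym):

CREATE OR REPLACE VIEW VIEW_T2_CHECK AS
SELECT
  *
FROM
  T2
WHERE
  COL1=0
WITH CHECK OPTION;

GRANT ALL ON VIEW_T2_CHECK TO PUBLIC;

CREATE PUBLIC SYNONYM T2 FOR VIEW_T2_CHECK;

Every row that session 2 attempts to insert through the synonym violates the COL1=0 predicate, so the inserts fail with ORA-01402 (view WITH CHECK OPTION where-clause violation), and a count through the view returns 0.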

——

c. 100 – the easiest way to obtain this value is if session 2 had at some point issued the command ALTER SESSION SET ISOLATION_LEVEL=SERIALIZABLE; and already had an in-process transaction (maybe SELECT * FROM TABLE2 WHERE 0=1 FOR UPDATE;).

A second possibility of obtaining a value of 100 is if at some point in the past session 1 executed the following commands:

CREATE OR REPLACE VIEW VIEW_T2 AS
SELECT
  *
FROM
  T2
WHERE
  COL1<=50;
 
GRANT ALL ON VIEW_T2 TO PUBLIC;
 
CREATE PUBLIC SYNONYM T1 FOR T1;
CREATE PUBLIC SYNONYM T2 FOR VIEW_T2;

In this case, sessions 2 and 3 are logged in as the same user, which is different from session 1.

——

a2. 1600 – you might be able to guess how I obtained this number.  I performed the test several times, obtaining different results each time, not realizing that session 1 could not drop tables T1 and T2 because session 2 had an active transaction referencing one or both of those tables (maybe you could not guess how I obtained that number).

How else might you obtain a value of 1600?  Assume that at some point in the past session 1 executed the following commands:

CREATE TABLE TABLE2(COL1 NUMBER);
GRANT ALL ON TABLE2 TO PUBLIC;
CREATE PUBLIC SYNONYM T1 FOR T1;
CREATE PUBLIC SYNONYM T2 FOR TABLE2;
 
CREATE OR REPLACE TRIGGER T2_POPULATE AFTER INSERT ON TABLE2
BEGIN
  DELETE FROM T2;
 
  INSERT INTO
    T2
  SELECT
    ROWNUM
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1600;
END;
/

Now every time a transaction inserts into table TABLE2 all rows will be deleted from table T2 and 1600 new rows will be inserted into table T2.  Session 1 and session 3 are connected to the database as the same user.  Session 2 is connected to the database as a different user.

——
 
e. All of the Above – all answers were obtained without violating what the question asked, so I guess that e is the correct answer.

——

f. None of the Above – the count of table T2 will produce a single resulting value, not 5 values, and the value returned could very well be something other than the 5 values listed, depending on the prior state of the database.

————————

How else might you obtain the different potential answers listed for the test question?





Full Table Scan when Selecting Null Values

29 07 2010

July 29, 2010

Another interesting thread appeared on the OTN forums this week.  The original poster in the thread stated that Oracle Database was performing a full table scan for a SQL statement like this:

SELECT
  COUNT(*)
FROM
  T1
WHERE
  COL1 IS NULL;

While a SQL statement like the following did not perform a full table scan:

SELECT
  COUNT(*)
FROM
  T1;

The original poster stated that there is an index on the column COL1 (COL1 is actually ID_NUMBER in the SQL statement posted by the OP).  Full table scans are not always the worst execution path; however, the OP stated that only 0.1% (an average of 1 in 1000) of the table’s rows contain a NULL value in the column.  For a standard B*Tree index, a row is not included in the index structure if the values for all columns in the index definition are NULL – thus Oracle Database would not be able to use a standard single-column B*Tree index to locate the rows where that column contains a NULL value.
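
The missing-NULL behavior is simple to confirm with a throw-away table (a sketch; the object names are mine):

CREATE TABLE T_NULL_TEST (COL1 NUMBER);
CREATE INDEX IND_T_NULL_TEST ON T_NULL_TEST(COL1);

INSERT INTO T_NULL_TEST VALUES (NULL);
COMMIT;

SELECT /*+ INDEX(T_NULL_TEST IND_T_NULL_TEST) */
  COUNT(*)
FROM
  T_NULL_TEST
WHERE
  COL1 IS NULL;

Even with the index hint, the execution plan shows a full table scan – the all-NULL key for the single row does not exist in the index structure, so no index access path is available.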

Several suggestions were offered in the thread:

  • Sybrand Bakker suggested creating a function based index similar to NVL(COL1, <somevalue>) and then modifying the WHERE clause to specify NVL(COL1, <somevalue>) = <somevalue>  – This is a good suggestion if the original query may be modified.
  • SomeoneElse suggested creating a bitmap index for COL1 since bitmap indexes do store NULL values for rows where all columns in the index structure contain NULL values.  – This is a good suggestion if the table is not subject to many changes.
  • I suggested creating a standard, composite B*Tree index on COL1, ‘ ‘, which is a technique I found in one of Richard Foote’s blog articles some time ago.  This approach requires roughly 2 additional bytes per index entry.
  • David Aldridge suggested creating a function based index that only includes the NULL values contained in COL1 using the following for the index definition CASE WHEN COL1 IS NULL THEN 1 END and then modifying the WHERE clause in the SQL statement to CASE WHEN COL1 IS NULL THEN 1 END = 1  – this is a good suggestion that will create a very small index structure, useful when it is possible to modify the original SQL statement.

A lot of great ideas in the thread (Richard Foote’s blog articles demonstrate several of these approaches).  Hemant K Chitale pointed out that there are potential problems with the suggestion that I offered, where a constant is used for the second column in the index so that NULL values in the first column are included in the index structure.  He referenced Metalink (My Oracle Support) Doc ID 551754.1, which describes Bug 6737251, as well as one of his blog articles.  That Metalink article, along with another mentioned by Amardeep Sidhu that describes problems caused for DBMS_STATS (Metalink Doc ID 559389.1, not mentioned in the thread, which suggests a work around for bug 6737251), essentially states to NOT use a constant for the second column in the composite index, but instead to use another column in the table.  I reviewed Metalink Doc ID 551754.1, and found it a bit lacking in detail (maybe it is just me?).  Below are my comments from the thread regarding that Metalink document:

I see a couple of problems with that Metalink (My Oracle Support) article… it lacks a little clarity (then again I could be seeing things that do not exist). First, the suggestion to use a pair of columns rather than a constant has at least four potential problems that I am able to identify:

  1. The second column must have a NOT NULL constraint (not mentioned in the article) – it cannot be just any secondary column.
  2. The secondary column will likely increase the index size a bit more than a single character used for the second column in the index.
  3. The secondary column will likely affect the clustering factor calculation for the index.
  4. The secondary column could affect the cardinality estimates for the index access path.

The second problem with the Metalink article is that, while it does demonstrate a bug, it does not show why the bug affects the results, nor does it explore other possibilities – like the one that Richard Foote used in his blog article. Here is a quick test case, loosely based on the Metalink test case, to demonstrate (note that I have not copied the statistics output from AUTOTRACE so that I may improve the clarity of the output):

First, the setup:

DROP TABLE T1 PURGE;

CREATE TABLE T1 (C1 NUMBER, C2 NUMBER);

INSERT INTO T1 VALUES (NULL, 1);

COMMIT;

CREATE INDEX IND_T1_1 ON T1(C1,1);
CREATE INDEX IND_T1_2 ON T1(C1,' ');

We now have a table containing a single row and two indexes – the first index uses a numeric constant for the second column in the index, while the second index uses a blank space for the second column in the index. Now continuing:

SET LINESIZE 140
SET PAGESIZE 1000
SET TRIMSPOOL ON
SET AUTOTRACE ON

SELECT /*+ INDEX(T1 IND_T1_1) */
  *
FROM
  T1
WHERE
  C1 IS NULL;

        C1         C2
---------- ----------
                    1

Execution Plan
----------------------------------------------------------
Plan hash value: 2805969644

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    26 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    26 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1" IS NULL)

Note
-----
   - dynamic sampling used for this statement

The above worked as expected. Continuing:

SELECT /*+ INDEX(T1 IND_T1_1) */
  *
FROM
  T1
WHERE
  C1 IS NULL
  AND ROWNUM<=1;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 404994253

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     1 |    26 |     1   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY               |          |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1       |     1 |    26 |     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_1 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - access("C1" IS NULL AND ROWNUM<=1)
       filter(ROWNUM<=1)

Note
-----
   - dynamic sampling used for this statement

Note that this time we encountered the bug – take a close look at the Predicate Information section of the execution plan to see why.

Now the test continues with the suggestion from Richard’s blog:

SELECT /*+ INDEX(T1 IND_T1_2) */
  *
FROM
  T1
WHERE
  C1 IS NULL;

        C1         C2
---------- ----------
                    1

Execution Plan
----------------------------------------------------------
Plan hash value: 348287884

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    26 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    26 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T1_2 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1" IS NULL)

Note
-----
   - dynamic sampling used for this statement

We obtained the same result as before [the correct results], continuing:

SELECT /*+ INDEX(T1 IND_T1_2) */
  *
FROM
  T1
WHERE
  C1 IS NULL
  AND ROWNUM<=1;

        C1         C2
---------- ----------
                    1

Execution Plan
----------------------------------------------------------
Plan hash value: 2383334138

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     1 |    26 |     2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY               |          |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1       |     1 |    26 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_2 |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - access("C1" IS NULL)

Note
-----
   - dynamic sampling used for this statement

Note that this time we did not encounter the bug [we received the correct results], and a row was returned. Compare the Predicate Information section of the execution plan with the one that failed to produce the correct result.

Let’s remove the “dynamic sampling used for this statement” note by gathering statistics:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)
BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,NO_INVALIDATE=>FALSE); END;

*
ERROR at line 1:
ORA-03001: unimplemented feature
ORA-06512: at "SYS.DBMS_STATS", line 13159
ORA-06512: at "SYS.DBMS_STATS", line 13179
ORA-06512: at line 1

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE); END;

*
ERROR at line 1:
ORA-03001: unimplemented feature
ORA-06512: at "SYS.DBMS_STATS", line 13159
ORA-06512: at "SYS.DBMS_STATS", line 13179
ORA-06512: at line 1

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')
BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1'); END;

*
ERROR at line 1:
ORA-03001: unimplemented feature
ORA-06512: at "SYS.DBMS_STATS", line 13159
ORA-06512: at "SYS.DBMS_STATS", line 13179
ORA-06512: at line 1

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2')

PL/SQL procedure successfully completed.

Well, it appears that we hit another bug [likely the one mentioned by Amardeep Sidhu]. Note that I successfully gathered statistics on another table just to demonstrate that there was not a problem with my statistics gathering syntax. Let’s fix that problem [removing the problem that triggered the bug in the Oracle Database software]:

SQL> DROP INDEX IND_T1_1;

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)

PL/SQL procedure successfully completed.

That is better. Apparently, Oracle Database (at least 10.2.0.2) has problems when the second column in an index definition is a number constant, but not when the second column is a character constant.

Mohamed Houri asked a very good question about the impact on the calculated clustering factor for an index when the index definition is changed to have a blank space for the second column, which will allow the rows containing NULL values to be indexed.  A change in the clustering factor may affect the optimizer’s decision to select an index access path using that index.

Without performing a test, I would estimate that each NULL value included in the index structure will increase the clustering factor by 1, unless all of the rows containing the NULL values are packed into a couple of table blocks, in which case the clustering factor should increase by less than 1 per NULL value – such a case would likely require that the rows were inserted at roughly the same time.

Confusing?  The NULL values in the index structure will be grouped together in logically adjacent index leaf blocks.  When that grouping is considered along with the algorithm for calculating the clustering factor (the counter increments each time two consecutive index entries point to different table blocks), the maximum increase in the clustering factor for this approach should be fairly easy to estimate.  See the clustering factor chapter in the book “Cost-Based Oracle Fundamentals”.

Let’s create a test case (note that this test case differs slightly from the test case in the thread) that hopefully explores some of the possibilities, and tries to determine the potential impact on the clustering factor.  The setup:

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  COL1 NUMBER NOT NULL,
  COL2 NUMBER,
  COL3 NUMBER,
  EMPLOYEE_ID VARCHAR2(20),
  SHIFT_DATE DATE NOT NULL,
  ROLLING_DATE DATE NOT NULL,
  INDIRECT_ID VARCHAR2(20));

INSERT INTO
  T1
SELECT
  ROWNUM COL1,
  DECODE(MOD(ROWNUM,1000),0,NULL,ROWNUM) COL2,
  DECODE(SIGN(999001-ROWNUM),1,ROWNUM,NULL) COL3,
  DECODE(TRUNC(DBMS_RANDOM.VALUE(0,5)),
          0,'MIKE',
          1,'ROB',
          2,'SAM',
          3,'JOE',
          4,'ERIC') EMPLOYEE_ID,
  TRUNC(SYSDATE)-ROUND(DBMS_RANDOM.VALUE(0,1000)) SHIFT_DATE,
  TRUNC(SYSDATE) + (1/1000)*ROWNUM ROLLING_DATE,
  DECODE(TRUNC(DBMS_RANDOM.VALUE(0,10)),
          0,'VAC',
          1,'HOL',
          2,'BEREAVE',
          3,'JURY',
          4,'ABS',
          5,'EXCUSE',
          6,'MIL',
          'OTHER') INDIRECT_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

We now have 1,000,000 rows.  Columns COL2 and COL3 each contain 0.1% NULL values (COL2 has the NULL values evenly dispersed, while COL3 has the NULL values in the last 1,000 rows inserted).  Now, let’s demonstrate an example of over-indexing – do not do this in a production environment.

CREATE INDEX IND_T1_COL1 ON T1(COL1);
CREATE INDEX IND_T1_COL2 ON T1(COL2);
CREATE INDEX IND_T1_EMPL ON T1(EMPLOYEE_ID);
CREATE INDEX IND_T1_SHIFT_DATE ON T1(SHIFT_DATE);
CREATE INDEX IND_T1_ROLL_DATE ON T1(ROLLING_DATE);

CREATE INDEX IND_T1_COL2_NOT_NULL ON T1(COL2,' ');
CREATE INDEX IND_T1_COL3_NOT_NULL ON T1(COL3,' ');
CREATE INDEX IND_T1_EMPL_NOT_NULL ON T1(EMPLOYEE_ID,' ');

CREATE INDEX IND_T1_COL2_NOT_NULL2 ON T1(COL2,COL1);
CREATE INDEX IND_T1_COL2_NOT_NULL3 ON T1(COL2,SHIFT_DATE);
CREATE INDEX IND_T1_COL2_NOT_NULL4 ON T1(COL2,ROLLING_DATE);
CREATE INDEX IND_T1_COL2_NOT_NULL5 ON T1(COL2,EMPLOYEE_ID);
CREATE INDEX IND_T1_COL2_NOT_NULL6 ON T1(NVL(COL2,-1));
CREATE INDEX IND_T1_COL2_NOT_NULL7 ON T1(CASE WHEN COL2 IS NULL THEN 1 END);

CREATE INDEX IND_T1_COL3_NOT_NULL2 ON T1(COL3,COL1);
CREATE INDEX IND_T1_COL3_NOT_NULL3 ON T1(COL3,SHIFT_DATE);
CREATE INDEX IND_T1_COL3_NOT_NULL4 ON T1(COL3,ROLLING_DATE);
CREATE INDEX IND_T1_COL3_NOT_NULL5 ON T1(COL3,EMPLOYEE_ID);
CREATE INDEX IND_T1_COL3_NOT_NULL6 ON T1(NVL(COL3,-1));
CREATE INDEX IND_T1_COL3_NOT_NULL7 ON T1(CASE WHEN COL3 IS NULL THEN 1 END);

CREATE INDEX IND_T1_EMPL_NOT_NULL2 ON T1(EMPLOYEE_ID,COL1);
CREATE INDEX IND_T1_EMPL_NOT_NULL3 ON T1(NVL(EMPLOYEE_ID,'UNKNOWN'));
CREATE INDEX IND_T1_EMPL_NOT_NULL4 ON T1(CASE WHEN EMPLOYEE_ID IS NULL THEN 'UNKNOWN' END);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)

The statistics collection call specified ESTIMATE_PERCENT=>NULL, which samples 100% of the table and indexes.  For my test run on Oracle Database 11.1.0.7, I received the following results (I added the index definition to the end of each line of output):

SELECT
  INDEX_NAME,
  DISTINCT_KEYS,
  BLEVEL,
  LEAF_BLOCKS,
  CLUSTERING_FACTOR,
  SAMPLE_SIZE
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T1'
ORDER BY
  INDEX_NAME;

INDEX_NAME            DISTINCT_KEYS BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR SAMPLE_SIZE
--------------------- ------------- ------ ----------- ----------------- -----------
IND_T1_COL1                 1000000      2        2226              6330     1000000  (COL1)
IND_T1_COL2                  999000      2        2224              6330      999000  (COL2)
IND_T1_COL2_NOT_NULL         999001      2        2505              7330     1000000  (COL2,' ')
IND_T1_COL2_NOT_NULL2       1000000      2        2921              7330     1000000  (COL2,COL1)
IND_T1_COL2_NOT_NULL3        999640      2        3343              7330     1000000  (COL2,SHIFT_DATE)
IND_T1_COL2_NOT_NULL4       1000000      2        3343              7330     1000000  (COL2,ROLLING_DATE)
IND_T1_COL2_NOT_NULL5        999005      2        2841              7330     1000000  (COL2,EMPLOYEE_ID)
IND_T1_COL2_NOT_NULL6        999001      2        2226              7330     1000000  (NVL(COL2,-1))
IND_T1_COL2_NOT_NULL7             1      1           2              1000        1000  (CASE WHEN COL2 IS NULL THEN 1 END)
IND_T1_COL3_NOT_NULL         999001      2        2505              6331     1000000  (COL3,' ')
IND_T1_COL3_NOT_NULL2       1000000      2        2921              6330     1000000  (COL3,COL1)
IND_T1_COL3_NOT_NULL3        999638      2        3343              7150     1000000  (COL3,SHIFT_DATE)
IND_T1_COL3_NOT_NULL4       1000000      2        3343              6330     1000000  (COL3,ROLLING_DATE)
IND_T1_COL3_NOT_NULL5        999005      2        2841              6359     1000000  (COL3,EMPLOYEE_ID)
IND_T1_COL3_NOT_NULL6        999001      2        2226              6331     1000000  (NVL(COL3,-1))
IND_T1_COL3_NOT_NULL7             1      1           2                 7        1000  (CASE WHEN COL3 IS NULL THEN 1 END)
IND_T1_EMPL                       5      2        2147             31585     1000000  (EMPLOYEE_ID)
IND_T1_EMPL_NOT_NULL              5      2        2425             31585     1000000  (EMPLOYEE_ID,' ')
IND_T1_EMPL_NOT_NULL2       1000000      2        2840             31598     1000000  (EMPLOYEE_ID,COL1)
IND_T1_EMPL_NOT_NULL3             5      2        2147             31585     1000000  (NVL(EMPLOYEE_ID,'UNKNOWN'))
IND_T1_EMPL_NOT_NULL4             0      0           0                 0           0  (CASE WHEN EMPLOYEE_ID IS NULL THEN 'UNKNOWN' END)
IND_T1_ROLL_DATE            1000000      2        2646              6330     1000000  (ROLLING_DATE)
IND_T1_SHIFT_DATE              1001      2        2646            925286     1000000  (SHIFT_DATE)

SELECT
  BLOCKS
FROM
  USER_TABLES
WHERE
  TABLE_NAME='T1';

BLOCKS
------
  6418
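
Recall the clustering factor algorithm mentioned earlier: the counter increments each time the table block changes between two consecutive index entries.  That statistic can be approximated for IND_T1_COL2_NOT_NULL directly with a query (a sketch of my own; it ignores the file number portion of the ROWID, so it assumes the table resides in a single datafile):

SELECT
  COUNT(*) APPROX_CLUSTERING_FACTOR
FROM
  (SELECT
    DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLK,
    LAG(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
      OVER (ORDER BY COL2, ROWID) PREV_BLK  -- index entry order: COL2 (NULLs last), then ROWID
  FROM
    T1)
WHERE
  PREV_BLK IS NULL    -- the first index entry always counts
  OR BLK<>PREV_BLK;   -- count each change of table block

If the approximation holds, the result should land close to the 7,330 reported above for that index.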

For the 1,000,000 row table, 1 of every 1,000 rows contains a NULL value in column COL2, and the clustering factor increased by exactly 1,000 (1,000,000/1,000) for the index on that column that used a single space for the second column in the index (IND_T1_COL2_NOT_NULL).  The results were different when the majority of the NULL values were grouped together, as was the case for column COL3, where the clustering factor increased by a value of just 1.  In this particular test case, with the NULL values evenly dispersed, the clustering factor is about the same regardless of whether a blank space or an actual table column is used for the second column of the index.  The clustering factor calculation for the index definitions NVL(COL2,-1) and NVL(COL3,-1) was identical to that for the index definitions COL2,‘ ‘ and COL3,‘ ‘, respectively.

The index structure using the CASE syntax was extremely small when only 0.1% of the rows in the table contained NULL values, and thus produced a very small clustering factor because there were few index entries.  The test case did not include a bitmap index, because the clustering factor statistic for such an index has a different meaning.  A clustering factor of 6,330 is the lowest possible value for the indexes not using the CASE syntax (I expected it to be closer to the 6,418 blocks reported for the table).
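
One caution about the CASE-based indexes (my own note, not from the OTN thread): they are only eligible for use when the query predicate matches the indexed expression exactly.  A sketch of a query that should be able to use the tiny IND_T1_COL2_NOT_NULL7 index, where an ordinary COL2 IS NULL predicate could not:

SELECT
  COUNT(*)
FROM
  T1
WHERE
  CASE WHEN COL2 IS NULL THEN 1 END = 1;  -- predicate matches the index expression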

Your results in a production environment may be very different from those in my simple test case, so be certain to test your solution.  So, the question: What would you do if you were faced with the problem identified by the original poster in the OTN thread?





SQL – Programmatic Row By Row to MERGE INTO

27 07 2010

July 27, 2010

A question in an email from an ERP mailing list combined with Cary Millsap’s latest blog article inspired this blog article.  The question from the ERP mailing list asked the following question:

Does anyone have Oracle syntax for the ‘upsert‘ command?  I have found a few examples, but little success yet.

Using VB.net, I want to have one command which will see if data exists, and if yes, update, if no, then insert.

There are several ways to approach this particular problem, some of which may be more efficient than others.  For example, assume that we have a table defined like this:

CREATE TABLE T2(
  ID NUMBER,
  COL2 NUMBER,
  COL3 NUMBER,
  COL4 NUMBER,
  PRIMARY KEY (ID));

Then we insert 5 rows using the following SQL statement (if you receive a primary key violation, just try executing the INSERT statement again) and then create a table that will allow us to quickly restore the original values for various repeated tests:

INSERT INTO
  T2
SELECT
  TRUNC(DBMS_RANDOM.VALUE(1,30)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000))
FROM
  DUAL
CONNECT BY
  LEVEL<=5;

CREATE TABLE
  T2_BACKUP
AS
SELECT
  *
FROM
  T2;

The five rows created by the above will have random numeric values in the COL2, COL3, and COL4 columns.  The rows might look something like this:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1        993        718        103
10        583        924        458
13         27        650        861
16        141        348        813
28        716        517        204
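
As an aside, rather than retrying after primary key violations, a variation such as the following sketch (my own, not from the original email exchange) selects five distinct random ID values up front:

INSERT INTO
  T2
SELECT
  ID,
  TRUNC(DBMS_RANDOM.VALUE(1,1000)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000))
FROM
  (SELECT
    LEVEL ID           -- candidate ID values 1 through 30
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30
  ORDER BY
    DBMS_RANDOM.VALUE) -- shuffle the candidate IDs
WHERE
  ROWNUM<=5;           -- keep the first five, guaranteed distinct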

Now we want to fill in the missing rows, so that ID values 1 through 30 appear in the table, but if the row already exists, we will modify the column values as follows:

  • COL2 will be set to a value of 0
  • COL3 will be set to a value of COL2 + COL3
  • COL4 will be set to a random value

How might we make these changes?  Well, we might do something silly, as demonstrated by the following VBScript code (this code may be executed with wscript or cscript on the Windows platform – it is also compatible with Visual Basic 6 and the Excel macro language, although the late binding should be changed to early binding, and variable types should be declared):

Const adOpenKeyset = 1
Const adLockOptimistic = 3

Dim dbDatabase
Dim dynData
Dim intS_ID
Dim intS_C2
Dim intS_C3
Dim intS_C4
Dim i
Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")
Set dynData = CreateObject("ADODB.Recordset")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

For i = 1 To 30
    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  ID," & vbCrLf
    strSQL = strSQL & "  COL2," & vbCrLf
    strSQL = strSQL & "  COL3," & vbCrLf
    strSQL = strSQL & "  COL4" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  T2" & vbCrLf
    strSQL = strSQL & "WHERE" & vbCrLf
    strSQL = strSQL & "  ID=" & CStr(i)

    dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic

    intS_ID = i
    intS_C2 = Int(Rnd * 1000) + 1
    intS_C3 = Int(Rnd * 1000) + 1
    intS_C4 = Int(Rnd * 1000) + 1

    If Not (dynData.BOF And dynData.EOF) Then
        dynData("col2") = 0
        dynData("col3") = dynData("col2") + dynData("col3")
        dynData("col4") = intS_C4
    Else
        'No row found, need to add
        dynData.AddNew

        dynData("id") = i
        dynData("col2") = intS_C2
        dynData("col3") = intS_C3
        dynData("col4") = intS_C4
    End If
    dynData.Update

    dynData.Close
Next

dbDatabase.CommitTrans

dbDatabase.Close

Set dynData = Nothing
Set dbDatabase = Nothing

There are a couple of problems with the above, beyond the lack of bind variable usage.  At least 30 SQL statements are sent to the database.  If a row is found to exist (the recordset’s BOF and EOF properties are not both true), then the row’s values are updated; otherwise a row is inserted.  This is the row by row (slow by slow) method of accomplishing the task.  When the script is executed, the table contents might look like this:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0        718        580
 2        290        302        775
 3         15        761        815
 4        710         46        415
 5        863        791        374
 6        962        872         57
 7        950        365        525
 8        768         54        593
 9        469        299        623
10          0        924        280
11        830        825        590
12        987        911        227
13          0        650        244
14        534        107       1000
15        677         16        576
16          0        348        799
17        285         46        296
18        383        301        949
19        980        402        279
20        161        163        647
21        411        413        713
22        327        634        208
23        187        584         81
24        458        906        262
25        786        379        290
26        920        632        628
27        429         98        562
28          0        517        835
29         23        544        917
30        431        678        503

Let’s return to the original starting point for table T2 so that we may try another test:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

Let’s eliminate the majority of the 30+ SQL statements that are sent to the database by modifying the VBS script:

Const adOpenKeyset = 1
Const adLockOptimistic = 3

Dim dbDatabase
Dim dynData
Dim intS_ID
Dim intS_C2
Dim intS_C3
Dim intS_C4
Dim i
Dim intMissing(30)
Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")
Set dynData = CreateObject("ADODB.Recordset")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  ID," & vbCrLf
strSQL = strSQL & "  COL2," & vbCrLf
strSQL = strSQL & "  COL3," & vbCrLf
strSQL = strSQL & "  COL4" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  ID"

dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic

For i = 1 To 30
    intS_C2 = Int(Rnd * 1000) + 1
    intS_C3 = Int(Rnd * 1000) + 1
    intS_C4 = Int(Rnd * 1000) + 1

    If Not (dynData.EOF) Then
        If i = CInt(dynData("id")) Then
            intMissing(i) = False
            dynData("col2") = 0
            dynData("col3") = dynData("col2") + dynData("col3")
            dynData("col4") = intS_C4
            dynData.Update

            dynData.MoveNext
        Else
            intMissing(i) = True
        End If
    Else
        intMissing(i) = True
    End If
Next

'Add the missing rows
For i = 1 To 30
    intS_C2 = Int(Rnd * 1000) + 1
    intS_C3 = Int(Rnd * 1000) + 1
    intS_C4 = Int(Rnd * 1000) + 1

    If intMissing(i) = True Then
        dynData.AddNew
        dynData("id") = i
        dynData("col2") = intS_C2
        dynData("col3") = intS_C3
        dynData("col4") = intS_C4
        dynData.Update
    End If
Next

dynData.Close
dbDatabase.CommitTrans

dbDatabase.Close
Set dynData = Nothing
Set dbDatabase = Nothing

That certainly is better.  Here is the output showing the table’s contents:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0        718        580
 2        405        270         56
 3        244        980         61
 4        391        365        490
 5        156        475        258
 6        629        543        157
 7        939        655        507
 8        391        108        784
 9        460        754        597
10          0        924        280
11         74        106        332
12        129          1        537
13          0        650        244
14         82        192        679
15        455        358        150
16          0        348        799
17         90        758        402
18        462        493        208
19        330         96        590
20        170        928         98
21        444        273        873
22        751        273        674
23        257         90         31
24        323        791        298
25        236        481        255
26        341         45        483
27        207        865        589
28          0        517        835
29        543         81        635
30        411        961        115

Better, but not good enough.  There are too many round-trips between the client and server.  Let’s reset the T2 test table and try again:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

A third attempt collapses a lot of client-side code into two SQL statements:

Dim dbDatabase

Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

strSQL = "UPDATE" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "SET" & vbCrLf
strSQL = strSQL & "  COL2=0," & vbCrLf
strSQL = strSQL & "  COL3=COL2+COL3," & vbCrLf
strSQL = strSQL & "  COL4=TRUNC(DBMS_RANDOM.VALUE(1,1000))"
dbDatabase.Execute strSQL

strSQL = "INSERT INTO" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "SELECT" & vbCrLf
strSQL = strSQL & "  S.S_ID," & vbCrLf
strSQL = strSQL & "  S.S_C2," & vbCrLf
strSQL = strSQL & "  S.S_C3," & vbCrLf
strSQL = strSQL & "  S.S_C4" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  (SELECT" & vbCrLf
strSQL = strSQL & "    ROWNUM S_ID," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4" & vbCrLf
strSQL = strSQL & "  FROM" & vbCrLf
strSQL = strSQL & "    DUAL" & vbCrLf
strSQL = strSQL & "  CONNECT BY" & vbCrLf
strSQL = strSQL & "    LEVEL<=30) S," & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  S.S_ID=T2.ID(+)" & vbCrLf
strSQL = strSQL & "  AND T2.ID IS NULL"
dbDatabase.Execute strSQL

dbDatabase.CommitTrans

Set dbDatabase = Nothing

Here is the output:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0       1711        202
 2        944        284        604
 3        612        909        576
 4        828        606        970
 5        433        868        446
 6        304        770        397
 7        502        257        474
 8        541        906        761
 9        283        614        819
10          0       1507        841
11        772         52        635
12        325         45        792
13          0        677        320
14        691        433        234
15        733        673        416
16          0        489        483
17        257         50         99
18        429        861        108
19        244          4        858
20        323        697        493
21        565        384        960
22        211        153        651
23        762        231        488
24         85        994        204
25        630        235        930
26        890        778        374
27         64        540        663
28          0       1233        955
29         70         16         56
30        493        647        742

Look closely at the above output.  Are you able to spot the “logic bug” in the first two code examples?

I like the above code sample, but we are able to improve it a bit by using a single SQL statement.  First, let’s reset the test table again:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

Now the code sample that uses a single SQL statement:

Dim dbDatabase

Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

strSQL = "MERGE INTO" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "USING" & vbCrLf
strSQL = strSQL & "  (SELECT" & vbCrLf
strSQL = strSQL & "    ROWNUM S_ID," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4" & vbCrLf
strSQL = strSQL & "  FROM" & vbCrLf
strSQL = strSQL & "    DUAL" & vbCrLf
strSQL = strSQL & "  CONNECT BY" & vbCrLf
strSQL = strSQL & "    LEVEL<=30) S" & vbCrLf
strSQL = strSQL & "ON" & vbCrLf
strSQL = strSQL & "  (T2.ID=S.S_ID)" & vbCrLf
strSQL = strSQL & "WHEN MATCHED THEN" & vbCrLf
strSQL = strSQL & "  UPDATE SET" & vbCrLf
strSQL = strSQL & "    T2.COL2=0," & vbCrLf
strSQL = strSQL & "    T2.COL3=T2.COL2+T2.COL3," & vbCrLf
strSQL = strSQL & "    T2.COL4=S.S_C4" & vbCrLf
strSQL = strSQL & "WHEN NOT MATCHED THEN" & vbCrLf
strSQL = strSQL & "  INSERT (ID, COL2, COL3, COL4) VALUES" & vbCrLf
strSQL = strSQL & "    (S.S_ID," & vbCrLf
strSQL = strSQL & "    S.S_C2," & vbCrLf
strSQL = strSQL & "    S.S_C3," & vbCrLf
strSQL = strSQL & "    S.S_C4)"
dbDatabase.Execute strSQL

dbDatabase.CommitTrans

Set dbDatabase = Nothing

The output of the above looks like this:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0       1711        286
 2        419         68        698
 3        849        296        986
 4         92         87        433
 5        425        786        802
 6        758        862        868
 7        450        327        978
 8        102        618        382
 9        276        563        620
10          0       1507        629
11        292        591        300
12        521        599        941
13          0        677        438
14        182        905        135
15        716        121        964
16          0        489        165
17        552        661         95
18        332        572        255
19        126        624        463
20        906        422        368
21        328        141        886
22        286        612        685
23        375        868        904
24        240        940        768
25          4        166        447
26        942        754        124
27        547        828        225
28          0       1233        872
29        883        417        215
30        762        427         21

At this point you are probably wondering why I even bothered to use VBScript for such a simple SQL statement.  Let’s reset the test table again:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

If I were trying to be as efficient as possible, I probably should have just executed the following in SQL*Plus:

MERGE INTO
  T2
USING
  (SELECT
    ROWNUM S_ID,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) S
ON
  (T2.ID=S.S_ID)
WHEN MATCHED THEN
  UPDATE SET
    T2.COL2=0,
    T2.COL3=T2.COL2+T2.COL3,
    T2.COL4=S.S_C4
WHEN NOT MATCHED THEN
  INSERT (ID, COL2, COL3, COL4) VALUES
    (S.S_ID,
    S.S_C2,
    S.S_C3,
    S.S_C4);

The following shows the modifications made by the above:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0       1711        849
 2        502        487        567
 3        273        966        847
 4        236        544        198
 5        191        970        986
 6        820        316        468
 7        833        651         82
 8         46        583        368
 9         63        685        148
10          0       1507        249
11        111        409         88
12        219        795        409
13          0        677        571
14        771         26        313
15        373        962        186
16          0        489        514
17        230        970        824
18         92        715        131
19        355        220        206
20        996         87        841
21        815        384        375
22        935        455        339
23        606        190        720
24        558        591        341
25        780        207        614
26        267        430        371
27        881        292        655
28          0       1233         70
29        379        466        628
30        293        216        881

We are certainly able to arrive at the correct answer in many different ways (and at the incorrect answer at least once), but the right way to achieve the task placed in front of us is not always easy to see.  The MERGE INTO syntax is one that I have not used often enough, and it probably deserves a greater investment of experimentation.
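
For instance (a sketch of my own, beyond the scope of the original question), MERGE also supports a conditional DELETE clause, so matched rows may be updated and then selectively removed within the same statement; the threshold below is arbitrary:

MERGE INTO
  T2
USING
  (SELECT
    ROWNUM S_ID,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) S
ON
  (T2.ID=S.S_ID)
WHEN MATCHED THEN
  UPDATE SET
    T2.COL2=0,
    T2.COL3=T2.COL2+T2.COL3,
    T2.COL4=S.S_C4
  DELETE WHERE (T2.COL3>1500)  -- removes just-updated rows whose new COL3 value exceeds 1500
WHEN NOT MATCHED THEN
  INSERT (ID, COL2, COL3, COL4) VALUES
    (S.S_ID,
    S.S_C2,
    S.S_C3,
    S.S_C4);

The DELETE WHERE clause is evaluated against the post-update column values, which is easy to overlook when testing.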

Have you found the logic bug with the first two code samples yet?

dynData("col2") = 0
dynData("col3") = dynData("col2") + dynData("col3")
dynData("col4") = intS_C4

The above works correctly when the columns are updated in that order in a SQL statement, because a SQL UPDATE evaluates every right-hand side expression using the column values as they existed at the start of the statement.  VBScript, in contrast, executes the assignments sequentially, so COL2 had already been set to 0 before COL3 was computed (note the COL3 values of 718, 924, 650, and 517 for IDs 1, 10, 13, and 28 in the first two outputs, versus 1711, 1507, 677, and 1233 once the SQL-based versions ran).  A minor adjustment to the assignment order produces the correct, expected results:

dynData("col3") = dynData("col2") + dynData("col3")
dynData("col2") = 0
dynData("col4") = intS_C4

Picky, picky, picky…  :-)





Create an Auto-Scaling HTML Chart using Only SQL

8 07 2010

July 8, 2010 (Modified July 9, 2010)

I thought that I would try something a little different today – build an auto-scaling HTML bar chart using nothing more than a SQL statement.  I mentioned in this book review that I was impressed with the HTML chart that was included in the book, but I felt that it might be more interesting if the example used absolute positioning, rather than an HTML table.  So, I built an example using dynamic positioning that is not based on what appears in that book.

We will use the sample table from this blog article (side note: this is an interesting article that shows how a VBS script can generate a 10046 trace file, and then transform that trace file back into a VBS script), just with the table renamed to T1.

CREATE TABLE T1 AS
SELECT
  DECODE(TRUNC(DBMS_RANDOM.VALUE(0,5)),
          0,'MIKE',
          1,'ROB',
          2,'SAM',
          3,'JOE',
          4,'ERIC') EMPLOYEE_ID,
  TRUNC(SYSDATE)-ROUND(DBMS_RANDOM.VALUE(0,1000)) SHIFT_DATE,
  DECODE(TRUNC(DBMS_RANDOM.VALUE(0,10)),
          0,'VAC',
          1,'HOL',
          2,'BEREAVE',
          3,'JURY',
          4,'ABS',
          5,'EXCUSE',
          6,'MIL',
          'OTHER') INDIRECT_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

COMMIT;

Now that we have 1000 rows in the sample table, let’s see how many entries fall into each week in the table (the week starts on a Monday) for those indirect entries that are either VAC, ABS, or EXCUSE:

SELECT
  NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
  COUNT(*) IND
FROM
  T1
WHERE
  INDIRECT_ID IN ('VAC','ABS','EXCUSE')
GROUP BY
  NEXT_DAY(SHIFT_DATE,'MONDAY')-7
ORDER BY
  1;

Your results of course will be different from what follows due to the randomization of the data, but this is what was returned from my database:

WEEK_OF   IND
--------- ---
08-OCT-07   1
15-OCT-07   2
22-OCT-07   4
29-OCT-07   3
05-NOV-07   3
03-DEC-07   2
10-DEC-07   3
24-DEC-07   2
...
05-JAN-09   1
12-JAN-09   3
19-JAN-09   7
02-FEB-09   1
...
21-JUN-10   3
28-JUN-10   2
05-JUL-10   2
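
Incidentally, the NEXT_DAY(SHIFT_DATE,'MONDAY')-7 expression returns the Monday on or before each date: NEXT_DAY by itself returns the first Monday strictly after the supplied date (and the day name it accepts depends on the NLS_DATE_LANGUAGE setting).  A quick sanity check of the idiom (my own illustration):

SELECT
  TO_CHAR(D,'DY DD-MON-YYYY') D,
  TO_CHAR(NEXT_DAY(D,'MONDAY')-7,'DY DD-MON-YYYY') WEEK_OF
FROM
  (SELECT
    TRUNC(SYSDATE)-LEVEL+1 D  -- the most recent seven days
  FROM
    DUAL
  CONNECT BY
    LEVEL<=7);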

The base query above should work; now we need to start manipulating the data so that we are able to calculate the size and location of the bars in the chart.  We will slide the base query into an inline view:

SELECT
  WEEK_OF,
  IND,
  MAX(IND) OVER () MAX_IND,
  COUNT(WEEK_OF) OVER () COUNT_WEEK_OF,
  ROWNUM RN
FROM
  (SELECT
    NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
    COUNT(*) IND
  FROM
    T1
  WHERE
    INDIRECT_ID IN ('VAC','ABS','EXCUSE')
  GROUP BY
    NEXT_DAY(SHIFT_DATE,'MONDAY')-7
  ORDER BY
    1);

In addition to returning the original data from the SQL statement, we are now also returning the maximum data value, the total number of weeks with at least one entry, and a row counter:

WEEK_OF   IND MAX_IND COUNT_WEEK_OF    RN
--------- --- ------- ------------- -----
08-OCT-07   1       7           126     1
15-OCT-07   2       7           126     2
22-OCT-07   4       7           126     3
29-OCT-07   3       7           126     4
05-NOV-07   3       7           126     5
03-DEC-07   2       7           126     6
10-DEC-07   3       7           126     7
...
05-JAN-09   1       7           126    57
12-JAN-09   3       7           126    58
19-JAN-09   7       7           126    59
02-FEB-09   1       7           126    60
...
14-JUN-10   2       7           126   123
21-JUN-10   3       7           126   124
28-JUN-10   2       7           126   125
05-JUL-10   2       7           126   126

Next, we need to calculate the position and size of each of the bars in the chart, so we will again slide the above into an inline view:

SELECT
  WEEK_OF,
  IND,
  MAX_IND,
  COUNT_WEEK_OF,
  RN,
  TRUNC(300 * IND/MAX_IND) BAR_WIDTH,
  TRUNC(800 * 1/COUNT_WEEK_OF) BAR_HEIGHT,
  TRUNC(800 * 1/COUNT_WEEK_OF * (RN-1)) BAR_TOP,
  100 BAR_LEFT
FROM
  (SELECT
    WEEK_OF,
    IND,
    MAX(IND) OVER () MAX_IND,
    COUNT(WEEK_OF) OVER () COUNT_WEEK_OF,
    ROWNUM RN
  FROM
    (SELECT
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
      COUNT(*) IND
    FROM
      T1
    WHERE
      INDIRECT_ID IN ('VAC','ABS','EXCUSE')
    GROUP BY
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7
    ORDER BY
      1));

You might notice in the above that I specified that the maximum width of the chart will be 300 (pixels) and the maximum height will be 800 (pixels).  Here is the output:

WEEK_OF   IND MAX_IND COUNT_WEEK_OF    RN  BAR_WIDTH BAR_HEIGHT BAR_TOP BAR_LEFT
--------- --- ------- ------------- ----- ---------- ---------- ------- --------
08-OCT-07   1       7           126     1         42          6       0      100
15-OCT-07   2       7           126     2         85          6       6      100
22-OCT-07   4       7           126     3        171          6      12      100
29-OCT-07   3       7           126     4        128          6      19      100
05-NOV-07   3       7           126     5        128          6      25      100
03-DEC-07   2       7           126     6         85          6      31      100
10-DEC-07   3       7           126     7        128          6      38      100
24-DEC-07   2       7           126     8         85          6      44      100
...
05-JAN-09   1       7           126    57         42          6     355      100
12-JAN-09   3       7           126    58        128          6     361      100
19-JAN-09   7       7           126    59        300          6     368      100
02-FEB-09   1       7           126    60         42          6     374      100
...
14-JUN-10   2       7           126   123         85          6     774      100
21-JUN-10   3       7           126   124        128          6     780      100
28-JUN-10   2       7           126   125         85          6     787      100
05-JUL-10   2       7           126   126         85          6     793      100

Now what?  We need to convert the above into HTML using DIV tags to position the bars as calculated.  Prior to the first row we need to write a couple of HTML tags to set the page title, and after the last row we need to write a couple more HTML tags to close the BODY and HTML section of the document.  The transformed SQL statement looks like this:

SET TRIMSPOOL ON
SET LINESIZE 400
SET ECHO OFF
SET FEEDBACK OFF
SET VERIFY OFF
SET PAGESIZE 0
SET SQLPROMPT ''

SPOOL C:\CUSTOM_CHART.HTM

SELECT
  DECODE(RN,1,'<html><head><title>Custom Chart</title></head><body>' || CHR(13) || CHR(10),' ') ||
  '<div style="position:absolute;' ||
    'top:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF * (RN-1))) || 'px;' ||
    'left:' || TO_CHAR(5) || 'px;' ||
    'width:' || TO_CHAR(100) || 'px;' ||
    'height:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF)) || 'px;' ||
    '"><font size="1px" color="#0000FF">' || TO_CHAR(WEEK_OF,'MM/DD/YY') ||
      REPLACE('     ',' ',CHR(38) || 'nbsp;') || TO_CHAR(IND) || '</font></div>' ||
  '<div style="background:#444466;position:absolute;' ||
    'top:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF * (RN-1))) || 'px;' ||
    'left:' || TO_CHAR(100) || 'px;' ||
    'width:' || TO_CHAR(TRUNC(300 * IND/MAX_IND)) || 'px;' ||
    'height:' || TO_CHAR(TRUNC(800 * 1/COUNT_WEEK_OF)) || 'px;' ||
    '"><font size="1px" color="#FFFFFF"></font></div>' ||
  DECODE(RN,COUNT_WEEK_OF, CHR(13) || CHR(10) || '</body></html>',' ') HTML_LINE
FROM
  (SELECT
    WEEK_OF,
    IND,
    MAX(IND) OVER () MAX_IND,
    COUNT(WEEK_OF) OVER () COUNT_WEEK_OF,
    ROWNUM RN
  FROM
    (SELECT
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
      COUNT(*) IND
    FROM
      T1
    WHERE
      INDIRECT_ID IN ('VAC','ABS','EXCUSE')
    GROUP BY
      NEXT_DAY(SHIFT_DATE,'MONDAY')-7
    ORDER BY
      1));

SPOOL OFF

There is a slight problem with the above: the SQL statement and the SPOOL OFF command are printed in the resulting HTML file.  If someone knows how to avoid that behavior (without placing the above into another script file), I would like to see how it is done (Oracle’s documentation did not help).

This is what the resulting HTML file looks like:

The number of result rows from the query was a bit high (126), so the bars are significantly compressed in height.  Just to see what happens, let’s add the following to the WHERE clause in the inner-most inline view:

AND SHIFT_DATE >= TO_DATE('01-JAN-2010','DD-MON-YYYY')

The resulting chart now looks like this:

Of course it is possible to adjust the colors of the font (#0000FF) and the bars (#444466), which are specified in hex in the format of RRGGBB (red green blue).  It is also possible to adjust the color of the bars to reflect the value represented by the bar, but that is an exercise for the reader.  For those who need to feel creative, it is also possible to display pictures in the bars, but that is also an exercise left for the reader.
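
As a sketch of the bar color idea (my own variation, with an arbitrary rule): derive the color in the inline view with a CASE expression, then concatenate the result into the style attribute in place of the #444466 literal:

SELECT
  WEEK_OF,
  IND,
  CASE
    WHEN IND = MAX(IND) OVER () THEN 'AA4444'  -- highlight the tallest bar(s)
    ELSE '444466'                              -- default bar color
  END BAR_COLOR
FROM
  (SELECT
    NEXT_DAY(SHIFT_DATE,'MONDAY')-7 WEEK_OF,
    COUNT(*) IND
  FROM
    T1
  WHERE
    INDIRECT_ID IN ('VAC','ABS','EXCUSE')
  GROUP BY
    NEXT_DAY(SHIFT_DATE,'MONDAY')-7);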

—-

Edit: The sample output from the SQL statement displays correctly on Red Hat Enterprise Linux 3 using Firefox 0.8:





SQL – Experimenting with Case Insensitive Searches

4 06 2010

June 4, 2010

Have you ever read about something, or heard about something, and wanted to be able to reproduce it?  Have you ever been warned that doing something is not a good idea because a specific problem is certain to happen, yet you do it anyway just to see if the problem can be avoided?  I recently read (again) about performing case insensitive searches in Oracle… as is the default behavior on SQL Server.  So, let’s try a couple of experiments.

First, we need a test table with a primary key index:

CREATE TABLE T9 (
  C1 VARCHAR2(20),
  C2 VARCHAR2(200),
  PRIMARY KEY (C1));

INSERT INTO
  T9
SELECT
  CHR(65+MOD(ROWNUM-1,26))||
    CHR(65+MOD(CEIL(ROWNUM/26)-1,26))||
    CHR(65+MOD(CEIL(ROWNUM/676)-1,26)),
  RPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=17576
UNION ALL
SELECT
  CHR(97+MOD(ROWNUM-1,26))||
    CHR(97+MOD(CEIL(ROWNUM/26)-1,26))||
    CHR(97+MOD(CEIL(ROWNUM/676)-1,26)),
  RPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=17576
UNION ALL
SELECT
  CHR(65+MOD(ROWNUM-1,26))||
    CHR(65+MOD(CEIL(ROWNUM/26)-1,26))||
    CHR(97+MOD(CEIL(ROWNUM/676)-1,26)),
  RPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=17576
UNION ALL
SELECT
  CHR(65+MOD(ROWNUM-1,26))||
    CHR(97+MOD(CEIL(ROWNUM/26)-1,26))||
    CHR(97+MOD(CEIL(ROWNUM/676)-1,26)),
  RPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=17576
UNION ALL
SELECT
  CHR(97+MOD(ROWNUM-1,26))||
    CHR(97+MOD(CEIL(ROWNUM/26)-1,26))||
    CHR(65+MOD(CEIL(ROWNUM/676)-1,26)),
  RPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=17576
UNION ALL
SELECT
  CHR(97+MOD(ROWNUM-1,26))||
    CHR(65+MOD(CEIL(ROWNUM/26)-1,26))||
    CHR(65+MOD(CEIL(ROWNUM/676)-1,26)),
  RPAD('A',200,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=17576;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T9',CASCADE=>TRUE)

The above test table is filled with one column containing a three letter sequence (AAA, BAA, CAA, … ABA, BBA, … AZZ, …) with a couple of variations of upper and lowercase letters.  The other column is padded to 200 characters to intentionally discourage Oracle’s optimizer from using a full table scan when a suitable index is available.  Now for the tests (on Oracle Database 11.1.0.7).

To begin, we will explicitly list the upper and lowercase versions of the letters that are of interest:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  C1,
  C2
FROM
  T9
WHERE
  C1 IN ('ABC','abc','ABc','Abc','abC','aBC');

Plan hash value: 2861409042

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     6 |  1230 |    13   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |              |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T9           |     6 |  1230 |    13   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0016172 |     6 |       |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("C1"='ABC' OR "C1"='ABc' OR "C1"='Abc' OR "C1"='aBC' OR "C1"='abC' OR
              "C1"='abc')

As can be seen, an index unique scan was performed for each of the values in the IN list.  The calculated cost is 13, and the optimizer is correctly predicting that 6 rows will be returned.

Next, we will try the brute force method, using the UPPER function on the C1 column:

SELECT
  C1,
  C2
FROM
  T9
WHERE
  UPPER(C1) = 'ABC';

Execution Plan
----------------------------------------------------------
Plan hash value: 3973213776

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1055 |   211K|   380   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T9   |  1055 |   211K|   380   (2)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("C1")='ABC')

The calculated cost is now 380, compared to the earlier value of 13, and Oracle is predicting that 1,055 rows will be returned – the increased cost likely explains why the optimizer did not use a full table scan for the IN list version of the SQL statement.

Next, let’s try something that is not supposed to work, using an index when a function is applied to that index’s column in the WHERE clause:

SELECT /*+ INDEX(T9) */
  C1,
  C2
FROM
  T9
WHERE
  UPPER(C1) = 'ABC';

Plan hash value: 1084614729

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  1055 |   211K|  1332   (1)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T9           |  1055 |   211K|  1332   (1)| 00:00:06 |
|*  2 |   INDEX FULL SCAN           | SYS_C0016172 |  1055 |       |   277   (2)| 00:00:02 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(UPPER("C1")='ABC')

The primary key index was used, but note that this access is an index full scan where every block in the index is accessed (using single block reads), rather than an index unique scan as happened with the first SQL statement.  The calculated cost also increased again, this time to 1,332.

Now what?  Well, we can tell Oracle to perform case insensitive matches:

ALTER SESSION SET NLS_SORT=BINARY_CI;
ALTER SESSION SET NLS_COMP=LINGUISTIC;

SELECT
  C1,
  C2
FROM
  T9
WHERE
  C1 = 'ABC';

Plan hash value: 3973213776

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   205 |   382   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T9   |     1 |   205 |   382   (3)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NLSSORT("C1",'nls_sort=''BINARY_CI''')=HEXTORAW('61626300') )

But now we are back to a full table scan, and the calculated cost of that full table scan is 382, rather than the 380 that we saw earlier.  However, Oracle is expecting to retrieve only a single row now: not the actual 6 rows, nor the 1,055 rows predicted when a full table scan appeared earlier in this article.

So let’s force an index access path with a hint just to see what happens:

SELECT /*+ INDEX(T9) */
  C1,
  C2
FROM
  T9
WHERE
  C1 = 'ABC';

Plan hash value: 1084614729

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |   205 |   279   (3)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T9           |     1 |   205 |   279   (3)| 00:00:02 |
|*  2 |   INDEX FULL SCAN           | SYS_C0016172 |     1 |       |   278   (3)| 00:00:02 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NLSSORT("C1",'nls_sort=''BINARY_CI''')=HEXTORAW('61626300') )

There is the index access path, and notice that the calculated cost for this access path is less than that of the full table scan, yet Oracle did not automatically select that access path.  We saw this behavior in an earlier article too.  The predicted cardinality still shows that only a single row is expected to be returned.

We still have not tried a function based index, so we will switch back to case sensitive matches and try again:

ALTER SESSION SET NLS_COMP=BINARY;

CREATE INDEX IND_T9_C1_UPPER ON T9(UPPER(C1));
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T9',CASCADE=>TRUE)

SELECT
  C1,
  C2
FROM
  T9
WHERE
  UPPER(C1) = 'ABC';

Execution Plan
----------------------------------------------------------
Plan hash value: 1260941705

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |     6 |  1254 |     7   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T9              |     6 |  1254 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T9_C1_UPPER |     6 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("C1")='ABC')

The function based index access path produced a lower cost plan than the IN list plan at the start of this article, and the cardinality estimate is correct.

Now let’s change back to a case insensitive search of column C1 to see what happens:

ALTER SESSION SET NLS_SORT=BINARY_CI;
ALTER SESSION SET NLS_COMP=LINGUISTIC;

SELECT
  C1,
  C2
FROM
  T9
WHERE
  UPPER(C1) = 'ABC';

Execution Plan
----------------------------------------------------------
Plan hash value: 3973213776

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     6 |  1254 |   383   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T9   |     6 |  1254 |   383   (3)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NLSSORT(UPPER("C1"),'nls_sort=''BINARY_CI''')=HEXTORAW('61
              626300') )

The optimizer did not use our function based index, so a full table scan was performed.  The Predicate Information section shows why: with NLS_COMP=LINGUISTIC the predicate was rewritten as NLSSORT(UPPER("C1"),…), which no longer matches the UPPER(C1) expression in the function based index definition.

Trying again:

SELECT
  C1,
  C2
FROM
  T9
WHERE
  C1 = 'ABC';

Plan hash value: 3973213776

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   205 |   382   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T9   |     1 |   205 |   382   (3)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NLSSORT("C1",'nls_sort=''BINARY_CI''')=HEXTORAW('61626300') )

Well, the WHERE clause did not match the function based index definition, so of course that index was not used.

One more time:

SELECT /*+ INDEX(T9) */
  C1,
  C2
FROM
  T9
WHERE
  C1 = 'ABC';

Plan hash value: 1084614729

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |   205 |   279   (3)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T9           |     1 |   205 |   279   (3)| 00:00:02 |
|*  2 |   INDEX FULL SCAN           | SYS_C0016172 |     1 |       |   278   (3)| 00:00:02 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NLSSORT("C1",'nls_sort=''BINARY_CI''')=HEXTORAW('61626300') )

I think that we saw that plan earlier in this article.
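
One combination that was not tried above: since the case insensitive session settings rewrite the predicates as NLSSORT(...) calls, a function based index built on the matching NLSSORT expression should permit an index range scan under those settings.  A sketch (the index name is my own, and the resulting plan is not reproduced here):

CREATE INDEX IND_T9_C1_CI ON T9(NLSSORT(C1,'NLS_SORT=BINARY_CI'));
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T9',CASCADE=>TRUE)

ALTER SESSION SET NLS_SORT=BINARY_CI;
ALTER SESSION SET NLS_COMP=LINGUISTIC;

SELECT
  C1,
  C2
FROM
  T9
WHERE
  C1 = 'ABC';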

Now who wanted Oracle to behave like SQL Server anyway?





Measuring Numbers – Is this a Valid Comparison?

4 06 2010

June 4, 2010

I encountered an interesting test case in the “Oracle SQL Recipes” book, but I fear that my degree in mathematics is causing me to fail to fully comprehend the test case.  I developed a parallel test case that possibly answers the questions that are left unanswered.  Here is my test case:

CREATE TABLE T8 (
  NUMBER_DIV NUMBER,
  BIN_DBL_DIV BINARY_DOUBLE,
  NUMBER_VALUE NUMBER,
  BIN_DBL_VALUE BINARY_DOUBLE,
  NUMBER_VALUE2 NUMBER(7,2));

INSERT INTO
  T8
SELECT
  ROWNUM,
  ROWNUM,
  1000/ROWNUM,
  1000/ROWNUM,
  1000/ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

COLUMN NUMBER_DIV FORMAT 99999
COLUMN BIN_DBL_DIV FORMAT 99999
COLUMN NUMBER_VALUE FORMAT 99990.00000000000000000000
COLUMN BIN_DBL_VALUE FORMAT 99990.00000000000000000000
COLUMN NUMBER_VALUE2 FORMAT 99990.000
COLUMN VND FORMAT 999
COLUMN VBDD FORMAT 9999
COLUMN VNV FORMAT 999
COLUMN VBDV FORMAT 9999
COLUMN VNV2 FORMAT 9999

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SPOOL NUMBERTEST.TXT

SELECT
  NUMBER_DIV,
  BIN_DBL_DIV,
  NUMBER_VALUE,
  BIN_DBL_VALUE,
  NUMBER_VALUE2,
  VSIZE(NUMBER_DIV) VND,
  VSIZE(BIN_DBL_DIV) VBDD,
  VSIZE(NUMBER_VALUE) VNV,
  VSIZE(BIN_DBL_VALUE) VBDV,
  VSIZE(NUMBER_VALUE2) VNV2
FROM
  T8
ORDER BY
  NUMBER_DIV;

SPOOL OFF

Quoting from page 190 of the book:

With the value one-third stored in each column, we can use the VSIZE function to show it was much more complicated to store this [the value of 1/3] with decimal precision [using the NUMBER datatype], taking nearly three times the space [when compared to the BINARY_DOUBLE datatype].

Here is the output from my script for the row containing the value one-third:

NUMBER_DIV BIN_DBL_DIV            NUMBER_VALUE           BIN_DBL_VALUE NUMBER_VALUE2  VND  VBDD  VNV  VBDV  VNV2
---------- ----------- ----------------------- ----------------------- ------------- ---- ----- ---- ----- -----
      3000        3000      0.3333333333333333      0.3333333333333333         0.330    2     8   21     8     2

As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table.  What, if anything, is wrong with the comparison?

Hint: To conserve space, the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns in the above output was changed from:

99990.00000000000000000000
to:
99990.0000000000000000
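
Following that hint, here is one way to see more of what is actually stored (my own addition; the digits displayed depend on the format mask):

SELECT
  TO_CHAR(NUMBER_VALUE, '0.99999999999999999999999999') N_STORED,   -- NUMBER: decimal representation of 1/3
  TO_CHAR(BIN_DBL_VALUE,'0.99999999999999999999999999') BD_STORED   -- BINARY_DOUBLE: nearest binary fraction
FROM
  T8
WHERE
  NUMBER_DIV=3000;

The two stored values should begin to differ somewhere around the sixteenth significant digit, since a BINARY_DOUBLE carries roughly 15 to 16 decimal digits of precision.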

There is an interesting description of the NUMBER and BINARY_DOUBLE (and similar) datatypes in the book “Troubleshooting Oracle Performance“.





Lock Watching – What is Wrong with this SQL Statement?

3 06 2010

June 3, 2010

I came across an interesting SQL statement that is described as identifying blocking transactions:

select s1.username blkg_user, s1.machine blkg_ws, s1.sid blkg_sid,
       s2.username wait_user, s2.machine wait_ws, s2.sid wait_sid,
       lo.object_id blkd_obj_id, do.owner, do.object_name
from v$lock l1, v$session s1, v$lock l2, v$session s2,
     v$locked_object lo, dba_objects do
where s1.sid = l1.sid
  and s2.sid = l2.sid
  and l1.id1 = l2.id1
  and s1.sid = lo.session_id
  and lo.object_id = do.object_id
  and l1.block = 1
  and l2.request > 0;

The SQL statement is a bit different from the one that I typically use for determining enqueues.  The documentation also includes a SQL statement for determining enqueues.

What, if anything, is wrong with the above SQL statement?  If you need a test case, try the one found in this article.  I suspect that there may be more than one answer.
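
While you think about it: as a quick cross-check when building such a test case, Oracle Database 10.1 and later expose blocking information directly in V$SESSION, which avoids most of the join logic above (a minimal sketch, offered only for comparison):

SELECT
  SID,
  SERIAL#,
  BLOCKING_SESSION,         -- SID of the session holding the blocking lock
  BLOCKING_SESSION_STATUS,
  ROW_WAIT_OBJ#,            -- object involved in the current row wait
  SECONDS_IN_WAIT
FROM
  V$SESSION
WHERE
  BLOCKING_SESSION IS NOT NULL;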





Date Delta SQL – What is Wrong with this SQL Statement?

2 06 2010

June 2, 2010

I found this interesting example a couple of months ago in a book, and something just did not seem right with the example.  What is the problem with the example?

select employee_id, first_name, last_name,
(sysdate - hire_date)*86400 Emp_Length_Seconds,
extract(year from (sysdate - hire_date) year to month) || ' years, ' ||
extract(month from (sysdate - hire_date) year to month) || ' months. '
Emp_Length_Readable
from hr.employees;

I have not used the EXTRACT SQL function much, if at all, prior to seeing this example, but I sensed that something was wrong.

Here is a test script to possibly help you find the problem with the above:

SET PAGESIZE 2000
COLUMN T FORMAT A20
SET TRIMSPOOL ON
SPOOL datedelta.txt

WITH DATES AS (
SELECT
  TO_DATE('01-JAN-2010','DD-MON-YYYY') + (ROWNUM-1) D
FROM
  DUAL
CONNECT BY
  LEVEL<=365)
SELECT
  D1.D D1,
  D2.D D2,
  D2.D-D1.D DATE_DELTA,
  extract(year from (D2.D-D1.D) year to month) || ' years, ' ||
  extract(month from (D2.D-D1.D) year to month) || ' months. ' T
FROM
  DATES D1,
  DATES D2
WHERE
  D1.D<D2.D;

SPOOL OFF

The above script scrolls through every start and end date combination for this year, writing a modified version of the calculation from the original SQL statement to a text file.  The output should look something like this:

D1        D2        DATE_DELTA T
--------- --------- ---------- --------------------
01-JAN-10 02-JAN-10          1 0 years, 0 months.
01-JAN-10 03-JAN-10          2 0 years, 0 months.
01-JAN-10 04-JAN-10          3 0 years, 0 months.
01-JAN-10 05-JAN-10          4 0 years, 0 months.
01-JAN-10 06-JAN-10          5 0 years, 0 months.
01-JAN-10 07-JAN-10          6 0 years, 0 months.
01-JAN-10 08-JAN-10          7 0 years, 0 months.
01-JAN-10 09-JAN-10          8 0 years, 0 months.
01-JAN-10 10-JAN-10          9 0 years, 0 months.
...
05-MAR-10 16-JUN-10        103 0 years, 3 months.
05-MAR-10 17-JUN-10        104 0 years, 3 months.
05-MAR-10 18-JUN-10        105 0 years, 3 months.
05-MAR-10 19-JUN-10        106 0 years, 3 months.
05-MAR-10 20-JUN-10        107 0 years, 4 months.
05-MAR-10 21-JUN-10        108 0 years, 4 months.
05-MAR-10 22-JUN-10        109 0 years, 4 months.
...

What is wrong with this SQL statement, and more importantly, how do we fix it?
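
(If you would like a starting point for the fix: compare the rounding visible above, where a 107 day difference reports 4 months, with MONTHS_BETWEEN, whose fractional result can be truncated instead of rounded.  A sketch of my own; whether truncation is the desired convention is a separate decision:)

WITH DATES AS (
SELECT
  TO_DATE('01-JAN-2010','DD-MON-YYYY') + (ROWNUM-1) D
FROM
  DUAL
CONNECT BY
  LEVEL<=365)
SELECT
  D1.D D1,
  D2.D D2,
  D2.D-D1.D DATE_DELTA,
  TRUNC(MONTHS_BETWEEN(D2.D,D1.D)/12) || ' years, ' ||
    TRUNC(MOD(MONTHS_BETWEEN(D2.D,D1.D),12)) || ' months. ' T   -- whole months, never rounded up
FROM
  DATES D1,
  DATES D2
WHERE
  D1.D<D2.D;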





The INSTR Function will Never Use an Index? I will give You a Hint

1 06 2010

June 1, 2010

As regular readers probably know, I frequently read books on various computer related topics.  Last winter I pre-ordered the book “Oracle SQL Recipes”, and have been reading the book on and off since that time.  Some parts of the book are absolutely fantastic, and others leave me scratching my head – did I read that right?  Page 169 of the book describes the INSTR function and the LIKE condition.  The book makes the following statement:

LIKE can be more readable, and may even use an index if there are no pattern-matching characters at the beginning of the search string. INSTR will never use an existing index unless you have created a function-based index containing the exact INSTR clause that you used in the WHERE clause.

The above paragraph conveys a lot of good information, and certainly the authors have tested that the above paragraph is correct.  Repeat after me: The INSTR function will never use a non-function based index!  The INSTR function will never use a non-function based index.  The INSTR function will never use a non-function based index?

I sense that some people are not sure that the above is true.  Let’s put together a quick test table for a test:

CREATE TABLE T5 (
  CHAR_COL VARCHAR2(10),
  C2 VARCHAR2(100),
  PRIMARY KEY (CHAR_COL));

INSERT INTO
  T5
SELECT
  TO_CHAR(ROWNUM),
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL <=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T5',CASCADE=>TRUE)

Now that we have a test table with a primary key index, we need a test script:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T5
WHERE
  CHAR_COL LIKE '99999%';

SELECT
  *
FROM
  T5
WHERE
  INSTR(CHAR_COL,'99999') = 1;

SELECT
  *
FROM
  T5
WHERE
  SUBSTR(CHAR_COL,1,5) = '99999';

SELECT /*+ INDEX(T5) */
  *
FROM
  T5
WHERE
  INSTR(CHAR_COL,'99999') = 1;

SELECT /*+ INDEX(T5) */
  *
FROM
  T5
WHERE
  SUBSTR(CHAR_COL,1,5) = '99999';

SELECT /*+ INDEX(T5) CARDINALITY(T5 11) */
  *
FROM
  T5
WHERE
  INSTR(CHAR_COL,'99999') = 1;

SELECT /*+ INDEX(T5) CARDINALITY(T5 11) */
  *
FROM
  T5
WHERE
  SUBSTR(CHAR_COL,1,5) = '99999';

If the above script is executed on Oracle Database 11.2.0.1, which of the above seven SQL statements will use the primary key index?

Before you cry fowl, think about the question and the expected execution plans.

Wild turkeys, probably not what you would expect to find on a blog that is about Oracle Databases.  Yet seemingly appropriate.  I found this group of turkeys wandering in my field a year or two ago.  It is amazing how close I was able to get to the birds – with a 20x zoom camera.

Now that you have had a chance to think about it, and you have probably seen this blog article, which, if any, of the execution plans will show that the primary key index was used?  Here is the output of the script:

SQL> SELECT
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    CHAR_COL LIKE '99999%'; 

Execution Plan
----------------------------------------------------------
Plan hash value: 260357324

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |   108 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5           |     1 |   108 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C0023201 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CHAR_COL" LIKE '99999%')
       filter("CHAR_COL" LIKE '99999%')

---

SQL> SELECT
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    INSTR(CHAR_COL,'99999') = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2002323537

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  1054K|  4407   (1)| 00:00:53 |
|*  1 |  TABLE ACCESS FULL| T5   | 10000 |  1054K|  4407   (1)| 00:00:53 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(INSTR("CHAR_COL",'99999')=1)

---

SQL> SELECT
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    SUBSTR(CHAR_COL,1,5) = '99999';

Execution Plan
----------------------------------------------------------
Plan hash value: 2002323537

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  1054K|  4422   (1)| 00:00:54 |
|*  1 |  TABLE ACCESS FULL| T5   | 10000 |  1054K|  4422   (1)| 00:00:54 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(SUBSTR("CHAR_COL",1,5)='99999')

---

SQL> SELECT /*+ INDEX(T5) */
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    INSTR(CHAR_COL,'99999') = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1277447555

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 10000 |  1054K|  5935   (1)| 00:01:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5           | 10000 |  1054K|  5935   (1)| 00:01:12 |
|*  2 |   INDEX FULL SCAN           | SYS_C0023201 | 10000 |       |  3922   (1)| 00:00:48 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(INSTR("CHAR_COL",'99999')=1)

---

SQL> SELECT /*+ INDEX(T5) */
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    SUBSTR(CHAR_COL,1,5) = '99999';

Execution Plan
----------------------------------------------------------
Plan hash value: 1277447555

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 10000 |  1054K|  5950   (1)| 00:01:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5           | 10000 |  1054K|  5950   (1)| 00:01:12 |
|*  2 |   INDEX FULL SCAN           | SYS_C0023201 | 10000 |       |  3937   (1)| 00:00:48 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SUBSTR("CHAR_COL",1,5)='99999')

---

SQL> SELECT /*+ INDEX(T5) CARDINALITY(T5 11) */
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    INSTR(CHAR_COL,'99999') = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1277447555

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    11 |  1188 |  5935   (1)| 00:01:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5           |    11 |  1188 |  5935   (1)| 00:01:12 |
|*  2 |   INDEX FULL SCAN           | SYS_C0023201 | 10000 |       |  3922   (1)| 00:00:48 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(INSTR("CHAR_COL",'99999')=1)

---

SQL> SELECT /*+ INDEX(T5) CARDINALITY(T5 11) */
  2    *
  3  FROM
  4    T5
  5  WHERE
  6    SUBSTR(CHAR_COL,1,5) = '99999';

Execution Plan
----------------------------------------------------------
Plan hash value: 1277447555

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |    11 |  1188 |  5950   (1)| 00:01:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T5           |    11 |  1188 |  5950   (1)| 00:01:12 |
|*  2 |   INDEX FULL SCAN           | SYS_C0023201 | 10000 |       |  3937   (1)| 00:00:48 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SUBSTR("CHAR_COL",1,5)='99999')

So, the answer to the question, “the INSTR function will never use a non-function based index?”  False, if index use is hinted.  When an index hint is provided, an index full scan is used to read the index, which reads the index blocks one at a time.  This could be a time consuming operation compared to a full table scan if all block accesses result in physical reads and the average row length for the table is not terribly long.  Consider that with an 8KB block size, Oracle should be able to read up to 128 blocks in a single read request during the full table scan (depending on extent sizes, operating system limits, blocks already in the buffer cache, and the DB_FILE_MULTIBLOCK_READ_COUNT value) in about the same amount of time as is required to read a single index block from disk during the index full scan operation (unless index pre-fetch kicks in to read multiple index blocks in a single read request).

So, yes we can use an index with the INSTR function, but would we really want to do that?  Maybe on rare occasions, but not as a regular practice.
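If a query with INSTR in the WHERE clause genuinely needs index support, the book’s own suggestion is the better route: a function-based index containing the exact INSTR expression used in the WHERE clause.  A minimal sketch (hypothetical index name; verify the resulting plan on your release):

CREATE INDEX IND_T5_INSTR ON T5 (INSTR(CHAR_COL,'99999'));

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T5',CASCADE=>TRUE)

SELECT
  *
FROM
  T5
WHERE
  INSTR(CHAR_COL,'99999') = 1;

With the function-based index in place, the filter predicate should become an access predicate, so an index range scan on the new index should replace the full scans seen above.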





Ramming Two SQL Statements Together – To Scalar or Too Scalar

30 05 2010

May 30, 2010

A recent comp.databases.oracle.server message thread about combining SQL statements triggered a bit of testing.  In light of some of my later comments in that thread, I thought I would start by showing a summary of the performance results.

In the above, the left side shows the TKPROF output from running a query that uses a standard join between four tables, but would probably behave about the same if it was a query with one table and the other three tables were specified in inline views.  The right side shows a query that produces identical results as the first, but uses SELECT statements in three columns of the parent SELECT statement that accesses a single table – there are three scalar (thanks Joel for looking up that term) subqueries in the SQL statement.  Each row in the above output shows what happened to the statistics output by TKPROF when the range of rows selected increased from 6,366 to 10,000,000 by changing the WHERE clause so that the upper range of the range scan in the query increased from 10 to 10,000.

By the end of the Server 1 test, where the test was executed directly on the server, the first query completed roughly 4.49 times faster than the second query, and required 102.27 times fewer consistent gets than the second query.

By the end of the Server 2 test, where the test was executed remotely across a gigabit network against a much slower server with 1/10 the number of rows in the main table, the first query completed roughly 6.27 times faster, and required 121.08 times fewer consistent gets than the second query.

The obvious conclusion is that scalar subqueries should not be used in column positions because they are much slower, and require a lot more consistent gets.

——–

I hear someone in the back shouting, “But, but… where is your evidence?  How can I verify that what you state to be correct is correct?  How can I be satisfied that my server isn’t substantially slower than your server 2?  Why were there all of those physical reads?  What were the execution plans, and did you see any unexpected Cartesian joins?  What did the SQL statements look like?  What were the table definitions?”

Is it not enough to state that I achieved a 627% performance improvement by just making a small change to a SQL statement?  :-)  Of course not – switching now to the approach used by the other book that I have on pre-order.

Here is the test case that was executed for the remote test with Server 2:

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000);

CREATE INDEX IND_T1 ON T1(ID);

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

There are just 1,000,000 rows in table T1, and I did not allow the optimizer to create a histogram on the columns in table T1.  Now, let’s try a test script using the tables to see the expected execution plans (on Oracle Database 11.1.0.7):

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 200;

The first SQL statement directly joins the tables, while the second SQL statement places SELECT statements in column positions (scalar subqueries).  The output (11.1.0.7):

SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

Execution Plan
----------------------------------------------------------
Plan hash value: 3780653648

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  1 |  HASH JOIN           |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  2 |   HASH JOIN          |        |   198 |  2376 |    46   (5)| 00:00:01 |
|*  3 |    HASH JOIN         |        |   199 |  1592 |    43   (3)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2     |   200 |   800 |    21   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T3     |   200 |   800 |    21   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN  | IND_T4 |   200 |   800 |     2   (0)| 00:00:01 |
|*  7 |   TABLE ACCESS FULL  | T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."C1"="T1"."ID")
   2 - access("T2"."C1"="T4"."C1")
   3 - access("T2"."C1"="T3"."C1")
   4 - filter("T2"."C1"<=200 AND "T2"."C1">=1)
   5 - filter("T3"."C1"<=200 AND "T3"."C1">=1)
   6 - access("T4"."C1">=1 AND "T4"."C1"<=200)
   7 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

SQL> SELECT
  2    T1.ID,
  3    (SELECT
  4      T2.C1
  5    FROM
  6      T2
  7    WHERE
  8      T1.ID=T2.C1) T2_C1,
  9    (SELECT
 10      T3.C1
 11    FROM
 12      T3
 13    WHERE
 14      T1.ID=T3.C1) T3_C1,
 15    (SELECT
 16      T4.C1
 17    FROM
 18      T4
 19    WHERE
 20      T1.ID=T4.C1) T4_C1
 21  FROM
 22    T1
 23  WHERE
 24    T1.ID BETWEEN 1 AND 200;

Execution Plan
----------------------------------------------------------
Plan hash value: 2945978589

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 20002 | 80008 |  9994   (1)| 00:00:41 |
|*  1 |  TABLE ACCESS FULL| T2     |     1 |     4 |    21   (0)| 00:00:01 |
|*  2 |  TABLE ACCESS FULL| T3     |     1 |     4 |    21   (0)| 00:00:01 |
|*  3 |  INDEX RANGE SCAN | IND_T4 |     1 |     4 |     1   (0)| 00:00:01 |
|*  4 |  TABLE ACCESS FULL| T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."C1"=:B1)
   2 - filter("T3"."C1"=:B1)
   3 - access("T4"."C1"=:B1)
   4 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

The execution plans look a bit different, and notice that the second execution plan shows bind variables in the Predicate Information section.  If we were to actually run the SQL statements, we might find that the first runs in about 15 seconds and the second in about 16-17 seconds, both with the same number of physical reads.  That is no fun, so let’s change the number 200 to 1200 to see what happens.  We will flush the buffer cache twice between executions to force physical reads for both executions (Oracle is set to use direct, asynchronous IO), and set the array fetch size to 1,000 to minimize the amount of unnecessary network traffic.  The test script follows:

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SET ARRAYSIZE 1000

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 1200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 1200;

Here is the output of the above script:

SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 1200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

76580 rows selected.

Elapsed: 00:00:15.96

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      83197  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

SQL> SELECT
  2    T1.ID,
  3    (SELECT
  4      T2.C1
  5    FROM
  6      T2
  7    WHERE
  8      T1.ID=T2.C1) T2_C1,
  9    (SELECT
 10      T3.C1
 11    FROM
 12      T3
 13    WHERE
 14      T1.ID=T3.C1) T3_C1,
 15    (SELECT
 16      T4.C1
 17    FROM
 18      T4
 19    WHERE
 20      T1.ID=T4.C1) T4_C1
 21  FROM
 22    T1
 23  WHERE
 24    T1.ID BETWEEN 1 AND 1200;

76580 rows selected.

Elapsed: 00:01:40.09

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
   10073639  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

The number of consistent gets jumped significantly for the second SQL statement, and so did the execution time (and CPU usage).  For comparison, here are the autotrace statistics for the second SQL statement when T1.ID BETWEEN 1 AND 200 was specified:

12732 rows selected.

Elapsed: 00:00:17.54

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
     522390  consistent gets
      83108  physical reads
          0  redo size
     196813  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12732  rows processed

While the above might be interesting, a slightly different table T1 was created for the Server 1 test script:

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000);

CREATE INDEX IND_T1 ON T1(ID);

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

This time I loaded table T1 with 10,000,000 rows and allowed the optimizer to collect a histogram on table T1, if it determined that a histogram would help.  The full test script on Server 1 is a bit long, so I will post an abbreviated version here.  First, let’s determine the execution plans for the SQL statements (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 160
SET TRIMSPOOL ON
SET PAGESIZE 2000
SPOOL SCALAR_EXECUTION_PLANS.TXT

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 50
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 50;

...

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;

SPOOL OFF

Then, to check the performance results, here is the script that actually executes the statements (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
SET ARRAYSIZE 1000
SPOOL SCALAR_TEST_RESULTS.TXT
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SCALAR_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 50
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 50;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

...

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;

SPOOL OFF
SET AUTOTRACE OFF

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

It is a bit surprising to see how many different execution plans appear in the TKPROF output for the first query compared with the second query.  The plans below are shown only when the execution plan changed (the Rows statistic shows the actual number of rows returned for the execution plan line, while card shows the predicted number of rows); notice that at one point we see three sort-merge joins:

Rows     Row Source Operation
-------  ---------------------------------------------------
   6366  NESTED LOOPS  (cr=371 pr=350 pw=0 time=2208 us cost=121 size=208 card=13)
     10   HASH JOIN  (cr=325 pr=318 pw=0 time=27 us cost=97 size=96 card=8)
     10    HASH JOIN  (cr=316 pr=310 pw=0 time=9 us cost=95 size=72 card=9)
     10     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=40 card=10)
     10     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=0 us cost=47 size=40 card=10)
     10    INDEX RANGE SCAN IND_T4 (cr=9 pr=8 pw=0 time=0 us cost=2 size=40 card=10)(object id 82879)
   6366   INDEX RANGE SCAN IND_T1 (cr=46 pr=32 pw=0 time=1315 us cost=3 size=8 card=2)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
  31832  HASH JOIN  (cr=415 pr=398 pw=0 time=13733 us cost=208 size=3872 card=242)
     50   HASH JOIN  (cr=318 pr=318 pw=0 time=49 us cost=97 size=576 card=48)
     50    HASH JOIN  (cr=316 pr=310 pw=0 time=49 us cost=95 size=392 card=49)
     50     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=200 card=50)
     50     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=49 us cost=47 size=200 card=50)
     50    INDEX RANGE SCAN IND_T4 (cr=2 pr=8 pw=0 time=0 us cost=2 size=200 card=50)(object id 82879)
  31832   INDEX RANGE SCAN IND_T1 (cr=97 pr=80 pw=0 time=3465 us cost=110 size=200020 card=50005)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
 127334  HASH JOIN  (cr=706 pr=606 pw=0 time=176652 us cost=533 size=63392 card=3962)
    200   INDEX RANGE SCAN IND_T4 (cr=2 pr=8 pw=0 time=0 us cost=2 size=800 card=200)(object id 82879)
 127334   HASH JOIN  (cr=704 pr=598 pw=0 time=138605 us cost=530 size=47772 card=3981)
    200    TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=99 us cost=47 size=800 card=200)
 127334    MERGE JOIN  (cr=546 pr=443 pw=0 time=107989 us cost=482 size=32008 card=4001)
 127334     INDEX RANGE SCAN IND_T1 (cr=388 pr=288 pw=0 time=18939 us cost=434 size=800080 card=200020)(object id 82875)
 127334     SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=800 card=200)
    200      TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=800 card=200)

Rows     Row Source Operation
-------  ---------------------------------------------------
 894205  MERGE JOIN  (cr=3077 pr=2222 pw=0 time=2023576 us cost=3124 size=3132784 card=195799)
 894205   MERGE JOIN  (cr=3073 pr=2214 pw=0 time=1414521 us cost=3119 size=2351028 card=195919)
 894205    MERGE JOIN  (cr=2915 pr=2059 pw=0 time=767100 us cost=3071 size=1568312 card=196039)
 894205     INDEX RANGE SCAN IND_T1 (cr=2757 pr=1904 pw=0 time=111615 us cost=3023 size=5600560 card=1400140)(object id 82875)
 894205     SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=5600 card=1400)
   1400      TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=127 us cost=47 size=5600 card=1400)
 894205    SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=5600 card=1400)
   1400     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=127 us cost=47 size=5600 card=1400)
 894205   SORT JOIN (cr=4 pr=8 pw=0 time=0 us cost=5 size=5600 card=1400)
   1400    INDEX RANGE SCAN IND_T4 (cr=4 pr=8 pw=0 time=127 us cost=4 size=5600 card=1400)(object id 82879)

Rows     Row Source Operation
-------  ---------------------------------------------------
1939734  HASH JOIN  (cr=23230 pr=21260 pw=0 time=929418 us cost=6000 size=14396160 card=899760)
   3000   HASH JOIN  (cr=343 pr=333 pw=0 time=2499 us cost=102 size=35988 card=2999)
   3000    INDEX FAST FULL SCAN IND_T4 (cr=27 pr=23 pw=0 time=374 us cost=7 size=12000 card=3000)(object id 82879)
   3000    HASH JOIN  (cr=316 pr=310 pw=0 time=1249 us cost=95 size=24000 card=3000)
   3000     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=499 us cost=47 size=12000 card=3000)
   3000     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=249 us cost=47 size=12000 card=3000)
1939734   INDEX FAST FULL SCAN IND_T1 (cr=22887 pr=20927 pw=0 time=287619 us cost=5888 size=12001200 card=3000300)(object id 82875)

Edit May 30, 2010: The above execution plans demonstrate the cost-based optimizer’s ability to adapt the execution plan operations as the projected data volumes increase.  Not all of the tables in this test case have indexes, and that was intentional to see how the lack of indexes on certain tables affected the execution plans.

The execution plan for the second query remained much more stable (again, only showing the plan when it changed):

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  TABLE ACCESS FULL T2 (cr=1580 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
     10  TABLE ACCESS FULL T3 (cr=1580 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
     10  INDEX RANGE SCAN IND_T4 (cr=12 pr=8 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
   6366  INDEX RANGE SCAN IND_T1 (cr=22 pr=32 pw=0 time=389 us cost=24 size=40004 card=10001)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
   3000  TABLE ACCESS FULL T2 (cr=474000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   3000  TABLE ACCESS FULL T3 (cr=474000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   3000  INDEX RANGE SCAN IND_T4 (cr=1990 pr=16 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
1939734  INDEX FAST FULL SCAN IND_T1 (cr=22887 pr=20927 pw=0 time=313731 us cost=5888 size=12001200 card=3000300)(object id 82875)

How frequently do people use scalar subqueries in column positions rather than inline views, and then wonder why performance is slower than expected?  Every once in a while a thread appears on a discussion forum asking about performance problems with queries that include scalar subqueries.
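For reference, the scalar subquery version of the test query could be rewritten as a join while preserving its semantics.  A scalar subquery returns NULL when no row matches, so outer joins are required; a sketch using Oracle’s (+) syntax (equivalent here only because C1 is unique in T2, T3, and T4, whereas a scalar subquery would raise an error if more than one row matched):

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T1.ID BETWEEN 1 AND 1200
  AND T2.C1 (+) = T1.ID
  AND T3.C1 (+) = T1.ID
  AND T4.C1 (+) = T1.ID;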

The test script (strip the .doc extension): scalar_test_script.sql

The test results (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):
The spool file: scalar_test_results_spool_or112.txt
The execution plans: scalar_test_execution_plans_or112.txt
The TKPROF summary: scalar_test_or112_tkprof.txt

The test results (Oracle Database 11.1.0.7 with 8,000MB SGA_TARGET – same server as 11.2.0.1 test and same test script):
The spool file: scalar_test_results_spool_or111.txt
The execution plans: scalar_test_execution_plans_or111.txt
The TKPROF summary: scalar_test_or111_tkprof.txt





Defy Logic – the Cost-Based Optimizer does Not Select the Lowest Cost Plan – Implicit Data Type Conversion

28 05 2010

May 28, 2010

How many times have you heard someone say:

If you need to store numbers in the database, store the numbers in a column with a numeric datatype – storing the numbers in a column with a VARCHAR2 datatype invites problems.

If you need to store dates in the database, store the dates in a column with a DATE datatype – storing the dates in a column with a VARCHAR2 datatype invites problems; storing the dates in a column with a numeric datatype invites problems.

Yes, it happens in production environments:

Why is my SQL statement executing so slowly?

Why am I receiving unexpected results from this query?

The same holds true when selecting data from tables. 

  • If a number happens to exist in a column with a VARCHAR2 datatype, and it is necessary to retrieve that row by specifying that column in the WHERE clause, make certain that either the bind variable is defined as a VARCHAR/VARCHAR2 or that the constant (literal) value is wrapped in single quotes so that it is treated as a VARCHAR2, rather than as a number (see the sketch after this list).  Yes, this happens in real life, as demonstrated in this very recent OTN thread.
  • If the column is a VARCHAR2 datatype, do not define the bind variable with a NVARCHAR2 datatype.  Yes, this happens in real life, as demonstrated in this recent OTN thread and this slightly older OTN thread.
  • If the column is a DATE datatype, do not define the bind variable with a VARCHAR2 datatype and do not pass in a constant as a VARCHAR2 (string).  Something is bound to go wrong at some point.  I am still trying to determine why this technique was demonstrated multiple times in the “Oracle SQL Recipes” book (I have not had a chance to finish reading this book, so I have not posted a review yet that draws attention to this bad practice).
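As a quick sketch of the first bullet’s pitfall, using the T3 table created below (the predicate sections show the expected behavior; verify with your own execution plans):

VARIABLE N1 NUMBER
VARIABLE V1 VARCHAR2(10)
EXEC :N1 := 1
EXEC :V1 := '1'

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT * FROM T3 WHERE CHAR_COL = :N1;
-- expected: filter(TO_NUMBER("CHAR_COL")=:N1), so the index on CHAR_COL cannot be used for access

SELECT * FROM T3 WHERE CHAR_COL = :V1;
-- expected: access("CHAR_COL"=:V1), an index friendly predicate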

In the comments section of the most recent True or False Quiz article I showed a couple of demonstrations of why numbers should not be stored in VARCHAR2 columns, if only numbers are to be stored in that column.  One of my comments showed why, when the datatype of the column does not match the datatype of the constant and one of the two is numeric, the other entity is converted to a number, rather than the number being converted to a VARCHAR2.  From that comment entry, consider the following:

CREATE TABLE T3(
  CHAR_COL VARCHAR2(10),
  C2 VARCHAR2(100),
  PRIMARY KEY (CHAR_COL));

INSERT INTO T3 VALUES('1','A');
INSERT INTO T3 VALUES('1.0','A');
INSERT INTO T3 VALUES('1.00','A');
INSERT INTO T3 VALUES('1.000','A');
INSERT INTO T3 VALUES('1.0000','A');
INSERT INTO T3 VALUES('1.00000','A');
INSERT INTO T3 VALUES('1.000000','A');
INSERT INTO T3 VALUES('1.0000000','A');
INSERT INTO T3 VALUES('1.00000000','A');

COMMIT;

Now consider the following three queries – would the developer expect the three queries to return the same result rows?

SELECT
  *
FROM
  T3
WHERE
  CHAR_COL=1;

CHAR_COL   C2
---------- -----
1          A
1.0        A
1.00       A
1.000      A
1.0000     A
1.00000    A
1.000000   A
1.0000000  A
1.00000000 A

9 rows selected.

---

SELECT
  *
FROM
  T3
WHERE
  TO_NUMBER(CHAR_COL)=1;

CHAR_COL   C2
---------- -----
1          A
1.0        A
1.00       A
1.000      A
1.0000     A
1.00000    A
1.000000   A
1.0000000  A
1.00000000 A

9 rows selected.

---

SELECT
  *
FROM
  T3
WHERE
  CHAR_COL=TO_CHAR(1);

CHAR_COL   C2
---------- -----
1          A

Notice that the last query returned only one row, while the other two queries returned nine rows.  So, why are VARCHAR2 values always converted to numbers, rather than numbers converted to VARCHAR2 values?  If the number were automatically converted by Oracle into a character value, Oracle might need to test a nearly unlimited number of trailing 0 characters appended after the decimal point of the converted value (up to the number of characters of precision) to find all matching results; that extra work is avoided by converting the character value to a number.
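A quick way to confirm the conversion direction at the SQL prompt (expected results noted in the comments):

SELECT 'MATCH' RESULT FROM DUAL WHERE '1.0' = 1;
-- expected: row returned, because '1.0' is converted with TO_NUMBER and 1 = 1

SELECT 'MATCH' RESULT FROM DUAL WHERE '1.0' = TO_CHAR(1);
-- expected: no rows, because the strings '1.0' and '1' are compared directly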

I might appear to have drifted off the topic of this blog article, so now let’s see a case where Oracle’s cost-based optimizer does the impossible – it does not pick the execution plan with the lowest calculated cost for a simple SQL statement involving a single table.  This test case can be reproduced on Oracle Database 10.2.0.4 through 11.2.0.1 (and probably a couple of other releases as well).  This test case is from another one of my comments in the recent True or False Quiz article – the bonus question.  The test case:

CREATE TABLE T1 (
  CHAR_COL VARCHAR2(10),
  C2 VARCHAR2(100),
  PRIMARY KEY (CHAR_COL));

INSERT INTO
  T1
SELECT
  TO_CHAR(ROWNUM),
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL <=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

SET AUTOTRACE TRACEONLY EXPLAIN

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'LOWEST_COST';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SELECT /* FIND_ME */
  *
FROM
  T1
WHERE
  CHAR_COL = 10;

SELECT /*+ INDEX(T1) */  /* FIND_ME */
  *
FROM
  T1
WHERE
  CHAR_COL = 10;

SELECT /* FIND_ME */
  *
FROM
  T1
WHERE
  CHAR_COL = '10';

SET AUTOTRACE OFF
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

The test case creates a simple table with 1,000,000 rows and then executes three SELECT statements.  Page 477 of Tom Kyte’s “Expert Oracle Database Architecture” book states the following:

Case 4: We have indexed a character column. This column contains only numeric data. We query using the following syntax:

select * from t where indexed_column = 5

Note that the number 5 in the query is the constant number 5 (not a character string). The index on INDEXED_COLUMN is not used…”

Jonathan Lewis stated here (a blog article from 2006) that:

“It’s a great shame that Oracle Corp. decided to use the name “hints” for its optimizer directive mechanism.  “Hints” are not hints, they are interception points in the optimizer code path, and must be obeyed.”

So, which expert’s explanation is correct for Oracle Database 10.2.0.4 through 11.2.0.1 for this particular test case?  Neither?  Both?  Let’s take a look at the output that was written to the SQL*Plus screen:

SQL> SELECT /* FIND_ME */
  2    *
  3  FROM
  4    T1
  5  WHERE
  6    CHAR_COL = 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   108 |  4450   (2)| 00:00:54 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   108 |  4450   (2)| 00:00:54 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_NUMBER("CHAR_COL")=10)

The above shows that Tom Kyte’s book is correct – the primary key index on the column CHAR_COL is not used.  Continuing with the SQL*Plus output:

SQL> SELECT /*+ INDEX(T1) */  /* FIND_ME */
  2    *
  3  FROM
  4    T1
  5  WHERE
  6    CHAR_COL = 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 458899268

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |   108 |  3961   (2)| 00:00:48 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |     1 |   108 |  3961   (2)| 00:00:48 |
|*  2 |   INDEX FULL SCAN           | SYS_C0010157 |     1 |       |  3960   (2)| 00:00:48 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_NUMBER("CHAR_COL")=10)

The above SQL statement is the same as the first, just with an INDEX hint.  The index hint was obeyed.  Jonathan Lewis’ blog article is correct – index hints are directives, so Oracle’s optimizer selected an index access path using the only available index – the index on the primary key column.

But now we have a serious problem.  What is the problem?  For a SQL statement involving a single table, an execution plan with a higher calculated cost (4,450) and a higher estimated time (54 seconds) was selected rather than the obviously less expensive execution plan with a lower calculated cost (3,961) and a lower estimated time (48 seconds).  Interesting…

The SQL*Plus output continues:

SQL> SELECT /* FIND_ME */
  2    *
  3  FROM
  4    T1
  5  WHERE
  6    CHAR_COL = '10';

Execution Plan
----------------------------------------------------------
Plan hash value: 1657849122

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |   108 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |     1 |   108 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0010157 |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CHAR_COL"='10')

Notice that the calculated cost decreased significantly when we constructed the query correctly, and the available index access path was automatically selected.

So, why did the optimizer not select the lowest cost access path?  Fortunately, the test case created a 10053 trace file that helps explain what happened.  In the 10053 trace file, we see that the optimizer transformed the original SQL statement a bit:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T1"."CHAR_COL" "CHAR_COL","T1"."C2" "C2" FROM "TESTUSER"."T1" "T1" WHERE TO_NUMBER("T1"."CHAR_COL")=10

Now, with the transformed version of the SQL statement, it appears that we need a function based index on the CHAR_COL column that converts the column value to a number so that an index access path is possible.  Further down in the 10053 trace we find the following:

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 466 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM: 10 milliseconds (default is 10)
  MBRC: -1 blocks (default is 8 )

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 1000000  #Blks:  16217  AvgRowLen:  108.00
Index Stats::
  Index: SYS_C0010157  Col#: 1
    LVLS: 2  #LB: 3908  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 201245.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 1000000.000000  Rounded: 1  Computed: 1.00  Non Adjusted: 1.00
  Access Path: TableScan
    Cost:  4450.40  Resp: 4450.40  Degree: 0
      Cost_io: 4394.00  Cost_cpu: 315488412
      Resp_io: 4394.00  Resp_cpu: 315488412
  Best:: AccessPath: TableScan
         Cost: 4450.40  Degree: 1  Resp: 4450.40  Card: 1.00  Bytes: 0

***************************************

In the above, we see that the optimizer immediately jumped to a full table scan access path and then immediately declared that a full table scan offered the lowest cost – the optimizer did not even consider an index access path.  Now, let’s compare the above with the SQL statement having a hinted access path:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT /*+ INDEX ("T1") */ "T1"."CHAR_COL" "CHAR_COL","T1"."C2" "C2" FROM "TESTUSER"."T1" "T1" WHERE TO_NUMBER("T1"."CHAR_COL")=10
...

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 466 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM: 10 milliseconds (default is 10)
  MBRC: -1 blocks (default is 8 )

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 1000000  #Blks:  16217  AvgRowLen:  108.00
Index Stats::
  Index: SYS_C0010157  Col#: 1
    LVLS: 2  #LB: 3908  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 201245.00
    User hint to use this index
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 1000000.000000  Rounded: 1  Computed: 1.00  Non Adjusted: 1.00
kkofmx: index filter:TO_NUMBER("T1"."CHAR_COL")=10

  Access Path: index (FullScan)
    Index: SYS_C0010157
    resc_io: 3911.00  resc_cpu: 227852122
    ix_sel: 1.000000  ix_sel_with_filters: 0.000001
 ***** Logdef predicate Adjustment ******
 Final IO cst 0.00 , CPU cst 50.00
 ***** End Logdef Adjustment ******
    Cost: 3960.67  Resp: 3960.67  Degree: 1
  Best:: AccessPath: IndexRange
  Index: SYS_C0010157
         Cost: 3960.67  Degree: 1  Resp: 3960.67  Card: 1.00  Bytes: 0

The hint provided in the SQL statement forced the optimizer to do something that is supposedly not possible.  Notice that unlike the previous output, the full table scan was not even considered because of the hint.

The short summary of the above: do things correctly from the start to avoid confusion and unexplained performance problems later.
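If correcting the SQL is not possible, the transformed predicate in the 10053 trace points at the other option mentioned earlier: a function-based index matching TO_NUMBER(CHAR_COL).  A sketch (hypothetical index name; note that creating this index will fail with ORA-01722 if any CHAR_COL value is not a valid number, which is one more argument for using the proper datatype in the first place):

CREATE INDEX IND_T1_TO_NUMBER ON T1 (TO_NUMBER(CHAR_COL));

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

SELECT /* FIND_ME */
  *
FROM
  T1
WHERE
  CHAR_COL = 10;

With that index in place, the TO_NUMBER("CHAR_COL")=10 predicate should be able to use an index range scan without a hint.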

The 10053 trace files (PDF versions of the 10053 trace files):
Oracle Database 11.2.0.1 OR112_LOWEST_COST
Oracle Database 11.1.0.7 OR1117_LOWEST_COST
Oracle Database 10.2.0.4 OR1024_LOWEST_COST

——–

* Late edit May 28, 2010: In the most recent True or False quiz, Centinul provided a documentation reference that states that VARCHAR2 values are always converted to numbers when a number is implicitly compared to a VARCHAR2 value.





Column Order in a Table – Does it Matter? 2

24 05 2010

May 24, 2010 (Updated May 24, 2010)

(Back to the Previous Post in the Series)

In the previous blog article in this series I created a test case with a table containing 47 columns.  The PL/SQL test procedure experienced 15% to 22% longer execution times when a column with the same definition, just buried 40 columns deeper into the table definition, was referenced by the PL/SQL script.  Could the problem be much worse than 20%?  Possibly – overhead in the anonymous PL/SQL testing scripts that did not change from one test to the other (moving the SQL data into a PL/SQL table, for instance) may have hidden the true performance impact.

Today’s blog article demonstrates what happens, performance wise, when very wide tables (having many columns) are accessed.  Oracle tables support up to 1,000 columns; however, only 255 columns will fit into a single row piece.  A 1,000 column table, therefore, will use at least 4 row pieces per row inserted into the table.  (I have not yet determined if this is still true if, for instance, the last 800 columns are all NULL, with the rest of the columns sized small enough to fit into a single block.)  What performance impact, if any, will appear in the test when accessing a table’s columns that are beyond the first couple of columns?  Will we see the problem identified on page 537 of the Troubleshooting Oracle Performance book?
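As an aside, one way to confirm that extra row pieces are actually being traversed is to check the ‘table fetch continued row’ session statistic before and after running a query (a sketch):

SELECT
  SN.NAME,
  MS.VALUE
FROM
  V$MYSTAT MS,
  V$STATNAME SN
WHERE
  MS.STATISTIC# = SN.STATISTIC#
  AND SN.NAME = 'table fetch continued row';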

Creating a test script for a table with 1,000 columns potentially requires a significant amount of typing.  To automate that typing, an Excel macro compatible (and Visual Basic 6 compatible) macro is provided below to generate the test script:

Dim intFilenum As Integer
Dim strOut As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim intFlag As Integer

intFilenum = FreeFile

Open "C:\ColumnOrder2Script.sql" For Output As #intFilenum

strOut = "CREATE TABLE T10 (" & vbCrLf

For i = 1 To 1000
   strOut = strOut & "  C" & Format(i) & " VARCHAR2(10)," & vbCrLf
Next i

strOut = Left(strOut, Len(strOut) - 3) & ");" & vbCrLf

Print #intFilenum, strOut

Print #intFilenum, "ALTER TABLE T10 CACHE;" & vbCrLf

strOut = "INSERT INTO" & vbCrLf
strOut = strOut & "  T10" & vbCrLf
strOut = strOut & "SELECT" & vbCrLf

For i = 1 To 1000
    intFlag = False
    If i Mod 50 = 0 Then
        intFlag = True
    Else
        Select Case i
            Case 1, 255, 256, 510, 511, 765, 766, 1000
                intFlag = True
        End Select
    End If
    If intFlag = True Then
        strOut = strOut & "  RPAD(CHR(65+MOD(ROWNUM-1,20))||'" & Format(i) & "',10,'A')," & vbCrLf
    Else
        strOut = strOut & "  NULL," & vbCrLf
    End If
Next i

strOut = Left(strOut, Len(strOut) - 3) & vbCrLf
strOut = strOut & "FROM" & vbCrLf
strOut = strOut & "  DUAL" & vbCrLf
strOut = strOut & "CONNECT BY" & vbCrLf
strOut = strOut & "  LEVEL<=1000000;" & vbCrLf

Print #intFilenum, strOut

Print #intFilenum, "COMMIT;" & vbCrLf
Print #intFilenum, "EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T10',CASCADE=>TRUE)"
Print #intFilenum, ""
'Print #intFilenum, "SET AUTOTRACE TRACEONLY STATISTICS"
Print #intFilenum, ""
Print #intFilenum, "ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER, LEVEL 1';"
Print #intFilenum, "ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IGNORE';"
Print #intFilenum, "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';" & vbCrLf

For j = 1 To 2
    If j = 2 Then
        Print #intFilenum, "SET TIMING ON"
        Print #intFilenum, "SPOOL COLUMN_ORDER2.TXT"
        Print #intFilenum, "ALTER SESSION SET TRACEFILE_IDENTIFIER = 'COLUMN_ORDER_TEST2';" & vbCrLf
    End If
    For i = 1 To 1000
        intFlag = False
        If i Mod 50 = 0 Then
            intFlag = True
        Else
            Select Case i
                Case 1, 255, 256, 510, 511, 765, 766, 1000
                    intFlag = True
            End Select
        End If
        If intFlag = True Then
            For k = 1 To 10
                If (j = 2) Or (k <= 3) Then
                    'If on the second pass - the one that counts, run SQL 10 times
                    strOut = "SELECT" & vbCrLf
                    strOut = strOut & "  COUNT(C" & Format(i) & ") C" & vbCrLf
                    strOut = strOut & "FROM" & vbCrLf
                    strOut = strOut & "  T10;" & vbCrLf
                    Print #intFilenum, strOut
                End If
            Next k
        End If
    Next i
Next j

Print #intFilenum, "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';"
Print #intFilenum, "ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT OFF';"
Print #intFilenum, "SET TIMING OFF"
'Print #intFilenum, "SET AUTOTRACE OFF"

Close #intFilenum

————————————————————–

For those with a Windows client computer that does not have Excel installed, the following script may be saved with a VBS extension and executed directly from Windows to generate the test script.

Dim objFSO
Dim objFile
Dim strOut
Dim i
Dim j
Dim k
Dim intFlag

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.CreateTextFile("C:\ColumnOrder2Script.sql", True)

strOut = "CREATE TABLE T10 (" & vbCrLf

For i = 1 To 1000
   strOut = strOut & "  C" & i & " VARCHAR2(10)," & vbCrLf
Next

strOut = Left(strOut, Len(strOut) - 3) & ");" & vbCrLf

objFile.Write strOut & vbCrLf

objFile.Write "ALTER TABLE T10 CACHE;" & vbCrLf & vbCrLf

strOut = "INSERT INTO" & vbCrLf
strOut = strOut & "  T10" & vbCrLf
strOut = strOut & "SELECT" & vbCrLf

For i = 1 To 1000
    intFlag = False
    If i Mod 50 = 0 Then
        intFlag = True
    Else
        Select Case i
            Case 1, 255, 256, 510, 511, 765, 766, 1000
                intFlag = True
        End Select
    End If
    If intFlag = True Then
        strOut = strOut & "  RPAD(CHR(65+MOD(ROWNUM-1,20))||'" & i & "',10,'A')," & vbCrLf
    Else
        strOut = strOut & "  NULL," & vbCrLf
    End If
Next

strOut = Left(strOut, Len(strOut) - 3) & vbCrLf
strOut = strOut & "FROM" & vbCrLf
strOut = strOut & "  DUAL" & vbCrLf
strOut = strOut & "CONNECT BY" & vbCrLf
strOut = strOut & "  LEVEL<=1000000;" & vbCrLf

objFile.Write strOut & vbCrLf

objFile.Write "COMMIT;" & vbCrLf & vbCrLf
objFile.Write "EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T10',CASCADE=>TRUE)" & vbCrLf
objFile.Write "" & vbCrLf
'objFile.Write "SET AUTOTRACE TRACEONLY STATISTICS" & vbCrLf
objFile.Write "" & vbCrLf
objFile.Write "ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER, LEVEL 1';" & vbCrLf
objFile.Write "ALTER SESSION SET TRACEFILE_IDENTIFIER = 'IGNORE';" & vbCrLf
objFile.Write "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';" & vbCrLf & vbCrLf

For j = 1 To 2
    If j = 2 Then
        objFile.Write "SET TIMING ON" & vbCrLf
        objFile.Write "SPOOL COLUMN_ORDER2.TXT" & vbCrLf
        objFile.Write "ALTER SESSION SET TRACEFILE_IDENTIFIER = 'COLUMN_ORDER_TEST2';" & vbCrLf & vbCrLf
    End If
    For i = 1 To 1000
        intFlag = False
        If i Mod 50 = 0 Then
            intFlag = True
        Else
            Select Case i
                Case 1, 255, 256, 510, 511, 765, 766, 1000
                    intFlag = True
            End Select
        End If
        If intFlag = True Then
            For k = 1 To 10
                If (j = 2) Or (k <= 3) Then
                    'If on the second pass - the one that counts, run SQL 10 times
                    strOut = "SELECT" & vbCrLf
                    strOut = strOut & "  COUNT(C" & i & ") C" & vbCrLf
                    strOut = strOut & "FROM" & vbCrLf
                    strOut = strOut & "  T10;" & vbCrLf
                    objFile.Write strOut & vbCrLf
                End If
            Next
        End If
    Next
Next

objFile.Write "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';" & vbCrLf
objFile.Write "ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT OFF';" & vbCrLf
objFile.Write "SET TIMING OFF" & vbCrLf
'objFile.Write "SET AUTOTRACE OFF" & vbCrLf

objFile.Close

set objFSO = Nothing
Set objFile = Nothing

ColumnOrder2ScriptCreator.vbs

————————————————————–

The above script might be a little complicated to understand.  What the script does:

  • Create a 1,000 column table with columns defined as VARCHAR2(10) – the columns are named C1, C2, C3, …, C999, C1000.
  • Alters the table to hopefully instruct Oracle to keep the table’s blocks in the buffer cache after those blocks have been read.
  • Inserts a letter between “A” and “T” (CHR(65+MOD(ROWNUM-1,20))) followed by the column position, padded to 10 characters, into every 50th column, as well as the columns that should mark the start column and end column of each row piece (columns 1, 255, 256, 510, 511, 765, 766, 1000).
  • Gathers statistics on the table and any indexes (none in this case).
  • Disables serial direct path read by setting event 10949 (try taking that line out of the script to see how the performance changes).
  • Creates essentially 2 stages of querying the COUNT of every 50th column, in addition to the columns that should represent the end points of each row piece (1, 255, 256, 510, 511, 765, 766, 1000).  The first stage is used to load the blocks into the buffer cache and allow Oracle’s optimizer to settle on the final execution plan (this seems to be necessary on Oracle Database 11.2.0.1), each query is executed three times in this stage.  For the second stage, the trace file identifier (for the 10046 trace) is changed, and each query is listed 10 times to help average the execution time.

Note that the original script included a bug: the SET AUTOTRACE line was incorrectly specified, so the query output was not suppressed.  I suggest leaving that line commented out in the script generating code, and changing the script so that either no alias or a unique alias is assigned to the COUNT() column, so that the timing output on screen may be easily related to the column being selected.  The generated script may also be downloaded here: ColumnOrder2Script.sql (strip off the .doc extension and open with a text editor, or run directly with SQL*Plus).

If you add a /*+ Find Me 2 */ comment to each of the SQL statements and change the STATISTICS_LEVEL parameter to ALL, you can use the following to retrieve the actual execution plans and execution statistics:

SET TIMING OFF
SET PAGESIZE 1000
SET LINESIZE 160

SELECT /*+ LEADING(S) */
  T.PLAN_TABLE_OUTPUT
FROM
  (SELECT
    SQL_ID,
    CHILD_NUMBER
  FROM
    V$SQL
  WHERE
    SQL_TEXT LIKE 'SELECT /*+ Find Me 2 */%') S,
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(S.SQL_ID,S.CHILD_NUMBER,'ALLSTATS LAST +COST')) T;

SPOOL OFF

So, what were the results when the script was executed on 64 bit Oracle 11.2.0.1 on the Linux and Windows platforms (with equivalent hardware) using an 8KB block size in an ASSM autoallocate tablespace?

The spool files:

The TKPROF summaries of the 10046 trace files:

In the following summary table, the TKPROF elapsed time totals for the 10 executions of counting each of the non-NULL VARCHAR2(10) columns in the test table are displayed, as well as the elapsed time for that execution divided by the elapsed time for the count of the first column (updated May 24, 2010 to include results from 64 bit Windows running 11.1.0.7 for comparison):

The Factor of C1 column indicates how many times longer the count of that column required compared to the count of the first column.  Counting column 1,000 on Linux required 8.2 times as much time as counting column 1.  On the Windows platform, counting column 1,000 required 7.92 times as much time as counting column C1.  If you review the TKPROF summaries for both platforms you will see an interesting, possibly unexpected, behavior in the way the number of consistent gets increased the further away from the first column our query counted.  The query of column C1 required 1,819,010 consistent gets for 10 executions, while the query of column C1000 required 31,819,010 consistent gets for 10 executions – a difference of 30,000,000 consistent gets, or 3,000,000 per execution, which, if the table contains 1,000,000 rows, works out to three extra consistent gets per row (consistent with following the chain of the three additional row pieces that must be visited to reach column 1,000, since a row piece is limited to 255 columns).  When you consider that each row required roughly 1,300 bytes, that is a significant number of consistent gets for a table whose rows never expanded in size after the initial insert.  The increase in consistent gets is not linear, and neither is the increase in execution time.





Column Order in a Table – Does it Matter? 1

22 05 2010

May 22, 2010 (Updated May 23, 2010)

(Forward to the Next Post in the Series)

This week, between periodic outages, a couple of message threads appeared on the OTN forums that caught my attention.  The first is a thread in which the original poster asked how to add a column between two existing columns in a table – a task that is fairly easy to do in a database such as Microsoft Access, but is not possible without recreating the table in an Oracle database.  When I first started working with Oracle databases, I too wanted to see the columns listed in a specific, logical order in the tables that I created.  The general response in the thread was that the column order did not matter because it is possible to select the table’s columns in whatever order is preferred.

The second thread included a question asking whether column order could potentially affect the performance of a query.  Responders in the thread provided several different answers that boiled down to:

  • Not a chance.
  • Yes with the rule-based optimizer, but not with the cost-based optimizer.
  • Maybe.

In 2008 I read the book Troubleshooting Oracle Performance, and I recently started re-reading it.  I recall a couple of paragraphs in the book suggesting that the order of the columns in a table may, in fact, affect the performance of queries.  How?

  • The calculated CPU_COST for a line in an execution plan increases with column position – accessing the first column will have a slightly lower CPU_COST than accessing the second column in a table.  This change in the calculated CPU_COST could cause a different execution plan to be generated (see the sketch after this list).
  • With variable length columns in a table, Oracle is not able to quickly determine the starting byte location of the tenth column without first determining the byte length of the first nine columns.  While this calculation may not seem significant, it can increase execution time, especially for servers with slower CPUs and higher latency memory.
  • Edit: See the first comment about row chaining, written by Chris Antognini below.
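
A minimal sketch of the CPU_COST point, assuming a hypothetical table T1 with two VARCHAR2 columns C1 and C2 (T1 is not one of the test tables in this article, and PLAN_TABLE should not already contain rows with these STATEMENT_ID values):

EXPLAIN PLAN SET STATEMENT_ID = 'CP1' FOR
SELECT
  *
FROM
  T1
WHERE
  C1 = 'A';

EXPLAIN PLAN SET STATEMENT_ID = 'CP2' FOR
SELECT
  *
FROM
  T1
WHERE
  C2 = 'A';

SELECT
  STATEMENT_ID,
  CPU_COST
FROM
  PLAN_TABLE
WHERE
  ID = 1  /* the TABLE ACCESS FULL line that applies the filter */
ORDER BY
  STATEMENT_ID;

With CPU costing in effect (system statistics available), the CPU_COST reported for the CP2 statement should be slightly higher than for CP1, because one additional column must be walked past before the predicate column can be evaluated.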

Page 529 of the book Troubleshooting Oracle Performance includes a helpful chart showing that, for a particular query, accessing the 250th column in a table took roughly five times as long as accessing the first column.  There are, of course, a lot of variables at play, including:

  • The speed of the server.
  • The table design – what types of columns were in the table, what the column lengths were, and how many columns were in the table.
  • The background activity in the server due to the other sessions competing for CPU, memory, and disk access.
  • The number of times the test was repeated.
  • Whether all of the blocks were in memory, or physical reads were required.

I commented in the second thread, stating that a couple of people have reported that their testing found that the position of a column in a table could affect performance, and that this is why the optimizer generates a different CPU_COST component for each column in a table.  I did not have an independent test case to verify the results that others obtained, so I posted links to the results that were provided by other people who have tested a column position’s effect on performance.  Was it a bad choice for me to provide an answer without first verifying that it could happen?

I think that we need to put together a test case to see if the column order really matters.  It is a little challenging to construct a test case where the outcome is not affected by spurious network interference, client-side delays, execution statistics calculations, Oracle Database release specific optimizations, and other similar issues.  For example, if you execute a SQL statement twice from a client on an otherwise idle server and see that the number of consistent gets increased on the second execution, that could throw off the accuracy of any intended benchmark test (this behavior was encountered on the 64 bit version of Oracle Database 11.2.0.1).

So, how to test?  We will create a test script that uses PL/SQL to repeatedly execute the same SQL statement, potentially with different WHERE clause predicates, to limit the effects of adaptive cursor sharing and the various other potential problem areas.  The server will run the 64 bit version of Oracle Database 11.2.0.1 on the Linux and Windows platforms with essentially the same hardware, to help determine the effects of excessive context switching and potential optimizations on each operating system platform – there may still be too many variables to state that one operating system platform is more efficient than the other for this specific test script.  The database instance will be configured with the SGA_TARGET set to 8000M, and the FILESYSTEMIO_OPTIONS parameter set to SETALL to enable direct, asynchronous I/O (the default on the Windows platform).  As much as possible, this test will seek to factor out delays caused by physical reads, so event 10949 will be set to limit the chances of Oracle switching to physical direct path reads during the portion of the test that performs full table scans – ideally, the entire table and indexes will reside in the buffer cache.
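
A sketch of that instance configuration, assuming the changes are made through the spfile (the FILESYSTEMIO_OPTIONS parameter is static, so an instance restart is required, and the SGA_TARGET value assumes sufficient memory on the server; the 10949 event is set at the session level within the test script itself):

ALTER SYSTEM SET SGA_TARGET=8000M SCOPE=SPFILE;
ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;

-- After the instance restart, in the test session:
ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER, LEVEL 1';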

First, we will create a test table with the same set of column definitions in the first couple of columns of the table as are present in the last columns of the table, with several VARCHAR2 columns in between.  Several indexes will be created which may be used for additional tests to determine what happens when the table columns referenced by an index are not stored in numerical/alphabetic order – not all of the indexes will be used for this test.

CREATE TABLE T11 (
  N0 NUMBER,
  N1 NUMBER,
  ND2 NUMBER,
  V3 VARCHAR2(30),
  VD4 VARCHAR2(30),
  D5 DATE,
  DD6 DATE,
  V7 VARCHAR2(50),
  V8 VARCHAR2(50),
  V9 VARCHAR2(50),
  V10 VARCHAR2(50),
  V11 VARCHAR2(50),
  V12 VARCHAR2(50),
  V13 VARCHAR2(50),
  V14 VARCHAR2(50),
  V15 VARCHAR2(50),
  V16 VARCHAR2(50),
  V17 VARCHAR2(50),
  V18 VARCHAR2(50),
  V19 VARCHAR2(50),
  V20 VARCHAR2(50),
  V21 VARCHAR2(50),
  V22 VARCHAR2(50),
  V23 VARCHAR2(50),
  V24 VARCHAR2(50),
  V25 VARCHAR2(50),
  V26 VARCHAR2(50),
  V27 VARCHAR2(50),
  V28 VARCHAR2(50),
  V29 VARCHAR2(50),
  V30 VARCHAR2(50),
  V31 VARCHAR2(50),
  V32 VARCHAR2(50),
  V33 VARCHAR2(50),
  V34 VARCHAR2(50),
  V35 VARCHAR2(50),
  V36 VARCHAR2(50),
  V37 VARCHAR2(50),
  V38 VARCHAR2(50),
  V39 VARCHAR2(50),
  V40 VARCHAR2(50),
  N41 NUMBER,
  ND42 NUMBER,
  V43 VARCHAR2(30),
  VD44 VARCHAR2(30),
  D45 DATE,
  DD46 DATE);

INSERT INTO
  T11
SELECT
  ROWNUM,
  ROWNUM,
  1000000-ROWNUM,
  CHR(MOD(ROWNUM-1,26)+65)||CHR(MOD(ROWNUM,26)+65)||CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65)||CHR(MOD(ROWNUM+3,26)+65)||CHR(MOD(ROWNUM+4,26)+65),
  REVERSE(CHR(MOD(ROWNUM-1,26)+65)||CHR(MOD(ROWNUM,26)+65)||CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65)||CHR(MOD(ROWNUM+3,26)+65)||CHR(MOD(ROWNUM+4,26)+65)),
  TRUNC(SYSDATE) + ROWNUM/1000,
  TRUNC(SYSDATE) + (1000000-ROWNUM)/1000,
  TO_CHAR(ROWNUM),
  TO_CHAR(1000000-ROWNUM),
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  NULL,
  TO_CHAR(ROWNUM),
  TO_CHAR(1000000-ROWNUM),
  ROWNUM,
  1000000-ROWNUM,
  CHR(MOD(ROWNUM-1,26)+65)||CHR(MOD(ROWNUM,26)+65)||CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65)||CHR(MOD(ROWNUM+3,26)+65)||CHR(MOD(ROWNUM+4,26)+65),
  REVERSE(CHR(MOD(ROWNUM-1,26)+65)||CHR(MOD(ROWNUM,26)+65)||CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65)||CHR(MOD(ROWNUM+3,26)+65)||CHR(MOD(ROWNUM+4,26)+65)),
  TRUNC(SYSDATE) + ROWNUM/1000,
  TRUNC(SYSDATE) + (1000000-ROWNUM)/1000
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

CREATE INDEX T11_N1 ON T11(N1);
CREATE INDEX T11_N41 ON T11(N41);
CREATE INDEX T11_ND2 ON T11(ND2);
CREATE INDEX T11_ND42 ON T11(ND42);
CREATE INDEX T11_V3 ON T11(V3);
CREATE INDEX T11_VD4 ON T11(VD4);
CREATE INDEX T11_V43 ON T11(V43);
CREATE INDEX T11_VD44 ON T11(VD44);
CREATE INDEX T11_D5 ON T11(D5);
CREATE INDEX T11_DD6 ON T11(DD6);
CREATE INDEX T11_D45 ON T11(D45);
CREATE INDEX T11_DD46 ON T11(DD46);

ALTER TABLE T11 CACHE;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T11',CASCADE=>TRUE)

In the above, notice that a NULL value was assigned to most of the columns located between the two repeating sets of column definitions.  1,000,000 rows were inserted into the table.
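
A quick sanity check after the statistics collection, pulling the row count, average row length, and block count from the data dictionary (these columns are populated by the DBMS_STATS call above):

SELECT
  NUM_ROWS,
  AVG_ROW_LEN,
  BLOCKS
FROM
  USER_TABLES
WHERE
  TABLE_NAME = 'T11';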

The testing script that we will use follows.  Essentially, the script is composed of several anonymous PL/SQL blocks that compare the performance of repeatedly selecting from the first set of columns with the performance of repeatedly selecting from the last set of columns.  The first set of tests attempts to measure performance using a large number of index range scan accesses, while the second set of tests attempts to measure performance using a smaller number of full table scan accesses.

The test script will be executed 3 times: 1) to flood the buffer cache with the blocks used by the tests (these results will be discarded), 2) with a 10046 trace at level 8 enabled, and 3) without a 10046 trace enabled.

SET TIMING ON
SET SERVEROUTPUT ON SIZE 1000000

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'COLUMN_POSITION_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SPOOL COLUMN_POSITION_TEST.TXT

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor(intVALUE IN NUMBER) IS
    SELECT /*+ FIND_ME */
      N1,
      ND2
    FROM
      T11
    WHERE
      N1 BETWEEN intVALUE AND INTVALUE+10000;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100000 LOOP
    OPEN cuCursor(I);

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('N1 ND2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor(intVALUE IN NUMBER) IS
    SELECT /*+ FIND_ME */
      N41,
      ND42
    FROM
      T11
    WHERE
      N41 BETWEEN intVALUE AND INTVALUE+10000;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100000 LOOP
    OPEN cuCursor(I);

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('N41 ND42: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor(intVALUE IN NUMBER) IS
    SELECT /*+ FIND_ME */
      V3,
      VD4
    FROM
      T11
    WHERE
      N1 BETWEEN intVALUE AND INTVALUE+10000;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100000 LOOP
    OPEN cuCursor(I);

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('V3 VD4: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor(intVALUE IN NUMBER) IS
    SELECT /*+ FIND_ME */
      V43,
      VD44
    FROM
      T11
    WHERE
      N41 BETWEEN intVALUE AND INTVALUE+10000;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100000 LOOP
    OPEN cuCursor(I);

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('V43 VD44: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor(intVALUE IN NUMBER) IS
    SELECT /*+ FIND_ME */
      D5,
      DD6
    FROM
      T11
    WHERE
      N1 BETWEEN intVALUE AND INTVALUE+10000;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100000 LOOP
    OPEN cuCursor(I);

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('D5 DD6: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor(intVALUE IN NUMBER) IS
    SELECT /*+ FIND_ME */
      D45,
      DD46
    FROM
      T11
    WHERE
      N41 BETWEEN intVALUE AND INTVALUE+10000;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100000 LOOP
    OPEN cuCursor(I);

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('D45 DD46: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER, LEVEL 1';

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor IS
    SELECT /*+ FIND_ME */
      N1,
      ND2
    FROM
      T11;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100 LOOP
    OPEN cuCursor;

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('N1 ND2 2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor IS
    SELECT /*+ FIND_ME */
      N41,
      ND42
    FROM
      T11;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100 LOOP
    OPEN cuCursor;

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('N41 ND42 2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor IS
    SELECT /*+ FIND_ME */
      V3,
      VD4
    FROM
      T11;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100 LOOP
    OPEN cuCursor;

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('V3 VD4 2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor IS
    SELECT /*+ FIND_ME */
      V43,
      VD44
    FROM
      T11;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100 LOOP
    OPEN cuCursor;

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('V43 VD44 2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor IS
    SELECT /*+ FIND_ME */
      D5,
      DD6
    FROM
      T11;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100 LOOP
    OPEN cuCursor;

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('D5 DD6 2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

DECLARE
  tsStart TIMESTAMP := SYSTIMESTAMP;

  CURSOR cuCursor IS
    SELECT /*+ FIND_ME */
      D45,
      DD46
    FROM
      T11;

  TYPE TYPE_cuCursor IS TABLE OF cuCursor%ROWTYPE
    INDEX BY BINARY_INTEGER;

  T_cuCursor TYPE_cuCursor;

BEGIN
  FOR I IN 1..100 LOOP
    OPEN cuCursor;

    LOOP
      FETCH cuCursor BULK COLLECT INTO T_cuCursor LIMIT 1000;
      EXIT WHEN T_cuCursor.COUNT = 0;

      NULL;
    END LOOP;

    CLOSE cuCursor;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('D45 DD46 2: '||TO_CHAR(SYSTIMESTAMP-tsStart));
END;
/

ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT OFF';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

SPOOL OFF

What were the test results – does column position in a table matter?  Is performance improved by locating frequently accessed columns in the first couple of column positions?  Does the operating system platform matter (there are a lot of factors contributing to any time differences)?

The script output should look something like this – this is the output from the Linux execution of the script with a 10046 trace enabled (blank lines were adjusted to group related time values):

N1 ND2: +000000000 00:12:14.750567000
PL/SQL procedure successfully completed.
Elapsed: 00:12:14.77

N41 ND42: +000000000 00:14:13.846775000
PL/SQL procedure successfully completed.
Elapsed: 00:14:13.87

V3 VD4: +000000000 00:12:28.702281000
PL/SQL procedure successfully completed.
Elapsed: 00:12:28.72

V43 VD44: +000000000 00:14:28.663870000
PL/SQL procedure successfully completed.
Elapsed: 00:14:28.68

D5 DD6: +000000000 00:11:58.169894000
PL/SQL procedure successfully completed.
Elapsed: 00:11:58.19

D45 DD46: +000000000 00:13:52.969594000
PL/SQL procedure successfully completed.
Elapsed: 00:13:52.98

Session altered.

N1 ND2 2: +000000000 00:00:41.202708000
PL/SQL procedure successfully completed.
Elapsed: 00:00:41.23

N41 ND42 2: +000000000 00:00:47.222258000
PL/SQL procedure successfully completed.
Elapsed: 00:00:47.22

V3 VD4 2: +000000000 00:00:42.662874000
PL/SQL procedure successfully completed.
Elapsed: 00:00:42.68

V43 VD44 2: +000000000 00:00:50.135338000
PL/SQL procedure successfully completed.
Elapsed: 00:00:50.13

D5 DD6 2: +000000000 00:00:38.307032000
PL/SQL procedure successfully completed.
Elapsed: 00:00:38.31

D45 DD46 2: +000000000 00:00:47.696470000
PL/SQL procedure successfully completed.
Elapsed: 00:00:47.70

In the above, the N1 and ND2 test run shows the performance when the second and third columns, both of which are numbers, were accessed using the index on the N1 column.  The N41 and ND42 test run shows the performance when the 42nd and 43rd columns, both of which are numbers, were accessed using the index on the N41 column.  Thus, N indicates a NUMBER column, V indicates a VARCHAR2 column, and D indicates a DATE column, all accessed with either the index on the N1 column or the index on the N41 column.

Below is a summary of the test results when accessing the various columns.

The following code may be used to retrieve the execution plans, just to confirm that the expected execution plan was selected – but we have something much better: a 10046 trace file for each execution.

SET TIMING OFF
SET PAGESIZE 1000
SET LINESIZE 160
SPOOL COL_POSITION_PLANS.TXT

SELECT /*+ LEADING(S) */
  T.PLAN_TABLE_OUTPUT
FROM
  (SELECT
    SQL_ID,
    CHILD_NUMBER
  FROM
    V$SQL
  WHERE
    SQL_TEXT LIKE 'SELECT /*+ FIND_ME */%') S,
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(S.SQL_ID,S.CHILD_NUMBER,'TYPICAL')) T;

SPOOL OFF

The TKPROF output files for the Linux and Windows executions are attached below.  When downloading, strip off the .doc extension and open the files with a text viewer (Notepad, vi, jed, etc.).

ColumnOrder1TKPROF_LinuxX64.txt
ColumnOrder1TKPROF_WindowsX64.txt

So, how much of an impact does the column order have in your environment?  Oracle databases support up to 1000 columns per table.  How might the performance of selecting data from the second column compare to selecting data from the 902nd column?  How might the outcome change if physical block reads were required?  How might the outcome change if there was significant competition for the CPU cycles in the server?

Feel free to modify and extend the test script.
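
For instance, one simple modification to revisit the physical-read question is to remove the ALTER TABLE T11 CACHE statement and flush the buffer cache before each timing block – a sketch, suitable only for a test system since it affects every session in the instance:

ALTER SYSTEM FLUSH BUFFER_CACHE;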







