Ramming Two SQL Statements Together – To Scalar or Too Scalar

30 05 2010

May 30, 2010

A recent comp.databases.oracle.server message thread about combining SQL statements triggered a bit of testing.  In light of some of my later comments in that thread, I thought I might start out by first showing the summary of the performance results.

In the above, the left side shows the TKPROF output from running a query that uses a standard join between four tables, but would probably behave about the same if it was a query with one table and the other three tables were specified in inline views.  The right side shows a query that produces identical results as the first, but uses SELECT statements in three columns of the parent SELECT statement that accesses a single table – there are three scalar (thanks Joel for looking up that term) subqueries in the SQL statement.  Each row in the above output shows what happened to the statistics output by TKPROF when the range of rows selected increased from 6,366 to 10,000,000 by changing the WHERE clause so that the upper range of the range scan in the query increased from 10 to 10,000.

By the end of the Server 1 test, where the test was executed directly on the server, the first query completed roughly 4.49 times faster than the second query, and required 102.27 times fewer consistent gets than the second query.

By the end of the Server 2 test, where the test was executed remotely across a gigabit network against a much slower server with 1/10 the number of rows in the main table, the first query completed roughly 6.27 times faster, and required 121.08 times fewer consistent gets than the second query.

The obvious conclusion is that scalar subqueries should not be used in column positions because they are much slower, and require a lot more consistent gets.

——–

I hear someone in the back shouting, “But, but… where is your evidence?  How can I verify that what you state to be correct is correct.  How can I be satisfied that my server isn’t substantially slower than your server 2?  Why were there all of those physical reads?  What were the execution plans, and did you see any unexpected Cartesian joins?  What did the SQL statements look like?  What were the table definitions”

Is it not enough to state that I achieved a 627% performance improvement by just making a small change to a SQL statement?  :-)  Of course not – switching now to the approach used by the other book that I have on pre-order.

Here is the test case that was executed for the remote test with Server 2:

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000);

CREATE INDEX IND_T1 ON T1(ID);

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

There are just 1,000,000 rows in table T1, and I did not allow the optimizer to create a histogram on the columns in table T1.  Now, let’s try a test script using the tables to see the expected execution plans (on Oracle Database 11.1.0.7):

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 200;

The first of the SQL statement directly joins the tables, while the second SQL statement places SELECT statements in column positions (scalar subqueries).  The output (11.1.0.7):

SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

Execution Plan
----------------------------------------------------------
Plan hash value: 3780653648

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  1 |  HASH JOIN           |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  2 |   HASH JOIN          |        |   198 |  2376 |    46   (5)| 00:00:01 |
|*  3 |    HASH JOIN         |        |   199 |  1592 |    43   (3)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2     |   200 |   800 |    21   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T3     |   200 |   800 |    21   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN  | IND_T4 |   200 |   800 |     2   (0)| 00:00:01 |
|*  7 |   TABLE ACCESS FULL  | T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."C1"="T1"."ID")
   2 - access("T2"."C1"="T4"."C1")
   3 - access("T2"."C1"="T3"."C1")
   4 - filter("T2"."C1"<=200 AND "T2"."C1">=1)
   5 - filter("T3"."C1"<=200 AND "T3"."C1">=1)
   6 - access("T4"."C1">=1 AND "T4"."C1"<=200)
   7 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

SQL> SELECT
  2    T1.ID,
  3    (SELECT
  4      T2.C1
  5    FROM
  6      T2
  7    WHERE
  8      T1.ID=T2.C1) T2_C1,
  9    (SELECT
 10      T3.C1
 11    FROM
 12      T3
 13    WHERE
 14      T1.ID=T3.C1) T3_C1,
 15    (SELECT
 16      T4.C1
 17    FROM
 18      T4
 19    WHERE
 20      T1.ID=T4.C1) T4_C1
 21  FROM
 22    T1
 23  WHERE
 24    T1.ID BETWEEN 1 AND 200;

Execution Plan
----------------------------------------------------------
Plan hash value: 2945978589

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 20002 | 80008 |  9994   (1)| 00:00:41 |
|*  1 |  TABLE ACCESS FULL| T2     |     1 |     4 |    21   (0)| 00:00:01 |
|*  2 |  TABLE ACCESS FULL| T3     |     1 |     4 |    21   (0)| 00:00:01 |
|*  3 |  INDEX RANGE SCAN | IND_T4 |     1 |     4 |     1   (0)| 00:00:01 |
|*  4 |  TABLE ACCESS FULL| T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."C1"=:B1)
   2 - filter("T3"."C1"=:B1)
   3 - access("T4"."C1"=:B1)
   4 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

The execution plans look a bit different, and notice that the second execution plan shows bind variables in the Predicate Information section.  If we were to actually run the SQL statements, we might find that the first runs in about 15 seconds and the second in about 16-17 seconds, both with the same number of physical reads.  That is no fun, so let’s change the number 200 to 1200 to see what happens.  We will flush the buffer cache twice between executions to force physical reads for both executions (Oracle is set to use direct, asynchronous IO), and set the array fetch size to 1,000 to minimize the amount of unnecessary network traffic.  The test script follows:

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SET ARRAYSIZE 1000

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 1200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 1200;

Here is the output of the above script:

SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 1200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

76580 rows selected.

Elapsed: 00:00:15.96

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      83197  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

SQL> SELECT
  2    T1.ID,
  3    (SELECT
  4      T2.C1
  5    FROM
  6      T2
  7    WHERE
  8      T1.ID=T2.C1) T2_C1,
  9    (SELECT
 10      T3.C1
 11    FROM
 12      T3
 13    WHERE
 14      T1.ID=T3.C1) T3_C1,
 15    (SELECT
 16      T4.C1
 17    FROM
 18      T4
 19    WHERE
 20      T1.ID=T4.C1) T4_C1
 21  FROM
 22    T1
 23  WHERE
 24    T1.ID BETWEEN 1 AND 1200;

76580 rows selected.

Elapsed: 00:01:40.09

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
   10073639  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

The number of consistent gets jumps significantly for the second SQL statement and so did the execution time (and CPU usage).  Here is the autotrace statistics for the second SQL statement (for comparison) when T1.ID BETWEEN 1 AND 200 was specified:
12732 rows selected.

Elapsed: 00:00:17.54

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
     522390  consistent gets
      83108  physical reads
          0  redo size
     196813  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12732  rows processed

While the above might be interesting, a slightly different table T1 was created for the Server 1 test script:

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000);

CREATE INDEX IND_T1 ON T1(ID);

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

This time I loaded table T1 with 10,000,000 rows and allowed the optimizer to collect a histogram on table T1, if it determined that a histogram would help.  The full test script on Server 1 is a bit long, so I will post an abbreviated version here.  First, let’s determine the execution plans for the SQL statements (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 160
SET TRIMSPOOL ON
SET PAGESIZE 2000
SPOOL SCALAR_EXECUTION_PLANS.TXT

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 50
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 50;

...

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;

SPOOL OFF

Then to check the performance results, the script that actually executes the statements (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
SET ARRAYSIZE 1000
SPOOL SCALAR_TEST_RESULTS.TXT
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SCALAR_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 50
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 50;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

...

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;

SPOOL OFF
SET AUTOTRACE OFF

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

It is a bit surprising to see the number of different execution plans that appear in the TKPROF output for the first query compared with the number that appear for the second query.  For example (the Rows statistic shows the actual number of rows returned for the execution plan line, while card shows the predicted number of rows), only showing the execution plan when it changed (notice at one point we see three sort-merge joins):

Rows     Row Source Operation
-------  ---------------------------------------------------
   6366  NESTED LOOPS  (cr=371 pr=350 pw=0 time=2208 us cost=121 size=208 card=13)
     10   HASH JOIN  (cr=325 pr=318 pw=0 time=27 us cost=97 size=96 card=8)
     10    HASH JOIN  (cr=316 pr=310 pw=0 time=9 us cost=95 size=72 card=9)
     10     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=40 card=10)
     10     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=0 us cost=47 size=40 card=10)
     10    INDEX RANGE SCAN IND_T4 (cr=9 pr=8 pw=0 time=0 us cost=2 size=40 card=10)(object id 82879)
   6366   INDEX RANGE SCAN IND_T1 (cr=46 pr=32 pw=0 time=1315 us cost=3 size=8 card=2)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
  31832  HASH JOIN  (cr=415 pr=398 pw=0 time=13733 us cost=208 size=3872 card=242)
     50   HASH JOIN  (cr=318 pr=318 pw=0 time=49 us cost=97 size=576 card=48)
     50    HASH JOIN  (cr=316 pr=310 pw=0 time=49 us cost=95 size=392 card=49)
     50     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=200 card=50)
     50     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=49 us cost=47 size=200 card=50)
     50    INDEX RANGE SCAN IND_T4 (cr=2 pr=8 pw=0 time=0 us cost=2 size=200 card=50)(object id 82879)
  31832   INDEX RANGE SCAN IND_T1 (cr=97 pr=80 pw=0 time=3465 us cost=110 size=200020 card=50005)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
 127334  HASH JOIN  (cr=706 pr=606 pw=0 time=176652 us cost=533 size=63392 card=3962)
    200   INDEX RANGE SCAN IND_T4 (cr=2 pr=8 pw=0 time=0 us cost=2 size=800 card=200)(object id 82879)
 127334   HASH JOIN  (cr=704 pr=598 pw=0 time=138605 us cost=530 size=47772 card=3981)
    200    TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=99 us cost=47 size=800 card=200)
 127334    MERGE JOIN  (cr=546 pr=443 pw=0 time=107989 us cost=482 size=32008 card=4001)
 127334     INDEX RANGE SCAN IND_T1 (cr=388 pr=288 pw=0 time=18939 us cost=434 size=800080 card=200020)(object id 82875)
 127334     SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=800 card=200)
    200      TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=800 card=200)

Rows     Row Source Operation
-------  ---------------------------------------------------
 894205  MERGE JOIN  (cr=3077 pr=2222 pw=0 time=2023576 us cost=3124 size=3132784 card=195799)
 894205   MERGE JOIN  (cr=3073 pr=2214 pw=0 time=1414521 us cost=3119 size=2351028 card=195919)
 894205    MERGE JOIN  (cr=2915 pr=2059 pw=0 time=767100 us cost=3071 size=1568312 card=196039)
 894205     INDEX RANGE SCAN IND_T1 (cr=2757 pr=1904 pw=0 time=111615 us cost=3023 size=5600560 card=1400140)(object id 82875)
 894205     SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=5600 card=1400)
   1400      TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=127 us cost=47 size=5600 card=1400)
 894205    SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=5600 card=1400)
   1400     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=127 us cost=47 size=5600 card=1400)
 894205   SORT JOIN (cr=4 pr=8 pw=0 time=0 us cost=5 size=5600 card=1400)
   1400    INDEX RANGE SCAN IND_T4 (cr=4 pr=8 pw=0 time=127 us cost=4 size=5600 card=1400)(object id 82879)

Rows     Row Source Operation
-------  ---------------------------------------------------
1939734  HASH JOIN  (cr=23230 pr=21260 pw=0 time=929418 us cost=6000 size=14396160 card=899760)
   3000   HASH JOIN  (cr=343 pr=333 pw=0 time=2499 us cost=102 size=35988 card=2999)
   3000    INDEX FAST FULL SCAN IND_T4 (cr=27 pr=23 pw=0 time=374 us cost=7 size=12000 card=3000)(object id 82879)
   3000    HASH JOIN  (cr=316 pr=310 pw=0 time=1249 us cost=95 size=24000 card=3000)
   3000     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=499 us cost=47 size=12000 card=3000)
   3000     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=249 us cost=47 size=12000 card=3000)
1939734   INDEX FAST FULL SCAN IND_T1 (cr=22887 pr=20927 pw=0 time=287619 us cost=5888 size=12001200 card=3000300)(object id 82875)

Edit May 30, 2010: The above execution plans demonstrate the cost-based optimizer’s ability to adapt the execution plan operations as the projected data volumes increase.  Not all of the tables in this test case have indexes, and that was intentional to see how the lack of indexes on certain tables affected the execution plans.

The execution plan for the second query remained a bit more unchanged (again, only showing when the execution plan changed):

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  TABLE ACCESS FULL T2 (cr=1580 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
     10  TABLE ACCESS FULL T3 (cr=1580 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
     10  INDEX RANGE SCAN IND_T4 (cr=12 pr=8 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
   6366  INDEX RANGE SCAN IND_T1 (cr=22 pr=32 pw=0 time=389 us cost=24 size=40004 card=10001)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
   3000  TABLE ACCESS FULL T2 (cr=474000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   3000  TABLE ACCESS FULL T3 (cr=474000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   3000  INDEX RANGE SCAN IND_T4 (cr=1990 pr=16 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
1939734  INDEX FAST FULL SCAN IND_T1 (cr=22887 pr=20927 pw=0 time=313731 us cost=5888 size=12001200 card=3000300)(object id 82875)

How frequently do people use scalar subqueries in column positions rather than inline views, and then wonder why performance is slower than expected?  Every once in a while a thread will appear on a discussion forum asking about performance problems with queries that include scalar subqueries.  For example:

The test script (strip the .doc extension): scalar_test_script.sql

The test results (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):
The spool file: scalar_test_results_spool_or112.txt
The execution plans: scalar_test_execution_plans_or112.txt
The TKPROF summary: scalar_test_or112_tkprof.txt

The test results (Oracle Database 11.1.0.7 with 8,000MB SGA_TARGET – same server as 11.2.0.1 test and same test script):
The spool file: scalar_test_results_spool_or111.txt
The execution plans: scalar_test_execution_plans_or111.txt
The TKPROF summary: scalar_test_or111_tkprof.txt








Follow

Get every new post delivered to your Inbox.

Join 142 other followers