Ramming Two SQL Statements Together – To Scalar or Too Scalar

30 05 2010

May 30, 2010

A recent comp.databases.oracle.server message thread about combining SQL statements triggered a bit of testing. In light of some of my later comments in that thread, I thought I might start out by first showing the summary of the performance results.

In the above, the left side shows the TKPROF output from running a query that uses a standard join between four tables, but would probably behave about the same if it was a query with one table and the other three tables were specified in inline views. The right side shows a query that produces identical results as the first, but uses SELECT statements in three columns of the parent SELECT statement that accesses a single table – there are three scalar (thanks Joel for looking up that term) subqueries in the SQL statement. Each row in the above output shows what happened to the statistics output by TKPROF when the range of rows selected increased from 6,366 to 10,000,000 by changing the WHERE clause so that the upper range of the range scan in the query increased from 10 to 10,000.

By the end of the Server 1 test, where the test was executed directly on the server, the first query completed roughly 4.49 times faster than the second query, and required 102.27 times fewer consistent gets than the second query.

By the end of the Server 2 test, where the test was executed remotely across a gigabit network against a much slower server with 1/10 the number of rows in the main table, the first query completed roughly 6.27 times faster, and required 121.08 times fewer consistent gets than the second query.

The obvious conclusion is that scalar subqueries should not be used in column positions because they are much slower, and require a lot more consistent gets.

——–

I hear someone in the back shouting, “But, but… where is your evidence? How can I verify that what you state to be correct is correct. How can I be satisfied that my server isn’t substantially slower than your server 2? Why were there all of those physical reads? What were the execution plans, and did you see any unexpected Cartesian joins? What did the SQL statements look like? What were the table definitions”

Is it not enough to state that I achieved a 627% performance improvement by just making a small change to a SQL statement? 🙂 Of course not – switching now to the approach used by the other book that I have on pre-order.

—

Here is the test case that was executed for the remote test with Server 2:

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000);

CREATE INDEX IND_T1 ON T1(ID);

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

There are just 1,000,000 rows in table T1, and I did not allow the optimizer to create a histogram on the columns in table T1. Now, let’s try a test script using the tables to see the expected execution plans (on Oracle Database 11.1.0.7):

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 200;

The first of the SQL statement directly joins the tables, while the second SQL statement places SELECT statements in column positions (scalar subqueries). The output (11.1.0.7):

SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

Execution Plan
----------------------------------------------------------
Plan hash value: 3780653648

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  1 |  HASH JOIN           |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  2 |   HASH JOIN          |        |   198 |  2376 |    46   (5)| 00:00:01 |
|*  3 |    HASH JOIN         |        |   199 |  1592 |    43   (3)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2     |   200 |   800 |    21   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T3     |   200 |   800 |    21   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN  | IND_T4 |   200 |   800 |     2   (0)| 00:00:01 |
|*  7 |   TABLE ACCESS FULL  | T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."C1"="T1"."ID")
   2 - access("T2"."C1"="T4"."C1")
   3 - access("T2"."C1"="T3"."C1")
   4 - filter("T2"."C1"<=200 AND "T2"."C1">=1)
   5 - filter("T3"."C1"<=200 AND "T3"."C1">=1)
   6 - access("T4"."C1">=1 AND "T4"."C1"<=200)
   7 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

SQL> SELECT
  2    T1.ID,
  3    (SELECT
  4      T2.C1
  5    FROM
  6      T2
  7    WHERE
  8      T1.ID=T2.C1) T2_C1,
  9    (SELECT
 10      T3.C1
 11    FROM
 12      T3
 13    WHERE
 14      T1.ID=T3.C1) T3_C1,
 15    (SELECT
 16      T4.C1
 17    FROM
 18      T4
 19    WHERE
 20      T1.ID=T4.C1) T4_C1
 21  FROM
 22    T1
 23  WHERE
 24    T1.ID BETWEEN 1 AND 200;

Execution Plan
----------------------------------------------------------
Plan hash value: 2945978589

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 20002 | 80008 |  9994   (1)| 00:00:41 |
|*  1 |  TABLE ACCESS FULL| T2     |     1 |     4 |    21   (0)| 00:00:01 |
|*  2 |  TABLE ACCESS FULL| T3     |     1 |     4 |    21   (0)| 00:00:01 |
|*  3 |  INDEX RANGE SCAN | IND_T4 |     1 |     4 |     1   (0)| 00:00:01 |
|*  4 |  TABLE ACCESS FULL| T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."C1"=:B1)
   2 - filter("T3"."C1"=:B1)
   3 - access("T4"."C1"=:B1)
   4 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

The execution plans look a bit different, and notice that the second execution plan shows bind variables in the Predicate Information section. If we were to actually run the SQL statements, we might find that the first runs in about 15 seconds and the second in about 16-17 seconds, both with the same number of physical reads. That is no fun, so let’s change the number 200 to 1200 to see what happens. We will flush the buffer cache twice between executions to force physical reads for both executions (Oracle is set to use direct, asynchronous IO), and set the array fetch size to 1,000 to minimize the amount of unnecessary network traffic. The test script follows:

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SET ARRAYSIZE 1000

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 1200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 1200;

Here is the output of the above script:

SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 1200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

76580 rows selected.

Elapsed: 00:00:15.96

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
      83197  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

SQL> SELECT
  2    T1.ID,
  3    (SELECT
  4      T2.C1
  5    FROM
  6      T2
  7    WHERE
  8      T1.ID=T2.C1) T2_C1,
  9    (SELECT
 10      T3.C1
 11    FROM
 12      T3
 13    WHERE
 14      T1.ID=T3.C1) T3_C1,
 15    (SELECT
 16      T4.C1
 17    FROM
 18      T4
 19    WHERE
 20      T1.ID=T4.C1) T4_C1
 21  FROM
 22    T1
 23  WHERE
 24    T1.ID BETWEEN 1 AND 1200;

76580 rows selected.

Elapsed: 00:01:40.09

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
   10073639  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

The number of consistent gets jumps significantly for the second SQL statement and so did the execution time (and CPU usage). Here is the autotrace statistics for the second SQL statement (for comparison) when T1.ID BETWEEN 1 AND 200 was specified:
12732 rows selected.

Elapsed: 00:00:17.54

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
     522390  consistent gets
      83108  physical reads
          0  redo size
     196813  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12732  rows processed

—

While the above might be interesting, a slightly different table T1 was created for the Server 1 test script:

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000);

CREATE INDEX IND_T1 ON T1(ID);

CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

This time I loaded table T1 with 10,000,000 rows and allowed the optimizer to collect a histogram on table T1, if it determined that a histogram would help. The full test script on Server 1 is a bit long, so I will post an abbreviated version here. First, let’s determine the execution plans for the SQL statements (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 160
SET TRIMSPOOL ON
SET PAGESIZE 2000
SPOOL SCALAR_EXECUTION_PLANS.TXT

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 50
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 50;

...

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;

SPOOL OFF

Then to check the performance results, the script that actually executes the statements (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
SET ARRAYSIZE 1000
SPOOL SCALAR_TEST_RESULTS.TXT
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SCALAR_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 50
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 50;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

...

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;

SPOOL OFF
SET AUTOTRACE OFF

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

It is a bit surprising to see the number of different execution plans that appear in the TKPROF output for the first query compared with the number that appear for the second query. For example (the Rows statistic shows the actual number of rows returned for the execution plan line, while card shows the predicted number of rows), only showing the execution plan when it changed (notice at one point we see three sort-merge joins):

Rows     Row Source Operation
-------  ---------------------------------------------------
   6366  NESTED LOOPS  (cr=371 pr=350 pw=0 time=2208 us cost=121 size=208 card=13)
     10   HASH JOIN  (cr=325 pr=318 pw=0 time=27 us cost=97 size=96 card=8)
     10    HASH JOIN  (cr=316 pr=310 pw=0 time=9 us cost=95 size=72 card=9)
     10     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=40 card=10)
     10     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=0 us cost=47 size=40 card=10)
     10    INDEX RANGE SCAN IND_T4 (cr=9 pr=8 pw=0 time=0 us cost=2 size=40 card=10)(object id 82879)
   6366   INDEX RANGE SCAN IND_T1 (cr=46 pr=32 pw=0 time=1315 us cost=3 size=8 card=2)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
  31832  HASH JOIN  (cr=415 pr=398 pw=0 time=13733 us cost=208 size=3872 card=242)
     50   HASH JOIN  (cr=318 pr=318 pw=0 time=49 us cost=97 size=576 card=48)
     50    HASH JOIN  (cr=316 pr=310 pw=0 time=49 us cost=95 size=392 card=49)
     50     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=200 card=50)
     50     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=49 us cost=47 size=200 card=50)
     50    INDEX RANGE SCAN IND_T4 (cr=2 pr=8 pw=0 time=0 us cost=2 size=200 card=50)(object id 82879)
  31832   INDEX RANGE SCAN IND_T1 (cr=97 pr=80 pw=0 time=3465 us cost=110 size=200020 card=50005)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
 127334  HASH JOIN  (cr=706 pr=606 pw=0 time=176652 us cost=533 size=63392 card=3962)
    200   INDEX RANGE SCAN IND_T4 (cr=2 pr=8 pw=0 time=0 us cost=2 size=800 card=200)(object id 82879)
 127334   HASH JOIN  (cr=704 pr=598 pw=0 time=138605 us cost=530 size=47772 card=3981)
    200    TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=99 us cost=47 size=800 card=200)
 127334    MERGE JOIN  (cr=546 pr=443 pw=0 time=107989 us cost=482 size=32008 card=4001)
 127334     INDEX RANGE SCAN IND_T1 (cr=388 pr=288 pw=0 time=18939 us cost=434 size=800080 card=200020)(object id 82875)
 127334     SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=800 card=200)
    200      TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=0 us cost=47 size=800 card=200)

Rows     Row Source Operation
-------  ---------------------------------------------------
 894205  MERGE JOIN  (cr=3077 pr=2222 pw=0 time=2023576 us cost=3124 size=3132784 card=195799)
 894205   MERGE JOIN  (cr=3073 pr=2214 pw=0 time=1414521 us cost=3119 size=2351028 card=195919)
 894205    MERGE JOIN  (cr=2915 pr=2059 pw=0 time=767100 us cost=3071 size=1568312 card=196039)
 894205     INDEX RANGE SCAN IND_T1 (cr=2757 pr=1904 pw=0 time=111615 us cost=3023 size=5600560 card=1400140)(object id 82875)
 894205     SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=5600 card=1400)
   1400      TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=127 us cost=47 size=5600 card=1400)
 894205    SORT JOIN (cr=158 pr=155 pw=0 time=0 us cost=48 size=5600 card=1400)
   1400     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=127 us cost=47 size=5600 card=1400)
 894205   SORT JOIN (cr=4 pr=8 pw=0 time=0 us cost=5 size=5600 card=1400)
   1400    INDEX RANGE SCAN IND_T4 (cr=4 pr=8 pw=0 time=127 us cost=4 size=5600 card=1400)(object id 82879)

Rows     Row Source Operation
-------  ---------------------------------------------------
1939734  HASH JOIN  (cr=23230 pr=21260 pw=0 time=929418 us cost=6000 size=14396160 card=899760)
   3000   HASH JOIN  (cr=343 pr=333 pw=0 time=2499 us cost=102 size=35988 card=2999)
   3000    INDEX FAST FULL SCAN IND_T4 (cr=27 pr=23 pw=0 time=374 us cost=7 size=12000 card=3000)(object id 82879)
   3000    HASH JOIN  (cr=316 pr=310 pw=0 time=1249 us cost=95 size=24000 card=3000)
   3000     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=499 us cost=47 size=12000 card=3000)
   3000     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=249 us cost=47 size=12000 card=3000)
1939734   INDEX FAST FULL SCAN IND_T1 (cr=22887 pr=20927 pw=0 time=287619 us cost=5888 size=12001200 card=3000300)(object id 82875)

Edit May 30, 2010: The above execution plans demonstrate the cost-based optimizer’s ability to adapt the execution plan operations as the projected data volumes increase. Not all of the tables in this test case have indexes, and that was intentional to see how the lack of indexes on certain tables affected the execution plans.

The execution plan for the second query remained a bit more unchanged (again, only showing when the execution plan changed):

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  TABLE ACCESS FULL T2 (cr=1580 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
     10  TABLE ACCESS FULL T3 (cr=1580 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
     10  INDEX RANGE SCAN IND_T4 (cr=12 pr=8 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
   6366  INDEX RANGE SCAN IND_T1 (cr=22 pr=32 pw=0 time=389 us cost=24 size=40004 card=10001)(object id 82875)

Rows     Row Source Operation
-------  ---------------------------------------------------
   3000  TABLE ACCESS FULL T2 (cr=474000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   3000  TABLE ACCESS FULL T3 (cr=474000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   3000  INDEX RANGE SCAN IND_T4 (cr=1990 pr=16 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
1939734  INDEX FAST FULL SCAN IND_T1 (cr=22887 pr=20927 pw=0 time=313731 us cost=5888 size=12001200 card=3000300)(object id 82875)

—

How frequently do people use scalar subqueries in column positions rather than inline views, and then wonder why performance is slower than expected? Every once in a while a thread will appear on a discussion forum asking about performance problems with queries that include scalar subqueries. For example:

The test script (strip the .doc extension): scalar_test_script.sql

The test results (Oracle Database 11.2.0.1 with 8,000MB SGA_TARGET):
The spool file: scalar_test_results_spool_or112.txt
The execution plans: scalar_test_execution_plans_or112.txt
The TKPROF summary: scalar_test_or112_tkprof.txt

The test results (Oracle Database 11.1.0.7 with 8,000MB SGA_TARGET – same server as 11.2.0.1 test and same test script):
The spool file: scalar_test_results_spool_or111.txt
The execution plans: scalar_test_execution_plans_or111.txt
The TKPROF summary: scalar_test_or111_tkprof.txt

Actions

Information

Date : May 30, 2010
Categories : Performance, SQL

11 responses

30 05 2010

Timur Akhmadeev (15:21:19) :

>How frequently do people use scalar subqueries in column positions rather than inline views, and then wonder why performance is slower than expected?
I use scalar subqueries instead of joins in certain situations – for performance reasons (to decrease number of consistent gets). Scalar subquery caching rocks.

30 05 2010

Charles Hooper (16:43:47) :

Timur,

Thanks for the link. From the link that you provided it appears that scalar subquery caching applies to deterministic functions that are used in the scalar subquery – where it appears that Oracle “remembers” the last result (or maybe last several results) retrieved from the deterministic function so that it does not need to re-execute the function for every row in the resultset. Did I understand that article correctly?

It is an interesting feature. I had heard previously that Oracle will cache the results of deterministic functions in some cases – but I am having trouble at the moment remembering exactly when the caching may take place. The link that you provided shows one case where the caching happens. Any others?

In my test case the number of consistent gets increased by large values for the query that used scalar subqueries. Would you be able to put together a test case that shows a decrease in the number of consistent gets when scalar subqueries are used? That test case would be a good way to balance my mostly one-sided article and test, and might explain why scalar subqueries appear so frequently.

I remember when I was learning SQL on Oracle and found that I could embed a SELECT statement into a column position. I thought that it was a great feature until I needed to retrieve more than one column from the scalar subquery. Then I figured out how to work with Oracle’s (+) outer join syntax and never found a need for those scalar subqueries.

31 05 2010

Timur Akhmadeev (05:26:13) :

Charles,

>From the link that you provided it appears that scalar subquery caching applies to deterministic functions that are used in the scalar subquery
Scalar subquery caching works with non-determenistic functions too, IIRC. It works the same way as filter subqueries

>It is an interesting feature. I had heard previously that Oracle will cache the results of deterministic functions in some cases – but I am having trouble at the moment remembering exactly when the caching may take place. The link that you provided shows one case where the caching happens. Any others?
IIRC Oracle can cache results of deterministic functions not only in scalar subqueries since 10g. This may be wrong – I’m not sure about this.

>Would you be able to put together a test case that shows a decrease in the number of consistent gets when scalar subqueries are used?
Something like this:

drop table t1 cascade constraints purge;
drop table t2 cascade constraints purge;

create table t1 as 
select rownum id
     , cast('x' as char(2000)) pad 
  from dual connect by level <= 1e5;
 
create table t2 as 
select rownum id
     , mod(ora_hash(rownum), 10) t1_id
	 , mod(rownum, 100) x
	 , cast('x' as char(50)) pad 
  from dual connect by level <= 1e6;
  
create unique index t1_uq on t1(id);

exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1', estimate_percent=>null)
exec dbms_stats.gather_table_stats(user, 't2', method_opt=>'for all columns size 1', estimate_percent=>null)

set arraysize 1000 timing on autotrace traceonly stat

select t2.*
      ,(select pad from t1 where t1.id = t2.t1_id) t1_pad
  from t2
 where x = 10;
/

select t2.*
     , t1.pad
  from t1, t2
 where t2.x = 10
   and t1.id(+) = t2.t1_id;
/

SQL> select t2.*
  2        ,(select pad from t1 where t1.id = t2.t1_id) t1_pad
  3    from t2
  4   where x = 10;

10000 rows selected.

Elapsed: 00:00:02.79

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       9300  consistent gets
       9259  physical reads
          0  redo size
    1991703  bytes sent via SQL*Net to client
        491  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> /

10000 rows selected.

Elapsed: 00:00:03.23

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       9300  consistent gets
       9256  physical reads
          0  redo size
    1991703  bytes sent via SQL*Net to client
        491  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL>
SQL> select t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id(+) = t2.t1_id;

10000 rows selected.

Elapsed: 00:00:02.43

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      28257  consistent gets
       9254  physical reads
          0  redo size
    1991700  bytes sent via SQL*Net to client
        491  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> /

10000 rows selected.

Elapsed: 00:00:03.04

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      28257  consistent gets
       9257  physical reads
          0  redo size
    1991700  bytes sent via SQL*Net to client
        491  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

31 05 2010

Charles Hooper (08:41:07) :

Timur,

Thank you for posting a test case that shows the counter-point for my article.

I ran your test case on a netbook that is running the 32 bit version of Oracle Database 11.2.0.1. In my case, the SQL statement using the scalar query completed in about the same amount of time as your query, while the outer join version required 2 to 3 times longer than the scalar version – even if I changed the outer join to a standard join it still required 2 to 3 times longer than the scalar version.

I became curious, so I took a look at the execution plans, and then made a slight addition to your test script:

set arraysize 1000 linesize 140 timing on autotrace traceonly stat explain

select t2.*
      ,(select pad from t1 where t1.id = t2.t1_id) t1_pad
  from t2
 where x = 10;

select t2.*
     , t1.pad
  from t1, t2
 where t2.x = 10
   and t1.id(+) = t2.t1_id;

select /*+ INDEX(T1) */ t2.*
     , t1.pad
  from t1, t2
 where t2.x = 10
   and t1.id(+) = t2.t1_id;

I added a linesize specification and explain to the set command, added a third query with an index hint, and removed the multiple executions. After executing the script a couple of times, this is what I received:

SQL> select t2.*
  2        ,(select pad from t1 where t1.id = t2.t1_id) t1_pad
  3    from t2
  4   where x = 10;
 
10000 rows selected.
 
Elapsed: 00:00:02.04
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2561529335
 
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 10000 |   605K|  2603   (3)| 00:00:32 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |  2006 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_UQ |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  TABLE ACCESS FULL          | T2    | 10000 |   605K|  2603   (3)| 00:00:32 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."ID"=:B1)
   3 - filter("X"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       9290  consistent gets
       9259  physical reads
          0  redo size
    1992203  bytes sent via SQL*Net to client
        515  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> select t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id(+) = t2.t1_id;

10000 rows selected.

Elapsed: 00:00:06.06

Execution Plan
----------------------------------------------------------
Plan hash value: 1426054487

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |    19M| 11744   (1)| 00:02:21 |
|*  1 |  HASH JOIN OUTER   |      | 10000 |    19M| 11744   (1)| 00:02:21 |
|*  2 |   TABLE ACCESS FULL| T2   | 10000 |   605K|  2603   (3)| 00:00:32 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|   191M|  9139   (1)| 00:01:50 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"(+)="T2"."T1_ID")
   2 - filter("T2"."X"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      42607  consistent gets
      42593  physical reads
          0  redo size
     105869  bytes sent via SQL*Net to client
        515  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> select /*+ INDEX(T1) */ t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id(+) = t2.t1_id;

10000 rows selected.

Elapsed: 00:00:02.13

Execution Plan
----------------------------------------------------------
Plan hash value: 647702405

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       | 10000 |    19M| 12619   (1)| 00:02:32 |
|   1 |  NESTED LOOPS OUTER          |       | 10000 |    19M| 12619   (1)| 00:02:32 |
|*  2 |   TABLE ACCESS FULL          | T2    | 10000 |   605K|  2603   (3)| 00:00:32 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |  2006 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | T1_UQ |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."X"=10)
   4 - access("T1"."ID"(+)="T2"."T1_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      18257  consistent gets
       9259  physical reads
          0  redo size
    1992200  bytes sent via SQL*Net to client
        515  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

With the index hint the timing of the outer join is close, but not quite as fast as the scalar – the number of consistent gets does seem to make the difference in execution time.

31 05 2010

Charles Hooper (09:36:46) :

This is output of the modified script from another system running Oracle Database 11.2.0.1 – definitely not a netbook:

SQL> select t2.*
  2        ,(select pad from t1 where t1.id = t2.t1_id) t1_pad
  3    from t2
  4   where x = 10;

10000 rows selected.

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 2561529335

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 10000 |   605K|  2557   (1)| 00:00:31 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |  2006 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_UQ |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  TABLE ACCESS FULL          | T2    | 10000 |   605K|  2557   (1)| 00:00:31 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."ID"=:B1)
   3 - filter("X"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       9297  consistent gets
          0  physical reads
          0  redo size
    1992859  bytes sent via SQL*Net to client
        619  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed
 
SQL> select t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id(+) = t2.t1_id;

10000 rows selected.

Elapsed: 00:00:02.76

Execution Plan
----------------------------------------------------------
Plan hash value: 1426054487

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |    19M| 11659   (1)| 00:02:20 |
|*  1 |  HASH JOIN OUTER   |      | 10000 |    19M| 11659   (1)| 00:02:20 |
|*  2 |   TABLE ACCESS FULL| T2   | 10000 |   605K|  2557   (1)| 00:00:31 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|   191M|  9101   (1)| 00:01:50 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"(+)="T2"."T1_ID")
   2 - filter("T2"."X"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      42614  consistent gets
      33334  physical reads
          0  redo size
     106525  bytes sent via SQL*Net to client
        619  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> select /*+ INDEX(T1) */ t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id(+) = t2.t1_id;

10000 rows selected.

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 647702405

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       | 10000 |    19M| 12560   (1)| 00:02:31 |
|   1 |  NESTED LOOPS OUTER          |       | 10000 |    19M| 12560   (1)| 00:02:31 |
|*  2 |   TABLE ACCESS FULL          | T2    | 10000 |   605K|  2557   (1)| 00:00:31 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |  2006 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | T1_UQ |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."X"=10)
   4 - access("T1"."ID"(+)="T2"."T1_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      18264  consistent gets
          0  physical reads
          0  redo size
    1992856  bytes sent via SQL*Net to client
        619  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

On this other system the time for the scalar query bounces between 0.26 and 0.28 seconds. If table T1 is modified so that column ID has a NOT NULL constraint, and the outer join is converted to a standard join, the output looks like this after repeated executions:

SQL> select t2.*
  2        ,(select pad from t1 where t1.id = t2.t1_id) t1_pad
  3    from t2
  4   where x = 10;

10000 rows selected.

Elapsed: 00:00:00.26

Execution Plan
----------------------------------------------------------
Plan hash value: 2561529335

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 10000 |   605K|  2557   (1)| 00:00:31 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |  2006 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_UQ |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  TABLE ACCESS FULL          | T2    | 10000 |   605K|  2557   (1)| 00:00:31 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."ID"=:B1)
   3 - filter("X"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       9297  consistent gets
          0  physical reads
          0  redo size
    1992859  bytes sent via SQL*Net to client
        619  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> select t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id = t2.t1_id;

8967 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |    19M| 11659   (1)| 00:02:20 |
|*  1 |  HASH JOIN         |      | 10000 |    19M| 11659   (1)| 00:02:20 |
|*  2 |   TABLE ACCESS FULL| T2   | 10000 |   605K|  2557   (1)| 00:00:31 |
|   3 |   TABLE ACCESS FULL| T1   |   100K|   191M|  9101   (1)| 00:01:50 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="T2"."T1_ID")
   2 - filter("T2"."X"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      42622  consistent gets
          0  physical reads
          0  redo size
      96022  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
         10  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       8967  rows processed

SQL> select /*+ INDEX(T1) */ t2.*
  2       , t1.pad
  3    from t1, t2
  4   where t2.x = 10
  5     and t1.id = t2.t1_id;

8967 rows selected.

Elapsed: 00:00:00.26

Execution Plan
----------------------------------------------------------
Plan hash value: 1398459894

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       | 10000 |    19M| 12560   (1)| 00:02:31 |
|   1 |  NESTED LOOPS                |       |       |       |            |          |
|   2 |   NESTED LOOPS               |       | 10000 |    19M| 12560   (1)| 00:02:31 |
|*  3 |    TABLE ACCESS FULL         | T2    | 10000 |   605K|  2557   (1)| 00:00:31 |
|*  4 |    INDEX UNIQUE SCAN         | T1_UQ |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |  2006 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."X"=10)
   4 - access("T1"."ID"="T2"."T1_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      18262  consistent gets
          0  physical reads
          0  redo size
     120070  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
         10  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       8967  rows processed

Now both forms of the query complete in 0.26 seconds, but they are no longer equivalent SQL statements. Once again, the scalar query required fewer consistent gets, so it might be worth experimenting more to see why my earlier tests showed a significant increase in the consistent gets, while this test shows a decrease.

31 05 2010

Nigel Thomas (11:50:16) :

The scalar subqueries in the column list have the effect of outer joins to the corresponding tables (because if there is no matching row at the other table, the row will still be emitted – whereas in the 4 way join it won’t). So a completely fair test would be to use the outer join as the basis for comparison, i.e:

select T.id
        , A.id as A_ID
from   T, A
where A.id(+) = T.id

with the scalar subquery version:

select T.id
        , (select x from A where A.id = T.id) as A_ID
from   T

and/or with the ANSI version of the outer join:

select T.id
        , A.id as A_ID
from   T
left join A on A.id = T.id

No way to preview the code, so I hope it shows correctly. Sorry, no access to Oracle to find out for myself today (bank holiday in UK).

Generally the scalar subquery is to be avoided, but it can be a useful optimisation (for example, when it appears as part of a CASE expression that is infrequently evaluated).

Regards Nigel

31 05 2010

Charles Hooper (20:25:16) :

Nigel,

Thank you for the response. You are correct that I should have used outer joins to simulate the behavior of the scalar subqueries. I recreated the test again using the following script (the new query is listed in between the two old ones):

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
SET ARRAYSIZE 1000
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SCALAR_TEST2';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
 
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
 
SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;
 
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
 
SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T1.ID BETWEEN 1 AND 10000
  AND T1.ID=T2.C1(+)
  AND T1.ID=T3.C1(+)
  AND T1.ID=T4.C1(+);
 
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
 
SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000;
 
SET AUTOTRACE OFF
 
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

Here is the TKPROF output:

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T2.C1 BETWEEN 1 AND 10000
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    10001      5.33       5.98      21260      31281          0    10000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10003      5.33       5.98      21260      31281          0    10000000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 194  

Rows      Row Source Operation
--------  ---------------------------------------------------
10000000  HASH JOIN  (cr=31281 pr=21260 pw=0 time=5427652 us cost=6033 size=160000000 card=10000000)
   10000   HASH JOIN  (cr=343 pr=333 pw=0 time=8100 us cost=102 size=120000 card=10000)
   10000    INDEX FAST FULL SCAN IND_T4 (cr=27 pr=23 pw=0 time=1265 us cost=7 size=40000 card=10000)(object id 82879)
   10000    HASH JOIN  (cr=316 pr=310 pw=0 time=4556 us cost=95 size=80000 card=10000)
   10000     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=1518 us cost=47 size=40000 card=10000)
   10000     TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=1012 us cost=47 size=40000 card=10000)
10000000   INDEX FAST FULL SCAN IND_T1 (cr=30938 pr=20927 pw=0 time=2206620 us cost=5899 size=40000000 card=10000000)(object id 82875)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   10001        0.00          0.00
  db file sequential read                         4        0.02          0.03
  db file scattered read                        218        0.04          0.53
  asynch descriptor resize                        3        0.00          0.00
  SQL*Net message from client                 10001        0.00         13.32

SELECT
  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1
FROM
  T1,
  T2,
  T3,
  T4
WHERE
  T1.ID BETWEEN 1 AND 10000
  AND T1.ID=T2.C1(+)
  AND T1.ID=T3.C1(+)
  AND T1.ID=T4.C1(+)
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    10001      9.48      10.01      21260      31281          0    10000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10003      9.48      10.01      21260      31281          0    10000000
 
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 194  
 
Rows      Row Source Operation
--------  ---------------------------------------------------
10000000  HASH JOIN RIGHT OUTER (cr=31281 pr=21260 pw=0 time=11112717 us cost=6094 size=160000000 card=10000000)
   10000   INDEX FAST FULL SCAN IND_T4 (cr=27 pr=23 pw=0 time=1265 us cost=7 size=40000 card=10000)(object id 82879)
10000000   HASH JOIN RIGHT OUTER (cr=31254 pr=21237 pw=0 time=8123623 us cost=6056 size=120000000 card=10000000)
   10000    TABLE ACCESS FULL T3 (cr=158 pr=155 pw=0 time=1012 us cost=47 size=40000 card=10000)
10000000    HASH JOIN RIGHT OUTER (cr=31096 pr=21082 pw=0 time=5183042 us cost=5978 size=80000000 card=10000000)
   10000     TABLE ACCESS FULL T2 (cr=158 pr=155 pw=0 time=1139 us cost=47 size=40000 card=10000)
10000000     INDEX FAST FULL SCAN IND_T1 (cr=30938 pr=20927 pw=0 time=2193563 us cost=5899 size=40000000 card=10000000)(object id 82875)
  
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   10001        0.00          0.00
  Disk file operations I/O                        2        0.00          0.00
  db file sequential read                         4        0.00          0.00
  db file scattered read                        218        0.02          0.47
  SQL*Net message from client                 10001        0.00         13.88
  asynch descriptor resize                        3        0.00          0.00

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 10000
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    10001     15.95      16.52      21269    3198982          0    10000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10003     15.95      16.52      21269    3198982          0    10000000
 
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 194  
 
Rows      Row Source Operation
--------  ---------------------------------------------------
   10000  TABLE ACCESS FULL T2 (cr=1580000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   10000  TABLE ACCESS FULL T3 (cr=1580000 pr=155 pw=0 time=0 us cost=47 size=4 card=1)
   10000  INDEX RANGE SCAN IND_T4 (cr=8044 pr=32 pw=0 time=0 us cost=1 size=4 card=1)(object id 82879)
10000000  INDEX FAST FULL SCAN IND_T1 (cr=30938 pr=20927 pw=0 time=2147483 us cost=5899 size=40000000 card=10000000)(object id 82875)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   10001        0.00          0.00
  db file sequential read                         3        0.02          0.04
  db file scattered read                        218        0.04          0.64
  SQL*Net message from client                 10001        0.00         13.47

Note that this execution *should* match the last line of the top portion of the table at the top of this page, but it appears that the timings are faster (the earlier timings might have been heat related). As can be seen by the above, the number of consistent gets and physical reads remained constant when the first query was converted to an outer join, but the execution time increased slightly. The outer join is still faster than the scalar join, and the number of consistent gets is also significantly less than what the scalar join produced.

I think that this blog article, if nothing else, highlights the value of test cases.

1 06 2010

Narendra (08:34:11) :

I don’t have access to oracle at present so can’t provide my test case.
But, in addition to options discussed above, following 2 factors will further improve the effectiveness of scalar subquery approach:
1. Uniqueness (or lack of) of the column being selected – Less distinct values, more effective the scalar subquery
2. Sorting of the data in outer query – This will be evident only if a function is being called and it uses data from the outer query.
Here is the demonstration of the same by Tom
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1547006324238#726758800346145302

1 06 2010

Charles Hooper (20:47:25) :

Another set of good comments, and another useful link.

4 06 2010

joel garry (17:47:31) :

Your welcome! 🙂

I learned more than one new thing today.

4 06 2010

Charles Hooper (18:09:43) :

Hi Joel,

That second book I mentioned in the Usenet thread was supposed to be released on June 1, with a shipping estimate of June 7, but Amazon is currently reporting “This title has not yet been released.” I hope that the book does not arrive too late – I would hate to have to set the book aside when Tom Kyte’s book that is also on pre-order arrives. I imagine that there will be a couple of blog posts, good or bad, about the book that is shipping late. To stay on topic this comment deserves a bind variable declaration :B

	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Huge Presentation – Work…
	Dan on Huge Presentation – Work…
	Charles Hooper on Review of Synology DS415+, How…
	Matthias Clausen on Review of Synology DS415+, How…
	Charles Hooper on _SMALL_TABLE_THRESHOLD Paramet…
	mohd atif on _SMALL_TABLE_THRESHOLD Paramet…
	Charles Hooper on Free ANSI SQL to Oracle Specif…
	Kalin Agrawal on Free ANSI SQL to Oracle Specif…
	Charles Hooper on Large Page Support or Not…
	vaurob on Large Page Support or Not…
	Locating Oracle… on Finding a New Home for a Clien…

Charles Hooper's Oracle Notes