What is the Difference Between the FIRST_ROWS Hint and ROWNUM in the WHERE Clause?

10 03 2011

March 10, 2011

A couple of days ago I saw an interesting question that asked what is the difference between using the FIRST_ROWS(100) hint in a SQL statement and using ROWNUM<=100 in the WHERE clause with the original SQL statement moved to an inline view.   For example:

SELECT /*+ FIRST_ROWS(100) */
  C1,
  C2
FROM
  T1
WHERE
  C1 BETWEEN 1 AND 1000
ORDER BY
  C1; 

 

SELECT
  *
FROM
  (SELECT
     C1,
     C2
   FROM
     T1
   WHERE
     C1 BETWEEN 1 AND 1000
   ORDER BY
     C1)
WHERE
  ROWNUM<=100; 

My first thought was that the difference is actually quite simple to explain, and then I started to wonder, how could someone logically come to the conclusion that the two approaches would yield the same results?  Do developers look at explain plan output?  Could that explain how someone might look at these two approaches and question whether or not the two approaches are equivalent?  Let’s see:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ FIRST_ROWS(100) */
  C1,
  C2
FROM
  T1
WHERE
  C1 BETWEEN 1 AND 1000
ORDER BY
  C1;

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   102 |  1020 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |   102 |  1020 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C0020704 |  1000 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1">=1 AND "C1"<=1000) 

 

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  (SELECT
     C1,
     C2
   FROM
     T1
   WHERE
     C1 BETWEEN 1 AND 1000
   ORDER BY
     C1)
WHERE
  ROWNUM<=100;

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   100 |  2600 |     5   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |              |       |       |            |          |
|   2 |   VIEW                        |              |   102 |  2652 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |   102 |  1020 |     5   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0020704 |  1000 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=100)
   4 - access("C1">=1 AND "C1"<=1000) 

So, the FIRST_ROWS hinted plan shows that 102 rows will be returned, while the plan with ROWNUM in the WHERE clause shows that 100 of 102 rows will be returned.  So, this is the first clue that the two approaches might not be equivalent.  Other people have written extensively about what the FIRST_ROWS hint (and OPTIMIZER_MODE) mean and also how ROWNUM in the WHERE clause works (and a potentially significant bug that is present prior to Oracle Database 11.2.0.1).  Is it possible to simplify the explanation of the difference, something like this?

The short answer is that the FIRST_ROWS hint tells the query optimizer: I really do not care to know if more than 1, 10, 100, or 1000 rows could be returned by the query, just plan the query execution as if my application will only retrieve 1, 10, 100, or 1000 rows – my application might still retrieve all of the rows, but just plan on the specified number being read. 

The ROWNUM predicate in the WHERE clause tells the query optimizer: I really do not care if more than 1, 10, 100, or 1000 rows could be returned by the original query, plan the query execution as if my application will only retrieve 1, 10, 100, or 1000 rows and make certain that my application cannot retrieve more than the specified number of rows.

Simple, maybe just a little too simple?

Let’s put together a little test script to demonstrate.  First, we will create a table with a primary key index, insert 500,000 rows, and then collect statistics with a 100% sample size:

CREATE TABLE T1(
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 DATE,
  C5 DATE,
  C6 DATE,
  C7 VARCHAR2(20),
  C8 VARCHAR2(20),
  C9 VARCHAR2(20),
  PRIMARY KEY(C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  TRUNC(SYSDATE)+TRUNC(ROWNUM/100),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A')),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'))
FROM
  DUAL
CONNECT BY
  LEVEL<=500000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL) 

Now the test script:

SPOOL TestFirstRowsAndRownum.txt
ALTER SYSTEM FLUSH BUFFER_CACHE;
SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
SET ARRAYSIZE 1000
SET TIMING ON
SET LINESIZE 140
SET PAGESIZE 1000
SET TRIMSPOOL ON

SELECT /*+ INDEX(T1) */
  C1,
  C2,
  C3,
  C4,
  C5,
  C6
FROM
  T1
WHERE
  C1 BETWEEN 10000 AND 100000;

ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ INDEX(T1) FIRST_ROWS(100) */
  C1,
  C2,
  C3,
  C4,
  C5,
  C6
FROM
  T1
WHERE
  C1 BETWEEN 10000 AND 100000;

ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+ INDEX(T1) */
  C1,
  C2,
  C3,
  C4,
  C5,
  C6
FROM
  T1
WHERE
  C1 BETWEEN 10000 AND 100000
  AND ROWNUM<=100;

ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+  */
  *
FROM
  (SELECT
    C1,
    C2,
    MAX(C3) C3
  FROM
    T1
  GROUP BY
    C1,
    C2
  ORDER BY
    C1,
    C2)
WHERE
  C1 BETWEEN 10000 AND 100000;

ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+  FIRST_ROWS(100) */
  *
FROM
  (SELECT
    C1,
    C2,
    MAX(C3) C3
  FROM
    T1
  GROUP BY
    C1,
    C2
  ORDER BY
    C1,
    C2)
WHERE
  C1 BETWEEN 10000 AND 100000;

ALTER SYSTEM FLUSH BUFFER_CACHE;

SELECT /*+  */
  *
FROM
  (SELECT
    C1,
    C2,
    MAX(C3) C3
  FROM
    T1
  GROUP BY
    C1,
    C2
  ORDER BY
    C1,
    C2)
WHERE
  C1 BETWEEN 10000 AND 100000
  AND ROWNUM<=100;

SPOOL OFF 

The output of the first query, without the hint and without the ROWNUM restriction:

SQL> SELECT /*+ INDEX(T1) */
  2    C1,
  3    C2,
  4    C3,
  5    C4,
  6    C5,
  7    C6
  8  FROM
  9    T1
 10  WHERE
 11    C1 BETWEEN 10000 AND 100000;

90001 rows selected.

Elapsed: 00:00:00.43

Execution Plan
----------------------------------------------------------
Plan hash value: 2459953871

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 90002 |  3252K|  1359   (1)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           | 90002 |  3252K|  1359   (1)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | SYS_C0020704 | 90002 |       |   174   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1">=10000 AND "C1"<=100000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1534  consistent gets
       1354  physical reads
          0  redo size
    1994612  bytes sent via SQL*Net to client
       1324  bytes received via SQL*Net from client
         92  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      90001  rows processed 

Note in the plain (unmodified) query that 90,001 rows were returned to SQL*Plus, while the optimizer predicted that 90,002 rows would be returned.  It required 0.43 seconds for the rows to be returned to the SQL*Plus session.

Now the second query of the test case, hinted with FIRST_ROWS:

SQL> SELECT /*+ INDEX(T1) FIRST_ROWS(100) */
  2    C1,
  3    C2,
  4    C3,
  5    C4,
  6    C5,
  7    C6
  8  FROM
  9    T1
 10  WHERE
 11    C1 BETWEEN 10000 AND 100000;

90001 rows selected.

Elapsed: 00:00:00.43

Execution Plan
----------------------------------------------------------
Plan hash value: 2459953871

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   103 |  3811 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1           |   103 |  3811 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | SYS_C0020704 | 90002 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1">=10000 AND "C1"<=100000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1534  consistent gets
       1354  physical reads
          0  redo size
    1994612  bytes sent via SQL*Net to client
       1324  bytes received via SQL*Net from client
         92  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      90001  rows processed 

Note in the FIRST_ROWS(100) hinted query that 90,001 rows were returned to SQL*Plus, while the optimizer predicted that 103 rows would be returned due to the hint.  It required 0.43 seconds for the rows to be returned to the SQL*Plus session.

Now the third query of the test case:

SQL> SELECT /*+ INDEX(T1) */
  2    C1,
  3    C2,
  4    C3,
  5    C4,
  6    C5,
  7    C6
  8  FROM
  9    T1
 10  WHERE
 11    C1 BETWEEN 10000 AND 100000
 12    AND ROWNUM<=100;

100 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3250543521

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   100 |  3700 |     5   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY               |              |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |   103 |  3811 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0020704 | 90002 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=100)
   3 - access("C1">=10000 AND "C1"<=100000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          6  physical reads
          0  redo size
       2787  bytes sent via SQL*Net to client
        334  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed 

Note in the query with the ROWNUM predicate in the WHERE clause that 100 rows were returned to SQL*Plus, while the optimizer predicted that 100 rows (103 rows before the COUNT STOPKEY operation) would be returned.  It required 0.01 seconds for the rows to be returned to the SQL*Plus session.

Now for the second half of the output (created on Oracle Database 10.2.0.2 – your results may be a bit different).  The first query:

SQL> SELECT /*+  */
  2    *
  3  FROM
  4    (SELECT
  5      C1,
  6      C2,
  7      MAX(C3) C3
  8    FROM
  9      T1
 10    GROUP BY
 11      C1,
 12      C2
 13    ORDER BY
 14      C1,
 15      C2)
 16  WHERE
 17    C1 BETWEEN 10000 AND 100000;

90001 rows selected.

Elapsed: 00:00:00.56

Execution Plan
----------------------------------------------------------
Plan hash value: 915463192

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 90002 |  3427K|       |   904   (5)| 00:00:05 |
|   1 |  VIEW               |      | 90002 |  3427K|       |   904   (5)| 00:00:05 |
|   2 |   SORT GROUP BY     |      | 90002 |  2197K|  7080K|   904   (5)| 00:00:05 |
|*  3 |    TABLE ACCESS FULL| T1   | 90002 |  2197K|       |   352   (8)| 00:00:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("C1"<=100000 AND "C1">=10000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       6563  consistent gets
       6555  physical reads
          0  redo size
    1307436  bytes sent via SQL*Net to client
       1324  bytes received via SQL*Net from client
         92  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      90001  rows processed 

Nothing too out of the ordinary in the above, the WHERE clause predicate was pushed into the inline view, although I thought that the inline view (indicated by the VIEW line in the explain plan output) might disappear.

The second query with the FIRST_ROWS hint:

SQL> SELECT /*+  FIRST_ROWS(100) */
  2    *
  3  FROM
  4    (SELECT
  5      C1,
  6      C2,
  7      MAX(C3) C3
  8    FROM
  9      T1
 10    GROUP BY
 11      C1,
 12      C2
 13    ORDER BY
 14      C1,
 15      C2)
 16  WHERE
 17    C1 BETWEEN 10000 AND 100000;

90001 rows selected.

Elapsed: 00:00:00.50

Execution Plan
----------------------------------------------------------
Plan hash value: 915463192

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 90002 |  3427K|       |   904   (5)| 00:00:05 |
|   1 |  VIEW               |      | 90002 |  3427K|       |   904   (5)| 00:00:05 |
|   2 |   SORT GROUP BY     |      | 90002 |  2197K|  7080K|   904   (5)| 00:00:05 |
|*  3 |    TABLE ACCESS FULL| T1   | 90002 |  2197K|       |   352   (8)| 00:00:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("C1"<=100000 AND "C1">=10000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       6563  consistent gets
       6555  physical reads
          0  redo size
    1307436  bytes sent via SQL*Net to client
       1324  bytes received via SQL*Net from client
         92  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      90001  rows processed 

The WHERE clause predicate was pushed into the inline view again, but notice that the optimizer did not predict that 100 or 103 rows would be returned by the query.  Do you see the same results on other Oracle Database release versions?  It is slightly interesting to see that the available index access path was not selected… almost as if the FIRST_ROWS hint had no effect… of course you know the reason why.  🙂  (Edit: March 10, 2011: The same result was achieved when moving the hint to the inline view, and same result was achieved when specifying the hint as FIRST_ROWS(1).  I guess that it is time to review the 10053 trace to see what happened.)

Now the query with the ROWNUM predicate:

SQL> SELECT /*+  */
  2    *
  3  FROM
  4    (SELECT
  5      C1,
  6      C2,
  7      MAX(C3) C3
  8    FROM
  9      T1
 10    GROUP BY
 11      C1,
 12      C2
 13    ORDER BY
 14      C1,
 15      C2)
 16  WHERE
 17    C1 BETWEEN 10000 AND 100000
 18    AND ROWNUM<=100;

100 rows selected.

Elapsed: 00:00:00.31

Execution Plan
----------------------------------------------------------
Plan hash value: 2221909342

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   100 |  3900 |       |   748   (5)| 00:00:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 90002 |  3427K|       |   748   (5)| 00:00:04 |
|*  3 |    SORT GROUP BY STOPKEY|      | 90002 |  1318K|  4968K|   748   (5)| 00:00:04 |
|*  4 |     TABLE ACCESS FULL   | T1   | 90002 |  1318K|       |   352   (8)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=100)
   3 - filter(ROWNUM<=100)
   4 - filter("C1"<=100000 AND "C1">=10000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       6563  consistent gets
       6555  physical reads
          0  redo size
       2275  bytes sent via SQL*Net to client
        334  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed 

The standard WHERE clause predicate used in all of the queries was pushed into the inline view again, and the ROWNUM predicate was also pushed into the inline view.  Does this demonstrate the potential bug – what if instead of a full table scan, the primary key index access path were used to read the data from table T1?