MS Query Teases You – Excel will Not Display Text Contained in Long Raw/Blob Column

October 30, 2010

Older versions of the ERP package that I work with stored lengthy text data in LONG RAW columns.  Newer versions of the ERP package store lengthy text data in BLOB columns.  When tables containing those columns are queried using the Microsoft Query tool, which allows the data stored in tables to be retrieved into an Excel worksheet, the text found within the LONG RAW and BLOB columns appears in the Microsoft Query preview window, but those columns are silently excluded when the data is brought back into Microsoft Excel.  There must be a workaround.  Sure, there is an easy solution for BLOB columns using a combination of the UTL_RAW.CAST_TO_VARCHAR2 and DBMS_LOB.SUBSTR functions, but is there a solution for LONG RAW columns?

First, we will build a table to somewhat mimic a table found in the old version of the ERP package (I think that I only omitted the WORKORDER_SUB_ID column):

CREATE TABLE T5(
  WORKORDER_TYPE CHAR(1),
  WORKORDER_BASE_ID VARCHAR2(30),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SPLIT_ID VARCHAR2(3),
  SEQUENCE_NO NUMBER(12,0),
  TYPE CHAR(1),
  BITS_LENGTH NUMBER(12,0),
  BITS LONG RAW);

Now, let’s build a bit of code using the Visual Basic for Applications macro editor in Excel to populate the table.  I will use late binding, like what is required in VBS macros, so that I can use ADO without adding ADO references in the Visual Basic editor (see the previous articles in the Excel category for directions that permit early binding).

Const adCmdText = 1
Const adCmdStoredProc = 4
Const adParamInput = 1
Const adVarNumeric = 139
Const adBigInt = 20
Const adDecimal = 14
Const adDouble = 5
Const adInteger = 3
Const adLongVarBinary = 205
Const adNumeric = 131
Const adSingle = 4
Const adSmallInt = 2
Const adTinyInt = 16
Const adUnsignedBigInt = 21
Const adUnsignedInt = 19
Const adUnsignedSmallInt = 18
Const adUnsignedTinyInt = 17
Const adDate = 7
Const adDBDate = 133
Const adDBTimeStamp = 135
Const adDBTime = 134
Const adVarChar = 200
Const adUseClient = 3
Const adOpenKeyset = 1
Const adLockOptimistic = 3

Sub CreateRows()
    Dim i As Integer

    Dim strUsername As String
    Dim strPassword As String
    Dim strDatabase As String
    Dim strSQL As String

    Dim intLength As Integer
    Dim strTempBits As String
    Dim bytBits() As Byte                         'An array of bytes

    Dim dynData As Object                         'ADO Recordset object used to retrieve the user's data
    Dim dbDatabase As Object                      'ADO database connection object

    On Error Resume Next

    Set dynData = CreateObject("ADODB.Recordset")
    Set dbDatabase = CreateObject("ADODB.Connection")

    strDatabase = "MyDB"
    strUsername = "MyUser"
    strPassword = "MyPassword"

    'Connect to the database
    'Oracle connection string
    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"

    dbDatabase.ConnectionTimeout = 40
    dbDatabase.CursorLocation = adUseClient
    dbDatabase.Open

    'Retrieve a recordset with 0 rows that will allow us to insert into the table
    strSQL = "SELECT"
    strSQL = strSQL & "  WORKORDER_TYPE," & vbCrLf
    strSQL = strSQL & "  WORKORDER_BASE_ID," & vbCrLf
    strSQL = strSQL & "  WORKORDER_LOT_ID," & vbCrLf
    strSQL = strSQL & "  WORKORDER_SPLIT_ID," & vbCrLf
    strSQL = strSQL & "  SEQUENCE_NO," & vbCrLf
    strSQL = strSQL & "  TYPE," & vbCrLf
    strSQL = strSQL & "  BITS_LENGTH," & vbCrLf
    strSQL = strSQL & "  BITS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  T5" & vbCrLf
    strSQL = strSQL & "WHERE" & vbCrLf
    strSQL = strSQL & "  1=2"
    dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic, adCmdText

    For i = 1 To 1000
        dynData.AddNew
        dynData("workorder_type") = "W"
        dynData("workorder_base_id") = "WO" & String(6 - Len(Format(i)), "0") & Format(i)
        dynData("workorder_lot_id") = "1"
        dynData("workorder_split_id") = "0"
        dynData("sequence_no") = 10
        dynData("type") = "D"

        strTempBits = "WO" & String(6 - Len(Format(i)), "0") & Format(i) & "/" & "1" & "  This is a very long description." & String(1000, ".")
        bytBits = StrConv(strTempBits, vbFromUnicode)
        intLength = Len(strTempBits)

        dynData("bits_length") = intLength
        dynData.Fields("bits").AppendChunk bytBits

        dynData.Update
    Next i

    dynData.Close
    dbDatabase.Close

    Set dynData = Nothing
    Set dbDatabase = Nothing
End Sub

When the above code is executed, the test table T5 should contain 1,000 rows.  Now let’s see whether we are able to retrieve the rows from the database into an Excel worksheet.  We will start by switching to the Data tab in Excel (2010) and then selecting From Microsoft Query:

The next step is to select a suitable ODBC entry (32-bit, even on 64-bit computers) for the database.

Now let’s enter a simple SQL statement to retrieve the data contained in the test table T5.  Note that Microsoft Query will complain if the ; character is included at the end of the SQL statement, but the character is included anyway to try to keep Microsoft Query from attempting to re-write the SQL statement (this occasionally eliminates a couple of problems):

As you can see, the Microsoft Query preview window shows the text data that is contained within the LONG RAW BITS column as we had hoped:

Now if we tell Microsoft Query to return the rows to Excel, we see that the LONG RAW BITS column was thrown away:

No problem, we will just try a trick: convert the LONG RAW column to a BLOB using the TO_LOB function, and then convert the resulting BLOB to a VARCHAR2.  However, that trick simply does not work here, because according to the documentation the TO_LOB function can only be used in an INSERT INTO ... SELECT or CREATE TABLE ... AS SELECT type of SQL statement:

So, let’s create a new table to allow us to temporarily convert the LONG RAW column to a BLOB column:
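A statement along these lines should work (this is a sketch; the T5_TEMP name simply matches the table referenced in the queries that follow, and the CREATE TABLE ... AS SELECT form satisfies the TO_LOB restriction mentioned above):

```sql
CREATE TABLE T5_TEMP AS
SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  SEQUENCE_NO,
  TYPE,
  BITS_LENGTH,
  TO_LOB(BITS) BITS
FROM
  T5;
```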

Now, back in Microsoft Query, we change the SQL statement as follows:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SPLIT_ID,
  SEQUENCE_NO,
  TYPE,
  BITS_LENGTH,
  UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(BITS,32000,1)) BITS
FROM
  T5_TEMP
WHERE
  TYPE='D';

When we return the query results into Excel this time, the LONG RAW column that was converted to a BLOB column in the second table has its column values converted to a VARCHAR2, and those values actually make it into Excel (although the column alias “BITS” is lost):

But, I don’t want to go through the process of creating a table to temporarily hold the results of a LONG RAW to BLOB conversion so that I can display the characters in Excel – that would be far too messy if there were many tables.  If we tell Excel to record a macro while we bring in data using the Microsoft Query tool, we see a macro that looks like this:

With ActiveSheet.ListObjects.Add(SourceType:=0, Source:=Array(Array( _
    "ODBC;DSN=MyODBC;UID=MyUser;;DBQ=MyDB;DBA=W;APA=T;EXC=F;FEN=T;QTO=F;FRC=10;FDL=10;LOB=T;RST=T;BTD=F;BNF=F;BAM=IfAllSuccessful;NUM=NL" _
    ), Array("S;DPM=F;MTS=T;MDI=F;CSR=F;FWC=F;FBS=64000;TLO=O;MLD=0;ODA=F;")), _
    Destination:=Range("$A$1")).QueryTable
    .CommandText = Array( _
    "SELECT" & Chr(13) & "" & Chr(10) & "  WORKORDER_TYPE," & Chr(13) & "" & Chr(10) & "  WORKORDER_BASE_ID," & Chr(13) & "" & Chr(10) & _
       "  WORKORDER_LOT_ID," & Chr(13) & "" & Chr(10) & "  WORKORDER_SPLIT_ID," & Chr(13) & "" & Chr(10) & "  SEQUENCE_NO," & _
       Chr(13) & "" & Chr(10) & "  TYPE," & Chr(13) & "" & Chr(10) & "  BITS_LENGTH," & Chr(13) & "" & Chr(10) & _
       "  UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(BITS,32000,1)) " & Chr(13) & "" & Chr(10) & "FROM" & Chr(13) & "" & Chr(10) & "" _
       , "  T5_TEMP" & Chr(13) & "" & Chr(10) & "WHERE" & Chr(13) & "" & Chr(10) & "  TYPE='D';")
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .PreserveColumnInfo = True
    .ListObject.DisplayName = "Table_Query_from_OR1125"
    .Refresh BackgroundQuery:=False
End With

While the above is interesting (and probably explains why the BITS column alias was lost), it is of little help for our problem.  We need something better: our own macro, something like the following.  (Note that the following macro writes the data to a text file, and then uses an Excel QueryTable to quickly bring that text file into a worksheet.  This approach should be much faster than visiting each cell in the worksheet to write the query results one cell at a time.)

Sub DisplayData()
    Dim i As Integer
    Dim lngRow As Long
    Dim intFileNum As Integer

    Dim strUsername As String
    Dim strPassword As String
    Dim strDatabase As String
    Dim strSQL As String
    Dim strOut As String

    Dim intLength As Integer
    Dim strTempBits As String

    Dim snpData As Object                         'ADO Recordset object used to retrieve the user's data
    Dim dbDatabase As Object                      'ADO database connection object
    On Error Resume Next

    Set snpData = CreateObject("ADODB.Recordset")
    Set dbDatabase = CreateObject("ADODB.Connection")

    strDatabase = "MyDB"
    strUsername = "MyUser"
    strPassword = "MyPassword"

    'Connect to the database
    'Oracle connection string
    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"

    dbDatabase.ConnectionTimeout = 40
    dbDatabase.CursorLocation = adUseClient
    dbDatabase.Open

    'Retrieve the rows, including the LONG RAW BITS column, from the test table
    strSQL = "SELECT"
    strSQL = strSQL & "  WORKORDER_TYPE," & vbCrLf
    strSQL = strSQL & "  WORKORDER_BASE_ID," & vbCrLf
    strSQL = strSQL & "  WORKORDER_LOT_ID," & vbCrLf
    strSQL = strSQL & "  WORKORDER_SPLIT_ID," & vbCrLf
    strSQL = strSQL & "  SEQUENCE_NO," & vbCrLf
    strSQL = strSQL & "  TYPE," & vbCrLf
    strSQL = strSQL & "  BITS_LENGTH," & vbCrLf
    strSQL = strSQL & "  BITS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  T5" & vbCrLf
    strSQL = strSQL & "WHERE" & vbCrLf
    strSQL = strSQL & "  TYPE='D'"
    snpData.Open strSQL, dbDatabase

    Application.ScreenUpdating = False
    lngRow = 1

    intFileNum = FreeFile
    Open "C:\LongRawToVarchar.txt" For Output As #intFileNum

    strOut = ""
    For i = 0 To snpData.Fields.Count - 1
        strOut = strOut & snpData.Fields(i).Name & vbTab
        'ActiveSheet.Cells(lngRow, i + 1).Value = snpData.Fields(i).Name
    Next i
    Print #intFileNum, strOut

    Do While Not snpData.EOF
        lngRow = lngRow + 1
        strOut = ""
        For i = 0 To snpData.Fields.Count - 2 'All Except the last column
            If Not (IsNull(snpData.Fields(i).Value)) Then
                'Switch out Ascii 13 & Ascii 10 combinations for just Ascii 10 so that line breaks do not cause problems in the resulting file
                strOut = strOut & Replace(snpData.Fields(i).Value, vbCrLf, vbLf) & vbTab
                'ActiveSheet.Cells(lngRow, i + 1).Value = snpData.Fields(i).Value
            Else
                strOut = strOut & vbTab
            End If
        Next i

        'Handle the LONG RAW column
        strTempBits = Replace(StrConv(snpData("bits"), vbUnicode), vbCrLf, vbLf)
        strOut = strOut & strTempBits & vbTab
        'ActiveSheet.Cells(lngRow, snpData.Fields.Count).Value = strTempBits

        Print #intFileNum, strOut

        snpData.MoveNext
    Loop
    snpData.Close

    'Close the data file
    Close #intFileNum

    'Read the text file just written to disk into the worksheet
    With ActiveSheet.QueryTables.Add(Connection:="TEXT;C:\LongRawToVarchar.txt", Destination:=ActiveSheet.Range("A1"))
        .Name = "Page1"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .TextFilePromptOnRefresh = False
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileConsecutiveDelimiter = False
        .TextFileTabDelimiter = True                         'Tabs are the delimiter
        .TextFileSemicolonDelimiter = False
        .TextFileCommaDelimiter = False
        .TextFileSpaceDelimiter = False
        .TextFileColumnDataTypes = Array(1)
        .Refresh BackgroundQuery:=False
    End With

    Application.ScreenUpdating = True

    dbDatabase.Close

    Set snpData = Nothing
    Set dbDatabase = Nothing
End Sub

If we did not want to write the data out to a text file, we could simply comment out the section titled “Read the text file just written to disk into the worksheet”, comment out the Print # lines, and uncomment the lines that begin with “'ActiveSheet.Cells(”.

For example, if we do not want to write the results to a temp file, our macro would look like this:

Sub DisplayData2()
    Dim i As Integer
    Dim lngRow As Long

    Dim strUsername As String
    Dim strPassword As String
    Dim strDatabase As String
    Dim strSQL As String
    Dim strOut As String

    Dim intLength As Integer
    Dim strTempBits As String

    Dim snpData As Object                         'ADO Recordset object used to retrieve the user's data
    Dim dbDatabase As Object                      'ADO database connection object

    On Error Resume Next

    Set snpData = CreateObject("ADODB.Recordset")
    Set dbDatabase = CreateObject("ADODB.Connection")

    strDatabase = "MyDB"
    strUsername = "MyUser"
    strPassword = "MyPassword"

    'Connect to the database
    'Oracle connection string
    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"

    dbDatabase.ConnectionTimeout = 40
    dbDatabase.CursorLocation = adUseClient
    dbDatabase.Open

    'Retrieve the rows, including the LONG RAW BITS column, from the test table
    strSQL = "SELECT"
    strSQL = strSQL & "  WORKORDER_TYPE," & vbCrLf
    strSQL = strSQL & "  WORKORDER_BASE_ID," & vbCrLf
    strSQL = strSQL & "  WORKORDER_LOT_ID," & vbCrLf
    strSQL = strSQL & "  WORKORDER_SPLIT_ID," & vbCrLf
    strSQL = strSQL & "  SEQUENCE_NO," & vbCrLf
    strSQL = strSQL & "  TYPE," & vbCrLf
    strSQL = strSQL & "  BITS_LENGTH," & vbCrLf
    strSQL = strSQL & "  BITS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  T5" & vbCrLf
    strSQL = strSQL & "WHERE" & vbCrLf
    strSQL = strSQL & "  TYPE='D'"
    snpData.Open strSQL, dbDatabase

    Application.ScreenUpdating = False
    lngRow = 1

    strOut = ""
    For i = 0 To snpData.Fields.Count - 1
        ActiveSheet.Cells(lngRow, i + 1).Value = snpData.Fields(i).Name
    Next i

    Do While Not snpData.EOF
        lngRow = lngRow + 1
        strOut = ""
        For i = 0 To snpData.Fields.Count - 2 'All Except the last column
            If Not (IsNull(snpData.Fields(i).Value)) Then
                'Switch out Ascii 13 & Ascii 10 combinations for just Ascii 10 so that line breaks do not cause problems in the resulting file
                strOut = strOut & Replace(snpData.Fields(i).Value, vbCrLf, vbLf) & vbTab
                ActiveSheet.Cells(lngRow, i + 1).Value = snpData.Fields(i).Value
            Else
                strOut = strOut & vbTab
            End If
        Next i

        'Handle the LONG RAW column
        strTempBits = Replace(StrConv(snpData("bits"), vbUnicode), vbCrLf, vbLf)
        strOut = strOut & strTempBits & vbTab
        ActiveSheet.Cells(lngRow, snpData.Fields.Count).Value = strTempBits

        snpData.MoveNext
    Loop
    snpData.Close

    Application.ScreenUpdating = True

    dbDatabase.Close

    Set snpData = Nothing
    Set dbDatabase = Nothing
End Sub

The resulting worksheet would then need a bit more cleanup than it did when we wrote the data to a text file and brought the text file into the worksheet with a QueryTable:

So, now you know how to retrieve text contained in a LONG RAW column (our T5 table) or text contained in a BLOB column (our T5_TEMP table, as stored by the recent releases of the ERP package) and display the text in an Excel worksheet with other columns retrieved by a query.

I fully recognize that LONG RAW columns are deprecated, but is anyone able to identify a more direct way to transform a LONG RAW column value into a VARCHAR2 using just a SQL statement with built-in Oracle Database functions, without requiring an intermediate temporary table?
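In the meantime, one approach that comes to mind (untested here, limited by the maximum size of a PL/SQL RAW variable when fetching a LONG RAW, and still subject to the 4,000 byte VARCHAR2 limit in a SQL context) is a small helper function that performs the conversion one row at a time.  The function name and the ROWID-based lookup are my own invention:

```sql
CREATE OR REPLACE FUNCTION LONG_RAW_TO_VC(P_ROWID IN ROWID)
RETURN VARCHAR2
AS
  V_RAW RAW(32767);
BEGIN
  SELECT
    BITS
  INTO
    V_RAW
  FROM
    T5
  WHERE
    ROWID = P_ROWID;

  RETURN UTL_RAW.CAST_TO_VARCHAR2(V_RAW);
END;
/

SELECT
  WORKORDER_BASE_ID,
  LONG_RAW_TO_VC(ROWID) BITS
FROM
  T5;
```

This avoids the intermediate table, at the cost of one additional single-row lookup per returned row.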





Possibly Interesting Execution Plan

October 29, 2010

I recently found what I believe to be an interesting execution plan, but then maybe I am just having difficulty thinking of blog article topics.  Here is the execution plan:

Plan hash value: 904523798

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                           |     1 |    80 |       |  5464   (1)| 00:01:06 |
|   1 |  HASH GROUP BY                 |                           |     1 |    80 |       |  5464   (1)| 00:01:06 |
|   2 |   NESTED LOOPS                 |                           |     1 |    80 |       |  5463   (1)| 00:01:06 |
|*  3 |    HASH JOIN                   |                           |    36 |  2340 |  6376K|  5401   (1)| 00:01:05 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T_DATA_CORP_AUTHORIZATION |   408 |  9792 |       |    75   (0)| 00:00:01 |
|   5 |      NESTED LOOPS              |                           |   116K|  5003K|       |  2535   (1)| 00:00:31 |
|*  6 |       TABLE ACCESS FULL        | T_DATA_FILE_DETAILS       |   286 |  5720 |       |   202   (4)| 00:00:03 |
|*  7 |       INDEX RANGE SCAN         | PK_DATA_CORPAUTHORIZATION |   408 |       |       |     6   (0)| 00:00:01 |
|   8 |     INDEX FAST FULL SCAN       | IDX_FILE_REF_PLOC         |   911K|    18M|       |  1120   (1)| 00:00:14 |
|*  9 |    TABLE ACCESS BY INDEX ROWID | T_DATA_RECORD_DETAILS     |     1 |    15 |       |     2   (0)| 00:00:01 |
|* 10 |     INDEX UNIQUE SCAN          | PK_DATA_RECORD_DETAILS    |     1 |       |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("D"."FILE_ID"="R"."FILE_ID" AND "D"."RECORD_REFERENCE_NO"="R"."RECORD_REFERENCE_NO" AND
              "D"."FILE_ID"="M"."FILE_ID")
   6 - filter(TO_DATE(INTERNAL_FUNCTION("M"."FILE_UPLOADED_ON"))>=TO_DATE('2010-10-01 00:00:00',
              'yyyy-mm-dd hh24:mi:ss') AND TO_DATE(INTERNAL_FUNCTION("M"."FILE_UPLOADED_ON"))<=TO_DATE('2010-10-28
              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
   7 - access("R"."FILE_ID"="M"."FILE_ID")
   9 - filter(("N"."PRINTING_STATUS" IS NULL OR "N"."PRINTING_STATUS"<>'C') AND
              "N"."CORPORATE_AUTHORIZATION_DONE"='Y')
  10 - access("N"."FILE_ID"="R"."FILE_ID" AND "N"."RECORD_REFERENCE_NO"="R"."RECORD_REFERENCE_NO")

What do you think?  Perhaps someone will find something that is more interesting about the execution plan than what I found?  Or maybe it is just a boring execution plan that is nothing special, and I am just using this as an excuse for a blog article topic.  It might help to see the original SQL statement, but maybe not.





TKPROF in 11.2.0.2 Unexpected Improvement – Rows (1st), Rows (avg), and Rows (max)

October 12, 2010

One of the greatest features of TKPROF is its ability to quickly summarize very long 10046 extended SQL trace files into easily understood plain text files.  And one of the greatest problems with TKPROF is its default setting that instructs the program to summarize the 10046 extended SQL trace files into easily understood plain text files.  While generating the summary, TKPROF discards a lot of very useful information, and that is why I typically either manually review the raw 10046 trace files or process the files with my Toy Project for Performance Tuning.

In a recent OTN thread a person mentioned seeing columns labeled “Rows (1st)” , “Rows (avg)” , and “Rows (max)” in the Row Source Operation portion of the TKPROF output generated by Oracle Database 11.2.0.2, and wondered what the values in those columns meant.  I could guess the meaning, but why guess when you can search.  A Google search found… the OTN thread and little else.  A documentation search of download.oracle.com found… nothing.  A search of Metalink (MOS) found… nothing.  So, I took a guess as to what the columns mean since I did not have access to Oracle Database 11.2.0.2 in order to perform a direct test.  I built a test case script which constructed two test tables as follows, each with an index (I am still waiting for the OTN thread OP to post his results):

CREATE TABLE T1(
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(100));

CREATE TABLE T2(
  C1 NUMBER NOT NULL,
  C2 NUMBER NOT NULL,
  C3 VARCHAR2(100));

INSERT INTO
  T1
SELECT
  ROWNUM C1,
  V2.RN C2,
  RPAD(TO_CHAR(ROWNUM),10,'A') C3
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2
WHERE
  MOD(V1.RN,1000)>=V2.RN;

INSERT INTO
  T2
SELECT
  *
FROM
  T1;

COMMIT;

CREATE INDEX IND_T1_C1_C2 ON T1(C1,C2);
CREATE INDEX IND_T2_C1_C2 ON T2(C1,C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

The data inserted into the two test tables contains a variable number of rows for each distinct value in column C2.  For example:

SELECT
  C2,
  COUNT(*)
FROM
  T1
GROUP BY
  C2
ORDER BY
  C2;

  C2   COUNT(*)
---- ----------
   1        999
   2        998
   3        997
   4        996
   5        995
   6        994
   7        993
   8        992
   9        991
  10        990
...
 995          5
 996          4
 997          3
 998          2
 999          1

Column C3 is simply a 10 character padding column to make each table row slightly longer, which may decrease the chances of the optimizer selecting full table scans when accessing the tables, if an index access path is available.  Next, the test script (note that I swapped the order of the first two bind variable values to better illustrate what those columns in the TKPROF output indicate): 

SET LINESIZE 160
SET PAGESIZE 1000
SET TRIMSPOOL ON

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SQL_10046_ROW_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 4';

SPOOL sql_10046_row_test.txt

VARIABLE N1 NUMBER

EXEC :N1:=5

SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2= :N1;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=1

SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2= :N1;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=95

SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2= :N1;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

EXEC :N1:=995

SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2= :N1;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

SPOOL OFF

The execution plans generated by the calls to DBMS_XPLAN in the above script follow:

SQL_ID  6z03777fugw57, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   COUNT(*) FROM   T1,   T2 WHERE
T1.C2=T2.C2   AND T1.C2= :N1

Plan hash value: 765688747

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |      1 |        |      1 |00:00:00.15 |    2780 |      2 |       |       |          |
|   1 |  SORT AGGREGATE        |              |      1 |      1 |      1 |00:00:00.15 |    2780 |      2 |       |       |          |
|*  2 |   HASH JOIN            |              |      1 |    250K|    990K|00:00:00.26 |    2780 |      2 |  1517K|  1517K|  712K (0)|
|*  3 |    INDEX FAST FULL SCAN| IND_T1_C1_C2 |      1 |    500 |    995 |00:00:00.03 |    1390 |      1 |       |       |          |
|*  4 |    INDEX FAST FULL SCAN| IND_T2_C1_C2 |      1 |    500 |    995 |00:00:00.02 |    1390 |      1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="T2"."C2")
   3 - filter("T1"."C2"=:N1)
   4 - filter("T2"."C2"=:N1)

------

SQL_ID  6z03777fugw57, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   COUNT(*) FROM   T1,   T2 WHERE
T1.C2=T2.C2   AND T1.C2= :N1

Plan hash value: 765688747

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |      1 |        |      1 |00:00:00.12 |    2780 |       |       |          |
|   1 |  SORT AGGREGATE        |              |      1 |      1 |      1 |00:00:00.12 |    2780 |       |       |          |
|*  2 |   HASH JOIN            |              |      1 |    250K|    998K|00:00:00.21 |    2780 |  1517K|  1517K|  719K (0)|
|*  3 |    INDEX FAST FULL SCAN| IND_T1_C1_C2 |      1 |    500 |    999 |00:00:00.02 |    1390 |       |       |          |
|*  4 |    INDEX FAST FULL SCAN| IND_T2_C1_C2 |      1 |    500 |    999 |00:00:00.02 |    1390 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="T2"."C2")
   3 - filter("T1"."C2"=:N1)
   4 - filter("T2"."C2"=:N1)

------

SQL_ID  6z03777fugw57, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   COUNT(*) FROM   T1,   T2 WHERE
T1.C2=T2.C2   AND T1.C2= :N1

Plan hash value: 765688747

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |      1 |        |      1 |00:00:00.10 |    2780 |       |       |          |
|   1 |  SORT AGGREGATE        |              |      1 |      1 |      1 |00:00:00.10 |    2780 |       |       |          |
|*  2 |   HASH JOIN            |              |      1 |    250K|    819K|00:00:00.18 |    2780 |  1517K|  1517K|  713K (0)|
|*  3 |    INDEX FAST FULL SCAN| IND_T1_C1_C2 |      1 |    500 |    905 |00:00:00.02 |    1390 |       |       |          |
|*  4 |    INDEX FAST FULL SCAN| IND_T2_C1_C2 |      1 |    500 |    905 |00:00:00.02 |    1390 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="T2"."C2")
   3 - filter("T1"."C2"=:N1)
   4 - filter("T2"."C2"=:N1)

------

SQL_ID  6z03777fugw57, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */   COUNT(*) FROM   T1,   T2 WHERE
T1.C2=T2.C2   AND T1.C2= :N1

Plan hash value: 765688747

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |      1 |        |      1 |00:00:00.04 |    2780 |       |       |          |
|   1 |  SORT AGGREGATE        |              |      1 |      1 |      1 |00:00:00.04 |    2780 |       |       |          |
|*  2 |   HASH JOIN            |              |      1 |    250K|     25 |00:00:00.04 |    2780 |  1452K|  1452K|  677K (0)|
|*  3 |    INDEX FAST FULL SCAN| IND_T1_C1_C2 |      1 |    500 |      5 |00:00:00.02 |    1390 |       |       |          |
|*  4 |    INDEX FAST FULL SCAN| IND_T2_C1_C2 |      1 |    500 |      5 |00:00:00.02 |    1390 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."C2"="T2"."C2")
   3 - filter("T1"."C2"=:N1)
   4 - filter("T2"."C2"=:N1)

By looking at the consistent SQL_ID (6z03777fugw57) and child number (0) values displayed with the execution plans, we are able to determine that there was a single hard parse for this SQL statement, even though the operation indicated by ID 2 returned anywhere from 25 to 998,000 rows.  It sure would be nice if TKPROF summaries provided some sort of indication of the variable number of rows returned by each operation in the execution plan.  Prior to Oracle Database 11.1, the STAT lines in the raw 10046 trace file, which are transformed by TKPROF into the “Row Source Operation” section in the TKPROF output, were written only when the cursor for the SQL statement closed.  The default behavior changed in Oracle Database 11.1, with the STAT lines written by default after the first execution of a SQL statement (edit: October 13, 2010; this behavior is controlled by the PLAN_STAT parameter of DBMS_SESSION.SESSION_TRACE_ENABLE).  With the introduction of adaptive cursor sharing, each execution with different bind variable values could have a different execution plan written to the raw 10046 trace file.  TKPROF in 11.1.0.6 had a tendency to completely “get it wrong” when the execution plan for a SQL statement changed.  The situation improved with TKPROF in 11.1.0.7, which is able to sense a change in the execution plan and avoid combining the output statistics for the different execution plans.
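For reference, the documentation describes the PLAN_STAT parameter values NEVER, FIRST_EXECUTION, and ALL_EXECUTIONS.  A session-level trace that writes STAT lines after every execution might be enabled like this (a sketch; adjust the WAITS and BINDS parameters as needed):

```sql
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ALL_EXEC_TEST';

EXEC DBMS_SESSION.SESSION_TRACE_ENABLE(WAITS=>TRUE, BINDS=>TRUE, PLAN_STAT=>'ALL_EXECUTIONS')

EXEC DBMS_SESSION.SESSION_TRACE_DISABLE
```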

The problem still remained through 11.2.0.1 that the “Row Source Operation” section only showed the statistics for the first execution (or the first execution of each new execution plan for a SQL statement).  How would the person reading the TKPROF output recognize that the operation indicated by ID 3 sometimes returned 25 rows, other times returned 998,000 rows, and (assuming the SQL statement was hard parsed during the trace) was originally optimized with the operation returning 990,000 rows?  That could be valuable information – maybe adaptive cursor sharing should have triggered the generation of a different execution plan.  So, what are those columns labeled “Rows (1st)”, “Rows (avg)”, and “Rows (max)” – and what happened to the “Rows” column?  Certainly, the header information in the TKPROF output must explain these three columns (from the 11.2.0.2 TKPROF output):

********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

Well, there is the “Rows” column that no longer exists in the “Row Source Operation” section.  Maybe the three columns indicate the number of rows on the first execution, the average number of rows for all executions, and the maximum number of rows of all executions.  That information would certainly be helpful when reviewing TKPROF summaries.

If we pass the 10046 extended SQL trace file generated by the test script (executed against Oracle Database 11.2.0.1) through TKPROF provided with either Oracle Database 11.1.0.6 or 11.2.0.1, we see the following for the SQL statement of interest:

SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2= :N1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.00       0.00          0          0          0           0
Execute      4      0.00       0.01          0          0          0           0
Fetch        8      0.39       0.41          2      11120          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       16      0.39       0.43          2      11120          0           4

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 68 

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=2780 pr=2 pw=0 time=0 us)
 990025   HASH JOIN  (cr=2780 pr=2 pw=0 time=222451 us cost=759 size=2000000 card=250000)
    995    INDEX FAST FULL SCAN IND_T1_C1_C2 (cr=1390 pr=1 pw=0 time=27459 us cost=379 size=2000 card=500)(object id 72576)
    995    INDEX FAST FULL SCAN IND_T2_C1_C2 (cr=1390 pr=1 pw=0 time=17395 us cost=379 size=2000 card=500)(object id 72577)

Four executions, yet the “Row Source Operation” section only shows the statistics for the first execution.  Now, let’s take a look at the output found in the 11.2.0.2 TKPROF file:

SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*)
FROM
  T1,
  T2
WHERE
  T1.C2=T2.C2
  AND T1.C2= :N1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.00       0.00          0          0          0           0
Execute      4      0.00       0.01          0          0          0           0
Fetch        8      0.39       0.41          2      11120          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       16      0.39       0.43          2      11120          0           4

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 68 
Number of plan statistics captured: 3

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=2780 pr=1 pw=0 time=0 us)
    990025     935684     998001   HASH JOIN  (cr=2780 pr=1 pw=0 time=190669 us cost=759 size=2000000 card=250000)
       995        966        999    INDEX FAST FULL SCAN IND_T1_C1_C2 (cr=1390 pr=0 pw=0 time=21262 us cost=379 size=2000 card=500)(object id 72576)
       995        966        999    INDEX FAST FULL SCAN IND_T2_C1_C2 (cr=1390 pr=0 pw=0 time=17005 us cost=379 size=2000 card=500)(object id 72577)

In the above, we can see quite clearly that the second line in the execution plan returned a variable number of rows.  If we manually calculate the average number of rows found in the DBMS_XPLAN outputs, we find (990,000 + 998,000 + 819,000 + 25) / 4 = 701,756.25, and that is not the value shown by the “Rows (avg)” column.  The value shown by the “Rows (avg)” column apparently discards the last value from the average, thus it is (990,000 + 998,000 + 819,000) / 3 = 935,666.67 (slightly different from the displayed value due to rounding errors in the DBMS_XPLAN output).  Helpful, but it probably is not a good idea to throw out the value 25 when calculating the average when the rest of the numbers are so large.
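The averaging logic can be double-checked with a couple of lines of Python; the row counts below are the approximate values quoted from the DBMS_XPLAN outputs:

```python
# Row counts returned by the HASH JOIN operation for the four executions,
# as quoted from the DBMS_XPLAN outputs.
rows = [990_000, 998_000, 819_000, 25]

avg_all = sum(rows) / len(rows)      # straight average of all four executions
avg_tkprof = sum(rows[:-1]) / 3      # average with the last value discarded

print(avg_all)     # 701756.25 - not what TKPROF shows
print(avg_tkprof)  # ~935666.67 - matches the "Rows (avg)" column
```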

Interestingly, the bottom of the 11.1.0.6 TKPROF output shows this:

Trace file: or112_ora_9393_SQL_10046_ROW_TEST.trc
Trace file compatibility: 11.01.00
Sort options: default

       1  session in tracefile.
      33  user  SQL statements in trace file.
     299  internal SQL statements in trace file.
     332  SQL statements in trace file.
      60  unique SQL statements in trace file.
    7857  lines in trace file.
 1286922860  elapsed seconds in trace file.

While the bottom of the 11.2.0.1 and 11.2.0.2 TKPROF outputs shows this:

Trace file: or112_ora_9393_SQL_10046_ROW_TEST.trc
Trace file compatibility: 11.1.0.7
Sort options: default

       1  session in tracefile.
      33  user  SQL statements in trace file.
     299  internal SQL statements in trace file.
     332  SQL statements in trace file.
      60  unique SQL statements in trace file.
    7857  lines in trace file.
      22  elapsed seconds in trace file.

Notice the difference in the elapsed seconds!  I guess that over time some bugs are fixed, while others are introduced.
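As a purely speculative aside (an assumption on my part, not something confirmed by the TKPROF documentation): 1,286,922,860 looks suspiciously like a raw Unix epoch timestamp from around the time these traces were generated, which would suggest the 11.1.0.6 TKPROF printed a point-in-time value rather than an elapsed duration:

```python
from datetime import datetime, timezone

# Interpreting the 11.1.0.6 "elapsed seconds" figure as seconds since the
# Unix epoch (speculative - not confirmed by the TKPROF documentation).
dt = datetime.fromtimestamp(1286922860, tz=timezone.utc)
print(dt)  # 2010-10-12 22:34:20+00:00
```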

———————–

Updated (5 minutes after hitting the Publish button in WordPress):

Chris Antognini published a blog entry on the same subject roughly an hour before my article, based on the Twitter feed on orana.info.  You can read his blog article here – it is probably much more comprehensive than mine.





Consistent Gets During a Hard Parse – a Test Case to See One Possible Cause

7 10 2010

October 7, 2010

A recent OTN thread caught my attention.  The original poster noticed that, when generating a TKPROF summary from a 10046 trace file, the parse call showed a significant number of consistent gets when compared to the fetch and execute calls.  The TKPROF summary looked something like this:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.85       1.07          0       3599          0           0
Execute      4      0.00       0.00          0          0          0           0
Fetch        4      0.01       0.05         14         36          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       12      0.86       1.13         14       3635          0           4

Misses in library cache during parse: 4
Optimizer mode: ALL_ROWS
Parsing user id: 72 

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  NESTED LOOPS  (cr=9 pr=8 pw=0 time=33821 us)
      1   TABLE ACCESS BY INDEX ROWID STATUS (cr=2 pr=2 pw=0 time=21787 us)
      1    INDEX UNIQUE SCAN STATUS_CON21 (cr=1 pr=1 pw=0 time=20957 us)(object id 58618)
      1   TABLE ACCESS FULL USER_ACCOUNT (cr=7 pr=6 pw=0 time=11978 us)

The TKPROF summary showed that there were 3,599 consistent gets during the four times that this SQL statement was hard parsed, while there were only 36 consistent gets performed during the four executions and fetches.  The displayed Row Source Operation execution plan shows that 9 consistent gets were performed, and because that number is 1/4 of the value displayed by the TKPROF summary (the SQL statement was executed four times), that likely indicates that the OP is running Oracle Database 11.1.0.6 or greater, which by default writes the STAT lines (TKPROF Row Source Operation lines) to the trace file after the first execution of a SQL statement, rather than waiting for the cursor to close.
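The cr= figures in the raw trace’s STAT lines (which TKPROF renders as the Row Source Operation section) can be pulled out mechanically.  The sketch below uses a hypothetical STAT line reconstructed from the Row Source Operation output above; it is a sketch, not a full 10046 trace parser:

```python
import re

# Hypothetical STAT line reconstructed from the Row Source Operation output
# above; cr= is the consistent gets attributed to that operation for one
# execution of the SQL statement.
line = "STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=9 pr=8 pw=0 time=33821 us)'"

cr = int(re.search(r"cr=(\d+)", line).group(1))
executions = 4

# 9 consistent gets per execution * 4 executions = 36, which matches the
# execute+fetch consistent gets in the TKPROF summary.
print(cr, cr * executions)  # 9 36
```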

How could this SQL statement perform an average of 900 consistent gets per hard parse, and only an average of 9 consistent gets to retrieve the rows from the two tables listed in the SQL statement?  This is probably a good excuse to build a test case to try out a couple of ideas.  A couple of good suggestions were offered in the OTN thread regarding what may cause consistent gets during a hard parse, but 900 consistent gets?  The OP mentioned that this problem is happening in a development database, and that may be a key clue.  What if the OP creates a couple of tables with a couple of indexes, and then loads data into the tables just before executing his SQL statement?  If the OPTIMIZER_MODE is set to the deprecated values of RULE or CHOOSE we could very well see one result for the number of consistent gets, which differs from what happens when the OPTIMIZER_MODE is set to a non-deprecated value.  What if the OP does not collect statistics on the tables and indexes, and those tables and indexes either do not survive until 10 PM or the DBA has disabled the automatic stale statistics collection job that typically starts around 10 PM?  What if the memory allocated to the SGA is much smaller than what is needed?  What if… (fill in the blank)?

Let’s build a quick test case.  First, we will make certain that no one has adjusted the default value for dynamic sampling:

SHOW PARAMETER DYNAMIC_SAMPLING

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
optimizer_dynamic_sampling           integer     2

Still the default, so we will continue building the test case on Oracle Database 11.1.0.7:

CREATE TABLE T1 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20) NOT NULL,
  C3 VARCHAR2(100));

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);

CREATE TABLE T2 (
  C1 NUMBER NOT NULL,
  C2 VARCHAR2(20) NOT NULL,
  C3 VARCHAR2(100));

CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2); 

The above creates two simple tables, each with two indexes.  Note that the indexes are created before the tables contain any rows, so statistics are not automatically collected for the indexes when they are created.  Now to insert the rows:

INSERT INTO
  T1
SELECT
  ROWNUM,
  RPAD(TO_CHAR(ROWNUM),20,'A'),
  RPAD(TO_CHAR(ROWNUM),100,'B')
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

INSERT INTO
  T2
SELECT
  *
FROM
  T1;

COMMIT;

Just to verify that the indexes do not have statistics:

COLUMN INDEX_NAME FORMAT A10

SELECT
  INDEX_NAME,
  BLEVEL,
  NUM_ROWS,
  LEAF_BLOCKS
FROM
  USER_INDEXES
WHERE
  INDEX_NAME IN ('IND_T1_C1','IND_T2_C1','IND_T1_C2','IND_T2_C2');

INDEX_NAME     BLEVEL   NUM_ROWS LEAF_BLOCKS
---------- ---------- ---------- -----------
IND_T1_C1
IND_T1_C2
IND_T2_C1
IND_T2_C2 

Now for the experiment:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'PARSE_TEST1';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 1';

SET AUTOTRACE TRACEONLY STATISTICS

SELECT /* PARSE_TEST1 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1<=2000;

Statistics
---------------------------------------------------
         17  recursive calls
          0  db block gets
        514  consistent gets
         15  physical reads
          0  redo size
     113831  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there were 514 consistent gets to retrieve 2,000 rows (15 rows at a time because of the default array fetch size in SQL*Plus).  Continuing with the test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'PARSE_TEST2';

SELECT /* PARSE_TEST2 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 2001 AND 4000;

Statistics
---------------------------------------------------
         17  recursive calls
          0  db block gets
        560  consistent gets
         79  physical reads
       3276  redo size
     113930  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there were 560 consistent gets to retrieve 2,000 rows.  Continuing with the test:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'PARSE_TEST3';

SELECT /* PARSE_TEST3 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 4001 AND 6000;

Statistics
---------------------------------------------------
         13  recursive calls
          0  db block gets
        568  consistent gets
        102  physical reads
       3564  redo size
     113930  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there were 568 consistent gets to retrieve 2,000 rows.  Continuing with the test:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

SELECT /* PARSE_TEST3 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 4001 AND 6000;

Statistics
---------------------------------------------------
         17  recursive calls
          0  db block gets
        520  consistent gets
          8  physical reads
          0  redo size
     113930  bytes sent via SQL*Net to client
       1844  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
       2000  rows processed 

Note in the above that there was another hard parse.
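Before digging into the trace files, the 135 SQL*Net roundtrips shown in each statistics block can be sanity-checked against the SQL*Plus default array fetch size of 15 rows (the remaining roundtrip or two presumably covers the parse/execute traffic).  A minimal sketch:

```python
import math

# SQL*Plus fetches rows in batches controlled by the arraysize setting,
# which defaults to 15 rows per fetch call.
rows = 2000
arraysize = 15

fetch_calls = math.ceil(rows / arraysize)
print(fetch_calls)  # 134 fetch calls, consistent with the 135 roundtrips
```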

Now let’s take a look in the trace files (see this three part blog article series for help with decoding 10046 trace files), starting with the PARSE_TEST1 trace file (note that I manually line wrapped the dep=1 SQL statements):

...
=====================
PARSING IN CURSOR #4 len=619 dep=1 uid=60 oct=3 lid=60 tim=517854252962 hv=1052836258 ad='224c0738' sqlid='dwa1q5szc20d2'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled',
 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE
 NO_PARALLEL("T1") FULL("T1") NO_PARALLEL_INDEX("T1") */ :"SYS_B_05" AS C1, CASE WHEN "T1"."C1"<=:"SYS_B_06" THEN :"SYS_B_07"
 ELSE :"SYS_B_08" END AS C2, "T1"."C1" AS C3 FROM "T1" SAMPLE BLOCK (:"SYS_B_09" , :"SYS_B_10") SEED (:"SYS_B_11") "T1") SAMPLESUB
END OF STMT
PARSE #4:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854252962
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=3906273573,tim=517854252962
FETCH #4:c=0,e=0,p=0,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=3906273573,tim=517854252962
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=0 pw=0 time=0 us)'
STAT #4 id=2 cnt=3410 pid=1 pos=1 obj=102003 op='TABLE ACCESS SAMPLE T1 (cr=71 pr=0 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #4:c=0,e=0,dep=1,type=0,tim=517854252962
=====================
PARSING IN CURSOR #2 len=445 dep=1 uid=60 oct=3 lid=60 tim=517854252962 hv=840679737 ad='285dce70' sqlid='2r4a0g8t1rh9t'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB)
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM
 (SELECT /*+ NO_PARALLEL("T1") INDEX("T1" IND_T1_C1) NO_PARALLEL_INDEX("T1") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3
  FROM "T1" "T1" WHERE "T1"."C1"<=:"SYS_B_6" AND ROWNUM <= :"SYS_B_7") SAMPLESUB
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854252962
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=311010773,tim=517854252962
FETCH #2:c=0,e=0,p=0,cr=5,cu=0,mis=0,r=1,dep=1,og=1,plh=311010773,tim=517854252962
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=5 pr=0 pw=0 time=0 us)'
STAT #2 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=5 pr=0 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #2 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=5 pr=0 pw=0 time=0 us)'
STAT #2 id=4 cnt=2000 pid=3 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=5 pr=0 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #2:c=0,e=0,dep=1,type=0,tim=517854252962
=====================
PARSING IN CURSOR #6 len=619 dep=1 uid=60 oct=3 lid=60 tim=517854252962 hv=3874662801 ad='224bb3a0' sqlid='0qdcdrrmg5acj'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T2")
 FULL("T2") NO_PARALLEL_INDEX("T2") */ :"SYS_B_05" AS C1, CASE WHEN "T2"."C1"<=:"SYS_B_06" THEN :"SYS_B_07" ELSE :"SYS_B_08" END AS C2,
 "T2"."C1" AS C3 FROM "T2" SAMPLE BLOCK (:"SYS_B_09" , :"SYS_B_10") SEED (:"SYS_B_11") "T2") SAMPLESUB
END OF STMT
PARSE #6:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854252962
EXEC #6:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=724732973,tim=517854252962
FETCH #6:c=0,e=31245,p=15,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=724732973,tim=517854284207
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=15 pw=0 time=0 us)'
STAT #6 id=2 cnt=3420 pid=1 pos=1 obj=102006 op='TABLE ACCESS SAMPLE T2 (cr=71 pr=15 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #6:c=0,e=0,dep=1,type=0,tim=517854284207
=====================
PARSING IN CURSOR #5 len=445 dep=1 uid=60 oct=3 lid=60 tim=517854284207 hv=2421073910 ad='1f4973fc' sqlid='b84x6ja84x9zq'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB)
 NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM (SELECT /*+ NO_PARALLEL("T2")
 INDEX("T2" IND_T2_C1) NO_PARALLEL_INDEX("T2") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3  FROM "T2" "T2" WHERE
 "T2"."C1"<=:"SYS_B_6" AND ROWNUM <= :"SYS_B_7") SAMPLESUB
END OF STMT
PARSE #5:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=517854284207
EXEC #5:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1821698049,tim=517854284207
FETCH #5:c=0,e=0,p=0,cr=8,cu=0,mis=0,r=1,dep=1,og=1,plh=1821698049,tim=517854284207
STAT #5 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=8 pr=0 pw=0 time=0 us)'
STAT #5 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=8 pr=0 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #5 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=8 pr=0 pw=0 time=0 us)'
STAT #5 id=4 cnt=2000 pid=3 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=8 pr=0 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #5:c=0,e=0,dep=1,type=0,tim=517854284207
=====================
PARSING IN CURSOR #3 len=163 dep=0 uid=60 oct=3 lid=60 tim=517854284207 hv=701515739 ad='1f4849dc' sqlid='dr1pvrsnx0jyv'
SELECT /* PARSE_TEST1 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1<=2000
END OF STMT
PARSE #3:c=0,e=31245,p=15,cr=159,cu=0,mis=1,r=0,dep=0,og=1,plh=169351222,tim=517854284207
EXEC #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=169351222,tim=517854284207
FETCH #3:c=15625,e=31263,p=0,cr=45,cu=0,mis=0,r=1,dep=0,og=1,plh=169351222,tim=517854315470
FETCH #3:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=15,dep=0,og=1,plh=169351222,tim=517854315470
...
FETCH #3:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=4,dep=0,og=1,plh=169351222,tim=517854346714
STAT #3 id=1 cnt=2000 pid=0 pos=1 obj=0 op='HASH JOIN  (cr=355 pr=0 pw=0 time=0 us cost=48 size=284000 card=2000)'
STAT #3 id=2 cnt=2000 pid=1 pos=1 obj=102003 op='TABLE ACCESS BY INDEX ROWID T1 (cr=42 pr=0 pw=0 time=0 us cost=22 size=130000 card=2000)'
STAT #3 id=3 cnt=2000 pid=2 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=5 pr=0 pw=0 time=0 us cost=6 size=0 card=2000)'
STAT #3 id=4 cnt=2000 pid=1 pos=2 obj=102006 op='TABLE ACCESS BY INDEX ROWID T2 (cr=313 pr=0 pw=0 time=0 us cost=25 size=154000 card=2000)'
STAT #3 id=5 cnt=2000 pid=4 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=142 pr=0 pw=0 time=0 us cost=9 size=0 card=2000)' 

In the above, you will notice that the parse call for our SQL statement performed 159 consistent gets.  If you add up the number of consistent gets performed by the dep=1 SQL statements that immediately precede our SQL statement (71 + 5 + 71 + 8), you can see where 155 of those consistent gets were performed during the hard parse.  The first STAT line shows that the SQL statement itself required 355 consistent gets and no physical reads (SQL*Plus showed that 514 consistent gets and 15 physical reads were performed, and if you look closely at the dep=1 SQL statements you can see where the 15 physical block reads were performed).  355 + 155 = 510, which is just short of the 514 consistent gets reported by SQL*Plus, so we could look further up in the trace file to find the remaining 4 consistent gets.
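That bookkeeping can be verified with simple arithmetic; the cr= values below are copied from the dep=1 FETCH lines and the PARSE/STAT lines of the PARSE_TEST1 trace file shown above:

```python
# cr= values from the four dep=1 dynamic sampling FETCH lines that
# immediately precede the PARSE_TEST1 SQL statement in the trace file.
dep1_fetch_cr = [71, 5, 71, 8]

parse_cr = 159  # cr= from the PARSE #3 line of the main SQL statement
stat_cr = 355   # cr= from the top STAT line (the HASH JOIN operation)

print(sum(dep1_fetch_cr))            # 155 of the 159 parse consistent gets
print(stat_cr + sum(dep1_fetch_cr))  # 510, just short of the 514 from SQL*Plus
```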

Let’s take a look at the PARSE_TEST3 trace file:

...
=====================
PARSING IN CURSOR #3 len=646 dep=1 uid=60 oct=3 lid=60 tim=518053123679 hv=1391468433 ad='2264adec' sqlid='16htvx19g07wj'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T1")
 FULL("T1") NO_PARALLEL_INDEX("T1") */ :"SYS_B_05" AS C1, CASE WHEN "T1"."C1">=:"SYS_B_06" AND "T1"."C1"<=:"SYS_B_07" THEN :"SYS_B_08"
 ELSE :"SYS_B_09" END AS C2, "T1"."C1" AS C3 FROM "T1" SAMPLE BLOCK (:"SYS_B_10" , :"SYS_B_11") SEED (:"SYS_B_12") "T1") SAMPLESUB
END OF STMT
PARSE #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3906273573,tim=518053123679
EXEC #3:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3906273573,tim=518053123679
FETCH #3:c=15625,e=0,p=0,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=3906273573,tim=518053123679
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=0 pw=0 time=0 us)'
STAT #3 id=2 cnt=3410 pid=1 pos=1 obj=102003 op='TABLE ACCESS SAMPLE T1 (cr=71 pr=0 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #3:c=0,e=0,dep=1,type=0,tim=518053123679
=====================
PARSING IN CURSOR #4 len=471 dep=1 uid=60 oct=3 lid=60 tim=518053123679 hv=1212909034 ad='1f5d5b28' sqlid='c5sn2v544r1ga'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB)
 NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM (SELECT /*+ NO_PARALLEL("T1") INDEX("T1" IND_T1_C1)
 NO_PARALLEL_INDEX("T1") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3  FROM "T1" "T1" WHERE "T1"."C1">=:"SYS_B_6"
 AND "T1"."C1"<=:"SYS_B_7" AND ROWNUM <= :"SYS_B_8") SAMPLESUB
END OF STMT
PARSE #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=41534134,tim=518053123679
EXEC #4:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=41534134,tim=518053123679
FETCH #4:c=0,e=0,p=4,cr=10,cu=0,mis=0,r=1,dep=1,og=1,plh=41534134,tim=518053123679
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=10 pr=4 pw=0 time=0 us)'
STAT #4 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=10 pr=4 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #4 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=10 pr=4 pw=0 time=0 us)'
STAT #4 id=4 cnt=2000 pid=3 pos=1 obj=0 op='FILTER  (cr=10 pr=4 pw=0 time=0 us)'
STAT #4 id=5 cnt=2000 pid=4 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=10 pr=4 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #4:c=0,e=0,dep=1,type=0,tim=518053123679
=====================
PARSING IN CURSOR #2 len=646 dep=1 uid=60 oct=3 lid=60 tim=518053123679 hv=2627297926 ad='1d2877e4' sqlid='0u23vn6f9ksn6'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
 NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), NVL(SUM(C2),:"SYS_B_01"), COUNT(DISTINCT C3),
 NVL(SUM(CASE WHEN C3 IS NULL THEN :"SYS_B_02" ELSE :"SYS_B_03" END),:"SYS_B_04") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T2") FULL("T2")
 NO_PARALLEL_INDEX("T2") */ :"SYS_B_05" AS C1, CASE WHEN "T2"."C1">=:"SYS_B_06" AND "T2"."C1"<=:"SYS_B_07" THEN :"SYS_B_08" ELSE :"SYS_B_09" END AS C2,
 "T2"."C1" AS C3 FROM "T2" SAMPLE BLOCK (:"SYS_B_10" , :"SYS_B_11") SEED (:"SYS_B_12") "T2") SAMPLESUB
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=724732973,tim=518053123679
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=724732973,tim=518053123679
FETCH #2:c=0,e=31256,p=19,cr=71,cu=0,mis=0,r=1,dep=1,og=1,plh=724732973,tim=518053154935
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=71 pr=19 pw=0 time=0 us)'
STAT #2 id=2 cnt=3420 pid=1 pos=1 obj=102006 op='TABLE ACCESS SAMPLE T2 (cr=71 pr=19 pw=0 time=0 us cost=2 size=75 card=3)'
CLOSE #2:c=0,e=0,dep=1,type=0,tim=518053154935
=====================
PARSING IN CURSOR #6 len=471 dep=1 uid=60 oct=3 lid=60 tim=518053154935 hv=2390595667 ad='1f59c3c4' sqlid='5xf39uf77v62m'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */
 NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1"), NVL(SUM(C3),:"SYS_B_2") FROM (SELECT /*+ NO_PARALLEL("T2") INDEX("T2" IND_T2_C1)
 NO_PARALLEL_INDEX("T2") */ :"SYS_B_3" AS C1, :"SYS_B_4" AS C2, :"SYS_B_5" AS C3  FROM "T2" "T2" WHERE "T2"."C1">=:"SYS_B_6" AND "T2"."C1"<=:"SYS_B_7"
 AND ROWNUM <= :"SYS_B_8") SAMPLESUB
END OF STMT
PARSE #6:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=4155399029,tim=518053154935
EXEC #6:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=4155399029,tim=518053154935
FETCH #6:c=0,e=31251,p=7,cr=16,cu=0,mis=0,r=1,dep=1,og=1,plh=4155399029,tim=518053186186
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=16 pr=7 pw=0 time=0 us)'
STAT #6 id=2 cnt=2000 pid=1 pos=1 obj=0 op='VIEW  (cr=16 pr=7 pw=0 time=0 us cost=12 size=390 card=10)'
STAT #6 id=3 cnt=2000 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=16 pr=7 pw=0 time=0 us)'
STAT #6 id=4 cnt=2000 pid=3 pos=1 obj=0 op='FILTER  (cr=16 pr=7 pw=0 time=0 us)'
STAT #6 id=5 cnt=2000 pid=4 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=16 pr=7 pw=0 time=0 us cost=12 size=429 card=33)'
CLOSE #6:c=0,e=0,dep=1,type=0,tim=518053186186
=====================
PARSING IN CURSOR #5 len=179 dep=0 uid=60 oct=3 lid=60 tim=518053186186 hv=3811374996 ad='28520570' sqlid='0m07kq3jktxwn'
SELECT /* PARSE_TEST3 */
  T1.C1 T1_C1,
  SUBSTR(T1.C3,1,10) T1_C3,
  T2.C2 T2_C2,
  SUBSTR(T2.C3,1,10) T2_C3
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C1 BETWEEN 4001 AND 6000
END OF STMT
PARSE #5:c=15625,e=62507,p=30,cr=172,cu=0,mis=1,r=0,dep=0,og=1,plh=169351222,tim=518053186186
EXEC #5:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=169351222,tim=518053186186
...
FETCH #5:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=4,dep=0,og=1,plh=169351222,tim=518053279932
STAT #5 id=1 cnt=2000 pid=0 pos=1 obj=0 op='HASH JOIN  (cr=396 pr=72 pw=0 time=0 us cost=48 size=284000 card=2000)'
STAT #5 id=2 cnt=2000 pid=1 pos=1 obj=102003 op='TABLE ACCESS BY INDEX ROWID T1 (cr=44 pr=35 pw=0 time=0 us cost=22 size=130000 card=2000)'
STAT #5 id=3 cnt=2000 pid=2 pos=1 obj=102004 op='INDEX RANGE SCAN IND_T1_C1 (cr=6 pr=0 pw=0 time=0 us cost=6 size=0 card=2000)'
STAT #5 id=4 cnt=2000 pid=1 pos=2 obj=102006 op='TABLE ACCESS BY INDEX ROWID T2 (cr=352 pr=37 pw=0 time=0 us cost=25 size=154000 card=2000)'
STAT #5 id=5 cnt=2000 pid=4 pos=1 obj=102007 op='INDEX RANGE SCAN IND_T2_C1 (cr=143 pr=0 pw=0 time=0 us cost=9 size=0 card=2000)' 

This time you can see that 172 consistent gets were performed during the hard parse.  If we add up the consistent gets just before our SQL statement appeared in the trace file (71 + 10 + 71 + 16), we can account for 168 of the 172 consistent gets during the parse of our SQL statement.  Note that the SQL_ID for the SQL statement appeared in the trace file (‘0m07kq3jktxwn’ for the last trace file), so we could do something like this to quickly confirm that dynamic sampling happened during the hard parse without looking in the 10046 trace file:

SET LINESIZE 140
SET PAGESIZE 1000
SET TRIMSPOOL ON
SET AUTOTRACE OFF

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0m07kq3jktxwn',NULL,'TYPICAL'));

SQL_ID  0m07kq3jktxwn, child number 0
-------------------------------------
SELECT /* PARSE_TEST3 */   T1.C1 T1_C1,   SUBSTR(T1.C3,1,10) T1_C3,
T2.C2 T2_C2,   SUBSTR(T2.C3,1,10) T2_C3 FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C1 BETWEEN 4001 AND 6000

Plan hash value: 169351222

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |    48 (100)|          |
|*  1 |  HASH JOIN                   |           |  2000 |   277K|    48   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |  2000 |   126K|    22   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_C1 |  2000 |       |     6   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2        |  2000 |   150K|    25   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | IND_T2_C1 |  2000 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
   3 - access("T1"."C1">=4001 AND "T1"."C1"<=6000)
   5 - access("T2"."C1">=4001 AND "T2"."C1"<=6000)

Note
-----
   - dynamic sampling used for this statement

SQL_ID  0m07kq3jktxwn, child number 1
-------------------------------------
SELECT /* PARSE_TEST3 */   T1.C1 T1_C1,   SUBSTR(T1.C3,1,10) T1_C3,
T2.C2 T2_C2,   SUBSTR(T2.C3,1,10) T2_C3 FROM   T1,   T2 WHERE
T1.C1=T2.C1   AND T1.C1 BETWEEN 4001 AND 6000

Plan hash value: 169351222

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |    48 (100)|          |
|*  1 |  HASH JOIN                   |           |  2000 |   277K|    48   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |  2000 |   126K|    22   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_C1 |  2000 |       |     6   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2        |  2000 |   150K|    25   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | IND_T2_C1 |  2000 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."C1"="T2"."C1")
   3 - access("T1"."C1">=4001 AND "T1"."C1"<=6000)
   5 - access("T2"."C1">=4001 AND "T2"."C1"<=6000)

Note
-----
   - dynamic sampling used for this statement 

The note at the bottom of each execution plan shows that dynamic sampling happened during the hard parse.  Note also that there are two child cursors with the same execution plan.  One was created when the 10046 trace was active, and the other was created after the 10046 trace was disabled.

We could then extend this test case by collecting table AND index statistics, and then re-execute the test script to compare the results.  So, what are the other possible causes of consistent gets during a hard parse?  Start with my test case and see if you are able to identify the source of the consistent gets reported by SQL*Plus AUTOTRACE or a TKPROF summary.
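
To close the loop on the consistent gets accounting above, the arithmetic is simple enough to sketch (the four values are the consistent gets from the recursive dynamic sampling calls that appeared in the trace file just before our SQL statement):

```python
# Consistent gets performed by the recursive dynamic sampling queries
# that appeared in the trace file just before our SQL statement
recursive_gets = [71, 10, 71, 16]

parse_gets = 172  # consistent gets reported during the hard parse

accounted = sum(recursive_gets)       # 168 of the 172 consistent gets
unaccounted = parse_gets - accounted  # 4 consistent gets remain unexplained
print(accounted, unaccounted)
```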





First Table is 550MB, Second Table is 26GB – Nested Loops or Full Table Scan? 4


October 4, 2010

(Back to the Previous Post in the Series)

The previous blog articles in this series used the following no workload system (CPU) statistics while trying to determine if Oracle Database’s optimizer prefers hash joins or nested loops when joining a couple of test tables:

SQL> SELECT
  2    PNAME,
  3    PVAL1
  4  FROM
  5    SYS.AUX_STATS$;

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    1
CPUSPEEDNW      2128.85496
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM
MREADTIM
CPUSPEED
MBRC
MAXTHR
SLAVETHR

I then collected workload system (CPU) statistics for a period of 30 minutes while performing various index-based and full table scan queries using the sample tables created in the earlier blog articles of this series.  The generated workload system (CPU) statistics are as follows:

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    0
CPUSPEEDNW      2128.85496
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM          2490.746
MREADTIM          9162.743
CPUSPEED              2714
MBRC                    78
MAXTHR              302080
SLAVETHR

If we check the documentation, we see that the SREADTIM, MREADTIM, and IOSEEKTIM values are expressed in ms (1/1000th of a second).  Something is very strange about those automatically collected SREADTIM and MREADTIM values, but they must be correct because they were automatically collected (9.1 seconds for a multi-block read?  If this statistic really is expressed in ms, how would an average single block read time into the buffer cache of 0.1 ms from SSD storage devices be represented?).  The MBRC value is also probably a little large, but it does likely represent the average number of blocks read per multi-block read during the statistics collection period.  We could expect to see a couple of changes to the execution plans with the workload system statistics, and probably a significant change in the estimated execution time if the SREADTIM and MREADTIM values are in fact stated in milliseconds.
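
For comparison, when SREADTIM and MREADTIM have not been collected, the optimizer synthesizes them from IOSEEKTIM and IOTFRSPEED.  A sketch of the commonly documented formulas follows, assuming an 8KB block size and an effective multi-block read count of 8 blocks (both values are assumptions for this sketch, not taken from the output above):

```python
IOSEEKTIM = 10        # ms, disk seek time from SYS.AUX_STATS$
IOTFRSPEED = 4096     # bytes transferred per ms, from SYS.AUX_STATS$
DB_BLOCK_SIZE = 8192  # bytes; assumption: 8KB block size
MBRC = 8              # blocks; assumption: effective multi-block read count

# Synthetic read times used when no workload statistics are available
sreadtim = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED         # 12.0 ms
mreadtim = IOSEEKTIM + MBRC * DB_BLOCK_SIZE / IOTFRSPEED  # 26.0 ms
print(sreadtim, mreadtim)
```

Those synthetic 12 ms and 26 ms values are far more believable than the 2490.746 ms and 9162.743 ms values collected above.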

I might expect to see statistics something like the following (from another server):

SQL> SELECT
  2    PNAME,
  3    PVAL1
  4  FROM
  5    SYS.AUX_STATS$;

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    1
CPUSPEEDNW      1077.92208
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM             4.754
MREADTIM             7.432
CPUSPEED              1335
MBRC                    32
MAXTHR               18432
SLAVETHR

The MAXTHR value collected on the second server also appears to be very suspect, since the documentation states that MAXTHR is the maximum throughput that the I/O subsystem can deliver in bytes per second.
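
Taking the documented unit at face value makes the suspicion easy to quantify (a quick sketch using the MAXTHR figures from the two servers above):

```python
maxthr_first = 302080   # MAXTHR from the first server, documented unit: bytes/second
maxthr_second = 18432   # MAXTHR from the second server, documented unit: bytes/second

# If the unit really is bytes per second, these I/O ceilings are absurdly low
kb_first = maxthr_first / 1024    # 295 KB/s
kb_second = maxthr_second / 1024  # 18 KB/s
print(kb_first, kb_second)
```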

Let’s take a look through the workload baseline output and compare it with the no workload baseline output, only showing when the execution plan changed:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2140947373

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9998 |  2177K|   471   (1)| 00:19:32 |
|*  1 |  HASH JOIN                   |              |  9998 |  2177K|   471   (1)| 00:19:32 |
|*  2 |   TABLE ACCESS FULL          | T2           |  9999 |   107K|   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|   325   (0)| 00:13:30 |
|*  4 |    INDEX RANGE SCAN          | SYS_C0018298 |  9999 |       |    21   (0)| 00:00:53 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   4 - access("T1"."COL1"<10000)

The no workload baseline plan for the same SQL statement:

Execution Plan
----------------------------------------------------------
Plan hash value: 2371011080

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   260   (1)| 00:00:04 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0018300     |  9999 |   107K|    22  (10)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   299   (1)| 00:00:04 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|   325   (0)| 00:00:04 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0018298     |  9999 |       |    21   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

With the workload system statistics in place the optimizer is no longer performing an index join operation to avoid visiting the T2 table.  Notice that while the calculated cost decreased from 586 to 471 with the workload statistics, the estimated execution time increased significantly (from 8 seconds to 19 minutes and 32 seconds).  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2177K|       |   550   (1)| 00:22:49 |
|*  1 |  HASH JOIN                   |             |  9999 |  2177K|  2192K|   550   (1)| 00:22:49 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2070K|       |   330   (0)| 00:13:42 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |    26   (0)| 00:01:05 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   145   (0)| 00:06:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

The no workload execution plan for the same SQL statement:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |  1238   (1)| 00:00:15 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|  1238   (1)| 00:00:15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |   330   (0)| 00:00:04 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   691   (1)| 00:00:09 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |   236   (1)| 00:00:03 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |   299   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

Again, with the workload system statistics in place the optimizer is no longer utilizing an index join operation to avoid accessing the table T2.  Also, the calculated cost dropped from 1238 to 550 while the estimated execution time increased from 15 seconds to 22 minutes and 49 seconds.  Specifying smaller PGA_AGGREGATE_TARGET values did not cause the execution plans to deviate from those found for the no workload baseline.
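
The seemingly contradictory cost and time movements follow from how the Time column is derived: roughly the cost multiplied by the single block read time.  A minimal arithmetic sketch (the 12 ms no workload SREADTIM is the synthetic value the optimizer derives from IOSEEKTIM and IOTFRSPEED, assuming an 8KB block size):

```python
def estimated_seconds(cost, sreadtim_ms):
    # Approximate the plan's Time column: cost expressed in units of
    # single-block read time (Oracle's exact rounding may differ slightly)
    return cost * sreadtim_ms / 1000.0

# Workload statistics: cost 550 with the collected SREADTIM of 2490.746 ms
workload = estimated_seconds(550, 2490.746)   # ~1370 s, close to the 00:22:49 shown

# No workload statistics: cost 1238 with the synthetic SREADTIM of 12 ms
no_workload = estimated_seconds(1238, 12)     # ~15 s, matching the 00:00:15 shown

print(round(workload), round(no_workload))
```

So the cost dropped because each unit of cost is unchanged, while the estimated time exploded because each unit of cost now represents a 2.49 second single block read.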

Continuing with test results from changing the OPTIMIZER_MODE from ALL_ROWS, starting with FIRST_ROWS_1000, comparing workload with no workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000 |   621K|       |   559   (1)| 00:23:12 |
|*  1 |  HASH JOIN         |      |  1000 |   621K|    21M|   559   (1)| 00:23:12 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |  2070K|       |    16   (0)| 00:00:40 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

The no workload output for the same SQL statement with the OPTIMIZER_MODE set to FIRST_ROWS_1000:

Execution Plan
----------------------------------------------------------
Plan hash value: 2077217387

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  1000 |   414K|  1085   (1)| 00:00:14 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |  1000 |   414K|  1085   (1)| 00:00:14 |
|   3 |    TABLE ACCESS FULL         | T1           | 10000 |  2070K|    84   (0)| 00:00:02 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |   212 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1") 

In the above, the workload statistics execution plan completely avoided the index access paths, performing a hash join rather than the two nested loops joins seen with the no workload statistics.

For the OPTIMIZER_MODE values FIRST_ROWS_100, FIRST_ROWS_10, and FIRST_ROWS_1, the workload execution plans did not differ from the no workload execution plans for any of the SQL statements.

Next, we will take a look at what changed in the execution plans when the OPTIMIZER_INDEX_COST_ADJ parameter (abbreviated INDEX_COST_ADJ below) was set to 50 with the workload system statistics in place:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2177K|       |   385   (1)| 00:15:58 |
|*  1 |  HASH JOIN                   |             |  9999 |  2177K|  2192K|   385   (1)| 00:15:58 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2070K|       |   165   (0)| 00:06:51 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |    13   (0)| 00:00:33 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   145   (0)| 00:06:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

INDEX_COST_ADJ = 50, no workload:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |   859   (1)| 00:00:11 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|   859   (1)| 00:00:11 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |   165   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |    13   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   477   (1)| 00:00:06 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |   118   (0)| 00:00:02 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |   149   (0)| 00:00:02 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)
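
Before continuing, note that the effect of the parameter on the index access paths is easy to verify from the plans: the index-related costs appear to be scaled by OPTIMIZER_INDEX_COST_ADJ/100 and rounded (a sketch; the exact rounding Oracle applies at other parameter values may differ):

```python
def adjusted_cost(original_cost, optimizer_index_cost_adj):
    # Index access costs appear to be scaled by the parameter value / 100
    return round(original_cost * optimizer_index_cost_adj / 100)

# Workload baseline costs: INDEX RANGE SCAN IND_T1_COL2 = 26,
# TABLE ACCESS BY INDEX ROWID T1 = 330
print(adjusted_cost(26, 50), adjusted_cost(330, 50))  # 13 165, matching the plan above
print(adjusted_cost(26, 20), adjusted_cost(330, 20))  # 5 66, matching the later plan
```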

Once again, Oracle Database’s optimizer decided not to perform an index join operation to avoid accessing table T2 when the workload system statistics were in place.  Continuing with the INDEX_COST_ADJ parameter set to 20 with workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       |  5543   (1)| 03:50:06 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M|  5543   (1)| 03:50:06 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  1431   (0)| 00:59:25 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

The execution plan for the same SQL statement with the INDEX_COST_ADJ parameter set to 20 with no workload system statistics:

Execution Plan
----------------------------------------------------------
Plan hash value: 3187704125

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   100K|    40M|       | 11880   (1)| 00:02:23 |
|   1 |  MERGE JOIN                  |              |   100K|    40M|       | 11880   (1)| 00:02:23 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000K|   202M|       |  6441   (1)| 00:01:18 |
|   3 |    INDEX FULL SCAN           | SYS_C0018298 |  1000K|       |       |   377   (1)| 00:00:05 |
|*  4 |   SORT JOIN                  |              |   100K|    20M|    43M|  5439   (1)| 00:01:06 |
|   5 |    TABLE ACCESS FULL         | T2           |   100K|    20M|       |   825   (1)| 00:00:10 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1")
       filter("T1"."COL1"="T2"."COL1")

As we can see from the above, the optimizer has oddly decided to perform a hash join with full table scans, even though we have told the optimizer that the index access paths cost 1/5th of the original calculated costs.  I suspect that the hash join (likely with the tables being read by direct path reads in 11.2.0.1) will be a bit more efficient than the sort merge join that appeared with the no workload system statistics, but we will not know until we actually execute the SQL statement.  You might recall that with the INDEX_COST_ADJ parameter set to 10 and no workload system statistics, all of the hash joins in the execution plans were replaced with either sort merge joins or nested loops joins.
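
As a quick check on the no workload numbers, the merge join line's cost is essentially the sum of the costs of its two inputs (a sketch using the costs from the merge join plan above):

```python
t1_by_index_rowid = 6441  # TABLE ACCESS BY INDEX ROWID T1 (includes the index full scan)
sort_join_t2 = 5439       # SORT JOIN over the full table scan of T2
merge_join_cost = 11880   # MERGE JOIN line in the plan

# The parent operation's cost is the sum of the two child costs
print(t1_by_index_rowid + sort_join_t2 == merge_join_cost)
```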

Continuing with the INDEX_COST_ADJ parameter set to 20 with workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       |  5543   (1)| 03:50:06 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M|  5543   (1)| 03:50:06 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  1431   (0)| 00:59:25 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL2"="T2"."COL2")

The execution plan for the same SQL statement with the INDEX_COST_ADJ parameter set to 20, no workload:

Execution Plan
----------------------------------------------------------
Plan hash value: 1173042340

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   100K|    40M|       | 11979   (1)| 00:02:24 |
|   1 |  MERGE JOIN                  |             |   100K|    40M|       | 11979   (1)| 00:02:24 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  1000K|   202M|       |  6540   (1)| 00:01:19 |
|   3 |    INDEX FULL SCAN           | IND_T1_COL2 |  1000K|       |       |   476   (1)| 00:00:06 |
|*  4 |   SORT JOIN                  |             |   100K|    20M|    43M|  5439   (1)| 00:01:06 |
|   5 |    TABLE ACCESS FULL         | T2          |   100K|    20M|       |   825   (1)| 00:00:10 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"="T2"."COL2")
       filter("T1"."COL2"="T2"."COL2")

With the workload system statistics in place, we again see the optimizer selecting a hash join with full table scans when the join column is the one whose index has a high clustering factor.

Continuing with the INDEX_COST_ADJ parameter set to 20 with workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2177K|       |   286   (1)| 00:11:52 |
|*  1 |  HASH JOIN                   |             |  9999 |  2177K|  2192K|   286   (1)| 00:11:52 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2070K|       |    66   (0)| 00:02:45 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |     5   (0)| 00:00:13 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   145   (0)| 00:06:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

INDEX_COST_ADJ = 20, no workload:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |   632   (1)| 00:00:08 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|   632   (1)| 00:00:08 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |    66   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |     5   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   349   (1)| 00:00:05 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |    47   (0)| 00:00:01 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |    60   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

In the above, we see hash joins both with and without workload system statistics, but without workload system statistics the optimizer decided to completely avoid directly accessing table T2.

Dropping the INDEX_COST_ADJ parameter down to a value of 10 with workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       |  5543   (1)| 03:50:06 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M|  5543   (1)| 03:50:06 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  1431   (0)| 00:59:25 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

INDEX_COST_ADJ = 10, no workload:

Execution Plan
----------------------------------------------------------
Plan hash value: 3187704125

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   100K|    40M|       |  8660   (1)| 00:01:44 |
|   1 |  MERGE JOIN                  |              |   100K|    40M|       |  8660   (1)| 00:01:44 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000K|   202M|       |  3221   (1)| 00:00:39 |
|   3 |    INDEX FULL SCAN           | SYS_C0018298 |  1000K|       |       |   189   (1)| 00:00:03 |
|*  4 |   SORT JOIN                  |              |   100K|    20M|    43M|  5439   (1)| 00:01:06 |
|   5 |    TABLE ACCESS FULL         | T2           |   100K|    20M|       |   825   (1)| 00:00:10 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1")
       filter("T1"."COL1"="T2"."COL1")

Again, with the workload system statistics we see what is likely a sensible hash join with two full table scans, but a sort merge join with the no workload system statistics.  Continuing with the INDEX_COST_ADJ parameter set to 10 with workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       |  5543   (1)| 03:50:06 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M|  5543   (1)| 03:50:06 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  1431   (0)| 00:59:25 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL2"="T2"."COL2")

INDEX_COST_ADJ = 10, no workload:

Execution Plan
----------------------------------------------------------
Plan hash value: 1173042340

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   100K|    40M|       |  8709   (1)| 00:01:45 |
|   1 |  MERGE JOIN                  |             |   100K|    40M|       |  8709   (1)| 00:01:45 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  1000K|   202M|       |  3270   (1)| 00:00:40 |
|   3 |    INDEX FULL SCAN           | IND_T1_COL2 |  1000K|       |       |   238   (1)| 00:00:03 |
|*  4 |   SORT JOIN                  |             |   100K|    20M|    43M|  5439   (1)| 00:01:06 |
|   5 |    TABLE ACCESS FULL         | T2          |   100K|    20M|       |   825   (1)| 00:00:10 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"="T2"."COL2")
       filter("T1"."COL2"="T2"."COL2")

Another hash join with workload system statistics and a sort merge join without (all hash joins disappeared with no-workload statistics when the OPTIMIZER_INDEX_COST_ADJ parameter was set to 10). 

There were no differences between the execution plans for the workload and no-workload system statistics when the OPTIMIZER_INDEX_COST_ADJ parameter was set to 1.

If we assume that Oracle Database’s statistics collection procedure somehow gathered the statistics incorrectly, and that it really does not take 2.491 seconds to complete a single-block read into the buffer cache and 9.163 seconds to perform a multi-block read into the buffer cache, then we might be willing to believe that some of the above execution plans are not ideal.  What if the server actually had a RAID array of SSD drives, or an attached SAN with a large built-in cache?  Maybe we would consider manually setting the workload system statistics like this:

EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM',0.1)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM',0.2)

If we then query SYS.AUX_STATS$, we would see something like this:

SQL> SELECT
  2    PNAME,
  3    PVAL1
  4  FROM
  5    SYS.AUX_STATS$;

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    1
CPUSPEEDNW      2128.85496
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM                .1
MREADTIM                .2
CPUSPEED              2714
MBRC                    78
MAXTHR              302080
SLAVETHR

Let’s run the test script again to see how the execution plans change.  From the new test output, comparing the new workload system statistics execution plans with the original workload execution plans:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       |  6447  (45)| 00:00:01 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M|  6447  (45)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   229  (66)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  2274  (66)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

Looks like the SQL statement will complete in about 1 second.  The original workload system statistics baseline plan looked like this:

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       |  5543   (1)| 03:50:06 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M|  5543   (1)| 03:50:06 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   145   (0)| 00:06:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  1431   (0)| 00:59:25 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

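The dramatic difference in the Time predictions appears to follow directly from the cost scaled by the single-block read time (an inference from the numbers in the plans, not a formula taken from the documentation).  Remembering that SREADTIM is stored in milliseconds, and that the originally gathered single-block read time was roughly 2.491 seconds (2,491 ms):

SELECT
  ROUND(5543 * 2491 / 1000) BASELINE_SEC,
  CEIL(6447 * 0.1 / 1000) ADJUSTED_SEC
FROM
  DUAL;

The first expression works out to 13,808 seconds, about 3 hours and 50 minutes, which agrees with the baseline plan’s 03:50:06 prediction, while the second rounds up to the one second displayed above.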
So, comparing the two, the original baseline cost was lower (5543 compared to 6447), yet the original baseline was predicted to complete in 3 hours and 50 minutes rather than about one second.  Other than the changed time and cost estimates, the execution plans remained the same for the two sets of workload system statistics, until the OPTIMIZER_INDEX_COST_ADJ parameter was reduced to 50 or less.  What changed in the execution plans at a value of 50 for OPTIMIZER_INDEX_COST_ADJ?

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3560619641

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9998 |  2177K|   357   (9)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |  9998 |  2177K|   357   (9)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2           |  9999 |   107K|   173   (7)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018300 |  9999 |       |    14  (29)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|   174   (7)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018298 |  9999 |       |    14  (22)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T2"."COL1"<10000)
   5 - access("T1"."COL1"<10000)

While the automatically gathered system statistics, with the same OPTIMIZER_INDEX_COST_ADJ value of 50, produced the following execution plan (a value of 20 for this parameter also generated the same execution plan):

Execution Plan
----------------------------------------------------------
Plan hash value: 2371011080

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2177K|   293   (1)| 00:12:09 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2177K|   293   (1)| 00:12:09 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   130   (1)| 00:05:23 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0018300     |  9999 |   107K|    11  (10)| 00:00:28 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   149   (0)| 00:06:11 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|   163   (0)| 00:06:45 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0018298     |  9999 |       |    11   (0)| 00:00:27 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

With the manually set system statistics, the optimizer no longer avoids the access to table T2 through an index join involving the IND_T2_COL2 index.
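If we wanted to compare the costs directly, we could presumably force the index join back with a hint (a sketch; the INDEX_JOIN hint references the system-generated index names that appeared in the plan above):

SELECT /*+ INDEX_JOIN(T2 SYS_C0018300 IND_T2_COL2) */
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 < 10000;

Comparing the cost of the hinted plan with the 357 reported above would show how much more expensive the optimizer now considers the index join to be.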

When the OPTIMIZER_INDEX_COST_ADJ parameter is set to 10 we see a couple of additional execution plan changes:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2177K|       |   395  (59)| 00:00:01 |
|*  1 |  HASH JOIN                   |             |  9999 |  2177K|  2192K|   395  (59)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2070K|       |    35   (6)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   221  (65)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

Compared to the execution plan produced before the workload system statistics were modified, at the same OPTIMIZER_INDEX_COST_ADJ parameter value of 10:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |   240   (1)| 00:09:57 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|   240   (1)| 00:09:57 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |    33   (0)| 00:01:23 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |     3   (0)| 00:00:07 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   132   (0)| 00:05:30 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |    24   (0)| 00:00:59 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |    30   (0)| 00:01:15 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

Somewhat surprisingly, even with an OPTIMIZER_INDEX_COST_ADJ parameter value of 1, a couple of the execution plans still showed hash joins with workload system statistics, while that was not the case with the no-workload system statistics:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1188283739

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   999 |   413K|     3  (34)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |   999 |   413K|     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000 |   207K|     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018298 |  1000 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |  1000 |   207K|     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018300 |  1000 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=1000)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=1000)

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3560619641

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9998 |  2177K|    17  (65)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |  9998 |  2177K|    17  (65)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2           |  9999 |   107K|     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018300 |  9999 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018298 |  9999 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T2"."COL1"<10000)
   5 - access("T1"."COL1"<10000) 

So, the original question likely remains.  If the first table is small (the T2 and TT2 test tables) and the driven table (the T1 and TT1 test tables) contains an index on the join column, will Oracle’s optimizer favor nested loops joins over hash joins?  That would make a great question for a certification exam, and it would fit appropriately in my blog category Quiz – Whose Answer is it Anyway?, which collects those types of questions.





First Table is 550MB, Second Table is 26GB – Nested Loops or Full Table Scan? 3

3 10 2010

October 3, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

In the last blog article, in addition to changing the Oracle Database release version, we modified several parameters, including OPTIMIZER_INDEX_COST_ADJ, OPTIMIZER_INDEX_CACHING, OPTIMIZER_MODE, and PGA_AGGREGATE_TARGET, all to try to determine the impact of those changes on the execution plans when test table T1 (intended to simulate a 26GB table) was joined to test table T2 (intended to simulate a 550MB table).  One thing is certain: Oracle Database’s query optimizer has many options when joining two tables together, and some of those choices lead to efficient execution plans, while others do not.  By experimenting with the various parameters (setting the parameters to non-default values), we changed most of the hash joins to multiple nested loops joins, changed other hash joins to sort-merge joins, and in one case changed multiple nested loops joins into a single hash join.  All of those tests were performed with no-workload system (CPU) statistics.

In this blog post we will continue the experiments.  We will see what happens to the execution plans when we change the 200 character padding column values to 10 character padding column values, similar to a modification to my test script performed by another person.  For this test script I simply started with the previous test script and replaced T1 with TT1, and T2 with TT2.  The INSERT statements were changed as shown below:

INSERT INTO
  TT1
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),10,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  TT2
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),10,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100000);
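As in the original script, TT1 should receive 1,000,000 rows (the Cartesian product of the two 1,000 row inline views) and TT2 should receive 100,000 rows; a quick check after committing the inserts:

SELECT
  (SELECT COUNT(*) FROM TT1) TT1_ROWS,
  (SELECT COUNT(*) FROM TT2) TT2_ROWS
FROM
  DUAL;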

What changed in the execution plans?  Obviously, the number of bytes predicted to leave each operation in the execution plan decreased, as did the temp tablespace requirements.  Comparing the new baseline output with the baseline output from the previous blog article, we do see a couple of execution plan changes.  For example:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2
  7  FROM
  8    TT1,
  9    TT2
 10  WHERE
 11    TT1.COL1=TT2.COL1
 12    AND TT1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3927111238

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9998 |   322K|   118   (1)| 00:00:02 |
|*  1 |  HASH JOIN                   |              |  9998 |   322K|   118   (1)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TT2          |  9999 |   107K|    58   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018343 |  9999 |       |    20   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|    59   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018341 |  9999 |       |    21   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")
   3 - access("TT2"."COL1"<10000)
   5 - access("TT1"."COL1"<10000)

While the previous baseline had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 2371011080

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   260   (1)| 00:00:04 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0018300     |  9999 |   107K|    22  (10)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   299   (1)| 00:00:04 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|   325   (0)| 00:00:04 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0018298     |  9999 |       |    21   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

So, with the smaller COL3 padding column values the optimizer decided not to perform the index fast full scan of the index on COL2 (used in the previous baseline to avoid directly accessing the table), and Oracle will now perform one hash join rather than two hash joins.  Continuing:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2
  7  FROM
  8    TT1,
  9    TT2
 10  WHERE
 11    TT1.COL1=TT2.COL1
 12    AND TT1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 4109682512

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9999 |   322K|   169   (2)| 00:00:03 |
|*  1 |  HASH JOIN                   |              |  9999 |   322K|   169   (2)| 00:00:03 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|    64   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_TT1_COL2 |  9999 |       |    26   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | TT2          |   100K|  1074K|   104   (1)| 00:00:02 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")
   3 - access("TT1"."COL2">9990000)

While the previous baseline had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |  1238   (1)| 00:00:15 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|  1238   (1)| 00:00:15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |   330   (0)| 00:00:04 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   691   (1)| 00:00:09 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |   236   (1)| 00:00:03 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |   299   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

Once again, with the smaller COL3 padding column values the optimizer decided not to perform the index fast full scan of the index on COL2 (used in the previous baseline to avoid directly accessing the table), and Oracle will now perform one hash join rather than two hash joins, while also avoiding the estimated temp space usage of approximately 2192K (if I recall correctly, the scale of the TempSpc column is off by a factor of 1,024, so that figure actually indicates 2,298,478,592 bytes).

So, not many changes, other than Oracle’s optimizer no longer avoiding a table access through the use of an index fast full scan.

Decreasing the PGA_AGGREGATE_TARGET from 2,000MB to 1,000MB, and then to 100MB, had no impact on the execution plans.

What happened when the OPTIMIZER_MODE was changed from the default of ALL_ROWS to FIRST_ROWS_1000?  If we just look at the changes from the new baseline:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2
  7  FROM
  8    TT1,
  9    TT2
 10  WHERE
 11    TT1.COL1=TT2.COL1
 12    AND TT1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1414977455

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  1002 | 33066 |    66   (0)| 00:00:01 |
|   1 |  MERGE JOIN                   |              |  1002 | 33066 |    66   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID | TT2          |  1002 | 11022 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | SYS_C0018343 |  9999 |       |     3   (0)| 00:00:01 |
|*  4 |   SORT JOIN                   |              |  9999 |   214K|    59   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|    59   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0018341 |  9999 |       |    21   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("TT2"."COL1"<10000)
   4 - access("TT1"."COL1"="TT2"."COL1")
       filter("TT1"."COL1"="TT2"."COL1")
   6 - access("TT1"."COL1"<10000)

While the baseline had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 3927111238

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9998 |   322K|   118   (1)| 00:00:02 |
|*  1 |  HASH JOIN                   |              |  9998 |   322K|   118   (1)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TT2          |  9999 |   107K|    58   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018343 |  9999 |       |    20   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|    59   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018341 |  9999 |       |    21   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")
   3 - access("TT2"."COL1"<10000)
   5 - access("TT1"."COL1"<10000)

So, the optimizer exchanged a hash join for a sort merge join to return the first 1,000 rows.  Note that the predicted number of rows dropped from 9,998 to 1,002, and that the calculated cost for the query dropped accordingly, from 118 to 66.
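For reference, the OPTIMIZER_MODE changes in these tests can be made at the session level, leaving other sessions unaffected (shown here as an assumption about how the parameter was toggled for the tests; equivalent FIRST_ROWS hints could also have been used in the individual SQL statements):

ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS_1000;

ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS_100;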

What happened when the OPTIMIZER_MODE was changed from FIRST_ROWS_1000 to FIRST_ROWS_100?  We might predict that every execution plan expected to return more than 100 rows would see a change, if the execution plan had not already changed.  If we just look at the changes from the FIRST_ROWS_1000 output:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2,
  7    TT2.COL3
  8  FROM
  9    TT1,
 10    TT2
 11  WHERE
 12    TT1.COL1=TT2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1268144893

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   100 |  4400 |   103   (0)| 00:00:02 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   100 |  4400 |   103   (0)| 00:00:02 |
|   3 |    TABLE ACCESS FULL         | TT1          |  1000 | 22000 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0018343 |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| TT2          |     1 |    22 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("TT1"."COL1"="TT2"."COL1")

While the FIRST_ROWS_1000 execution had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 2456117181

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000 | 66000 |       |   295   (1)| 00:00:04 |
|*  1 |  HASH JOIN         |      |  1000 | 66000 |  3328K|   295   (1)| 00:00:04 |
|   2 |   TABLE ACCESS FULL| TT2  |   100K|  2148K|       |   104   (1)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| TT1  | 10000 |   214K|       |    12   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")

As we saw with the earlier tests in the previous blog article, changing the optimizer mode to FIRST_ROWS_100 was sufficient to cause the optimizer to switch from a hash join to two nested loops joins.  The clustering factor of the index on column COL2 was sufficiently high for the execution plan joining that column in the two tables to remain a hash join.
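One way to see the clustering factor values that likely drove this decision is to query USER_INDEXES after statistics collection:

SELECT
  TABLE_NAME,
  INDEX_NAME,
  CLUSTERING_FACTOR,
  NUM_ROWS,
  LEAF_BLOCKS
FROM
  USER_INDEXES
WHERE
  TABLE_NAME IN ('TT1','TT2')
ORDER BY
  TABLE_NAME,
  INDEX_NAME;

A clustering factor close to the number of table blocks indicates that the rows are stored in roughly index order, while a value approaching NUM_ROWS suggests that retrieving rows through the index will visit a different table block for nearly every row.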

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2,
  7    TT2.COL3
  8  FROM
  9    TT1,
 10    TT2
 11  WHERE
 12    TT1.COL1=TT2.COL1
 13    AND TT1.COL1 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1414977455

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   102 |  4488 |    11   (0)| 00:00:01 |
|   1 |  MERGE JOIN                   |              |   102 |  4488 |    11   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID | TT2          |   103 |  2266 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | SYS_C0018343 |  1000 |       |     2   (0)| 00:00:01 |
|*  4 |   SORT JOIN                   |              |  1000 | 22000 |     8   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| TT1          |  1000 | 22000 |     8   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0018341 |  1000 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("TT2"."COL1">=1 AND "TT2"."COL1"<=1000)
   4 - access("TT1"."COL1"="TT2"."COL1")
       filter("TT1"."COL1"="TT2"."COL1")
   6 - access("TT1"."COL1">=1 AND "TT1"."COL1"<=1000)

While the FIRST_ROWS_1000 execution had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 473932778

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   999 | 43956 |    16   (7)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |   999 | 43956 |    16   (7)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TT1          |  1000 | 22000 |     8   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018341 |  1000 |       |     4   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TT2          |  1000 | 22000 |     7   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018343 |  1000 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")
   3 - access("TT1"."COL1">=1 AND "TT1"."COL1"<=1000)
   5 - access("TT2"."COL1">=1 AND "TT2"."COL1"<=1000)

Another one of the hash joins changed into a sort merge join.  Continuing:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2
  7  FROM
  8    TT1,
  9    TT2
 10  WHERE
 11    TT1.COL1=TT2.COL1
 12    AND TT1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 382692435

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   102 |  3366 |    14   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |   102 |  3366 |    14   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018341 |  9999 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018343 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | TT2          |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("TT1"."COL1"<10000)
   5 - access("TT1"."COL1"="TT2"."COL1")
       filter("TT2"."COL1"<10000)

Another set of nested loops joins appeared; did this execution plan previously show a hash join?  The FIRST_ROWS_1000 execution had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 1414977455

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  1002 | 33066 |    66   (0)| 00:00:01 |
|   1 |  MERGE JOIN                   |              |  1002 | 33066 |    66   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID | TT2          |  1002 | 11022 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | SYS_C0018343 |  9999 |       |     3   (0)| 00:00:01 |
|*  4 |   SORT JOIN                   |              |  9999 |   214K|    59   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|    59   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | SYS_C0018341 |  9999 |       |    21   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("TT2"."COL1"<10000)
   4 - access("TT1"."COL1"="TT2"."COL1")
       filter("TT1"."COL1"="TT2"."COL1")
   6 - access("TT1"."COL1"<10000)

So, we lost the sort merge join that we picked up with the FIRST_ROWS_1000 OPTIMIZER_MODE and switched to two nested loops joins – interesting… is there a silliness scale indicator here, running from hash join to sort merge join to nested loops?  🙂  Continuing:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2
  7  FROM
  8    TT1,
  9    TT2
 10  WHERE
 11    TT1.COL1=TT2.COL1
 12    AND TT1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 654336742

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   101 |  3333 |    14   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |   101 |  3333 |    14   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_TT1_COL2 |  9999 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018343 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | TT2          |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("TT1"."COL2">9990000)
   5 - access("TT1"."COL1"="TT2"."COL1")

The FIRST_ROWS_1000 execution had the following execution plan for the same query:

Execution Plan
----------------------------------------------------------
Plan hash value: 4109682512

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  1001 | 44044 |    77   (2)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |  1001 | 44044 |    77   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TT1          |  9999 |   214K|    64   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IND_TT1_COL2 |  9999 |       |    26   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | TT2          | 10011 |   107K|    12   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")
   3 - access("TT1"."COL2">9990000)

We see that the optimizer has switched another execution plan from a hash join to multiple nested loops joins.
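
The general trend can be illustrated with a toy cost model.  This is not Oracle’s actual costing arithmetic – all of the constants below are invented – but it shows why asking for only the first n rows favors nested loops: a hash join pays a large up-front cost to build its hash table before the first row can be returned, while nested loops return rows incrementally at a higher per-row cost.

```python
# Toy cost model (invented constants, not Oracle's arithmetic).
def hash_join_cost(rows_needed, build_cost=295.0, per_row=0.001):
    # The hash table build cost is paid before the first row is returned,
    # no matter how few rows the client eventually fetches.
    return build_cost + rows_needed * per_row

def nested_loops_cost(rows_needed, per_row=0.5):
    # No up-front work; each row requires an index probe of the inner table.
    return rows_needed * per_row

# First 100 rows only: nested loops is far cheaper.
# All 100,000 rows: the hash join's build cost is amortized away.
for n in (100, 100000):
    print(n, hash_join_cost(n), nested_loops_cost(n))
```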

The output with FIRST_ROWS_10 and FIRST_ROWS_1 specified for the optimizer mode showed nested loops joins for all execution plans.

When compared to the new baseline, changing the OPTIMIZER_INDEX_CACHING parameter had no effect on the execution plans in the test case.
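
OPTIMIZER_INDEX_CACHING states the percentage of index blocks assumed to already be cached for the inner index of a nested loops join (and for IN-list iterators); raising it lowers the apparent cost of those index probes, which in this test case evidently was not enough to tip any of the join decisions.  A sketch of the commonly described adjustment (the rounding here is a simplification, not Oracle’s exact arithmetic):

```python
import math

def nl_inner_probe_cost(index_cost, optimizer_index_caching=0):
    # Commonly described behavior: the stated percentage of the index blocks
    # for a nested loops inner probe (or IN-list iterator) is assumed to be
    # cached, so that fraction of the index cost is scaled away.  The ceil()
    # is a simplification of Oracle's exact rounding.
    return math.ceil(index_cost * (100 - optimizer_index_caching) / 100)

# A probe costed at 4 drops to 2 at a setting of 50, and to 0 at 100.
print(nl_inner_probe_cost(4, 0), nl_inner_probe_cost(4, 50), nl_inner_probe_cost(4, 100))
```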

As we saw in the earlier tests of the previous blog articles, changing the OPTIMIZER_INDEX_COST_ADJ parameter from the default value of 100 to 50 only impacted one SQL statement’s execution plan:

SQL> SELECT
  2    TT1.COL1,
  3    TT1.COL2,
  4    TT1.COL3,
  5    TT2.COL1,
  6    TT2.COL2,
  7    TT2.COL3
  8  FROM
  9    TT1,
 10    TT2
 11  WHERE
 12    TT1.COL1=TT2.COL1
 13    AND TT1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 473932778

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     9 |   396 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |     9 |   396 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TT1          |    10 |   220 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018341 |    10 |       |     2   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TT2          |    10 |   220 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018343 |    10 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TT1"."COL1"="TT2"."COL1")
   3 - access("TT1"."COL1">=1 AND "TT1"."COL1"<=10)
   5 - access("TT2"."COL1">=1 AND "TT2"."COL1"<=10)

While the new baseline showed:

Execution Plan
----------------------------------------------------------
Plan hash value: 382692435

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |   396 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |   396 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TT1          |    10 |   220 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018341 |    10 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018343 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | TT2          |     1 |    22 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("TT1"."COL1">=1 AND "TT1"."COL1"<=10)
   5 - access("TT1"."COL1"="TT2"."COL1")
       filter("TT2"."COL1"<=10 AND "TT2"."COL1">=1)

The optimizer switched from two nested loops joins to a single hash join.  What happened when other OPTIMIZER_INDEX_COST_ADJ values were tested?  Essentially the same pattern of sort merge joins and nested loops joins appeared as we saw with the larger padding values in column COL3 of the test tables.
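
For reference, OPTIMIZER_INDEX_COST_ADJ scales the cost of index access paths to the stated percentage of the normal cost, which is why lowering it to 50 can flip a join decision: the index range scan costed at 3 in the baseline plan above shows a cost of 2 in the adjusted plan.  A hedged sketch of that scaling (the rounding is a simplification of Oracle’s arithmetic):

```python
def adjusted_index_cost(index_cost, optimizer_index_cost_adj=100):
    # OPTIMIZER_INDEX_COST_ADJ scales index access path costs to the stated
    # percentage of the normal cost; 100 (the default) changes nothing, 50
    # makes index access appear half as expensive.  Rounding is simplified.
    return max(1, round(index_cost * optimizer_index_cost_adj / 100))

print(adjusted_index_cost(3, 100), adjusted_index_cost(3, 50))  # → 3 2
```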

What is left to test?  We could switch from no-workload system (CPU) statistics to workload system (CPU) statistics.  If we execute the following to start a 30 minute statistics gathering window, while performing a variety of actions that generate physical reads during that period:

EXEC DBMS_STATS.GATHER_SYSTEM_STATS('interval',interval=>30)

We might find that the contents of SYS.AUX_STATS$ change from this:

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    1
CPUSPEEDNW      2128.85496
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM
MREADTIM
CPUSPEED
MBRC
MAXTHR
SLAVETHR

To this:

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    0
CPUSPEEDNW      2128.85496
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM          2490.746
MREADTIM          9162.743
CPUSPEED              2714
MBRC                    78
MAXTHR              302080
SLAVETHR
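
Once the gathering interval completes, the measured SREADTIM, MREADTIM, CPUSPEED, and MBRC values take precedence over the noworkload figures.  As a hedged sketch of the documented noworkload formulas: when SREADTIM and MREADTIM are NULL, the optimizer synthesizes them from IOSEEKTIM and IOTFRSPEED (the 8KB block size and the MBRC of 8 below are assumptions for illustration, not values from the listings above):

```python
# IOSEEKTIM and IOTFRSPEED are taken from the SYS.AUX_STATS$ listing above;
# db_block_size and mbrc are assumed values for illustration only.
ioseektim = 10        # ms (from AUX_STATS$)
iotfrspeed = 4096     # bytes/ms (from AUX_STATS$)
db_block_size = 8192  # assumed 8KB blocks
mbrc = 8              # assumed multiblock read count

# Documented noworkload derivations for the read timings:
sreadtim = ioseektim + db_block_size / iotfrspeed         # single-block read, ms
mreadtim = ioseektim + mbrc * db_block_size / iotfrspeed  # multiblock read, ms
print(sreadtim, mreadtim)  # → 12.0 26.0
```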

You might be wondering what happens to the execution plans with the above workload system statistics in place.  Part 4 of this series will take a look at those execution plans.





First Table is 550MB, Second Table is 26GB – Nested Loops or Full Table Scan? 2

2 10 2010

October 2, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Let’s try to test a couple of ideas that were suggested either at the end of my previous blog article, or in the comments section of that blog article.  What if we change…

Let’s try changing the Oracle Database release version, the PGA_AGGREGATE_TARGET, and the server.  That is likely too many simultaneous changes to allow identifying exactly what changed and why, but we will use this server as the baseline for today’s article.  Let’s check a couple of the settings for the database instance:

SQL> COLUMN PNAME FORMAT A15
SQL>
SQL> SELECT
  2    PNAME,
  3    PVAL1
  4  FROM
  5    SYS.AUX_STATS$;

PNAME                PVAL1
--------------- ----------
STATUS
DSTART
DSTOP
FLAGS                    1
CPUSPEEDNW      2128.85496
IOSEEKTIM               10
IOTFRSPEED            4096
SREADTIM
MREADTIM
CPUSPEED
MBRC
MAXTHR
SLAVETHR

SQL> SHOW PARAMETER OPTIMIZER

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_plan_baselines boolean     FALSE
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string      11.2.0.1
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
optimizer_mode                       string      ALL_ROWS
optimizer_secure_view_merging        boolean     TRUE
optimizer_use_invisible_indexes      boolean     FALSE
optimizer_use_pending_statistics     boolean     FALSE
optimizer_use_sql_plan_baselines     boolean     TRUE

All of the above are at the default values for Oracle Database 11.2.0.1.

SQL> SHOW PARAMETER PGA

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 2000M

The PGA_AGGREGATE_TARGET is at 2000MB (1.953GB), which is four times as high as in the previous test with Oracle Database 11.1.0.7.  If we run through the test case, what changes in the results?

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |  1238   (1)| 00:00:15 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|  1238   (1)| 00:00:15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |   330   (0)| 00:00:04 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   691   (1)| 00:00:09 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |   236   (1)| 00:00:03 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |   299   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

The above plan shows two hash joins; the second hash join listed (the first one executed) operates on the ROWIDs returned by two index fast full scans.  Under 11.1.0.7 the plan looked like this:

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2167K|       |   804   (2)| 00:00:04 |
|*  1 |  HASH JOIN                   |             |  9999 |  2167K|  2184K|   804   (2)| 00:00:04 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2060K|       |   331   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   379   (2)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

So, just one execution plan changed.  Let’s try a couple more tests, comparing the results with the new baseline.

PGA_AGGREGATE_TARGET = 1000M, no changes from the new baseline (11.2.0.1).

PGA_AGGREGATE_TARGET = 100M, no changes from the new baseline (11.2.0.1).

PGA_AGGREGATE_TARGET = 2000M, OPTIMIZER_MODE=FIRST_ROWS_1000 – there are enough changes to justify posting the entire output.

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2077217387

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  1000 |   414K|  1085   (1)| 00:00:14 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |  1000 |   414K|  1085   (1)| 00:00:14 |
|   3 |    TABLE ACCESS FULL         | T1           | 10000 |  2070K|    84   (0)| 00:00:02 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |   212 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1")

The above was a simple hash join before – telling the optimizer that we only want the first 1,000 rows was sufficient to switch from a hash join to a nested loops join.  Now let’s try again, this time joining on the column that has an index with a high clustering factor caused by the descending number sequence in that column:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000 |   621K|       |  2075   (1)| 00:00:25 |
|*  1 |  HASH JOIN         |      |  1000 |   621K|    21M|  2075   (1)| 00:00:25 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   825   (1)| 00:00:10 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |  2070K|       |    84   (0)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL2"="T2"."COL2")

This time the high index clustering factor caused by the descending number sequence in column COL2 did not trigger a change in the execution plan, unlike what happened when we joined on the column containing the ascending sequence of numbers.  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1188283739

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   999 |   413K|    70   (2)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |   999 |   413K|    70   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000 |   207K|    35   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018298 |  1000 |       |     4   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |  1000 |   207K|    34   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018300 |  1000 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=1000)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=1000)

No change in the above execution plan.  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL2 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2243940848

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |   424 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     1 |   424 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |     1 |   212 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2">=1 AND "T1"."COL2"<=1000)
   5 - access("T1"."COL1"="T2"."COL1")

No change in the above execution plan; the optimizer is correctly predicting that at most one row will be returned.  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  1002 |   218K|   135   (0)| 00:00:02 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |  1002 |   218K|   135   (0)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|    35   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |  9999 |       |     4   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"<10000)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<10000)

There is a definite change in the above execution plan, which had appeared like this in the baseline:

Execution Plan
----------------------------------------------------------
Plan hash value: 2371011080

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   260   (1)| 00:00:04 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0018300     |  9999 |   107K|    22  (10)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   299   (1)| 00:00:04 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|   325   (0)| 00:00:04 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0018298     |  9999 |       |    21   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

The predicted number of rows decreased from 9998 to 1002 due to the OPTIMIZER_MODE=FIRST_ROWS_1000 setting, and the optimizer switched from two hash joins to two nested loops joins.  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2243940848

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |   223 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     1 |   223 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |     1 |   212 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"<10000)
   5 - access("T1"."COL1"="T2"."COL1")

No change from the baseline because the optimizer previously predicted that at most one row would be returned from table T2.  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 2243940848

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  1001 |   217K|   136   (0)| 00:00:02 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |  1001 |   217K|   136   (0)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|    36   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |  9999 |       |     5   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2">9990000)
   5 - access("T1"."COL1"="T2"."COL1")

The above execution plan changed from what appeared in the baseline:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |  1238   (1)| 00:00:15 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|  1238   (1)| 00:00:15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |   330   (0)| 00:00:04 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   691   (1)| 00:00:09 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |   236   (1)| 00:00:03 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |   299   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

Convincing the optimizer that we only wanted the first 1,000 rows was sufficient to switch from two hash joins to two nested loops joins.

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1188283739

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    99 | 41976 |    14   (8)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |    99 | 41976 |    14   (8)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |   100 | 21200 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018298 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |   100 | 21200 |     6   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018300 |   100 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=100)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=100)

The above output is the same as the baseline – the query was already expected to return fewer than 1,000 rows, so we are still performing the hash join.  Continuing:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |  3816 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |  3816 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2120 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |    10 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=10 AND "T2"."COL1">=1)

The above is the same as the baseline.  So, we did see some changes when the OPTIMIZER_MODE was set to FIRST_ROWS_1000.  Any predictions about what will happen when the OPTIMIZER_MODE is set to FIRST_ROWS_100, FIRST_ROWS_10, or FIRST_ROWS_1?
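For reference, the parameter changes for the next rounds of testing may be sketched as follows – the session-level form is shown here for simplicity, while the full downloadable test script should be consulted for the exact statements used:

```sql
-- Sketch of the setup for the next tests (session-level form shown for
-- simplicity; the downloadable test script contains the exact statements).
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_100;

-- After re-running the test queries, the progressively smaller targets:
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_10;
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_1;
```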

PGA_AGGREGATE_TARGET = 2000M, OPTIMIZER_MODE=FIRST_ROWS_100, just listing the changes from when OPTIMIZER_MODE=FIRST_ROWS_1000:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 617224274

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   100 | 42400 |   303   (0)| 00:00:04 |
|   1 |  NESTED LOOPS                |             |       |       |            |          |
|   2 |   NESTED LOOPS               |             |   100 | 42400 |   303   (0)| 00:00:04 |
|   3 |    TABLE ACCESS FULL         | T2          |   100K|    20M|     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | IND_T1_COL2 |     1 |       |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1          |     1 |   212 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"="T2"."COL2")

This time, the clustering factor of the index on COL2 was not sufficient to keep the above execution plan from changing from a single hash join to two nested loops joins.
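The clustering factor mentioned above can be inspected directly in the data dictionary.  A quick sketch (the SYS_C* index names will of course differ on other systems):

```sql
-- Compare each index's clustering factor with the number of table blocks:
-- the closer CLUSTERING_FACTOR is to BLOCKS, the cheaper table access
-- through that index appears to the optimizer; a value approaching
-- NUM_ROWS makes index-driven table access look expensive.
SELECT
  I.INDEX_NAME,
  I.CLUSTERING_FACTOR,
  T.BLOCKS,
  T.NUM_ROWS
FROM
  USER_INDEXES I,
  USER_TABLES T
WHERE
  I.TABLE_NAME = T.TABLE_NAME
  AND T.TABLE_NAME IN ('T1','T2');
```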

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |   102 | 43248 |    17   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |   102 | 43248 |    17   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |  1000 |   207K|     7   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |  1000 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=1000)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=1000 AND "T2"."COL1">=1)

Telling the optimizer that we only wanted the first 1,000 rows was sufficient to change from a hash join to two nested loops joins.

PGA_AGGREGATE_TARGET = 2000M, OPTIMIZER_MODE=FIRST_ROWS_10, just listing the changes from when OPTIMIZER_MODE=FIRST_ROWS_100:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |    11 |  4664 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |    11 |  4664 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    98 | 20776 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |   100 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=100)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=100 AND "T2"."COL1">=1)

The above execution plan, which was originally predicted to return 100 rows, is now predicted to return 11 rows, so the single hash join became two nested loops joins.  At this point all of the execution plans for the test queries show nested loops joins where they previously showed hash joins – so there is little point in examining the output from the test case when the OPTIMIZER_MODE is set to FIRST_ROWS_1.

Changing the OPTIMIZER_INDEX_CACHING parameter had no effect on the execution plans when those execution plans were compared with the baseline output.
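Before looking at the next parameter, a brief note: OPTIMIZER_INDEX_COST_ADJ scales the calculated cost of index access to roughly the specified percentage of its normal value (100 is the default).  The settings tested below were presumably applied along these lines (a sketch matching the ALTER SYSTEM form visible in the transcript later in this post):

```sql
-- OPTIMIZER_INDEX_COST_ADJ scales index access costs to roughly the
-- specified percentage of their default value, so 50 halves them
-- (e.g. a cost 26 index range scan would be costed near 13).
ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=50 SCOPE=MEMORY;

-- The other values tested later in this article:
ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=20 SCOPE=MEMORY;
ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=10 SCOPE=MEMORY;
ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=1  SCOPE=MEMORY;
```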

Changing the OPTIMIZER_INDEX_COST_ADJ parameter to a value of 50 had no effect on the execution plans, except for the last one: it now shows a hash join, while the baseline showed two nested loops joins.  OPTIMIZER_INDEX_COST_ADJ=50:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 1188283739

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     9 |  3816 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |     9 |  3816 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2120 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018298 |    10 |       |     2   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |    10 |  2120 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018300 |    10 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=10)

From the baseline:

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |  3816 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |  3816 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2120 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |    10 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=10 AND "T2"."COL1">=1)

So, we saw a small change when experimenting with OPTIMIZER_INDEX_COST_ADJ, setting it to half of its normal value.  What if we try a couple more values?  Don’t say that you weren’t warned about adjusting that parameter.  With OPTIMIZER_INDEX_COST_ADJ=20:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3187704125

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   100K|    40M|       | 11880   (1)| 00:02:23 |
|   1 |  MERGE JOIN                  |              |   100K|    40M|       | 11880   (1)| 00:02:23 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000K|   202M|       |  6441   (1)| 00:01:18 |
|   3 |    INDEX FULL SCAN           | SYS_C0018298 |  1000K|       |       |   377   (1)| 00:00:05 |
|*  4 |   SORT JOIN                  |              |   100K|    20M|    43M|  5439   (1)| 00:01:06 |
|   5 |    TABLE ACCESS FULL         | T2           |   100K|    20M|       |   825   (1)| 00:00:10 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1")
       filter("T1"."COL1"="T2"."COL1")

Notice in the above that we now see a sort merge join, when before we saw a nice hash join, as seen in the baseline output:

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       | 20707   (1)| 00:04:09 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M| 20707   (1)| 00:04:09 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   825   (1)| 00:00:10 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   202M|       |  8225   (1)| 00:01:39 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    1 - access("T1"."COL1"="T2"."COL1")

What about when we join T1.COL2 to T2.COL2 – want to take a guess?

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1173042340

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   100K|    40M|       | 11979   (1)| 00:02:24 |
|   1 |  MERGE JOIN                  |             |   100K|    40M|       | 11979   (1)| 00:02:24 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  1000K|   202M|       |  6540   (1)| 00:01:19 |
|   3 |    INDEX FULL SCAN           | IND_T1_COL2 |  1000K|       |       |   476   (1)| 00:00:06 |
|*  4 |   SORT JOIN                  |             |   100K|    20M|    43M|  5439   (1)| 00:01:06 |
|   5 |    TABLE ACCESS FULL         | T2          |   100K|    20M|       |   825   (1)| 00:00:10 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"="T2"."COL2")
       filter("T1"."COL2"="T2"."COL2")

I think that this could be a problem.  What else do we see in the output?  Just like when OPTIMIZER_INDEX_COST_ADJ was set to 50, the final execution plan now shows a hash join rather than the two nested loops joins seen in the baseline output.  What happened when OPTIMIZER_INDEX_COST_ADJ was decreased to 10?  With the exception of the first two sort merge joins, the execution plans were identical to those found in the baseline output – the last execution plan switched back to two nested loops joins:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |  3816 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |  3816 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2120 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |    10 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=10 AND "T2"."COL1">=1)

So, what happens when OPTIMIZER_INDEX_COST_ADJ is set to 1, a value that I have seen recommended in various documents and a couple of forum posts?  The first two execution plans, which included sort merge joins with the other tested values of OPTIMIZER_INDEX_COST_ADJ, now appear similar to the following:

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 285382392

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   100K|    40M|  2825   (1)| 00:00:34 |
|   1 |  NESTED LOOPS                |              |       |       |            |          |
|   2 |   NESTED LOOPS               |              |   100K|    40M|  2825   (1)| 00:00:34 |
|   3 |    TABLE ACCESS FULL         | T2           |   100K|    20M|   825   (1)| 00:00:10 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0018298 |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1           |     1 |   212 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1")

No more sort merge join, and no more hash join either, even when we join on the column COL2 (with the descending values).  Any other changes when compared to the baseline output?

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3560619641

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  9998 |  2177K|     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |  9998 |  2177K|     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2           |  9999 |   107K|     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0018300 |  9999 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0018298 |  9999 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T2"."COL1"<10000)
   5 - access("T1"."COL1"<10000)

The above shows a hash join – what did the baseline show?

Execution Plan
----------------------------------------------------------
Plan hash value: 2371011080

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2177K|   586   (1)| 00:00:08 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   260   (1)| 00:00:04 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0018300     |  9999 |   107K|    22  (10)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   299   (1)| 00:00:04 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|   325   (0)| 00:00:04 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0018298     |  9999 |       |    21   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

Any other changes?

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2
  7  FROM
  8    T1,
  9    T2
 10  WHERE
 11    T1.COL1=T2.COL1
 12    AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 2243940848

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  9999 |  2177K|   103   (0)| 00:00:02 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |  9999 |  2177K|   103   (0)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |  9999 |  2070K|     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |  9999 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2">9990000)
   5 - access("T1"."COL1"="T2"."COL1")

In the baseline output the execution plan appeared as follows:

Execution Plan
----------------------------------------------------------
Plan hash value: 512758489

---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9999 |  2177K|       |  1238   (1)| 00:00:15 |
|*  1 |  HASH JOIN                   |                  |  9999 |  2177K|  2192K|  1238   (1)| 00:00:15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2070K|       |   330   (0)| 00:00:04 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2      |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   VIEW                       | index$_join$_002 |   100K|  1074K|       |   691   (1)| 00:00:09 |
|*  5 |    HASH JOIN                 |                  |       |       |       |            |          |
|   6 |     INDEX FAST FULL SCAN     | SYS_C0018300     |   100K|  1074K|       |   236   (1)| 00:00:03 |
|   7 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |   100K|  1074K|       |   299   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)
   5 - access(ROWID=ROWID)

So, we saw a bit of a change there, but at least the access cost in the plan dropped from 1238 to 103, so certainly the query must execute faster, right? 🙂  Any other changes?
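Of course, a lower calculated cost does not prove faster execution – the only way to know is to actually execute the query under each parameter setting.  A sketch (not part of the original test script) of how that check might be performed in SQL*Plus:

```sql
-- The plan cost dropping from 1238 to 103 does not guarantee a faster
-- query; execute it with timing and runtime statistics enabled and
-- compare "consistent gets" and "physical reads" between the two
-- parameter settings rather than trusting the calculated cost.
SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 > (10000000-10000);
```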

SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |    99 | 41976 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |    99 | 41976 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |   100 | 21200 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |   100 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=100)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=100 AND "T2"."COL1">=1)

The baseline output showed a hash join, with the entire cost of the execution plan at 14.

Pavan Kumar decided to try a couple of modifications to my original test case, allowing the HASH_AREA_SIZE to default to 128KB with an unset PGA_AGGREGATE_TARGET, and using a 10-character filler value in COL3 rather than a 200-character filler value in that column.  He managed to obtain at least one sort merge join in an execution plan.  You can see his test case here.  I might try a similar change to the original test case to see what happens.

You can download the full test script here: HashOrNestedLoops.sql

So, what else do we need to test?  Changing the row lengths, changing the join columns from NUMBER columns to VARCHAR2 columns, and adjusting the system (CPU) statistics from the default values to something else (hopefully realistic values) as suggested by Jan-Marten Spit.
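For the system (CPU) statistics experiment, the adjustment could presumably be made with the DBMS_STATS package.  The values below are purely illustrative, not measured – on a real system the statistics should instead be gathered from a representative workload with DBMS_STATS.GATHER_SYSTEM_STATS:

```sql
-- Hypothetical system statistics for the suggested experiment; the
-- values are illustrative only.  SET_SYSTEM_STATS manually overrides
-- the workload statistics used by the cost-based optimizer.
BEGIN
  DBMS_STATS.SET_SYSTEM_STATS('CPUSPEED', 2000);  -- CPU speed, millions of operations/sec
  DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 8);     -- single-block read time, ms
  DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 10);    -- multi-block read time, ms
  DBMS_STATS.SET_SYSTEM_STATS('MBRC', 8);         -- average multi-block read count
END;
/
```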





Disabling AUTOTRACE Causes “SP2-0575: Use of Oracle SQL feature not in SQL92 Entry Level” on 11.2.0.1

2 10 2010

October 2, 2010

I encountered a fun problem today while performing a couple of tests with the AUTOTRACE feature in SQL*Plus.  I am using a 32 bit 11.2.0.1 Oracle client connected to a 64 bit 11.2.0.1 Oracle database.  Roughly 50% of the time when I issue “SET AUTOTRACE OFF” (or “set autotrace off”) I receive the SP2-0575 error and my session is disconnected from the database; the failures appear to happen at random when disabling AUTOTRACE.  An interesting problem, I think.  Here is a partial output from the script:

SQL> ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=10 SCOPE=MEMORY;

System altered.

SQL> ALTER SYSTEM FLUSH SHARED_POOL;

System altered.

SQL> HOST sleep 10
SQL>
SQL>SPOOL testresults11201CostAdj10.txt
SQL>
SQL> SHOW PARAMETER OPTIMIZER

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
optimizer_capture_sql_plan_baselines boolean     FALSE
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string      11.2.0.1
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     10
optimizer_mode                       string      ALL_ROWS
optimizer_secure_view_merging        boolean     TRUE
optimizer_use_invisible_indexes      boolean     FALSE
optimizer_use_pending_statistics     boolean     FALSE
optimizer_use_sql_plan_baselines     boolean     TRUE
SQL> SET AUTOTRACE TRACEONLY EXPLAIN
SQL>
(several select statements similar to the following were executed, see yesterday's article for the complete list)
SQL> SELECT
  2    T1.COL1,
  3    T1.COL2,
  4    T1.COL3,
  5    T2.COL1,
  6    T2.COL2,
  7    T2.COL3
  8  FROM
  9    T1,
 10    T2
 11  WHERE
 12    T1.COL1=T2.COL1
 13    AND T1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 1945774515

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |  3816 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |  3816 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2120 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0018298 |    10 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0018300 |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   212 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=10 AND "T2"."COL1">=1)

SQL>
SQL> SPOOL OFF
SQL> set autotrace off
SP2-0575: Use of Oracle SQL feature not in SQL92 Entry Level.
SQL>
SQL> ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=1 SCOPE=MEMORY;
ERROR:
ORA-03114: not connected to ORACLE

The problem is not specific to the modification of the OPTIMIZER_INDEX_COST_ADJ parameter, and the error does not always happen after changing a specific parameter. 

Metalink (MOS) Doc ID 563269.1 does not apply:

SQL> select owner, object_type from all_objects where object_name='DUAL';

OWNER      OBJECT_TYPE
---------- -------------------
SYS        TABLE
PUBLIC     SYNONYM

FIPS compliance checking (FLAGGER) (reference 2) should not apply:

SQL> show flagger
flagger OFF

As I mentioned, an interesting problem – yet a bit of a headache inducer when trying to use AUTOTRACE.
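Until the cause is identified, one possible way to sidestep the problem (an untested sketch on my part, not a fix for the SP2-0575 error itself) is to avoid AUTOTRACE entirely and use EXPLAIN PLAN with DBMS_XPLAN, so there is never a need to issue SET AUTOTRACE OFF:

```sql
-- Sketch: obtain the same estimated plan without AUTOTRACE
EXPLAIN PLAN FOR
SELECT
  T1.COL1,
  T2.COL1
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 10;

-- Display the plan just explained, at the default detail level
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'TYPICAL'));
```

Of course this only produces the estimated execution plan; AUTOTRACE TRACEONLY STATISTICS output would still require AUTOTRACE (or a 10046 trace).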





First Table is 550MB, Second Table is 26GB – Nested Loops or Full Table Scan?

1 10 2010

October 1, 2010

(Forward to the Next Post in the Series)

A couple of days ago I wrote a blog article about creating test case scripts to test ideas.  In a recent OTN thread the original poster asked why his value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter was not being honored.  The OP stated that his DB_FILE_MULTIBLOCK_READ_COUNT parameter was set to 128, yet when reviewing a 10046 trace at level 8 he saw that a maximum of 16 blocks were read from disk at a time while performing full table scans.  So, what might be limiting the multi-block reads to 16 blocks (I believe that one of my previous blog articles mentioned the limiting factors)?  This would be a great use for a test case script, although with 26GB and 550MB tables you might want to consider reducing the size of the tables when building the test case script.
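If such a test case script were built, one way to observe the actual multi-block read size is to enable a 10046 trace at level 8 and examine the wait events, as the OP did.  A sketch (the tracefile identifier and the FULL hint are my additions for convenience):

```sql
SHOW PARAMETER DB_FILE_MULTIBLOCK_READ_COUNT

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'MBRC_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

-- Force a full table scan of the test table
SELECT /*+ FULL(T1) */
  COUNT(*)
FROM
  T1;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
```

In the resulting trace file, the p3 value of each ‘db file scattered read’ wait event shows how many blocks were read by that multi-block read; comparing p3 with the parameter value shown above reproduces the OP’s observation.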

Interesting; however, the unexpectedly low multi-block read size is not the subject of this blog article.  Someone mentioned in the OTN thread that when joining the 550MB table to the 26GB table, a nested loops join should be used because hashing could take a lot of time.  I was not expecting to see a comment like that, so I started to wonder: could it be true?  I wonder if one of my earlier blog articles touched on this idea?  Maybe I overlooked something in the documentation:

“The database uses hash joins to join large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

11.3.4.1 When the Optimizer Uses Hash Joins
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

  • A large amount of data must be joined.
  • A large fraction of a small table must be joined.”

Does the situation change if there is no index on the join columns in either table?  Does the situation change if there is no index on any of the predicates in the WHERE clause?  Should Oracle Database repeatedly full scan the 550MB or the 26GB table while performing a nested loops join?  What is a large fraction of a small table, 75% of the rows, 50% of the rows, 10% of the rows, 1% of the rows?  Maybe the use of a nested loops join in such a case needs a bit more consideration.  Of course we could create a test case script to determine if this statement in the OTN thread is correct, although we may need to hint the join method in the SELECT statement to test the performance with both join methods.
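If we did want to test the claim directly, the join method could be forced with hints and the two executions timed.  A sketch using the T1 and T2 tables created below (the hints, the LEADING order, and the COUNT wrapper are my additions, chosen only to make the timing comparison easy):

```sql
SET TIMING ON

-- Force a nested loops join driven by the "small" table T2
SELECT /*+ LEADING(T2) USE_NL(T1) */
  COUNT(T1.COL3)
FROM
  T2,
  T1
WHERE
  T1.COL1=T2.COL1;

-- Force a hash join with the same driving order
SELECT /*+ LEADING(T2) USE_HASH(T1) */
  COUNT(T1.COL3)
FROM
  T2,
  T1
WHERE
  T1.COL1=T2.COL1;
```

Wrapping the select list in a COUNT eliminates the client fetch time from the comparison, at the risk of permitting optimizations (such as index-only access) that would not apply to the original query.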

The same person later suggested that if the first table is small (I guess that the 550MB table is the small one) and the driven table (I suspect that the 26GB table matches this description) contains an index on the join column, Oracle’s optimizer will favor nested loops joins over hash joins.  It might be interesting to construct a test case to see if this is true (Oracle Database 11.1.0.7 used for the test case results displayed below).  To save time I will create tables which are a bit smaller than the original 550MB and 26GB tables, which might throw off the test case a little in favor of nested loops joins, but I will keep an eye on the output for problems.  In my test case script, table T1 will be my “large” table and table T2 will be my “small” table.

CREATE TABLE T1 (
  COL1 NUMBER,
  COL2 NUMBER,
  COL3 VARCHAR2(200),
  PRIMARY KEY(COL1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),200,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

ALTER TABLE T1 MODIFY(COL2 NOT NULL);

CREATE INDEX IND_T1_COL2 ON T1(COL2);

CREATE TABLE T2 (
  COL1 NUMBER,
  COL2 NUMBER,
  COL3 VARCHAR2(200),
  PRIMARY KEY(COL1));

INSERT INTO
  T2
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),200,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=100000);

COMMIT;

ALTER TABLE T2 MODIFY(COL2 NOT NULL);

CREATE INDEX IND_T2_COL2 ON T2(COL2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)

We now have two simple tables, each with a column containing an ascending number sequence, a column containing a descending number sequence, and a 200 byte padding column to discourage full table scans.  Now let’s perform a quick test:

SET AUTOTRACE TRACEONLY EXPLAIN
SET LINESIZE 140
SET PAGESIZE 1000
SET TRIMSPOOL ON

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       | 28696   (1)| 00:01:57 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M| 28696   (1)| 00:01:57 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   379   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   201M|       | 23406   (1)| 00:01:35 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")

Two full table scans and a hash join.  If we retrieve all of the columns and all of the rows, the optimizer apparently will not favor nested loops joins.  Let’s try again with a slightly different query, this time using the column with the descending number sequence as the join column:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL2=T2.COL2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    40M|       | 28696   (1)| 00:01:57 |
|*  1 |  HASH JOIN         |      |   100K|    40M|    21M| 28696   (1)| 00:01:57 |
|   2 |   TABLE ACCESS FULL| T2   |   100K|    20M|       |   379   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   201M|       | 23406   (1)| 00:01:35 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL2"="T2"."COL2")

Still a hash join and two full table scans.  Again, the optimizer did not favor nested loops joins.  Let’s try another query that specifies a range of values for the column with the ascending values:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1754270684

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |   999 |   411K|    72   (2)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |   999 |   411K|    72   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |  1000 |   206K|    37   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0017027 |  1000 |       |     4   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |  1000 |   206K|    34   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0017028 |  1000 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=1000)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=1000)

Still a hash join, although the indexes on both tables were used to reduce the number of rows entering the hash join.  So, even when selecting just 1% of the rows from table T2 (1,000 / 100,000), the optimizer still chose to perform a hash join.  Let’s try another SQL statement, this time putting the restriction on the column with the descending values (note that this query, if executed, will not return any rows because the values in column T1.COL2 range from 9,000,000 to 9,999,999 and therefore never fall into the range between 1 and 1,000):

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 BETWEEN 1 AND 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 596667293

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |   422 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     1 |   422 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |     1 |   211 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0017028 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   211 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2">=1 AND "T1"."COL2"<=1000)
   5 - access("T1"."COL1"="T2"."COL1")

As you can probably tell from the above plan, the optimizer believes that 1 or fewer rows will be returned, so the optimizer changed from performing a single hash join to performing two nested loops joins.  Let’s try another query, this time we will not retrieve column T2.COL3 (the large padding column) and change the WHERE clause to pick up all COL1 values less than 10,000:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 218283169

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  9998 |  2167K|   666   (1)| 00:00:03 |
|*  1 |  HASH JOIN                   |                  |  9998 |  2167K|   666   (1)| 00:00:03 |
|*  2 |   VIEW                       | index$_join$_002 |  9999 |   107K|   323   (2)| 00:00:02 |
|*  3 |    HASH JOIN                 |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN         | SYS_C0017028     |  9999 |   107K|    22  (10)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN     | IND_T2_COL2      |  9999 |   107K|   301   (1)| 00:00:02 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1               |  9999 |  2060K|   343   (1)| 00:00:02 |
|*  7 |    INDEX RANGE SCAN          | SYS_C0017027     |  9999 |       |    21   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   2 - filter("T2"."COL1"<10000)
   3 - access(ROWID=ROWID)
   4 - access("T2"."COL1"<10000)
   7 - access("T1"."COL1"<10000)

The optimizer again did not choose nested loops joins, instead performing two hash joins (one of which is an index join of T2’s primary key index and IND_T2_COL2).

Let’s put the restriction on column COL2 and try again:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 < 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 596667293

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |   222 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     1 |   222 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |     1 |   211 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_COL2  |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0017028 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |    11 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL2"<10000)
   5 - access("T1"."COL1"="T2"."COL1")

The optimizer predicted that a single row would be returned, so it decided to use two nested loops joins rather than a hash join.  Let’s try again, but this time fix the SQL statement so that it will return rows if executed:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL2 > (10000000-10000);

Execution Plan
----------------------------------------------------------
Plan hash value: 570481587

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  9999 |  2167K|       |   804   (2)| 00:00:04 |
|*  1 |  HASH JOIN                   |             |  9999 |  2167K|  2184K|   804   (2)| 00:00:04 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1          |  9999 |  2060K|       |   331   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IND_T1_COL2 |  9999 |       |       |    26   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | T2          |   100K|  1074K|       |   379   (2)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL2">9990000)

Back to a hash join again, so there is nothing magic about the column with the descending number sequence that caused the earlier nested loops joins; the optimizer’s low cardinality estimate was the trigger.  Let’s try again, this time retrieving only 100 rows, rather than 1,000 or 9,999:

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 100;

Plan hash value: 1754270684

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    99 | 41778 |    14   (8)| 00:00:01 |
|*  1 |  HASH JOIN                   |              |    99 | 41778 |    14   (8)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |   100 | 21100 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0017027 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2           |   100 | 21100 |     6   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | SYS_C0017028 |   100 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."COL1"="T2"."COL1")
   3 - access("T1"."COL1">=1 AND "T1"."COL1"<=100)
   5 - access("T2"."COL1">=1 AND "T2"."COL1"<=100)

A hash join again.  What if we decrease the number of rows from 100 to 10?

SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2,
  T2.COL3
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 10;

Execution Plan
----------------------------------------------------------
Plan hash value: 2528765105

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     9 |  3798 |     5   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |     9 |  3798 |     5   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1           |    10 |  2110 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0017027 |    10 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | SYS_C0017028 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2           |     1 |   211 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1">=1 AND "T1"."COL1"<=10)
   5 - access("T1"."COL1"="T2"."COL1")
       filter("T2"."COL1"<=10 AND "T2"."COL1">=1)

Apparently, dropping the number of rows to 10 (0.01% of the rows in table T2, if my calculations are correct) was sufficient to make the optimizer switch from a hash join to two nested loops joins.
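The AUTOTRACE output only shows the optimizer’s estimated row counts.  To check whether those estimates match the actual row counts, the query could be re-executed with the GATHER_PLAN_STATISTICS hint and the plan retrieved with DBMS_XPLAN.DISPLAY_CURSOR.  A sketch (the hint and format string are my additions; SERVEROUTPUT must be off so that DISPLAY_CURSOR picks up the query’s cursor rather than the DBMS_OUTPUT call’s cursor):

```sql
SET AUTOTRACE OFF
SET SERVEROUTPUT OFF

SELECT /*+ GATHER_PLAN_STATISTICS */
  T1.COL1,
  T2.COL1
FROM
  T1,
  T2
WHERE
  T1.COL1=T2.COL1
  AND T1.COL1 BETWEEN 1 AND 10;

-- Show estimated (E-Rows) next to actual (A-Rows) for the last execution
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Large differences between the E-Rows and A-Rows columns would suggest that the join method was selected based on a faulty cardinality estimate.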

I will now ask the question, “If the first table is small and the driven table contains an index on the join column, will Oracle’s optimizer favor nested loops joins over hash joins?”  Could someone generate a test case that shows just the opposite of the results found in my test case?  Does the answer change if I were to repeat the test case script on Oracle Database 8.1.7.4 or 11.2.0.2?  What if instead of one 200 byte padding column there were fifty 10 byte padding columns, and the query only selected one of those padding columns from each table?  What if instead of one 200 byte padding column there was only a single 10 byte padding column?  Is it safe to make a generalization at this point?
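One of the suggested variations could be sketched as follows: a copy of T1 with a single 10 byte padding column rather than a 200 byte padding column (untested; the table name T3 is my invention, and my assumption is that the shorter rows pack many more rows per block, reducing the full table scan cost and possibly shifting the tipping point between the join methods):

```sql
-- Same 1,000,000 row population as T1, but with a 10 byte padding column
CREATE TABLE T3 (
  COL1 NUMBER,
  COL2 NUMBER,
  COL3 VARCHAR2(10),
  PRIMARY KEY(COL1));

INSERT INTO
  T3
SELECT
  ROWNUM,
  10000000-ROWNUM,
  RPAD(TO_CHAR(ROWNUM),10,'A')
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

COMMIT;

ALTER TABLE T3 MODIFY(COL2 NOT NULL);

CREATE INDEX IND_T3_COL2 ON T3(COL2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
```

Repeating the earlier queries with T3 substituted for T1 would show whether the row length, rather than the row count, is driving the optimizer’s join method decisions.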