What Could Go Wrong? Testing the DB_FILE_MULTIBLOCK_READ_COUNT Parameter

14 10 2011

October 14, 2011 (Modified October 15, 2011)

When reading various Oracle Database books, I often perform mental exercises related to the book’s contents, asking “what could go wrong?”   That is probably why some of my book reviews have extended to record setting lengths… who knew that Amazon imposed a limit on book review lengths…

I recently started re-reading the “Troubleshooting Oracle Performance” book after buying a PDF copy of the book.  As previously mentioned, the ultimate reason for re-reading the book is so that I may compose a detailed book review, just as I have done for most of the Oracle Database related books that I have read in the past couple of years.  So far, I have had few opportunities to ask “what could go wrong” – until of course I made it to page 178 during the second read-through of the book.  That page includes a PL/SQL block that tests the performance effects of modifying the DB_FILE_MULTIBLOCK_READ_COUNT parameter from a value of 1 to 32, increasing the value by 1 for each pass through the loop in the PL/SQL block.  What could go wrong?

Let’s build a sample table with 11,000,000 rows:

DROP TABLE T1 PURGE;

CREATE TABLE T1(
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 DATE,
  C5 DATE,
  C6 DATE,
  C7 VARCHAR2(20),
  C8 VARCHAR2(20),
  C9 VARCHAR2(20));

INSERT INTO
  T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  TRUNC(SYSDATE)+TRUNC(ROWNUM/100),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A')),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'))
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

INSERT INTO
  T1
SELECT
  T1.*
FROM
  T1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)

The above script might take a minute or two to complete.  Let’s take a look at the extent sizes for table T1:

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('T1')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT    EXTENTS EXT_SIZE_KB   TOTAL_MB
------- ---------- ----------- ----------
T1              16          64          1
T1              63        1024         63
T1             120        8192        960
T1               2       65536        128

Now let’s try a slightly modified version of the script that appears in the book:

SET SERVEROUTPUT ON
SET TIMING ON

DECLARE
  L_COUNT PLS_INTEGER;
  L_TIME PLS_INTEGER;
  L_STARTING_TIME PLS_INTEGER;
  L_ENDING_TIME PLS_INTEGER;
BEGIN
  DBMS_OUTPUT.PUT_LINE('DBFMBRC SECONDS');
  FOR L_DBFMBRC IN 1..128
  LOOP
    EXECUTE IMMEDIATE 'ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT='||L_DBFMBRC;
    L_STARTING_TIME := DBMS_UTILITY.GET_TIME();
    SELECT /*+ FULL(T) */ COUNT(*) INTO L_COUNT FROM T1 T;
    L_ENDING_TIME := DBMS_UTILITY.GET_TIME();
    L_TIME := ROUND((L_ENDING_TIME-L_STARTING_TIME)/100);
    DBMS_OUTPUT.PUT_LINE(L_DBFMBRC||' '||L_TIME);
  END LOOP;
END;
/

The output that I received when tested on Oracle Database 11.2.0.2 follows:

DBFMBRC SECONDS
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 0
41 0
42 0
43 0
44 0
45 0
46 0
47 0
48 0
49 0
50 0
51 0
52 0
53 0
54 0
55 0
56 0
57 0
58 0
59 0
60 0
61 0
62 0
63 0
64 0
65 0
66 0
67 0
68 0
69 0
70 0
71 0
72 0
73 0
74 0
75 0
76 0
77 0
78 0
79 0
80 0
81 0
82 0
83 0
84 0
85 0
86 0
87 0
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 0
97 0
98 0
99 0
100 0
101 0
102 0
103 0
104 0
105 0
106 0
107 0
108 0
109 0
110 0
111 0
112 0
113 0
114 0
115 0
116 0
117 0
118 0
119 0
120 0
121 0
122 0
123 0
124 0
125 0
126 0
127 0
128 0

PL/SQL procedure successfully completed.

What Could Go Wrong?  (I had two or three answers for this question as I was reading the script, but I have a couple more after a bit of experimentation.  Think about what might go wrong without looking in the book.)

Added October 15, 2011:

The “Troubleshooting Oracle Performance” book, to my knowledge, was the first performance tuning book to address Oracle Database 11.1, and this book was released shortly after the release of Oracle Database 11.1.0.6.  While it was not widely known at the time (I do not think that it was documented), Oracle Database 11.1.0.6 might employ serial direct path reads (directly into the PGA) rather than db file scattered reads (into the buffer cache) when performing a full table scan.  Based on an experiment, the DB_FILE_MULTIBLOCK_READ_COUNT parameter influences the maximum number of blocks read in a single direct path read request; the maximum number of blocks read by direct path read appears to be the greatest power of two that is equal to or less than the value of the the DB_FILE_MULTIBLOCK_READ_COUNT parameter.  The paragraphs that follow this test case in the book suggest that a parallel full table scan should not be utilized – that implies that the book author is aware that direct path reads might distort the outcome of the test case.  This was the first thought that I had when viewing this test case when re-reading the book.

The effects of the table blocks that are already buffered in the buffer cache is also a potential area where this test case might see distorted results (the paragraphs that follow this test case in the book address this issue).  The test table is roughly 1,152MB in size (minus the blocks that are above the high water mark, of course), so how likely is it that some of the table blocks might be cached in the buffer cache either before the first execution of the SELECT from the table T1, or between executions of the SELECT from T1?  Does the answer depend on what was performed prior to the test case – how full is the buffer cache with unrelated blocks?  What is possibly interesting is that in my test case, all of the blocks belonging to the entire table were buffered in the buffer cache prior to the start of the PL/SQL block in the test case.  Without the timing information displayed by SET TIMING ON, it is difficult to determine if the DBMS_UTILITY.GET_TIME function simply is not supported on the operating system platform used for the test, or if as Timur stated in the comments section, the full table scans were completing so quickly that the timing of the individual runs was being rounded to 0 seconds.  I was careful not to include the output of the SQL*Plus timing during the initial posting of the blog article.  I obtained the following times from repeated PL/SQL executions:

Elapsed: 00:00:39.37

Elapsed: 00:00:37.05

Elapsed: 00:00:37.14

With a 10046 trace enabled, the execution time climbed sharply:

...
125 1
126 1
127 1
128 1

PL/SQL procedure successfully completed.

Elapsed: 00:01:40.16

The potential buffering of some or all of the table blocks was the second thought that I had while reading the test case during the re-read of the book.  The paragraph in the book that followed the test case addressed the potential problems of the table blocks being cached in the buffer cache, the operating system cache, a SAN cache, or a similar cache.  As previously mentioned, Oracle Database 11.1.0.6 and later are able to perform serial direct path reads, where previous Oracle Database versions would have used db file scattered reads, so the Oracle Database version is potentially important.  The book previously mentioned on page 175 three items that might cause fewer blocks to be read than what is specified by the DB_FILE_MULTIBLOCK_READ_COUNT parameter – those items might impact the results of the test, if the test table is not representative of the other objects in the database that might be read by a db file scattered read.

While reading the test case, another thought raced through my mind: why is the author experimenting with the DB_FILE_MULTIBLOCK_READ_COUNT parameter – just leave it unset and make certain that system statistics are collected so that the MBRC system statistic is used (I believe that this is related to Mohamed’s second point in the comments section).  Since the release of Oracle Database 10.2.0.1 it is possible to leave that parameter unset (and a bug was fixed in 10.2.0.4 that then allowed the parameter to be set to 0 which then had the same effect as leaving it unset).  That is the point that Mohamed made in the comments section.  The book author covered the possibility of leaving the DB_FILE_MULTIBLOCK_READ_COUNT parameter unset in the paragraphs that followed the test case, and since this book attempts to help with performance problems in Oracle 9.0.1 and later, it probably is a good idea that the test case was included to benefit those who had not yet transitioned all of the databases to Oracle Database 10.2 or later.

Result caching, mentioned by Steve Catmull in the comments section, is not something that I had considered while reading the test case.  If I remember correctly, when the RESULT_CACHE_MODE parameter is set to FORCE, the result cache could distort the results of the test case.

I thought that I would include a couple of more test cases on 11.2.0.2.
With an unset DB_FILE_MULTIBLOCK_READ_COUNT and the following script:

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 2280 file number=4 first dba=3951616 block cnt=128 obj#=71407 tim=366050651386
WAIT #362523544: nam='direct path read' ela= 2323 file number=4 first dba=3951744 block cnt=128 obj#=71407 tim=366050654201

The above shows the types of wait events that I expected to have when I read the PL/SQL block.

Let’s try again with a specified value for DB_FILE_MULTIBLOCK_READ_COUNT:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=20;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 272 file number=4 first dba=4061184 block cnt=16 obj#=71407 tim=366357357469
WAIT #362523544: nam='direct path read' ela= 372 file number=4 first dba=4061216 block cnt=16 obj#=71407 tim=366357357889

In the above, note that the direct path read wait did not entirely adhere to the value for the modified DB_FILE_MULTIBLOCK_READ_COUNT, the actual number of blocks was rounded down to the next lower power of 2.

Let’s try again with another value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=100;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 1206 file number=4 first dba=4050624 block cnt=64 obj#=71407 tim=366563051144
WAIT #362523544: nam='direct path read' ela= 1144 file number=4 first dba=4050688 block cnt=64 obj#=71407 tim=366563052537

In the above, note that the direct path read wait did not entirely adhere to the value for the modified DB_FILE_MULTIBLOCK_READ_COUNT, the actual number of blocks was rounded down to the next lower power of 2.

Let’s try again with another value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=120;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 1183 file number=4 first dba=4047232 block cnt=64 obj#=71407 tim=366628708554
WAIT #362523544: nam='direct path read' ela= 1280 file number=4 first dba=4047296 block cnt=64 obj#=71407 tim=366628710113

Just for verification:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=128;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

Wait events:

WAIT #362523544: nam='direct path read' ela= 2505 file number=4 first dba=4050560 block cnt=128 obj#=71407 tim=366681412987
WAIT #362523544: nam='direct path read' ela= 2259 file number=4 first dba=4050688 block cnt=128 obj#=71407 tim=366681415722

The above again shows that the number of blocks read by direct path read are rounded down to the nearest power of two.

OK, let’s try the initial test again:

DISCONNECT ALL;
CONNECT ...

Instead of using a PL/SQL block, let’s use regular SQL statements:

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>100)

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

The results, “direct path read” wait events for both of the selects from table T1:

...
WAIT #357271008: nam='direct path read' ela= 2774 file number=4 first dba=3930496 block cnt=128 obj#=71407 tim=371288846837
WAIT #357271008: nam='direct path read' ela= 2353 file number=4 first dba=3930624 block cnt=128 obj#=71407 tim=371288849736
...
WAIT #357239640: nam='direct path read' ela= 2749 file number=4 first dba=3935360 block cnt=128 obj#=71407 tim=371292356287
WAIT #357239640: nam='direct path read' ela= 2580 file number=4 first dba=3935488 block cnt=128 obj#=71407 tim=371292359340
...

Other testing showed that if the buffer cache was flushed after creating and populating the table, but before collecting statistics, Oracle Database 11.2.0.2 performed serial direct path reads to read the table blocks, rather than reading those blocks into the buffer cache using db file scattered reads.