_SMALL_TABLE_THRESHOLD Parameter and Buffer Cache – What is Wrong with this Quote?

June 17, 2010

Warning: this article is certain to bring tears of pain to your eyes while you attempt to determine what is right and what is wrong.

While reading the June 2010 printing of the “Oracle Performance Firefighting” book I found a couple of interesting lines about the buffer cache and the effects, or lack thereof, of the _SMALL_TABLE_THRESHOLD parameter.  This quote is from page 208 (it is a long quote, needed to preserve the context of the discussion topic - please excuse any typos):

“The single most radical departure from the modified LRU algorithm [introduced in Oracle Database 6, per the book] is known as midpoint insertion [introduced in Oracle Database 8.1.5, per the book]. Each LRU chain is divided into a hot and cold region. When a buffer is read from disk and a free buffer has been found, the buffer and buffer header replace the previous buffer and buffer header contents, and then the buffer header is moved to the LRU chain midpoint. Single-block read, multiblock read, fast-full scan, or full-table scan—it makes no difference. The buffer header is not inserted at the MRU end of the LRU chain, but rather at the midpoint…

Because the window scheme used in the modified LRU algorithm is no longer used, the hidden instance parameter _small_table_threshold became deprecated. However, in Oracle Database 11g, it is being used again, but for a different purpose. Starting with this version, the _small_table_threshold parameter is the threshold for a server process to start issuing direct reads.”

Before deciding what about the above is correct or incorrect, take a look at Metalink Doc ID 787373.1.  A couple of words from that document to encourage you to take a look at the original material:

“Applies to Version: 9.0.1.0 to 11.2.0.1.0: When loading objects and data into the buffer cache… best way to load for full scans. This functionality is called smallness logic and is driven by a hidden parameter called: _small_table_threshold. … Oracle will load the object via the buffer cache as this is more efficient than doing a direct read operation.”

Before deciding what about the above Metalink document is correct or incorrect, take a look at this AskTom article.  A couple of words from that article:

“no, you are not correct on your guess about the [_SMALL_TABLE_THRESHOLD] parameter, it controls the caching of the blocks (whether they are cached like a single block IO or cached as we cache full scan blocks) – not the method of IO.”

Before deciding what about the above AskTom article is still correct, take a look at this blog article.  A couple of words from that article:

“When direct path reads starts to happen?  It is known that somewhat reliable figure is your _small_table_threshold multiplied by 5 (mentioned by Tanel Poder on oracle-l recently).”
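The multiplied-by-5 figure above can be sanity-checked with trivial arithmetic. This is a rough sketch only: the 2% default for _small_table_threshold and the 5x multiplier are claims quoted in this article, not documented behavior, and the cache size is hypothetical.

```python
def small_table_threshold(buffer_cache_blocks: int) -> int:
    """Assumed default: roughly 2% of the buffer cache, in blocks."""
    return int(buffer_cache_blocks * 0.02)

def direct_read_cutoff(buffer_cache_blocks: int) -> int:
    """The empirically observed figure: ~5x _small_table_threshold."""
    return 5 * small_table_threshold(buffer_cache_blocks)

cache_blocks = 100_000                      # hypothetical buffer cache size
print(small_table_threshold(cache_blocks))  # 2000
print(direct_read_cutoff(cache_blocks))     # 10000 (i.e. ~10% of the cache)
```

Note that 5x a 2% threshold works out to roughly 10% of the buffer cache, a figure that reappears in the comments below.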

One final blog article, from the comments section:

“A table is deemed ‘small’ by Oracle if it’s 2% or less of the buffer cache in size. A small table is ‘cached’ when read via a FTS and is not immediately loaded into the least recently used end of the LRU list as becomes a ‘large’ table read via a FTS…”

“When a block is read from a ‘large’ table via a FTS, the blocks are basically loaded into the LRU end of the LRU list, even though they’ve only just been loaded into the buffer cache. As such, the chances of them remaining in the buffer cache is minimal and will likely be immediately overwritten once processed.

However, when a block is read via an index, the blocks are loaded somewhere near the ‘middle’ of the LRU list, not at the LRU end of the list…”
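The claims in this last quote can be condensed into a small decision sketch. To be clear, this illustrates the commenter's description, not a statement of Oracle's actual algorithm; the function name and all numbers are hypothetical.

```python
# Sketch of the LRU behaviour claimed in the quote above (the 2% "small"
# figure and the insertion points are the commenter's claims, not
# documented API; all numbers are hypothetical).

def insertion_point(read_type: str, table_blocks: int, cache_blocks: int) -> str:
    small_limit = cache_blocks * 0.02          # "small" = 2% of the cache or less
    if read_type == "index":
        return "midpoint"                      # index reads go near the middle
    if read_type == "fts":
        # small FTS: cached normally; large FTS: dumped at the LRU end
        return "cached" if table_blocks <= small_limit else "lru_end"
    raise ValueError(read_type)

cache = 50_000                                   # hypothetical cache, in blocks
print(insertion_point("index", 5_000, cache))    # midpoint
print(insertion_point("fts", 900, cache))        # cached
print(insertion_point("fts", 5_000, cache))      # lru_end
```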

What, if anything, is wrong with the above quote from the book?  Bonus question, what, if anything, is wrong with the Metalink document?  Maybe there is a problem with the other quotes? 

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.



26 responses

17 06 2010
Charles Hooper

Maybe it would help to start thinking about and discussing what is not quite right with the Metalink document? Anything suspicious about the Oracle release versions that are mentioned?

17 06 2010
Kyle Hailey

I’m mainly a hacker, which isn’t very scientific, but that’s the way the ball bounces. As far as hacking goes, it seemed pretty likely that Oracle was using something like the small table threshold in 8.1.6, after midpoint insertion started, to read FTS blocks into the cold end of the LRU.
Back in 8.1.6 we could have 50 blocks in the buffer cache (hard to imagine now) and with a program to attach to the SGA and read the memory directly with C code and a screen with 50 lines, I could watch the evolution of the buffer cache. It was pretty easy to see mid point insertion, block promotion to the hot end and to see that FTS above a certain size went into the cold end, not the mid point, of the LRU.
I have to dig out this program and try it again. Alas, with segmented buffer cache, the program needs some modifications now on 9i and above.
I also would have to segment the display to be just parts of the hot, midpoint and cold end as the minimum size of the buffer cache is well beyond 50 blocks these days.
Best
Kyle Hailey
http://db-optimizer.blogspot.com/

17 06 2010
Charles Hooper

Kyle,

First, sorry for the delay in your comment appearing. Thank you for the information regarding 8.1.6. Attaching directly to the SGA would likely provide a different degree of insight into exactly what happens during a full table scan of a large table, compared to looking at the information presented through Oracle’s externalized view of what happens.

17 06 2010
Jonathan Lewis

Charles,

The “_small_table_threshold”, small tables, and buffering limits for tables are all on my “todo” list; especially after reading a post from Dion Cho ( http://dioncho.wordpress.com/tag/full-table-scan/ ) where he points out that the _small_table_threshold has probably not been the limit at which the LRU algorithm switches between end-point (for “long” tablescans) and mid-point insertion (for “short” tablescans).

I note that the metalink note seems to be claiming that the serial direct read (of 11.2) appeared in 9.0 – and it also references a test case that doesn’t exist, so perhaps a lot of the note has gone missing through the group editing effort. Certainly there’s a lot of change to worry about – covering:

a) _small_table_threshold used to dictate when a tablescan was considered so large that it was recycled to the end of the cache. (But as Dion Cho shows, this ceased to be true a long time ago).
b) _small_table_threshold is still 2% (or something to be decided) and is still used to dictate when a tablescan is reported in the instance and session stats as a “short” tablescan
c) Even a “long” tablescan will be read in to the mid-point in lots of newer versions of Oracle (even before we get to 11.2 and the serial direct) – but it is not easy to determine the rules about how Oracle decides when a table is too long to go to the mid-point. The problem of testing is made harder because the results are affected by the current actual location of the “mid-point” and the number of free (i.e. status = 0) buffers.
d) Alex seems to have done some sound work on investigating the relationship in 11g between the _small_table_threshold and the use of serial direct reads – but again it’s hard to test since the current content of the cache seems to affect the decision.

There is a lovely Powerpoint presentation on Julian Dyke’s website that has related information with some wonderful graphics to help understand how buffer headers are used and moved around in the less recent versions of Oracle. Unfortunately, he does not happen to have any comments about the purpose of the _small_table_threshold.

17 06 2010
Charles Hooper

Jonathan,

I appreciate the comments, and the enhanced sense of direction provided by those comments.

Oracle Database 10.2 lists a description of “threshold level of table size for direct reads” for the _SMALL_TABLE_THRESHOLD parameter. I wonder if this only applies to parallel execution for this release? I cannot say that I have witnessed a full tablescan in 10.2 performing direct reads without parallel execution enabled (but maybe someone else has).

K Gopalakrishnan, in this thread from 2007 (you participated in the thread also), stated that the parameter behaved the same in 9i as in 10g R2:
http://www.freelists.org/post/oracle-l/os-cache-vs-db-cache,18 (there were related posts above and below this post in the list at the bottom of the page):

“In 9i and above this parameter is used as a threshold (number of blocks) eligible for direct reads. Any tables below this threshold will go thru the buffer cache during direct read.”

I read somewhere last night that the caching behavior involving long full table scans (and possibly the insertion point into the LRU) would be dependent on whether or not blocks from other objects already existed in the buffer cache. I saw Dion Cho’s article last night and noticed that he was flushing the buffer cache prior to his test execution – so I wonder if that might throw off the test results?

I still need to locate the investigation performed by Alex.

17 06 2010
Jonathan Lewis

Charles,

I’d forgotten the thing about parallel query and _small_table_threshold – again, it’s one of the details to check on my todo list. I believe (as Gopalakrishnan indicates) that a parallel-enabled table will not be read in parallel if its size is less than the parameter setting in 9i and 10g – which means the bit about direct path is implicitly about parallel query, although you can make 10g do direct serial reads through a hidden parameter.

I thought I had made the comment about flushing the cache and free blocks on Dion’s blog at the time – as a suggestion, not as a definitive statement – but it must have been somewhere else.

Alex’s investigation – was a reference to Alex Fatkulin (Quaddro) that you had linked to.

18 06 2010
Charles Hooper

Jonathan,

Thank you for the additional information. My comment about the _SMALL_TABLE_THRESHOLD parameter and parallel execution was 50% guess (maybe process of elimination is a better description) and 50% faded memory of an article I read a while ago.

I found the article that I mentioned reading a couple of nights ago:
http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/

It was your comments in the above article that I read regarding how the behavior of full table scan caching might be different with an empty buffer cache.

I still need time to review Alex Fatkulin’s thread in greater detail (http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html)

17 06 2010
Jonathan Lewis

I lost the end of the last sentence in the first paragraph – it should be “… for a long time.”

18 06 2010
Paul Janda

Late 2009, I filed an SR on this issue. The analysts were quick to say, “this is new functionality and we are unfamiliar with it and I am in contact with the author”. Then came a patch.
In 10.2.0.4, I am not aware of such behavior existing (skipping the buffer cache in favor of direct path reads).
I think the option of automatically skipping the buffer cache was not in 9.x.
My guess is that the current incarnation of _SMALL_TABLE_THRESHOLD started in 11.0
I also guess that the actual threshold is based on an algorithm that is affected by many factors, and not just _SMALL_TABLE_THRESHOLD. The support analysts used the word “heuristics” to describe the decision process.
My original problem:
I have an old (bad?) habit, where I wanted to pin a (relatively) large table in the KEEP pool. It was a great idea in 9.2. In 11.1 upgrade tests, I suddenly started getting “direct path reads” when I really wanted 0 disk i/o — hence the SR & patch.

18 06 2010
Donatello Settembrino

Charles,
some time ago, in the comments at this link:

http://antognini.ch/2009/07/impact-of-direct-reads-on-delayed-block-cleanouts/#comment-6068

I tested the influence of _small_table_threshold on direct reads (versions 10.2.0.4 and 11.1.0.6).
I noticed (with a flush of the buffer cache) two different behaviors…

Regards,
Donatello

21 06 2010
johnwuwuwu

Please check bug 6195768:
Full table scans of medium sized tables (size between _small_table_threshold
and 10% of buffer cache size) are read via direct I/O or via cache depending
on the return value of kcbcmt().
I think that explains what was mentioned by Tanel Poder on oracle-l recently:
“When direct path reads starts to happen? It is known that somewhat reliable figure is your _small_table_threshold multiplied by 5″

22 06 2010
Charles Hooper

johnwuwuwu,

I was unsuccessful in locating that bug number in Metalink (My Oracle Support) – a search for 6195768 returned 0 results, and a search for kcbcmt also returned 0 results. I also tried various other combinations including direct path read buffer cache.

One of the articles that might be interesting is Metalink Doc ID 793845.1 “High ‘direct path read’ waits in 11g” that suggests with Automatic Shared Memory Management enabled, a serial full table scan of a table could swing between performing direct path reads and db file scattered reads (using the buffer cache) due to the automatic resizing of the buffer cache. It might be interesting to see if the _SMALL_TABLE_THRESHOLD parameter automatically adjusts when the buffer cache size automatically adjusts in size.
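If the Doc ID 793845.1 suggestion is accurate, the interaction with Automatic Shared Memory Management is easy to illustrate: a cutoff derived from the current buffer cache size means a resize alone can flip a fixed-size table between read paths. A hypothetical sketch (the 5x-threshold cutoff is the empirical figure discussed in these comments, not documented behavior; all numbers are made up):

```python
# If the serial direct read cutoff tracks the *current* buffer cache size,
# an ASMM resize can change the read path for a table of fixed size.

def read_path(table_blocks: int, cache_blocks: int) -> str:
    cutoff = 5 * int(cache_blocks * 0.02)   # ~10% of the current cache
    return "direct path read" if table_blocks > cutoff else "db file scattered read"

table = 9_000                      # hypothetical table size in blocks
print(read_path(table, 100_000))   # db file scattered read (cutoff 10,000)
print(read_path(table, 80_000))    # direct path read (cutoff 8,000)
```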

22 06 2010
johnwuwuwu

The table is considered small enough to be read into the buffer cache when the table’s size is less than or equal to the number of blocks specified by the parameter.
Also, a table is considered to be of “medium” size when the number of blocks exceeds this parameter setting but is less than 10% (maybe _small_table_threshold multiplied by 5) of the buffer cache. A “medium” table may be optionally cached based upon the frequency of full table scans executed against it.
Bug 6195768 is about “MEDIUM SIZED TABLES ALWAYS READ THRU CACHE (FTS).”
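The small/medium/large split described above can be sketched as follows. The 2% and 10% figures are the values discussed in this thread, not documented constants, and the real “medium” decision reportedly depends on kcbcmt(), which this sketch does not attempt to model:

```python
def table_class(blocks: int, cache_blocks: int) -> str:
    small_limit = int(cache_blocks * 0.02)    # _small_table_threshold (assumed 2%)
    medium_limit = int(cache_blocks * 0.10)   # ~10%, possibly 5x the threshold
    if blocks <= small_limit:
        return "small"     # read via the buffer cache
    if blocks < medium_limit:
        return "medium"    # optionally cached, per FTS frequency (kcbcmt())
    return "large"

cache = 100_000                     # hypothetical cache, in blocks
print(table_class(1_500, cache))    # small
print(table_class(6_000, cache))    # medium
print(table_class(20_000, cache))   # large
```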

23 06 2010
Charles Hooper

johnwuwuwu,

I spent a little more than an hour trying to locate the bug that you described. I simply cannot find supporting evidence, even through many Metalink advanced searches, searches of Google Books, and searches of the Internet.

An advanced Metalink search, which included the bug database, found bug 6195768, but this bug has nothing to do with database tables:
“Bug 6195768: processes catting hid device can not be terminated”

A search on the exact phrase: medium sized table
Bug 205722: CBO FAILS TO USE INDEX IN MEDIUM SIZED TABLE
Fixed in Product Version 7.1.4
The above was the only match

A search on the exact phrase: frequency of full table scans
Latch Contention On ‘multiblock read objects’ Latch [ID 337940.1]
Applies to Oracle Server – Enterprise Edition – Version: 9.2.0.5.0
Database experiences latch contention on “multiblock read objects” latch.
Interesting, I did not know that there was a “multiblock read objects” latch, but it does not apply.

A search on the exact phrase: _small_table_threshold
Found the following bugs:
Bug 2384100: PAVDVC – PRC: DISTRIBUTE SUPPLIER INVOICE ADJUSTMENT COSTS PERFORMANCE
“_small_table_threshold = 256 and should be removed, and allowed to default. Setting this changes the default threshold which determines what is a ‘small’ table and what is a ‘large’ table.”

Bug 6357188: ORA-8103 EVEN WITH THE W/A FROM 3569503, ON RAC NOT SINGLE INST
“2. Run with _small_table_threshold=1. This forces everything to be a direct read.”

Neither of those bugs appear to be related.
—-

I wonder if Metalink Doc ID 787373.1 is partially to blame for the confusion in the bug report that you located (maybe it confused the Oracle support analyst)? If you read my discussion with Jonathan Lewis in the comments of this blog article, it appears that Metalink Doc ID 787373.1 is describing the behavior of tables that are set to a parallel degree, where the _SMALL_TABLE_THRESHOLD parameter helps Oracle decide whether to perform a full table scan using db file scattered reads (into the buffer cache) or direct path reads (into the PGA).

The warning at the beginning of this article appears to be justified: “Warning: this article is certain to bring tears of pain to your eyes while you attempt to determine what is right and what is wrong.”

25 03 2011
Kyle Hailey

On 10.2.0.4 Oracle seems to buffer no matter the size of the table:
Using Dion’s tests
http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/

Linux 10.2.0.4 (with compatibility set at 10.2.0.3)
running select count(*) 3 times from a table set to a % of db_cache_size
Oracle seems to successfully cache the whole table up to 80+% of buffer cache and caches in an effective manner with tables larger than the buffer cache size:


_db_block_buffers        273102
_small_table_threshold              5462


80% Statistics
----------------------------------------------------------
     220121  consistent gets
     220004  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads

88% Statistics
----------------------------------------------------------
     240108  consistent gets
     240004  physical reads
----------------------------------------------------------
     240024  consistent gets
      67669  physical reads
----------------------------------------------------------
     240024  consistent gets
      67836  physical reads

95% Statistics
--------------------------------------------------------
     260106  consistent gets
     260004  physical reads
--------------------------------------------------------
     260024  consistent gets
      87789  physical reads

On 11.2.0.1 on Windows it seems to stop effective caching at 10%:

    _db_block_buffers             51220   25% = 12805,10% = 5122
    _small_table_threshold        102
stats for 3rd run of select count(*):
Below 10%
 SQL>
    COUNT(*)
  ----------
        5120
  Statistics
  ----------------------------------------------------------
         5128  consistent gets
            0  physical reads
above 10%
  SQL>
    COUNT(*)
  ----------
        5122
  Statistics
  ----------------------------------------------------------
         5126  consistent gets
         5122  physical reads

Here is the example code for the 11.2.0.1 Windows tests:
drop table cacher;
create table cacher(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;
insert /*+ append */ into cacher
select 'x', 'x', 'x'
from dual
connect by level <= 5120 -- caches
-- connect by level <= 5122 -- doesn't cache
;
commit;
alter system flush buffer_cache;
select count(*) from cacher;
select count(*) from cacher;
set autot on stat
select count(*) from cacher;
set autot off;
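For what it is worth, the boundary in this 11.2.0.1 test lines up with 10% of the cache. A quick check of the listed figures (the one-row-per-block observation is inferred from the CHAR(2000) column sizes, not measured directly):

```python
buffers = 51_220        # _db_block_buffers from the 11.2.0.1 test above
print(buffers // 10)    # 5122 -- 10% of the cache, matching the observed boundary
# Each CACHER row has three CHAR(2000) columns (~6000 bytes), so with an 8K
# block one row fills one block: 5120 rows still cache, 5122 rows do not.
```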

25 03 2011
Kyle Hailey

missed the stats with the table larger than the buffer cache:

500% of buffer cache size Statistics
----------------------------------------------------------
    1350131  consistent gets
    1350002  physical reads
----------------------------------------------------------
    1350037  consistent gets
    1177881  physical reads
----------------------------------------------------------
    1350037  consistent gets
    1177856  physical reads

Even with this table at 5x the size of the buffer cache, Oracle still manages to effectively cache ~172K blocks.

25 03 2011
Charles Hooper

Hi Kyle,

Thanks for posting a follow up comment with a test case.

I noticed in the test case script that table statistics were not collected for the CACHER table. Do you think that skipping the statistics collection step might cause the runtime engine to believe that the table is very small, resulting in too many of the blocks being cached, or do you think that automatic dynamic sampling would have prevented problems like that? I am more than a little surprised by the results you posted for 10.2.0.4.

In the above comments I see that Jonathan had intended to write up a blog article about the _SMALL_TABLE_THRESHOLD parameter – it appears that he had a chance to finish up that article:
http://jonathanlewis.wordpress.com/2011/03/24/small-tables/

25 03 2011
Kyle Hailey

Great point. My reaction was, “wow, that probably explains it!” but alas I tried and got the same results:

 drop table cacher;
  create table cacher(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;
  insert /*+ append */ into cacher
    select 'x', 'x', 'x'
    from dual
       connect by level <= 220000     ;
  EXEC DBMS_STATS.gather_table_stats(NULL, 'CACHER');
  commit;
  alter system flush buffer_cache;
  set autot on stat
  select count(*) from cacher;
  select count(*) from cacher;
  select count(*) from cacher;
  set autot off;

with
_db_block_buffers 273102
_small_table_threshold 5462
and the results were:

80% of buffer cache Statistics with stats gathered on table
----------------------------------------------------------
     220033  consistent gets
     220005  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads

25 03 2011
Charles Hooper

Interesting… I will do a bit of experimentation with your test case.

24 06 2011
Plan is same, Wait profile is different ? « Coskan’s Approach to Oracle

[...] the memory the buffer cache and small table threshold (for more information about this threshold Charles Hooper-great discussion with Jonathan Lewis on comments, Tanel Poder [...]

5 02 2012
talebzadeh

Hi,

I tried to find out at what table size (as a percent of the buffer cache), Oracle decides to cache table blocks. The approach I used was based on what Kyle Hailey devised.

First, let us get the environment: the Oracle version installed is 11.2.0.1.0 64-bit, the block size is 8K, and buffer_cache = 11,072 MB.

Small table threshold at 2% of buffer cache size/MB Small table block limit
--------------------------------------------------- -----------------------
                                                221                  28,344
 
hidden parameter _small_table_threshold

name                                     value (blocks)               
---------------------------------------- -------------------
_small_table_threshold                   27295     
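As a cross-check of the listing above (straightforward arithmetic on the quoted figures):

```python
cache_mb = 11_072                      # buffer_cache = 11,072 MB, from the listing
block_kb = 8
cache_blocks = cache_mb * 1024 // block_kb
print(cache_blocks)                    # 1417216 blocks fit in the cache
print(int(cache_blocks * 0.02))        # 28344 -- the "small table block limit" shown
```

Note that the actual _small_table_threshold reported (27295) is slightly below the computed 2% figure, so the 2% rule appears to be only an approximation on this system.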

To force Oracle to choose the conventional path I disabled direct path read first

drop table cacher;
create table cacher(c1 char(2000), c2 char(2000), c3 char(2000), c4 char(2000)) nologging;
insert /*+ append */ into cacher
select
          RPAD('*',2000)
         ,RPAD('*',2000)
         ,RPAD('*',2000)
         ,RPAD('*',2000)
from dual
connect by level <= 700000;
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'CACHER',estimate_percent=>100);
commit;
alter system flush buffer_cache;
ALTER SESSION SET EVENTS '10949 trace name context forever, level 1';  -- No Direct path read
select count(*) from cacher;
select count(*) from cacher;
set autot on stat
 select count(*) from cacher;
 set autot off;

When I run the above I get:

  COUNT(*)
----------
    700000
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1400040  consistent gets
     224581  physical reads
          1  rows processed

OK, so we are just above the threshold here. At this stage the cacher table details are:

OWNER                            TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- -------------------------------- ------------ ------------- ------------ ----------------- -------------
SSDTESTER                        CACHER                                700,000             8    1,401,868                 0         5,471


Table size as percent of buffer size
------------------------------------
                               49.42

So at almost 50% we are seeing some discard activity. If I reduce the size of the table to ~45% of the buffer cache size, the table is fully cached.

 
 COUNT(*)
----------
    650000


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1300038  consistent gets
          0  physical reads
 
OWNER                            TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- -------------------------------- ------------ ------------- ------------ ----------------- -------------
SSDTESTER                        CACHER                                650,000             8    1,301,762                 0         5,081


Table size as percent of buffer size
------------------------------------
                               45.89

So with the conventional path and a FTS, when the table size is roughly around 45% of the buffer cache we have full caching and block promotion to the hot area.

Now if we leave the optimiser to choose the default direct path read with a FTS, we notice that up to around 4.8% of the buffer cache size the table is fully cached.

Example:

  COUNT(*)
----------
     67500


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     135022  consistent gets
          0  physical reads
 
  OWNER                            TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- -------------------------------- ------------ ------------- ------------ ----------------- -------------
SSDTESTER                        CACHER                                 67,500             8      135,620                 0           528


Table size as percent of buffer size
------------------------------------
                                4.77

Above this figure, the table blocks are put at the LRU end and discarded. Just above 4.8% of the buffer cache size, the table is read in and discarded every time. The tkprof output shows this.

select count(*)
from
 cacher


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       4.02     136000     136006          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       4.02     136000     136006          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 96

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=136006 pr=136000 pw=0 time=0 us)
  68000   TABLE ACCESS FULL CACHER (cr=136006 pr=136000 pw=0 time=179583 us cost=37034 size=0 card=68000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  direct path read                             1073        0.00          3.79
  SQL*Net message from client                     2        0.00          0.00

So in summary, with the default optimiser behaviour (‘direct path read’), the expectation is that any table below ~5% of the buffer cache is promoted to the hot area. When the table size is larger than this figure, blocks read by full table scans go to the LRU end and are fully discarded.

Regards,

Mich

6 02 2012
Jonathan Lewis

Mich,

An important point to consider in this test case is that every row in your table is chained (the row length is more than 8,000 bytes, the blocksize is 8KB, and pctfree seems to be at the default of 10, which means you have about 7,200 usable bytes per block – and then you can see that your table of 700,000 rows has used 1.4M blocks).

You may have discovered yet another type of boundary case for full tablescans, direct path reads, and caching, but you’re not looking at the general case. Your query is simply select count(*), so Oracle shouldn’t have to follow the chain for each row, but it’s possible (and please take this as a flight of fancy at this point) that when Oracle reads a block which holds nothing but the continuation of a chained row, it doesn’t count it as a “proper” read, and therefore manages to exclude it from the more generic arithmetic.
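The chained-row point can be confirmed with some rough arithmetic (the usable-space figure is an approximation that ignores block header overhead):

```python
import math

row_bytes = 4 * 2000                   # four CHAR(2000) columns per CACHER row
block_bytes = 8 * 1024                 # 8KB blocksize
usable = int(block_bytes * 0.90)       # pctfree 10 -> ~90% of the block usable
print(math.ceil(row_bytes / usable))   # 2 -- each row needs two blocks (chained)
```

Two blocks per row times 700,000 rows gives the ~1.4M blocks seen in the table statistics.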

6 02 2012
talebzadeh

Hi Jonathan,

Many thanks for your input. To test the case again I decided to create the cacher table with PCTFREE = 0 as follows:

create table cacher(c1 char(2000), c2 char(2000), c3 char(2000), c4 char(2000))PCTFREE 0
nologging;
insert /*+ append */ into cacher
select
          RPAD('*',2000)
         ,RPAD('*',2000)
         ,RPAD('*',2000)
         ,RPAD('*',2000)
from dual
connect by level<=1350000;
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'CACHER',estimate_percent=>100);
commit;
alter system flush buffer_cache;
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true, plan_stat=>'ALL_EXECUTIONS' );
ALTER SESSION SET EVENTS '10949 trace name context forever, level 1';  -- No Direct path read
select count(*) from cacher;
select count(*) from cacher;
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'catcher';
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true, plan_stat=>'ALL_EXECUTIONS' );
select count(*) from cacher;

With that in mind, in the case above we have a table with the following properties. I also added the simple ratio of buffer cache size to block size to give an indication of how many blocks fit into this cache:

Parameter                      buffer cache size/MB blocks fit in this cache
------------------------------ -------------------- ------------------------
buffer_cache                                 11,072                1,417,216


Small table threshold at 2% of buffer cache size/MB Small table block limit
--------------------------------------------------- -----------------------
                                                221                  28,344

name                                     value                isdefault            description
---------------------------------------- -------------------- -------------------- -------------------------------------------------------
_small_table_threshold                   27295                TRUE                 lower threshold level of table size for direct reads

My table details

OWNER                            TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- -------------------------------- ------------ ------------- ------------ ----------------- -------------
SSDTESTER                        CACHER                              1,350,000             8    1,351,819                 0        10,552


Table size as percent of buffer size
------------------------------------
                               95.30

So almost one block per row and no free space. The tkprof report shows:

select count(*)
from
 cacher


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      1.31       1.32          2    1350039          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      1.31       1.32          2    1350039          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 96

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1350039 pr=2 pw=0 time=0 us)
1350000   TABLE ACCESS FULL CACHER (cr=1350039 pr=2 pw=0 time=2925915 us cost=366427 size=0 card=1350000)

So there we go: 1,350,039 consistent reads with only 2 physical reads for a table that is around 95% of the buffer cache.

Obviously what really matters is the default optimizer behaviour when choosing direct path reads. Let us see what it shows now:

With direct path reads the magic percentage of ~10% holds. Consider the case below: this table is at 9.2% of the buffer cache size, and anything above that is recycled.

My table details

OWNER                            TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- -------------------------------- ------------ ------------- ------------ ----------------- -------------
SSDTESTER                        CACHER                                130,000             8      130,612                 0         1,016


Table size as percent of buffer size
------------------------------------
                                9.18

Table blocks/threshold limit
----------------------------
                           5
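As a sanity check on the figures above (a minimal sketch; the block and threshold counts are taken from the listings earlier in this comment):

```python
# Ratio of table blocks to _small_table_threshold for the smaller CACHER build.
table_blocks = 130612  # blocks in CACHER (second test case)
threshold = 27295      # _small_table_threshold from the earlier listing

print(round(table_blocks / threshold, 2))  # 4.79, reported as 5 above

# Table size as a percentage of the buffer cache (1,016 MB vs 11,072 MB).
print(round(1016 / 11072 * 100, 2))        # 9.18
```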



select count(*)
from
 cacher


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.11       0.11          0     130022          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.11       0.11          0     130022          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 96

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=130022 pr=0 pw=0 time=0 us)
 130000   TABLE ACCESS FULL CACHER (cr=130022 pr=0 pw=0 time=252448 us cost=35406 size=0 card=130000)

I believe now that your assertion about the 10% boundary, below which repeating a scan WILL increment the touch count, holds in line with the optimizer's use of direct path reads. For conventional path reads, if the table fits into the cache then Oracle allows it to be cached.

In a way it perhaps explains why Oracle prefers direct path reads to conventional reads for FTS; I think it is because Oracle has more control over the use of the PGA than the SGA. It also shows that the threshold limit of 2% does not mean much.
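The two test cases above are consistent with the rule of thumb often quoted for 11g: a serial full scan switches to direct path reads once the segment exceeds roughly five times _small_table_threshold (about 10% of the cache when the threshold sits near 2%). A minimal sketch of that heuristic follows; the 5x multiplier is an observed rule of thumb, not a documented constant:

```python
def likely_direct_path(table_blocks, small_table_threshold, multiplier=5):
    """Rough model of the 11g serial direct read decision.

    The 5x multiplier (~10% of the buffer cache when the threshold
    defaults to ~2%) is an observed rule of thumb, not a documented
    constant; the real decision also weighs factors such as dirty
    and already-cached blocks.
    """
    return table_blocks > multiplier * small_table_threshold

# The two CACHER builds tested above, with _small_table_threshold = 27,295:
print(likely_direct_path(1351819, 27295))  # True:  ~95% of cache, direct path
print(likely_direct_path(130612, 27295))   # False: ~9.2% of cache, buffered
```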

Regards,

Mich

7 02 2012
talebzadeh

Hi,

If you recall a while back we discussed direct path reads vs conventional reads in OTN. The query was pretty simple:

DECLARE
        type array is table of tdash%ROWTYPE index by binary_integer;
        l_data array;
        l_rec tdash%rowtype;
BEGIN
        SELECT
                a.*
                ,RPAD('*',4000,'*') AS PADDING1
                ,RPAD('*',4000,'*') AS PADDING2
        BULK COLLECT INTO
        l_data
        FROM ALL_OBJECTS a;
 
        DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true );
        FOR rs IN 1 .. 100
        LOOP
                BEGIN
                        SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
                EXCEPTION
                  WHEN NO_DATA_FOUND THEN NULL;
                END;
        END LOOP;
END;

The base table tdash had 1.7 million rows, and I tried using direct path reads (the default) and db file scattered reads; with the conventional path it was 20% faster.
At that time the argument was that the run time engine preferred direct path reads to save CPU usage in a multi-user environment. Fair enough. My test was a single run each time, immediately after a reboot. However, we are looping over the table 100 times fetching rows matching object_id. The tkprof output showed:

for direct path read:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          7         47          0           0
Execute    100      0.00       0.00          2         51          0           0
Fetch      100     10.52    6358.83  194142802  194831012          0         100
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      201     10.53    6358.84  194142811  194831110          0         100

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 96  (SSDTESTER)   (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS FULL TDASH (cr=1948310 pr=1941430 pw=0 time=0 us cost=526908 size=8091 card=1)

There was full table scan for every read in the loop. With conventional path we had

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100      0.02       0.00          0          0          0           0
Fetch      100    150.74    5103.34  131458857  194490912          0         100
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      201    150.76    5103.34  131458857  194490912          0         100

So we seem to have 194,142,811 - 131,458,857 = 62,683,954, or around 32% fewer physical reads. Table tdash had 1,729,204 rows in 2,709,153 blocks, and the table size as a percent of the buffer cache size was 123%. So with more caching and fewer waits, that 20% performance gain may be explained now?
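The saving quoted above works out as follows (disk figures taken from the two tkprof summaries):

```python
# Physical reads from the two tkprof runs above.
dpr_disk = 194142811  # direct path read run
cpr_disk = 131458857  # conventional (db file scattered read) run

saved = dpr_disk - cpr_disk
print(saved)                             # 62683954 fewer physical reads
print(round(saved / dpr_disk * 100, 1))  # 32.3 (% reduction)
```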

Cheers,

Mich

7 02 2012
Charles Hooper

Hi Mich,

I think that this is the OTN thread where we discussed the topic of serial direct path reads (direct path read waits) versus conventional, buffered reads (db file scattered read waits) for table scans:
https://forums.oracle.com/forums/message.jspa?messageID=10077663

My opening paragraph in that thread stated:

… Keep in mind that the numbers that you are seeing are slightly distorted due to the fact that the buffer cache was empty when the test started. To understand how this may affect the test results, see the first comment on this page: http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/

In short, you saw the significant 20% performance improvement with db file scattered reads because the buffer cache was empty at the start of the test, which is unlikely in a production environment. If you want to test this theory:

CREATE TABLE TDASH2 AS SELECT * FROM TDASH;

Bounce the database, execute a modified version of your script:

DECLARE
        type array is table of tdash%ROWTYPE index by binary_integer;
        l_data array;
        l_rec tdash%rowtype;
BEGIN
        SELECT
                a.*
                ,RPAD('*',4000,'*') AS PADDING1
                ,RPAD('*',4000,'*') AS PADDING2
        BULK COLLECT INTO
        l_data
        FROM ALL_OBJECTS a;

        FOR rs IN 1 .. 100
        LOOP
                BEGIN
                        SELECT * INTO l_rec FROM TDASH2 WHERE object_id = l_data(rs).object_id;
                EXCEPTION
                  WHEN NO_DATA_FOUND THEN NULL;
                END;
        END LOOP;
END;

Finally, execute your script with and without serial direct path reads enabled. You should now see that the buffer cache had less of an opportunity to cache frequently accessed blocks read during the full table scan of your tdash table, and you should find that the serial direct path read now not only uses less CPU, but also requires less elapsed time for its execution than the conventional buffered reads.

Jonathan, as usual, has provided excellent information in the comment section of this blog article and in the OTN thread, and has mentioned items that I had not considered. For example, I did not consider that your cacher table example would result in chained rows (out of curiosity, I might test this just to see if I am able to identify the chained rows; I am assuming that Jonathan noticed that counting 67,500 rows required 136,006 consistent gets during the COUNT(*) FROM CACHER).

7 02 2012
talebzadeh

Charles,

No doubt when the buffer cache is busy, direct path reads (DPR) will be faster than conventional path reads (CPR). As suggested, I tried that as well: exactly the same code on identical tables in different schemas. I started running the query in the second schema and then kicked off the query in the main schema. With DPR disabled, the CPR results were slower than the same concurrent-run tests performed with DPR.

My current argument is that the optimizer costing will always favour DPR over CPR when a table scan of more than 10% of the buffer cache is involved. In other words, it does not matter how many free buffers there are (the case of a single session running nothing else); the run time engine will perform DPR and will always put the table at the LRU end of the chain to be discarded immediately.

So, in summary, with DPR any table larger than 10% of the buffer cache will be fetched and discarded, and will not be promoted to the hot area.

I trust that this makes sense.

Cheers,

Mich
