December 22, 2009
Over the years a lot of conflicting advice for maintaining Oracle databases has surfaced in books, magazine articles, websites, Usenet (and other discussion forums), and presentations. Even as the year 2009 draws to a close there are Oracle forum posts asking for help in improving the buffer cache hit ratio (BCHR).
Here is an interesting history of recommendations for the buffer cache hit ratio (BCHR), as presented in user conferences for a particular ERP platform:
Find the statistics in V$SYSSTAT or REPORT.TXTLogical reads = db block gets + consistent gets Buffer cache hit ratio (%) = ((logical reads - physical reads)/logical reads)*100
Ratio should be above 90%, if not increase the shared_pool_size
HIT_RATIO=((DB BLOCK GETS + CONSISTENT GETS - PHYSICAL READS)/ (DB BLOCK GETS + CONSISTENT GETS))*100
90 – 100 % Few physical reads. Current size is optimal if not a bit high. OK to remove some buffers if memory needed elsewhere.
70 – 89 % Buffer cache has low to moderate number of physical reads. Consider resizing if there is a serious problem with memory on the Oracle database.
0 – 69 % Buffer cache is experiencing moderate to high number of physical reads. Consider adding more buffers to the buffer cache.
Be aware that the Cache Hit Ratio isn’t necessarily the definitive answer; although it can be a good indication that something is up.
Bad performance indicators – Poor cache hit ratios
The buffer cache hit ratio was never mentioned.
If the above is any indication, recommending the use of the buffer cache hit ratio as a performance metric is on the decline. This seems to be confirmed by the helpful replies that typically follow requests for improving the buffer cache hit ratio in various forum threads. But, there is a twist. Do we know what the buffer cache hit ratio is supposed to measure? A search of the Internet, including Metalink, finds a number of formulas for calculating the buffer cache hit ratio. Some of those forumulas follow:
(logical reads – physical reads)/logical reads
——1 – (physical reads)/(consistent gets + db block gets)
——(logical reads)/(logical reads + physical reads)
——1 – (physical reads – physical reads direct – physical reads direct (lob))/(consistent gets + db block gets)
——1 – (physical reads – (physical reads direct + physical reads direct (lob)))/(db block gets + consistent gets – (physical reads direct + physical reads direct (lob)))
So many formulas, and then we have this one from the 11g R2 Performance Tuning Guide:
1 - (('physical reads cache') / ('consistent gets from cache' + 'db block gets from cache')
Nice… and a bit confusing. Pick a formula, any formula. But, what is it measuring?
A search of the Internet finds a large number of articles discussing the benefits of using the buffer cache hit ratio as a performance metric. For example:
“If the cache-hit ratio goes below 90% then:
* For Oracle 8 and earlier: increase the initialisation parameter DB_BLOCK_BUFFERS.
* For Oracle 9 onwards: increate the initialisation parameter DB_CACHE_SIZE.”
“In general, if the hit ratio is below 90%, and the dictionary cache has been tuned, increase the init.ora parameter DB_BLOCK_BUFFERS to increase the hit ratio.”
“One must use delta statistics over time to come up with a meaningful value for the ratio, and high logical I/O values can definitely be a leading cause of bad execution times. However, when properly computed, the buffer cache hit ratio is an excellent indicator of how often the data requested by users is found in RAM instead of disk, a fact of no small importance.”
“As a DBA, you are responsible for monitoring and calculating the Buffer Cache Hit Ratio in the SGA memory in case of performance problems…
– If the Buffer Cache Hit Ratio is more than 90% then there is no problem.
– If the Buffer Cache Hit Ratio is between 70% and 90% then there could be a problem.
– And if the Buffer Cache Hit Ratio is less than 70%, there is definitely a problem and the Buffer Cache size needs to be increased.”
“To summarize, our goal as the Oracle DBA is to allocate as much RAM as possible to the data buffers without causing the database server to page-in RAM. Whenever the hourly data buffer hit ratio falls below 90 percent, we should add buffers to the block buffers.”
“New myth – Ratio-based Oracle tuning is meaningless…
However, most OLTP systems and systems in which the working set of frequently-referenced data are not cached will greatly benefit from ratio-based tuning. Oracle Corporation recognizes that monitoring SGA usage ratios and adjusting the size of the SGA regions can have a dramatic impact on system performance, and this is the foundation of Oracle10g Automatic Memory Management (AMM) in which Oracle calculates the point of diminishing marginal return from adding data buffers”
“The data buffer hit ratio can provide data similar to v$db_cache_advice , and most Oracle tuning professionals use both tools to monitor the effectiveness of data buffers and monitor how AMM adjusts the sizes of the buffer pools.”
“The DBHR is a common metric used by Oracle tuning experts to measure the propensity of a row to be in the data buffer. For example, a hit ratio of 95 percent means that 95 percent of row requests were already present in the data buffer, thereby avoiding an expensive disk I/O. In general, as the size of the data buffers increases, the DBHR will also increase and approach 100 percent.”
“In order for the DBA to determine how well the buffer pools are performing, it is necessary to measure the hit ratio at more frequent intervals. Calculating the DBHR for Oracle8 and beyond is more complicated than earlier versions, but the results enable the administrator to achieve a higher level of tuning than was previously possible.”
“But the question remains about the value of this metric to the DBA.
Once I’ve tuned and stabilized my systems, I notice that the metrics create repeatable ‘signatures’, patterns of usage that form the baselines for the exception alerts.
First, we establish a ‘exception threshold’ for the BCHR, (e.g. +- 20%), and compare that deviation to the historical average, normalized by the historical average per hour and the day-of-the-week.”
“Many folks misunderstand that bit about ‘setting your own BHR’, and falsely conclude that it’s a useless metric. It’s not useless.
Of course, doing lots pre-buffered I/O (consistent gets) will increase the BHR, that’s the whole point, right?
That does not mean that the BHR is useless, it just means that it’s not a panacea.
The BHR remains very useful for detecting ‘undersized’ data buffers, where the working-set is not cached and Oracle is forced to do extra physical reads. . . .
If the BHR was totally useless, why does Oracle continue to include it in OEM alert thresholds, and STATSPACK and AWR reports?
The BHR is just like any other Oracle metric, you must understand its limitations and proper usage. It’s just one of many tools…”
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/1a946dbe8dcfa71e (Many, many pros and cons)
“BCHR can be manipulated. That is nothing new. All stats can be inflated in similar manners. But that doesn’t make them all meaningless. Given everything else being equal, high BCHR is always better than low BHCR… BCHR alone is not meant to tell performance. If it does, we would not have to look at anything else… BCHR alone does not tell you about overall performance. It simply tell you the disk I/O percentage. It is an indicator, a very meaningful one.”
A number of examples advocating the use of the buffer cache hit ratio also exist in Oracle books, many of which may be viewed, in part, through Google searches:
“Creating a self-tuning Oracle database: automating Oracle9i Dynamic SGA”
“The data buffer hit ratio (DBHR) measures the propensity for a block to be cached in the buffer pool, and the goal of the DBA is to keep as many of the frequently used Oracle blocks in buffer memory as possible…”
“Oracle9i High-Performance Tuning with STATSPACK”
“From this chart we can clearly see that the DBHR dropped below the recommended value of 90 percent at 3:00 A.M., 4:00 A.M., and 10:00 A.M. each day… The problem here is that the DBHR is dropping low at 10:00 A.M., a prime-time online period.”
“Oracle Tuning: The Definitive Reference”
“A query like the one that follows can be used to see a metric’s behavior for the recent time period. For example, the following query shows data buffer hit ratio history for the last hour.”
“Expert Oracle Database 11g Administration” – Page 190
“this is why the buffer cache hit ratio, which measures the percentage of time users accessed the data they needed from the buffer cache (rather than requiring a disk read), is such an important indicator of performance of the Oracle instance.”
The author provides a link on page 1161 to an article authored by Cary Millsap which discusses why a higher buffer cache hit ratio may not be ideal. This is definitely a step in the right direction regarding the buffer cache hit ratio, but it might be better to simply ignore the statistic.
“Oracle Database 10g Performance Tuning Tips & Techniques”
“Some DBAs (usually those trying to sell you a tuning product) minimize the importance of hit ratios (proactive tuning) and focus completely on waits (reactive tuning), since focusing on waits is a great way to quickly solve the current burning problems. By monitoring the Instance Efficiency section (and using all of STATSPACK and Enterprise Manager), the DBA will combine reactive and proactive tuning and will find some problems before the users scream or wait events hit the top 5 list. Hit ratios are one important piece of the puzzle (so are waits).”
“Hit ratios are a great barometer of the health of your system. A large increase or drop from day to day is an indicator of a major change that needs to be investigated.”
“Hit Ratio = (Logical Reads – Physical Reads) / Logical Reads” “The buffer hit ratio should be above 95 percent. If it is less than 95 percent, you should consider increasing the size of the data cache by increasing the DB_CACHE_SIZE initialization parameter (given that physical memory is available to do this).”
There are, of course, very strong counter-points to using the buffer cache hit ratio (BCHR) as a tuning metric. The first two are very well written, detailed articles:
http://jonathanlewis.wordpress.com/2007/09/05/hit-ratios-2/ (a very detailed blog entry)
“Ratios are highly suspect for monitoring purposes. If you think a ratio is helpful, think carefully about whether you should be monitoring the two underlying values instead.
The buffer cache hit ratio (BCHR) is a particularly bad example of the genre as there are so many events that make the attempt to correlate BCHR and performance meaningless.”
http://richardfoote.wordpress.com/2007/12/16/buffer-cache-hit-ratios-useful-or-not/ (a very detailed blog entry)
“The biggest problem of all with the BCHR is that regardless of it’s values, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks regardless as it doesn’t tell us whether the ‘health’ of the database has improved, got worse or remains unchanged.”
“Many DBAs do their best to get a 99% or better buffer cache hit ratio, but quickly discover that the performance of their database isn’t improving as the hit ratio gets better.”
“The evidence that hit ratios are unreliable is overwhelming, and similar ratio fallacies occurring in other industries are well documented.”
“Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning“
“For example, there have been a number of benchmarks done to prove that a 99-100% buffer cache hit ratio does not mean that a database is running well. A high cache hit ratio can be observed while the database is literally at a standstill.”
Cary Millsap’s “Why a 99%+ Database Buffer Cache Hit Ratio is Not Ok”
“Many tuning professionals and textbook authors sell advice encouraging their customers to enjoy the performance virtues of Oracle database buffer cache hit ratios that approach 100%. However, database buffer cache hit ratio is not a reliable system performance metric. Buffer cache hit ratios above 99% usually indicate particularly serious SQL inefficiencies”
“The data buiffer hit ratio has limited value”
“This ratio should not be used as an indicator of database performance health. Much Oracle software documentation touts the database buffer cache hit ratio as being one of the most important tuning metrics. In my opinion and that of many others this statement is often a complete and absolute falsity.”
“So if someone tells you that the buffer cache hit ratio must be a good thing because Oracle has based their v$db_cache_advice technology on it, then they are displaying a lack of understanding about the deficiencies of the buffer cache hit ratio in particular, and how LRU (least recently used) caching mechanisms work in general.”
Are your users complaining about performance?
If not, then your hit ratio is just fine and dandy, whatever it happens to be.
The more general point here is that, *** and *** notwithstanding, the buffer cache hit ratio is an absolutely abysmal way of tuning anything. It can sometimes offer a useful corollary to other statistics; to allow you to distinguish between two otherwise equally plausible causes for, for example, free buffer waits. But as a performance tuning goal in its own right? Furgedaboudit.”
“And it only take *one* piece of what I technically define as ‘Crap’ code to both inflate the BHR to incredibly high levels whilst at the same time killing or impacting *database* performance.
I’m probably more sympathetic to BHRs than many. However, it provides only one very small piece in the tuning puzzle, one that needs to be put into perspective. It can be used as an indicator of whether the buffer cache is set way to low/high and nothing more. And what it’s actual *value* is of little consequence, there is no such thing as an ideal value x.
Does a 99.9% BHR mean the database/buffer cache/sql is well tuned. Possibly.
Does a 99.9% BHR mean the database/buffer cache/sql is poorly tuned. Possibly.
So what does a 99.9% BHR actually mean and represent? Without being able to answer this question in it’s fullness, the figure is meaningless.
You get the point.”
“Tuning Oracle Without Cache-Hit Ratios“
“From time immemorial, Oracle performance tuning has the infamous label of witchcraft, secretly practiced by an elite group of individuals, who allegedly use voodoo to cure the performance problems of an Oracle system. To compound this misperception there exists many thousands of pages of published material that propagate the idea of tuning Oracle with cache-hit ratios.”
“Bottom line, Oracle tuning efforts need to be based on isolating and pinpointing bottlenecks (the disease) not cache-hit ratios (the symptoms).”
Of course, tools are available to help correct a low buffer cache hit ratio:
From the book “Optimizing Oracle Performance”
Connor McDonald’s “Choose any hit ratio”
Jonathan Lewis’ “A higher buffer hit ratio is a good thing. (17-Jan-2001)” script