Book Review: Oracle Performance Firefighting (Fourth Printing)


June 25, 2010

Many Helpful Topics, However the Errors and Omissions are Unnecessary Distractions
http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm

This review covers the June 2010 fourth printing of the “Oracle Performance Firefighting” book.  I bought a PDF copy of the book directly from the OraPub site.  The author’s excellent “Forecasting Oracle Performance” book was a factor in my decision to purchase this book, as was my desire to learn something new while trying to maintain previously acquired knowledge.  As stated in this book, the author has roughly 20 years of experience working with Oracle Database, worked with Cary Millsap at Oracle Corporation for a period of time, and conducts a successful consulting and training business.  I recognize the names of three of the six technical reviewers listed in the book as well known experts in the Oracle community.  The author’s website states that there are no known issues/errata starting with the third printing (November 2009) of the book (reference).  All of the components for the book’s success are in place, so how did the book perform?

There are definitely positive moments in the book, sections where the book successfully communicates complicated topics in very clear language.  The book publisher selected a font size and margin size that pack a significant amount of detail into each of the book’s 387 pages.  There are several well-placed, helpful diagrams with no space-wasting cartoon drawings.  The author introduces his three-circle analysis, which involves identifying the intersection of the Oracle Database, the application, and the operating system to help pinpoint the source of performance problems.  Several examples are provided where the author used the Linux strace command to troubleshoot Oracle Database performance problems.  The book recommends against modifying the _SPIN_COUNT hidden initialization parameter, which is wise advice.  It clearly states that when there is an operating system bottleneck, there are only four subsystems to investigate: CPU, memory, IO, and network.  It provides a good demonstration of using DBMS_MONITOR to enable 10046 traces, as well as a nice description of in-memory undo and queuing theory in easily understood language.  The author’s website offers the OSM toolkit, a collection of scripts that can be freely downloaded – several of the scripts were used throughout the book.  It is obvious that the author invested a significant amount of time in the writing of this book.
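For readers who have not used the DBMS_MONITOR approach that the book demonstrates, enabling a 10046-style extended trace looks roughly like the following sketch.  This is my own sketch, not quoted from the book, and the session_id/serial_num values are placeholders that must be replaced with values retrieved from V$SESSION:

```sql
-- Enable extended SQL tracing for one session, capturing wait events and
-- bind variable values (roughly equivalent to event 10046 at level 12).
-- The session_id and serial_num values below are placeholders.
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 123, serial_num => 4567, waits => TRUE, binds => TRUE);

-- ... allow the problem workload to execute ...

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 4567);
```

The resulting trace file, written to the server’s trace file destination, may then be processed with TKPROF or read directly.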

There are also definitely negative moments in the book: sections filled with typing errors, logic errors, and omissions.  Signs of trouble include a paragraph that lists four hidden initialization parameters the author suggests adjusting, without the author indicating by how much the initialization parameters should be adjusted, and without the author explaining why three of the four hidden initialization parameters simply do not exist in Oracle Database 11.1.0.7.  Some sections of the book are written in a condescending tone, as if no one other than a DBA is intelligent enough to understand the challenges that a DBA or performance tuning consultant faces, even when the concepts are explained to others in simple terms.  Very few reproducible test cases are provided in the book, which makes it difficult for readers to practice or duplicate the concepts introduced in the book.

This book is not a replacement for the “Troubleshooting Oracle Performance” book – the two books share very little in common.  This book seems to spend little time explaining how to understand the root cause of performance problems through various trace file analyses (10046, 10053, 10032, etc.) or even analysis of execution plans, while the author does spend a lot of effort suggesting changes to initialization parameters.  The book’s author is in an unusual position in that he has either a 3 out of 5 star book or a 5 out of 5 star book, depending on the corrections that make it into the fifth printing of the book.  The PDF version of the book cannot be printed, does not permit copying text, and has the purchaser’s name, address, and phone number at the top of each page.

Specific errors, omissions, and distractions in order:

  • The book uses a lot of footnotes, some of which do not add value to the discussion of Oracle Database.
  • Page 16 suggests continuing to work to improve performance until time and budget are exhausted, or performance is acceptable.  I think that this is a pathway to compulsive tuning disorder.
  • Page 22 suggests that when the CPU run queue is usually longer than the number of CPU cores, processes must wait for CPU resources.  This suggestion raises the question: isn’t it possible for processes to reach the point of waiting for CPU resources before the CPU run queue reaches the length of the number of CPU cores?
  • Chapters one and two seem to spend a lot of time trying to relate Oracle firefighting concepts to non-computer related activities: telephone operator, school report card, doctor checking a young child, clients requiring the author to keep a timecard to track time, etc.  Additionally, the early chapters excessively repeat the terms firefighting and firefight.
  • Page 41: Missing letter “Oracle has never guaranteed it has completely instrumented it code.”
  • Page 42: Missing letter “By tracing an Oracle process through the operating system, we can easily observe how Oracle has instrumented it code.”
  • Page 43: Incorrectly defines the meaning of a db file scattered read wait event when attempting to correlate a Linux readv OS call to an Oracle wait event: “Regardless of the operating system or the actual system call, Oracle gives a multiblock IO call the special name db file scattered read, because these multiple blocks can be scattered over the IO subsystem.”  One of the problems with the quote is that a multiblock read from the operating system could correspond to several different Oracle wait events.  A second problem is that the readv call reads logically adjacent sections of a file (think multiple Oracle blocks) on disk into non-adjacent memory locations (reference).
  • Page 47: While Table 2-1 states that only selected columns of V$SYSTEM_EVENT are displayed, the table probably should have included the TOTAL_TIMEOUTS column, which helps describe the true length of wait events that have, for instance, three second timeouts – this column has existed since Oracle Database 8.1.5.  The table also states that the WAIT_CLASS, WAIT_CLASS_ID, and WAIT_CLASS# columns (the book incorrectly includes an underscore between CLASS and #) were introduced in 11g R1, while the Oracle documentation shows that those columns were introduced in 10g R2 (reference).
  • Page 49: While Table 2-2 states that only selected columns of V$SESSION_WAIT are displayed, the table seems to discount the value of the SECONDS_IN_WAIT column by excluding that column.  The table also states that the WAIT_CLASS, WAIT_CLASS_ID, and WAIT_CLASS# columns (the book incorrectly includes an underscore between CLASS and #) were introduced in 11g R1, while the Oracle documentation shows that those columns were introduced in 10g R2 (reference).  The table should provide a better description for the STATE column’s value of “WAITED KNOWN TIME: Previous wait completed; the session is currently not waiting.” – the book should describe what this STATE value actually means: is the session in an idle wait event, or is the session currently burning CPU time (the latter is what it means)?
  • Page 55: States “Systems based on Oracle Database 10g and later can take advantage of the v$sys_time_model and v$ses_time_model views…” – the session level view is actually named V$SESS_TIME_MODEL.
  • Page 61: The book strongly suggests that a user’s “Submit” or “Save” operation equates to a COMMIT, and that a COMMIT equates to a LOG FILE SYNC wait event, and therefore if the average LOG FILE SYNC wait time is identical to a previous day when performance was acceptable, the performance problem is “most likely not related to the database.”  A lot may be said about where this paragraph fails the reality check – just think of one other operation that may happen when the user selects the “Save” option, or one other cause for the LOG FILE SYNC wait, or how averages may distort the true slowness that is experienced by the user, or how the author is simply measuring the wrong thing when trying to determine if the problem is present in the database (reference).
  • Page 68: Example of condescending tone: “When working on Oracle systems, you will eventually hear some poor whiner say something like, ‘Oracle is a resource hog!  It consumes every bit of resources it can get its despicable hands on!’”  More examples of condescending/confrontational tone with the server admin, storage admin, storage vendor, etc. are found on pages 128, 130, 132, 148, 156, 201, 264, 324.
  • Page 74: Example of an odd footnote that does not add value to the book: “I am in no way implying Oracle sincerely cares how much memory its database system requires.”
  • Page 94: The author indicated that one of his tests that modified the _KKS_USE_MUTEX_PIN parameter showed that the mutex implementation decreased the execution time by 11%, yet the author did not provide a SQL statement to create the test table, nor provide the PL/SQL script used in the test so that readers could reproduce the test results.
  • Page 104: The book states: “Virtual memory management, process scheduling, power management, or essentially any activity not directly related to a user task is classified as system time.  From an Oracle-centric perspective, system time is pure overhead…”  The book fails to mention that activities in Oracle Database itself directly cause kernel mode CPU utilization.  Network I/O, disk I/O (especially with buffered file systems), timer calls (typically gettimeofday calls), memory access requests, memory management, and process creation and termination (think session creation and termination) all directly cause kernel mode CPU utilization.  The paragraph seems to suggest that the DBA has no control over kernel mode CPU consumption (reference reference2).
  • Page 116: States: “While it may seem strange, the run queue reported by the operating system includes processes waiting to be serviced by a CPU as well as processes currently being serviced by a CPU.”  This is an operating system specific observation.  Linux appears to behave this way, Windows does not, and others have indicated that various Unix releases do not behave the same as Linux (reference).
  • Page 136: States: “For example, if you ran a response time-focused report over a duration of 60 seconds, and if the single-CPU core subsystem were heavily overworked, Oracle could report that Oracle processes consumed 65 seconds of CPU time. Yet, we clearly know the CPU subsystem is limited to supplying up to 60 seconds of CPU time and no more.”  This appears to be an operating system specific CPU usage measurement problem.  Windows will exhibit this problem, where time spent in the run queue is accumulated with time spent actually running on the CPU, but I could not reproduce that behavior on Linux (reference).
  • The book seems to use Oracle 10g R1 frequently for its tests, rather than something more recent, such as 10g R2.  Oracle 10g R1 was desupported quite a long time ago, while the 10.2.0.5 patch was only recently released.  In the span of a couple of sentences, the book frequently uses the term “Oracle” to mean both “Oracle Database” and “Oracle Corporation” – this might cause some degree of confusion.
  • Page 175: The book seems to suggest that bind variable values are typically included in TKPROF summaries, except when multiple trace files are combined.  Bind variable values are never included in TKPROF summaries.
  • Page 177: The book seems to recommend changing hidden initialization parameters without warning that those parameters should only be modified after consulting Oracle support.
  • Page 182: The book suggests modifying the _ASH_SAMPLING_FREQUENCY hidden initialization parameter.  Unfortunately, this parameter does not exist in Oracle Database 11.1.0.7 (the only release that I checked).
  • Page 183: The book suggests that changing the _ASH_SIZE hidden initialization parameter will modify the amount of memory used by Active Session History, yet the book mentions that Oracle does not always respect the setting.  This might be a sign that the parameter controls something other than what the author described.  The author also mentioned changing the _ASH_ENABLE hidden initialization parameter.
  • Page 200: The author suggests that the _DB_BOCK_MAX_CR_DBA hidden initialization parameter controls the number of clones per buffer, however that parameter does not exist in Oracle Database 11.1.0.7.  I believe that the author intended to write _DB_BLOCK_MAX_CR_DBA which attempts to control the maximum number of consistent read copies of a block.
  • Page 203: The author suggests that changing the _DB_BLOCK_HASH_LATCHES hidden initialization parameter will provide some relief to cache buffer chains latch contention.  This is one of the few times that the author cautions, “Before you change this, be aware there may be support issues from both Oracle and your application vendor.”
  • Page 208: The book states that the _SMALL_TABLE_THRESHOLD hidden initialization parameter was deprecated because of the modified LRU (Least Recently Used) algorithm that the book stated was introduced in Oracle Database 8.1.5, and the hidden initialization parameter assumed new life with the introduction of Oracle Database 11g where the “parameter is the threshold for a server process to start issuing direct reads.”  Metalink (My Oracle Support) Doc ID 787373.1 indicates that the parameter was never deprecated.  The actual threshold for selecting a direct path read for a serial operation in Oracle Database 11.1.0.6 and above is roughly five times the value specified for _SMALL_TABLE_THRESHOLD.  Various other articles describe the purpose of the _SMALL_TABLE_THRESHOLD parameter prior to Oracle Database 11.1.0.6. (reference reference2 reference3)
  • Page 213: The book states: “Without physical blocks being read from disk, there will be no LRU chain latch contention, because there will be no need to find a free buffer or insert a buffer header into an LRU chain.”  This statement raises the question: what about consistent read copies of blocks, where Oracle must essentially roll back a block to a specific SCN to provide a consistent read for an in-process query?
  • Page 219: The book states: “Investigate the _db_block_write_batch and the _db_writer_max_writes parameters. Also considering increasing the instance parameter _db_writer_max_scan_pct (the default may be 40, for 40%) or _db_writer_max_scan_cnt…”  One problem with these two sentences is that the _DB_BLOCK_WRITE_BATCH, _DB_WRITER_MAX_SCAN_PCT, and _DB_WRITER_MAX_SCAN_CNT hidden initialization parameters do not exist in Oracle Database 11.1.0.7 (the only release checked) (reference).
  • Page 221: The book states: “Now suppose you have scanned more than _db_writer_max_scan_pct buffer headers. If so, you would be very frustrated.”  Unfortunately, the _DB_WRITER_MAX_SCAN_PCT hidden initialization parameter does not exist in Oracle Database 11.1.0.7 (the only release checked).
  • Page 222: The book states: “Increase _db_writer_max_scan_pct. This will give the database writer more time to flush its write list.”  Unfortunately, the _DB_WRITER_MAX_SCAN_PCT hidden initialization parameter does not exist in Oracle Database 11.1.0.7 (the only release checked).
  • Page 223: The book states, “Strangely, in Oracle Database 11g, Oracle removed the reason code from the v$session_wait and v$session view’s p3 column!”  Column P3 was changed from the reason code to block class starting with Oracle Database 10g R1.
  • Page 226: The author used block dumps to investigate locks, while it seems that it would be easier to query a couple of views.
  • Page 230: The book states, “… Simply keep adding free lists until the contention subsides.”  This raises the question: what side effects may be encountered?
  • Page 231: The book describes how active transactions are assigned to undo segments.  Limited testing indicates that the book is correct unless the objects are created in the SYSTEM tablespace (creating objects in that tablespace is not recommended).
  • Page 253: The book used library_cache dumps to investigate child cursors – it seems that there are easier, less resource intensive ways to obtain this information.
  • Page 256: The book states: “Oracle keeps track of these reloads in the reloads column from v$library_cache…” – the correct view name is V$LIBRARYCACHE, not V$LIBRARY_CACHE.
  • Page 265: The book mentions the use of CURSOR_SHARING = SIMILAR and FORCE, but does not discuss the performance impact of the change other than stating that Mark Gurry stated that CURSOR_SHARING = FORCE can result in incorrect results.  No other discussion of performance problems was provided.
  • Page 284: The book describes the author’s under-powered Linux server used in a test.  The server was configured with 512MB of memory, a 256MB shared pool, and 4MB for the buffer cache, and was running Oracle Database 11g Release 1.  The Oracle documentation (reference) states that 1GB of memory is the minimum for Oracle Database 11g Release 1, and that puts into question the validity of any tests performed with that server.  The details of the test case, which obtained a 21% decrease in CPU usage through the use of in-memory undo over the configuration with in-memory undo disabled, were largely omitted, making it impossible for readers to duplicate the test.
  • Page 286: The book lists a parameter named COMPATIBILITY, however that parameter does not exist.  The actual parameter name is COMPATIBLE.
  • Page 289: The book states: “Increase the number of IMU pools by increasing the _im_pools parameter…”  The _im_pools hidden initialization parameter does not exist in Oracle Database 11.1.0.7 (the only release checked) – the author probably intended to write _IMU_POOLS.
  • Page 294: The test script shows that redo statistics for a session are recorded, the session sleeps for 5 seconds, the session queries a table, and then the redo statistics are recorded again.  The author attempted to explain that redo could be generated by a session merely by querying a table due to delayed block cleanout.  The test case is slightly unclear, making it difficult for readers to reproduce the results – there is no point in the session sleeping for 5 seconds, it would have been helpful if the author explained what changes were made to the CUSTOMERS table that eventually resulted in the delayed block cleanouts, and because the statistic number for the redo size statistic is prone to change from one Oracle release to the next, it would have made more sense to specify SN.NAME=’redo size’ rather than essentially specifying SN.STATISTIC#=133.
  • Page 297 states: “… There is also the instance parameter _log_small_entry_max_size, which is used to shift allocation latch activity onto one of the redo copy latches…”  The _LOG_SMALL_ENTRY_MAX_SIZE hidden initialization parameter does not exist in Oracle Database 11.1.0.7 (the only release checked).
  • Page 298 states: “The DBA can specifically set the number of redo strands via the hidden parameter _log_parallelism.”  The _LOG_PARALLELISM hidden initialization parameter does not exist in Oracle Database 11.1.0.7 (the only release checked).
  • Page 300 states: “Global temporary tables can be explicitly dropped. When this occurs, the associated indexes and views are also dropped.”  The associated indexes will be dropped assuming that the temporary table is also purged, however affected views will not be dropped and will have a status of INVALID.
  • Page 301 states: “Object growth can be observed by repeatedly by querying v$sort_usage_view.  Global temporary tables create 50% less redo than standard tables.” This quote has a minor grammar error, and the view name is V$SORT_USAGE (should not include _VIEW in the name).  The question remains how the author determined that working with global temporary tables generates half as much redo as regular heap tables.
  • Page 302: Another instance where the author correlates an application save with a LOG FILE SYNC wait and concludes that the problem is not in the database instance.
  • Page 305: While it is mentioned on the next page in another section, it probably would have made sense to state here that the redo log buffer is also flushed every three seconds, and to repeat that the redo log buffer is, by default, also flushed after every COMMIT.
  • Page 314: The footnote states: “When a process is serviced by one CPU and then is switched to another CPU, it undergoes a context switch… Exclusively binding a CPU and process together virtually eliminates context switching, because the CPU will always be there for that process and the process will never need to wait while the operating system switches its processing to another CPU.”  The author seems to have either redefined the meaning of context switch, or the footnote needs additional clarification (reference reference2).
  • Page 315 states: “The redo log buffer is structured in relatively small operating system-size blocks (typically 512 bytes), not Oracle-size blocks, so it is normal for log writer activity to be performed with multiple blocks.”  The Oracle documentation (reference) states that the redo log file block size defaults to the physical sector size of the disk, which is typically 512 bytes.
  • Page 322 states: “The heavy redo-generating SQL will obviously be some type of DML, like inserts, updates, deletes, or perhaps create statements. If you have ASH licensed, as Figure 8-9 demonstrated, you can easily find the SQL related to the log file parallel write event.” – The heavy redo generating SQL statement could also be a SELECT statement due to delayed block cleanout (as mentioned earlier in the book), a CREATE statement would likely be classified as DDL.  Since the LOG FILE PARALLEL WRITE wait event is posted by LGWR and not a user session, there probably will not be a SQL statement that is definitely associated with the LOG FILE PARALLEL WRITE wait event.
  • Page 324: The book makes a claim that is difficult to verify: “Using global temporary tables can cut redo generation by 50%, substantially reducing Oracle’s overall IO requirements. This alone could solve the log file parallel write problem.”
  • Page 326 states: “If the log writer encounters a nonarchived redo log, it will stop writing and post the wait event log file switch (archive incomplete).”  The actual wait event that will be posted is log file switch (archiving needed).
  • Pages 327 – 387 seem to be related to the author’s “Forecasting Oracle Performance” book, with concepts written in an easier-to-understand, more fluid writing style.  This is not a problem, just worth mentioning.
  • Page 330: The word “difference” should be replaced with “different”.
  • Page 330 states: “For example, the Statspack facility typically collects data in 60-minute intervals and stores the data in tables starting with stats$ (the key table is stats$snap).”  There is no typical collection period for Statspack, while AWR snapshots default to 60 minutes in duration.
  • Page 333 states: “In previous chapters, I have presented how to gather CPU requirements from the v$sesstat, v$sysstat, v$ses_time_mode, and v$sys_time_model views.”   V$SES_TIME_MODEL should be written as V$SESS_TIME_MODEL.
  • Page 343 states: “Also, Oracle CPU consumption includes Oracle processes waiting for CPU and also queuing for CPU. As a result, in a CPU-saturated system, Oracle may report CPU consumption higher than actually occurred.”  The behavior mentioned by the author is operating system dependent.
  • Page 344 states: “In other words, Oracle does not have the ability to split CPU time into service time and queue time. When we gather CPU consumption from either the instance statistics views or the system time model views, what we collect as CPU time and typically classify as CPU service time actually contains both CPU service time and CPU queue time. Oracle has no way of knowing the difference and reports the total, which is, in the truest sense, CPU response time.” This behavior is operating system dependent.
  • Page 346 states: “The arrival rate in Figure 9-9, which is the horizontal axis, is simply the number of logical IOs (v$sysstat: buffer gets plus consistent gets) processed per millisecond.”  There is no statistic in V$SYSSTAT that is named buffer gets, the author probably should have stated db block gets instead. (reference).
  • Page 347 states: “The instance statistic physical reads signifies the number of Oracle blocks that Oracle server processes had to request from the operating system because the blocks did not reside in Oracle’s buffer cache.”  The physical reads statistic includes block reads that will never hit the buffer cache, including blocks read into the PGA.  It appears that the author is describing the physical reads cache statistic instead.
  • Page 349 states: “…when focusing on a particular session or group of sessions, use v$sesstat, v$ses_time_model, and v$session_event.” The correct view name is V$SESS_TIME_MODEL.
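As a side note to the page 294 item above, retrieving the redo size statistic by name rather than by STATISTIC# might look like the following sketch (my query, not the book’s):

```sql
-- Capture the current session's redo generation by statistic NAME;
-- STATISTIC# values are prone to change between Oracle releases.
SELECT sn.name,
       ms.value
  FROM v$statname sn,
       v$mystat   ms
 WHERE sn.statistic# = ms.statistic#
   AND sn.name = 'redo size';
```

Running this query before and after the test query, and differencing the two values, shows the redo generated by the session without depending on a release-specific statistic number.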

The “Oracle Performance Firefighting” book has a lot of potential for being helpful; however, the errors and omissions found in the fourth printing give the book a decidedly dangerous edge.  If flipping initialization parameter values fits your style, this book will be a good fit.

Pages 352 through 387 are currently not included in this book review.

(Section added December 13, 2010): Related blog articles that discuss the contents of this book:





Redo Log Buffer – What is Wrong with this Quote?


June 24, 2010

I am in the process of reading the June 2010 printing (fourth printing) of the “Oracle Performance Firefighting” book.  To say that I have stumbled upon one or two gems in the book is probably an understatement.  The book is filled with interesting statements – this time I found a couple of interesting paragraphs that describe several initialization parameters and their effect on the redo log buffer prior to Oracle 9i R2, and the current behavior.

Redo Log Buffer Prior to Oracle Database 9i R2, page 297:

Having a single redo allocation latch makes enforcing redo serialization very straightforward. But as you can imagine, having a single redo allocation latch also can become a point of contention. To reduce the likelihood of this, server processes hold the allocation latch just long enough to allocate redo log buffer space. There is also the instance parameter _log_small_entry_max_size, which is used to shift allocation latch activity onto one of the redo copy latches, as discussed in the ‘Redo Allocation Latch Contention’ section later in this chapter. To further reduce the contention possibilities, Oracle allows for multiple redo copy latches. The instance parameter _log_simultaneous_copies is used to control the number of redo copy latches.

Redo Log Buffer Oracle Database 9i R2 and Later, page 298:

By default, the number of redo strands is dynamic, but it can be made static by setting the hidden instance parameter _log_parallelism_dynamic to false. When Oracle is dynamically controlling the number of redo strands, the maximum number of strands is controlled by the hidden instance parameter _log_parallelism_max. The DBA can specifically set the number of redo strands via the hidden parameter _log_parallelism. The default number of redo strands is surprisingly low—perhaps two.

What, if anything, is wrong with the above quotes from the book (it is possible that nothing is wrong)?  Keep in mind that these blog articles are intended to generate discussion – be sure to check any and all comments attached to the articles in this series.  The comment section is where the heart of the blog article material in this series will be found.

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.
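For readers who want to verify for themselves whether a hidden parameter such as _LOG_PARALLELISM still exists in their release, a query along the following lines is one common approach.  This is not from the book; the x$ tables are undocumented, release-specific, and normally require a connection as SYS, so treat this strictly as an investigative sketch:

```sql
-- List hidden parameters whose names match a pattern, with their
-- current values.  Undocumented x$ views; connect as SYS.
SELECT i.ksppinm  AS parameter_name,
       v.ksppstvl AS current_value
  FROM x$ksppi  i,
       x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm LIKE '\_log\_parallelism%' ESCAPE '\';
```

If a parameter named in a book or article returns no row in your release, that is a strong hint that the advice no longer applies as written.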





Undo Segments – What is Wrong with this Quote?


June 21, 2010

I located another interesting section of a paragraph in the June 2010 printing of the “Oracle Performance Firefighting” book.  From page 231:

“By default, Oracle tries to assign only one active transaction per undo segment. If each undo segment has an active transaction and if there is space available in the undo tablespace, Oracle will automatically create an additional undo segment. This usually takes care of the buffer busy waits. However, if there is no more room in the undo segment tablespace, multiple transactions will be assigned to each undo segment, and eventually undo segment header contention will result. The solution is to add another database file to your undo segment tablespace, thus enabling Oracle to create additional undo segments. It’s that simple.”

What, if anything, is wrong with the above quote from the book (it is possible that nothing is wrong)?  Keep in mind that these blog articles are intended to generate discussion – be sure to check any and all comments attached to the articles in this series.  The comment section is where the heart of the blog article material in this series will be found.

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.
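One way to investigate the book’s claim, rather than taking either the book or this article at face value, is to watch how active transactions are actually distributed across undo segments.  This is my sketch, not the book’s, and it requires access to V$TRANSACTION:

```sql
-- Count active transactions per undo segment.  With automatic undo
-- management, Oracle normally assigns at most one active transaction
-- per undo segment until the available segments run short.
SELECT xidusn   AS undo_segment_number,
       COUNT(*) AS active_transactions
  FROM v$transaction
 GROUP BY xidusn
 ORDER BY xidusn;
```

Running the query while a test workload holds many concurrent open transactions shows whether, and when, multiple transactions begin sharing an undo segment.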





Database Writer Parameters – What is Wrong with this Quote?


June 18, 2010

While reading the June 2010 printing of the “Oracle Performance Firefighting” book I keep stumbling across interesting sections of the book.  Here is an interesting section – I hope that I did not exclude too much of the surrounding section of the book, causing the paragraph to lose contextual meaning (I am trying to balance how much typing I need to do with how much needs to be quoted to avoid losing the meaning of the material).  From page 219:

“From an Oracle perspective, I start thinking of any instance parameter that may increase Oracle’s IO writing efficiency. For example, I would investigate looking for a way to increase the database writer’s batch write size. As I mentioned, there are version-specific ways to alter the database writer’s batch size. Investigate the _db_block_write_batch and the _db_writer_max_writes parameters. Also considering increasing the instance parameter _db_writer_max_scan_pct (the default may be 40, for 40%) or _db_writer_max_scan_cnt, as they determine how many LRU buffer headers a server process will scan before it signals the database writer to start writing. Increasing these parameters provides more time for the write list to build up, and therefore results in more blocks written per database writer IO request.”

What, if anything, is wrong with the above quote from the book?  Keep in mind that these blog articles are intended to generate discussion – be sure to check any and all comments attached to the articles in this series.  The comment section is where the heart of the material in this series will be found.

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





_SMALL_TABLE_THRESHOLD Parameter and Buffer Cache – What is Wrong with this Quote?

17 06 2010

June 17, 2010

Warning: this article is certain to bring tears of pain to your eyes while you attempt to determine what is right and what is wrong.

While reading the June 2010 printing of the “Oracle Performance Firefighting” book, I found a couple of interesting lines about the buffer cache and the effects, or lack thereof, of the _SMALL_TABLE_THRESHOLD parameter.  This quote is from page 208 (this is a long quote, required so as not to lose the context of the discussion topic – please excuse any typos):

“The single most radical departure from the modified LRU algorithm [introduced in Oracle Database 6, per the book] is known as midpoint insertion [introduced in Oracle Database 8.1.5, per the book]. Each LRU chain is divided into a hot and cold region. When a buffer is read from disk and a free buffer has been found, the buffer and buffer header replace the previous buffer and buffer header contents, and then the buffer header is moved to the LRU chain midpoint. Single-block read, multiblock read, fast-full scan, or full-table scan—it makes no difference. The buffer header is not inserted at the MRU end of the LRU chain, but rather at the midpoint…

Because the window scheme used in the modified LRU algorithm is no longer used, the hidden instance parameter _small_table_threshold became deprecated. However, in Oracle Database 11g, it is being used again, but for a different purpose. Starting with this version, the _small_table_threshold parameter is the threshold for a server process to start issuing direct reads.”

Before deciding what about the above is correct or incorrect, take a look at Metalink Doc ID 787373.1.  A couple of words from that document to encourage you to take a look at the original material:

“Applies to Version: 9.0.1.0 to 11.2.0.1.0: When loading objects and data into the buffer cache… best way to load for full scans. This functionality is called smallness logic and is driven by a hidden parameter called: _small_table_threshold. … Oracle will load the object via the buffer cache as this is more efficient than doing a direct read operation.”

Before deciding what about the above Metalink document is correct or incorrect, take a look at this AskTom article.  A couple of words from that article:

“no, you are not correct on your guess about the [_SMALL_TABLE_THRESHOLD] parameter, it controls the caching of the blocks (whether they are cached like a single block IO or cached as we cache full scan blocks) – not the method of IO.”

Before deciding what about the above AskTom article is still correct, take a look at this blog article.  A couple of words from that article:

“When direct path reads starts to happen?  It is known that somewhat reliable figure is your _small_table_threshold multiplied by 5 (mentioned by Tanel Poder on oracle-l recently).”
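The competing claims above can be put to a simple test.  The sketch below (assumptions: T1 is a hypothetical test table in your own schema, the 'physical reads direct' session statistic is the indicator of direct path reads, and SYS access is available for the x$ query) compares a table's size in blocks with the _small_table_threshold value, then watches whether a full table scan of the table increments the direct read counter for the session:

```sql
-- 1. Current _small_table_threshold value (run as SYS; x$ tables
--    are undocumented and version-specific).
SELECT n.ksppinm, v.ksppstvl
FROM   x$ksppi n, x$ksppcv v
WHERE  n.indx = v.indx
AND    n.ksppinm = '_small_table_threshold';

-- 2. Size in blocks of the test table (T1 is a hypothetical name).
SELECT blocks
FROM   user_segments
WHERE  segment_name = 'T1';

-- 3. Direct path read activity for the current session.  Run this
--    query, execute SELECT COUNT(*) FROM T1, then run it again -
--    an increase suggests the full scan bypassed the buffer cache.
SELECT s.value
FROM   v$mystat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name = 'physical reads direct';
```

Repeating the experiment with tables sized just below and just above the threshold (and above five times the threshold, per the Tanel Poder figure quoted above) on different Oracle Database versions should reveal which of the quoted descriptions, if any, matches the actual behavior.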

One final blog article, from the comments section:

“A table is deemed ‘small’ by Oracle if it’s 2% or less of the buffer cache in size. A small table is ‘cached’ when read via a FTS and is not immediately loaded into the least recently used end of the LRU list as becomes a ‘large’ table read via a FTS…”

“When a block is read from a ‘large’ table via a FTS, the blocks are basically loaded into the LRU end of the LRU list, even though they’ve only just been loaded into the buffer cache. As such, the chances of them remaining in the buffer cache is minimal and will likely be immediately overwritten once processed.

However, when a block is read via an index, the blocks are loaded somewhere near the ‘middle’ of the LRU list, not at the LRU end of the list…”

What, if anything, is wrong with the above quote from the book?  Bonus question: what, if anything, is wrong with the Metalink document?  Maybe there is a problem with the other quotes?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





System/Kernel Mode CPU Usage – What is Wrong with this Quote?

16 06 2010

June 16, 2010

While reading the June 2010 printing of the “Oracle Performance Firefighting” book, I found an interesting paragraph that left me wondering whether something more, or perhaps something else, should be stated.  This quote is from page 104 (this is a long quote, required so as not to lose the context of the discussion topic – please excuse any typos):

“System time: This is when a core is spending time processing operating system kernel code. Virtual memory management, process scheduling, power management, or essentially any activity not directly related to a user task is classified as system time. From an Oracle-centric perspective, system time is pure overhead. It’s like paying taxes. It must be done, and there are good reasons (usually) for doing it, but it’s not under the control of the business—it’s for the government. Normally, Oracle database CPU subsystems spend about 5% to 40% of their active time in what is called system mode. If you’re from a non-Unix background, you may be more familiar with the term kernel mode or privileged mode, which is, in essence, system time.”
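While considering whether the quoted 5% to 40% system mode range is reasonable, it is possible to check where a given database server falls.  One approach (a sketch, assuming Oracle Database 10g or later where V$OSSTAT is available; the values are cumulative from instance startup, so two snapshots a few minutes apart give a more current picture than the since-startup averages shown here):

```sql
-- Operating system CPU time split as seen by Oracle, cumulative since
-- instance startup.  SYS_TIME / BUSY_TIME approximates the fraction of
-- active CPU time spent in system (kernel) mode.
SELECT stat_name, value
FROM   v$osstat
WHERE  stat_name IN ('USER_TIME', 'SYS_TIME', 'IDLE_TIME', 'BUSY_TIME');
```

The same split is, of course, also visible at the operating system level through utilities such as vmstat or sar, which is a useful cross-check on the V$OSSTAT figures.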

What, if anything, is wrong with the above quote from the book? 

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





Log File Sync – What is Wrong with this Quote?

16 06 2010

June 16, 2010

While reading the June 2010 printing of the “Oracle Performance Firefighting” book, I found an interesting paragraph that left me wondering whether something is missing.  This quote is from page 61 (this is a long quote, required so as not to lose the context of the discussion topic – please excuse any typos):

“As another example, suppose that you receive a call about long commit times. The users are saying ‘submits’ or ‘saves’ are taking longer today than yesterday. Upon an examination of the average log file sync wait times (which indicate commit times from an Oracle perspective), you discover there is no change from the previous day. Therefore, you know the performance issue is not because of Oracle’s commit mechanism and most likely not related to the database.”  [This period marks the end of Part 3, "IO Wait Time Summary with Event Details"]
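When investigating a complaint like the one quoted, the average log file sync wait can be examined alongside the LGWR write event.  A sketch (assuming access to V$SYSTEM_EVENT; the TIME_WAITED_MICRO column is available in Oracle Database 10g and later, and the figures are cumulative since instance startup, so averages can mask a recent change – interval deltas or V$EVENT_HISTOGRAM give a sharper picture):

```sql
-- Compare the foreground commit wait ('log file sync') with LGWR's
-- redo write ('log file parallel write').  A large gap between the two
-- averages suggests time spent outside redo IO, such as CPU starvation
-- or process scheduling delays.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_wait_us
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
```

Whether figures like these are sufficient to rule the database in or out of a commit-time complaint is, of course, part of the question posed below.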

What, if anything, is wrong with the above quote from the book? 

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.




