November 7, 2010 (Updated November 9, 2010, November 28, 2010, December 8, 2010, February 28, 2011, March 9, 2011)
1,100 Page True-False Quiz, “Definitive” is Left as an Exercise for the Student
(Edit November 28, 2010: Amazon.com permitted me to submit a review for the book, but only for the first 85 or so pages. Much more detail has been added to this review since the last update.)
Based on feedback that this book’s author provided in a comment attached to one of my other (thirteen) Amazon Oracle Database related book reviews, I decided to add a couple of comments at the start of this review. Last year I contributed to a book that was written by a total of sixteen people, co-authoring two of the chapters in that book. While that book contained “Oracle” in its title, and the two chapters that I co-authored were Oracle performance related, the book as a whole is not a competitor of the “Oracle Tuning: The Definitive Reference” book. I am an IT Manager and Oracle DBA, so I am not a competing author or a competing Oracle consultant. Additionally, I only review books that I have purchased using a portion of my paycheck, so my reviews also take into account whether or not I feel that I have received my money’s worth from a book – does the book deliver on the promises of its front cover and its description.
The author of the “Oracle Tuning: The Definitive Reference” book is described as having authored more than 30 books, is “one of the world’s top Oracle Database experts with more than 25 years of full-time DBA experience,” operates an Oracle consulting/training company, is the Editor-in-Chief at Rampant TechPress, participates in Oracle related discussion forums (including one operated by his consulting company), and produces articles for a non-Oracle specific blog.
Why did I buy the “Oracle Tuning: The Definitive Reference” book? Put simply, I was curious. I participated in several discussions with Mr. Burleson on Oracle’s OTN forums where his articles and sections of his previous books were discussed. I brought to his attention a couple of what I felt to be errors and/or omissions from articles and book sections that were linked in the OTN threads. More knowledgeable contributors to the OTN threads admittedly provided more thorough feedback to Mr. Burleson. So, I was curious to know whether or not some or all of those corrections and suggestions became a part of the second edition of this book. The second edition of this book is the first book that I purchased that was published by Rampant TechPress, and the first book that I purchased which was written by Mr. Burleson. I initially ordered the book from Amazon at the end of May 2010 (it was expected to be in stock the first week of June 2010), and when it had not shipped by the end of August, my curiosity subsided due to the delay (and various other reasons) and I cancelled the original order. Near the end of October 2010 my curiosity peaked again, so I placed a new order for the book (at that time the book was expected to be in stock the first week of November 2010). While waiting for the book to arrive, I started reading portions of the first and second editions of the book through the Google books library.
Considering that the word “definitive” appears on this book’s front cover, and the book publisher’s website states the following, I expected to see information that went well beyond showing long listings followed by a sentence or two: “This is not a book for beginners. Targeted at the senior Oracle DBA, this comprehensive book gives you all of the knowledge you need to be successful in tuning even the most complex Oracle database.” So, how well did the book meet the level of expectation that was set by its front cover and publisher’s description?
The first and second editions of this book are partially viewable through Google books, and I suggest that anyone considering the purchase of this book should first read a couple of pages of the book at a local bookstore or through the Google books interface. In comparing the first 30 pages of the first edition with the first 250 pages of the second edition, it is apparent that a much more reasonable font size was selected for the second edition of the book. Several logic, spelling, and grammar errors that are present in the first edition were corrected or removed from the second edition.
Now, moving on to the problems that prevented this book from receiving a five out of five star rating. Internet searches of phrases from the book identified several paragraphs that were directly copied and pasted from other books (most published by Rampant TechPress), other articles, the official Oracle documentation, and in at least one case part of the contents of a script found in the RDBMS/ADMIN directory of the Oracle Database home – the original sources of the material were not mentioned. The book repeatedly emphasizes the goal of changing/flipping initialization parameters, using a scheduling utility, periodically throughout the week (or even throughout the day) in anticipation of workload changes. Unfortunately, a number of the scripts that the author provides to help the reader with monitoring the workload, so that the reader knows when to automatically flip the initialization parameters, do not provide the information claimed in the book. Repeated advertisements of the author’s Ion tool are unnecessary distractions throughout the book. Most importantly, the number of errors and lack of sufficient detail prevent the book from meeting my expectations of a book that is advertised as comprehensive and targeted at the senior Oracle DBA; for example, showing the output of DESC V$SYS_TIME_MODEL and a simple listing of the statistics names in that view cannot be described as comprehensive, or as targeted at the senior Oracle DBA.
This review is roughly 24 typewritten pages in length, several times longer than any of my previous Oracle related book reviews. The significant number of specific errors, omissions, and distractions far exceeds the problems per page count of any other Oracle related book that I have read to date, and the listed items below likely miss less easily identified problems that a senior Oracle DBA with 25 years of experience or an Oracle Certified Master/Oracle Ace should be able to identify. I am a bit disappointed that most of the errors that I previously pointed out to the book author, as well as those errors identified by others, were reprinted, or even added to the second edition of this book. This book truly is not targeted at beginners, but it makes for a spirited true/false quiz or even an essay exam for someone who has a bit more Oracle Database knowledge by having read portions of the official Oracle Database documentation as well as other books. Those DBAs who enjoy hourly parameter modification, running their databases with multiple block sizes, and performing frequent index rebuilds might find value in this book if those DBAs are able to over-look the sometimes ragged joins of sections of multiple books being copied into this book and the scripts that do not work as designed.
Specific errors, omissions, and distractions in order for the first 200 or so pages followed by random selections from other parts of the book (the detail behind the rating assigned to the book) (edit November 11, 2010 ~ 111110: Google Books preview for the book):
- Page 1 contains an advertisement for the author’s Ion product and some sort of a script library that can be purchased.
- Page 6 states that the author “wrote five of the officially authorized Oracle Press books.” Books with the “Oracle Press” logo are not officially authorized by Oracle Corporation. “Oracle Press” is simply a marketing label applied to books published by McGraw Hill.
- Each chapter opens with a cartoon drawing that is typically one half to two-thirds as tall as the page.
- Page 8 states “This tuning approach is concerned with delivering data to the end-users quickly, even if it means that the database will consume additional resources. This approach is associated with the SQL optimizer goal of first_rows_n.” This statement is a bit sloppy. The intention of the OPTIMIZER_MODE parameter’s value FIRST_ROWS_n (where n is one of 1, 10, 100, or 1000) is to return the first n rows as quickly as is possible, with the assumption that the remaining rows in the resultset will not be retrieved. The statement in the book seems to imply that setting the OPTIMIZER_MODE to a value of FIRST_ROWS_n will allow retrieval of ALL of the rows from the database as quickly as possible, with the tradeoff being an increase in resource usage. Technically, it is not the database that is consuming the resources, but instead the instance’s processes.
- Page 10, starting to notice several paragraphs that are a single sentence in length, and this formatting continues throughout the book. SQL code listings also seem to be formatted to consume as much vertical space on a page as is possible.
- Page 10, confusing sentence/paragraph, “The Oracle 11g SQL tuning advisors are a fully automatic SQL tuning approach and that they use a reactive tool approach, waiting watching for problems to occur.”
- Page 10, after stating on the previous page that “reactive tuning deals with the present, commonly using v$ views and Active Session History (ASH),” the book states on page 10, “in reactive tuning, the Oracle DBA fights the symptoms, not the disease… Reactive tuning is also problematic because we wait until end-users have been inconvenienced.” The book author is advocating the use of what he terms “proactive tuning”, which the author describes as attempting to determine the best combination of database parameters and tablespace/table options for each SQL workload. Confusingly, the author also states that his “proactive tuning” method uses Active Session History, Statspack, and the Automated Workload Repository. In my opinion, reacting to what is found in ASH, AWR, Statspack reports, most V$ views, and 10046 trace files should be termed as “reactive tuning”, where the DBA responds to an actual identified problem experienced by the end-users. What the author describes as “proactive tuning” appears to be more along the lines of initialization parameter flipping while end-users are not complaining.
- Page 11 states “For example, if the AWR shows that the demands on the shared pool become very high between 1:00 pm and 2:00 pm, the DBA might trigger a dynamic decrease of db_cache_size and a corresponding increase of the shared_pool_size parameter during this time period.” If Oracle’s buffer cache is using the memory specified by the DB_CACHE_SIZE, the size value specified for DB_CACHE_SIZE cannot be decreased without first manually flushing the buffer cache – flushing the buffer cache to allow the shared pool to grow for an hour is a bad idea. It is important to keep in mind that when the SGA_TARGET parameter is specified, the values for DB_CACHE_SIZE and SHARED_POOL_SIZE specify the minimum values for the parameters.
- Page 12, when describing examples of poor schema design, suggests that databases using extensive data normalization to minimize data redundancy is a poor design because it forces Oracle to perform unnecessary table joins.
- Page 12 states, “If poorly designed PL/SQL is encountered, users may be able to tune the database by using array processing such as bulk collect, forall and ref cursors…” As of Oracle Database 10g, Oracle automatically array fetches 100 rows at a time in cursor FOR loops, which might make it much less likely/necessary that a ‘user’ will re-write the PL/SQL to take advantage of bulk collection’s slightly better performance, assuming that the ‘user’ is not synonymous with ‘end-user’ (reference).
- Page 15 a confusing sentence, “In these cases, we need to adjust the instance parameters need to change, depending on the time-of-day and day-of-week.”
- Page 17, the timeline states, “1990s: Oracle is developed and relational databases dominate the IT market.” This timeline statement, giving consideration to the lack of mention about databases prior to 1990, seems to indicate that Oracle Database version 2 did not exist in 1979.
- Page 17 states, “RAM speed has not improved since the 1970s while CPU speeds get faster every year.” This is an inaccurate statement about memory speed not improving.
- Page 19, figure 1.2, which is described as “The Intel CPU Architecture of the Early 21st Century” seems to be terribly out of place in this section of the book, the acronyms contained within the figure are not explained, and the figure fills half of the page.
- Page 21, figure 1.3, which is described as, “The Changing Dynamics of Human and Hardware Costs” simply shows the intersection of an ascending line labeled as People Costs and a descending curve labeled as Hardware Costs. It might have been helpful to provide some sort of numbers on the illustration to indicate a timeline and monetary figures – at what point did hardware costs and DBA costs equalize?
- Page 22 states, “The number one cause of poor performance is the over-normalization of Oracle tables.” Interesting.
- Page 22 states, “All SQL should use bind variables, preferably in the code or via cursor_sharing=force, to make SQL reusable within the library cache.” In general, most SQL statements should use bind variables, but of course there are occasions where the use of bind variables should not be used. “All” is too strong of a recommendation, and failure to use bind variables does not automatically make the already hard parsed SQL statement unavailable for reuse.
- Page 22 states, “Failure to set pga_aggregate_target to allow the CBO to choose hash joins can result in very slow SQL performance.” It is unclear from this section in the book if the PGA_AGGREGATE_TARGET must just be set to a non-zero value, or if there is some other value that it must be set to in order for Oracle’s optimizer to consider using hash joins. Considering that the optimizer may select to use a hash join when the PGA_AGGREGATE_TARGET parameter is set to a value of zero, and also when that parameter is set to a small value, this statement found in the book is vague.
- Page 25 states “The first tasks when tuning a database are to identify the external bottleneck conditions, which may include: CPU Bottleneck,… RAM Bottleneck,… Network Bottleneck, … and Disk Bottleneck.” While it is important to check these items, I suggest that these probably would not be the first items to check when tuning the database instance’s performance. Yes, checking whether or not the CPUs are overloaded (or heavily loaded) is important, but maybe it would be better to determine if the overload is causing a measurable negative impact on performance by first checking the delta values of the various wait events, system statistics, and system time model statistics before deciding to check, for instance, whether there are network bottlenecks.
- Page 25 states, “When the number of run queue waits exceeds the number of CPUs on the server, the server is experiencing a CPU shortage… Note that a 10% CPU level is not always a concern as modern servers are designed to keep the processors as busy as possible.” The first question is how would someone measure the number of run queue waits. Is this a measurement over a period of time, or is it an instantaneous measurement of the number of processes waiting to run on the CPUs, or is it an instantaneous measurement of the number of processes currently running on the CPUs plus the number of processes waiting to run on the CPUs? The second question is whether or not the system can be considered to be CPU bound before the run queue length matches the number of CPUs in the server; for example, previous pages in the book suggested setting the processor affinity or using VMWare to confine the instance (or its processes) to a subset of the CPUs, or to use the Unix nice command to alter the dispatching priority of processes. The third question is why was 10% selected, and not 50%, 75%, 90%, or 100%? The first edition of the book did not include the word “wait” in the quoted sentence, and I believe that word causes additional confusion in this edition of the book.
- Page 25 states that “Large amounts of Oracle*Net traffic contribute to slow SQL performance.” The book does not describe what is considered “large amounts”, nor what to do about addressing the network bottleneck.
- Page 25 under the heading of disk bottlenecks the only issue identified is “using RAID5 for high update systems.” The adjective “high” is never clarified.
- Page 25 states, while discussing potential solutions for over-allocation of the server’s RAM, that the options for “excessive swapping is to add more RAM, reduce the size of Oracle’s SGAs, or turn on Oracle’s Shared Server.” Are these the only options, or could more suggestions be recommended? For example, “size of Oracle’s SGAs” implies that more than one instance is running on the server – would it make sense to reduce the number of instances on the server (suggesting so would, of course, disagree with an earlier suggestion of moving most/all of the instances to a single high-end internally redundant server)? Are there any reasons not to implement a shared server configuration? What about tuning the SQL, analyze excessive PGA utilization, etc.?
- Page 26, the author appears to misuse the term “network latency”, or at least stretch the meaning of latency as it applies to a network.
- Page 26 and 27 include half page advertisements for the author’s Ion tool. Pages 42, 46, 60, 62, 68, 71, 73, and 89 also include an advertisement for the tool.
- Page 27 states “In a disk-bound database, the majority of the wait time is spent accessing data blocks.” It is quite possible in a non-disk-bound, well performing database instance that the majority of wait time could very well be spent reading data blocks. In fact, these events should be among the top 5 wait events, otherwise there probably is a significant performance problem. It is also quite possible that the system could be disk-bound when the majority of wait time is spent _writing_ to the redo logs, archiving redo logs, and updating data/undo/temp/control files.
- Page 27, the Top 5 Timed Events that are described as demonstrating a disk constrained database, shows 44 waits for the CPU. Waits for the CPU cannot be reported in the Top 5 Timed Events – this data in the report is bogus. The report also shows that the average single block read time is 2.75 seconds, while the average multi-block read time is 0.127 seconds (127ms) – this data in the report is also likely bogus. The report also shows that 1,363 seconds were lost in 673 waits (average of 2.03 seconds) to the “library cache load lock” wait – if that wait appeared in the top 5 report, even in a report showing the single block read time is 2.75 seconds, I think it would be a good idea to start with the “library cache load lock” wait since that is one of the waits that indicates a severe problem if it appears in the top 5 wait events.
- Page 28 states “CPU enqueues can be observed when the CPU run queue exceeds the number of CPUs on the database server. This can be seen by… If the system is already optimized, having CPU time as a top wait event is a positive because the addition of faster CPUs or more CPUs will relieve the bottleneck.” The effects of CPU over-load may be observed long before the CPU run queue exceeds the number of CPUs in the server, whether it be longer average waits for the log file sync wait event, increased waits for latches, increased wait time for single block or multi-block reads, or any number of statistics in the system time model (V$SYS_TIME_MODEL, V$SESS_TIME_MODEL). Additionally, upgrading and/or adding CPUs is not a cheap option from a licensing standpoint, and is not guaranteed to remove CPU time from the top five “wait” event list (it might even become more prominent as more processes may then simultaneously spin while trying to acquire a latch).
- Page 28 shows the Top 5 Timed Events portion of an AWR report that includes 4,851 waits for CPU time – a Top 5 report cannot show the number of waits for the CPU – this data in the report is bogus. Second, the CPU is a limited “wait” resource with an upper utilization limit equal to the number of CPUs in the server multiplied by the elapsed number of seconds, while the other foreground wait events, for the most part, have an upper limit of the average number of sessions (plus parallel related additional processes) connected to the instance multiplied by the elapsed number of seconds. It is quite possible that a server experiencing excessive CPU usage may not cause the “CPU time” to appear in the top 5 timed events. Third, we do not know how many CPUs are in the server – would it make a difference if there was only a single CPU compared to 96 CPUs? Fourth, we have no concept of the elapsed time represented by the report – what if the elapsed time for the report is eight hours (the total reported CPU time is 4,042 seconds)? Fifth, if the single block read time averages more than a second, shouldn’t the investigation start with the IO system? Sixth, the most that can be determined from the top 5 timed events is the average utilization, number of waits, and number of timeouts – why are the number of timeouts not shown in the book? From the top 5 timed events it is difficult, if not impossible, to determine if there wasn’t a brief period of intensive activity, followed by a long period of almost no activity – or if the average activity levels were fairly constant in the time period.
- Page 28, the Top 5 Wait Events portion of an AWR report (what caused the switch from Top 5 _Timed_ Events as shown in the previous example) is labeled as a mockup – probably a good idea that this is labeled as such because the report shows 94,754 seconds where the server was waiting to send data across the network while single block and multi-block reads totaled only 85.75 seconds. That is why you do not use a satellite link for the network backbone.
- Page 29 states “Individual Program Global Areas (PGAs) for each session are a set of running programs that do work for the instance and are referred to as processes.” To put it simply, PGA is much closer to a concept of “an allocated block of memory”, than it is to “a set of running programs” (reference).
- Page 30 describes the DB_CACHE_SIZE parameter as “the number of data buffers to allocate for the instance.” The parameter does not specify the number of data buffers (blocks?), it specifies the amount of memory in bytes to allocate to the DEFAULT buffer cache (or the minimum amount in the event the SGA_TARGET is specified).
- Page 30, a self-conflicting sentence that seems to make no sense. “The most important parameters for instance tuning are those that are immutable because they cannot be changed without starting and stopping the instance or using alter session commands or SQL hints.”
- Page 30, the DB_FILE_MULTIBLOCK_READ_COUNT parameter is listed among the parameters that cannot be changed without bouncing the database – that is not correct. When workload CPU costing is in use, the value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter also has no effect in costing of multi-block reads, in stark contrast to the statement made by the book that it is (implied always) used in optimizer costing calculations.
- Page 30 states “parallel_automatic_tuning:… since parallel full-table scans are very fast, the CBO will give a higher cost-to-index access and be friendlier to full-table scans.” This is simply incorrect. The optimizer does not increase the cost of index access paths when costing parallel access paths that use full table scans – it lowers the cost of the full table scan relative to the parallel degree. Second, parallel full table scans are not automatically very fast, especially if the IO subsystem is bottlenecked.
- Page 30, the description of the OPTIMIZER_INDEX_COST_ADJ parameter is very weak, “this parameter controls the relative costs of full-table scans versus index scans.” It would be much better to state that the parameter specifies the percentage of the index access cost to be retained. Setting the value to a very low value could cause the optimizer to select not only an index access path rather than a full table scan, but to select the wrong index due to cost rounding problems as the value of the OPTIMIZER_INDEX_COST_ADJ parameter approaches 1. Ideally, since at least Oracle Database 10.1 (and possibly as early as 9.0.1) this parameter should be left at the default value of 100 due to the Oracle optimizer’s use of system (CPU) statistics by default (the default behavior of system statistics usage changed in 10.1).
- Page 30, the description of the OPTIMIZER_INDEX_CACHING parameter is vague – the book doesn’t indicate when the value of the parameter is used in cost calculations, and when it is not used.
- By page 30 the book has recommended the use of SQL Performance Analyzer, SQL Tuning Advisor, AWR, ASH, accessing the various DBA_HIST views, and accessing the various performance views in Enterprise Manager without mentioning the various extra cost licenses required to use those features. For example, the book should have mentioned that the Diagnostic Pack must be licensed for access to AWR information, and the Diagnostic and Tuning Packs must be licensed for access to the SQL Tuning Advisor, long before any mention of needing to purchase additional licenses on page 31.
- Page 33, the half-page SQL script printed in the book and in the source code depot is Oracle Database’s DBA_FEATURE_USAGE_STATISTICS view definition found in the catdbfus.sql file located in the RDBMS/ADMIN directory of the Oracle home. For some reason this script is marked as “Copyright © 2005 by Rampant TechPress” in the script library for the book, while the original version found in the Oracle home is marked “Copyright (c) 2002, 2005, Oracle. All rights reserved.” No effort is made in the book to interpret the meaning of the columns returned by this view.
- Page 34 states “Starting in Oracle 10g, statistics collections are automated, but users may still need to selectively add histograms and other specialized optimizer statistics.” The book does not state what triggers the statistics collection for a specific object, at what time the statistics are collected, that histograms (maybe excessively) are automatically generated, nor the potential problems that might be encountered after the statistics collection.
- Page 34 indicates that table compression is introduced in Oracle 11g. That feature was actually introduced in Oracle 9i R2 (reference).
- Page 35 states “In Oracle 11g we have SQL profiles and the SQL Access Advisor to help identify sub-optimal SQL statements. Once identified, the SQL profile utility will allow changes to execution plans without adding hints.” This is untrue – a SQL profile is a series of hints that are automatically generated and applied to the SQL statement during cost-based optimization.
- Page 36 mentions that Oracle 10g will sometimes automatically rewrite a SQL statement into a more efficient form, but does not provide much, if any, detail describing why this is done, how to see that it has happened, or what to do if the rewritten form is not more efficient (Oracle 9i’s cost-based optimizer was also able to rewrite SQL statements).
- Page 39, frequently uses the term “Oracle” to mean both Oracle Corporation and Oracle Database, even in the same sentence, and that makes it a bit difficult to comprehend the intention of some of the passages in the book. For example, “Oracle first addressed this problem by developing special advisory utilities in Oracle.”
- Page 39, the text at the bottom of the page appears to be a copy and paste of text that appeared in a book titled, “High Performance Data Warehousing” which was originally copyrighted in 1997.
- Pages 40, 90 is the large Trumpet of Doom graphic useful?
- Page 46, the bottom half of this page is largely a direct copy and paste of pages 8 and 9 of an article titled “Metric Baselines: Detecting Unusual Performance Events Using System-Level Metrics in EM 10GR2” that was written by John Beresniewicz at Oracle Corporation. The article is marked Copyright 2005 by Oracle Corporation (reference which was hosted by permission of the author).
- Page 47 states, “When the data buffer hit ratio falls below a predefined threshold, it might be useful to add RAM to the data buffer cache.” It is not a good idea to tune databases based on the buffer cache hit ratio, a ratio that started falling out of favor by the mid to late 1990s (reference).
- Page 49 states that when the buffer cache hit ratio drops below 70%, the DBA should set up the system to dynamically adjust the buffer cache.
- Page 51, the text in the last section of the page is largely a copy and paste of page 36 of the book “Oracle Silver Bullets: Real-World Performance Secrets”.
- Page 52, it would have been helpful if the book briefly described why each of the statistics that were listed are the most commonly used statistics for Oracle exception reporting.
- Page 53, the SQL statement lacks an ORDER BY clause, which possibly means that the SQL*Plus BREAK ON command will add extra line breaks in unexpected positions in the output.
- Page 56, table 2.2 is almost an exact copy and paste, ignoring slight formatting differences, of the Oracle Database Reference documentation from Oracle Corporation for DBA_HIST_TEMPSTATXS (reference).
- Page 57, the SQL statement does not work as described for a variety of reasons, including the cumulative nature of the statistic values, with no attempt to calculate the delta values of the statistics. This is one of the SQL statements that the book suggests for trend identification with the goal of scheduling the automatic changing of initialization parameters.
- Page 58, the SQL statement does not work as described for a variety of reasons, including the cumulative nature of the statistic values with no attempt to calculate the delta values of the statistics.
- Page 61, the SQL statement does not work as described for a variety of reasons, including the cumulative nature of the statistic values with no attempt to calculate the delta values of the statistics.
- Page 65 simply shows a DESC for V$SYS_TIME_MODEL and V$SESS_TIME_MODEL, and includes a list of the STAT_NAME values found in those views without attempting to describe the meaning of the STAT_NAME values.
- Page 67, the author does not explain the parent-child relationships of the statistics from V$SESS_TIME_MODEL, rather the query just sorts the statistics in descending order by time.
- Page 69, the SQL statement does not take into account that some of the sessions may have been connected for more than a year, while other sessions may have been connected for only a couple of minutes. Consider how a check of the db time statistic for the two sessions might show similar values for the two sessions – which of the two sessions represents more of a potential impact on system-wide performance?
- Page 70, the first SQL statement querying AWR data that compares two snapshots, unfortunately the script header in the book does not match the script header in the script library for the book, making it difficult to locate the script. Matching scripts listed in the book with those in the script library is difficult for many of the scripts due to the scripts being labeled differently in the book and script library.
- Page 74, the book states that AWR and Statspack reports show time model statistics, and then proceeds to show operating system statistics from an Oracle Database 10.1 report which shows statistics ending with _TICKS, rather than _TIME which became standard starting with Oracle Database 10.2. It would have been helpful if this section of the book were updated to show the _TIME statistics, or at least mention that the reader will likely see _TIME statistics rather than _TICKS statistics.
- Pages 81 through 83, the discussion of Oracle Data Miner and decision support systems seems to be off topic for this book.
- Page 85 states, “this can be plotted as a 1/x function and the exact optimal point computed as the second derivative of the function 1/x as shown in Figure 4.1.” Many fancy words in this sentence, but no example is provided that shows how to translate the displayed statistics into the graph, how to calculate the second derivative of the 1/x function, or exactly what x is. Quite often the data contained in the V$DB_CACHE_ADVICE view does not conform to any kind of curve that is represented by figure 4.1.
- Page 87, the SQL statement does not work as described for a variety of reasons, including the cumulative nature of the statistic values with no attempt to calculate the delta values of the statistics. This is one of the SQL statements that the book suggests for trend identification with the goal of scheduling the automatic changing of initialization parameters.
- Page 91, the cumulative nature of the AWR data seems to imply that the author’s database is bounced late every Sunday night – other than that, the chart’s information is nearly useless because the delta values of the AWR statistics were not calculated.
- Page 92, table 4.2 appears to be a direct copy and paste of the Oracle documentation library for DBA_HIST_ACTIVE_SESS_HISTORY (reference).
- Page 93, the SQL statement does not work as described for a variety of reasons, including the cumulative nature of the statistic values with no attempt to calculate the delta values of the statistics. One might wonder how this SQL statement would behave in a RAC environment.
- Page 95, table 4.3 appears to be a direct copy and paste of the Oracle documentation library for DBA_HIST_FILESTATXS (reference).
- Page 96, it is interesting that the script output on page 96 matches the script output on page 57. The SQL statement does not work as described for a variety of reasons, including the cumulative nature of the statistic values with no attempt to calculate the delta values of the statistics.
- Page 100 contains what is essentially a full-page advertisement for the author’s remote DBA consulting company.
- Page 102, two unnecessary cartoon drawings, one apparently depicting an upset IT manager, and another depicting a person wearing a ski mask while working on a computer.
- Page 103, the case study suggests that changing the OPTIMIZER_MODE from the default of ALL_ROWS to FIRST_ROWS_100, followed by restarting the database instance corrects performance in OLTP type databases. “Their DBA did not understand the difference between the optimizer goal of all_rows, which minimizes computing resources and favors full-tale [sic] scans and the first_rows_n optimizer which sacrifices fast throughput in favor of fast access via indexes.” First problem with this advice: bouncing the database is not necessary when changing the OPTIMIZER_MODE parameter’s value – simply changing the parameter value will automatically force a hard parse of each SQL statement the next time that SQL statement is executed – the client experienced an unnecessary system outage when the database was bounced, as well as losing the performance benefits derived from the blocks already residing in the buffer cache. Second problem with this advice: it might simply be the case that due to the presence of histograms, some of the execution plans were less than optimal due to the specification of uncommon (or very common) bind variable values on the initial hard parse as a result of the optimizer peeking at the initial bind variable values to generate an optimal execution plan for those values. Adaptive cursor sharing, introduced in 11.1, reduces the risks associated with the combination of bind variable peeking and histograms on columns (this was not mentioned). By changing a global setting, such as the OPTIMIZER_MODE parameter, rather than spending a couple of minutes enabling a 10046 trace for a slow session, or examining an AWR or Statspack report to identify resource intensive SQL statements, the author possibly caused other performance problems for any SQL statement that, for example, might exist in a report which typically returns more than 100 rows. If a query is expected to return 100 or fewer rows, the execution plans should be identical when the OPTIMIZER_MODE parameter is set to ALL_ROWS and when it is set to FIRST_ROWS_100 (this value tells the optimizer to assume that all rows after the first 100 will be discarded) which probably describes most of the queries that are commonly executed in OLTP databases (with the exception of a few). Third problem with this advice: the author did not describe why FIRST_ROWS_100 was selected rather than FIRST_ROWS_1, FIRST_ROWS_10, or FIRST_ROWS_1000 (or actually even provide a good explanation why ALL_ROWS was the wrong setting).
- Page 104 states, “in this emergency, an Oracle9i shop called complaining about a serious degradation in SQL performance right after implementing a partitioned tablespace.” I was not aware that a tablespace could be partitioned, and a search of the Oracle documentation library found nothing on this topic. The author provided a DBMS_STATS.GATHER_SCHEMA_STATS command with several parameters. Only the DEGREE parameter was explained, stating that it allowed collecting the statistics using 15 parallel processes. I wonder if the author reset the parallel degree of the indexes belonging to the schema from the resulting value of 15 to the normal value of 1?
- Page 104, the author references output generated by his plan9i.sql script, and indicated that the output showed unnecessary full table scans of large tables. We really do not know if this database had been online for a couple of hours, or a couple of years based on the output. In examining the contents of the plan9i.sql script, there are a couple of problems with the script. First, it queries V$SQLAREA rather than V$SQL, so it is completely blind to the possibility that multiple child cursors might exist for the same SQL statement, each with different execution plans. Second, the script reports on all SQL statements that are still in the library cache when the sum of the executions of those SQL statements is ten or greater when grouped by the table’s name – this sum becomes the NBR_FTS column. Therefore, the NBR_FTS column does not take into account the fact that a full table scan of a table may be performed more than once per execution of the SQL statement, and fails to count the full table scans that happened for SQL statements already aged out of the library cache. Third, the case study did not discuss how to examine whether or not the full table scans were more efficient than an available index access paths, or how it was found that a common WHERE clause in the query contained a bind variable name that was wrapped in single quotes. Fourth, how did creating a single index fix the five tables in the output, yield a 50 fold performance improvement, and reduce disk I/O by 600%?
- Page 105, how did creating a single function based index fix the five tables in the output and result in a five-fold performance improvement?
- Page 105, the silver bullet tip to change the CURSOR_SHARING parameter to FORCE did not warn of any of the possible side-effects, which may include poor performance when cursors that should not be shared because uneven data distributions may cause the optimizer to generate execution plans that are inappropriate for other literal values, or bugs related to CURSOR_SHARING that either result in wrong data being returned, or instance stability problems. It would be interesting to see how the 75% performance improvement was achieved by changing this parameter.
- Page 106, as mentioned previously, the author’s plan9i.sql script shows something other than what is described, so the output presented in the book may or may not be useful. Additionally, the author did not state whether the database had been online for an hour when the report was generated, or three years. The author’s buf_blocks.sql script, mentioned in this section of the book, is a bit interesting. The script builds a normal table T1 by selecting a count(distinct file# || block#) of the blocks in the buffer cache, joined to DBA_OBJECTS, when grouping on the OWNER, OBJECT_NAME, SUBOBJECT_NAME, and OBJECT_TYPE. The script then joins the T1 table to DBA_SEGMENTS in order to calculate the percentage of the object’s blocks that are currently in the buffer cache for all objects with at least 11 blocks in the buffer cache. Why does the script’s output, that is printed in the book, show six tables with seven or fewer blocks in the buffer cache. Additionally, if these table blocks really were accessed frequently, wouldn’t Oracle automatically keep the blocks in the buffer cache by promoting the blocks closer to the MRU end. The author’s buf_keep_pool.sql script is also a bit interesting, as it generates SQL statements to alter all objects that have more than 80% of their blocks in the buffer cache, so that the objects will reside in the KEEP buffer cache (if not already in the KEEP buffer cache). This seems slightly counter-intuitive to move these objects out of the DEFAULT or RECYCLE buffer cache and into what is likely a smaller KEEP buffer cache. Additionally, this script could not have been used as described in the book because all of those tables listed in the book had less than 0.45% of their blocks in the buffer cache.
- Page 107, technically, the 2GB per process limit on 32 bit Windows can be raised to almost 3GB (Metalink recommends lowering the maximum from 3GB) by adding the /3GB parameter to the boot.ini file (reference).
- Page 107, Figure 5.1 is described as showing the output of the author’s plan9i.sql script. Figure 5.1 actually shows a “Top 5 Timed Events” report, possibly from Statspack. The book states that the database was crippled by db file scattered read wait events, yet the “Top 5 Timed Events” printed in the book shows a much more sinister problem. While 3,246 seconds were spent on the db file scattered read wait event (average wait time 0.010 seconds), 1,363 seconds were spent on the library cache load lock wait event (average wait time 0.29 seconds), and 7,146 seconds were spent on the db file sequential read wait event (average wait time 0.013 seconds). If I was presented with the same “Top 5 Timed Events”, I probably would have been focusing on the time lost to the library cache load lock wait, rather than trying to create a materialized view. It is rather amazing, considering the “Top 5 Timed Events”, that the three materialized views reduced disk I/O by more than 2,000 percent and improved database performance 30 fold.
- Page 108, the case study that suggests implementing bitmap indexes seems to be flawed. An AskTom thread detailed the flaws better than I am able to do so (reference).
- Page 108, the adding freelists case study lacks sufficient detail for a person to be able to use that case study to help solve a problem experienced by the reader. For example, the book does not describe how the author was able to examine V$SQL and discover that “virtually all” of the DML statement were inserts into the CUSTOMER_ORDER table – what in V$SQL did the author examine? Were there no SELECT statements (those are considered DML also)? The author stated “the top timed event was buffer busy waits and it was clear that there were enqueues on the segment header blocks for the table and its indexes.” The author does not describe how the transition from “virtually all” DML were inserts into a single table (I wonder if the header detail and order line detail were both contained in the same CUSTOMER_ORDER table) was made to stating that the buffer busy wait problem was in fact indicating a problem with waits for the segment header blocks for the CUSTOMER_ORDER table and its indexes. Was it just coincidence because the two seemed to be happening at the same time, and was therefore a lucky guess? The author’s website stated that this was a 188.8.131.52 database (reference), so how did the author determine that the buffer busy waits where not caused by the same action that is reported as the read by other session wait in 10g? Did the author check V$WAITSTAT, V$SESSION_WAIT, generate a 10046 trace, or do something else? Setting the freelists for the CUSTOMER_ORDER table and an index named CUST_PK to a value of 5 immediately saved the client $500 per minute? Finally, what conditions would have to be present for a buffer busy wait to last 30 seconds, was someone playing with bitmap indexes (assuming that some of the sessions were also waiting in enqueue waits), and would that problem be fixed by altering the freelists? (Edit: December 9, 2010: this $500 per minute fix is also described on page 697, but in this case the book mentions that the Oracle Database release version is in fact 184.108.40.206).
- Page 110 states, “with the Internet, there is a wealth of information about Oracle troubleshooting and some of it is good, while most of it is absolute garbage. Every self-anointed Oracle expert is touting their own methodology for Oracle troubleshooting, and some are absolute zealots in proclaiming their approach as the ‘best’ method.” I am having a little trouble translating these two sentences. Is the author stating that the Oracle experts who smear themselves with perfumed oil will refuse to believe any other method is better? Or is the author stating that some people who self-proclaimed that they are one of the best Oracle experts in the world, have a tendency to game the Internet search engines? Either way, the two sentences should be rewritten to clarify the author’s intention.
- Page 111, the quote printed in the book from the documentation appears to have been re-worded in the 11g documentation, however I am not sure that the quote indicates that Oracle Corporation recommends the use of the buffer cache hit ratio for much of anything.
- Page 112, the book describes how in a well-tuned database the buffer cache hit ratio metric is useful. Half of page 112 is devoted to showing a buffer pool advisory for an instance with a 52MB buffer cache.
- Page 115, the script printed in the book for calculating the buffer cache hit ratio accesses SYS.V_SYSSTAT, rather than the more common view synonym V$SYSSTAT. The printed script also does not match the buffratio.sql script from the book’s script library, where the book specifies the “db block gets”, while the script from the script library specifies “physical reads direct (lob)”; the book specifies “consistent gets”, while the script from the script library specifies “session logical reads”. Worse yet, neither formula matches the formula printed in the Oracle Performance Tuning Guide, which is part of the official Oracle Database documentation (reference).
- Page 117 states, “by default, this [TIMED_STATISTICS] parameter is set to false, which disallows the collection of wait times for each wait event defined in the Oracle engine.” This statement is incorrect as of the release of Oracle 10.1. The TIMED_STATISTICS parameter defaults to TRUE when the STATISTICS_LEVEL parameter is set to either the default value of TYPICAL, or the value of ALL (reference).
- Pages 117-118 lists several wait event names that the author states “provides no real value to a DBA who is trying to figure out where a database bottleneck exists.” The lists includes: lock element cleanup, SQL*Net message from client, SQL*Net more data to client, and PL/SQL lock timer. The first of these wait events does not exist in an Oracle 10.2 database, and might have been made deprecated around the time of Oracle 8i. The SQL*Net message from client wait event, at the session level, will tell the DBA when the bottleneck is not in the database, and that could be extremely valuable information. The SQL*Net more data to client wait event was highlighted on page 28 of this book as identifying a “Network Constrained Database”. (reference)
- Pages 118-119, while the script does execute, it probably should be updated to use analytic functions to eliminate the need of joining two inline views that both query SYS.V_SYSTEM_EVENT. Additionally, it would probably be helpful if the script also eliminated most of the idle wait events introduced since Oracle 10.1, for example: jobq slave wait, DIAG idle wait, shared server idle wait, reliable message, etc. Additionally, the script in the script library will not remove most of the idle wait events that are listed in the script because a blank space appears before the closing single quote for each of those wait events: ‘pmon timer ‘, ‘rdbms ipc message ‘, ‘smon timer ‘, etc. The book did almost nothing with the script, other than to display the results in some sort of graphical query tool (most of the previous query results used SQL*Plus to show the results).
- Pages 120-121, the sesswaits.sql script does not appear to be in the script library for the book. The script in the book has an obvious typo near the beginning: “a.a.total_waits,”. Like some of the other SQL statements, this SQL statement does not use the more common (those that are in the Oracle documentation) synonym names, but instead lists row source names such as SYS.V$_SESSION_EVENT. A minor improvement would be to use meaningful aliases for row sources, rather than generic aliases such as A, B, and C. This script attempts to remove some potentially helpful idle wait events, while not excluding some of the more recently introduced idle wait events. The book did almost nothing with the script, other than to display the results in some sort of graphical query tool (most of the previous query results used SQL*Plus to show the results).
- Pages 121-122, the csesswaits.sql script does not appear to be in the script library for the book. This script attempts to remove some potentially helpful idle wait events, while not excluding some of the more recently introduced idle wait events. The book did almost nothing with the script, other than to display the results in some sort of graphical query tool (most of the previous query results used SQL*Plus to show the results). It is a bit disappointing that the author made little attempt to explain the columns returned from V$SESSION_WAIT (sys.v_$session_wait in the script).
- Pages 122-123, the objwaits.sql script does not appear to be in the script library for the book. The book states that the script may be used if enqueue waits are present in V$SESSION_WAIT to determine which object and datafile are “the holdup.” The objwaits.sql script is not specific to enqueue type waits, and in some cases the ROW_WAIT_OBJ# from V$SESSION may not contain information about the object (showing a value of -1), while V$LOCK might expose the OBJECT_ID in the ID2 column during an enqueue wait.
- Page 124, sentence uses the word “undo” rather than “undue”, “… to quickly pinpoint resource-intensive SQL code that is causing undo strain on the database.”
- Pages 124-125 state that the number one Oracle Performance Myth is that, “eighty percent of a database’s overall performance is derived from the code that is written against it.” An Internet search for this sentence finds essentially the same set of supporting paragraphs in a couple of other books (“High-Performance SQL Server DBA”, “Oracle Performance Troubleshooting with Dictionary Internals SQL & Tuning Scripts”). Oddly, the publisher’s website’s description for the book “Oracle Performance Troubleshooting with Dictionary Internals SQL & Tuning Scripts Second Edition” uses essentially the same sentence as justification for its chapter 8 contents (reference reference2).
- Pages 126-134 appear to be a copy and paste of pages 406-421 of the book “Oracle Tuning Power Tuning Scripts” with a small number of modifications. It might be interesting to note that the script outputs in this section of the book also use a graphical screen capture from some program, rather than the raw SQL*Plus style output seen in the earlier portions of the book.
- Page 126 recommends that temporary tablespaces be set to autoextend. Run-away queries with missing join conditions can cause a lot of problems if temporary tablespaces are in fact set to autoextend with the maximum size set to unlimited (32GB for an 8KB block size database when a single file is used).
- Page 126-127, the spacesum.sql script does not appear to be in the script library for the book. How the script operates is not well described, and the top half of the UNION ALL returns a VARCHAR2 in the last column position, while the bottom half of the UNION ALL returns a NUMBER in the last column position. It is not clear whether or not the query works as intended.
- Page 128, the datafileae.sql script does not appear to be in the script library for the book. This query joins SYS.DBA_DATA_FILES to SYS.FILEXT$ to determine which datafiles are configured to extend automatically. It appears that the join to SYS.FILEXT$ is unnecessary as DBA_DATA_FILES contains the column AUTOEXTENSIBLE.
- Page 129, the tsfrag.sql script does not appear to be in the script library for the book. It is possibly interesting that the script calculates the square root of the square root of the COUNT of a column in DBA_FREESPACE when grouped by the tablespace name.
- Page 130, the dffrag.sql script does not appear to be in the script library for the book. This query joins SYS.DBA_DATA_FILES to SYS.FILEXT$ to determine which datafiles are configured to extend automatically. It appears that the join to SYS.FILEXT$ is unnecessary as DBA_DATA_FILES contains the column AUTOEXTENSIBLE.
- Page 131, the odd formatting were a single sentence becomes a paragraph has essentially changed the meaning of the second paragraph on the page such that it could be misunderstood. The fragmentation terms honeycomb and bubble were used, but never defined.
- Page 131, the tsmap.sql script does not appear to be in the script library for the book. One might be curious to determine if any of these scripts for detecting fragmentation that examine extents in datafiles, if they work as described, are useful since the introduction of locally managed tablespaces in Oracle Database 8i roughly a decade ago (reference).
- Page 132 states, “as a result of repeated insert and delete activity, tables can become internally fragmented and contain a lot of wasted space.” I suspect that this is a confusing sentence, it seems that the book implies that when a row is deleted, the space previously occupied by the row cannot be reused – this is simply not true. The book continues with, “In the same way, indexes can become fragmented so that their depth reaches unacceptable levels.” The second quote also seems to imply that when an index entry is deleted, that the space cannot be reused. Additionally, is this the definition of fragmented indexes that the author uses occasionally?
- Page 132-133, the maxext.sql script does not appear to be in the script library for the book. This script appears to be very similar to one that appears on page 144 in the book “Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning Scripts” (even the graphical version of the query output matches), and for some reason the SQL statement excludes the SEGMENT_TYPE of CACHE, but not any of the other segment types (referencereference2)
- Pages 129-144 appear to be copied and pasted, with minor changes, from pages 131-170 of the book “Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning Scripts” (even the graphical version of the query output matches reference).
- Page 133, the objdef.sql script does not appear to be in the script library for the book.
- Page 134, potentially confusing sentence, “Sometimes, however, this is not the case, and the table suddenly finds itself containing chained or migrated rows, which are rows that span more than one data block.”
- Pages 134-136, the tabreorg.sql script does not appear to be in the script library for the book. This script uses the RULE hint, which according to the Oracle documentation is no longer supported as of Oracle Database 10.1 (reference).
- Pages 137-138, the tabreorg.sql script does not appear to be in the script library for the book. This script uses the RULE hint, which according to the Oracle documentation is no longer supported as of Oracle Database 10.1.
- Page 138, a sentence seems to imply that a high clustering factor, or having the BLEVEL greater than four may be a reason to either reorganize (rebuild) an index or drop it. Drop – maybe, rebuild – no. Changing the clustering factor for an index requires changing the order of the rows in the table, although the clustering factor calculation can be thrown off if the statistics collection sampling percent is too low.
- Page 140, the sgasize.sql script does not appear to be in the script library for the book.
- Pages 143-144, the memsnap.sql script does not appear to be in the script library for the book.
- Page 145, the poolhit.sql script does not appear to be in the script library for the book. Page 145 also states, “However, when properly computed, the buffer cache hit ratio is an excellent indicator…” The formula provided for calculating the buffer cache hit ratio appears to be incorrect, according to the formula that appears in the 11g R2 Performance Tuning Guide from the Oracle documentation library.
- Page 145-146, the sesshitrate.sql script does not appear to be in the script library for the book.
- Page 147, the sqlhitrate.sql script does not appear to be in the script library for the book. Considering that it is possible for a SQL statement to have more than one execution plan, if this SQL statement is producing useful information it probably would have been a better idea to query V$SQL rather than V$SQLAREA.
- Pages 148-156 appear to be copied and pasted, with minor changes, from pages 182-196 of the book “Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning Scripts” (reference) .
- Page 148, the explain plan format pre-dates the release of Oracle 10.1, and shows the use of the OPTIMIZER_MODE CHOOSE which became deprecated with the release of Oracle 10.1. The book calls a listing “runtime statistics”, but does not explain how the statistics were generated (likely the output of SQL*Plus’ AUTOTRACE functionality).
- Page 149 states, “The amount of logical reads has been cut in half simply by using the new 16K tablespace and accompanying 16K data cache.” It has been demonstrated to the book author multiple times why this test case is flawed (reference reference2 reference3 reference4).
- Page 149, the cacheobjcnt.sql script does not appear to be in the script library for the book.
- Page 150, the buffutl.sql script does not appear to be in the script library for the book. The query joins SYS.X$BH to SYS.TS$ four times, rather than performing the join a single time with either a DECODE or CASE statement to differentiate the different values for the STATE column.
- Page 152, the libdet.sql script does not appear to be in the script library for the book.
- Page 153, the libwait.sql script does not appear to be in the script library for the book.
- Page 153, the libobj.sql script does not appear to be in the script library for the book.
- Page 155, the dictdet.sql script does not appear to be in the script library for the book.
- Page 156-157, the text at the bottom portion of page 156 through the middle of page 157 appears to be copied and pasted from page 306 of the book “Oracle Tuning Power Scripts” (reference).
- Page 157-158, the bottom half of page 157 through the top half of page 158 are supposed to demonstrate the wait events and statistics that will be present when the LOG_BUFFER parameter is undersized. As of Oracle Database 10.2 the LOG_BUFFER parameter is automatically set to a value that is slightly smaller than the granule size, so having a value for that parameter not specified is not a significant risk starting with 10.2. Page 158 has been addressed by other contributors on the Oracle OTN forums (reference reference2) .
- Pages 158-195, the bottom of page 158 through page 195 appears to be copied and pasted (with minor modifications) from page 197-275 of the book “Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning Scripts” (reference).
- Page 159, the memhog.sql script does not appear to be in the script library for the book.
- Page 160, the first paragraph on this page probably should have appeared on page 147, shortly after the script mentioned in the paragraph appeared in the book, rather than in a section of the book that describes session memory usage.
- Page 161, which appears to be describing Oracle Database 8i and earlier, states, “if the memory sort ratio falls below 90%, increasing the parameters devoted to memory sorts, sort_area_size and sort_area_retained_size, may help.” The first question someone might ask is why 90%, and not 99% or 50%? It would probably be a better idea to determine which queries are causing the sorts to disk. Maybe increasing the SORT_AREA_SIZE to 1000 times it original value will still not cause the memory sort ratio (which the book’s script calculates since the last bounce of the database) will not allow the ratio to increase to 90% – then what. The second question someone might ask is what do these parameters do – if the reader sets the parameters to 500M to try to get the memory sort ratio to 90%, what kinds of problems might happen?
- Page 161 seems to indicate that the PGA_AGGREGATE_TARGET will not be used for shared server connections. While that was correct for Oracle Database 9i R1 and 9i R2, it is not the case starting with Oracle Database 10g R1 (reference).
- Page 161 recommends that on a Unix server the PGA_AGGREGATE_TARGET be set to 90% of the memory remaining in the server after the database instance starts. There are a variety of reasons why this is a bad idea, the fact that the PGA_AGGREGATE_TARGET is not a hard upper limit for PGA memory consumption is just one of those reasons.
- Page 162, the term “one-pass memory” is mentioned but never defined. The use of the term seems to imply that a workarea execution is either optimal or one-pass – are multi-pass workarea executions never a possibility?
- Page 162, the bufobjwaits.sql script does not appear to be in the script library for the book. Additionally, the script does not appear to be RAC aware as the statistic “gc buffer busy” was ignored and the query did not select from GV$SEGMENT_STATISTICS.
- Page 163, the latchdet.sql script does not appear to be in the script library for the book. I suspect that it could be debatable whether or not the author’s specification that a 1% latch miss ratio indicates that further investigation is required – a better approach would probably start by seeing if a specific type of latching is contributing a significant amount of time to the wait events between a known start and end time.
- Page 164, the currlwaits.sql script does not appear to be in the script library for the book. This script uses SYS.V_$ view names rather than the officially documented synonym names, and uses alias “a”, “b”, and “c” rather than meaningful aliases for each of the views. The query also contains an unnecessary subquery that queries SYS.V_$EVENT_NAME.
- Page 165, while the globiostats.sql script is found in the script library, the statistic values that are requested seem to omit those statistics that are introduced in Oracle Database 10.1, and provide statistics values since the last open of the database. I suspect that nightly backups, statistics collection, or scheduled index rebuilds might distort the statistics retrieved by the script. Nothing is done with the script in the book – the book immediately led into another script.
- Page 166, while the script syswaits.sql is in the script library for the book, the script printed in the book differs from what is found in the script library. Additionally, the script in the script library includes an extra space before the closing single quote that follows each wait event name that should be excluded from the report.
- Page 167, the advice about the “latch free” and “enqueue” waits predates the release of Oracle Database 10.1. The advice about the “db file scattered read” wait event could cause unnecessary concern – that wait event should be one of the top wait events in a healthy database instance.
- Page 168, the globaccpatt.sql script does not appear to be in the script library for the book. The book makes no attempt to interpret the statistic values returned by the script, nor the meaning of the statistics. The book does state that “large-table full-table scans can indicate sub-optimal SQL and/or missing indexes, cases where the SQL needlessly reads all blocks in the table.” The book does not state which of the statistics indicate a “large-table full-table scan” – is it the “table scans (long tables)” statistic? It is also quite possible that full table scans of long (large) tables may just indicate that everything is working as it should due to the data requirements, indexes with poor clustering factors, or maybe the database is simply a data warehouse. For clarity, it is not the SQL that needlessly reads all of the blocks in the table.
- Page 168 contains an advertisement for the author’s ION program.
- Page 169 states that “if standard DBA practices have been followed and indexes are placed in their own tablespace, the I/O statistics for that tablespace can be examined to determine if the indexes are actually being used.” Is it really a standard DBA practice to separate indexes from their tables by placing each into a different tablespace? How would seeing the I/O statistics for a tablespace (more correctly the datafiles for the tablespace) really indicate if an index were used – an index may be used by Oracle Database’s optimizer without actually being used for query execution.
- Page 169, the fileio.sql script found in the book differs significantly from the script by the same name found in the script library for the book. The script found in the book seems to unnecessarily join several objects owned by the SYS user when a much more simple query could be written using the documented synonym names V$FILESTAT, V$TABLESPACE, V$DATAFILE, and V$TEMPFILE.
- Page 170, the term “hub tables” is used without defining the term.
- Page 170-171, the purpose of the toptables.sql script does not seem to be well defined, and might contain one or more errors. The script returned no rows when tested.
- Page 172, the largescan.sql script, identified as largescan9i.sql in the script library, fails to recognize that a table referenced in an execution plan and identified as experiencing a full table scan may actually be full table scanned many times when the SQL statement is executed once, even if the table is listed only once in the execution plan.
- Page 172, a “large table” is defined within the script as a table that is more than 1MB in size. If we associate this definition with the term “large-table” found on page 168, the reader might be a bit confused.
- Page 173 states “one rule of thumb is that if any session is currently consuming 50% or more of the total I/O, that session and its SQL need to be investigated further to determine what activity it is engaged in.” Why 50%? Would it make a difference if there are 2 or 2,000 sessions connected? Would it matter how many sessions are actively executing SQL statements? Over what period would the DBA examine to see if a session meets the 50% criteria? The provided script, physpctio.sql, bases the time period on the length of time that the session has been connected to the database.
- Page 174, the totpctio.sql script is nearly identical to the physpctio.sql script from the previous page, and suffers from the same time scoping problem.
- Pages 175-176, the topiousers.sql script is certainly long, and aside from renaming some of the statistic names from SYS.V_$STATNAME, really does not introduce any new information that was not provided by the earlier scripts. The script has time scoping problems, and no effort is made to explain the meaning of the statistics returned by the script.
- Page 177, the bgact.sql script is apparently not found in the book’s script library. The script is described as “providing a handle on DBWR and LGWR activity,” yet the script retrieves the “dirty buffers inspected” foreground statistic and a “write requests” statistic that does not seem to be a valid statistic name (reference).
- Page 178 states, “seeing non-zero values for the DBWR summed dirty queue length typically indicates that buffers are being left in the write queue after a write request.” The “DBWR summed dirty queue length” statistic is apparently not an Oracle Database statistic, so seeing non-zero values might mean that something else is wrong.
- Page 178 states, “non-zero values for the redo log space wait requests and redo log space wait time statistics could indicate the log buffer setting is too low.” There is no statistic named “redo log space wait requests” – the actual statistic name is “redo log space requests”. These statistics are not associated with the redo log buffer size, but instead the size of the redo log files in relation to the amount of redo information that needs to be written to the log files (reference).
- Page 178, the archhist.sql script does not appear to be in the script library for the book. The script summarizes by day, rather than a smaller time period such as an hour. This might provide misleading information if on average 4 redo logs are archived per hour, but during busy time periods during the day archived redo logs are archived at a rate of once per minute – the average over 24 hours will hide important information about a problem.
- Page 179, the rolldet.sql script does not appear to be in the script library for the book. The script does not permit time scoping, uses the non-documented object names in the SYS schema rather than the public synonyms, and uses generic alias names for the objects.
- Page 180, a sentence which makes no sense, “an overall contention ratio of 1% or higher is an indicator of too few rollbacks as well.” First, the sentence is missing at least one word. Second, the method to calculate the overall contention ratio is not described. Third, why 1% and not 10% or some other number?
- Page 181, the totuserspace.sql script does not appear to be in the script library for the book.
- Page 182, the sortusage.sql script does not appear to be in the script library for the book.
- Page 182, the sortdet.sql script does not appear to be in the script library for the book.
- Page 183-185, the topsess.sql script does not appear to be in the script library for the book. After using three pages for the script, it probably would have been wise to use more than a small paragraph to explain what to do with the output of the script.
- Page 186 states, “a rule of thumb is that no session should consume more than 25-50% of the overall resources in a particular category.” Why 25% to 50%? It would seem that the percentage might make a difference if there was a single session connected or 1,000 sessions.
- Pages 186-187, the topsessdet.sql script does not appear to be in the script library for the book. The script’s output is cumulative since the session connected, so it could easily generate invalid results if one session had been connected to 2 months, while a session consuming significant resources has been connected for 30 minutes. The query does not use the time model statistics, so it is possible for the statistics displayed to be inaccurate for actively executing sessions – for example if a session has been on the CPU for the last 60 minutes performing logical IO.
- Pages 188-189, the userscans.sql script does not appear to be in the script library for the book. The script is interesting in that it retrieves the “table scans (long tables)” statistic for each session. According to the Oracle documentation, a table is considered long if the table is not cached and its high water mark is more than 5 blocks, and the statistic is impacted by the CACHE and NOCACHE hints – based on that description, the query is not behaving as described in the book (reference).
- Page 189, the scatwait.sql script does not appear to be in the script library for the book. The query is not time scoped, so a session that has been connected for months may unfairly be sorted to the top of the output.
- Page 189, the book states that the “db file scattered read” wait event “is generally thought to be an indicator of table scan activity.” It probably would have been wise to state that the “db file scattered read” wait event also appears during index fast full scans, and in some cases index range scans. As of Oracle Database 11.1 it is quite common for the “direct path read” wait event to appear when full table scans are performed, but the book made no mention of that fact.
- Page 190, the large_scanusers.sql script does not appear to be in the script library for the book. The script suffers from the same errors as the scripts earlier in the book when it attempts to determine how many full table scans users have performed based on querying SYS.V_$SQL_PLAN – a table that appears a single time in an execution plan may be full table scanned multiple times during a single execution.
- Page 190 states that the “table scans (long tables)” statistic indicates full table scans of tables that are five or more blocks in size. That is almost what the Oracle 10.2 documentation states, but that raises the question regarding the script on page 188 and 189 – is there a problem with performing a couple of full table scans on a table with 10 blocks under its high water mark, and is this as severe as performing a couple of full table scans of tables with 10 million blocks under its high water mark?
- Page 191, the lockcnt.sql script does not appear to be in the script library for the book. The script appears to have an incomplete join to the SYS.V_$LOCKED_OBJECT view.
- Page 192, the curriosql.sql script does not appear to be in the script library for the book. This SQL statement has time scoping problems, directly comparing a session that has been connected to the database for months with one that has been connected for a couple of minutes.
- Page 193, the cartsession.sql script does not appear to be in the script library for the book. The script joins SYS.V_$SQL to SYS.V_$SQL_PLAN using just the HASH_VALUE, and there are certainly risks in doing that. Not all Cartesian joins are bad – the optimizer may (frequently) use a Cartesian join when the statistics for one of the row sources indicates that a single row, or a very small number of rows, will be joined to the second row source.
- Pages 193-194, the cartsql.sql script does not appear to be in the script library for the book.
- Page 195, the list of optimizer modes probably should indicate that the FIRST_ROWS optimizer mode has been deprecated for quite a while and/or not list it as one of the two optimizer modes. The book states of the first rows optimizer mode, “first rows: Fetches the desired rows with a minimum amount of block touches (favors indexes).” This, of course, raises the question, what if the desired number of rows is all or nearly all of the rows – is this still a true statement? Do the non-deprecated FIRST_ROWS_1, FIRST_ROWS_10, FIRST_ROWS_100, and FIRST_ROWS_1000 also favor indexes – this section of the book does not mention those optimizer modes. This page also incorrectly states that the ALL_ROWS optimizer mode favors full scans, such a statement is misleading at best.
- Pages 196-197, the topsql.sql script does not appear to be in the script library for the book. It appears that the script is designed for Oracle Database 9i (query retrieves HASH_VALUE rather than SQL_ID), queries SYS.V_$SQLAREA rather than the more common (and documented) V$SQLAREA, unnecessarily groups all child cursors of the same SQL statement together (should have queried V$SQL instead) which can be quite misleading especially in 11.1 with the introduction of adaptive cursor sharing and 11.2 with the introduction of cardinality feedback (two items that are apparently not mentioned in this book). Additionally, the starting time point of the query is the database open time, rather than a more meaningful examination of the delta statistics for a shorter time period such as an hour or a day – a query that is executed once a month that causes a lot of physical reads might not be a problem, while a query that executed many times in the last hour might be much more of a cause for concern.
- Page 198 appears to be copied and pasted (with minor modifications) from pages 279-280 of the book “Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning Scripts”
- Page 198, the book is states, “the first query in the result set has accumulated almost five seconds of total execution time and runs for about two and a half seconds each time it is executed.” While the query might have an *average* execution time of 1.5 seconds, it very well could have required 4.99 seconds for the first execution and 0.01 seconds for the second execution. Considering that this data that the author is commenting about is drawn from V$SQLAREA, it very well could be the case that the first execution and the second execution have entirely different execution plans. The effects of block caching could also mean that the time for each execution is not nearly the same as the average execution time.
- Page 198 states, “this can be accomplished by noting the hash_value for each SQL statement and then issuing the fullsql.sql script to obtain the full SQL statement.” This statement causes confusion because the fullsql.sql script printed in the book appears to request the SQL_ID, and not the HASH_VALUE as described, which was returned by the earlier SQL statement.
- Pages 199-201 appear to be copied and pasted (with minor modifications) from pages 347-350 of the book “Oracle Tuning Power Scripts” (reference).
- Page 199, the cartcount.sql script does not appear to be in the script library for the book. Not all Cartesian joins are bad – the optimizer may (frequently) use a Cartesian join when the statistics for one of the row sources indicates that a single row, or a very small number of rows, will be joined to the second row source.
- Page 199, the same cartsql.sql script that appeared on pages 193-194 is pasted on this page also.
- Page 200, the tabscan.sql script does not appear to be in the script library for the book.
- Page 200 appears to be copied and pasted (with minor modifications) from page 284 of the book “Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning Scripts”
- Page 204-205 appear to be copied and pasted (with minor modifications) from page 61-64 of the book “Oracle9i High Performance Tuning with Statspack” (reference).
- Pages 213-217 appear to be copied and pasted (with minor modifications) from pages 158-162 of the book “Oracle Tuning Power Scripts” (reference).
- Pages 224-225 appear to be copied and pasted (with minor modifications) from pages 147-149 of the book “Oracle Tuning Power Scripts” (reference).
- Page 226 appears to be copied and pasted (with minor modifications) from page 146 of the book “Oracle Tuning Power Scripts”
- Page 229 appears to be copied and pasted (with minor modifications) from page 139 of the book “Easy Oracle Automation: Oracle 10g Automatic Storage, Memory and Diagnostic Features” (reference).
- Pages 229-234 appear to be copied and pasted (with minor modifications) from pages 150-160 of the book “Oracle Tuning Power Scripts” (reference).
Jumping to mostly random pages:
- Page 235 appears to be copied and pasted (with minor modifications) from pages 252-253 of the book “Oracle Tuning Power Scripts” (reference).
- Pages 243-247 appear to be copied and pasted (with minor modifications) from pages 112-115 of the book “Easy Oracle Automation” (reference).
- Page 249 appears to be copied and pasted (with minor modifications) from pages 47-48 of the book “Oracle 9i High-Performance Tuning with Statspack” (reference).
- Pages 250, 309, 311, and 318 contain an advertisement for the author’s ION product.
- Page 280 states, “Oracle does not run in a vacuum, and it’s critical to search outside the box and see what is happening with your CPU, RAM, network, and disk I/O subsystems.” While this quote might raise a couple of questions, I would like to know why Oracle (assuming Database) was not designed to function correctly inside a vacuum, and why would the server’s CPU be outside the box?
- Pages 313-315 are nearly identical word for word with Metalink (MOS) Doc ID 884046.1 “Understand each field of AWR” as well as pages 22 through 25 of a document that is marked as copyright 2010 by IBM Corp. (reference).
- Page 325 states, “The reason that CPU drives to 10% utilization is because the UNIX internal dispatchers will always attempt to keep the CPU’s as busy as possible. This maximizes task throughput, but it can be misleading for a neophyte. Remember, it is not a cause for concern when the user + system CPU values approach 100 percent.” Why 10%? Page 325 of the book makes the same errors as are found on page 25 of the book (reference).
- Page 355 provides no real help with decoding the time model statistics, not even a suggestion to arrange the statistics into a logical hierarchy. All that is stated is, “in the sample output of the AWR Time Model Statistics Report shown above, it can be seen that the system spends the most processing time on actual SQL execution but not on parsing. This is very good for a production system.” This is at least the second case in the book where the time model statistics were introduced were no information was provided to indicate what to do with the statistics.
- Page 355-356, other than telling the reader that the operating system statistics are important, and providing what appears to be output from Oracle Database 10.1, the book does not indicate what to do with the statistics.
- Pages 516-519 appear to be copied and pasted (with minor modifications) from pages 212-216 of the book “Oracle 9i high-performance tuning with Statspack” (reference).
- Page 520 states, “for example, tuning an SQL statement to remove full-table scans makes the query run faster because of the direct reduction in the amount of data blocks that are read from disk.” This is simply too broad and too inaccurate of a statement to be true, as it seems to imply that all full table scans are automatically worse than an index access path. It can be quite easily demonstrated that even when selecting 0.06% of a table, a full table scan may be significantly faster than an index access path to the same data (reference).
- Page 521 includes a statement that does not seem to make sense, “Prior to Oracle, Oracle Professionals noticed that by moving the entire database to a larger blocksize, they reduce disk I/O and improve performance of the entire system.” Were there Oracle DBAs before Oracle Corporation was formed, or is this sentence stating something else?
- Page 521 states, “there is no question that a large blocksize for indexes is going to reduce I/O, and therefore improve the overall performance of the entire database.” There have been several well written articles that question whether using a larger block size for indexes will do either of reduce I/O or improve the overall performance. For example, what happens in a multi-user database when more index entries are packed into a larger index leaf block and multiple sessions concurrently insert into the table to which the index belongs? This is essentially a repeat of the same logic error as is found on page 149 of the book (reference http://richardfoote.wordpress.com/2009/02/23/larger-block-tablespace-for-indexes-revisted-part-ii-money/).
- Page 522 states, “Oracle has codified the benefits of different blocksizes, and the Oracle 11g Performance Tuning Guide notes that multiple block sizes are indeed beneficial…” The author then quoted the portion of the Oracle Database documentation which describes the KEEP and RECYCLE buffer pools. The KEEP and RECYCLE buffer pools can only use the same block size as is the default block size for the database – the quote from the documentation does not describe the benefits of different block sizes. I pointed this error out to the author in the Oracle OTN forums in 2008 after a couple other people previously mentioned the same error (reference).
- Page 522 states, “for example, insert-intensive databases will perform less write I/O (via the DBWR process) with larger block sizes. This is because more logical inserts can take place within the data buffer before the block becomes full and requires writing it back to disk.” This is an extremely weak argument. DBWR batches block writes, it does not write a block to disk as soon as it becomes full. Second, what kinds of problems may happen when multiple sessions are concurrently performing the inserts into leaf blocks? Third, what about all of the bugs that are related to having multiple block sizes in a single database such as bug ID 4940513 (undo related), bug ID 5496041 (index corruption), and doc ID 844497.1 (RMAN skips non-default block size tablespaces).
- Pages 523-528, the author attempts to argue for the use of multiple block sizes in a single database using arguments that were discussed in several forum threads. A lot of convincing evidence was provided in those threads that strongly recommended against the practice that is suggested by the author.
- Page 529, Oracle Database 10.2 and above also considers the value of the SESSIONS parameter when automatically setting the DB_FILE_MULTIBLOCK_READ_COUNT parameter value (reference).
- Page 535, the same flawed comparison of indexes in an 8KB and 16KB block size which is found on page 149 is repeated.
- Page 537 describes a case where a customer was using a 16KB block size database that experienced performance problems where updating roughly 850,000 rows in a table required roughly 45 minutes, and that time decreased to just 2.2 minutes by recreating the database with a 4KB block size. This magic performance increase was discussed on the Oracle OTN forums and several blogs. Jonathan Lewis create a test case that identified the cause as an ASSM bug that appears in tablespaces created with block sizes larger than 8KB. A bug was filed with Oracle Corp. related to the problem, and the author of this book as well as the field technician who spotted the error were notified that the cause of the problem was identified and a patch is being developed. I am not sure why the book states, “I was able to get the update down to 1.5 minutes” – I distinctly recall the field technician (also the technical editor of this book) reported fixing the problem, and not the book author (reference reference2 reference3 reference4).
- Page 726 states to check Metalink (MOS) Doc ID 122008.1 for “the officially authorized script to detect indexes that benefit from rebuilding.” That Metalink article states that the criteria is not valid and the script has been revised to meet “current standards and functionality.” That means that the suggested criteria for rebuilding that is printed in the book regarding 20% or more of deleted index entries or a depth of more than 4 levels is invalid, as had previously been pointed out to the book author as invalid in a couple of OTN discussion threads (reference reference2 reference3).
- Page 726 suggests to check Metalink Doc ID 77574.1 for Oracle’s index rebuilding guidelines – that Doc ID does not exist on Metalink (MOS).
- Page 726 states, “However, when large numbers of adjacent rows are deleted, it is highly unlikely that Oracle will have an opportunity to reuse the deleted leaf rows, and these represent wasted space in the index.” The first question is why can the space not be reused. The second question is if it matters whether the index is a primary key b*tree index, and index on another column, or a different type of index altogether. The third question is how are the adjacent rows deleted – if this is a heap organized table, how does one make sure that physically adjacent rows in the table blocks are deleted?
- Page 727 appears to be at least partially copied from page 412 of the book “Oracle 9i High-Performance Tuning” (reference reference2).
- Page 727 states, “Hence, an Oracle index may have four levels, but only in those areas of the index tree where the massive inserts have occurred. Oracle indexes can support many millions of entries in three levels, and any index that has four or more levels would benefit from rebuilding.” The first sentence is clearly not true, and was communicated to the book’s author by members of the Oracle community in early 2003 (as well as several times since 2003). The second sentence leads into a bit of a problem area – what happens if after rebuilding an index it is still four or more levels – do you rebuild it again? (reference reference2)
- Page 727 states, “Gets per index access: The number of gets per access refers to the amount of logical I/O that is required to fetch a row with the index.” This attempted description of the BLKS_GETS_PER_ACCESS column of INDEX_STATS seems to be inaccurate, and likely is not a good metric for deciding if an index should be rebuilt – the author suggests that it might be appropriate to rebuild the index when the value of this statistic is more than five (reference).
- Page 727 suggests using “ANALYZE INDEX VALIDATE STRUCTURE” for all indexes to determine which are candidates for a rebuild. The book then states, “After analyzing the report above, the DBA may want to consider rebuilding any index where the height is more than three levels…” The author does not mention that the tables to which the indexes belong will be locked for the entire duration of the ANALYZE INDEX command – this problem was brought to the attention of the book author in 2007 by members of the Oracle community. For the second of the two quoted sentences, the book does not include a report above the sentence quoted from the book, so it is unclear what the reader should be reviewing (reference).
- Pages 728-732 contain a lot of faulty facts about the safeness and cheapness of index rebuilds, and how “many” databases see a huge benefit from periodic index rebuilding. Many of these faulty facts are discussed in detail in Richard Foote’s blog and in various Internet forums.
(Section added December 8, 2010): Related blog articles that discuss the contents of this book:
(Section added February 28, 2011): Attempted Purchases of the “Oracle Tuning: The Definitive Reference Second Edition” Book:
On February 2, 2011 I added the following comment to my review of this book on Amazon.com:
In my review of the book I stated the following:
“I initially ordered the book from Amazon at the end of May 2010 (it was expected to be in stock the first week of June 2010), and when it had not shipped by the end of August, my curiosity subsided due to the delay (and various other reasons) and I cancelled the original order. Near the end of October 2010 my curiosity peaked again, so I placed a new order for the book (at that time the book was expected to be in stock the first week of November 2010). While waiting for the book to arrive, I started reading portions of the first and second editions of the book through the Google books library.”
The order that I placed for the book at the end of October 2010 still has not shipped, more than three months after I placed the order. It is somewhat interesting to see that the publisher’s website offers this book for seven dollars less than Amazon’s current price for the same book, while the publisher apparently cannot supply Amazon with a sufficient number of copies of this book to fill the existing orders. I have no intention of cancelling this order, even if it does not ship for another three to six months. Best of luck to others who have waited six months (three months for the first order before cancelling and three months for the second order) to obtain the book.
This was the status of the first attempt to acquire the book, as shown on Amazon.com August 26, 2010 – that order was cancelled shortly after making this screen capture:
This screen capture shows the current status of the second attempt to acquire the book, as shown on Amazon.com February 28, 2011 – I do not intend to cancel this order:
This is certainly an interesting situation. Why publish a book, and then not provide the book vendors with a sufficient number of copies to satisfy the apparent demand? There is probably a good explanation for the delay, but not one that I am able to understand.
(Section added March 9, 2011): Attempted Purchases of the “Oracle Tuning: The Definitive Reference Second Edition” Book, Part 2:
Amazon cancelled my second order and sent the following email:
I noticed a couple of days ago that Rampant was selling the book through Amazon.com. Currently there is only a single seller listed on Amazon.com, and that seller is asking $976.98 for a used copy of the book: