August 26, 2010
In June 2010 an article appeared on another blog that identifed three blog articles appearing on a blog operated by an Oracle Certified Master (OCM) that were copied nearly verbatim from three other blogs, thus saving the OCM a considerable effort in not only typing verbage appearing in the article, but also effort in actually building useful content. Why spent four hours developing a blog article (this is probably the average time I spend), when a simple copy, paste, find-and-replace can be accomplished in a couple of minutes? Those copies of articles clearly violated commonly understood copyright restrictions, and a quick search showed that the copyright violations extended far beyond the initial three articles which were mentioned (yes, I have screen captures and PDF copies of all that I identified).
So, why did I write the above paragraph? For some reason I was recently browsing through some of the information about DMCA:
“Any material that was posted without the copyright owner’s authorization must be removed or blocked promptly once the service provider has been notified that it has been removed, blocked, or ordered to be removed or blocked, at the originating site.”
I realized that a random reader of this blog could potentially send a note to WordPress stating that they are, in fact under penalty of law, the owner of the material I just spent 2 hours, 4 hours, 8 hours, 16 hours - maybe even months writing, and demand that the article be taken offline because I clearly stated that I quoted a couple of words from one of their articles while agreeing with or disagreeing with the other article. Is quoting another source, while clearly stating that the other source is being quoted, in an effort to critique the quoted section an example of fair use?
“Section 107 contains a list of the various purposes for which the reproduction of a particular work may be considered fair, such as criticism, comment, news reporting, teaching, scholarship, and research. Section 107 also sets out four factors to be considered in determining whether or not a particular use is fair:
- The purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposes
- The nature of the copyrighted work
- The amount and substantiality of the portion used in relation to the copyrighted work as a whole
- The effect of the use upon the potential market for, or value of, the copyrighted work”
Interesting… as some of you may have noticed, the Oracle Database book reviews that I wrote in the last two years attempt to be very thorough, precise, and specific – many times down to a page number and a quoted sentence or two. In my view such quotes clearly fall under the embrella of fair use (specifically criticism, comment, teaching, and research).
On June 10, 2010 I posted a blog article where I mentioned having three Oracle Database books on order that I intended to review just as I had with many of the other Oracle Database books that I purchased in the past. Two of the books that I ordered, “Oracle Performance Firefighting (fourth printing)” and “Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition” arrived and were reviewed, while the book that was supposed to ship on June 1, 2010 currently shows the following status on Amazon.com:
Let’s take a look at that order:
So, in summary Amazon reports that the book has yet to be released, and if it ships in September it will arrive three months later than originally expected. On August 17, 2010 I received two unsolicited emails from someone at that book publishing company asking me to submit any errata that I find in the second edition of the book so that the problems may be addressed by an OCM who is also an Oracle Ace, after I write up the book review – the person also mentioned that, by the way, the book keeps selling out. Nice, except that I now have a slight problem. My reviews typically take between one and two hours to write and when agreeing with or disagreeing with a section of a book I try very hard not to lose the precise message of that section of the book – that often means that it is necessary to directly quote a sentence or two of the book that is being reviewed. In theory, simply quoting a section of a copyrighted work might be grounds for a DCMA pull down of the content, thus rendering an hour or two of writing lost for a period of time. Would the author of this book that is now three months delayed try a DCMA pull down if the review is not completely favorable? I don’t know that answer.
In late May or early June I wrote the following about why I ordered the book that has yet to ship:
I was curious, so I spent good money on the book. For most of the Oracle books that I have read in the last two years I have posted detailed reviews of the books on Amazon. The reviews list what I thought was good about the book, what I thought was great about the book, errors/errata found in the book’s chapters, as well as misc. points about the book as it is read. Some of my reviews identified a dozen or more errors in a book, which is a significant problem with some of the Oracle books on the market. The errors must make it difficult for beginners to learn about Oracle Database.
How long will the review of the “Oracle Tuning: The Definitive Reference Second Edition” book stretch? I am not sure, but it appears that I could write about a 4 page review on the first 27 pages of the first edition of this book – I sincerely hope that some/most of those issues were addressed in the second edition. I want to make certain that I obtain a good value from the $44 that I spent on the book. I do not think that it too much to ask for accuracy from a published book, especially when the book cost more than $40. Some of the pages in the chapters that I co-authored in the “Expert Oracle Practices” required more than 8 hours of research, typing, verification, testing, proofing, re-testing, and formatting per page. I believe that Raldolf spent roughly the same amount of time on the pages that he wrote for the chapters. I suspect that many of the very good books that are on the market for Oracle also required about the same amount of time per page. Certainly, it can’t be too much to ask that other book authors put effort into testing and verification of what they publish? The author’s website states about the the “Oracle Tuning: The Definitive Reference Second Edition” book: [edit: feel free to read the book's description on the publisher's website]. So, I will be looking for the portions of the book’s contents that are comprehensive and targeted at the senior Oracle DBA.
After waiting three months, I am no longer curious about the book – so I just cancelled the order. Was I serious that I could write a four page review on the first 27 pages of the first edition of the book? I heard someone shout “show me your cards.” In late May and early June I read the first 27 pages of the first edition using the Google books site, and I made notes as I read. These are my notes, quoting a minimal amount of the book to fall under the embrella of “criticism, comment” of fair use:
- Page 9 (now page 11) states “For example, if the AWR shows that the demands on the shared pool become very high between 1:00 pm and 2:00 pm, the DBA might trigger a dynamic decrease of db_cache_size and a corresponding increase of the shared_pool_size parameter during this time period.” If Oracle’s buffer cache is using the memory specified by the DB_CACHE_SIZE, the size value specified for DB_CACHE_SIZE cannot be decreased without first manually flushing the buffer cache – flushing the buffer cache to allow the shared pool to grow for an hour is a bad idea. It is important to keep in mind that when the SGA_TARGET parameter is specified, the values for DB_CACHE_SIZE and SHARED_POOL_SIZE specify the minimum values for the parameters.
- Page 10 is an advertisement for the WISE package – free copy is apparently available with the book.
- Page 11, when describing examples of poor schema design, suggests that databases using extensive data normalization to minimize data redundancy is a poor design because it forces Oracle to perform unnecessary table joins.
- Page 11 (now page 12) states, “If poorly designed PL/SQL is encountered, users might be able to tune the database by using array processing such as bulk collect, forall and ref cursors…” As of Oracle Database 10g, Oracle automatically array fetches 100 rows at a time in cursor FOR loops, making it much less likely/necessary that a ‘user’ will re-write the PL/SQL to take advantage of bulk collect’s performance slightly better performance. http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1203923200346667188
- By page 13 has recommended the use of AWR, various AWR tables, and the SQL Tuning Advisor without mentioning that the Diagnostic Pack must be licensed for access to AWR information, and the Diagnostic and Tuning Pack must be licensed for access to the SQL Tuning Advisor.
- Page 14 (now page 25) states “The first tasks when tuning a database when tuning a database are to identify the external bottleneck conditions, which may include: CPU Bottleneck,… RAM Bottleneck,… Network Bottleneck, … and Disk Bottleneck.” While it is important to check these items, I suggest that these probably would not be the first items to check when tuning the database instance’s performance. Yes, checking whether or not the CPUs are overloaded (or heavily loaded) is important, but maybe it would be better to determine if the overload is causing a measurable negative impact on performance by first checking the delta values of the various wait events, system statistics, and system time model statistics before deciding to check, for instance, whether there are network bottlenecks.
- Page 14 (now page 25) states “Whenever the run queue exceeds the number of CPUs on the Oracle server in the absence of high idle times, the system is said to be CPU-bound.” The first question with this statement is what qualifies as a “high idle time”? Secondly, a server may become CPU bound long before the CPU run queue exceeds the number of CPUs in the server, and long before 90% CPU utilization is reached. The author mentions that CPU consumption can be reduced by tuning SQL and reducing library cache contention – that is true, but much more could be said about the topic. For example, would it make a difference if the database is used for OLTP rather than data warehouse activities? The remedies listed on page 15 are just to add additional CPUs to the server or disabling a “high CPU consumer” such as parallel query. No warning provided to indicate that adding CPUs to the server could be expensive from an Oracle licensing perspective and a system uptime perspective.
- Page 15 (now page 25) states that “Large amounts of Oracle*Net traffic contribute to slow SQL performance.” The book does not describe what is considered “large amounts”, nor what to do about addressing the network bottleneck.
- Page 15 (now page 25) under the heading of disk bottlenecks the only issue identified is “using RAID5 for high update systems.” The adjective “high” is never clarified.
- Page 15 (now page 25) states, while discussing potential solutions for over-allocation of the server’s RAM, that the options for “excessive wrapping id to add more RAM, reduce the size of Oracle’s SGAs, or turn-on Oracle’s multi-threaded server.” Are these the only options, or could more suggestions be recommended? For example, “size of Oracle’s SGAs” implies that more than one instance is running on the server – would it make sense to reduce the number of instances on the server? Are there any reasons not to implement a shared server configuration (what the author refers to as “multi-threaded server”? What about tuning the SQL?
- Page 15 (now page 26), the author appears to misuse the term “network latency”, or at least stretch the meaning of latency as it applies to a network.
- Page 16 includes an advertisement for WISE.
- Page 16 (now page 27) states “In a disk-bound database, the majority of the wait time is spent accessing data blocks.” It is quite possible in a non-disk-bound, well performing database instance that the majority of wait time could very well be spent reading data blocks. In fact, these events should be among the top 5 wait events, otherwise there probably is a significant performance problem. It is also quite possible that the system could be disk-bound when the majority of wait time is spent _writing_ to the redo logs, archiving redo logs, and updating data/undo/temp files.
- Page 16 (now page 27), the Top 5 Timed Events, demonstrating a disk constrained database, shows 44 waits for the CPU. Waits for the CPU cannot be reported in the Top 5 Timed Events – this data in the report is bogus. The report also shows that the average single block read time is 2.75 seconds, while the average multi-block read time is 0.127 seconds (127ms) – this data in the report is also likely bogus. The report also shows that 1,363 seconds were lost in 673 waits (average of 2.03 seconds) to the “library cache load lock” wait – if that wait appeared in the top 5 report, even in a report showing the single block read time is 2.75 seconds, I think it would be a good idea to start with the “library cache load lock” wait since that is one of the waits that indicates a severe problem if it appears in the top 5 wait events.
- Page 17 (now page 28) states “CPU enqueues can be observed when the CPU run queue exceeds the number of CPUs on the database server, and this can be seen by…” The effects of CPU enqueues may be observed long before the CPU run queue exceeds the number of CPUs in the server, whether it be longer average waits for the log file sync wait event, increased waits for latches, increased wait time for single block or multi-block reads, or any number of statistics in the system time model (V$SYS_TIME_MODEL, V$SESS_TIME_MODEL).
- Page 17 (now page 28) states “High CPU usage will be reported as a top 5 timed even (sic) in any AWR report as shown below” First, the Top 5 Timed Events report shows that there were 4,851 waits for CPU time – a Top 5 report cannot show the number of waits for the CPU – this data in the report is bogus. Second, the CPU is a limited “wait” resource with an upper utilization limit equal to the number of CPUs in the server multiplied by the elapsed number of seconds, while the other foreground wait events, for the most part, have an upper limit of the average number of sessions (plus parallel related additional processes) connected to the instance multiplied by the elapsed number of seconds. It is quite possible that a server experiencing excessive CPU usage may not cause the “CPU time” to appear in the top 5 timed events. Third, we do not know how many CPUs are in the server – would it make a difference if there was only a single CPU compared to 96 CPUs? Fourth, we have not concept of the elapsed time represented by the report – what if the elapsed time for the report is eight hours? Fifth, if the single block read time averages more than a second, shouldn’t the investigation start with the IO system? Sixth, the most that can be determined from the top 5 timed events is the average utilization, number of waits, and number of timeouts – why are the number of timeouts not shown? From the top 5 timed events it is difficult, if not impossible, to determine if there wasn’t a brief period of intensive activity, followed by a long period of almost no activity – or if the average activity levels were fairly constant in the time period.
- Page 18 (now page 28), the Top 5 Wait Events (why the switch from Top 5 Timed Events) is labeled as a mockup – probably a good idea that this is labeled because the report shows 94,754 seconds where the server was waiting to send data across the network while single block and multi-block reads totaled only 85.75 seconds. That is why you do not use a satellite link for the network backbone.
- Page 19, several grammar errors to this point in the book “It not elegant, or even correct, but it’s a common choice by managers who need quick, low-risk performance improvements… Individual Program Global Areas (PGA’s for each session) are a set of running programs that do work for the instance and are referred to as processes.”
- Page 20 (now page 30), describes db_cache_size as “the number of data buffers to allocate for the instance.” The parameter does not specify the number of data buffers (blocks?), it specifies the amount of memory to allocate to the buffer cache (or the minimum amount in the event the SGA_TARGET is specified).
- Page 21, the DB_FILE_MULTIBLOCK_READ_COUNT parameter is listed among the parameters that cannot be changed without bouncing the instance – that is not correct.
- Page 21 (now page 30), states “parallel_automatic_tuning:… since parallel full-table scans are very fast, the CBO will give a higher cost to index access and be friendlier to full-table scans.” This is simply incorrect. The optimizer does not increase the cost of index access when costing parallel access paths that use full table scans – it lowers the cost of the full table scan relative to the parallel degree. Second, parallel full table scans are not automatically very fast, especially if the IO subsystem is bottlenecked.
- Page 21 (now page 30), the description of the OPTIMIZER_INDEX_COST_ADJ parameter is very weak, “this parameter controls the relative costs of full-table scans versus index scans.” It would be much better to state that the parameter specifies the percentage of the index access cost to be retained. Setting the value to a very low value could cause the optimizer to select not only an index access path rather than a full table scan, but to select the wrong index due to cost rounding problems as the value of the OPTIMIZER_INDEX_COST_ADJ parameter approaches 1. Ideally, since at least Oracle Database 10.1 this parameter should be left at the default value of 100 due to the Oracle optimizer’s use of system statistics by default.
- Page 21 (now page 30), the description of the OPTIMIZER_INDEX_CACHING parameter is vague – the book doesn’t indicate when the value of the parameter is used in cost calculations, and when it is not used.
- Page 22 (now page 34) states “In Oracle 10g, statistics collections are automated, but users may still need to selectively add histograms and other specialized optimizer statistics.” The book does not state what triggers the statistics collection for a specific object, at what time the statistics are collected, that histograms (maybe excessively) are automatically generated, nor the potential problems that might be encountered after the statistics collection.
- Page 23 (now page 35) states “SQL will run fastest when the first table joins deliver the smallest result set.” This is stated as an absolute, which absolutely will not be correct all of the time.
- Page 23 states “the new Oracle10g SQL profile utility will allow changes to execution plans without adding hints.” This is untrue – a SQL profile is a series of hints that are automatically generated.
- Page 25 shows operating with OPTIMIZER_MODE set to CHOOSE and deleting a table’s statistics to immediately return the system to acceptable performance. No warning that this caused the optimizer to switch to the RULE based optimizer, and no demonstration of how statistics should be collected.
- Page 25 “there was a serious degradation in SQL performance immediately after the implementation of partitioned _tablespaces_ in a 16-CPU Solaris 64-bit Oracle 9i database.”
- Page 26 shows a grammar error “Note that is the full-table scans are parallelized with 15 parallel query processes to speed up the statistics collection”
- Page 26 – amazing performance improvements “tripled the performance” “400 percent performance improvement” – no real suggestion of how the silver bullet corrected a problem to cause such a significant increase in performance, so that readers can experience similar performance improvements in their database instances.
- Page 26 – assumes a data warehouse was set up without a PGA_AGGREGATE_TARGET parameter value and an unspecified value for SORT_AREA_SIZE, resulting in a SORT_AREA_SIZE that was the default value, and only a 50% overall improvement was experienced by setting the parameter to 1MB. The book did not state that the default value of SORT_AREA_SIZE, when not specified, and PGA_AGGREGATE_TARGET is not used, that the SORT_AREA_SIZE defaults to 64KB. The book also did not describe how the 1MB value was determined to be the best value, nor what impact changing the parameter’s value has on memory availability in the server.
- Page 27 – code section shows single quotes around what is apparently a bind variable defined as VARCHAR2. The single quotes should not appear around that bind variable. Author states that a concatenated index on the CUSTOMER_STATUS and CUSTOMER_AGE columns resulted in a 50 fold performance improvement and 6 fold reduction in disk IO. The book does not state how or why, so I wonder how that could happen when the CUSTOMER_STATUS column likely has few unique values and an inequality is used on the CUSTOMER_AGE column. Was a bitmap or B*tree index used, was the CUSTOMER_STATUS or CUSTOMER_AGE specified as the leading column in the index?
- Page 27 “While it was not clear why the CBO was not choosing the index, the query with the index hint ran almost 20x faster. After acting fast and running a script against v$bh and user_indexes, the DBA discovered that approximately 65 percent of the indexes were currently inside the buffer cache.” There are a couple of problems with this statement. The statement is not clear whether the DBA found that at least one block from roughly 65% of the indexes found in USER_INDEXES was in the buffer cache, or if the DBA found that roughly 65% of all blocks belonging to the indexes found in USER_INDEXES were in the buffer cache. The second problem is why did the DBA check USER_INDEXES and not DBA_INDEXES – if the DBA checked USER_INDEXES, he would only see the indexes that belong to the user account that the DBA used to connect to the database (unless he used ALTER SESSION SET CURRENT_SCHEMA to change to another schema). The third problem is that there may be multiple, possibly many consistent read versions of an individual block in the buffer cache – how did the DBA factor out the multiple copies of blocks when he performed his calculations? The fourth problem is that the DBA did not consult a 10053 trace for a hard parse of the SQL statement to understand why the expected index was not selected.
- Page 28 “Based on similar systems, the next step was to lower optimizer_index_cost_adj to a value of 20 in the hopes of forcing the CBO to lower the relative costs of index access.” At the same time the OPTIMIZER_INDEX_CACHING parameter was set to 65. What are the problems with this approach? According to the Oracle documentation, the “OPTIMIZER_INDEX_CACHING parameter lets you adjust the behavior of cost-based optimization to favor nested loops joins and IN-list iterators. The cost of executing an index using an IN-list iterator or of executing a nested loops join when an index is used to access the inner table depends on the caching of that index in the buffer cache. The amount of caching depends on factors that the optimizer cannot predict, such as the load on the system and the block access patterns of different users. You can modify the optimizer’s assumptions about index caching for nested loops joins and IN-list iterators by setting this parameter to a value between 0 and 100 to indicate the percentage of the index blocks the optimizer should assume are in the cache. Setting this parameter to a higher value makes nested loops joins and IN-list iterators look less expensive to the optimizer.” The parameter change is global, and does not apply to all index type accesses, it primarily makes nested loop joins appear to be more favorable than hash joins. What criteria determined that 20 was the best value for OPTIMIZER_INDEX_COST_ADJ, and why didn’t the DBA correct or even check the system statistics, rather than experimenting with a parameter that should rarely be changed from its default value since the introduction of system statistics in Oracle 9i.
- Page 28 shows “alter system set optimizer_index_cost_adj=20 scope = pfile;” Issuing that command will result in an “ORA-00922: missing or invalid option” error. The pfile cannot be altered using an ALTER SYSTEM command, except of course if the spfile is modified, and then the spfile is written over the top of the existing pfile (init.ora file).
- Page 28 indicates that the Statspack report shown was from Oracle 9i, yet the report indicates that the wait time is in centiseconds, while an Oracle 9i Statspack report indicates time in seconds. It is a bit alarming to see 4,796.54 seconds of wait on enqueue (46.71% of total implies total wait time of 10,268.76 seconds), while the 10,579,442 multi-block reads required 1,972.05 seconds (report shows that this is 29.20% of the total, but a calculator shows that it is 19.20%). Something is wrong with this Top 5 Wait Events report – it is showing that the average multi-block read completed in 0.0001864 seconds, while the average single block read time required 0.002714 seconds.
- Page 29 – the silver bullet tip to change the CURSOR_SHARING parameter to FORCE did not warn of any of the possible side-effects, which may include poor performance when cursors that should not be shared because of uneven data distributions generate execution plans that are inappropriate for other literal values, or bugs related to CURSOR_SHARING that either result in wrong data being returned, or instance stability problems. It would be interesting to see how the 75% performance improvement was achieved by changing this parameter.
I guess that if the above mentioned problems are still present in the second edition of the book, it might have taken me significantly longer than the typical one to two hours to write the book review on the remaining 1000+ pages. Best of luck for anyone ordering the second edition of the book.