Finding a New Home for a Client on Windows

30 08 2010

August 30, 2010

As I am sure that you are aware, there are compatibility problems between Oracle Databases and Oracle Clients that are from different releases.  The Oracle Database 11.1 documentation states the following:

“Upgrade Clients to the Current Release
You can upgrade any or all of your Oracle8i, Oracle9i, or Oracle Database 10g clients to Oracle Database 11g Release 1 (11.1). The Oracle Database 11g Release 1 (11.1) client can be used to access your Oracle8i, Oracle9i, Oracle Database 10g, and Oracle Database 11g Release 1 (11.1) databases.”

So, apparently the Oracle 11.1 client has no trouble communicating with a very old Oracle Database 8.1 release (I did not realize that the backward compatibility extended that far back).  But, what do you do if you are running a Windows client computer and need to able to access an Oracle 11.2.0.1 database and an Oracle 8.0.5 database?  Are you out of luck?  A recent thread on the OTN forums asked a question that was very similar.  In the thread I provided three possible solutions:

  • If you have either the Professional or Ultimate editions of Windows 7. Download Microsoft’s free Windows XP Mode virtual machine and install the Oracle 9i client in that virtual machine. You should be able to create an icon that automatically launches the Windows XP mode virtual machine and then runs a specific program inside the virtual machine (I have not yet tested the auto-launch feature).
  • If the programs will use an ODBC connection, install one of the Oracle clients, create the ODBC connection that will use that client, install the other Oracle client, and create the ODBC connection that will use that client.
  • Modify the environment and start the program that will access the database using that modified environment.

Let’s take a closer look at the last option.  Let’s assume that we have an Oracle 11.1.0.7 home set up on the computer, with the bin folder located here:

C:\oracle\product\11.1.0\db_1\bin

When a program starts on Windows, typically the current folder (directory) is searched for required support files, and then the folders in the system path (defined by the PATH environment variable) are checked in the specified order for the support files.  So, to change to this Oracle home we just need to place this location of our home’s bin folder at the start of the PATH environment variable.  To change that variable, open a Windows command prompt window (run cmd.exe) and type the following commands (note that the last command will cause the command prompt window to close):

SET PATH=C:\oracle\product\11.1.0\db_1\bin;%PATH%
START CMD /K SQLPLUS /NOLOG
EXIT

The first line appends the selected Oracle home to the beginning of the PATH variable.  The second line starts SQL*Plus in a new window, with Windows automatically finding it in the modified search path; what is interesting is that SQL*Plus (or any program for that matter) remembers the PATH variable that was modified by the previous command (we could have modified several environment variables).  The end result might look like the following (assuming that you did not press the Enter key after typing EXIT – the background window will disappear if you pressed the Enter key):

Interesting, but how do we know that SQL*Plus is remembering the modified PATH variable?  We are able to confirm that the modified path is remembered using Microsoft’s Process Explorer utility by double-clicking on the sqlplus.exe entry in the program and then switching to the Environment tab:

As you are probably able to determine, I probably also should have changed the PERL5LIB location.  Now, let’s try a 8.1.7 client on the same computer:

SET PATH=C:\orant\bin;%PATH%
START CMD /K SQLPLUS /NOLOG
EXIT

The resulting window will look like this, if you did not press the Enter key after typing EXIT  – the background window will disappear if you pressed the Enter key:

And the Process Explorer window might look like this:

Why stop at two examples?  We could do this in a command prompt window assuming that there are four Oracle homes):

SET PATH=C:\oracle\product\10.2.0\db_1\bin;%PATH%
START CMD /K SQLPLUS /NOLOG

SET PATH=C:\oracle\product\10.2.0\client_1\bin;%PATH%
START CMD /K SQLPLUS /NOLOG

SET PATH=C:\oracle\product\11.1.0\db_1\bin;%PATH%
START CMD /K SQLPLUS /NOLOG

SET PATH=C:\orant\bin;%PATH%
START CMD /K SQLPLUS /NOLOG
EXIT

The end result, after manually aligning the windows, might look like this:

It does not need to be SQL*Plus that is started – it could be any program.  OK, but I don’t want to type a bunch of commands to launch the program.  No problem, just create a text file with a .bat extension:

If the default icon is not fancy enough, just right-click the file and create a shortcut.  You can then view the properties of the shortcut to change the icon:

Just about any EXE file will contain a couple of icons, as will some DLL files.  The result looks like this:

If we were using this approach to start an application program that uses a specific Oracle home, it probably would make sense to change the properties of the shortcut to run using a minimized window to prevent a brief screen flash to black when the batch file executes.





Book Reviews and DMCA – No Longer Curious, Order Cancelled

26 08 2010

August 26, 2010

In June 2010 an article appeared on another blog that identifed three blog articles appearing on a blog operated by an Oracle Certified Master (OCM) that were copied nearly verbatim from three other blogs, thus saving the OCM a considerable effort in not only typing verbage appearing in the article, but also effort in actually building useful content.  Why spent four hours developing a blog article (this is probably the average time I spend), when a simple copy, paste, find-and-replace can be accomplished in a couple of minutes?  Those copies of articles clearly violated commonly understood copyright restrictions, and a quick search showed that the copyright violations extended far beyond the initial three articles which were mentioned (yes, I have screen captures and PDF copies of all that I identified).

So, why did I write the above paragraph?  For some reason I was recently browsing through some of the information about DMCA:

“Any material that was posted without the copyright owner’s authorization must be removed or blocked promptly once the service provider has been notified that it has been removed, blocked, or ordered to be removed or blocked, at the originating site.”

I realized that a random reader of this blog could potentially send a note to WordPress stating that they are, in fact under penalty of law, the owner of the material I just spent 2 hours, 4 hours, 8 hours, 16 hours  – maybe even months writing, and demand that the article be taken offline because I clearly stated that I quoted a couple of words from one of their articles while agreeing with or disagreeing with the other article.  Is quoting another source, while clearly stating that the other source is being quoted, in an effort to critique the quoted section an example of fair use?

“Section 107 contains a list of the various purposes for which the reproduction of a particular work may be considered fair, such as criticism, comment, news reporting, teaching, scholarship, and research. Section 107 also sets out four factors to be considered in determining whether or not a particular use is fair:

  • The purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposes
  • The nature of the copyrighted work
  • The amount and substantiality of the portion used in relation to the copyrighted work as a whole
  • The effect of the use upon the potential market for, or value of, the copyrighted work”

Interesting… as some of you may have noticed, the Oracle Database book reviews that I wrote in the last two years attempt to be very thorough,  precise, and specific –  many times down to a page number and a quoted sentence or two.  In my view such quotes clearly fall under the embrella of fair use (specifically criticism, comment, teaching, and research).

On June 10, 2010 I posted a blog article where I mentioned having three Oracle Database books on order that I intended to review just as I had with many of the other Oracle Database books that I purchased in the past.  Two of the books that I ordered, “Oracle Performance Firefighting (fourth printing)” and “Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition” arrived and were reviewed, while the book that was supposed to ship on June 1, 2010 currently shows the following status on Amazon.com:

Let’s take a look at that order:

So, in summary Amazon reports that the book has yet to be released, and if it ships in September it will arrive three months later than originally expected.  On August 17, 2010 I received two unsolicited emails from someone at that book publishing company asking me to submit any errata that I find in the second edition of the book so that the problems may be addressed by an OCM who is also an Oracle Ace, after I write up the book review – the person also mentioned that, by the way, the book keeps selling out.  Nice, except that I now have a slight problem.  My reviews typically take between one and two hours to write and when agreeing with or disagreeing with a section of a book I try very hard not to lose the precise message of that section of the book – that often means that it is necessary to directly quote a sentence or two of the book that is being reviewed.  In theory, simply quoting a section of a copyrighted work might be grounds for a DCMA pull down of the content, thus rendering an hour or two of writing lost for a period of time.  Would the author of this book that is now three months delayed try a DCMA pull down if the review is not completely favorable?  I don’t know that answer.

In late May or early June I wrote the following about why I ordered the book that has yet to ship:

I was curious, so I spent good money on the book.  For most of the Oracle books that I have read in the last two years I have posted detailed reviews of the books on Amazon.  The reviews list what I thought was good about the book, what I thought was great about the book, errors/errata found in the book’s chapters, as well as misc. points about the book as it is read.  Some of my reviews identified a dozen or more errors in a book, which is a significant problem with some of the Oracle books on the market.  The errors must make it difficult for beginners to learn about Oracle Database.

How long will the review of the “Oracle Tuning: The Definitive Reference Second Edition” book stretch?  I am not sure, but it appears that I could write about a 4 page review on the first 27 pages of the first edition of this book – I sincerely hope that some/most of those issues were addressed in the second edition.  I want to make certain that I obtain a good value from the $44 that I spent on the book.  I do not think that it too much to ask for accuracy from a published book, especially when the book cost more than $40.  Some of the pages in the chapters that I co-authored in the “Expert Oracle Practices” required more than 8 hours of research, typing, verification, testing, proofing, re-testing, and formatting per page. I believe that Raldolf spent roughly the same amount of time on the pages that he wrote for the chapters.  I suspect that many of the very good books that are on the market for Oracle also required about the same amount of time per page.  Certainly, it can’t be too much to ask that other book authors put effort into testing and verification of what they publish? The author’s website states about the the “Oracle Tuning: The Definitive Reference Second Edition” book: [edit: feel free to read the book's description on the publisher's website].  So, I will be looking for the portions of the book’s contents that are comprehensive and targeted at the senior Oracle DBA.

After waiting three months, I am no longer curious about the book – so I just cancelled the order.  Was I serious that I could write a four page review on the first 27 pages of the first edition of the book?  I heard someone shout “show me your cards.”  In late May and early June I read the first 27 pages of the first edition using the Google books site, and I made notes as I read.  These are my notes, quoting a minimal amount of the book to fall under the embrella of “criticism, comment” of fair use:

  • Page 9 (now page 11) states “For example, if the AWR shows that the demands on the shared pool become very high between 1:00 pm and 2:00 pm, the DBA might trigger a dynamic decrease of db_cache_size and a corresponding increase of the shared_pool_size parameter during this time period.”  If Oracle’s buffer cache is using the memory specified by the DB_CACHE_SIZE, the size value specified for DB_CACHE_SIZE cannot be decreased without first manually flushing the buffer cache – flushing the buffer cache to allow the shared pool to grow for an hour is a bad idea.  It is important to keep in mind that when the SGA_TARGET parameter is specified, the values for DB_CACHE_SIZE and SHARED_POOL_SIZE specify the minimum values for the parameters.
  • Page 10 is an advertisement for the WISE package – free copy is apparently available with the book.
  • Page 11, when describing examples of poor schema design, suggests that databases using extensive data normalization to minimize data redundancy is a poor design because it forces Oracle to perform unnecessary table joins.
  • Page 11 (now page 12) states, “If poorly designed PL/SQL is encountered, users might be able to tune the database by using array processing such as bulk collect, forall and ref cursors…”  As of Oracle Database 10g, Oracle automatically array fetches 100 rows at a time in cursor FOR loops, making it much less likely/necessary that a ‘user’ will re-write the PL/SQL to take advantage of bulk collect’s performance slightly better performance.  http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1203923200346667188
  • By page 13 has recommended the use of AWR, various AWR tables, and the SQL Tuning Advisor without mentioning that the Diagnostic Pack must be licensed for access to AWR information, and the Diagnostic and Tuning Pack must be licensed for access to the SQL Tuning Advisor.
  • Page 14 (now page 25) states “The first tasks when tuning a database when tuning a database are to identify the external bottleneck conditions, which may include: CPU Bottleneck,… RAM Bottleneck,…  Network Bottleneck, … and Disk Bottleneck.”  While it is important to check these items, I suggest that these probably would not be the first items to check when tuning the database instance’s performance.  Yes, checking whether or not the CPUs are overloaded (or heavily loaded) is important, but maybe it would be better to determine if the overload is causing a measurable negative impact on performance by first checking the delta values of the various wait events, system statistics, and system time model statistics before deciding to check, for instance, whether there are network bottlenecks.
  • Page 14 (now page 25) states “Whenever the run queue exceeds the number of CPUs on the Oracle server in the absence of high idle times, the system is said to be CPU-bound.”  The first question with this statement is what qualifies as a “high idle time”?  Secondly, a server may become CPU bound long before the CPU run queue exceeds the number of CPUs in the server, and long before 90% CPU utilization is reached.  The author mentions that CPU consumption can be reduced by tuning SQL and reducing library cache contention – that is true, but much more could be said about the topic.  For example, would it make a difference if the database is used for OLTP rather than data warehouse activities?  The remedies listed on page 15 are just to add additional CPUs to the server or disabling a “high CPU consumer” such as parallel query.  No warning provided to indicate that adding CPUs to the server could be expensive from an Oracle licensing perspective and a system uptime perspective.
  • Page 15 (now page 25) states that “Large amounts of Oracle*Net traffic contribute to slow SQL performance.”  The book does not describe what is considered “large amounts”, nor what to do about addressing the network bottleneck.
  • Page 15 (now page 25) under the heading of disk bottlenecks the only issue identified is “using RAID5 for high update systems.”  The adjective “high” is never clarified.
  • Page 15 (now page 25) states, while discussing potential solutions for over-allocation of the server’s RAM, that the options for “excessive wrapping id to add more RAM, reduce the size of Oracle’s SGAs, or turn-on Oracle’s multi-threaded server.”  Are these the only options, or could more suggestions be recommended?  For example, “size of Oracle’s SGAs” implies that more than one instance is running on the server – would it make sense to reduce the number of instances on the server?  Are there any reasons not to implement a shared server configuration (what the author refers to as “multi-threaded server”?  What about tuning the SQL?
  • Page 15 (now page 26), the author appears to misuse the term “network latency”, or at least stretch the meaning of latency as it applies to a network.
  • Page 16 includes an advertisement for WISE.
  • Page 16 (now page 27) states “In a disk-bound database, the majority of the wait time is spent accessing data blocks.”  It is quite possible in a non-disk-bound, well performing database instance that the majority of wait time could very well be spent reading data blocks.  In fact, these events should be among the top 5 wait events, otherwise there probably is a significant performance problem.  It is also quite possible that the system could be disk-bound when the majority of wait time is spent _writing_ to the redo logs, archiving redo logs, and updating data/undo/temp files.
  • Page 16 (now page 27), the Top 5 Timed Events, demonstrating a disk constrained database, shows 44 waits for the CPU.  Waits for the CPU cannot be reported in the Top 5 Timed Events – this data in the report is bogus.  The report also shows that the average single block read time is 2.75 seconds, while the average multi-block read time is 0.127 seconds (127ms) – this data in the report is also likely bogus.  The report also shows that 1,363 seconds were lost in 673 waits (average of 2.03 seconds) to the “library cache load lock” wait – if that wait appeared in the top 5 report, even in a report showing the single block read time is 2.75 seconds, I think it would be a good idea to start with the “library cache load lock” wait since that is one of the waits that indicates a severe problem if it appears in the top 5 wait events.
  • Page 17 (now page 28) states “CPU enqueues can be observed when the CPU run queue exceeds the number of CPUs on the database server, and this can be seen by…”  The effects of CPU enqueues may be observed long before the CPU run queue exceeds the number of CPUs in the server, whether it be longer average waits for the log file sync wait event, increased waits for latches, increased wait time for single block or multi-block reads, or any number of statistics in the system time model (V$SYS_TIME_MODEL, V$SESS_TIME_MODEL).
  • Page 17 (now page 28) states “High CPU usage will be reported as a top 5 timed even (sic) in any AWR report as shown below”  First, the Top 5 Timed Events report shows that there were 4,851 waits for CPU time – a Top 5 report cannot show the number of waits for the CPU – this data in the report is bogus.  Second, the CPU is a limited “wait” resource with an upper utilization limit equal to the number of CPUs in the server multiplied by the elapsed number of seconds, while the other foreground wait events, for the most part, have an upper limit of the average number of sessions (plus parallel related additional processes) connected to the instance multiplied by the elapsed number of seconds.  It is quite possible that a server experiencing excessive CPU usage may not cause the “CPU time” to appear in the top 5 timed events.  Third, we do not know how many CPUs are in the server – would it make a difference if there was only a single CPU compared to 96 CPUs?  Fourth, we have not concept of the elapsed time represented by the report – what if the elapsed time for the report is eight hours?  Fifth, if the single block read time averages more than a second, shouldn’t the investigation start with the IO system?  Sixth, the most that can be determined from the top 5 timed events is the average utilization, number of waits, and number of timeouts – why are the number of timeouts not shown?  From the top 5 timed events it is difficult, if not impossible, to determine if there wasn’t a brief period of intensive activity, followed by a long period of almost no activity – or if the average activity levels were fairly constant in the time period.
  • Page 18 (now page 28), the Top 5 Wait Events (why the switch from Top 5 Timed Events) is labeled as a mockup – probably a good idea that this is labeled because the report shows 94,754 seconds where the server was waiting to send data across the network while single block and multi-block reads totaled only 85.75 seconds.  That is why you do not use a satellite link for the network backbone.
  • Page 19, several grammar errors to this point in the book “It not elegant, or even correct, but it’s a common choice by managers who need quick, low-risk performance improvements…  Individual Program Global Areas (PGA’s for each session) are a set of running programs that do work for the instance and are referred to as processes.”
  • Page 20 (now page 30), describes db_cache_size as “the number of data buffers to allocate for the instance.”  The parameter does not specify the number of data buffers (blocks?), it specifies the amount of memory to allocate to the buffer cache (or the minimum amount in the event the SGA_TARGET is specified).
  • Page 21, the DB_FILE_MULTIBLOCK_READ_COUNT parameter is listed among the parameters that cannot be changed without bouncing the instance – that is not correct.
  • Page 21 (now page 30), states “parallel_automatic_tuning:… since parallel full-table scans are very fast, the CBO will give a higher cost to index access and be friendlier to full-table scans.”  This is simply incorrect.  The optimizer does not increase the cost of index access when costing parallel access paths that use full table scans – it lowers the cost of the full table scan relative to the parallel degree.  Second, parallel full table scans are not automatically very fast, especially if the IO subsystem is bottlenecked.
  • Page 21 (now page 30), the description of the OPTIMIZER_INDEX_COST_ADJ parameter is very weak, “this parameter controls the relative costs of full-table scans versus index scans.”  It would be much better to state that the parameter specifies the percentage of the index access cost to be retained.  Setting the value to a very low value could cause the optimizer to select not only an index access path rather than a full table scan, but to select the wrong index due to cost rounding problems as the value of the OPTIMIZER_INDEX_COST_ADJ parameter approaches 1.  Ideally, since at least Oracle Database 10.1 this parameter should be left at the default value of 100 due to the Oracle optimizer’s use of system statistics by default.
  • Page 21 (now page 30), the description of the OPTIMIZER_INDEX_CACHING parameter is vague – the book doesn’t indicate when the value of the parameter is used in cost calculations, and when it is not used.
  • Page 22 (now page 34) states “In Oracle 10g, statistics collections are automated, but users may still need to selectively add histograms and other specialized optimizer statistics.”  The book does not state what triggers the statistics collection for a specific object, at what time the statistics are collected, that histograms (maybe excessively) are automatically generated, nor the potential problems that might be encountered after the statistics collection.
  • Page 23 (now page 35) states “SQL will run fastest when the first table joins deliver the smallest result set.”  This is stated as an absolute, which absolutely will not be correct all of the time.
  • Page 23 states “the new Oracle10g SQL profile utility will allow changes to execution plans without adding hints.”  This is untrue – a SQL profile is a series of hints that are automatically generated.
  • Page 25 shows operating with OPTIMIZER_MODE set to CHOOSE and deleting a table’s statistics to immediately return the system to acceptable performance.  No warning that this caused the optimizer to switch to the RULE based optimizer, and no demonstration of how statistics should be collected.
  • Page 25 “there was a serious degradation in SQL performance immediately after the implementation of partitioned _tablespaces_ in a 16-CPU Solaris 64-bit Oracle 9i database.”
  • Page 26 shows a grammar error “Note that is the full-table scans are parallelized with 15 parallel query processes to speed up the statistics collection”
  • Page 26 – amazing performance improvements “tripled the performance” “400 percent performance improvement” – no real suggestion of how the silver bullet corrected a problem to cause such a significant increase in performance, so that readers can experience similar performance improvements in their database instances.
  • Page 26 – assumes a data warehouse was set up without a PGA_AGGREGATE_TARGET parameter value and an unspecified value for SORT_AREA_SIZE, resulting in a SORT_AREA_SIZE that was the default value, and only a 50% overall improvement was experienced by setting the parameter to 1MB.  The book did not state that the default value of SORT_AREA_SIZE, when not specified, and PGA_AGGREGATE_TARGET is not used, that the SORT_AREA_SIZE defaults to 64KB.  The book also did not describe how the 1MB value was determined to be the best value, nor what impact changing the parameter’s value has on memory availability in the server.
  • Page 27 – code section shows single quotes around what is apparently a bind variable defined as VARCHAR2.  The single quotes should not appear around that bind variable.  Author states that a concatenated index on the CUSTOMER_STATUS and CUSTOMER_AGE columns resulted in a 50 fold performance improvement and 6 fold reduction in disk IO.  The book does not state how or why, so I wonder how that could happen when the CUSTOMER_STATUS column likely has few unique values and an inequality is used on the CUSTOMER_AGE column.  Was a bitmap or B*tree index used, was the CUSTOMER_STATUS or CUSTOMER_AGE specified as the leading column in the index?
  • Page 27 “While it was not clear why the CBO was not choosing the index, the query with the index hint ran almost 20x faster.  After acting fast and running a script against v$bh and user_indexes, the DBA discovered that approximately 65 percent of the indexes were currently inside the buffer cache.”  There are a couple of problems with this statement.  The statement is not clear whether the DBA found that at least one block from roughly 65% of the indexes found in USER_INDEXES was in the buffer cache, or if the DBA found that roughly 65% of all blocks belonging to the indexes found in USER_INDEXES were in the buffer cache.  The second problem is why did the DBA check USER_INDEXES and not DBA_INDEXES – if the DBA checked USER_INDEXES, he would only see the indexes that belong to the user account that the DBA used to connect to the database (unless he used ALTER SESSION SET CURRENT_SCHEMA to change to another schema).  The third problem is that there may be multiple, possibly many consistent read versions of an individual block in the buffer cache – how did the DBA factor out the multiple copies of blocks when he performed his calculations?  The fourth problem is that the DBA did not consult a 10053 trace for a hard parse of the SQL statement to understand why the expected index was not selected.
  • Page 28 “Based on similar systems, the next step was to lower optimizer_index_cost_adj to a value of 20 in the hopes of forcing the CBO to lower the relative costs of index access.”  At the same time the OPTIMIZER_INDEX_CACHING parameter was set to 65.  What are the problems with this approach?  According to the Oracle documentation, the “OPTIMIZER_INDEX_CACHING parameter lets you adjust the behavior of cost-based optimization to favor nested loops joins and IN-list iterators.  The cost of executing an index using an IN-list iterator or of executing a nested loops join when an index is used to access the inner table depends on the caching of that index in the buffer cache. The amount of caching depends on factors that the optimizer cannot predict, such as the load on the system and the block access patterns of different users.  You can modify the optimizer’s assumptions about index caching for nested loops joins and IN-list iterators by setting this parameter to a value between 0 and 100 to indicate the percentage of the index blocks the optimizer should assume are in the cache. Setting this parameter to a higher value makes nested loops joins and IN-list iterators look less expensive to the optimizer.”  The parameter change is global, and does not apply to all index type accesses, it primarily makes nested loop joins appear to be more favorable than hash joins.  What criteria determined that 20 was the best value for OPTIMIZER_INDEX_COST_ADJ, and why didn’t the DBA correct or even check the system statistics, rather than experimenting with a parameter that should rarely be changed from its default value since the introduction of system statistics in Oracle 9i.
  • Page 28 shows “alter system set optimizer_index_cost_adj=20 scope = pfile;”  Issuing that command will result in an “ORA-00922: missing or invalid option” error.  The pfile cannot be altered using an ALTER SYSTEM command, except of course if the spfile is modified, and then the spfile is written over the top of the existing pfile (init.ora file).
  • Page 28 indicates that the Statspack report shown was from Oracle 9i, yet the report indicates that the wait time is in centiseconds, while an Oracle 9i Statspack report indicates time in seconds.  It is a bit alarming to see 4,796.54 seconds of wait on enqueue (46.71% of total implies total wait time of 10,268.76 seconds), while the 10,579,442 multi-block reads required 1,972.05 seconds (report shows that this is 29.20% of the total, but a calculator shows that it is 19.20%).  Something is wrong with this Top 5 Wait Events report – it is showing that the average multi-block read completed in 0.0001864 seconds, while the average single block read time required 0.002714 seconds.
  • Page 29 – the silver bullet tip to change the CURSOR_SHARING parameter to FORCE did not warn of any of the possible side-effects, which may include poor performance when cursors that should not be shared because of uneven data distributions generate execution plans that are inappropriate for other literal values, or bugs related to CURSOR_SHARING that either result in wrong data being returned, or instance stability problems.  It would be interesting to see how the 75% performance improvement was achieved by changing this parameter.

I guess that if the above mentioned problems are still present in the second edition of the book, it might have taken me significantly longer than the typical one to two hours to write the book review on the remaining 1000+ pages.  Best of luck for anyone ordering the second edition of the book.





Oracle Logging Trigger Creator

25 08 2010

August 25, 2010

In a multi-user database environment it seems that sometimes changes just magically appear in ERP (or other) data.  The questions swirl, “Who changed that record?”; “I know that employee ABC arrived on time yesterday, why does the computer show he arrived late?”; “That new record is missing vital information, who created it?”, etc.  There are auditing features built into Oracle Database, and it is possible to review the redo log contents with LogMiner, but what if you just need a simple way to quietly record changes made to a table’s data?  Sure, triggers could be manually crafted to log changes, but it is a time consuming task, and consistency from one logging table/trigger to the next is difficult to maintain.  Well, this seems to be enough justification to develop a solution in search of a suitable problem, for which the solution will solve (we always invent the solution first, and then search for suitable problems that can be fixed, right?).

Today’s article will offer a VBS script that connects to an Oracle Database, and displays table definitions for all tables in the database using Internet Explorer as the script’s user interface.  For example, the following screen capture shows the tables in the TESTUSER schema with the definition of the LOCATIONS table displayed.  As shown, this table contains two VARCHAR2(15) columns in the primary key, a typical VARCHAR2(80) column, and a CHAR(1) column with a NOT NULL constraint.

—-

Switching to the PARTS table, we see that there is much more to the VBS script that automatically creates logging triggers and the table to maintain the captured data.  For each column in the table we are able to specify whether or not a change in a column’s value will cause a row to be inserted into the logging table – the “Trigger On Change” column in the VBS script’s user interface allows the user to specify the columns that cause the trigger to fire.  The “Log Values” column determines what data will be captured when the trigger fires:

  • Both New and Old (capture the prior values before the change and the values after the change)
  • New Values (capture only the values after the change)
  • Old Values (capture only the values before the change)
  • New Values and Delete (capture only the values after the change, unless the row is being deleted – then capture the values from before the change)
  • Old Values and Insert (capture only the values before the change, unless the row is being inserted – then capture the values from the insert statement)
  • Do Not Log (no data is captured for this column)

Clicking on the Create Data Log Script button builds the script which may be used to create the logging table and the logging trigger.  By default the logging table will be owned by the DATALOG user, although that can be changed within the VBS script.  The generated logging script displayed on the screen should look like this when the above selections are made:

(Added January 6, 2013: Sample trigger code shown above in a text file, save as TRIGGER_DATALOG_or112p_HPM_LOG_PARTS.SQL – without the .DOC extension.)

Notice that the generated script provides details about the purpose of the script, how to execute the script, what passwords need to be specified, sample commands to create a DATALOG tablespace and DATALOG user, the SQL to create the logging table, and the SQL to create the logging trigger.  By design, the default behavior of the script is easy to modify.

You can download the VBS script here: LoggingTrigger.vbs (save as LoggingTrigger.vbs). 

The script logic starts in the StartUp sub:

Sub StartUp()
    Dim strSQL
    Dim intResult
    Dim intFlag

    'Fire up Internet Explorer
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Left = 0
    objIE.Top = 0
    objIE.Width = 1100
    objIE.Height = 800
    objIE.StatusBar = True
    objIE.MenuBar = False
    objIE.Toolbar = False

    objIE.Navigate "about:blank"
    objIE.Document.Title = "Charles Hoopers Oracle Logging Trigger Creator"
    objIE.Visible = True

    Set dbDatabase = CreateObject("ADODB.Connection")
    Set snpData = CreateObject("ADODB.Recordset")
    Set comRetrieveTableNames = CreateObject("ADODB.Command")
    Set comRetrieveTableColumns = CreateObject("ADODB.Command")

    'Database configuration
    strUsername = "MyUsername"
    strPassword = "MyPassword"
    strDatabase = "MyDB"

    On Error Resume Next

    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
    dbDatabase.Open

    'Should verify that the connection attempt was successful, but I will leave that for someone else to code

    strTableOwner = "TESTUSER"  'Set to the default owner name to be selected, possibly same as strUsername
    strDataLogSchema = "DATALOG"

As was the case for VBS scripts appearing in earlier articles that interact with the database, you must modify the database connection details, specifying username, password, and database.  Additionally, this script permits setting the default table schema (strTableOwner) and the default logging schema (strDataLogSchema) that display in the script’s user interface.  The script runs in a tight loop (sleeping for 0.5 seconds) in the StartUp sub until the script ends.  Within that loop the script reacts to changes in the Internet Explorer window, such as a selection of a different table name, or clicking the Create Data Log Script button.

When the user selects a different Table Owner the script executes the TableOwnerChange procedure which retrieves a list of the tables that reside in the selected user’s schema (note that this means the username used to run the script must have access to the various DBA_ views).  When a Table Name is selected, the TableNameChange procedure is executed which retrieves the definition for that table, and displays that definition in the Internet Explorer supplied user interface.  When the Create Data Log Script button is clicked the CreateLoggingScript procedure is executed, which writes the SQL*Plus compatible script to the root of the C:\ drive, and then displays the script on the screen.

If we run the script that was generated (after setting the correct passwords in the script), we should see something like this:

SQL> @C:\TRIGGER_DATALOG_or112p_HPM_LOG_PARTS.SQL
Connected.
DROP TABLE DATALOG.HPM_LOG_PARTS
                   *
ERROR at line 1:
ORA-00942: table or view does not exist

Note that the script attempted to drop an old logging table named DATALOG.HPM_LOG_PARTS at the start of the script, and that is what triggered the ORA-00942 error – just ignore this error message.

The PARTS table in this database has a definition that looks like this (it was a table created in one of my scripts mentioned in this article Faulty Quotes 5 – Block Sizes):

CREATE TABLE PARTS (
  PART_ID VARCHAR2(30) NOT NULL ENABLE,
  DESCRIPTION VARCHAR2(40),
  STOCK_UM VARCHAR2(15) NOT NULL ENABLE,
  PLANNING_LEADTIME NUMBER DEFAULT 0 NOT NULL ENABLE,
  ORDER_POLICY CHAR(1) DEFAULT 'M' NOT NULL ENABLE,
  ORDER_POINT NUMBER(14,4),
  SAFETY_STOCK_QTY NUMBER(14,4),
  FIXED_ORDER_QTY NUMBER(14,4),
  DAYS_OF_SUPPLY NUMBER,
  MINIMUM_ORDER_QTY NUMBER(14,4),
  MAXIMUM_ORDER_QTY NUMBER(14,4),
  ENGINEERING_MSTR VARCHAR2(3),
  PRODUCT_CODE VARCHAR2(15),
  COMMODITY_CODE VARCHAR2(15),
  MFG_NAME VARCHAR2(30),
  MFG_PART_ID VARCHAR2(30),
  FABRICATED CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  PURCHASED CHAR(1) DEFAULT 'Y' NOT NULL ENABLE,
  STOCKED CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  DETAIL_ONLY CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  DEMAND_HISTORY CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  TOOL_OR_FIXTURE CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  INSPECTION_REQD CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  WEIGHT NUMBER(14,4),
  WEIGHT_UM VARCHAR2(15),
  DRAWING_ID VARCHAR2(15),
  DRAWING_REV_NO VARCHAR2(8),
  PREF_VENDOR_ID VARCHAR2(15),
  PRIMARY_WHS_ID VARCHAR2(15),
  PRIMARY_LOC_ID VARCHAR2(15),
  BACKFLUSH_WHS_ID VARCHAR2(15),
  BACKFLUSH_LOC_ID VARCHAR2(15),
  INSPECT_WHS_ID VARCHAR2(15),
  INSPECT_LOC_ID VARCHAR2(15),
  MRP_REQUIRED CHAR(1) DEFAULT 'N',
  MRP_EXCEPTIONS CHAR(1) DEFAULT 'N',
  PRIVATE_UM_CONV CHAR(1) DEFAULT 'N',
  AUTO_BACKFLUSH CHAR(1) DEFAULT 'Y',
  PLANNER_USER_ID VARCHAR2(20),
  BUYER_USER_ID VARCHAR2(20),
  ABC_CODE CHAR(1),
  ANNUAL_USAGE_QTY NUMBER(15,4),
  INVENTORY_LOCKED CHAR(1) DEFAULT 'N' NOT NULL ENABLE,
  UNIT_MATERIAL_COST NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  UNIT_LABOR_COST NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  UNIT_BURDEN_COST NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  UNIT_SERVICE_COST NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  BURDEN_PERCENT NUMBER(5,2) DEFAULT 0 NOT NULL ENABLE,
  BURDEN_PER_UNIT NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  PURC_BUR_PERCENT NUMBER(6,3) DEFAULT 0 NOT NULL ENABLE,
  PURC_BUR_PER_UNIT NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  FIXED_COST NUMBER(15,2) DEFAULT 0 NOT NULL ENABLE,
  UNIT_PRICE NUMBER(20,6),
  NEW_MATERIAL_COST NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  NEW_LABOR_COST NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  NEW_BURDEN_COST NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  NEW_SERVICE_COST NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  NEW_BURDEN_PERCENT NUMBER(5,2) DEFAULT 0 NOT NULL ENABLE,
  NEW_BURDEN_PERUNIT NUMBER(20,6) DEFAULT 0 NOT NULL ENABLE,
  NEW_FIXED_COST NUMBER(15,2) DEFAULT 0 NOT NULL ENABLE,
  MAT_GL_ACCT_ID VARCHAR2(30),
  LAB_GL_ACCT_ID VARCHAR2(30),
  BUR_GL_ACCT_ID VARCHAR2(30),
  SER_GL_ACCT_ID VARCHAR2(30),
  QTY_ON_HAND NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  QTY_AVAILABLE_ISS NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  QTY_AVAILABLE_MRP NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  QTY_ON_ORDER NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  QTY_IN_DEMAND NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  USER_1 VARCHAR2(80),
  USER_2 VARCHAR2(80),
  USER_3 VARCHAR2(80),
  USER_4 VARCHAR2(80),
  USER_5 VARCHAR2(80),
  USER_6 VARCHAR2(80),
  USER_7 VARCHAR2(80),
  USER_8 VARCHAR2(80),
  USER_9 VARCHAR2(80),
  USER_10 VARCHAR2(80),
  LT_PLUS_DAYS NUMBER,
  LT_MINUS_DAYS NUMBER,
  STATUS CHAR(1),
  USE_SUPPLY_BEF_LT CHAR(1),
  QTY_COMMITTED NUMBER(14,4) DEFAULT 0 NOT NULL ENABLE,
  PRT_CREATE_USER_ID VARCHAR2(30) DEFAULT USER,
  PRT_CREATE_DATE DATE DEFAULT SYSDATE,
  CONSTRAINT "CHK_PART1" CHECK (
    (PLANNING_LEADTIME >= 0)
    AND (ORDER_POLICY = 'N'
      Or ORDER_POLICY = 'M'
      Or ORDER_POLICY = 'F'
      Or ORDER_POLICY = 'E'
      Or ORDER_POLICY = 'D'
      Or ORDER_POLICY = 'P')
    AND (ORDER_POINT >= 0)
    AND (SAFETY_STOCK_QTY >= 0)
    AND (FIXED_ORDER_QTY >= 0)
    AND (DAYS_OF_SUPPLY >= 0)
    AND (MINIMUM_ORDER_QTY >= 0)
    AND (MAXIMUM_ORDER_QTY >= 0)
    AND (FABRICATED = 'Y' Or FABRICATED = 'N')
    AND (PURCHASED = 'Y' Or PURCHASED = 'N')
    AND (STOCKED = 'Y' Or STOCKED = 'N')
    AND (DETAIL_ONLY = 'Y' Or DETAIL_ONLY = 'N')
    AND (DEMAND_HISTORY = 'Y' Or DEMAND_HISTORY = 'N')
    AND (TOOL_OR_FIXTURE = 'Y' Or TOOL_OR_FIXTURE = 'N')
    AND (MRP_REQUIRED = 'Y' Or MRP_REQUIRED = 'N')
    AND (MRP_EXCEPTIONS = 'Y' Or MRP_EXCEPTIONS = 'N')
    AND (PRIVATE_UM_CONV = 'Y' Or PRIVATE_UM_CONV = 'N')
    AND (INVENTORY_LOCKED = 'Y' Or INVENTORY_LOCKED = 'N')
    AND (INSPECTION_REQD = 'Y' Or INSPECTION_REQD = 'N')) ENABLE,
  PRIMARY KEY (PART_ID),
  CONSTRAINT "FKEY_INSP" FOREIGN KEY (INSPECT_WHS_ID, INSPECT_LOC_ID)
    REFERENCES LOCATIONS (WAREHOUSE_ID, LOCATION_ID) ENABLE,
  CONSTRAINT "FKEY_PREF_VENDOR" FOREIGN KEY (PREF_VENDOR_ID)
    REFERENCES VENDORS (VENDOR_ID) ENABLE,
  CONSTRAINT "FKEY_UM" FOREIGN KEY (WEIGHT_UM)
    REFERENCES UMS (UNIT_OF_MEASURE) ENABLE,
  CONSTRAINT "FKEY_STOCK_UM" FOREIGN KEY (STOCK_UM)
    REFERENCES UMS (UNIT_OF_MEASURE) ENABLE);

Yes, it is an ugly, long table definition that references a couple of additional tables.  Now, let’s make some changes to the above table:

INSERT INTO PARTS (
  PART_ID,
  DESCRIPTION,
  STOCK_UM,
  PLANNING_LEADTIME,
  ORDER_POLICY,
  FABRICATED,
  PURCHASED,
  STOCKED,
  DETAIL_ONLY,
  DEMAND_HISTORY,
  TOOL_OR_FIXTURE,
  INSPECTION_REQD,
  INVENTORY_LOCKED,
  UNIT_MATERIAL_COST,
  UNIT_LABOR_COST,
  UNIT_BURDEN_COST,
  UNIT_SERVICE_COST,
  BURDEN_PERCENT,
  BURDEN_PER_UNIT,
  PURC_BUR_PERCENT,
  PURC_BUR_PER_UNIT,
  FIXED_COST,
  UNIT_PRICE,
  NEW_MATERIAL_COST,
  NEW_LABOR_COST,
  NEW_BURDEN_COST,
  NEW_SERVICE_COST,
  NEW_BURDEN_PERCENT,
  NEW_BURDEN_PERUNIT,
  NEW_FIXED_COST,
  QTY_ON_HAND,
  QTY_AVAILABLE_ISS,
  QTY_AVAILABLE_MRP,
  QTY_ON_ORDER,
  QTY_IN_DEMAND,
  QTY_COMMITTED)
SELECT
  'AA'||ROWNUM,
  'WATCH ME',
  'EA',
  0,
  'N',
  'Y',
  'N',
  'N',
  'N',
  'N',
  'N',
  'N',
  'N',
  100,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  109,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0
FROM
  DUAL
CONNECT BY
  LEVEL<=5;

5 rows created.

The above added 5 rows to the PARTS table.  Now let’s modify those rows:

UPDATE
  PARTS
SET
  DESCRIPTION='CHANGED'
WHERE
  PART_ID IN ('AA1','AA2');

2 rows updated.

UPDATE
  PARTS
SET
  SAFETY_STOCK_QTY=10,
  STOCK_UM='PC',
  FIXED_ORDER_QTY=10
WHERE
  PART_ID IN ('AA1','AA4','A5');

2 rows updated.

UPDATE
  PARTS
SET
  SAFETY_STOCK_QTY=10,
  STOCK_UM='PC',
  FIXED_ORDER_QTY=10
WHERE
  PART_ID IN ('AA1','AA4','AA5');

3 rows updated.

DELETE FROM
  PARTS
WHERE
  PART_ID IN ('AA2','AA3','AA4','AA5');

4 rows deleted.

COMMIT;

SELECT
  COUNT(*)
FROM
  DATALOG.HPM_LOG_PARTS;

  COUNT(*)
----------
        14

As indicated by the above output, our logging table now has 14 rows of data.  Let’s view that data using a spreadsheet program (in this case Excel 2010):

Obviously, it is important to be smart when deciding which columns to capture, whether the old and/or new values should be captured, and changes to which columns should cause the trigger to fire.  So, what was the inspiration for this article?  One of the features in my Toy Project for Performance Tuning:

The VBS script should work on client computers running Windows 2000 or above.  Have fun with the script, and let me know about any improvements that you make to the script (there were a couple of neat improvements to the Automated DBMS XPLAN Viewer script).





Strange Search Terms – Are the Answers Out There?

22 08 2010

August 22, 1010

I posted the first blog articles on this blog roughly nine months ago.  In the last nine months I have published roughly 230 blog articles, which is roughly 210 more than anticipated.  What happened to just posting Oracle Database test cases to demonstrate Oracle Database behavior?  I started seeing somewhat unusual search terms appearing in WordPress’ statistics for the site, so I gradually added additional content.  The additional content that I wrote did not necessarily answer the questions found in searches, but the searches provided me with motivation, thinking that DBAs and developers really wanted to know THE answer quickly.  Is it possible to be both quick and thorough?  I guess that my answer to that question is no, otherwise I would have stopped after completing the 20th blog article.

So, what are some of the most interesting and/or strange search terms?  Here are a few – these are actual search terms (in a completely non-random order):

pig outer join
nnnnnnnnnnnnnnnnnnnnn
buffer hit% awr
low buffer hit ratio
buffer cache hit ratio 11g
cache hit percentage oracle 11g            (53 hits)
optimize non standard block cache sizes    (79 hits)
using a large database block size for or
funny examples of root cause analysis
oracle inconsistent reads
ipad metalink oracle
more locks with oracle 11g
oracle index hint not working
site:hoopercharles.wordpress.com burles
sql injection ora-00907
ora books architectural books
sql true false quiz
michigan oaktable symposium               (65 hits)

Searching to find knowledge… a much better approach than knowing the answer, while not knowing the question that is being answered.

Entering, stage right… Moans the Magnificent (Carnac the Magnificent‘s distant cousin).  Moans the Magnificent might not know the questions to the first two search answers, but I am sure that he would have no trouble with the remainder of the list.  Possible examples:

Answer: Michigan OakTable Symposium
Possible follow-up question provided by Moans the Magnificent: Name an event that not even the oracles of the world could have predicted to have more of an impact on the climate than Al Gore inventing the Internet.

Answer: SQL Injection
Possible follow-up question provided by Moans the Magnificent: What is the relationship between code monkeys, fertility clinics, and the big book of child names?

Moans the Magnificent will be the opening session at the Michigan OakTable Symposium in Ann Arbor, MI on Thursday September 16, 2010 and Friday September 17, 2010.  Registration for the event is still open, however hotel rooms are in short supply.  Moans the Magnificent will be joined by roughly twenty other OakTable Network members from around the world, providing a one-of-a-kind two day training session (this will not be an annual event) with virtually on-on-one contact with some of the world’s most respected Oracle DBAs, consultants, programmers, and there might even be a chance to mingle with a couple of the developers who wrote some of the Oracle Database features that we all use on a daily basis.

You could continue to search for the answer… or you could directly talk with the people who know the answer.





Network Monitoring Experimentations 7

19 08 2010

August 19, 2010

(Back to the Previous Post in the Series)

In the previous articles of this series we saw several cases where Wirehark and other utilities helped identify various types of network related problems, and those blog articles provided screen captures that should allow you to identify network patterns that might exist in your own enviroments.  The question likely remains: “But (as a DBA) will we ever use this information?”  A recent thread on the OTN forums suggests maybe… especially if someone has “optimized” the servers.  I contributed several responses in that thread.

In the forum thread the original poster mentions that he is experiencing performance problems when querying the database over a 1Mb/s WAN connection:

“For example, a query on a heavy records table that retrieves about 700 KB of information in 500 records takes us 3:31 minutes, which gives an average speed of downloading about 3-4 KB/s, though we have upload bandwidth in abundance to the server and more than enough download bandwith in the customer’s network. With this query specifically happens that the reception / transmission of data takes place at regular intervals, receiving a chunk of around 20 KB and waiting about 5 seconds to receive new data.”

Waiting 5 seconds to retrieve 20KB – maybe someone swapped the 1Mb/s connection for a 56Kb/s dial-up modem which should be able to download 5KB to 7KB per second.  Or, maybe there is a better explanation – network monitoring applied.  In the OTN thread Sybrand offered a very good suggestion to use a trace route (TRACERT), the output of which looked something like this:

----
 1  * * * Tiempo de espera agotado para esta solicitud.
 2  * * * Tiempo de espera agotado para esta solicitud.
 3  69 ms  69 ms  69 ms
 4  69 ms  70 ms  70 ms
 5  70 ms  72 ms  70 ms
 6  69 ms  70 ms  73 ms
 7  70 ms  70 ms  70 ms
 8  82 ms  88 ms  83 ms
 9  97 ms 102 ms  99 ms
10  98 ms  97 ms 101 ms
11  97 ms  98 ms  98 ms
12  99 ms  98 ms  98 ms
13 102 ms  98 ms  97 ms
14  98 ms  97 ms  97 ms
----

The two timeouts at the start of the output imply that there are two devices, very likely firewalls, that are in the network pathway closest to the computer that executed the command.  The times for a response from each hop on the way to the other side of the connection are between 70ms (0.070 seconds) and 100ms (0.100 seconds).  The high response times imply that the connection probably is not a T1 or similar high-cost 1Mb/s connection, and the low response times imply that the connection cannot be satellite based where 1000ms (1.000 second) to 2500ms (2.500 seconds) response times would be expected.  The original poster confirmed that the connection is a DSL connection – such a connection likely does not have a quality of service guarantee in the event that performance is very slow, or that would take top priority if 95% of the packets are dropped (possibly needing to be retransmitted multiple times).  The times reported by trace route are important – if a network packet is sent from the server to the client it will probably require between 0.035 and 0.500 seconds for the packet to reach the client, and an equal amount of time for the ACK confirmation from the client essentially stating that the packet was successfully received and that it is ready for the next set of packets.

Sybrand then offered another very good suggestion, enabling a 10046 trace (reference #2) while remotely querying the database.  A TKPROF summary of the 10046 trace file showed the following statistics:

Execution Plan Time:   0.013437 seconds
CPU Time:              0.01 seconds
Elapsed Time:        185.38 seconds
Rows:                500

Sever-side waits:              0.03 seconds
SQL*Net more data to client: 185.32 seconds, 5.13 seconds max wait, 373 waits
SQL*Net message from client:  10.50 seconds, 5.02 seconds max wait,   6 waits

If you have viewed many TKPROF summaries, or raw 10046 trace files, you would probably recognize that the array fetch size was probably set at 100, and also just how uncommon 5+ second wait are for the “SQL*Net more data to client” wait event.  Even though the Oracle Database documentation provides little information about this event, you should be able to recognize that something is very wrong with the 5.13 second maximum wait time.  In the opening quote of this blog article the OP mentioned that it required about 5 seconds to transmit 20KB over the WAN connection – could it be that one of these 5 second long “SQL*Net more data to client” waits happened during the transfer, making it appear that the average thoughput was roughly 4KB/s?

Now what, we have tried TRACERT and a TKPROF summary of a 10046 trace file.  Maybe manually analyze the raw 10046 trace file looking for patterns?  Or, we can try what I suggested in the tread – network monitoring with Wireshark.  Fortunately, the OP already had access to Wireshark, he just needed a little assistance working through the packet capture.  The OP mentioned:

“But we used Wireshark in the server side and we observed the strange behaviour I spoke earlier: several packets (all together around 20 KB) an then around 5 seconds of NOTHING, nor ACK neither dropped packets. I didn’t test the client side but I assume that’d be the same stuff.”

The important key to understanding what is happening is to determine when the 5 seconds of inactivity happened.  Typically, a computer will send two network packets to a remote device and then wait patiently for an acknowledgement (ACK) that the two packets were received before it sends the next two packets.  If an ACK is not received within 0.2 seconds, the sender will assume that the packets were lost in transit and resend the packets.  So, we need to examine when the 5 seconds of inactivity happened:

  • Immediately after the server sent two packets to the client computer.
  • Immediately after the server sent a single packet to the client computer.
  • Immediately after receiving an ACK packet from the client computer.

In the case of the first bullet point, that might tell us that something happened to the client computer or the DSL connection.  However, the server should have taken action after 0.2 seconds and attempted to send the packets again.  In the case of the second and third bullets, we probably need to look at either what is happening in the server (maybe it took 5 seconds to jump back on the CPU to fetch the next set of rows, but it seems like that would have shown in the TKPROF summary?) or the local network (maybe the switch port dropped out for 5 seconds, but that would have also affected the clients that are local to the server).

Signs of trouble:

“The network administrators from the housing service have reconfigured the interface to force 100 Mb full duplex and guess the time it takes now to the query that took 3:31 s before: 1:38 s!”

Someone has been “optimizing” the servers and/or the network.  What happens when a switch port is forced to 100Mb/s full duplex without touching the server, which by the way has a gigabit network interface?

http://www.cisco.com/en/US/tech/tk389/tk214/technologies_tech_note09186a0080094781.shtml#when:

“One of the most common causes of performance issues on 10/100 Mb Ethernet links occurs when one port on the link operates at half-duplex while the other port operates at full-duplex. This occurs when one or both ports on a link are reset and the auto-negotiation process does not result in both link partners having the same configuration. It also can occur when users reconfigure one side of a link and forget to reconfigure the other side. Both sides of a link should have auto-negotiation on, or both sides should have it off. Cisco recommends to leave auto-negotiation on for those devices compliant with 802.3u.”

http://en.wikipedia.org/wiki/Autonegotiation:

“A duplex mismatch occurs when two connected devices are configured in different duplex modes. This may happen for example if one is configured for autonegotiation while the other one has a fixed mode of operation that is full duplex (no autonegotiation). In such conditions, the autonegotiation device correctly detects the speed of operation, but is unable to correctly detect the duplex mode. As a result, it sets the correct speed but starts using the half duplex mode.”

http://docs.hp.com/en/601/autonegotiation.pdf:

“However, parallel detection cannot detect the duplex mode so the duplex mode will default to Half duplex. The resulting link configuration will be able to send and receive frames but performance will be very poor. The performance will be poor because the Full duplex MAC disables the carrier sense and collision detect circuitry. So, when i t has frames to transmit it will transmit irrespective of what the Half Duplex MAC is doing. This will cause collisions with the Full Duplex MAC not backing off.”

From one of my posts in the thread:

“Something seems to be very odd here. First, the two sides are connected by a high latency (Internet) connection with many hops, and an average ping time of about 0.1 seconds. You will probably find that the MTU size (consider this to be the maximum packet size before packet fragmentation) when jumping through the various routers is probably about 1500 bytes. If the network administrators have configured the servers and local switch to support jumbo frames (8KB packets, for instance), those jumbo frames will be split into multiple packets so that they are able to pass through the intermediate routers between the server and client computers (you might not see a problem on the server side while one might appear on the client side). You indicate that the time delta between packets is about 0.0001 seconds (0.1ms), which is significantly less than the ping times suggested by the trace route – how many of those packets appear together with a delay of 0.0001 seconds in between?

If you have a Nagle/delayed ACK problem where one side is set to send an ACK after 13 (or some other number more than 2) packets while the other side is set to send an ACK after the default of 2 packets, that might, in part, explain the 5 seconds where no packets appeared on the server side of the Wireshark capture (for reference, on a gigabit network, a 130MB file will typically transfer across the network in about 3 second. When a Nagle/delayed ACK problem is present, that same transfer requires roughly 45 minutes).

Your network administrators forced the servers to 100Mb/s and it improved performance? This very well could be just a coincidence. If the server is forced to 100Mb/s full duplex but the switch port is not also forced to 100Mb/s full duplex, in most cases the actual connection speed will fall back to the minimum possible configuration – 10Mb/s half duplex. I wonder if the network administrators previously forced the network switch port to 100Mb/s full duplex but forgot to do the same for the server? Still, at 10Mb/s half duplex, the speed (throughput) is probably faster than your WAN link, so that change should not have made much of a difference. The very low latency between the server and switch is still much lower than the latency that you reported with the trace route – and that is what is likely killing performance. I wonder if the network administrators also disabled jumbo frames when they forced the connection speed? Disabling jumbo frames might explain the sudden decrease in time.”

I mentioned falling back to 10Mb/s half duplex, while the other quotes above seem to indicate that the speed will fall back to 100Mb/s half duplex – maybe some of these switches now consider 100Mb/s half duplex as the minimum.  Even at 10Mb/s half duplex the speed should be enough to fully saturate the 1Mb/s WAN connection… until we consider latency and the ACK frequency.  If it takes an average of 0.9 seconds for the server to see the next request for the next two 1500 byte network packets, that just might slow down the connection a bit.  Someone might have the wise idea of configuring the server to send more than two packets, maybe 16, at a time before waiting for the client to send back a single ACK packet.  Sounds like a good idea, right?  In part three of this series I showed why that might not be such a good idea – especially if only one side of the connection is configured for the modified ACK frequency.  That could never happen in a production environment, right?

The OP provided a screen capture of Wireshark running on his server.  I have attempted to make the screen capture a bit more generic and have marked up the screen capture:

I provided the following comment:

“That screen capture (assuming that it was captured from the server) shows that the server sent 16 packets in rapid fire and then waited. 78ms later [indicated by the green boxes in the screen capture] (roughly the ping time that you reported based on the TRACERT output) your server received 8 ACK packets – one for every 2 packets sent by the server (the client side is not configured to ACK after every 8 or 16 packets as the server appears to be configured). You will see a somewhat similar screen capture here where I “optimized” the ACK frequency on one side of the connection with a large array fetch size.

If I am reading the Wireshark screen capture correctly, the 5 second delay [the blue boxes in the screen capture show the roughly 5 second delays] happened after the server received 8 ACK packets (ACKs for the previous 16 packets that the server sent), and before the server continued to send the next batch of packets to the client. The problem appears to be on the server side of the connection – the server is probably waiting for another packet or a timeout. The typical timeout permitted to wait for an expected packet is 0.2 seconds, much less than the 5 second delay. Is it possible that someone “optimized” some of the other network parameters on the Windows server?” 

This OP asked why it appears that the client sent back 8 ACK packets in rapid fire to the server, this is my response:

“It is an illusion that all of the ACK packets are being sent together – well, not really. The ACK packet is a little bit like a method of controlling the speed of a transfer: the sender promises not to send more than (by default) two network packets at a time before receiving confirmation that the two packets were safely received by the sender. If the sender does not receive an ACK packet typically within 0.2 seconds it assumes that the packets were lost in transmission and attempts to resend the packets. Your server is sending out 16 packets before pausing to wait for the return of the single ACK packet to acknowledge that the previous 16 packets were received. The client computer at the other end of the DSL line receives the first two packets 0.04 seconds after the server sent them and replies with an ACK for those two packets. It then immediately sees the next two packets and replies with an ACK for those two. It then immediately sees the next two packets are replies with an ACK for those two. This continues until the client receives the 15th and 16th packets (a Wireshark capture on the client should show this). The server sees the 8 ACK packets for every 2 packets sent coming in all at once (0.04 seconds after being sent by the client), when it is expecting a single ACK for the previous 16 packets. I assume that this is at least part of the cause for the 5 second pauses.”

The OP provided a screen capture of his client when it submitted the same SQL statement (likely from a different time period), and mentioned that there are a lot of checksum errors appearing on the client side:

This is my response from the thread:

“The header checksum errors are caused by TCP checksum offloading. You will typically receive that message when a better quality network card is installed in the computer – the packet checksum calculation/verification is handled by the CPU in the network card itself, rather than relying on the host computer’s CPU. I think that one of my blog articles mentions this feature. More information may be found here.

That is interesting that you periodically see 4 packets arriving with a single ACK packet sent in return – possibly some of the packets were received by the client out of order, so the client had to wait for the expected packet to be received. The typical pattern for a database sending packets seems to be one or more TCP packets with “[TCP segment of a ressembled PDU]” in the info column followed by a single TNS packet with “Response, Data (6), Data” in the info column – that TNS packet seems to marks the end of the array fetch (although I may need to perform additional analysis). Based on some of the tests that I have performed with varying the array fetch size, if you cannot track down why the ACK frequency is wrong, you may be able to work around the issue by setting the array fetch size much smaller.

In the above blog article, you can compare the “Optimized” transfer speed as the fetch array size is increased, and also compare it with the unoptimized (no one has changed the ACK frequency) transfer. You will see that as the fetch array size increases, the ACK frequency drops off – this is a setup where I modified the ACK frequency on the client, while I believe that the ACK frequency of your server was adjusted.”

The OP also stated that on the server-side of the network he tried a couple of network transfer performance tests.  He tried copying 640MB of data and saw estimated completion times of 140 to 250 minutes.  He then changed a setting on the server’s network interface and the time dropped to about 4 minutes.  In the forum thread I mentioned the following:

“Something is wrong with the above. On a 100Mb/s connection, I am able to copy a 129MB file to a laptop in about 18 seconds. That suggests a 10Mb/s connection (which you will see when the server and switch are not set to the same speed – if not auto-negotiate) would be able to transfer roughly 12.9MB in the same 18 seconds. Doing the math:

129MB in 18 seconds at 100Mb/s, 12.9MB in 18 seconds at 10Mb/s
640MB/12.9 * 18 = 893 seconds = 14.9 minutes to transfer 640MB over a 10Mb/s connection.
(or 12.9MB/18 = 0.7166MB/s, 640MB / 0.7166MB ~ 893 seconds)

Or another way, assuming that the real-world speed of a 10Mb/s is close to 7Mb/s:

640MB / 7mb/s * 8bits = 731.43 seconds = 12.2 minutes to transfer 640MB over a 10Mb/s connection.

You should not be seeing 140 or 250 minutes (unless your server is not sending ACK correctly, in which case Wireshark would show periodic 0.2 second delays) – it should be close to 15 minutes.”

—-

As I mentioned, this OTN thread is interesting, providing us an opportunity to use some of the tools mentioned in earlier blog articles to try to find a resolution to a particularily troubling problem found in the wild.  While the complete cause of the 5 second delay problem has need yet been discovered, it should be an interesting thread to follow.  My vote on the cause of the problem is a combination of delayed ACKs due to a configuration change on the server (problem not yet found in the Windows registry), out of order receipt of packets, the server becoming confused at receiving 8 ACK packets when it expected 1, and a couple of network hardware configuration issues (maybe even the use of jumbo packets that are being fragmented) such as having to conform to the 1476 byte MTU of the DSL connection when the network local to the server is configured for a larger MTU size.





Book Review: Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition

16 08 2010

August 16, 2010

Foundation Knowledge to Understand Most Any Other Oracle Database Book – Required Reading for All DBAs and Developers
http://www.amazon.com/Expert-Oracle-Database-Architecture-Programming/dp/1430229462

I pre-ordered this book in May, having previously read the two previous editions of this book (“Expert One-On-One Oracle” and “Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions”) cover to cover.  Now that I have had an opportunity to fully read this edition, I now realize that the first edition of this book subconsciously set the standards that I expected of all other Oracle Database books I have subsequently purchased.  The latest edition is now among my five favorite Oracle Database books, sharing the top five ranking with “Cost-Based Oracle Fundamentals”, “Troubleshooting Oracle Performance”, and “Optimizing Oracle Performance”.

The author of this book is a Vice President at Oracle Corporation, and is well known for his asktom.oracle.com website, articles in Oracle’s magazine, forum/Usenet postings, and presentations.  The author recently revised the Oracle Concepts Guide, which is part of the Oracle Database 11.2 documentation library.  Slightly off topic: It might be notable that a book written by a VP at Oracle Corporation is not published by Oracle Press.  Oracle Press is a marketing label used by McGraw Hill to market its line of Oracle books – the books are not formally written by Oracle Corporation nor are they formally tech edited by Oracle Corporation, although some employees of Oracle Corporation have used McGraw Hill as a publishing company.  At least for me, for several years the “Officially Authorized Oracle Press” logo on books published by McGraw Hill seemed to be an endorsement of the books by Oracle Corporation, but that is not the case, as was clarified by one of the threads on the AskTom site (Reference Reference2).

The book contents are very well organized, allowing the reader to continually build on previously introduced concepts.  Many forward and backward references in the book are provided – “see the section a1 in chapter n for additional information”.  It is surprising just how much new content is included in this edition of the book – this book reads as though it were written specifically for Oracle Database 11g R2 with exceptions in behavior mentioned for older Oracle Database releases.  Even if you have read one of the older releases of the book, I do not recommend just jumping into one of the chapters using either the table of contents or the index at the back of the book.  Start at the beginning of the book and finish at the end of the book.  The author introduces very useful bits of information in the chapters that are not specific to the chapter title, and do not appear in the table of contents or the index at the back of the book.  Just a couple of examples of useful information that will be missed (especially if you only read the first half of the book):

  • Page 421: UTL_RAW.CAST_TO_RAW and UTL_RAW.CAST_TO_VARCHAR2 functions with sample code.
  • Page 434: Using DUMP(number, 16) to see the raw bytes used to store values.
  • Page 464: Using DBMS_UTILITY.GET_CPU_TIME to determine the previously used CPU seconds for the session.
  • Page 465: Oracle Database 10g R2’s deterministic function optimization.
  • Page 470: Use of the analytic function RATIO_TO_REPORT
  • Page 472: Creating a UNIQUE function based index to enforce uniqueness of values in a column when another column contains a specific value.
  • Page 477: Oracle Forms by default updates every column including primary key columns, and that behavior could lead to deadlocks.
  • Page 478: Disabling table locks on a table will help detect unindexed foreign keys.
  • Page 487: Using DBMS_SPACE.FREEBLOCKS to determine the number of index blocks in the index freelist.
  • Page 512: Using the VSIZE function to indicate how much storage space a column value requires.
  • Page 513: Warns of a calculation error (that is not a bug) when adding the floating point numbers 0.3 and 0.1 – the result is not 0.4.
  • Page 646: Using DBMS_STATS.SET_TABLE_STATS to simulate a large table in order  to determine how data volume changes will impact the execution plan generated by the optimizer.
  • Page 656: Using the NTILE analytic function to break a rowset into equally sized groups of rows.
  • Page 687: Starting with Oracle 9i it is possible to insert into one or more tables using complex WHEN conditions.
  • Page 739: Encrypting a column in a table may cause significant changes to the execution plans that specify that column in join or filtering conditions.
  • Page 740: Index range scans are not possible with indexes created for encrypted columns.

The author uses an odd, yet very helpful teaching method throughout the book.  Break something, with the assumption that you are reading the chapter because something is already broken, and then show you how to fix the problem and ultimately avoid the problem in the future.  Unfortunately, the index at the back of the book only contains a small handful of these problem-solution cases – another reason to read the whole book and take good notes.  A list of the errors covered in the book as teachable moments follows:

  • Chapter 1: ORA-00060 (deadlock detected while waiting for resource: page 6), ORA-00942 (table or view does not exist: page 19), ORA-08177 (can’t serialize access for this transaction: pages 37, 38), ORA-02391 (exceeded simultaneous SESSIONS_PER_USER limit: page 44)
  • Chapter 2: ORA-12162 (TNS:net service name is incorrectly specified: page 53), ORA-01078 (failure in processing system parameters: page 55), ORA-16196 (database has been previously opened and closed: page 57)
  • Chapter 3: ORA-00942 (table or view does not exist: pages 70, 76) , ORA-25138 (initialization parameter has been made obsolete: page 71), ORA-01031 (insufficient privileges: pages 75, 110, 112), ORA-02096 (specified initialization parameter is not modifiable with this option: page 78), ORA-12913 (Cannot create dictionary managed tablespace: page 102), ORA-12505 (TNS:listener does not currently know of SID given in connect descriptor: page 111)
  • Chapter 4: ORA-29339 (page 155)
  • Chapter 6: ORA-00054 (pages 210, 224), ORA-00060 (page 211), ORA-00069 (page ?)
  • Chapter 7: ORA-01555 (pages 252, 254, 255), ORA-08177 (pages 252, 261)
  • Chapter 8: ORA-02290 (pages 270, 272, 274, 297), ORA-02292 (page 281), ORA-02291 (page 281), ORA-30036 (pages 288, 289)
  • Chapter 9: Checkpoint not complete (page 321), ORA-01403 (page 324), ORA-01555 (pages 323, 334-344)
  • Chapter 10: ORA-22908 (page 401), ORA-01427 (page 401)
  • Chapter 11: ORA-25954 (page 459), ORA-01450 (pages 465, 466), ORA-01743 (page 472), ORA-00001 (page 476), ORA-00069 (page 478)
  • Chapter 12: ORA-12899 (page 504), ORA-01461 (page 504), ORA-01465 (page 507), ORA-01438 (pages 510-512), ORA-00932 (page 518), ORA-01839 (pages 529, 534), ORA-30079 (page 538), ORA- 22285 (page 553)
  • Chapter 13: ORA-00376 (pages 559, 595, 611), ORA-01652 (page 562), ORA-14511 (page 562), ORA-01843 (page 567), ORA-14400 (pages 569, 575), ORA-14767 (page 576), ORA-14652 (page 576), ORA-14659 (page 585), ORA-14402 (page 585), ORA-01408 (page 598), ORA-02429 (page 600), ORA-14038 (page 600)
  • Chapter 14: ORA-12838 (page 633)
  • Chapter 15: SQLLDR-601 (page 660), ORA-22285 (page 679), SQL*Loader-144 (page 688), ORA-29913 (page 691), ORA-06512 (page 692)
  • Chapter 16: ORA-28365 (page 718), ORA-28348 (page 720)

Helpful Foundation Knowledge Found in the Book:

  • Notes at the front of the book show how to quickly install Statspack, change AUTOTRACE settings, permanently configure the SQL*Plus environment, and the requirements for the Runstats, Mystat, Show_Space, and Big_Table scripts that are used later in the book.
  • Provides a warning not to use bitmap indexes on columns just because the columns have low cardinality, for example a column named PROCESSED that has either a value of Y or N, without first considering concurrency – how will the column be used and updated.
  • Pages 17 through 21 demonstrate SQL injection when bind variables are not used in an example that most people probably would not consider as a threat – the SQL injection was accomplished by simply changing the NLS_DATE_FORMAT (Reference).
  • Page 72 warns against setting undocumented parameters: “Use undocumented parameters only at the request of Oracle support. Their use can be damaging to a database, and their implementation can – and will – change from release to release.”  This is a warning that should be included in every book before the first mention of hidden parameters.
  • The book provides a detailed list and description of the functionality provided by Oracle’s background processes.
  • Demonstrates how to build test cases to verify performance/scalability issues.
  • Pages 262-265 provide a very detailed demonstration of transactions automatically restarting.  This behavior might cause, for example, before update triggers to fire more than once, various problems with PL/SQL variables, autonomous transactions, and various PL/SQL packages (UTL_FILE, UTL_SMTP, UTL_HTTP, etc.).
  • Excellent section on the topic of index myths, one of which had its roots in Oracle Database 5.
  • Excellent six case summary of why an index may not have been used.
  • Shows a lot of examples of practical SQL and PL/SQL in test cases, which is extremely helpful for people trying to learn either SQL or PL/SQL.  The test cases frequently include TKPROF output.
  • The book includes a nice analogy of Oracle Database’s parallel execution option by comparing that feature with writing either a short or long report – coordinating the writing of a small report could take longer than simply writing the report without assistance.
  • Excellent section dedicated to answering frequently asked questions about SQL Loader.
  • The chapter showing the use of external data includes examples with SQLLOADER, external tables, flat files, and Data Pump unload (used to write out the contents of a table to allow that table’s rows to be used in another database).
  • The book almost always lists the first Oracle release version that offered a feature when that feature is introduced in the book.
  • Typically indicates when syntax introduced in the book is operating system dependent.  For example, on page 682 “ls –l” is specified for Unix/Linux and “dir /q /n” for Windows.
  • The downloadable source code library seems to be complete, making it easy for readers to reproduce the tests that are found in the book.

Not everything about the book is perfect.  The errata page for the book currently lists a problem where the author used the word “be” too many times in a sentence.  What else needs a little fine tuning, is not quite right, or is simply incorrect?  The following are my observations about the book:

  • The book frequently seems to assume that a fully optioned version of the Enterprise Edition of Oracle is used by all of the readers of the book, which may set the readers up for false hope in solving certain problems, or implementing solutions that cannot be used in production environments (the chapters discussing bitmap indexes, partitioning, and parallel execution are exceptions, where the requirements are stated):
    — Advanced Queuing on page 10
    — SQL Profiles on page 11
    — FLASHBACK on page 26
    — Fine-grained access control on page 46
    — Index key compression on page 407
    — Chapter on the topic of Data Encryption does not mention what capabilities are available for the Standard Edition, Enterprise Edition, or the Enterprise Edition with the Oracle Advanced Security Option.
  • Page xxxvi describes running the utlxplan script as the user SYSTEM.  Running this script should not be necessary starting with Oracle Database 9i (Reference).
  • Chapter 1 seems to be a little loose with terminology, specifically with the terms “database” and “Oracle”.  These chapters were probably intentionally written as such to help introduce the reader to the world of Oracle Database.  The first couple of pages of the second chapter provide a definition of “database” and “instance”.
  • Some code sections of the book have lines with unnecessarily bold fonts (page 13, 29, 33, 34, 37, 53, 54, 59, 61, 175).
  • Page 109 typing error: “Normally, on UNIX/Linux, this group will be DBA by default, and OSDBA on Windows. That group is ‘special,’ in that any user in that group can connect to Oracle ‘as SYSDBA’ without specifying a username or password.” On Windows the group name is ora_dba, not OSDBA – the same error is found in the previous edition of the book (Reference).
  • Page 131 behavior change: “A serial (nonparallel query) session will use a small percentage of the PGA_AGGREGATE_TARGET, typically about 5 percent or less.  So, if you’ve set the PGA_AGGREGATE_TARGET to 100MB, you’d expect to use no more than about 5MB per work area (e.g., the sort or hash work area). You may well have multiple work areas in your session for multiple queries, or more than one sort or hash operation in a single query, but each work area will be about 5 percent or less of the PGA_AGGREGATE_TARGET.”  It appears that this behavior changes when the OPTIMIZER_FEATURES_ENABLE parameter is set to 10.2.0.1 or greater, as evidenced by the change in the _SMM_MAX_SIZE parameter as found in a 10053 trace file or a query of X$KSPPSV and X$KSPPI (Reference Reference2).
  • Page 220 join error: The self-join of V$LOCK probably should have also joined on the TYPE column in addition to the ID1 and ID2 column.  Additionally, the BLOCK column could potentially have a value of 2, rather than 1 in a RAC environment (Reference).
  • Page 335 typing error: shows the view name V$UNDOSTATUNDOSTAT, should be V$UNDOSTAT.
  • Page 419 suggested use of a deprecated feature: Shows using the exp command with Oracle Database 11.2.0.1 to create a people.sql file containing a CREATE TABLE command for an object based table.  Page 116 states the following “Export [exp] is officially deprecated with Oracle Database 11g Release 2. It is supplied only for use with legacy database structures.”
  • Some of the autotrace generated execution plans do not conform to the DBMS_XPLAN style execution plans that are automatically generated as of Oracle Database 10g R2, even though it appears that the author was connected to Oracle Database 11g R2.  Examples include the two execution plans on page 480.
  • The SQL*Plus prompts are atypical (not “SQL>”) as displayed in the book –  while it might be helpful to show which database the SQL*Plus session is connected to, it makes it hard to read some of the code samples that span more than a single line.

The problems found in the book are absolutely minor in comparison to the value of the information provided in the book.  Yes, it is probably harsh to be judged by the standards set in the previous edition of the book.  This edition update was certainly not a five hour task, and probably was not a five month task either.  I suspect that this edition has been in development for most of the last five years, since the publication of the previous edition.  If you have read the previous editions of this book, set those books aside and start reading from the beginning of this edition – you will not be disappointed.

(Section added December 13, 2010): Related blog articles that discuss the contents of this book:





Buffer Cache Hit Ratio, Optimizer Index Cost Adj, Clustering Factor, Performance Testing – How to Break a Test Computer

14 08 2010

August 14, 2010

My replacement laptop arrived this week to fix an occasional problem that I had with the old laptop:

I thought that I would perform a test on the new laptop, so I thought that I would try a slightly modified version of the script that appeared in the “Expert Oracle Practices” book in the Buffer Cache Hit Ratio section of chapter 8.  The table creation part of the script tends to be very CPU intensive for an hour or two, so if you want to try the script, set aside plenty of time.

SET LINESIZE 150
SET PAGESIZE 2000
SET TRIMSPOOL ON
SET TIMING ON
SET ARRAYSIZE 100
SPOOL C:\BCHRTestShortAsync11Gr2.txt

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS
(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000);

COMMIT;

CREATE INDEX IND_T1 ON T1(ID);

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE, METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')

ANALYZE INDEX IND_T1 VALIDATE STRUCTURE;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH SHARED_POOL;

SET AUTOTRACE TRACEONLY EXPLAIN
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

SET ARRAYSIZE 100
SET AUTOTRACE TRACEONLY STATISTICS

SET TIMING ON

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='IND_COST_LARGE100';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='IND_COST_LARGE5';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

SPOOL OFF

The script builds a table that has 100,000,000 rows, displays the execution plan for a SQL statement that selects a small percent of the table when the OPTIMIZER_INDEX_COST_ADJ is set to 100 and again at 5, and finally excutes the SQL statements with a 10046 trace enabled.

A screen capture during the table creation with Oracle Database 11.2.0.1 (SQL*Plus Windows 7 Task Manager, Windows 7 Resource Monitor):

Note that the laptop only managed to achieve 3MB per second while creating the table – that was a fairly consistent average during the table creation that required 40 minutes and 29 seconds. 

The execution plans generated are as follows:

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  4000K|   206M|   222K  (1)| 00:44:29 |
|*  1 |  TABLE ACCESS FULL| T1   |  4000K|   206M|   222K  (1)| 00:44:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=400 AND "ID">=1)

——–

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

Execution Plan
----------------------------------------------------------
Plan hash value: 634656657
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  4000K|   206M|   202K  (1)| 00:40:25 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |  4000K|   206M|   202K  (1)| 00:40:25 |
|*  2 |   INDEX RANGE SCAN          | IND_T1 |  4067K|       |   427   (1)| 00:00:06 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=1 AND "ID"<=400)

So, the optimizer is predicting that an execution using a full table scan will complete in 44 minutes and 29 seconds, while an execution using an index range scan will complete in 40 minutes and 25 seconds.   Let’s see how the different execution plans performed.

Full Table Scan:

SQL> ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> SELECT
  2    ID,
  3    DESCRIPTION
  4  FROM
  5    T1
  6  WHERE
  7    ID BETWEEN 1 AND 400;

2547158 rows selected.

Elapsed: 00:00:14.39

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     838515  consistent gets
     813246  physical reads
          0  redo size
  140941880  bytes sent via SQL*Net to client
     280701  bytes received via SQL*Net from client
      25473  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    2547158  rows processed

—-

Index Range Scan:

SQL> ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> SELECT
  2    ID,
  3    DESCRIPTION
  4  FROM
  5    T1
  6  WHERE
  7    ID BETWEEN 1 AND 400;

2547158 rows selected.

Elapsed: 00:01:24.36

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    2573595  consistent gets
     818246  physical reads
          0  redo size
   17418653  bytes sent via SQL*Net to client
     280701  bytes received via SQL*Net from client
      25473  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    2547158  rows processed

The full table scan when selecting 2.55% of the rows in the table required 14.39 seconds while the index range scan required 1 minute 24.36 seconds.  I guess that is not a bad improvement for leaving the OPTIMIZER_INDEX_COST_ADJ parameter at the default value of 100.  But wait, the full table scan was CPU bound, not IO bound!  What if we allow the SQL statement to use more than 1 CPU?

SET ARRAYSIZE 100
SET AUTOTRACE TRACEONLY STATISTICS

SET TIMING ON

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='IND_COST_PARALLEL100';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

SELECT /*+ PARALLEL(T1, 8 ) */
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

Here is a screen capture that was captured as the full table scan ended (I had to execute the above several times so that the screen capture showed the very end of the execution):

That helped a little, the full table scan completed in 12.01 seconds and used about 24% of the CPU capacity.  Interesting that the screen capture shows that the network utilization was 207Mb/s – the laptop was connected to a wireless connection with Oracle Database 11.2.0.1 running locally.  The full table scan might be IO bound now (note that there were only two datafiles, so I supposed that we could still be CPU bound if only two parallel processes could be used).

Let’s see the storage requirements for the table and index (ASSM AUTO tablespace):

SET AUTOTRACE OFF

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T1','T1')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT       EXTENTS EXT_SIZE_KB   TOTAL_MB
---------- ---------- ----------- ----------
IND_T1             16          64          1
IND_T1             63       1,024         63
IND_T1            120       8,192        960
IND_T1             10      65,536        640
T1                 16          64          1
T1                 63       1,024         63
T1                120       8,192        960
T1                  1      60,416         59
T1                 83      65,536      5,312

It might be somewhat interesting to mention that somehow we ended up with one odd 59MB extent in the table, otherwise the extent allocation followed the typical pattern on 16 extents at 64KB, 63 extents at 1MB, 120 extents at 8MB, and then 64MB extents.

Let’s take a quick peek at a couple of the index statistics:

SELECT
  HEIGHT,
  BLOCKS,
  LF_BLKS,
  LF_ROWS,
  DISTINCT_KEYS,
  MOST_REPEATED_KEY,
  PCT_USED
FROM
  INDEX_STATS
WHERE
  NAME='IND_T1';

HEIGHT     BLOCKS    LF_BLKS     LF_ROWS DISTINCT_KEYS MOST_REPEATED_KEY   PCT_USED
------ ---------- ---------- ----------- ------------- ----------------- ----------
     3     212992     208854 100,000,000        10,000           900,324         90

-

SELECT
  INDEX_TYPE,
  BLEVEL,
  LEAF_BLOCKS,
  DISTINCT_KEYS,
  AVG_LEAF_BLOCKS_PER_KEY,
  AVG_DATA_BLOCKS_PER_KEY,
  CLUSTERING_FACTOR,
  SAMPLE_SIZE
FROM
  USER_INDEXES
WHERE
  INDEX_NAME='IND_T1';

INDEX_TYPE     BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR SAMPLE_SIZE
---------- ---------- ----------- ------------- ----------------------- ----------------------- ----------------- -----------
NORMAL              2     212,350        10,000                      21                  10,074       100,749,838     556,356

Interesting – the clustering factor is higher than the number of rows in the table – maybe we should have sampled 100% of the table and index when collecting statistics.

A quick look at the table statistics:

SELECT
  NUM_ROWS,
  BLOCKS,
  AVG_ROW_LEN,
  SAMPLE_SIZE
FROM
  USER_TABLES
WHERE
  TABLE_NAME='T1';

   NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
----------- ---------- ----------- -----------
100,000,000    817,275          54 100,000,000

It might be a good idea to take a look at the TKPROF summaries for the executions.  Just as a reminder, the normal (non-parallel, serial) table scan required 14.39 seconds, the parallel table scan required 12.01 seconds, and the index range scan required 1 minute 24.36 seconds.

TKPROF output full table scan: elapsed time 6.18 seconds (8.21 seconds less than what was reported by SQL*Plus) with 5.28 seconds of CPU time:

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25473      5.28       6.18     813246     838515          0     2547158
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25475      5.28       6.18     813246     838515          0     2547158

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 185 

Rows     Row Source Operation
-------  ---------------------------------------------------
2547158  TABLE ACCESS FULL T1 (cr=838515 pr=813246 pw=0 time=6130508 us cost=222369 size=216021546 card=4000399)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   25473        0.00          0.02
  db file sequential read                         3        0.00          0.00
  direct path read                             4080        0.00          0.91
  SQL*Net message from client                 25473        0.00          7.00
  db file scattered read                          1        0.00          0.00

TKPROF output parallel full table scan: elapsed time 3.42 seconds (8.59 seconds less than what was reported by SQL*Plus) with 1.63 seconds of CPU time:

SELECT /*+ PARALLEL(T1, 8 ) */
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.04         11         25          0           0
Fetch    25473      1.63       3.42          0          0          0     2547158
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25475      1.63       3.47         11         25          0     2547158

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 185 

Rows     Row Source Operation
-------  ---------------------------------------------------
2547158  PX COORDINATOR  (cr=25 pr=11 pw=0 time=2345038 us)
      0   PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=30853 size=216021546 card=4000399)
      0    PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us cost=30853 size=216021546 card=4000399)
      0     TABLE ACCESS FULL T1 (cr=0 pr=0 pw=0 time=0 us cost=30853 size=216021546 card=4000399)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                         3        0.00          0.00
  db file scattered read                          1        0.00          0.00
  os thread startup                               8        0.00          0.02
  PX Deq: Join ACK                                8        0.00          0.00
  PX Deq: Parse Reply                             8        0.00          0.01
  SQL*Net message to client                   25473        0.00          0.02
  PX Deq: Execute Reply                        5495        0.00          1.92
  SQL*Net message from client                 25473        0.00          7.32
  latch free                                      1        0.00          0.00
  PX qref latch                                   1        0.00          0.00
  PX Deq: Signal ACK RSG                          1        0.00          0.00
  PX Deq: Signal ACK EXT                          7        0.00          0.00
  PX Deq: Slave Session Stats                     3        0.00          0.00
  enq: PS - contention                            1        0.00          0.00

TKPROF output index range scan: elapsed time 76.85 seconds (16.51 seconds less than what was reported by SQL*Plus) with 29.28 seconds of CPU time:

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25473     29.28      76.85     818246    2573595          0     2547158
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25475     29.28      76.85     818246    2573595          0     2547158

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 185 

Rows     Row Source Operation
-------  ---------------------------------------------------
2547158  TABLE ACCESS BY INDEX ROWID T1 (cr=2573595 pr=818246 pw=0 time=31008522 us cost=202004 size=216021546 card=4000399)
2547158   INDEX RANGE SCAN IND_T1 (cr=30646 pr=5238 pw=0 time=2313805 us cost=427 size=0 card=4067270)(object id 77111)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   25473        0.00          0.03
  db file sequential read                     81689        0.10         32.96
  SQL*Net message from client                 25473        0.00          6.11
  db file parallel read                       36586        0.87         26.09

Numbers from the twilight zone?  But wait, there is more.  The numbers were so illogical that I took the database offline, rebooted, and was immediately greeted with a warning that the RAID 0 array in the new laptop failed!  Try this experiement at your own risk.

————————————-

So, what  have we learned from this blog article?  To get you started, what was the buffer cache hit ratio for each of the executions?  How can you explain the decrease in CPU time when parallel execution was used?

—-

Edit August 14, 2010: In case you are wondering, the Oracle database was running on the laptop, not a remote server.  The 11.2.0.1 database was configured with Oracle’s default parameters, with the exception of SGA_TARGET which was set to 12G and PGA_AGGREGATE_TARGET which was set to 1800M.  Oracle Database on the Windows platform supports only direct, asynchronous I/O.  Windows 7 Ultimate running on the laptop does not offer large memory page support, unlike the Server type Windows operating systems.








Follow

Get every new post delivered to your Inbox.

Join 144 other followers