September 27, 2010
It is far too easy to invest many hours into building a test case to demonstrate or explore a simple concept, or to model a particular problem that is being encountered. I recently had the opportunity to attend a session by Jonathan Lewis titled “The Beginner’s Guide to Becoming an Expert”, a session that emphasized the use of test scripts to investigate various types of unexpected behavior when that behavior is found. Using examples from Jonathan’s session, the unexpected behavior might include noticing a reduction in the redo generated when an update statement is driven through an index access path. Another example from the session involved the significant amount of redo generated when a bitmap index is updated, compared to the amount generated when a b*tree index on the same column is updated, and the discovery that the amount of redo generated for the bitmap index drops significantly when the Oracle Database is upgraded from 9.2 to 11.1. Creating a test script that models a particular situation allows the problem to be investigated in isolation, and then repeatedly analyzed as one variable is changed at a time; the change might include modifying an initialization parameter, submitting different initial bind variables, executing the script against a different Oracle Database release version, or any number of other small-scale changes.
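A redo comparison of that type may be performed by capturing the session-level “redo size” statistic before and after the statement of interest. A minimal sketch, assuming SELECT access to V$MYSTAT and V$STATNAME, and a hypothetical test table T1 with column C1 (the table and column names are invented for illustration):

```sql
-- Capture the session's redo generation before the statement:
SELECT
  MS.VALUE REDO_BEFORE
FROM
  V$MYSTAT MS,
  V$STATNAME SN
WHERE
  MS.STATISTIC# = SN.STATISTIC#
  AND SN.NAME = 'redo size';

-- The statement of interest (hypothetical update):
UPDATE T1 SET C1 = 'NEW VALUE' WHERE ID <= 100;
COMMIT;

-- Capture the session's redo generation after the statement:
SELECT
  MS.VALUE REDO_AFTER
FROM
  V$MYSTAT MS,
  V$STATNAME SN
WHERE
  MS.STATISTIC# = SN.STATISTIC#
  AND SN.NAME = 'redo size';
```

Subtracting the first value from the second yields the redo generated by the update and commit, allowing direct comparison between, for example, a bitmap index and a b*tree index on the same column.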
One of the key concepts in the creation of test cases is to compare the intended end result of the test case with the starting point, and then identify logical transformation stages between the two end points. At each transformation stage, confirm that the preceding step logically leads to the next step, and then determine if there is any chance that a different result might be obtained given the same input. Work backward from the last transformation stage to the first, and ask yourself if there is any other possible input that might have caused the outcome. It takes practice, a fair amount of research, and sometimes trial and error to discover the multiple potential cause-and-effect relationships.
I recently used this technique to build a test case for an OTN thread, to discover why an index was not used to retrieve three rows from an 8 million row table. The “given” information was that the index was used if an always-false comparison of two literal VARCHAR values was removed from an OR condition in the WHERE clause. The first step in building a test case, assuming that the production environment where the problem was found is not available for use, involves building a model of the source data. If you are familiar with the real data that causes the unexpected behavior, build a model that simulates the real data; otherwise, assume that the rows to be returned are scattered among the other rows in the table. In my test case, to simplify the calculations a bit, the number of rows inserted into the test table was increased to 10 million. Before building the data model, determine what might cause a full table scan to be performed rather than an index access path: a high clustering factor for the index, poor cardinality estimates (possibly caused by out-of-date object statistics), incorrectly gathered system statistics, poorly specified optimizer parameters, a sub-optimal index definition (one that might require an index skip scan, or return a large number of rows that are later eliminated at the table level), optimizer bugs, the possibility that the full table scan really is faster than an index access path, and so on.
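A data model along those lines might be sketched as follows. The table and column names are invented for illustration, and MOD is used so that the three rows of interest are scattered evenly through the 10 million rows rather than clustered together:

```sql
-- Hypothetical model: 10 million rows, three 'RARE' rows scattered
-- through the table (rows 3333333, 6666666, 9999999).  On some
-- releases, a Cartesian join of two smaller row generators may be
-- needed in place of the single CONNECT BY to limit memory use.
CREATE TABLE T1 AS
SELECT
  ROWNUM ID,
  DECODE(MOD(ROWNUM, 3333333), 0, 'RARE', 'COMMON' || MOD(ROWNUM, 100)) C1,
  RPAD('A', 200, 'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL <= 10000000;

CREATE INDEX IND_T1_C1 ON T1(C1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1', CASCADE=>TRUE)
```

The PADDING column widens the rows so that the table occupies a realistic number of blocks, which affects the calculated cost of a full table scan.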
Now that we have identified some of the causes for a full table scan, our test case may be used to experiment with “what if” and “how would I know if that was the cause?” To determine whether the clustering factor is the problem, I might first need to know how the optimizer uses the clustering factor, and whether I am able to artificially adjust it to see if a change in the value makes a difference in performance. To determine whether the cardinality estimates are a problem, I might use DBMS_XPLAN.DISPLAY_CURSOR to retrieve the estimated cardinality and the actual number of rows returned by every operation in the execution plan. I might remove the comparison of the two literal VARCHAR values to see how that change affects the cardinality estimates, or see if changing one of the literal values causes a change; one of those changes just might also allow an index access path to be used. I might try a hint to force the use of the index (hints are, after all, directives and not suggestions), and if that does not work I might analyze why the hint was invalid. If the hint worked, after verifying that a performance improvement was achieved, I would likely consult a 10053 trace file to determine why the index access path was not automatically used: was the cost too high, or was the index access path cost not even calculated?
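Those diagnostic steps might be sketched as follows, again against the hypothetical T1 table and IND_T1_C1 index (the literal comparison 'A' = 'B' stands in for the always-false condition from the OTN thread):

```sql
-- Compare estimated (E-Rows) and actual (A-Rows) cardinalities for
-- each operation in the execution plan:
SELECT /*+ GATHER_PLAN_STATISTICS */
  *
FROM
  T1
WHERE
  C1 = 'RARE'
  OR 'A' = 'B';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

-- Attempt to force the index access path with a hint:
SELECT /*+ INDEX(T1 IND_T1_C1) */
  *
FROM
  T1
WHERE
  C1 = 'RARE'
  OR 'A' = 'B';

-- Capture the optimizer's costing decisions in a trace file before
-- re-parsing the SQL statement of interest:
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
```

The 10053 trace file is written to the session's trace directory, and shows whether the index access path was costed at all, and if so, why its calculated cost exceeded that of the full table scan.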
Once the cause of the full table scan is determined, additional what-if questions should be considered. If the index hint worked, am I able to permanently add that hint to the SQL statement, and are there any cases where adding that hint might lead to other performance problems later? If the cardinality estimates are incorrect, is there anything that I am able to do to fix them? If the optimizer parameters are poorly specified, am I able to adjust those parameters, and if I do, what might be the impact on the other SQL statements that depend on the current initialization parameter values for optimal performance and costing calculations? Are we able to modify the application that submitted the SQL statement to remove the unnecessary literal comparison?
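Session-level changes permit that sort of experimentation without affecting other sessions. For instance (the parameter values below are arbitrary examples for a test system, not recommendations):

```sql
-- Temporarily adjust optimizer parameters for the current session only:
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 100;
ALTER SESSION SET OPTIMIZER_INDEX_CACHING = 0;

-- Artificially adjust the index's clustering factor to observe its
-- effect on the costing calculations (test systems only; the value
-- will be overwritten by the next statistics collection):
EXEC DBMS_STATS.SET_INDEX_STATS(OWNNAME=>USER, INDNAME=>'IND_T1_C1', CLSTFCT=>10000, NO_INVALIDATE=>FALSE)
```

Re-executing the test query after each single change, and comparing the resulting execution plans, isolates which factor actually drives the optimizer's decision.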
Once the desired outcome is reached, work backward through the steps and determine where the approach might go wrong. Is it reasonable to conclude that the previous step always results in the next step? In short, test cases are not terribly difficult to construct once you learn what might cause one transformation step to lead to the wrong next step.
Oracle Database behavior changes from one release version to another, and sometimes from one patch level to the next. Save the test case scripts once generated, because the behavior demonstrated by a test script might only be true for a small number of Oracle Database releases. For example, my test case script generates a deadlock on Oracle Database 11.1 and 11.2, while the same script simply produces an expected enqueue wait on Oracle Database 10.2; that script is just one example of the usefulness of test cases to demonstrate Oracle behavior.
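As a generic illustration only (not the release-dependent script mentioned above, which presumably involves a more subtle locking scenario), a classic two-session deadlock may be scripted by executing the statements in the indicated order across two sessions:

```sql
-- Session 1:
UPDATE T1 SET C1 = 'X' WHERE ID = 1;

-- Session 2:
UPDATE T1 SET C1 = 'Y' WHERE ID = 2;

-- Session 1 (blocks, waiting on session 2's row lock):
UPDATE T1 SET C1 = 'X' WHERE ID = 2;

-- Session 2 (closes the loop; within a few seconds one of the two
-- sessions receives ORA-00060: deadlock detected):
UPDATE T1 SET C1 = 'Y' WHERE ID = 1;
```

Keeping such a script on hand makes it a one-minute task to confirm whether a given release, patch level, or parameter change alters the locking behavior.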
Do we need a test case to determine why the OTN forums, if working at all, are so slow this morning? I believe that it required more than two minutes just to display the forum home – that was roughly an hour after I saw the message stating that the OTN forums were down (someone must have tripped over the power cord again).
Any other ideas for easily creating useful test cases?