June 16, 2010
While reading the June 2010 printing of the “Oracle Performance Firefighting” book I found an interesting paragraph that left me wondering if something is missing. This quote is from page 61 (this is a long quote, required to not lose the context of the discussion topic - please excuse any typos):
“As another example, suppose that you receive a call about long commit times. The users are saying ‘submits’ or ‘saves’ are taking longer today than yesterday. Upon an examination of the average log file sync wait times (which indicate commit times from an Oracle perspective), you discover there is no change from the previous day. Therefore, you know the performance issue is not because of Oracle’s commit mechanism and most likely not related to the database.” [This period marks the end of Part 3, "IO Wait Time Summary with Event Details"]
What, if anything, is wrong with the above quote from the book?
The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.

Averages hide outliers. This means that if there are long taking commits, but about the same amount of commits which take much shorter time, the average can be the same (of a previous day with satisfactory commit times). It’s like the analogy of one foot in a bath with boiling water, and the other one in a bath with ice. The average temperature of the two is convenient, so it must be convenient.
The text also seems to steer away from measuring, whilst that will be the only way to determine if something is the matter.
The author may be correct in that it may not be a problem with the “commit mechanism.” What would stop this from being a problem with any other Oracle mechanism? Maybe the execution plan changed between the time when the system was performing acceptably and when it wasn’t. I would suspect that there are a number of possible causes that would ultimately be perceived in the same way by the users.
This particular example is a perfect reason why end-to-end application tracing is necessary. How do we even know that the problem is database related or not? Does the time spent in the database (“DB Time”) account for a significant portion of the overall time spent? If so, more investigation by the DBA team is necessary.
Great comments so far.
A parallel question, does:
User reported “Save” = COMMIT = log file sync wait?
I think that Centinul’s response is on the verge of answering the above parallel question.
How likely is it the save button on the Application is just performing a commit? most likely its taking the changes the user made inserting/updating/deleting rows then commiting all tied to a single button press the commit here may be the smallest part of the operation itself.
Apart from a techical discussion around the commit mechanism I would be very worried about any statement that start off “You know ” and ends with “… not related to the database”.
I would also want to understand the metrics and evidence behind ‘long commit times’. Is this a user perception or is committ taking longer than yesterday.
Presumably you have given something away with the title of the blog being ‘Log file sync’ and the evience of problems would this a leading wait event.
John
I agree with Chris_c. The nebulous terms “submit” and “save” may in fact relate to more database work than one would presume in the servicing of EXEC SQL COMMIT;
“Oracle’s commit mechanism” is a bit of a nickname nickname unless that has been defined elsewhere in the book. The schematic of a COMMIT is pretty complex and includes a lot of IPC and latching.
We are probably splitting hairs on this one.
I thought that I would share the thoughts that I had about the quoted paragraph at the time I read it in the book. The comments so far in this blog article are very much on target, and in many cases are either more detailed or better thought through than what immediately crossed my mind when first seeing the paragraph.
The first thought was that the author was tired at the point when this paragraph was written and was looking forward to moving on to the next section, with the intention of looping back to this section of the book to fill in the missing details (I used this approach when working on the “Expert Oracle Practices” book). As it stands, the paragraph reads almost as if it is a DBA washing his hands of a problem and pushing the problem back into the corner of the network administrator or some other person in the company – because the DBA is claiming that because the COMMITs are fast it cannot possibly be a problem with the database server.
This brings me to the parallel question that I posed: is a user reported save (or submit) equal to an application COMMIT, and is a COMMIT the only thing that causes a ‘log file sync’ wait? A quick test to see if COMMITs are the only cause of ‘log file sync’ waits. The test script, after connecting to the database, drops an existing test table, creates a new test table, inserts 100 rows into the test table, and then issues a ROLLBACK:
The test shows that the implicit COMMIT for DDL, when dropping the table and creating the table, caused a ‘log file sync’ wait (impressive, but you knew that). The ROLLBACK also caused the log file sync wait (I wonder if the author considered this?).
—
Is a user reported ‘Save’ the same thing as a COMMIT? Two things crossed my mind when I thought about that:
1. I have written several programs that interact with Oracle databases, and the programs often have something similar to a ‘Save’ option that a user might select. What frequently happens when the Save command is executed? A transaction in the database is explicitly started (the programming environment would otherwise perform an implicit COMMIT after all DDL commands); data to be ‘Saved’ is validated which might involve selecting from various database tables or performing lookups in text files; rows in tables are deleted, inserted, and/or updated; if no errors/exceptions are reported in previous operations a COMMIT is executed otherwise a ROLLBACK is executed. The COMMIT is a very small portion of the total ‘Save’ operation.
2. I recalled reading in Tom Kyte’s “Expert Oracle Database Architecture” book something about the actual COMMIT operation being very fast compared to all of the steps leading up to the execution of the COMMIT. I think that it was later 2005 or early 2006 when I read that book, so I probably forgot several of the details.
I have pulled my second copy of the “Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions” off the bookshelf (which at the moment looks a lot like a coffee table). From pages 296 and 297:
Much more may be found in Tom’s book related to why the actual COMMIT time is the wrong thing to measure.
—
Finally, averages hide details. What if the user is not doing exactly what she was doing yesterday? Maybe her session performed a greater percentage of ROLLBACK operations yesterday than COMMIT operations. (Pages 298-302 of Tom’s book describe why that might matter.)
I’m not following your #1 about “A transaction in the database is explicitly started (the programming environment would otherwise perform an implicit COMMIT after all DDL commands).” You’re saying DDL doesn’t do a commit before and after the DDL, regardless?
Joel,
Thank you for catching that typo – DML (data manipulation language) was intended, not DDL (data definition language). Unless a transaction is explicitly started, some programming environments will perform an implicit commit after every DML statement. If someone is performing DDL in a typical ‘Save’ operation something is probably very wrong.