June 28, 2011
While reading the alpha edition of the “Oracle Database 11g Performance Tuning Recipes” book, I noticed a couple of interesting SQL statements in Recipe 5-17 that just did not look right. One of those SQL statements, slightly reformatted, appears below:
SELECT A.CURRENT_OBJ#, O.OBJECT_NAME, O.OBJECT_TYPE, A.EVENT, SUM(A.WAIT_TIME + A.TIME_WAITED) TOTAL_WAIT_TIME FROM V$ACTIVE_SESSION_HISTORY A, DBA_OBJECTS D WHERE A.SAMPLE_TIME BETWEEN SYSDATE – 30/2880 AND SYSDATE AND A.CURRENT_OBJ# = D.OBJECT_ID GROUP BY A.CURRENT_OBJ#, D.OBJECT_NAME, D.OBJECT_TYPE, A.EVENT ORDER BY TOTAL_WAIT_TIME;
The above SQL statement is supposed to retrieve a list of object names that experienced the most wait time in the last 15 minutes. What is wrong with the query? Consider that a slight variation of the above query is found in the following locations:
- Page 1,170 of the book “Expert Oracle Database 11g Administration“
- An OTN thread where the original poster claimed that the query required too much time to execute.
- An undated news article.
- Page 213 of the book “Oracle Tuning: The Definitive Reference, Second Edition“
- Page 160 of the book “Oracle Tuning Power Scripts: With 100+ High Performance SQL Scripts”
So, what is wrong with the query? OK, the table aliases are inconsistent, but that is too easy. Let’s refer to the Oracle Database 11.2 documentation for the V$ACTIVE_SESSION_HISTORY view.
“V$ACTIVE_SESSION_HISTORY displays sampled session activity in the database. It contains snapshots of active database sessions taken once a second. A database session is considered active if it was on the CPU or was waiting for an event that didn’t belong to the Idle wait class. Refer to the V$EVENT_NAME view for more information on wait classes.
This view contains one row for each active session per sample and returns the latest session sample rows first. A majority of the columns describing the session in the active session history are present in the V$SESSION view.”
Still not seeing the problem? If we believe the documentation, assume that we are only interested in a 10 second time interval, and one of the sessions started waiting on a wait event at the start of the time interval, and was still waiting in the same wait event at the end of the 10 second time interval. SUM(A.WAIT_TIME + A.TIME_WAITED) for the session (assuming that rounding does not happen due to view update frequency) would show 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 seconds of total wait time for the result of the SUM function. Not too bad, right? Now, assume that the session had already waited 240 seconds and continued to wait for an additional 10 seconds – the SUM(A.WAIT_TIME + A.TIME_WAITED) for the session would show that the session waited 240 + 241 + 242 + 243 + 244 + 245 + 246 + 247 + 248 + 249 + 250 = 2695 seconds (44.92 minutes) in that 10 second time interval.
A fun little test that will require two sessions. In Session 1:
CREATE TABLE T1 ( C1 NUMBER, PRIMARY KEY(C1)); INSERT INTO T1 VALUES ( 1);
Now, in Session 1, determine the SID:
COLUMN SID FORMAT 999 SELECT SID FROM V$MYSTAT WHERE ROWNUM=1; SID ---- 189
In Session 2, determine the SID:
COLUMN SID FORMAT 999 SELECT SID FROM V$MYSTAT WHERE ROWNUM=1; SID ---- 4
Back in Session 1, insert a row into the table:
INSERT INTO T1 VALUES ( 1);
In Session 2, attempt to insert the same row into the table (a primary key violation if the first session commits):
INSERT INTO T1 VALUES ( 1);
Session 2 is now hung. Back in Session 1, wait 20 minutes and check
AWR ASH using a slightly fixed version of the query (mainly the table aliases are corrected, and adjusted to examine only the last 10 seconds of entries in V$ACTIVE_SESSION_HISTORY for the two sessions of interest):
SELECT A.CURRENT_OBJ#, D.OBJECT_NAME, D.OBJECT_TYPE, A.EVENT, SUM(A.WAIT_TIME + A.TIME_WAITED) TOTAL_WAIT_TIME FROM V$ACTIVE_SESSION_HISTORY A, DBA_OBJECTS D WHERE A.SAMPLE_TIME BETWEEN SYSDATE – 10/(24*60*60) AND SYSDATE AND A.CURRENT_OBJ# = D.OBJECT_ID AND A.SESSION_ID IN (4,189) GROUP BY A.CURRENT_OBJ#, D.OBJECT_NAME, D.OBJECT_TYPE, A.EVENT ORDER BY TOTAL_WAIT_TIME; no rows selected
That can’t be the expected result. Let’s try running the following query in Microsoft Excel:
SELECT SESSION_ID, EVENT, CURRENT_OBJ#, SAMPLE_TIME, WAIT_TIME, TIME_WAITED FROM V$ACTIVE_SESSION_HISTORY WHERE SESSION_ID IN (4,189) ORDER BY SESSION_ID, SAMPLE_TIME;
After scrolling down a couple of thousand lines in the spreadsheet:
Well, that throws off the results of the query found in the book a bit. The CURRENT_OBJ# is -1 and the session has not waited any time (SUM(A.WAIT_TIME + A.TIME_WAITED) = 0 for SID 4, but would be 116,972,415 for SID 189 for the same CURRENT_OBJ# = -1).
Maybe the query found in the book just does not work for potential primary key violations? What about a case where one session updates a row and another session attempts to update the same row? Issue a COMMIT in Session 1 and Session 2. In Session 1:
COMMIT; UPDATE T1 SET C1=C1;
In Session 2, attempt to update the same row:
UPDATE T1 SET C1=C1;
Session 2 is hung while waiting for Session 1 to COMMIT or ROLLBACK. Now wait for a while. While you are waiting, let’s take a quick look at the documentation for the V$ACTIVE_SESSION_HISTORY view from the 11.2 Oracle Documentation library:
“TIME_WAITED: If SESSION_STATE = WAITING, then the time that the session actually spent waiting for that event (in hundredths of a second). This column is set for waits that were in progress at the time the sample was taken.
If a wait event lasted for more than a second and was caught waiting in more than one session sample row, then the actual time spent waiting for that wait event will be populated in the last of those session sample rows.”
OK, must check the SESSION_STATE column, only the last of those sample rows in a string of rows for the same wait event, and the time is in 1/100th (0.01) of a second. Let’s re-run the query in Excel to see the progress:
That 2,471,728,088 number in the TIME_WAITED column appears to be in conflict with the description in the Oracle Documentation library – the time appears to be reported in 1/1,000,000 (0.000001) of a second, not 1/100 of a second.
Back to the original topic of this article. I have yet to determine why the query subtracts 30/2880 from the SYSDATE to retrieve the activity in the last 15 minutes – wouldn’t it be more clear to subtract 15 / (24 * 60) or 15 / 1440? Considering that the SAMPLE_TIME column in V$ACTIVE_SESSION_HISTORY is a TIMESTAMP datatype, would it make more sense to use SYSTIMESTAMP rather than SYSDATE?
Finally, What is Wrong with the Quote Found in the Book?
The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.