Find Objects with the Greatest Wait Time – What is Wrong with this Quote?

28 06 2011

June 28, 2011

While reading the alpha edition of the “Oracle Database 11g Performance Tuning Recipes” book, I noticed a couple of interesting SQL statements in Recipe 5-17 that just did not look right.  One of those SQL statements, slightly reformatted, appears below:

SELECT
  A.CURRENT_OBJ#,
  O.OBJECT_NAME,
  O.OBJECT_TYPE,
  A.EVENT,
  SUM(A.WAIT_TIME + A.TIME_WAITED) TOTAL_WAIT_TIME
FROM
  V$ACTIVE_SESSION_HISTORY A,
  DBA_OBJECTS D
WHERE
  A.SAMPLE_TIME BETWEEN SYSDATE – 30/2880 AND SYSDATE
  AND A.CURRENT_OBJ# = D.OBJECT_ID
GROUP BY
  A.CURRENT_OBJ#,
  D.OBJECT_NAME,
  D.OBJECT_TYPE,
  A.EVENT
ORDER BY
  TOTAL_WAIT_TIME; 

The above SQL statement is supposed to retrieve a list of object names that experienced the most wait time in the last 15 minutes.  What is wrong with the query?  Consider that a slight variation of the above query is found in the following locations:

So, what is wrong with the query?  OK, the table aliases are inconsistent, but that is too easy.  Let’s refer to the Oracle Database 11.2 documentation for the V$ACTIVE_SESSION_HISTORY view.

“V$ACTIVE_SESSION_HISTORY displays sampled session activity in the database. It contains snapshots of active database sessions taken once a second. A database session is considered active if it was on the CPU or was waiting for an event that didn’t belong to the Idle wait class. Refer to the V$EVENT_NAME view for more information on wait classes.

This view contains one row for each active session per sample and returns the latest session sample rows first. A majority of the columns describing the session in the active session history are present in the V$SESSION view.”

Still not seeing the problem?  If we believe the documentation, assume that we are only interested in a 10 second time interval, and one of the sessions started waiting on a wait event at the start of the time interval, and was still waiting in the same wait event at the end of the 10 second time interval.  SUM(A.WAIT_TIME + A.TIME_WAITED) for the session (assuming that rounding does not happen due to view update frequency) would show 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 seconds of total wait time for the result of the SUM function.  Not too bad, right?  Now, assume that the session had already waited 240 seconds and continued to wait for an additional 10 seconds – the SUM(A.WAIT_TIME + A.TIME_WAITED) for the session would show that the session waited 240 + 241 + 242 + 243 + 244 + 245 + 246 + 247 + 248 + 249 + 250 = 2695 seconds (44.92 minutes) in that 10 second time interval.

A fun little test that will require two sessions.  In Session 1:

CREATE TABLE T1 (
  C1 NUMBER,
  PRIMARY KEY(C1));

INSERT INTO
  T1
VALUES (
  1); 

Now, in Session 1, determine the SID:

COLUMN SID FORMAT 999

SELECT
  SID
FROM
  V$MYSTAT
WHERE
  ROWNUM=1;

 SID
----
 189 

In Session 2, determine the SID:

COLUMN SID FORMAT 999

SELECT
  SID
FROM
  V$MYSTAT
WHERE
  ROWNUM=1;

 SID
----
   4 

Back in Session 1, insert a row into the table:

INSERT INTO
  T1
VALUES (
  1); 

In Session 2, attempt to insert the same row into the table (a primary key violation if the first session commits):

INSERT INTO
  T1
VALUES (
  1); 

Session 2 is now hung.  Back in Session 1, wait 20 minutes and check AWR ASH using a slightly fixed version of the query (mainly the table aliases are corrected, and adjusted to examine only the last 10 seconds of entries in V$ACTIVE_SESSION_HISTORY for the two sessions of interest):

SELECT
  A.CURRENT_OBJ#,
  D.OBJECT_NAME,
  D.OBJECT_TYPE,
  A.EVENT,
  SUM(A.WAIT_TIME + A.TIME_WAITED) TOTAL_WAIT_TIME
FROM
  V$ACTIVE_SESSION_HISTORY A,
  DBA_OBJECTS D
WHERE
  A.SAMPLE_TIME BETWEEN SYSDATE – 10/(24*60*60) AND SYSDATE
  AND A.CURRENT_OBJ# = D.OBJECT_ID
  AND A.SESSION_ID IN (4,189)
GROUP BY
  A.CURRENT_OBJ#,
  D.OBJECT_NAME,
  D.OBJECT_TYPE,
  A.EVENT
ORDER BY
  TOTAL_WAIT_TIME; 

no rows selected

That can’t be the expected result.  Let’s try running the following query in Microsoft Excel:

SELECT
  SESSION_ID,
  EVENT,
  CURRENT_OBJ#,
  SAMPLE_TIME,
  WAIT_TIME,
  TIME_WAITED
FROM
  V$ACTIVE_SESSION_HISTORY
WHERE
  SESSION_ID IN (4,189)
ORDER BY
  SESSION_ID,
  SAMPLE_TIME; 

After scrolling down a couple of thousand lines in the spreadsheet:

Well, that throws off the results of the query found in the book a bit.  The CURRENT_OBJ# is -1 and the session has not waited any time (SUM(A.WAIT_TIME + A.TIME_WAITED) = 0 for SID 4, but would be 116,972,415 for SID 189 for the same CURRENT_OBJ# = -1). 

Maybe the query found in the book just does not work for potential primary key violations?  What about a case where one session updates a row and another session attempts to update the same row?  Issue a COMMIT in Session 1 and Session 2.  In Session 1:

COMMIT;

UPDATE
  T1
SET
  C1=C1; 

In Session 2, attempt to update the same row:

UPDATE
  T1
SET
  C1=C1; 

Session 2 is hung while waiting for Session 1 to COMMIT or ROLLBACK.  Now wait for a while.  While you are waiting, let’s take a quick look at the documentation for the V$ACTIVE_SESSION_HISTORY view from the 11.2 Oracle Documentation library:

“TIME_WAITED: If SESSION_STATE = WAITING, then the time that the session actually spent waiting for that event (in hundredths of a second). This column is set for waits that were in progress at the time the sample was taken.

If a wait event lasted for more than a second and was caught waiting in more than one session sample row, then the actual time spent waiting for that wait event will be populated in the last of those session sample rows.”

OK, must check the SESSION_STATE column, only the last of those sample rows in a string of rows for the same wait event, and the time is in 1/100th (0.01) of a second.  Let’s re-run the query in Excel to see the progress:

That 2,471,728,088 number in the TIME_WAITED column appears to be in conflict with the description in the Oracle Documentation library – the time appears to be reported in 1/1,000,000 (0.000001) of a second, not 1/100 of a second.

Back to the original topic of this article.  I have yet to determine why the query subtracts 30/2880 from the SYSDATE to retrieve the activity in the last 15 minutes – wouldn’t it be more clear to subtract 15 / (24 * 60) or 15 / 1440?  Considering that the SAMPLE_TIME column in V$ACTIVE_SESSION_HISTORY is a TIMESTAMP datatype, would it make more sense to use SYSTIMESTAMP rather than SYSDATE?

Finally, What is Wrong with the Quote Found in the Book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.