June 15, 2010
A couple of days ago a thread appeared in the OTN forums asking the following question:
“As per documents In general, the addition of wait classes helps direct the DBA more quickly toward the root cause of performance problems.
How could i trace the root cause of performence problems if it is related to wait class?”
This is the answer that I provided:
I am not completely sure that I understand your question. The wait class gives you an approximate idea of where the performance problem will be found. You must then further investigate the wait events in that wait class. There are of course potential problems with starting at the wait class (some wait classes have 2 wait events, while others have many – that could throw off the search for the problem that is impacting performance the most), but at least it provides a starting point. To give you an idea of the wait events in each wait class, here is a SQL statement that was executed on Oracle Database 126.96.36.199:
SQL> DESC V$EVENT_NAME Name Null? Type ----------------------------------------- -------- ------------ EVENT# NUMBER EVENT_ID NUMBER NAME VARCHAR2(64) PARAMETER1 VARCHAR2(64) PARAMETER2 VARCHAR2(64) PARAMETER3 VARCHAR2(64) WAIT_CLASS_ID NUMBER WAIT_CLASS# NUMBER WAIT_CLASS VARCHAR2(64)
(Note that I have trimmed the output of the SQL statement to show the first wait event in each wait class, followed by a couple other wait events in the same wait class.)
SELECT SUBSTR(NAME,1,30) EVENT_NAME, SUBSTR(WAIT_CLASS,1,20) WAIT_CLASS FROM V$EVENT_NAME ORDER BY SUBSTR(WAIT_CLASS,1,20), SUBSTR(NAME,1,30); EVENT_NAME WAIT_CLASS ------------------------------ -------------------- ASM COD rollback operation com Administrative ASM mount : wait for heartbeat Administrative Backup: sbtbackup Administrative Backup: sbtbufinfo Administrative Backup: sbtclose Administrative Backup: sbtclose2 Administrative ... OLAP DML Sleep Application SQL*Net break/reset to client Application SQL*Net break/reset to dblink Application Streams capture: filter callba Application Streams: apply reader waiting Application WCR: replay lock order Application Wait for Table Lock Application enq: KO - fast object checkpoi Application enq: PW - flush prewarm buffer Application enq: RC - Result Cache: Conten Application enq: RO - contention Application enq: RO - fast object reuse Application enq: TM - contention Application enq: TX - row lock contention Application enq: UL - contention Application ASM PST query : wait for [PM][ Cluster gc assume Cluster gc block recovery request Cluster ... enq: BB - 2PC across RAC insta Commit log file sync Commit Shared IO Pool Memory Concurrency Streams apply: waiting for dep Concurrency buffer busy waits Concurrency cursor: mutex S Concurrency cursor: mutex X Concurrency cursor: pin S wait on X Concurrency ... Global transaction acquire ins Configuration Streams apply: waiting to comm Configuration checkpoint completed Configuration enq: HW - contention Configuration enq: SQ - contention Configuration enq: SS - contention Configuration enq: ST - contention Configuration enq: TX - allocate ITL entry Configuration free buffer waits Configuration ... ASM background timer Idle DIAG idle wait Idle EMON slave idle wait Idle HS message to agent Idle IORM Scheduler Slave Idle Wait Idle JOX Jit Process Sleep Idle ... ARCH wait for flow-control Network ARCH wait for net re-connect Network ARCH wait for netserver detach Network ARCH wait for netserver init 1 Network ARCH wait for netserver init 2 Network ARCH wait for netserver start Network ARCH wait on ATTACH Network ARCH wait on DETACH Network ARCH wait on SENDREQ Network LGWR wait on ATTACH Network LGWR wait on DETACH Network LGWR wait on LNS Network LGWR wait on SENDREQ Network LNS wait on ATTACH Network LNS wait on DETACH Network LNS wait on LGWR Network LNS wait on SENDREQ Network SQL*Net message from dblink Network SQL*Net message to client Network SQL*Net message to dblink Network SQL*Net more data from client Network SQL*Net more data from dblink Network ... AQ propagation connection Other ARCH wait for archivelog lock Other ARCH wait for process death 1 Other ARCH wait for process death 2 Other ARCH wait for process death 3 Other ARCH wait for process death 4 Other ARCH wait for process death 5 Other ARCH wait for process start 1 Other ... Streams AQ: enqueue blocked du Queueing Streams AQ: enqueue blocked on Queueing Streams capture: waiting for s Queueing Streams: flow control Queueing Streams: resolve low memory co Queueing resmgr:I/O prioritization Scheduler resmgr:become active Scheduler resmgr:cpu quantum Scheduler ARCH random i/o System I/O ARCH sequential i/o System I/O Archiver slave I/O System I/O DBWR slave I/O System I/O LGWR random i/o System I/O ... BFILE read User I/O DG Broker configuration file I User I/O Data file init write User I/O Datapump dump file I/O User I/O Log file init write User I/O Shared IO Pool IO Completion User I/O buffer read retry User I/O cell multiblock physical read User I/O cell single block physical rea User I/O cell smart file creation User I/O cell smart index scan User I/O cell smart table scan User I/O cell statistics gather User I/O db file parallel read User I/O db file scattered read User I/O db file sequential read User I/O db file single write User I/O ...
So, if the User I/O wait class floats to the top of the wait classes between a known start time and end time, and the Commit wait class is at the bottom of the wait classes when comparing accumulated time, it probably would not make much sense to spend time investigating the wait events in the Commit class… until you realize that there is a single event in the Commit wait class that typically contributes wait time, while there are many in the User I/O wait class.
It appears that my answer either missed the target or was ignored by the original poster. It appeared that the OP was attempting to immediately jump from step 1 to the last step, without considering any of the intermediate steps.
How would you answer the above question? Would you read the question differently and start writing about Statspack, AWR, ADDM, 10046 traces, or process state dumps? Would you lead the OP step by step through the process? Or would you take an entirely different approach?