December 6, 2009
Over the years I have participated in several discussions about enqueue waits. Below are some of those discussions.
http://forums.oracle.com/forums/thread.jspa?threadID=638246
ENQ: TX – ROW LOCK CONTENTION – this is a row level lock caused by a possible primary key or unique index violation if the first session commits its current transaction. A second session attempting to insert the value that will result in the primary key violation will sit in this wait event until the first session commits or rolls back.
Problems with initrans would show up as ENQ: TX – ALLOCATE ITL ENTRY waits in 10g/11g. But, such problems should be rare with tablespaces using ASSM.
After fixing the formatting of the top 5 wait events (total duration unknown):
EVENT TOTAL_WAITS TIME_WAITED AVG_MS PERCENT ---------------------------------------------------------------------- CPU 94,623.39 48 enq: TX - row lock contention 12,531 36,607.28 2921.34 18 control file parallel write 1,300,731 30,880.79 23.74 16 log file parallel write 1,510,503 12,640.80 8.37 6 log file sync 1,072,553 9,680.07 9.03 5
12,531 * 3 second time out = 37,593 seconds = 10.44 hours.
What if the reason for the 3 second average wait time is due to a timeout. I performed a little experiment… I changed a row in a test table and then made a pot of coffee.
In session 1:
CREATE TABLE T1 ( C1 NUMBER(10), C2 NUMBER(10), PRIMARY KEY (C1)); INSERT INTO T1 SELECT ROWNUM, ROWNUM*10 FROM DUAL CONNECT BY LEVEL<=1000000; COMMIT;
I now have a test table with 1,000,000 rows. I start monitoring the changes in the wait events roughly every 60 seconds, and V$SESSION_WAIT and V$LOCK roughly 4 times per second.
Back in session 1:
UPDATE T1 SET C1=-C1 WHERE C1<=100;
I have now modified the first 100 rows that were inserted into the table, time to make the pot of coffee.
In session 2, I try to insert a row with a primary key value of -10:
INSERT INTO T1 VALUES ( -10, 10);
Session 2 hangs.
If I take the third 60 second snap of the system wide wait events as the zero point, and the 11th snap as the end point. There were 149 waits on ENQ: TX – ROW LOCK CONTENTION, 148 time outs, 446.62 seconds of total time in the wait event, with an average wait time of 2.997450 seconds.
Rolling down to the session level wait events, SID 208 (my session 2) had 149 waits on ENQ: TX – ROW LOCK CONTENTION, for a total time of 446.61 seconds with an average wait time of 2.997383 seconds. All of the 149 waits and the wait time was in this one session that was locked up for the full duration of this time period because session 1 was making a pot of coffee.
Rolling down to V$SESSION_WAIT (sampled roughly 4 times per second): At the start of the third time interval, SID 208 has been in the ENQ: TX – ROW LOCK CONTENTION wait event for 39 seconds and is actively waiting trying to execute SQL with a hash value of 1001532423, the wait object is -1, wait file is 0, wait block is 0, wait row is 0, P1 is 1415053316, P2 is 196646, P3 is 4754.
At the end of the 11th time interval: , SID 208 has been in the ENQ: TX – ROW LOCK CONTENTION wait event for 483 seconds and is actively waiting trying to execute SQL with a hash value of 1001532423, the wait object is -1, wait file is 0, wait block is 0, wait row is 0, P1 is 1415053316, P2 is 196646, P3 is 4754.
Rolling down to V$LOCK (sampled roughly 4 times per second): I see that SID 214 (session 1) is blocking SID 208 (session 2). SID 214 has a TX lock in mode 6 with ID1 of 196646 and ID2 of 4754. SID 208 is requesting a TX lock in mode 4 with ID1 of 196646 and ID2 of 4754.
So, it seems that I need a faster coffee pot rather than an additional index on my table. It could be that the above process would have found that the application associated with SID 214 was abandoned or crashed and for some reason the lock was not released for a long period of time, a little less than 10.44 hours in your case.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
http://forums.oracle.com/forums/thread.jspa?threadID=860488
TYPE=TX, LMODE=6, REQUEST=6 = session 2 is attempting to update a row that is locked by session 1’s transaction (likely session 1 already updated the row)
TYPE=TX, LMODE=6, REQUEST=4 = session 2 is attempting to insert a row with the same primary key value or unique index value as a row just inserted by session 1
TYPE=TX, LMODE=6, REQUEST=4 = ITL Problem or bitmap index fragment problem
RBS Seg # = trunc(p2/65536) = v$transaction.xidusn
RBS Slot # = bitand( p2, 65535) = v$transaction.xidslot
P3 = v$transaction.xidsqn
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
http://forums.oracle.com/forums/thread.jspa?messageID=2613515
Let’s try a couple experiments to see if we can reproduce what you are seeing (this may not be the only possible cause):
#Basic Setup
Session 1:
CREATE TABLE T1 ( C1 VARCHAR2(10), C2 NUMBER(12), PRIMARY KEY (C1)); Table created. INSERT INTO T1 VALUES( 'ONE', 1); 1 row created. INSERT INTO T1 VALUES( 'TWO', 2); 1 row created. SELECT DISTINCT SID FROM V$MYSTAT; SID ---------- 150
#Test 1 (Primary Key Violation, No Commit)
Session 2:
SELECT DISTINCT SID FROM V$MYSTAT; SID ---------- 143 INSERT INTO T1 VALUES( 'TWO', 2);
(Session 2 Hung)
In Session 1:
SELECT S.SID, S.PROGRAM, S.STATUS, SW.EVENT, SW.WAIT_TIME, SW.STATE, SW.SECONDS_IN_WAIT, S.SQL_ADDRESS, S.SQL_HASH_VALUE, S.ROW_WAIT_OBJ#, S.ROW_WAIT_FILE#, S.ROW_WAIT_BLOCK#, S.ROW_WAIT_ROW#, SW.P1 , SW.P2, SW.P3 FROM V$SESSION S, V$SESSION_WAIT SW WHERE S.USERNAME IS NOT NULL AND SW.SID=S.SID AND SW.EVENT NOT LIKE '%SQL*Net%' AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue','wait for unread message on broadcast channel'); SID PROGRAM STATUS EVENT WAIT_TIME STATE SECONDS_IN_WAIT SQL_ADDR SQL_HASH_VALUE ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# P1 P2 P3 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 143 sqlplus.exe ACTIVE enq: TX - row lock contention 0 WAITING 424 1CA703CC 2983910641 -1 0 0 0 1415053316 393235 11543
SELECT S.SID, S.PROGRAM, S.SQL_HASH_VALUE, L.REQUEST, L.ID1, L.ID2, L.LMODE, L.TYPE, L.BLOCK FROM V$LOCK L, V$SESSION S WHERE (L.ID1, L.ID2, L.TYPE) IN (SELECT ID1, ID2, TYPE FROM V$LOCK WHERE REQUEST > 0) AND L.SID=S.SID; SID PROGRAM SQL_HASH_VALUE REQUEST ID1 ID2 LMODE TY BLOCK ----------------------------------------------------------------------------------------------- 143 sqlplus.exe 2983910641 4 393235 11543 0 TX 0 150 sqlplus.exe 883681740 0 393235 11543 6 TX 1
#Test 1 missed the mark, no ROW_WAIT_OBJ#
COMMIT;
In Session 2:
ERROR at line 1: ORA-00001: unique constraint (TESTUSER.SYS_C0014294) violated
#Test 2 (Primary Key Updated, No Commit)
In Session 1:
UPDATE T1 SET C1=C1 WHERE C1='TWO'; 1 row updated.
In Session 2:
UPDATE T1 SET C1=C1 WHERE C1='TWO';
In Session 1:
SELECT S.SID, S.PROGRAM, S.STATUS, SW.EVENT, SW.WAIT_TIME, SW.STATE, SW.SECONDS_IN_WAIT, S.SQL_ADDRESS, S.SQL_HASH_VALUE, S.ROW_WAIT_OBJ#, S.ROW_WAIT_FILE#, S.ROW_WAIT_BLOCK#, S.ROW_WAIT_ROW#, SW.P1 , SW.P2, SW.P3 FROM V$SESSION S, V$SESSION_WAIT SW WHERE S.USERNAME IS NOT NULL AND SW.SID=S.SID AND SW.EVENT NOT LIKE '%SQL*Net%' AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue','wait for unread message on broadcast channel'); SID PROGRAM STATUS EVENT WAIT_TIME STATE SECONDS_IN_WAIT SQL_ADDR SQL_HASH_VALUE ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# P1 P2 P3 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 143 sqlplus.exe ACTIVE enq: TX - row lock contention 0 WAITING 38 222799C4 1367536044 73419 1 84362 1 1415053318 589828 11227
SELECT S.SID, S.PROGRAM, S.SQL_HASH_VALUE, L.REQUEST, L.ID1, L.ID2, L.LMODE, L.TYPE, L.BLOCK FROM V$LOCK L, V$SESSION S WHERE (L.ID1, L.ID2, L.TYPE) IN (SELECT ID1, ID2, TYPE FROM V$LOCK WHERE REQUEST > 0) AND L.SID=S.SID; SID PROGRAM SQL_HASH_VALUE REQUEST ID1 ID2 LMODE TY BLOCK ----------------------------------------------------------------------------------------------- 143 sqlplus.exe 1367536044 6 589828 11227 0 TX 0 150 sqlplus.exe 883681740 0 589828 11227 6 TX 1
#Test 2 missed the mark, ROW_WAIT_OBJ# exists and ROW_WAIT_ROW# exists
COMMIT;
In Session 2:
COMMIT;
#Test 3 (Unique Index Violation)
In Session 1:
CREATE UNIQUE INDEX IND_T1 ON T1(C2); Index created. INSERT INTO T1 VALUES( 'THREE', 3); 1 row created.
In Session 2:
INSERT INTO T1 VALUES( 'FOUR', 3);
(Session 2 Hung)
In Session 1:
SELECT S.SID, S.PROGRAM, S.STATUS, SW.EVENT, SW.WAIT_TIME, SW.STATE, SW.SECONDS_IN_WAIT, S.SQL_ADDRESS, S.SQL_HASH_VALUE, S.ROW_WAIT_OBJ#, S.ROW_WAIT_FILE#, S.ROW_WAIT_BLOCK#, S.ROW_WAIT_ROW#, SW.P1 , SW.P2, SW.P3 FROM V$SESSION S, V$SESSION_WAIT SW WHERE S.USERNAME IS NOT NULL AND SW.SID=S.SID AND SW.EVENT NOT LIKE '%SQL*Net%' AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue','wait for unread message on broadcast channel'); SID PROGRAM STATUS EVENT WAIT_TIME STATE SECONDS_IN_WAIT SQL_ADDR SQL_HASH_VALUE ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# P1 P2 P3 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 143 sqlplus.exe ACTIVE enq: TX - row lock contention 0 WAITING 41 1C8F8B60 463727853 -1 1 84362 1 1415053316 393242 11541
SELECT S.SID, S.PROGRAM, S.SQL_HASH_VALUE, L.REQUEST, L.ID1, L.ID2, L.LMODE, L.TYPE, L.BLOCK FROM V$LOCK L, V$SESSION S WHERE (L.ID1, L.ID2, L.TYPE) IN (SELECT ID1, ID2, TYPE FROM V$LOCK WHERE REQUEST > 0) AND L.SID=S.SID; SID PROGRAM SQL_HASH_VALUE REQUEST ID1 ID2 LMODE TY BLOCK ----------------------------------------------------------------------------------------------- 143 sqlplus.exe 463727853 4 393242 11541 0 TX 0 150 sqlplus.exe 883681740 0 393242 11541 6 TX 1
#Test 3 missed the mark, no ROW_WAIT_OBJ#
In Session 1:
COMMIT; Commit complete.
In Session 2:
ERROR at line 1: ORA-00001: unique constraint (TESTUSER.IND_T1) violated
#Test 4 (Unique Index Violation)
In Session 1:
UPDATE T1 SET C2=C2+1 WHERE C2=3; 1 row updated.
In Session 2:
INSERT INTO T1 VALUES( 'FOUR', 3);
(Session 2 Hung)
In Session 1:
SELECT S.SID, S.PROGRAM, S.STATUS, SW.EVENT, SW.WAIT_TIME, SW.STATE, SW.SECONDS_IN_WAIT, S.SQL_ADDRESS, S.SQL_HASH_VALUE, S.ROW_WAIT_OBJ#, S.ROW_WAIT_FILE#, S.ROW_WAIT_BLOCK#, S.ROW_WAIT_ROW#, SW.P1 , SW.P2, SW.P3 FROM V$SESSION S, V$SESSION_WAIT SW WHERE S.USERNAME IS NOT NULL AND SW.SID=S.SID AND SW.EVENT NOT LIKE '%SQL*Net%' AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue','wait for unread message on broadcast channel'); SID PROGRAM STATUS EVENT WAIT_TIME STATE SECONDS_IN_WAIT SQL_ADDR SQL_HASH_VALUE ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# P1 P2 P3 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 143 sqlplus.exe ACTIVE enq: TX - row lock contention 0 WAITING 18 1C8F8B60 463727853 -1 1 84362 1 1415053316 65560 8736
SELECT S.SID, S.PROGRAM, S.SQL_HASH_VALUE, L.REQUEST, L.ID1, L.ID2, L.LMODE, L.TYPE, L.BLOCK FROM V$LOCK L, V$SESSION S WHERE (L.ID1, L.ID2, L.TYPE) IN (SELECT ID1, ID2, TYPE FROM V$LOCK WHERE REQUEST > 0) AND L.SID=S.SID; SID PROGRAM SQL_HASH_VALUE REQUEST ID1 ID2 LMODE TY BLOCK ----------------------------------------------------------------------------------------------- 143 sqlplus.exe 463727853 4 65560 8736 0 TX 0 150 sqlplus.exe 883681740 0 65560 8736 6 TX 1
#Test 4 missed the mark, ROW_WAIT_OBJ# does not exist and ROW_WAIT_ROW# exists
ROLLBACK;
In Session 2:
ROLLBACK;
#Test 5 (Select For Update in Two Sessions)
In Session 1:
SELECT * FROM T1 FOR UPDATE; C1 C2 --------------------- ONE 1 TWO 2 THREE 3
In Session 1:
SELECT * FROM T1 FOR UPDATE;
(Session 2 Hung)
In Session 1:
SELECT S.SID, S.PROGRAM, S.STATUS, SW.EVENT, SW.WAIT_TIME, SW.STATE, SW.SECONDS_IN_WAIT, S.SQL_ADDRESS, S.SQL_HASH_VALUE, S.ROW_WAIT_OBJ#, S.ROW_WAIT_FILE#, S.ROW_WAIT_BLOCK#, S.ROW_WAIT_ROW#, SW.P1 , SW.P2, SW.P3 FROM V$SESSION S, V$SESSION_WAIT SW WHERE S.USERNAME IS NOT NULL AND SW.SID=S.SID AND SW.EVENT NOT LIKE '%SQL*Net%' AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue','wait for unread message on broadcast channel'); SID PROGRAM STATUS EVENT WAIT_TIME STATE SECONDS_IN_WAIT SQL_ADDR SQL_HASH_VALUE ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# P1 P2 P3 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 143 sqlplus.exe ACTIVE enq: TX - row lock contention 0 WAITING 18 1C911EE4 3199333328 73419 1 84362 0 1415053318 327707 11953
SELECT S.SID, S.PROGRAM, S.SQL_HASH_VALUE, L.REQUEST, L.ID1, L.ID2, L.LMODE, L.TYPE, L.BLOCK FROM V$LOCK L, V$SESSION S WHERE (L.ID1, L.ID2, L.TYPE) IN (SELECT ID1, ID2, TYPE FROM V$LOCK WHERE REQUEST > 0) AND L.SID=S.SID; SID PROGRAM SQL_HASH_VALUE REQUEST ID1 ID2 LMODE TY BLOCK ----------------------------------------------------------------------------------------------- 143 sqlplus.exe 3199333328 6 327707 11953 0 TX 0 150 sqlplus.exe 883681740 0 327707 11953 6 TX 1
#Test 5 ROW_WAIT_OBJ# exist and ROW_WAIT_ROW# does not exists, lock request mode is 6 and the other session is holding the lock in mode 6. It looks like one possibility was found.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
http://forums.oracle.com/forums/thread.jspa?messageID=3188695
Based on a calculation it appears that the Statspack duration is about 20 minutes (this is likely reasonable). I did not spend a lot of time analyzing the information that you posted, so a quick summary of what I see that looks interesting:
Avg Total Wait wait Waits Event Waits Timeouts Time (s) (ms) /txn ---------------------------- ------------ ---------- ---------- ------ -------- enqueue 83 58 208 2509 0.1 resmgr:waiting in end wait 265 0 25 94 0.4 log file parallel write 1,939 0 22 11 3.2 resmgr:waiting in check 8 0 17 2112 0.0
In the above, notice that there are 58 time outs for the enqueue. The timeout period for an enqueue is 3 seconds, which means that you potentially had a single wait that lasted 58*3 (+ up to 2.99) seconds, which totals 174 (+ up to 2.99) seconds. To make the calculation simple, assume that this was a single wait lasting exactly 174 seconds. The remaining 47 seconds of the wait divided by 25 (83-58) means that the remaining average wait time is 1.88 seconds. With Oracle 9.0.1 it is more difficult to see how much time was spent in each of the different types of enqueues compared to Oracle 10g and above.
Looking at the enqueue information that you posted:
Avg Wt Wait Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s) -- ------------ ------------ ----------- ----------- ----------- ------------ TX 569 568 0 30 7,356.07 221
It appears that all of the enqueue activity is of the type TX, while are row level enqueues – usually meaning that someone updated a row, and there was a pause between the time that the update was performed and a commit or rollback was executed. There are multiple causes for TX enqueues which may not be found with a Statspack report. From my notes (possibly based on the Oracle Performance Tuning Guide):
* TX enqueues are acquired exclusive when a transaction initiates its first change and held until the transaction COMMITs or ROLLBACK.
* Waits for TX in mode 6 occurs when a session is waiting for a row level lock that is already held by another session. The wait event for mode 6 TX enqueues is ‘enq: TX – row lock contention’ on 10g and above.
* Waits for TX in mode 4 can occur when a session wants to lock a row in the block, but one or more other sessions have rows locked in the same block and there is no free Interested Transaction LIST slot in the block. Oracle will usually dynamically add another ITL slot if there is sufficient space in the block. To fix this problem, increase INITRANS or MAXTRANS for the table. This type of TX enqueue corresponds to the wait event ‘enq: TX – allocate ITL’ on 10g and above.
* Waits for TX in mode 4 can also occur if the session is waiting for potential duplicates in a UNIQUE index, which would occur if a second session inserts a value before the first tries to insert the same value, and the second session has not yet committed. This type of enqueue corresponds to the wait event: ‘enq: TX – row lock contention’ on 10g and above.
* ‘Waits for TX in mode 4 can also occur when a transaction is inserting a row into an index and must wait for the end of an index block split being done by another transaction. This type of TX enqueue wait corresponds to the wait event ‘enq: TX – index’ on 10g and above.
You might want to look at the SQL statements sorted by CPU utilization based on the above description.
Taking a look at a couple SQL statements that you posted:
CPU Elapsd Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --------------- ------------ -------------- ------ -------- --------- ---------- 3,301,423 91 36,279.4 67.5 0.01 0.01 4086952936 UPDATE IPAO.EMP_MAST SET OSCAL_CODE=:b1 WHERE :b1 < 600 902,040 30 30,068.0 18.5 0.00 0.00 2333288512 UPDATE IPAO.MON_ACT SET CHEQUE_NO=:b1,CHEQUE_DT=:b2,CHEQUE_AMT=:b3 WHERE V_NO = :b4 AND V_DATE = :b5
The first of the two looks interesting. I am a little surprised by the small number for the CPU utilization considering the number of buffer gets. The WHERE clause “WHERE :b1 < 600” seems a little unusual.
Excellent post with excellent test cases for understanding Enqueues..
Thanks a lot for sharing…
-Yasser
Yasser, thanks for the feedback regarding the post. It is good to know that some of the information that I have added to the blog is helpful for others.
Excellent article.
Hi Charles…
Nice post as always , with lots of details demonstrations , really helps readers.
I would like to add one more scenario …of enq: TX row contention , which is pretty tough to interpret when it occurred but at the end it turned out to be a good one to investigate.
Scenario:
– enq: TX row contention for a transaction ( merge load) — ROW_WAIT_FILE# (from v$session) – pointing to undo file, whereas ROW_WAIT_OBJ# points to 0 (which essential means an undo segment).
– no user spawned sid exists that is rolling back this transaction, so I interpreted the USN from P2 value of the waitevent and could find out the status of rollback (how many blocks rolled back and how many blocks total blocks left for rollback) and found that smon was rolling back the transaction.
– the load job which completes in maximum 5 minutes was running for 200 minutes.
– so it was obvious that the load job processes were waiting on enqueue held by smon on undo segments.
Later came to know that someone was doing a big update on the table and when it was running long , he killed the update statement and asked the DBA to make the rollback fast. So , the DBA killed the user session from the OS , to force SMON to do the rollback and changed the parameter fast_start_parallel_rollback to high which made smon to spawn 256 slaves.
I traced the SMON (event 10500 , level 10) and could find , it was not doing this rollback dedicatedly because it has to do other chores time to time like drop temp segment , shrink undo etc…
It was wait and watch thing but was a good learning… Just wanted to share.
Later , googling got me this link and it matched ditto with my case:
http://www.freelists.org/post/oracle-l/How-much-rollback-left-to-apply
Question was raised by Kyle Hailey and the best explanation I could see was from J Lewis.
thanks
—Raj
Raj,
Thank you for sharing your story, and providing a link to the Oracle-L discussion.