August 1, 2013
Four years ago while co-writing two chapters for an Oracle Database book, an interesting test case was put together to demonstrate how enqueues are handled when multiple sessions are competing for some of the same database resources – the test case was intended to demonstrate the neat, orderly, and predictable process implemented by Oracle Database. Oracle Database 10.2.0.1 through 10.2.0.4 behaved essentially the same when executing the test case script, while Oracle Database 11.1.0.6, 11.1.0.7, and 11.2.0.1 behaved differently when executing the test case script (11.2.0.2 and 11.2.0.3 behave just like 11.2.0.1 when executing the script).
Oracle Database 11.1.0.x and 11.2.0.x terminated the test case early with a deadlock error. Jonathan Lewis offered an exceptional explanation of what happened with my test case script in his Lock Horror article. He also mentioned a couple of interesting notes in comments attached to my article, one of which stated that Oracle Database 9.2.0.5 (or 9.2.0.4) had also introduced a change in enqueue rules related to foreign keys. Richard Foote put together a related article that explained the changes in locking mode with foreign keys in his Oracle11g: New Locking Modes When Policing FK Constraints article. Trying not to forget related articles (see the note at the end of this blog article), Tom Kyte also authored a blog article titled Something I recently unlearned… that is somewhat related to the test case script that appeared in the Expert Oracle Practices book. So, the rules have changed, deadlocks should now be expected when executing the test case script.
I just started experimenting with Oracle Database 12.1.0.1. My very first experiment with the new database version was to test the behavior of the enqueue script to verify that it results in a deadlock on Oracle Database 12.1.0.1. Consistency is one of the key factors in a database application – if an application executes:
SELECT 1 + 1 FROM DUAL;
a developer should be reasonably certain that Oracle Database will either return the value 2 or the binary number 10 when the above SQL statement is executed, regardless of whether Oracle Database 6.x or Oracle Database 12.x is used by an application written by the developer. The same developer should have a reasonable expectation that his application, if it is compatible with Oracle Database 9.0.1, should work pretty much the same way on Oracle Database 12.1.0.1. But, what happens when Oracle Corp. fixes bugs in Oracle Database?
Consistency is important, but it is also important to recognize that behavior may change consistently.
Let’s reintroduce the test case script from my earlier article (which also appeared in the Expert Oracle Practices book). Executing this script requires four sessions, each of which is logged in as a different user (to minimize the difficulty associated with deciphering the results of the SQL statements):
- Session 1 connected as TESTUSER (the owner of the tables)
- Session 2 connected as USER2
- Session 3 connected as USER3
- Session 4 connected as SYS (only to query the various performance views)
First, we need a couple of test tables with a declared foreign key relationship, without an index on the foreign key column. In Session 1:
CREATE TABLE T1( C1 NUMBER(10) PRIMARY KEY); INSERT INTO T1 SELECT ROWNUM FROM DUAL CONNECT BY LEVEL<=100000; COMMIT; CREATE TABLE T2( C1 NUMBER(10) PRIMARY KEY, C2 NUMBER(10), CONSTRAINT FK_T1_C1 FOREIGN KEY(C2) REFERENCES T1(C1) ENABLE); INSERT INTO T2 SELECT ROWNUM, ROWNUM FROM DUAL CONNECT BY LEVEL<=100000; COMMIT; GRANT ALL ON T1 TO PUBLIC; GRANT ALL ON T2 TO PUBLIC; CREATE PUBLIC SYNONYM T1 FOR T1; CREATE PUBLIC SYNONYM T2 FOR T2;
In Session 1, insert a row into the parent and child tables without issuing a commit:
INSERT INTO T1 VALUES(100010); INSERT INTO T2 VALUES(100010,100010);
In Session 2, insert a row into the parent and child tables, delete a row in the child table so that the primary key column value of the associated row in the parent table may be modified, and modify the parent row’s primary key column value:
INSERT INTO T1 VALUES(100020); INSERT INTO T2 VALUES(100020,100020); DELETE FROM T2 WHERE C1=50; UPDATE T1 SET C1=100030 WHERE C1=50;
(Session 2 is hung)
Session 3 cannot see that a row was inserted by the now hung session 2, so session 3 attempts to insert a row with the same primary key value. In Session 3:
INSERT INTO T1 VALUES(100020);
(Session 3 is hung)
The enqueues at this point on Oracle Database 10.2.0.4 differs from those found on Oracle Database 11.1.0.6, and (for consistency) those enqueues differ from what are found on Oracle Database 12.1.0.1.
In Session 4, let’s take a look at the enqueues (note that the join to the V$SESSION_WAIT performance view is unnecessary starting with Oracle Database 10.1.0.x, however I left that join in place to allow easy adaptation of the SQL statement so that it may work with older Oracle Database versions).
SET LINESIZE 165 SET PAGESIZE 1000 SET TRIMSPOOL ON COLUMN SID FORMAT 9999 COLUMN PROGRAM FORMAT A11 COLUMN USERNAME FORMAT A8 COLUMN LMODE FORMAT 99 COLUMN REQUEST FORMAT 99 COLUMN CN FORMAT 99 COLUMN EVENT FORMAT A30 COLUMN STATE FORMAT A8 COLUMN S_I_W FORMAT 9999 COLUMN WT FORMAT 9999 COLUMN OBJ# FORMAT 999999 COLUMN FILE# FORMAT 999 COLUMN BLOCK# FORMAT 9999999 COLUMN P2 FORMAT 9999999 COLUMN P3 FORMAT 999999 SELECT S.SID, S.USERNAME, S.PROGRAM, S.STATUS, SW.EVENT, SW.WAIT_TIME WT, SW.STATE, SW.SECONDS_IN_WAIT S_I_W, S.SQL_ID, S.SQL_CHILD_NUMBER CN, S.ROW_WAIT_OBJ# OBJ#, S.ROW_WAIT_FILE# FILE#, S.ROW_WAIT_BLOCK# BLOCK#, S.ROW_WAIT_ROW# ROW#, SW.P1, SW.P2, SW.P3 FROM V$SESSION_WAIT SW, V$SESSION S WHERE S.USERNAME IS NOT NULL AND SW.SID=S.SID AND SW.EVENT NOT LIKE '%SQL*Net%' AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel'); SID USERNAME PROGRAM STATUS EVENT WT STATE S_I_W SQL_ID CN OBJ# FILE# BLOCK# ROW# P1 P2 P3 ----- -------- ----------- -------- ------------------------------ ----- -------- ----- ------------- --- ------- ----- -------- ---- ---------- -------- ------- 184 USER2 sqlplus.exe ACTIVE enq: TM - contention 0 WAITING 89 4rtg0hv0atfkx 0 -1 0 0 0 1414332421 91943 0 243 USER3 sqlplus.exe ACTIVE enq: TX - row lock contention 0 WAITING 49 cv338j6z2530g 0 -1 0 0 0 1415053316 65548 1531
For comparison, here is the output of the same SQL statement on 11.1.0.7 from the earlier blog article:
__SID USERNAME PROGRAM STATUS EVENT WT STATE S_I_W SQL_ID CN OBJ# FILE# BLOCK# ROW# P1 P2 P3 ----- -------- ----------- -------- ------------------------------ ----- -------- ----- ------------- --- ------- ----- -------- ---- ---------- -------- ------- 307 USER2 sqlplus.exe ACTIVE enq: TM - contention 0 WAITING 422 4rtg0hv0atfkx 0 -1 0 0 0 1414332421 82913 0 314 USER3 sqlplus.exe ACTIVE enq: TM - contention 0 WAITING 407 cv338j6z2530g 0 -1 0 0 0 1414332419 82913 0
For comparison, here is the output of the same SQL statement on 10.2.0.4 from the earlier blog article:
__SID USERNAME PROGRAM STATUS EVENT WT STATE S_I_W SQL_ID CN OBJ# FILE# BLOCK# ROW# P1 P2 P3 ----- -------- ----------- -------- ------------------------------ ----- -------- ----- ------------- --- ------- ----- -------- ---- ---------- -------- ------- 204 USER2 sqlplus.exe ACTIVE enq: TM - contention 0 WAITING 213 4rtg0hv0atfkx 0 -1 0 0 0 1414332421 16472 0 217 USER3 sqlplus.exe ACTIVE enq: TM - contention 0 WAITING 201 cv338j6z2530g 0 -1 0 0 0 1414332418 16472 0
… USER3 on 12.1.0.1 seems to not be waiting on a table level (TM) enqueue.
In Session 4, let’s try the second SQL statement to examine the lock modes:
SELECT S.SID, S.USERNAME, S.PROGRAM, S.SQL_ID, S.SQL_CHILD_NUMBER CN, S.ROW_WAIT_OBJ#, S.ROW_WAIT_FILE#, S.ROW_WAIT_BLOCK#, S.ROW_WAIT_ROW#, L.LMODE, L.REQUEST, L.ID1, L.ID2, L.TYPE, L.BLOCK FROM V$LOCK L, V$SESSION S WHERE (L.ID1, L.ID2, L.TYPE) IN (SELECT ID1, ID2, TYPE FROM V$LOCK WHERE REQUEST > 0) AND L.SID=S.SID; SID USERNAME PROGRAM SQL_ID CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST ID1 ID2 TY BLOCK ----- -------- ----------- ------------- --- ------------- -------------- --------------- ------------- ----- ------- ---------- ---------- -- -------- 127 TESTUSER sqlplus.exe -1 0 0 0 3 0 91943 0 TM 1 184 USER2 sqlplus.exe 4rtg0hv0atfkx 0 -1 0 0 0 3 5 91943 0 TM 1 184 USER2 sqlplus.exe 4rtg0hv0atfkx 0 -1 0 0 0 6 0 65548 1531 TX 1 243 USER3 sqlplus.exe cv338j6z2530g 0 -1 0 0 0 2 0 91943 0 TM 0 243 USER3 sqlplus.exe cv338j6z2530g 0 -1 0 0 0 0 4 65548 1531 TX 0
For comparison, here is the output of the same SQL statement on 11.1.0.7 from the earlier blog article:
__SID USERNAME PROGRAM SQL_ID CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST ID1 ID2 TY BLOCK ----- -------- ----------- ------------- --- ------------- -------------- --------------- ------------- ----- ------- ---------- ---------- -- -------- 320 TESTUSER sqlplus.exe 0vbusv12hnbk6 0 12517 1 29656 0 3 0 82913 0 TM 1 307 USER2 sqlplus.exe 4rtg0hv0atfkx 0 -1 0 0 0 3 5 82913 0 TM 1 314 USER3 sqlplus.exe cv338j6z2530g 0 -1 0 0 0 0 3 82913 0 TM 0
For comparison, here is the output of the same SQL statement on 10.2.0.4 from the earlier blog article:
__SID USERNAME PROGRAM SQL_ID CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST ID1 ID2 TY BLOCK ----- -------- ----------- ------------- --- ------------- -------------- --------------- ------------- ----- ------- ---------- ---------- -- -------- 213 TESTUSER sqlplus.exe 0vbusv12hnbk6 0 0 2 799 0 3 0 16472 0 TM 1 204 USER2 sqlplus.exe 4rtg0hv0atfkx 0 -1 0 0 0 3 5 16472 0 TM 0 217 USER3 sqlplus.exe cv338j6z2530g 0 -1 0 0 0 0 2 16472 0 TM 0
So, on Oracle Database 12.1.0.1, USER3 is not waiting on a table level (TM) equeue (it is holding a level 2 TM enqueue, while on 10.2.0.4 that session was trying to acquire a level 2 TM lock, and on 11.1.0.7 that session was trying to acquire a level 3 TM lock), it is instead waiting on a row level (TX – transaction) enqueue that is held in exclusive mode (level 6) by USER2. Quite honestly, that result is similar to what I was hoping to see four years ago when the test script was put together, because USER3 really is waiting due to a potential primary key collision, if session 2 issues a commit.
Session 2 is hung, so it cannot issue a commit. So, in Session 1:
COMMIT;
Session 2 is no longer hung, and now shows:
1 row updated.
Session 3, of course, remains hung waiting for Session 2 to either issue a COMMIT or a ROLLBACK to resolve the potential primary key collision.
For comparison, Session 3 on 11.1.0.7 from the earlier blog article showed the following:
INSERT INTO T1 VALUES(100020) * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource
… and Session 2 showed the following on 11.1.0.7:
1 row updated.
Is it possible that the Session 2 could have displayed a deadlock message, rather than Session 3 showing a deadlock message? I seem to remember reading that there is a good chance that the session that has been in the enqueue the longest could be selected as the “vicitim” of the deadlock, but I might be incorrectly remembering that rule. Oracle Database really did not completely resolve the deadlock, nor did it “kill” session 3 (as too many books and blog article state would happen) when the deadlock message appeared in that session’s SQL*Plus session.
For comparison, Oracle Database 10.2.0.4 behaved similar to Oracle Database 12.1.0.1, with Session 2 showing 1 row updated, and Session 3 remaining hung. The more things change, the more they stay the same?
—
Thoughts? Curious about the test cases found on the web pages that I linked to earlier? Does Oracle Database 12.1.0.1 behave more like 11.2.0.3 or 10.2.0.4 when foreign key columns are not properly indexed?
Recent Comments