Deadlock on Oracle 11g but Not on 10g

7 01 2010

January 7, 2010

Is it possible to trigger a deadlock on Oracle Database 11.1.0.6, 11.1.0.7, and 11.2.0.1 when the exact same procedure does not trigger a deadlock on Oracle Database 10.2.0.4?  Trace files containing deadlocks always contain the following advice:

The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL.

In short, the above states that deadlocks are programming and/or end-user errors.

A couple of months after I generated a test case like the one below, I read an interesting blog article by Mark Bobak, a fellow OakTable Network member and frequent contributor to the Oracle OTN forums and Oracle-L list.  I then started to wonder if the test case I put together applied to Mark’s article (clarification: my test case was built roughly a year after Mark’s article was written, I only recently found Mark’s article).

I have a few favorite Oracle books, and I thought that I would quote from two of those books:

Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions

“So, when do you not need to index a foreign key? The answer is, in general, when the following conditions are met:

  • You do not delete from the parent table.
  • You do not update the parent table’s unique/primary key value (watch for unintended updates to the primary key by tools!).
  • You do not join from the parent to the child (like DEPT to EMP).”

Page 302 of the book “Expert One-On-One” and the reprint Expert Oracle, Signature Edition state essentially the same thing as the above book.

Oracle Wait Interface: a Practical Guide to Performance Diagnostics & Tuning

“Unindexed foreign keys are no longer an issue starting in Oracle9i Database.”

Interesting…

I have not read the following book, but found it through a Google search.  This book seems to take the middle ground:

OCA Oracle Database 11g: Administration I Exam Guide

“However, you should always create indexes on the foreign key columns within the child table for performance reasons: a DELETE on the parent table will be much faster if Oracle can use an index to determine whether there are any rows in the child table referencing the row that is being deleted.” 

OK, combining the three quotes – in general, there are only 3 cases when foreign key columns do not need to be indexed, as of Oracle 9i it is no longer necessary to index foreign key columns, but in Oracle 11g creating an index on a foreign key column will speed up deletes on the parent table.

Let’s keep the above in mind while we trigger a deadlock in Oracle 11.1.0.7. Refer back to the articles on Enqueues and Deadlocks to see if you are able to determine what happens in the test case (or more accurately WHY it happens) (clarification: this test case simulates a “screen painter” type program that updates every column in a row based on the fields on screen – the same behavior should be present if the primary key value in the parent table is updated to the same value, rather than a different value – it also relies on the fact that a row that is inserted by one session but not committed is not visible to another session until the first session commits).

First, we need a couple test tables:

CREATE TABLE T1(
  C1 NUMBER(10) PRIMARY KEY);

INSERT INTO
  T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE TABLE T2(
  C1 NUMBER(10) PRIMARY KEY,
  C2 NUMBER(10),
  CONSTRAINT FK_T1_C1 FOREIGN KEY(C2) REFERENCES T1(C1) ENABLE);

INSERT INTO
  T2
SELECT
  ROWNUM,
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

GRANT ALL ON T1 TO PUBLIC;
GRANT ALL ON T2 TO PUBLIC;

CREATE PUBLIC SYNONYM T1 FOR T1;
CREATE PUBLIC SYNONYM T2 FOR T2;

OK, we have two tables with a foreign key relationship, and no index on the foreign key column.  Now, we need four sessions:

  • Session 1 connected as TESTUSER (the owner of the tables)
  • Session 2 connected as USER2
  • Session 3 connected as USER3
  • Session 4 connected as SYS (only to query the various performance views)

Session 1:

INSERT INTO T1 VALUES(100010);
INSERT INTO T2 VALUES(100010,100010);

Session 2:

INSERT INTO T1 VALUES(100020);
INSERT INTO T2 VALUES(100020,100020);
DELETE FROM T2 WHERE C1=50;
UPDATE T1 SET C1=100030 WHERE C1=50;

{session 2 hangs}

Session 3:

INSERT INTO T1 VALUES(100020);

{session 3 hangs}

Session 4:

SELECT /*+ ORDERED */
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.STATUS,
  SW.EVENT,
  SW.WAIT_TIME WT,
  SW.STATE,
  SW.SECONDS_IN_WAIT S_I_W,
  S.SQL_ID,
  S.SQL_ADDRESS,
  S.SQL_HASH_VALUE,
  S.SQL_CHILD_NUMBER,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#,
  SW.P1,
  SW.P2,
  SW.P3
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SID USERNAME PROGRAM     STATUS EVENT                WT STATE   S_I_W SQL_ID        SQL_ADDR SQL_HV     CN OBJ# FILE# BLOCK# ROW#         P1    P2 P3
--- -------- ----------- ------ -------------------- -- ------- ----- ------------- -------- ---------- -- ---- ----- ------ ---- ---------- ----- --
307 USER2    sqlplus.exe ACTIVE enq: TM - contention  0 WAITING   422 4rtg0hv0atfkx 224E2B7C 3232545373  0   -1     0      0    0 1414332421 82913  0
314 USER3    sqlplus.exe ACTIVE enq: TM - contention  0 WAITING   407 cv338j6z2530g 224DAE38 3189935119  0   -1     0      0    0 1414332419 82913  0

 

SELECT
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.SQL_ID,
  S.SQL_ADDRESS,
  S.SQL_HASH_VALUE,
  S.SQL_CHILD_NUMBER CN,
  S.ROW_WAIT_OBJ#,
  S.ROW_WAIT_FILE#,
  S.ROW_WAIT_BLOCK#,
  S.ROW_WAIT_ROW#,
  L.LMODE,
  L.REQUEST,
  L.ID1,
  L.ID2,
  L.TYPE,
  L.BLOCK
FROM
  V$LOCK L,
  V$SESSION S
WHERE
  (L.ID1, L.ID2, L.TYPE) IN
    (SELECT
      ID1,
      ID2,
      TYPE
    FROM
      V$LOCK
    WHERE
      REQUEST > 0)
  AND L.SID=S.SID;

SID USERNAME PROGRAM     SQL_ID        SQL_ADDR SQL_HASH_VALUE CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST ID1   ID2 TY BLOCK
--- -------- ----------- ------------- -------- -------------- -- ------------- -------------- --------------- ------------- ----- ------- ----- --- -- -----
307 USER2    sqlplus.exe 4rtg0hv0atfkx 224E2B7C     3232545373  0            -1              0               0             0     3       5 82913   0 TM     1
314 USER3    sqlplus.exe cv338j6z2530g 224DAE38     3189935119  0            -1              0               0             0     0       3 82913   0 TM     0
320 TESTUSER sqlplus.exe 0vbusv12hnbk6 22480E10     1158295110  0         12517              1           29656             0     3       0 82913   0 TM     1

 

Session 1:

COMMIT;

Session 3 shows the following:

INSERT INTO T1 VALUES(100020)
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Session 2 shows the following:

1 row updated.

The trace file contains the following deadlock information:

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
 ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-000143e1-00000000        21     314    SX             23     307    SX   SSX
TX-0009000d-000030d3        23     307     X             21     314           S

session 314: DID 0001-0015-00000006 session 307: DID 0001-0017-00000001
session 307: DID 0001-0017-00000001 session 314: DID 0001-0015-00000006

Rows waited on:
  Session 314: no row
  Session 307: obj - rowid = 000143E0 - AAAUPgAAFAAFDpsAAA
  (dictionary objn - 82912, file - 5, block - 1325676, slot - 0)

----- Information for the OTHER waiting sessions -----
Session 307:
  sid: 307 ser: 2 audsid: 1210519 user: 186/USER2 flags: 0x100045
  pid: 23 O/S info: user: SYSTEM, term: TESTBOX, ospid: 940
    image: ORACLE.EXE (SHAD)
  client details:
    O/S info: user: TEST, term: TESTBOX, ospid: 1616:1264
    machine: TESTBOX program: sqlplus.exe
    application name: SQL*Plus, hash value=3669949024
  current SQL:

UPDATE T1 SET C1=100030 WHERE C1=50  

Interesting deadlock.  Now, let’s try the same test with Oracle 10.2.0.4:

The table definitions:

CREATE TABLE T1(
  C1 NUMBER(10) PRIMARY KEY);

INSERT INTO
  T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE TABLE T2(
  C1 NUMBER(10) PRIMARY KEY,
  C2 NUMBER(10),
  CONSTRAINT FK_T1_C1 FOREIGN KEY(C2) REFERENCES T1(C1) ENABLE);

INSERT INTO
  T2
SELECT
  ROWNUM,
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

GRANT ALL ON T1 TO PUBLIC;
GRANT ALL ON T2 TO PUBLIC;

CREATE PUBLIC SYNONYM T1 FOR T1;
CREATE PUBLIC SYNONYM T2 FOR T2;

Now for the test:

Session 1:

INSERT INTO T1 VALUES(100010);
INSERT INTO T2 VALUES(100010,100010);

Session 2:

INSERT INTO T1 VALUES(100020);
INSERT INTO T2 VALUES(100020,100020);
DELETE FROM T2 WHERE C1=50;
UPDATE T1 SET C1=100030 WHERE C1=50;

{session 2 hangs}

Session 3:

INSERT INTO T1 VALUES(100020);

{session 3 hangs}

Session 4:

SELECT /*+ ORDERED */
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.STATUS,
  SW.EVENT,
  SW.WAIT_TIME WT,
  SW.STATE,
  SW.SECONDS_IN_WAIT S_I_W,
  S.SQL_ID,
  S.SQL_ADDRESS,
  S.SQL_HASH_VALUE,
  S.SQL_CHILD_NUMBER,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#,
  SW.P1,
  SW.P2,
  SW.P3
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SID USERNAME PROGRAM     STATUS EVENT                WT STATE   S_I_W SQL_ID        SQL_ADDR SQL_HV     CN OBJ# FILE# BLOCK# ROW#         P1    P2 P3
--- -------- ----------- ------ -------------------- -- ------- ----- ------------- -------- ---------- -- ---- ----- ------ ---- ---------- ----- --
204 USER2    sqlplus.exe ACTIVE enq: TM - contention  0 WAITING   213 4rtg0hv0atfkx 2AF83608 3232545373  0   -1     0      0    0 1414332421 16472  0
217 USER3    sqlplus.exe ACTIVE enq: TM - contention  0 WAITING   201 cv338j6z2530g 2AF85A5C 3189935119  0   -1     0      0    0 1414332418 16472  0

 

SELECT
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.SQL_ID,
  S.SQL_ADDRESS,
  S.SQL_HASH_VALUE,
  S.SQL_CHILD_NUMBER CN,
  S.ROW_WAIT_OBJ#,
  S.ROW_WAIT_FILE#,
  S.ROW_WAIT_BLOCK#,
  S.ROW_WAIT_ROW#,
  L.LMODE,
  L.REQUEST,
  L.ID1,
  L.ID2,
  L.TYPE,
  L.BLOCK
FROM
  V$LOCK L,
  V$SESSION S
WHERE
  (L.ID1, L.ID2, L.TYPE) IN
    (SELECT
      ID1,
      ID2,
      TYPE
    FROM
      V$LOCK
    WHERE
      REQUEST > 0)
  AND L.SID=S.SID;

SID USERNAME PROGRAM     SQL_ID        SQL_ADDR SQL_HASH_VALUE CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST ID1   ID2 TY BLOCK
--- -------- ----------- ------------- -------- -------------- -- ------------- -------------- --------------- ------------- ----- ------- ----- --- -- -----
213 TESTUSER sqlplus.exe 0vbusv12hnbk6 2AF78CC4     1158295110  0             0              2             799             0     3       0 16472   0 TM     1
204 USER2    sqlplus.exe 4rtg0hv0atfkx 2AF83608     3232545373  0            -1              0               0             0     3       5 16472   0 TM     0
217 USER3    sqlplus.exe cv338j6z2530g 2AF85A5C     3189935119  0            -1              0               0             0     0       2 16472   0 TM     0

Session 1:

COMMIT;

Session 3:

{session 3 remains hung}

Session 2:

1 row updated.

Session 4:

SELECT /*+ ORDERED */
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.STATUS,
  SW.EVENT,
  SW.WAIT_TIME WT,
  SW.STATE,
  SW.SECONDS_IN_WAIT S_I_W,
  S.SQL_ID,
  S.SQL_ADDRESS,
  S.SQL_HASH_VALUE,
  S.SQL_CHILD_NUMBER,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#,
  SW.P1,
  SW.P2,
  SW.P3
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SID USERNAME PROGRAM     STATUS EVENT                         WT STATE   S_I_W SQL_ID        SQL_ADDR SQL_HV     CN OBJ# FILE# BLOCK# ROW#         P1     P2   P3
--- -------- ----------- ------ ----------------------------- -- ------- ----- ------------- -------- ---------- -- ---- ----- ------ ---- ---------- ------ ----
217 USER3    sqlplus.exe ACTIVE enq: TX - row lock contention  0 WAITING   380 cv338j6z2530g 2AF85A5C 3189935119  0   -1     0      0    0 1415053316 458788 1297

 

SELECT
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.SQL_ID,
  S.SQL_ADDRESS,
  S.SQL_HASH_VALUE,
  S.SQL_CHILD_NUMBER CN,
  S.ROW_WAIT_OBJ#,
  S.ROW_WAIT_FILE#,
  S.ROW_WAIT_BLOCK#,
  S.ROW_WAIT_ROW#,
  L.LMODE,
  L.REQUEST,
  L.ID1,
  L.ID2,
  L.TYPE,
  L.BLOCK
FROM
  V$LOCK L,
  V$SESSION S
WHERE
  (L.ID1, L.ID2, L.TYPE) IN
    (SELECT
      ID1,
      ID2,
      TYPE
    FROM
      V$LOCK
    WHERE
      REQUEST > 0)
  AND L.SID=S.SID;

SID USERNAME PROGRAM     SQL_ID        SQL_ADDR SQL_HASH_VALUE CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST ID1    ID2  TY BLOCK
--- -------- ----------- ------------- -------- -------------- -- ------------- -------------- --------------- ------------- ----- ------- ------ ---- -- -----
217 USER3    sqlplus.exe cv338j6z2530g 2AF85A5C     3189935119  0            -1              0               0             0     0       4 458788 1297 TX     0
204 USER2    sqlplus.exe               00                    0               -1              0               0             0     6       0 458788 1297 TX     1

So, why did Oracle 11.1.0.7 deadlock while 10.2.0.4 did not deadlock (11.1.0.6 and 11.2.0.1 will also deadlock)?


Actions

Information

25 responses

8 01 2010
Jonathan Lewis

Oh no! They’ve changed it again.

9.2.0.5 (or maybe 9.2.0.4) brought in a mode 2 (row share) lock on the child if you updated the parent (and vice versa). It looks like that’s been changed to a mode 3 (row exclusive) in 11g.

A mode 2 is compatible with a mode 4 (share) or 5 (share row exclusive), but a mode 3 is not. And your sequence ‘delete a child row, delete parent a parent (in the abence of an FK index)’ leads you to a mode 5 – lining you up for the subsequent deadlock as other mode 3 locks start queueing.

I seriously dislike that quote from the OWI book – the change in FK locking in 9i made the problems slightly less threatening in low-activity systems, but certainly didn’t do away with them.

And the statement from the 11g OCA book is just blurs the issues terribly – if you’re going to delete parent rows you need the FK index to avoid the table lock, the efficiency of the delete is almost a secondary issue.

Thanks for publishing this – it’s yet another “little thing” for my list of 11g upgrade traps.

8 01 2010
Jonathan Lewis

I should have made it clear that the “lock in mode 2 / mode 3″ at the other end of the referential integrity constraint applies even when the FK index is in place.

Regards
Jonathan Lewis

8 01 2010
Charles Hooper

Jonathan, thanks for stopping by to share your knowledge and add value to this blog article. Your analysis is a bit better than is mine about this particular problem (I tried to be a little vague in the article).

The quote from the OWI book about no longer needing to index foreign keys was one of the very few problems that I found in the book. If anyone is interested, my other comments about that book may be found here:
https://hoopercharles.wordpress.com/2009/11/30/book-review-oracle-wait-interface-a-practical-guide-to-performance-diagnostics-tuning/

Jonathan, if you have a couple minutes, the following is paraphrased from your Practical Oracle 8i book, how does the 11g behavior compare with that of Oracle 8i?
“When there is a foreign key relationship between tables, Oracle will acquire a share lock on the child table when the primary (or unique) key in a row of the parent table is updated, or the row is deleted. This locking condition may be eliminated by indexing the foreign key column.”

9 01 2010
Jonathan Lewis

Charles,

As far as unindexed foreign keys are concerned, the big change came in 9i and I haven’t spotted any further changes in 11g. (Although the switch from mode 2 to mode 3 at the opposite end of the RI constraint will probably have some consequences.)

When you delete a parent row, or update the key value, your session will try to acquire a mode 4 – share – or mode 5 – share row exclusive – lock on the child table. (Mode 5 if the transaction has previously modified the child – otherwise mode 4, and most of the literature only mentions the mode 4).

In 8i the lock is released when your transaction commits. In 9i the lock is released (or downgraded from 5 to 3) as soon as the operation on the parent completes. This means that any sessions that had been queued behind the child lock will be able to resume sooner. I don’t think I’ve seen anything in 11g that reduces the interference any further.

In passing: it’s not commonly realised that the “foreign key index” doesn’t have to be an exact match for the foreign key; it merely has to start with the foreign key columns (which can be in any order) to avoid the locking issue.

12 01 2010
Charles Hooper

Thanks for the additional information regarding enqueue locks.

22 01 2010
Damir Vadas

Hi!

On my blog post Example how to save deadlock exceptions into table (beside alert log)! I have shown (beside how to catch deadlock to a plain table) significant changes of information saved according 9/10 versions and 11g.
Similar information can be found on site sensitive deadlock
Regards,
Damir Vadas
http://damir-vadas.blogspot.com

22 01 2010
Charles Hooper

Thanks for posting a reference to the article on your site.

Note that this blog article also mentioned the blog article written by Mark Bobak.

15 02 2010
Lock Horror « Oracle Scratchpad

[...] up a blog note from Charles Hooper on deadlocks in 11g, and then more recently a note on the OTN database forum, I was moved to do some testing on [...]

3 09 2010
damirvadas

Hi!
I’m not sure but when you run your 1 example, and only involved two sessions. You’ll see that blocked object is FK INDEX and not any of table records involved in transaction.

S I have rearranged mine function which shows blocked records. here is output for that case (owner of all tables is DAMIRV):

SQL>  @sbr
Oracle version: 11.2.0.1.0 (11.1.0.0.0)
        
DAMIRV (1 '133,10')
        DAMIRV    1     18,9           1596 sec   Share-Row-Exc         ACTIVE        SQL*Plus
                Locked object is not table: DAMIRV.SYS_C0022027 (INDEX)

PL/SQL procedure successfully completed.

SQL>

So when other transactin try to update index it is in the same data block as data from first (blocker) transaction and block come from that…
At least I do not have better explanation.

Code of this function (sbr.sql) can be downloaded from:
http://damir-vadas.blogspot.com/2010/07/blocking-session-show-table-records.html

Cheers!

3 09 2010
damirvadas

Small correction. Now I realize the problem and I do not have deadlock on 10g/11g

C2 column in T2 table is not indexed and in this way any operation on T1 (which is parent for T2), lock the whole table T2!
So when you add index on the C2 column session will NOT hang!

SQL*Plus: Release 10.2.0.3.0 - Production on Sub Ruj 4 00:54:33 2010

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> INSERT INTO T1 VALUES(100020);

1 row created.

SQL> INSERT INTO T2 VALUES(100020,100020);

1 row created.

SQL> 
SQL> DELETE FROM T2 WHERE C1=50;

1 row deleted.

SQL> UPDATE T1 SET C1=100030 WHERE C1=50;

1 row updated.

SQL>

But session 3 will hang what is normal!

Hope this helps…

Cheers!
Damir Vadas
http://damir-vadas.blogspot.com

4 09 2010
Charles Hooper

Damir,

When this example was built, it was not intended to result in a deadlock on 11.1.0.6 and above – it might be interesting to see if it results in a deadlock on 10.2.0.5. The original example was intended to be included in chapter 8 of the “Expert Oracle Practices” book in the section titled “Investigating Enqueue Waits” (pages 286 through 291):
http://books.google.com/books?id=tUNtyMCwDWQC&pg=PA289

As I recall, the idea was to have one session (TESTUSER in this example) change a large number of rows in the database, holding off issuing a COMMIT until the transaction completed. Session 2 (USER2 in this example) was supposed to make a series of changes, including inserting a value into a primary key column (not derived from a sequence, but instead a SELECT MAX(ID) FROM… syntax), a value that session 3 (USER3 in this example) could not see and might also try to use if another session’s (TESTUSER) modified rows prevented session 2 from quickly issuing a COMMIT. In essence, this example was supposed to simulate the problem found on pages 190 through 195 of the “Expert Oracle Practices” book.

Reproducing the problem found on pages 190 through 195 was too easy, so we decided to have the SQL statements show a TM type enqueue (caused by the missing foreign key index) being quickly replaced by a TX enqueue when session 1 issued a COMMIT. We decided that a “screen painter” type application having a missing foreign key index in one of its tables may be a good example to simulate (some sources had reported that missing foreign key indexes were no longer a problem as of Oracle Database 9i, so a side benefit was demonstrating that foreign key indexes are still needed). I was surprised, and I believe that Randolf was also, when different results were received on Oracle Database 10.2.0.4 (which worked as expected) and Oracle Database 11.1.0.6/11.1.0.7/11.2.0.1 (which triggered an unexpected deadlock due to the different lock request modes, as pointed out by Jonathan Lewis in a comment of this article).

I have heard (actually read) on several occasions where people were criticized (maybe even called lazy or irresponsible) for not upgrading relatively quickly to the latest Oracle Database release. Consider this example, where a screen painter type program works exactly as expected on Oracle Database 10.2.0.1 through 10.2.0.4 (and maybe 10.2.0.5) and then suddenly starts triggering deadlocks, and possibly lost transactions if recovery from the deadlocks is not handled by the program, when the database is upgraded to 11.2.0.1 or the older 11.1.0.6 or 11.1.0.7. Such a problem would be very hard to investigate and correct. Would this problem be considered an Oracle bug (the application worked fine until Oracle Corp. decided to change the requested lock mode), an application bug (from the deadlock trace file: “The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL.”), or a “DBA bug” (the DBA, not wanting to be criticized for being lazy/irresponsible upgraded the database to a supported Oracle Database release).

Maybe the possibility of a “DBA bug” was actually the cause of the deadlock, and not the missing foreign key index… just something to think about. :-)

5 09 2010
damirvadas

Charles,
Thank you for bigger reply.
Now I see the whole picture.
So a conclusion was to stay on FK index…as in good old time of 8i…
Rg,
Damir Vadas
http://damir-vadas.blogspot.com

19 09 2010
kayvan

hi
i have a problem
my application is developer 10g
and database is 11g
i have any problem in update on any table
my filed in application is data base item
please help me
for sample please send me offer script

19 09 2010
Charles Hooper

Kayvan,

The problem that you described is not related to a deadlock which appears in Oracle Database 11g which does not appear in Oracle Database 10g, as described in this blog article.

This blog is not a question and answer forum, however it is a place where I try to help people understand how to find good answers. I am not familiar with “developer 10g” and Oracle Database 11g now includes four different releases (11.1.0.6, 11.1.0.7, 11.2.0.1, 11.2.0.2), each with potentially different behavior. When asking questions it is a very good idea to be very specific, otherwise you may receive an answer which is useless, thus wasting the time of the person answering the question and your own time.

Ask yourself what might prevent you from making changes to any table. My first thought is that you might have established a read-only connection between “developer 10g” and Oracle Database 11g (one of 11.1.0.6, 11.1.0.7, 11.2.0.1, 11.2.0.2). In such a case, you might try a Google search (http://www.google.com) of the Oracle documentation (assuming that “developer 10g” is an Oracle product). The search keywords might look like this:

oracle developer 10g "read only" connection site:download.oracle.com

My second thought is that there is a limitation applied to the database user account that you are using in “developer 10g”. Maybe that account has SELECT permission on the tables, but lacks the UPDATE permission on the tables. To see how to do that, you would use search keywords like this:

oracle grant update permission site:download.oracle.com

I would also try to verify that the same problem is present when using that same user account with another program, such as SQL*Plus.

Before attempting to ask a question:
* Identify the best place to ask a question – it does little good to ask a question on a site that does not answer specific user supplied questions. If there are multiple forums at the site, make certain that you pick the most appropriate forum for your question.
* Clarify what is “developer 10g”, for example, who makes the product and the exact product name.
* Determine the exact Oracle Database release, for example 11.2.0.2.
* Document exactly what you tried when attempting to resolve the problem.

When asking the question:
* Be polite, and do not mention the word “urgent”.
* Provide the table definition of one of the tables and any triggers on the table.
* Provide the EXACT error message that you received from the “developer 10g” product
* Specify the exact Oracle Database release
* Describe the steps that you performed to investigate the problem.

If the question is an Oracle Database type question, pick a question/answer forum that is appropriate for your question. Some of the better sites include:
http://asktom.oracle.com
http://groups.google.com/group/comp.databases.oracle.server/
http://forums.oracle.com/forums/forum.jspa?forumID=61

7 11 2010
Upgrade Pain01 « Thinking Out Loud

[...] google: deadlock 11g and found Deadlock on Oracle 11g but Not on 10g [...]

10 11 2010
Oracle11g: New Locking Modes When Policing FK Constraints (A Wolf at the Door) « Richard Foote’s Oracle Blog

[...] locking issues such as this, increasing the likelihood of hitting deadlock scenarios (as discussed here previously by Charles Hooper) and can therefore potentially reduce the overall concurrency [...]

14 02 2011
al0

Any guesses why Oracle introduced this change?

14 02 2011
Charles Hooper

I wonder if this is related:
http://tkyte.blogspot.com/2010/04/something-i-recently-unlearned.html

And from a comment in the above article, possibly the fix for “Bug 5970280 MERGE has wrong TM lock modes for foreign key constraints”
https://supporthtml.oracle.com/ep/faces/secure/km/DocumentDisplay.jspx?id=5970280.8

14 02 2011
al0

It rather seems to be 5909305 (https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=5909305.8).
I dare say “the remedy is worse than the disease”. See as well http://jonathanlewis.wordpress.com/2010/02/15/lock-horror.

14 02 2011
Charles Hooper

Thank you for linking to the Metalink (MOS) note 5909305. It might be interesting to see what happens when disabling the fix for bug 5909305. I cannot remember for certain if I tried that – I recall doing some experimentation with _FIX_CONTROL, and I did not have any success rolling back a bug fix, but I cannot remember if it was this particular bug number. More information on _FIX_CONTROL (read the warning before experimenting):
http://www.pythian.com/news/1002/undocumented-parameter-_fix_control-or-how-to-break-your-database/
http://coskan.wordpress.com/2011/01/17/11-2-0-2-performance-stories/

2 06 2011
Paresh

Hi Charles,

I came to this old thread from your latest blog post “Insert Error on Oracle 11g but Not on 10g”. In addition to what Jonathan wrote above “it’s not commonly realised that the “foreign key index” doesn’t have to be an exact match for the foreign key; it merely has to start with the foreign key columns (which can be in any order) to avoid the locking issue”, he also mentions in his book Practical Oracle 8i to compress this index to save space (which will reduce number of Index Blocks so as to reduce IO and improve Buffer Cache efficiency).

3 06 2011
Charles Hooper

Good point. I thought that as of Oracle Database 9i it is also possible that the full table locking problem could be avoided even if the leading column in an index is not the column specified with the foreign key constraint in the child table – that Oracle Database could perform an index skip scan of the index to avoid a full table (TM enqueue) lock. I just reproduced the test case in this article, adding the following index definition:

CREATE INDEX IND_T2_C1_C2 ON T2 (C1, C2);

The above index definition did NOT help avoid the TM enqueue and the resulting deadlock on Oracle Database 11.2.0.2, so what you stated still appears to be correct for the latest version of Oracle Database.

18 01 2012
28 08 2013
Johan Lorier

In 2 environments where I tested this case (11.2.0.1 and 11.2.0.3 database), both times it is session 2 that gives a deadlock error message:

USER2@d121t > UPDATE T1 SET C1=100030 WHERE C1=50;
UPDATE T1 SET C1=100030 WHERE C1=50
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Is this to be expected? Or is it random session 2 or session 3 whose transaction can be ended by oracle?

28 08 2013
Charles Hooper

Johan,

That is interesting that in your trial of the test case that the other session received the deadlock error. In theory, either session could receive the deadlock error, however it always seemed to be session 3 in the several tests that I performed with 11.1.0.6, 11.1.0.7, 11.2.0.1, and 11.2.0.2.

Let’s check the documentation:

http://docs.oracle.com/cd/B12037_01/server.101/b10739/manproc.htm#sthref689

“A deadlock can occur when two or more users are waiting for data locked by each other. Deadlocks prevent some transactions from continuing to work. Oracle Database automatically detects deadlock situations and resolves them by rolling back one of the statements involved in the deadlock, thereby releasing one set of the conflicting row locks.”

http://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_03dbc.htm#i6337

“Because each user is waiting for resources held by another user, neither can continue until Oracle breaks the deadlock. Oracle signals an error to the participating transaction that had completed the least amount of work, and the “deadlock detected while waiting for resource” Oracle error code is returned to sqlcode in the SQLCA.”

http://docs.oracle.com/cd/E11882_01/server.112/e25789/consist.htm#BABDDBCG

“Oracle Database automatically detects deadlocks and resolves them by rolling back one statement involved in the deadlock, releasing one set of the conflicting row locks. The database returns a corresponding message to the transaction that undergoes statement-level rollback. The statement rolled back belongs to the transaction that detects the deadlock. Usually, the signalled transaction should be rolled back explicitly, but it can retry the rolled-back statement after waiting.”

Three different answers from the documentation regarding which session ends up being the “victim”:
* Random
* Session that has performed the least amount of work
* Session that detects the deadlock (looking at that quote again, I am not sure what it is stating)

From what I understand, it is usually the session that has waited the longest that receives the deadlock, only because that session’s process is probably the first one to come off the CPU run queue. I think that there is a good explanation in the Oracle Core book:
http://books.google.com/books?id=G9AJA91PL54C&pg=PA83#v=onepage&q&f=false

Other references:
http://jonathanlewis.wordpress.com/2011/08/29/deadlock-2/
http://arup.blogspot.com/2013/04/application-design-is-only-reason-for.html

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 137 other followers

%d bloggers like this: