Faulty Quotes 7 – Deadlock Kills Sessions?

4 01 2012

January 4, 2012

(Back to the Previous Post in the Series)

I thought that I would begin this blog article with a bit of a test case that demonstrates a very simple deadlock between two sessions.

In Session 1 (create the test table with a primary key and insert a single row):

CREATE TABLE T1 (
  C1 NUMBER PRIMARY KEY,
  C2 VARCHAR2(10));

INSERT INTO T1 VALUES (1,'1');

In Session 2 (insert a row that does not conflict with session 1 and then insert a row with the same primary key value as the uncommitted row in Session 1):

INSERT INTO T1 VALUES (2,'2');

INSERT INTO T1 VALUES (1,'3');

(Session 2 hangs while waiting for Session 1 to COMMIT or ROLLBACK)

In Session 1 (insert a row with the same primary key value as the uncommitted row in Session 2):

INSERT INTO T1 VALUES (2,'2');

(Session 1 hangs while waiting for Session 2 to COMMIT or ROLLBACK)

In Session 2 (an ORA-00060 is returned, deadlock between the two sessions ends):

INSERT INTO T1 VALUES (1,'3')
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

(Session 1 still hangs while waiting for Session 2 to COMMIT or ROLLBACK the first row that was inserted)

In Session 2 (confirming that the session was not killed/terminated and that the first row inserted still exists – indicates that the current TRANSACTION was not rolled back, just the statement that was involved in the deadlock):

SELECT
  *
FROM
  T1;

        C1 C2
---------- ----------
         2 2

While recently reading an OTN thread I was reminded of a bit of information that several sources simply miss-state about Oracle deadlocks.  The incorrect information is repeated, seemingly quite frequently, in various books and forum threads.  As luck would have it, several of the books that I have read about Oracle Database also repeat this incorrect information.  For instance, a quote from page 352 of the book “Expert Oracle Database 11g Administration” (a book that I reviewed and gave 4 out of 5 stars, maybe I need to re-evaluate that rating?):

“This is a catch-22 situation, because the stalemate can’t be broken by either session unilaterally. In such circumstances, Oracle steps in, terminates one of the sessions, and rolls back its statement. Oracle quickly recognizes that two sessions are deadlocked and terminates the session that issued the most recent request for a lock. This will release the object locks that the other session is waiting for.”

A second book, “Secrets of the Oracle Database” (another book that I reviewed and gave 4 out of 5 stars, actually it was probably worth about 4.4 stars), states the following on page 57:

“Hence, the ORACLE DBMS detects circular chains pertaining to interdependent locks, signals the error ‘ORA-00060: deadlock detected while waiting for resource’, and rolls back one of the sessions involved in the would-be deadlock.”

A third book, “Oracle SQL Recipes” (another book that I reviewed and gave 4 out of 5 stars), states the following on page 217:

“The immediate solution to the problem requires no user intervention: Oracle detects the deadlock and automatically rolls back one of the transactions, releasing one of the sets of row locks so that the other session’s DML will complete.”

A fourth book, “Oracle Database 11g Performance Tuning Recipes” (another book that I reviewed and gave 2 out of 5 stars), states on page 227:

“While the database can automatically resolve deadlocks between sessions (by killing one of the sessions holding a needed lock)…”

The Google book search found that the above four books were not alone in their statements of deadlocks that do not match the test case results at the start of this blog article.  One of the books found by Google is one that I have not read yet (and probably never will), “Oracle PL/SQL for Dummies”, which states the following on page 298:

“This is a deadlock. Oracle resolves it by raising a deadlock exception (ORA-00060: deadlock detected while waiting for resource) in user 1’s session. This terminates the procedure and allows user 2’s procedure to successfully complete.”

A second book found by the Google search is a bit old, “Oracle 9i DBA 101”, which on page 194 states:

“A deadlock occurs when two or more users wait for data that is locked by each other. Oracle will kill the first server process that detects the deadlock, so you should always look for and fix code that can cause deadlocks.”

There are certainly many other statements in other books regarding deadlocks, both correct and incorrect.  A quick search through the OTN forums found several threads in addition to the one at the start of this blog article that also drifted a bit from an accurate picture of what happens when a deadlock is broken in Oracle Database.

https://forums.oracle.com/forums/thread.jspa?threadID=2196282

“That is the deadlock. Oracle will then immediately choose arbitrarlly a victim session (542 or 548) and kill its process letting the remaining session continuing its work.”

https://forums.oracle.com/forums/thread.jspa?threadID=2152646

“There should be a simple wait, not a deadlock which would killed one of the session.”

https://forums.oracle.com/forums/thread.jspa?threadID=2261367

“If two sessions are actually deadlocked, Oracle should always kill one of the blocked sessions, raising an ORA-00060 error. It sounds like you’re saying this is not the expected behavior.”

https://forums.oracle.com/forums/thread.jspa?threadID=2227420

“The two session are inserting records in the same table with a unique index on it. With the same database version on another server and the same treatment, the deadlocks are correctly handled (one session is killed with ora-00060) and the process can continue.”

https://forums.oracle.com/forums/thread.jspa?threadID=1072665

“In oracle, when session1 and session2 make a deadlock, the oracle will kill one session , but I want know how to let oracle kill session1 and session2 both, is there way to do that?”

https://forums.oracle.com/forums/thread.jspa?threadID=2169563

“NO. Oracle already broke the deadlock by terminating one of these sessions. By the time trace file was closed, the deadlock no longer existed.”

https://forums.oracle.com/forums/thread.jspa?threadID=953308

“Oracle itself resolve the deadlock by killing the session.”

While reading the book “Oracle Core Essential Internals for DBAs and Developers” I almost flagged a sentence in a discussion of deadlocks as potentially containing an error.  That statement is found on page 82 near the end of the third to the last paragraph on the page.  After reading that paragraph about 10 times I decided that the statement is probably specific to TM locks, and probably is correct (the thought of TX locks maintained at the individual block level that contains the table rows kept running through my head the first nine times I read the paragraph).  This section of the book is interesting, as it not only avoids the mistakes mentioned in the above books, but it also highlights an inaccurate statement or two found in the Oracle Database documentation.


Actions

Information

10 responses

4 01 2012
chris_c

I wonder if the misconception arises because of the way some client applications handle the error. A badly coded application which crashes on receiving the ORA-00060 usually because there is poor or no error handling could lead to the assumption that oracle has killed the session rather than the client application crashing. We have a wonderful third party application here that on any ORA-* message crashes and dumps a stack trace, as the trace then shows the crash occurred during a database call its the DBA’s fault not the developers.

4 01 2012
Charles Hooper

Chris,

I think that you are right – it is a bit of a mistaken cause and effect relationship, where the application’s reaction to the ORA error is missing element.

4 01 2012
hourim

Charles,

In the third OTN link you’ve listed, I gave the following answer

When deadlock occurs Oracle will not roll back the entire transaction. It will only do a rollback of one of the statements (belonging to one of the sessions) involded in the deadlock. The other session remains blocked if the session for which one of the statement has been rolled back by Oracle will not commit or rollback.

In other words, in the Justin Cave example, Session 1 will remains blocked if Session 2 will not issue a rollback after it has received the deadlook error

So, the OP should verify that his application must forsee a rollback in case of exception (and the deadlock error could be one of those exceptions)

Hope this helps

.

4 01 2012
Charles Hooper

Mohamed,

What you quoted above is very well worded – and matches the test case results at the start of this article. Thank you for posting the quote.

4 01 2012
hourim

There is also a misconception about oracle choosing ARBITRARILY one of the two sessions involved in the deadlocks and rollback one of its statement. The choosen session is not done arbitrarily but it seems that it is the session that start first waiting

http://jonathanlewis.wordpress.com/2011/08/29/deadlock-2/

4 01 2012
Charles Hooper

Another piece of the puzzle. Jonathan mentioned something similar in the Oracle Core book – and his writing style made it seem so obvious that it works as he indicated, that I had a hard time understanding why I did not understand the reasoning earlier. 🙂

5 01 2012
Damir Vadas

Nice. Thank you for sharing.

6 01 2012
pvh

Hi Chalres, But the “enq: TM contention” did’nt killed the session for me:
session1:

SQL> SELECT sys_context('USERENV', 'SID') FROM DUAL;

SYS_CONTEXT('USERENV','SID')
-----------------------------
140

CREATE TABLE emp(
emp_id number(5) not null,
emp_name varchar2(35) not null,
address varchar2(50),
CONSTRAINT empid_pk PRIMARY KEY (emp_id)
);

INSERT INTO emp VALUES (1, 'xyz', 'addr1');
INSERT INTO emp VALUES (2, 'abc', 'addr2');
COMMIT;
    EMP_ID EMP_NAME                            ADDRESS
---------- ----------------------------------- --------------------------------------------------
         1 xyz                                 addr1
         2 abc                                 addr2

DELETE from emp WHERE emp_id = 1;

session2:

SQL> SELECT sys_context('USERENV', 'SID') FROM DUAL;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
21

CREATE TABLE dept
(dept_id     number(8)     not null,
dept_name    varchar2(35)    not null,
emp_id     number(5)     not null,
CONSTRAINT fk_emp
FOREIGN KEY (emp_id)
REFERENCES emp(emp_id) ON DELETE CASCADE
);

INSERT INTO dept VALUES (1, 'dep1', 1);
INSERT INTO dept VALUES (2, 'dep2', 1);
INSERT INTO dept VALUES (3, 'dep3', 2);
COMMIT;

SQL> select * from dept;

   DEPT_ID DEPT_NAME                               EMP_ID
---------- ----------------------------------- ----------
         1 dep1                                         1
         2 dep2                                         1
         3 dep3                                         2

DELETE from emp WHERE emp_id = 2;

(session2 hangs here)

session3:

SQL> select sid,serial#,SECONDS_IN_WAIT,STATE,BLOCKING_SESSION,BLOCKING_SESSION_STATUS,event from v$session;

       SID    SERIAL# SECONDS_IN_WAIT STATE               BLOCKING_SESSION BLOCKING_SE EVENT
---------- ---------- --------------- ------------------- ---------------- ----------- ---------------------------------
        21      22260              10 WAITING                          140 VALID       enq: TM - contention
        140     29720              25 WAITING                              NO HOLDER   SQL*Net message from client
........... (edited due to many results)
6 01 2012
Charles Hooper

pvh,

It was a good idea to try a deadlock involving a TM enqueue. You can find another interesting example of a TM enqueue here:
https://hoopercharles.wordpress.com/2010/01/07/deadlock-on-oracle-11g-but-not-on-10g/

Please keep in mind that a TM enqueue will not necessarily lead to a deadlock, and that a deadlock involving a TM equeue will not kill the sessions that ends up being the deadlock “victim” (this might not have been clear because of all of the incorrect quotes from the various books).

I had a little bit of trouble when I tried to recreate your test case:
In Session 1:

SELECT sys_context('USERENV', 'SID') FROM DUAL;

SYS_CONTEXT('USERENV','SID')
----------------------------
223

CREATE TABLE emp(
emp_id number(5) not null,
emp_name varchar2(35) not null,
address varchar2(50),
CONSTRAINT empid_pk PRIMARY KEY (emp_id)
);

INSERT INTO emp VALUES (1, 'xyz', 'addr1');
INSERT INTO emp VALUES (2, 'abc', 'addr2');
COMMIT;

DELETE from emp WHERE emp_id = 1;

In Session 2:

SELECT sys_context('USERENV', 'SID') FROM DUAL;

SYS_CONTEXT('USERENV','SID')
----------------------------
222

CREATE TABLE dept
(dept_id     number(8)     not null,
dept_name    varchar2(35)    not null,
emp_id     number(5)     not null,
CONSTRAINT fk_emp
FOREIGN KEY (emp_id)
REFERENCES emp(emp_id) ON DELETE CASCADE
);

CREATE TABLE dept
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

I could not create the DEPT table in Session 2 because Session 1 was already a TX lock on the EMP table. We need to change the order of the commands a little.

Fixing up the order of the commands and modifying the script to actually trigger a deadlock:
In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

In Session 1:

CREATE TABLE dept
(dept_id     number(8)     not null,
dept_name    varchar2(35)    not null,
emp_id     number(5)     not null,
CONSTRAINT fk_emp
FOREIGN KEY (emp_id)
REFERENCES emp(emp_id) ON DELETE CASCADE
);

INSERT INTO dept VALUES (1, 'dep1', 1);
INSERT INTO dept VALUES (2, 'dep2', 1);
INSERT INTO dept VALUES (3, 'dep3', 2);
COMMIT;

DELETE from emp WHERE emp_id = 1;

In Session 2:

INSERT INTO emp VALUES (3, 'abc', 'addr2');
 
DELETE from emp WHERE emp_id = 2;

(Session 2 is hung)

In Session 1:

INSERT INTO emp VALUES (3, 'abc', 'addr2');

Session 2 Shows:

DELETE from emp WHERE emp_id = 2
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

(Session 1 is still hung)

In Session 2:

SELECT
  *
FROM
  EMP
WHERE
  EMP_ID=3;
 
EMP_ID EMP_NAME                            ADDRESS
------ ----------------------------------- -------
     3 abc                                 addr2

The above shows that Session 2 was still connected to the database, and that the first row inserted by Session 2 was not rolled back by an automatic transaction rollback (there was an automatic statement-level rollback).

This is the deadlock graph that was written to the trace file:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00040016-0000085a        23     222     X             31     223           S
TM-00011c1a-00000000        31     223    SX             23     222    SX   SSX
 
session 222: DID 0001-0017-000002D9	session 223: DID 0001-001F-00000001 
session 223: DID 0001-001F-00000001	session 222: DID 0001-0017-000002D9 
 
Rows waited on:
  Session 222: no row
  Session 223: no row
 
----- Information for the OTHER waiting sessions -----
Session 223:
  sid: 223 ser: 25 audsid: 235217 user: 62/TESTUSER
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40008) -/-
  pid: 31 O/S info: user: SYSTEM, term: AIRFORCE-1, ospid: 15824
    image: ORACLE.EXE (SHAD)
  client details:
    O/S info: user: hooper, term: AIRFORCE-1, ospid: 9392:15596
    machine: AIRFORCE-1 program: sqlplus.exe
    application name: SQL*Plus, hash value=3669949024
  current SQL:
  INSERT INTO emp VALUES (3, 'abc', 'addr2')
 
----- End of information for the OTHER waiting sessions -----
6 01 2012
pvh

Thanks for the inputs Charles, I will check that out.

Leave a reply to Charles Hooper Cancel reply