Select For Update – In What Order are the Rows Locked?

21 11 2011

November 21, 2011

A recent thread in the comp.databases.oracle.server usenet group asked whether or not a SELECT FOR UPDATE statement locks rows in the order specified by the ORDER BY clause.  Why might this be an important question?  Possibly if the SKIP LOCKED clause is implemented in the SELECT FOR UPDATE statement?  Possibly if a procedure is hanging, and it is important to know at what point an enqueue happened?  Any other reasons?

Without testing, I would have suggested that the rows are locked as the rows are read from the table blocks, and not after the ORDER BY clause alters the order of the rows.  Why?  Oracle Database is fundamentally lazy, or put another way, fundamentally efficient – in general it does not perform unnecessary work.  Locking the rows after applying the ORDER BY clause would require a second visit to the table blocks (possibly having to visit each block multiple times to lock different rows in the same block) in the order specified by the ORDER BY clause.  Such an approach could be incredibly inefficient and also error prone (what happens if a row was locked by a second session while the first session was sorting the rows per the ORDER BY clause?).

We could guess, but why guess when we are able to easily test the theory?  Let’s create a simple table with 10,000 rows:

DROP TABLE T1 PURGE;

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  RPAD('A',255,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT; 

Now we need 2 sessions (I will call them Session 1 and Session 2).  We will execute the same SELECT FOR UPDATE statement in both sessions, with Session 1 first selecting the table rows in ascending order and then Session 2 selecting the table rows in descending order.  If the ORDER BY clause determines the order in which the rows are locked, the row with a C1 value of 10,000 should be identified as the row that caused the enqueue because that is the first row that should be returned to Session 2.

In Session 1:

SELECT
  C1,
  C2
FROM
  T1
WHERE
  MOD(C1,100)=0
ORDER BY
  C1
FOR UPDATE; 

In Session 2:

SELECT
  C1,
  C2
FROM
  T1
WHERE
  MOD(C1,100)=0
ORDER BY
  C1 DESC
FOR UPDATE; 

(Session 2 is hung)

Let’s try to identify the row that Session 2 is waiting to lock.  There are a couple of ways to accomplish this task (on Oracle Database 10.1 and higher the join to the V$SESSION_WAIT view is unnecessary because that information is found in V$SESSION).

In Session 1:

COLUMN EVENT FORMAT A20
COLUMN OBJ# FORMAT 999999
COLUMN BLOCK# FORMAT 999999
COLUMN ROW# FORMAT 99

SELECT
  SW.EVENT,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

EVENT                   OBJ#      FILE#  BLOCK# ROW#
-------------------- ------- ---------- ------- ----
enq: TX - row lock c   71913          4    4262   18
ontention 

We have the object ID (71913 – technically we need the DATA_OBJECT_ID, not the OBJECT_ID for the DBMS_ROWID.ROWID_CREATE call, but the two values will be identical in this test case), absolute file number (4), block (4262), and row (18) in the block that caused the enqueue.  Let’s select that row from the table:

SELECT
  C1
FROM
  T1
WHERE
  ROWID=DBMS_ROWID.ROWID_CREATE(1, 71913, 4, 4262, 18);

 C1
---
100 

The row with a C1 value of 100 is the first row that was returned by Session 1 (it will be the last row requested to be returned by Session 2), but it is also the first row that matched the WHERE clause predicates for Session 2’s SQL statement as the blocks were read (you could confirm the order in which the blocks are read by flushing the buffer cache before the select, setting the DB_FILE_MULTIBLOCK_READ_COUNT parameter at the session level to 1, enabling a 10046 trace, and then executing the SELECT FOR UPDATE statement).

Let’s retrieve the execution plan for Session 2 to determine why the original poster (OP) might have throught that the rows were locked after the ORDER BY is applied (note that the SQL_ID and SQL_CHILD_NUMBER columns only exist in V$SESSION as of Oracle Database 10.1, so technically the join to V$SESSION_WAIT is unnecessary; however, for consistency with the previous SQL statement that determined the locked row, the join to the V$SESSION_WAIT view is included):

SELECT
  S.SQL_ID,
  S.SQL_CHILD_NUMBER
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SQL_ID        SQL_CHILD_NUMBER
------------- ----------------
2dnpymtj0rc1r                0 

Now retrieving the execution plan:

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('2dnpymtj0rc1r',0,'TYPICAL'));

SQL_ID  2dnpymtj0rc1r, child number 0
-------------------------------------
SELECT   C1,   C2 FROM   T1 WHERE   MOD(C1,100)=0 ORDER BY   C1 DESC
FOR UPDATE

Plan hash value: 3991553210

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    33 (100)|          |
|   1 |  FOR UPDATE         |      |       |       |            |          |
|   2 |   SORT ORDER BY     |      |   123 | 17466 |    33   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   123 | 17466 |    32   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MOD("C1",100)=0)

Note
-----
   - dynamic sampling used for this statement (level=2) 

As mentioned by the OP, the execution plan appears to be slightly misleading – unless of course you remember that locking the rows after sorting the rows based on the ORDER BY clause  would require revisiting the rows in the table blocks.  The situation could be different in this test case if there was an index on column C1.  In such a case the index could be read in descending order, thus making it appear that the rows were attempted to be locked in the order described by the ORDER BY clause.

Let’s slightly adapt the original test case to test the second theory, that it could appear that the rows are locked in the order specified by the ORDER BY clause.

In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

Now we will create an index on column C1.

In Session 1:

CREATE INDEX IND_T1_C1 ON T1(C1);

Now the revised test begins.

In Session 1:

SELECT /*+ INDEX(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1
FOR UPDATE;

In Session 2:

SELECT /*+ INDEX(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1 DESC
FOR UPDATE;

(Session 2 is hung)

Let’s try to identify the row that Session 2 is waiting to lock.

In Session 1:

COLUMN EVENT FORMAT A20
COLUMN OBJ# FORMAT 999999
COLUMN BLOCK# FORMAT 999999
COLUMN ROW# FORMAT 99

SELECT
  SW.EVENT,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

EVENT                   OBJ#      FILE#  BLOCK# ROW#
-------------------- ------- ---------- ------- ----
enq: TX - row lock c   71913          4    4259    0
ontention 

Notice in the above output, now block number 4259 is identified, while in the earlier test script block number 4262 was identified by the above SQL statement.  We have the object ID (we actually need the DATA_OBJECT_ID, but the values will be the same in this test case), the absolute file number, the block number, and the row number in the block (0 in this case).  Let’s select that row from the table:

SELECT
  C1
FROM
  T1
WHERE
  ROWID=DBMS_ROWID.ROWID_CREATE(1, 71913, 4, 4259, 0);

 C1
---
 1 

The row with a C1 value of 1 is the first row that was returned by Session 1 (it will be the last row requested to be returned by Session 2), but it is also the first row that matched the WHERE clause predicates for Session 2’s SQL statement as the blocks were read.  This is the same result as we saw before – maybe it does not matter whether an index access path is used to avoid the sort operation that would be otherwise required to satisfy the ORDER BY clause – is the second theory false?  Before making that determination, let’s take a look at the execution plan for Session 2’s SQL statement:

SELECT
  S.SQL_ID,
  S.SQL_CHILD_NUMBER
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SQL_ID        SQL_CHILD_NUMBER
------------- ----------------
3yz7pu7rw5cw0                0

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('3yz7pu7rw5cw0',0,'TYPICAL'));

Plan hash value: 3432103074

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |       |       |     7 (100)|          |
|   1 |  FOR UPDATE                   |           |       |       |            |          |
|   2 |   SORT ORDER BY               |           |   100 | 14200 |     7  (15)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1        |   100 | 14200 |     6   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_C1 |   100 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2)

The index that we created was definitely used, but notice that there is still a SORT ORDER BY operation in the execution plan.  The rows in the index were read in ascending order, not descending order!  Let’s try again using an INDEX_DESC hint in the SQL statement for Session 2.

In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

In Session 1:

SELECT /*+ INDEX(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1
FOR UPDATE;

In Session 2:

SELECT /*+ INDEX_DESC(T1) */
  C1,
  C2
FROM
  T1
WHERE
  C1<=100
ORDER BY
  C1 DESC
FOR UPDATE;

(Session 2 is hung)

Let’s determine the row that Session 2 is waiting to lock.

In Session 1:

SELECT
  SW.EVENT,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

EVENT                   OBJ#      FILE#  BLOCK# ROW#
-------------------- ------- ---------- ------- ----
enq: TX - row lock c   71913          4    4262   18
ontention

SELECT
  C1
FROM
  T1
WHERE
  ROWID=DBMS_ROWID.ROWID_CREATE(1, 71913, 4, 4262, 18);

 C1
---
100

The above output now shows that Session 2 is attempting to lock the first row (with a C1 value of 100) that it intends to return.  Has the execution plan changed?

SELECT
  S.SQL_ID,
  S.SQL_CHILD_NUMBER
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

SQL_ID        SQL_CHILD_NUMBER
------------- ----------------
806mtjxk7k1dv                0

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR('806mtjxk7k1dv',0,'TYPICAL'));

Plan hash value: 3814195162

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |       |       |     6 (100)|          |
|   1 |  FOR UPDATE                    |           |       |       |            |          |
|   2 |   BUFFER SORT                  |           |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID | T1        |   100 | 14200 |     6   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| IND_T1_C1 |   100 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2) 

Notice in the above execution plan that there is no longer a SORT ORDER BY operation in the execution plan (replaced by a BUFFER SORT operation) and the INDEX RANGE SCAN operation was also replaced by an INDEX RANGE SCAN DESCENDING operation.  Simply because the index range scan is performed in the same order as specified by the ORDER BY clause, the rows are locked in the same order as is specified by the ORDER BY clause – that is, after all, the order in which the rows were touched.

Any other ideas for a demonstration of the order in which rows are locked when a SELECT FOR UPDATE is used?