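[Original] ORA-00060: Deadlock detected (Scenario 1: concurrent updates on a single table)

An analysis of why concurrent updates on a single table so easily produce deadlocks.

First, what is a deadlock? The formal definition is easy to find on Google or Baidu, and I'd rather not quote the textbooks, so let me explain it with a simple story.

One fine, sunny afternoon, Xiao Ming and Xiao Qiang got into a fight. It was serious enough that the teacher not only lectured them but also made them stay after school to write apology letters. Writing an apology letter takes paper and a pen, and as it happened, during the battle Xiao Ming had torn up all of Xiao Qiang's exercise books and scratch paper, while Xiao Qiang had smashed every one of Xiao Ming's pens. The two were now sworn enemies; forget lending each other anything, they would not even speak. So they waited, and waited, until the teacher came back to check on the apology letters. The answer, of course, was "haven't started yet", so the teacher said: "Xiao Ming, lend your paper to Xiao Qiang and let him write first."

Haha, the story is a bit contrived, but it is a classic deadlock scenario: if the teacher had not stepped in to mediate, the two of them could only have kept waiting, or as the saying goes, "waited to death".

Back to the topic. Oracle's distinctive lock management greatly reduces the chance of a deadlock, but you have to believe that "anything is possible". The next few posts will analyze the various scenarios that lead to ORA-00060 and how to handle them.

Today's scenario is the first one, and also the one you will find most often online, so much so that I feel a little embarrassed putting the "[Original]" tag in the title.

Let's stick with the classic scott demo schema:

Open two sessions and check the sid of each: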

select sid from v$mystat where rownum=1;
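If the scott account has no select privilege on v$mystat, the query above fails. A possible alternative, as a sketch, is the built-in SYS_CONTEXT function, which as far as I know needs no extra grant:

```sql
-- Returns the sid of the current session without touching any v$ view
select sys_context('userenv', 'sid') as sid from dual;
```

Run it once in each of the two sessions and note the two values.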

scott$mydb@test02 SQL> set pagesize 50 ;
scott$mydb@test02 SQL> select * from emp ;

       MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ------------------- ---------- ---------- ----------
      7902 1980-12-17 00:00:00        800                    20
      7698 1981-02-20 00:00:00       1600        300         30
      7698 1981-02-22 00:00:00       1250        500         30
      7839 1981-04-02 00:00:00       2975                    20
      7698 1981-09-28 00:00:00       1250       1400         30
      7839 1981-05-01 00:00:00       2850                    30
      7839 1981-06-09 00:00:00       2450                    10
      7566 1987-04-19 00:00:00       3000                    20
           1981-11-17 00:00:00       5000                    10
      7698 1981-09-08 00:00:00       1500          0         30
      7788 1987-05-23 00:00:00       1100                    20
      7698 1981-12-03 00:00:00        950                    30
      7566 1981-12-03 00:00:00       3000                    20
      7782 1982-01-23 00:00:00       1300                    10

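I want to give the two lowest-paid employees (ENAME: Smith, James; EMPNO: 7369, 7900) a raise of 1 yuan (I admit this raise will not keep up with the CPI). Meanwhile Manager Blake, who knows nothing about this, also wants to give them a 1-yuan raise. I work in the session with sid 126 and Blake works in the session with sid 128. The statements are executed in the order shown in the table below: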

        +--------------------------------------------+--------------------------------------------+
        |         Session 1 (sid=126)                |            Session 2 (sid=128)             |
        +--------------------------------------------+--------------------------------------------+
        | update emp set sal=sal+1 where empno=7369; |                                            |
        +--------------------------------------------+--------------------------------------------+
        |                                            | update emp set sal=sal+1 where empno=7900; |
        +--------------------------------------------+--------------------------------------------+
        | update emp set sal=sal+1 where empno=7900; |                                            |
        +--------------------------------------------+--------------------------------------------+
        |                                            | update emp set sal=sal+1 where empno=7369; |
        +--------------------------------------------+--------------------------------------------+
        | ORA-00060: deadlock detected               |                                            |
        |            while waiting for resource      |                                            |
        +--------------------------------------------+--------------------------------------------+

And with that we have successfully triggered an ORA-00060. The alert log contains a message similar to the following:

ORA-00060: Deadlock detected. More info in file /u01/app/admin/mydb/udump/mydb_ora_7531.trc.
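If the alert log is not at hand, the trace file can usually be tracked down from the database itself. The sketch below assumes a release that still uses the user_dump_dest parameter (the udump path above suggests so) and an account that can read v$parameter, v$session and v$process:

```sql
-- Directory where user-session trace files are written
select value from v$parameter where name = 'user_dump_dest';

-- OS process id of the server process behind sid 126; it appears in the
-- trace file name (mydb_ora_<spid>.trc)
select p.spid
  from v$session s
  join v$process p on p.addr = s.paddr
 where s.sid = 126;
```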

Let's look at the trc file. The most useful part of it is the Deadlock graph:

[Transaction Deadlock]
Current SQL statement for this session:
update emp set sal=sal+1 where empno=7900
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00010013-00005f96        32     126     X             34     128           X
TX-00030001-000065ed        34     128     X             32     126           X
session 126: DID 0001-0020-00003C79     session 128: DID 0001-0022-0000772F
session 128: DID 0001-0022-0000772F     session 126: DID 0001-0020-00003C79
Rows waited on:
Session 128: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAA
  (dictionary objn - 51151, file - 4, block - 32, slot - 0)
Session 126: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAL
  (dictionary objn - 51151, file - 4, block - 32, slot - 11)
Information on the OTHER waiting sessions:
Session 128:
  pid=34 serial=31980 audsid=319634 user: 54/SCOTT
  O/S info: user: oracle, term: pts/8, ospid: 22771, machine: test02
            program: sqlplus@test02 (TNS V1-V3)
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  update emp set sal=sal+1 where empno=7369
End of information on OTHER waiting sessions.
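A side note on the resource names in the deadlock graph. As I understand it, a name like TX-00010013-00005f96 is a transaction lock written as two hex numbers: the first packs the undo segment number and the slot (usn * 65536 + slot), the second is the sequence number. A small sketch for decoding the first one, using values copied from the trace above:

```sql
-- Decode TX-00010013-00005f96 into undo segment, slot and sequence
select trunc(to_number('00010013', 'XXXXXXXX') / 65536) as xidusn,
       mod(to_number('00010013', 'XXXXXXXX'), 65536)    as xidslot,
       to_number('00005F96', 'XXXXXXXX')                as xidsqn
  from dual;
```

While the transaction is still open, these three numbers can be compared with the XIDUSN, XIDSLOT and XIDSQN columns of v$transaction.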

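You can see the statements that caused the deadlock (it takes two to tango: a deadlock generally needs two or more statements). Also note the object involved: the objn here is 51151, which is exactly the emp table: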

scott$mydb@test02 SQL> select object_id,object_name from user_objects where object_name='EMP';   

 OBJECT_ID OBJECT_NAME
---------- ------------------------------
     51151 EMP
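The trace also prints the object in hex (obj = 0000C7CF). Hex C7CF is 51151 in decimal, and the lookup can go straight from the hex value. A sketch, assuming the table has never been truncated or moved, so that its data_object_id still equals its object_id:

```sql
-- Look up the object from the hex "obj" value printed in the trace
select object_name, object_id, data_object_id
  from user_objects
 where data_object_id = to_number('0000C7CF', 'XXXXXXXX');
```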

If you are a little nosy, you can also see what each party to the deadlock was waiting for when it "waited to death":

Rows waited on:
Session 128: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAA
  (dictionary objn - 51151, file - 4, block - 32, slot - 0)
Session 126: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAL
  (dictionary objn - 51151, file - 4, block - 32, slot - 11)

Session 128 (that is, session 2, sid=128) is waiting for the row lock on ROWID=AAAMfPAAEAAAAAgAAA, while session 126 (that is, session 1, sid=126) is waiting for the row lock on ROWID=AAAMfPAAEAAAAAgAAL. Let's verify:

scott$mydb@test02 SQL> select rowid,empno from emp where empno in (7369,7900) ;

ROWID                   EMPNO
------------------ ----------
AAAMfPAAEAAAAAgAAA       7369
AAAMfPAAEAAAAAgAAL       7900

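Compare this with the execution-order table earlier, paying attention to the order of the update statements, and it all becomes clear.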
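The file, block and slot numbers in the "Rows waited on" section can be cross-checked the same way. The sketch below uses the DBMS_ROWID package to split each rowid into its parts; the results should line up with file 4, block 32 and slots 0 and 11 from the trace:

```sql
-- Break the rowids of the two contested rows into their components
select empno,
       dbms_rowid.rowid_object(rowid)       as data_object_id,
       dbms_rowid.rowid_relative_fno(rowid) as file_no,
       dbms_rowid.rowid_block_number(rowid) as block_no,
       dbms_rowid.rowid_row_number(rowid)   as slot_no
  from emp
 where empno in (7369, 7900);
```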

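You can map this onto the earlier story: the rows with empno 7369 and 7900 are the paper and the pens, session 1 and session 2 are Xiao Ming and Xiao Qiang, and in the end the teacher, Oracle, steps in to mediate.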
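One detail the analogy glosses over: when Oracle mediates, it rolls back only the statement that received ORA-00060 (session 1's update of 7900), not the whole transaction. Session 1 therefore still holds its lock on 7369, and session 2 keeps waiting until session 1 commits or rolls back. The sketch below assumes a third session with access to v$session, which scott may not have by default:

```sql
-- Third session: list the sessions that are currently blocked and their blockers.
-- After the ORA-00060 above, sid 128 would be expected to show 126 as its blocker.
select sid, serial#, blocking_session, event, seconds_in_wait
  from v$session
 where blocking_session is not null;

-- Session 1 (sid=126): end the transaction so that session 2 can continue
rollback;
```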

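One way to avoid the deadlock is to number the resources and always access them in a fixed order. Put simply, always update the lower number first and then the higher one (the reverse works just as well, as long as every session uses the same order), as in the following table: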

        +--------------------------------------------+--------------------------------------------+
        |         Session 1 (sid=126)                |            Session 2 (sid=128)             |
        +--------------------------------------------+--------------------------------------------+
        | update emp set sal=sal+1 where empno=7369; |                                            |
        +--------------------------------------------+--------------------------------------------+
        |                                            | update emp set sal=sal+1 where empno=7369; |
        |                                            | Waiting....                                |
        +--------------------------------------------+--------------------------------------------+
        | update emp set sal=sal+1 where empno=7900; |                                            |
        +--------------------------------------------+--------------------------------------------+
        | commit/rollback;                           |                                            |
        +--------------------------------------------+--------------------------------------------+
        |                                            | update emp set sal=sal+1 where empno=7900; |
        +--------------------------------------------+--------------------------------------------+

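As this ordered schedule shows, the salary ends up being raised twice, and session 2 is blocked by session 1. That is a poor experience for the user: if session 1 never ends its transaction (commit/rollback), session 2 can only wait and wait, which is even worse than a deadlock where Oracle at least steps in to mediate. So what can be done?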

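You can use a select … for update nowait statement to test whether the rows you want to change are locked. If they are not locked, the statement locks them immediately; if a row is already locked, it returns ORA-00054: resource busy and acquire with NOWAIT specified right away, as shown in the table below: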

        +--------------------------------------------+--------------------------------------------+
        |         Session 1 (sid=126)                |            Session 2 (sid=128)             |
        +--------------------------------------------+--------------------------------------------+
        | select * from emp where empno in(7369,7900)|                                            |
        | for update nowait ;                        |                                            |
        +--------------------------------------------+--------------------------------------------+
        |                                            | select * from emp where empno in(7369,7900)|
        |                                            | for update nowait ;                        |
        |                                            | ORA-00054: resource busy and acquire       |
        |                                            |            with NOWAIT specified           |
        +--------------------------------------------+--------------------------------------------+
        | update emp set sal=sal+1 where empno=7369; |                                            |
        +--------------------------------------------+--------------------------------------------+
        | update emp set sal=sal+1 where empno=7900; |                                            |
        +--------------------------------------------+--------------------------------------------+

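Some people may say that this changes their program too much, since it has to introduce a select … for update nowait and a check for ORA-00054. Is there a better way?

There is: an even more classic way of dealing with deadlocks, the "ostrich algorithm". Put simply, it means "ignore it", because Oracle will eventually step in and mediate anyway.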
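Even an ostrich has to react a little when ORA-00060 does arrive, because Oracle undoes only the failing statement and leaves the rest of the transaction open. A minimal sketch of "ignore it, but retry": the exception name and the retry limit of 3 are assumptions of mine, not anything Oracle prescribes.

```sql
declare
  deadlock_detected exception;
  pragma exception_init(deadlock_detected, -60);  -- map ORA-00060 to a named exception
  l_max_attempts constant pls_integer := 3;       -- hypothetical retry limit
begin
  for attempt in 1 .. l_max_attempts loop
    begin
      update emp set sal = sal + 1 where empno = 7369;
      update emp set sal = sal + 1 where empno = 7900;
      commit;
      exit;  -- success, stop retrying
    exception
      when deadlock_detected then
        rollback;  -- release our own locks so the other session can finish
        if attempt = l_max_attempts then
          raise;   -- give up and pass ORA-00060 to the caller
        end if;
    end;
  end loop;
end;
/
```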

posted @ 2010-09-12 23:57 killkill
