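[Original] ORA-00060: Deadlock detected (Scenario 1: concurrent updates on a single table)
Let's start with what a deadlock is. The formal definition is easy to find on Google or Baidu, and I don't feel like quoting textbooks, so I'll explain it with a simple story.
One fine, sunny afternoon, Xiao Ming and Xiao Qiang got into a fight. It was serious enough that the teacher not only lectured them but also made them stay after school to write letters of apology. Writing a letter takes paper and a pen, and as it happened, during the battle Xiao Ming had torn up all of Xiao Qiang's exercise books and scratch paper, while Xiao Qiang had smashed every one of Xiao Ming's pens. By now the two could not stand each other; forget lending anything, they were not even on speaking terms. So they waited and waited, until the teacher came to check on their letters. The answer, of course, was "haven't started yet", so the teacher said: "Xiao Ming, lend your paper to Xiao Qiang and let him write."
OK, the story is a bit contrived, but it is a classic deadlock: each boy holds one resource and needs the one the other holds. If the teacher hadn't stepped in, the two could only have gone on waiting, or, as the saying goes, "waiting to die".
Back to the topic. Oracle's particular lock management model greatly reduces the chance of deadlocks, but "anything is possible", so the next few posts will analyze the various scenarios that lead to ORA-00060 and how to handle them.
Today's scenario is the first one, and also the one most widely covered online, so much so that I feel a little sheepish about putting "[Original]" in the title.
Let's stick with the classic scott demo schema:
Open two sessions and check the sid of each: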
select sid from v$mystat where rownum=1;
Here my two sessions have sids 126 and 128. Let's look at the sample data:
scott$mydb@test02 SQL> set pagesize 50 ;
scott$mydb@test02 SQL> select * from emp ;
     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 1980-12-17 00:00:00        800                    20
      7499 ALLEN      SALESMAN        7698 1981-02-20 00:00:00       1600        300         30
      7521 WARD       SALESMAN        7698 1981-02-22 00:00:00       1250        500         30
      7566 JONES      MANAGER         7839 1981-04-02 00:00:00       2975                    20
      7654 MARTIN     SALESMAN        7698 1981-09-28 00:00:00       1250       1400         30
      7698 BLAKE      MANAGER         7839 1981-05-01 00:00:00       2850                    30
      7782 CLARK      MANAGER         7839 1981-06-09 00:00:00       2450                    10
      7788 SCOTT      ANALYST         7566 1987-04-19 00:00:00       3000                    20
      7839 KING       PRESIDENT            1981-11-17 00:00:00       5000                    10
      7844 TURNER     SALESMAN        7698 1981-09-08 00:00:00       1500          0         30
      7876 ADAMS      CLERK           7788 1987-05-23 00:00:00       1100                    20
      7900 JAMES      CLERK           7698 1981-12-03 00:00:00        950                    30
      7902 FORD       ANALYST         7566 1981-12-03 00:00:00       3000                    20
      7934 MILLER     CLERK           7782 1982-01-23 00:00:00       1300                    10
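I want to give the two lowest-paid employees (ENAME: SMITH and JAMES; EMPNO: 7369 and 7900) a raise of 1 yuan (I admit this raise does not keep up with the CPI). Meanwhile manager BLAKE, completely unaware of this, also wants to give them a raise of 1 yuan. I work in the session with sid 126 and Blake works in the session with sid 128. The statements run in the following order: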
+--------------------------------------------+--------------------------------------------+
| Session 1 (sid=126) | Session 2 (sid=128) |
+--------------------------------------------+--------------------------------------------+
| update emp set sal=sal+1 where empno=7369; | |
+--------------------------------------------+--------------------------------------------+
| | update emp set sal=sal+1 where empno=7900; |
+--------------------------------------------+--------------------------------------------+
| update emp set sal=sal+1 where empno=7900; | |
+--------------------------------------------+--------------------------------------------+
| | update emp set sal=sal+1 where empno=7369; |
+--------------------------------------------+--------------------------------------------+
| ORA-00060: deadlock detected | |
| while waiting for resource | |
+--------------------------------------------+--------------------------------------------+
And with that we have successfully triggered an ORA-00060. The alert log contains a message like this:
ORA-00060: Deadlock detected. More info in file /u01/app/admin/mydb/udump/mydb_ora_7531.trc.
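If the alert log or trace directory is not known in advance, its location can be looked up from the instance parameters. A minimal sketch, assuming a release where these parameters are still in use (as the udump path above suggests) and an account allowed to query V$PARAMETER, which SCOTT may not be by default. On 11g and later these parameters are deprecated and V$DIAG_INFO is the usual place to look.

```sql
-- background_dump_dest: directory of the alert log
-- user_dump_dest:       directory of server process trace files such as the one named above
SELECT name, value
  FROM v$parameter
 WHERE name IN ('background_dump_dest', 'user_dump_dest');
```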
Let's look at the trc file. The most useful part is the deadlock graph:
[Transaction Deadlock]
Current SQL statement for this session:
update emp set sal=sal+1 where empno=7900
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00010013-00005f96        32     126     X             34     128           X
TX-00030001-000065ed        34     128     X             32     126           X
session 126: DID 0001-0020-00003C79 session 128: DID 0001-0022-0000772F
session 128: DID 0001-0022-0000772F session 126: DID 0001-0020-00003C79
Rows waited on:
Session 128: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAA
(dictionary objn - 51151, file - 4, block - 32, slot - 0)
Session 126: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAL
(dictionary objn - 51151, file - 4, block - 32, slot - 11)
Information on the OTHER waiting sessions:
Session 128:
pid=34 serial=31980 audsid=319634 user: 54/SCOTT
O/S info: user: oracle, term: pts/8, ospid: 22771, machine: test02
program: sqlplus@test02 (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update emp set sal=sal+1 where empno=7369
End of information on OTHER waiting sessions.
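Here you can see the statements that caused the deadlock (it takes two to tango: a deadlock normally involves two or more statements). Also note the object involved: the objn is 51151, which is exactly the emp table: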
scott$mydb@test02 SQL> select object_id,object_name from user_objects where object_name='EMP';
OBJECT_ID OBJECT_NAME
---------- ------------------------------
51151 EMP
If you are curious, you can also see what each party in the deadlock was waiting for when it got stuck:
Rows waited on:
Session 128: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAA
(dictionary objn - 51151, file - 4, block - 32, slot - 0)
Session 126: obj - rowid = 0000C7CF - AAAMfPAAEAAAAAgAAL
(dictionary objn - 51151, file - 4, block - 32, slot - 11)
Session 128 (that is, session 2) is waiting for the row lock on ROWID AAAMfPAAEAAAAAgAAA, while session 126 (that is, session 1) is waiting for the row lock on ROWID AAAMfPAAEAAAAAgAAL. Let's verify:
scott$mydb@test02 SQL> select rowid,empno from emp where empno in (7369,7900) ;
ROWID                   EMPNO
------------------ ----------
AAAMfPAAEAAAAAgAAA       7369
AAAMfPAAEAAAAAgAAL       7900
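Compare this with the first table above, paying attention to the order of the update statements, and it becomes clear: each session already holds the lock on the row the other one is waiting for.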
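The other numbers in the "Rows waited on" section can be cross-checked the same way. The sketch below uses the standard DBMS_ROWID package to split each rowid into its components. It assumes SCOTT can execute the package, and it assumes that for this simple table the data object number and relative file number it reports correspond to the "objn" and "file" values printed in the trace.

```sql
-- Decode the two rowids; compare with objn, file, block and slot in the trace.
SELECT empno,
       dbms_rowid.rowid_object(rowid)       AS data_objn,
       dbms_rowid.rowid_relative_fno(rowid) AS file_no,
       dbms_rowid.rowid_block_number(rowid) AS block_no,
       dbms_rowid.rowid_row_number(rowid)   AS slot
  FROM emp
 WHERE empno IN (7369, 7900);

-- The "obj" value in the trace is hexadecimal: 0000C7CF is 51151 in decimal.
SELECT to_number('0000C7CF', 'XXXXXXXX') AS objn FROM dual;
```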
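You can map this back to the story: the rows with empno 7369 and 7900 are the paper and the pens, session 1 and session 2 are Xiao Ming and Xiao Qiang, and in the end Oracle, playing the teacher, steps in to mediate.
One way to avoid deadlocks is to number the resources and always access them in a fixed order. Put simply, update the row with the smaller number first and then the one with the larger number (or the other way around, as long as every session follows the same order), as shown in the table below: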
+--------------------------------------------+--------------------------------------------+
| Session 1 (sid=126) | Session 2 (sid=128) |
+--------------------------------------------+--------------------------------------------+
| update emp set sal=sal+1 where empno=7369; | |
+--------------------------------------------+--------------------------------------------+
| | update emp set sal=sal+1 where empno=7369; |
| | Waiting.... |
+--------------------------------------------+--------------------------------------------+
| update emp set sal=sal+1 where empno=7900; | |
+--------------------------------------------+--------------------------------------------+
| commit/rollback; | |
+--------------------------------------------+--------------------------------------------+
| | update emp set sal=sal+1 where empno=7900; |
+--------------------------------------------+--------------------------------------------+
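The fixed-order rule from the table above could be sketched in PL/SQL as follows. This is only an illustration under my own assumptions: the list of employee numbers is hard-coded, and SYS.ODCINUMBERLIST (a built-in collection type) is simply a convenient way to sort them. What matters is that every session touches the rows in ascending empno order, whatever order the caller supplied them in.

```sql
BEGIN
  -- The caller lists the rows in arbitrary order (7900 first here);
  -- the ORDER BY makes every session lock them in ascending empno order.
  FOR r IN (SELECT column_value AS empno
              FROM TABLE(sys.odcinumberlist(7900, 7369))
             ORDER BY column_value)
  LOOP
    UPDATE emp SET sal = sal + 1 WHERE empno = r.empno;
  END LOOP;
  COMMIT;  -- ending the transaction releases both row locks
END;
/
```

With both sessions running this block, the second one simply waits on empno 7369 until the first commits, so no cycle can form.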
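As you can see, the raise is applied twice, and session 2 is blocked by session 1. That is a poor experience for the user: if session 1 never ends its transaction (commit/rollback), session 2 can only wait and wait, which is even worse than a deadlock, where Oracle at least steps in to mediate. So what can we do?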
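While session 2 is stuck like this, a third session can at least see who is holding it up. A minimal sketch, assuming Oracle 10g or later (where V$SESSION has the BLOCKING_SESSION column) and an account with access to V$SESSION, which SCOTT may not have by default:

```sql
-- Lists every session that is currently blocked by another session.
SELECT sid,
       serial#,
       blocking_session,  -- sid of the session holding the lock (126 in the table above)
       event,             -- a row lock wait normally shows 'enq: TX - row lock contention'
       seconds_in_wait
  FROM v$session
 WHERE blocking_session IS NOT NULL;
```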
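You can use a select … for update nowait statement to test whether the rows you need to change are locked. If they are not locked, the statement locks them immediately; if any of them is already locked, it immediately returns ORA-00054: resource busy and acquire with NOWAIT specified, as shown in the table below: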
+--------------------------------------------+--------------------------------------------+
| Session 1 (sid=126) | Session 2 (sid=128) |
+--------------------------------------------+--------------------------------------------+
| select * from emp where empno in(7369,7900)| |
| for update nowait ; | |
+--------------------------------------------+--------------------------------------------+
| | select * from emp where empno in(7369,7900)|
| | for update nowait ; |
| | ORA-00054: resource busy and acquire |
| | with NOWAIT specified |
+--------------------------------------------+--------------------------------------------+
| update emp set sal=sal+1 where empno=7369; | |
+--------------------------------------------+--------------------------------------------+
| update emp set sal=sal+1 where empno=7900; | |
+--------------------------------------------+--------------------------------------------+
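In application code, the pattern in the table above amounts to catching ORA-00054. A minimal PL/SQL sketch follows; the exception name resource_busy, the cursor name c_emp and the message text are my own choices, not anything Oracle defines.

```sql
DECLARE
  resource_busy EXCEPTION;
  PRAGMA EXCEPTION_INIT(resource_busy, -54);  -- map ORA-00054 to a named exception

  CURSOR c_emp IS
    SELECT empno
      FROM emp
     WHERE empno IN (7369, 7900)
       FOR UPDATE NOWAIT;  -- lock both rows when the cursor opens, or fail at once
BEGIN
  FOR r IN c_emp LOOP
    UPDATE emp SET sal = sal + 1 WHERE CURRENT OF c_emp;
  END LOOP;
  COMMIT;
EXCEPTION
  WHEN resource_busy THEN
    -- Another session holds at least one of the rows; nothing was changed.
    DBMS_OUTPUT.PUT_LINE('Rows are locked by another session, try again later.');
END;
/
```

In SQL*Plus, run set serveroutput on first to see the message.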
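Some may say that this requires too many changes to the program, since it has to introduce a select … for update nowait and a check for ORA-00054. Is there a better way?
There is: an even more classic algorithm for dealing with deadlocks, the "ostrich algorithm". Put simply, just "ignore it"; Oracle will step in to mediate eventually anyway.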
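One caveat if you go this way: when Oracle breaks a deadlock it rolls back only the statement that received ORA-00060, not the whole transaction, so the session that gets the error still has to roll back or retry. A minimal sketch of "ignore it, then retry" follows. The limit of three attempts and the names are hypothetical, and whether a blind retry is acceptable depends on the application.

```sql
DECLARE
  deadlock_detected EXCEPTION;
  PRAGMA EXCEPTION_INIT(deadlock_detected, -60);  -- map ORA-00060 to a named exception
  c_max_attempts CONSTANT PLS_INTEGER := 3;       -- hypothetical retry limit
BEGIN
  FOR attempt IN 1 .. c_max_attempts LOOP
    BEGIN
      UPDATE emp SET sal = sal + 1 WHERE empno = 7369;
      UPDATE emp SET sal = sal + 1 WHERE empno = 7900;
      COMMIT;
      EXIT;  -- success, stop retrying
    EXCEPTION
      WHEN deadlock_detected THEN
        ROLLBACK;  -- give up our row locks so the other session can finish
        IF attempt = c_max_attempts THEN
          RAISE;   -- still deadlocking after the last attempt, report it
        END IF;
    END;
  END LOOP;
END;
/
```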