[Original] To do a good job, first sharpen your tools: notes on handling a hung Oracle listener
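At 0:36 a.m. on National Day I received an email alert saying the China Netcom Tomcat server was down. Tomcat has always been fragile and restarting it is routine, but this time was different: the web service never came back up. I rebooted the whole server and started Tomcat again, with the same result.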
I checked the Tomcat log right away:
2010-10-1 1:10:42 com.fsm.util.SearchDispatchFilter init
INFO: SearchDispatchFilter.init() done
***********************Creating data source********************
createDataSource:DriverClassName=oracle.jdbc.driver.OracleDriver
createDataSource:Url=jdbc:oracle:thin:@db_server:1521/ora8i
***************************************************
Only after three minutes did the long-awaited "Server startup" appear:
[FSM-ERROR]:2010-10-01 01:13:55,394-org.hibernate.util.JDBCExceptionReporter.logExceptions(LINE:78) Cannot create PoolableConnectionFactory (Io exception: The Network Adapter could not establish the connection)
org.tuckey.web.filters.urlrewrite.UrlRewriteFilter INFO: loaded (conf ok)
2010-10-1 1:13:58 org.apache.coyote.http11.Http11BaseProtocol start
INFO: Starting Coyote HTTP/1.1 on http-80
2010-10-1 1:13:58 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
2010-10-1 1:13:58 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/84 config=null
2010-10-1 1:13:58 org.apache.catalina.storeconfig.StoreLoader load
INFO: Find registry server-registry.xml at classpath resource
2010-10-1 1:13:58 org.apache.catalina.startup.Catalina start
INFO: Server startup in 205603 ms
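Tomcat normally starts in about 20 seconds, but this time it took more than 200, which was not normal. Further up in the log were "JDBC" and "The Network Adapter could not establish the connection", so I was 99% sure the problem was in establishing the connection to the database. I immediately telneted to port 1521 on the database server, yet the connection went through: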
[root@primary_node bin]# telnet db_server 1521
Trying 192.168.4.20...
Connected to db_server (192.168.4.20).
Escape character is '^]'.
aaaaa
aaa
aaaaa
aaaaaa
aaaaa
The run of "a" characters above is what I typed, to check whether the connection was really open.
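A successful telnet only shows that the TCP handshake completed. The operating system can complete it on behalf of a listening socket even when the owning process never services the connection. The sketch below illustrates this on localhost with a server that listens but never calls `accept()`. The function name is made up for this example, and whether the hung tnslsnr was in exactly this state is an assumption, not something the logs confirm.

```python
import socket


def connect_without_accept() -> bool:
    """Return True if a client can connect to a server that never accepts."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)               # the kernel now queues incoming connections
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    try:
        # The handshake is completed by the kernel; no accept() is needed.
        client.connect(("127.0.0.1", port))
        # The bytes sit in a socket buffer; no process ever reads them.
        client.sendall(b"aaaaa\n")
        return True
    except OSError:
        return False
    finally:
        client.close()
        server.close()
```

In other words, "Connected to db_server" says the port is open, not that the process behind it is healthy.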
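So I turned my suspicion to the configuration files and the jar files the application needs, but found no sign that anything had been modified recently. I copied the configuration files and jars over from another server and restarted Tomcat, with the same result. By then it was 2:11. The VIP of this Netcom Tomcat server had long since been taken over by the China Telecom server, so the service itself was unaffected. I was running out of energy, so I went to bed, hoping a night's sleep might bring some inspiration.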
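I woke up at 7:25 in the morning. The first thing I did was open my eyes; the second was log in to the server and carry on debugging. I first pointed the database IP at the test database and started Tomcat, and the web service came up. That pointed the finger at the Oracle server again. Since no Oracle client was installed on the Tomcat server, telnet was the only (primitive) way to test. Telnet to the test Oracle server: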
[root@primary_node bin]# telnet test_db_server 1521
Trying 192.168.4.74...
Connected to db_server (192.168.4.74).
Escape character is '^]'.
aaaaa
aaa
Connection closed by foreign host.
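Strange: the test Oracle server closes the connection on its own after receiving junk input, while the production Oracle server does not. At last, a breakthrough. I logged in to the Oracle server and checked the listener status: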
[oracle@wz_oracle1 adump]$ lsnrctl status

LSNRCTL for Linux: Version 10.2.0.2.0 - Production on 01-OCT-2010 09:08:09

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.20)(PORT=1521)))
It hung. I cancelled it and ran lsnrctl stop directly, which hung as well. Time for the last resort, killall -9 tnslsnr, and then a fresh start:
[oracle@wz_oracle1 adump]$ lsnrctl start

LSNRCTL for Linux: Version 10.2.0.2.0 - Production on 01-OCT-2010 09:10:17

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Starting /u01/app/oracle/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 10.2.0.2.0 - Production
System parameter file is /u01/app/oracle/network/admin/listener.ora
Log messages written to /u01/app/oracle/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.20)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.30)(PORT=1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.20)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.2.0 - Production
Start Date                01-OCT-2010 09:10:17
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.20)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.30)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "ora8i" has 1 instance(s).
  Instance "ora8i", status UNKNOWN, has 1 handler(s) for this service...
Service "shdb" has 1 instance(s).
  Instance "ora8i", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
Done. I restarted Tomcat and everything was back to normal.
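The hang seen with lsnrctl status is itself a usable signal. A minimal sketch of a wrapper that runs a command with a time limit and reports a hang instead of blocking forever is shown below. The function name, the result labels and the 10 second default are choices made for this example. Whether the exit code of lsnrctl reliably reflects listener health is also an assumption, so parsing its output may be needed in practice.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and classify the outcome as ok, failed, hung or missing."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "missing"  # the executable is not on PATH
    return "ok" if result.returncode == 0 else "failed"
```

On the database host this could be called as `run_with_timeout(["lsnrctl", "status"])`. A result of "hung" would correspond to the situation described above, where the command never returned.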
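Handling this failure took as long as three hours. At first I thought it was an ordinary fault that a Tomcat restart would fix, but it was not that simple. Then I was misled by what telnet showed and went down a long detour. "To do a good job, first sharpen your tools": had I found a server with sqlplus and connected to the database remotely, I would not have struggled for so long.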
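When no Oracle client is at hand, the telnet comparison from this incident can at least be scripted. The sketch below connects, sends a few junk bytes, and reports how the peer reacts. The function name and result labels are invented for this example. Reading "closed" as a responsive listener and "silent" as a suspect one rests only on the single observation in this post (the test listener dropped the connection, the hung one did not); it is not documented Oracle behavior and no substitute for a real client such as sqlplus.

```python
import socket


def probe_listener(host, port, timeout_s=5.0):
    """Classify a TCP service by how it reacts to junk input.

    Returns "unreachable", "closed", "replied" or "silent".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except OSError:
        return "unreachable"  # refused, timed out, or name not resolved
    with sock:
        try:
            sock.sendall(b"aaaaa\r\n")
            data = sock.recv(1024)  # b"" means the peer closed the connection
        except socket.timeout:
            return "silent"         # peer neither answered nor closed in time
        except OSError:
            return "closed"         # for example, connection reset by peer
    return "replied" if data else "closed"
```

For the two servers in this post, `probe_listener("db_server", 1521)` would presumably have returned "silent" during the outage and `probe_listener("test_db_server", 1521)` "closed".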