[原]說不清楚是Oracle的Bug還是TSM的Bug
說到controlfile的自動備份相信很多人都會第一時間反應出RMAN配置中的這一項:
RMAN> show all; RMAN configuration parameters are: ...... ...... CONFIGURE CONTROLFILE AUTOBACKUP OFF; ...... ......
但是在Oracle 10g環境中,當完成備份system表空間的時候,controlfile也會自動備一份,通常controlfile也不大,多備幾次也無所謂。但是就是這個特性導致了TSM TDPO的一個報錯。
RMAN的備份腳本如下:
run {
configure exclude for tablespace PSTATS;
configure exclude for tablespace WGXOA_SYS13;
configure exclude for tablespace DZOA_SYS13;
show exclude ;
allocate channel ch1
device type 'sbt_tape'
format '%d_%I_%M%D_%s.dbf' ;
backup incremental level 0 database tag 'tsm';
backup current controlfile tag 'tsm';
configure exclude for tablespace PSTATS clear;
configure exclude for tablespace WGXOA_SYS13 clear;
configure exclude for tablespace DZOA_SYS13 clear;
release channel ch1;
}
RMAN的configure設置已經禁用了 controlfile 的autobackup。RMAN的備份日志如下:
RMAN> ### backup database everyday
2> set echo on;
3> run {
4> configure exclude for tablespace PSTATS;
5> configure exclude for tablespace WGXOA_SYS13;
6> configure exclude for tablespace DZOA_SYS13;
7> show exclude ;
8> allocate channel ch1
9> device type 'sbt_tape'
10> format '%d_%I_%M%D_%s.dbf' ;
11> backup incremental level 0 database tag 'tsm';
12> backup current controlfile tag 'tsm';
13> configure exclude for tablespace PSTATS clear;
14> configure exclude for tablespace WGXOA_SYS13 clear;
15> configure exclude for tablespace DZOA_SYS13 clear;
16> release channel ch1;
17> }
18> exit;
echo set on
tablespace PSTATS will be excluded from future whole database backups
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
tablespace WGXOA_SYS13 will be excluded from future whole database backups
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
tablespace DZOA_SYS13 will be excluded from future whole database backups
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
RMAN configuration parameters are:
CONFIGURE EXCLUDE FOR TABLESPACE 'PSTATS';
CONFIGURE EXCLUDE FOR TABLESPACE 'WGXOA_SYS13';
CONFIGURE EXCLUDE FOR TABLESPACE 'DZOA_SYS13';
allocated channel: ch1
channel ch1: sid=491 devtype=SBT_TAPE
channel ch1: Data Protection for Oracle: version 5.4.1.0
Starting backup at 2011-02-22 23:30:41
file 7 is excluded from whole database backup
file 10 is excluded from whole database backup
file 11 is excluded from whole database backup
file 12 is excluded from whole database backup
file 13 is excluded from whole database backup
file 14 is excluded from whole database backup
file 15 is excluded from whole database backup
file 16 is excluded from whole database backup
file 17 is excluded from whole database backup
file 18 is excluded from whole database backup
file 19 is excluded from whole database backup
file 20 is excluded from whole database backup
file 21 is excluded from whole database backup
file 22 is excluded from whole database backup
file 23 is excluded from whole database backup
file 24 is excluded from whole database backup
file 25 is excluded from whole database backup
file 26 is excluded from whole database backup
file 27 is excluded from whole database backup
file 28 is excluded from whole database backup
file 29 is excluded from whole database backup
file 30 is excluded from whole database backup
file 32 is excluded from whole database backup
file 33 is excluded from whole database backup
file 34 is excluded from whole database backup
file 35 is excluded from whole database backup
channel ch1: starting incremental level 0 datafile backupset
channel ch1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oradata/oracle/LSOA_SYS13_01.dbf
input datafile fno=00009 name=/u01/app/oradata/oracle/HCBOA_SYS13.dbf
input datafile fno=00002 name=/u01/app/oradata/oracle/undotbs101.dbf
input datafile fno=00031 name=/u01/app/oradata/oracle/ZC_USERS02.dbf
input datafile fno=00006 name=/u01/app/oradata/oracle/ZC_USERS01.dbf
input datafile fno=00001 name=/u01/app/oradata/oracle/system01.dbf
input datafile fno=00003 name=/u01/app/oradata/oracle/sysaux01.dbf
input datafile fno=00008 name=/u01/app/oradata/oracle/ZC_INDX01.dbf
input datafile fno=00004 name=/u01/app/oradata/oracle/users01.dbf
channel ch1: starting piece 1 at 2011-02-22 23:30:42
channel ch1: finished piece 1 at 2011-02-22 23:39:57
piece handle=ORACLE_1541093227_0222_348.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0
channel ch1: backup set complete, elapsed time: 00:09:15
channel ch1: starting incremental level 0 datafile backupset
channel ch1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ch1: starting piece 1 at 2011-02-22 23:39:59
channel ch1: finished piece 1 at 2011-02-22 23:40:14
piece handle=ORACLE_1541093227_0222_349.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0
channel ch1: backup set complete, elapsed time: 00:00:17
Finished backup at 2011-02-22 23:40:14
Starting backup at 2011-02-22 23:40:15
channel ch1: starting full datafile backupset
channel ch1: specifying datafile(s) in backupset
including current control file in backupset
channel ch1: starting piece 1 at 2011-02-22 23:40:15
released channel: ch1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch1 channel at 02/22/2011 23:40:30
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
ANS1301E (RC1) Server detected system error
Recovery Manager complete.
TSM服務器的日志如下:
02/22/2011 23:40:22 ANR0511I Session 21019 opened output volume
E:\TSMPOOL\000034D4.BFS. (SESSION: 21019)
02/22/2011 23:40:24 ANR8340I SERVER volume ZAIBEI3_SERVER1.BFS.298389221
mounted. (SESSION: 21019)
02/22/2011 23:40:24 ANR1340I Scratch volume ZAIBEI3_SERVER1.BFS.298389221 is
now defined in storage pool COPY_POOL. (SESSION: 21019)
02/22/2011 23:40:24 ANR0511I Session 21019 opened output volume
ZAIBEI3_SERVER1.BFS.298389221. (SESSION: 21019)
02/22/2011 23:40:24 ANR4383E Session failure, target server ZAIBEI3_SERVER1
has aborted current transaction; reason: Exceeded
MAXNUMMP on target. (SESSION: 21019)
02/22/2011 23:40:24 ANR1411W Access mode for volume
ZAIBEI3_SERVER1.BFS.298389221 now set to "read-only" due
to write error. (SESSION: 21019)
02/22/2011 23:40:24 ANR0514I Session 21019 closed volume
ZAIBEI3_SERVER1.BFS.298389221. (SESSION: 21019)
02/22/2011 23:40:24 ANR1181E astxn.c(684): Data storage transaction 0:6735375
was aborted. (SESSION: 21019)
02/22/2011 23:40:24 ANR0532W smnode.c(3191): Transaction 0:6735375 was aborted
for session 21019 for node ORA_110_7 (TDPO LinuxAMD64).
(SESSION: 21019)
02/22/2011 23:40:24 ANE4994S (Session: 21019, Node: ORA_110_7) TDPO
LinuxAMD64 ANU0599 TDP for Oracle: (9738): => (ora_110_7)
ANS1301E (RC1) Server detected system error (SESSION:
21019)
02/22/2011 23:40:24 ANR0514I Session 21019 closed volume
E:\TSMPOOL\000034D4.BFS. (SESSION: 21019)
02/22/2011 23:40:24 ANR0403I Session 21019 ended for node ORA_110_7 (TDPO
LinuxAMD64). (SESSION: 21019)
從日志來看是參數MAXNUMMP 配置過小,或者空間滿導致讀寫錯誤引發RMAN報錯,實際上MAXNUMMP 的設置是完全夠用,空間也很充足。
再仔細分析,從RMAN腳本中可以看到只有兩個backup語句,從channel的配置來看最多就產生兩個backup set,仔細查看日志可發現:
channel ch1: starting piece 1 at 2011-02-22 23:30:42 channel ch1: finished piece 1 at 2011-02-22 23:39:57 piece handle=ORACLE_1541093227_0222_348.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: starting piece 1 at 2011-02-22 23:39:59 channel ch1: finished piece 1 at 2011-02-22 23:40:14 piece handle=ORACLE_1541093227_0222_349.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0
這兩句是我們期望的,但是偏偏不知道在哪里多了個backup操作:
Starting backup at 2011-02-22 23:40:15 channel ch1: starting full datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset channel ch1: starting piece 1 at 2011-02-22 23:40:15 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/22/2011 23:40:30 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
從日志信息來看,就是備份controlfile和spfile,但是明明已經禁用了controlfile 的autobackup。這是為什么呢?
在《RMAN recipes for Oracle database 11g: a problem-solution approach》中有這么一句:
但是Oracle的官方文檔中這樣寫道:
The RMAN behavior when theBACKUPcommand includes datafile1depends on theCONFIGURECONTROLFILEAUTOBACKUPsetting. If control file autobackups are ON and the backup includes datafile 1, RMAN writes the control file and SPFILE to a separate autobackup backup set. If control file autobackups are OFF and the backup includes datafile 1, then RMAN includes the current control file and SPFILE in the same backup set as the datafiles.
簡單來說,就是datafile 1(通常是system表空間)如果被backup,controlfile和spfile都會被備份,controlfile autobackup的設定僅僅是控制備份到哪里。
這就解析了controlfile和spfile為什么會“自動”備份了。
以下是RMAN的基本配置:
RMAN> show all ; RMAN configuration parameters are: CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 10 DAYS; CONFIGURE BACKUP OPTIMIZATION OFF; # default CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default CONFIGURE CONTROLFILE AUTOBACKUP OFF; CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/ocfs_arch_ocr/backup/db_192.168.110.5_%d_%T_%F.ctrl'; CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE 'SBT_TAPE' TO 'db_RAC1_%d_%F.ctrl'; CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default CONFIGURE MAXSETSIZE TO UNLIMITED; # default CONFIGURE ENCRYPTION FOR DATABASE OFF; # default CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/dbs/snapcf_oracle1.f'; # default
僅備份system表空間,觸發controlfile autobackup:
run {
allocate channel ch1
device type 'sbt_tape'
format '%d_%I_%M%D_%s.dbf' ;
backup incremental level 0 tablespace system tag 'tsm';
release channel ch1 ;
}
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:25:23 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:25:23 channel ch1: finished piece 1 at 2011-02-23 17:25:48 piece handle=ORACLE_1557493236_0223_1269.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:25:50 channel ch1: finished piece 1 at 2011-02-23 17:26:05 piece handle=ORACLE_1557493236_0223_1270.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:17 Finished backup at 2011-02-23 17:26:05 released channel: ch1
分別備份兩個表空間,其中一個是system表空間:
run {
allocate channel ch1
device type 'sbt_tape'
format '%d_%I_%M%D_%s.dbf' ;
backup incremental level 0 tablespace system tag 'tsm';
backup incremental level 0 tablespace users tag 'tsm';
release channel ch1 ;
}
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:30:09 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:30:10 channel ch1: finished piece 1 at 2011-02-23 17:30:35 piece handle=ORACLE_1557493236_0223_1271.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:30:36 channel ch1: finished piece 1 at 2011-02-23 17:30:51 piece handle=ORACLE_1557493236_0223_1272.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:16 Finished backup at 2011-02-23 17:30:51 Starting backup at 2011-02-23 17:30:52 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00005 name=+DATA/oracle/users01.dbf channel ch1: starting piece 1 at 2011-02-23 17:30:52 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:31:07 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
使用include current controlfile 子句強制備份controlfile:
run {
allocate channel ch1
device type 'sbt_tape'
format '%d_%I_%M%D_%s.dbf' ;
backup incremental level 0 tablespace system
include current controlfile
tag 'tsm';
release channel ch1 ;
}
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:32:57 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:32:57 channel ch1: finished piece 1 at 2011-02-23 17:33:22 piece handle=ORACLE_1557493236_0223_1274.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:33:24 channel ch1: finished piece 1 at 2011-02-23 17:33:39 piece handle=ORACLE_1557493236_0223_1275.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:17 Finished backup at 2011-02-23 17:33:39 released channel: ch1
基本上沒有人用的寫法:
run {
allocate channel ch1
device type 'sbt_tape'
format '%d_%I_%M%D_%s.dbf' ;
backup incremental level 0 tablespace system
include current controlfile
tag 'tsm';
backup incremental level 0 tablespace users tag 'tsm';
release channel ch1 ;
}
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:34:09 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:34:10 RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:34:35 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error continuing other job steps, job failed will not be re-run channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:34:36 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:34:51 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
一種比較正常的寫法:
run {
allocate channel ch1
device type 'sbt_tape'
format '%d_%I_%M%D_%s.dbf' ;
backup incremental level 0 tablespace users tag 'tsm';
backup incremental level 0 tablespace system
include current controlfile
tag 'tsm';
release channel ch1 ;
}
ANS0102W Unable to open the message repository /opt/tivoli/tsm/client/oracle/bin64/zh_CN/tdpo.cat. The American English repository will be used instead. allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:38:40 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00005 name=+DATA/oracle/users01.dbf channel ch1: starting piece 1 at 2011-02-23 17:38:41 channel ch1: finished piece 1 at 2011-02-23 17:38:56 piece handle=ORACLE_1557493236_0223_1278.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:15 Finished backup at 2011-02-23 17:38:56 Starting backup at 2011-02-23 17:38:57 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:38:57 channel ch1: finished piece 1 at 2011-02-23 17:39:22 piece handle=ORACLE_1557493236_0223_1279.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:39:24 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:39:39 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
如果不使用TSM進行備份,是否會報錯呢?
run {
backup incremental level 0 tablespace system
include current controlfile
format '/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_%d_%T_%U.dbf'
tag 'tsm';
backup incremental level 0 tablespace users
format '/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_%d_%T_%U.dbf'
tag 'tsm';
}
Starting backup at 2011-02-23 17:42:40 using channel ORA_DISK_1 channel ORA_DISK_1: starting incremental level 0 datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ORA_DISK_1: starting piece 1 at 2011-02-23 17:42:40 channel ORA_DISK_1: finished piece 1 at 2011-02-23 17:42:47 piece handle=/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_ORACLE_20110223_82m5dekg_1_1.dbf tag=TSM comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07 channel ORA_DISK_1: starting incremental level 0 datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ORA_DISK_1: starting piece 1 at 2011-02-23 17:42:49 channel ORA_DISK_1: finished piece 1 at 2011-02-23 17:42:50 piece handle=/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_ORACLE_20110223_83m5dekn_1_1.dbf tag=TSM comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03 Finished backup at 2011-02-23 17:42:50 Starting backup at 2011-02-23 17:42:50 using channel ORA_DISK_1 channel ORA_DISK_1: starting incremental level 0 datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset input datafile fno=00005 name=+DATA/oracle/users01.dbf channel ORA_DISK_1: starting piece 1 at 2011-02-23 17:42:51 channel ORA_DISK_1: finished piece 1 at 2011-02-23 17:42:52 piece handle=/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_ORACLE_20110223_84m5dekr_1_1.dbf tag=TSM comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 Finished backup at 2011-02-23 17:42:52
從測試來看,run代碼塊中出現兩個backup,TSM就變傻了。

浙公網安備 33010602011771號