Tuesday 13 December 2011

Block Recovery Fails : ORA-00283: recovery session canceled due to errors ; ORA-00267: name of archived log file not needed ; ORA-06512: at "SYS.DBMS_BACKUP_RESTORE", line 4655 ; ORA-00368: checksum error in redo log block

We had certain blocks in the database getting corrupted due to issues with attached SAN disks. Once the SAN disks were fixed, we were supposed to recover the corrupted blocks in the database and followed the following procedure to recover all the corrupted blocks:

1. Perform database logical check:
run {
allocate channel d1 type disk;
backup check logical validate database;

release channel d1;
}


2. Check whether there are corrupt blocks in the database
select * from v$database_block_corruption;

3. Recover all the corrupted blocks
run {
allocate channel d1 type disk;
blockrecover corruption list;

release channel d1;
}

Note: The syntax to recover blocks indiviually is:
RMAN> BLOCKRECOVER DATAFILE <file#> BLOCK <block#>,<block#> DATAFILE <file#> BLOCK <block#>;
eg. RMAN> BLOCKRECOVER DATAFILE 2 BLOCK 12, 13 DATAFILE 3 BLOCK 5, 98, 99 DATAFILE 4 BLOCK 19;

While the recovery was going, the following error was reported and the recovery canceled:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at

RMAN-10035: exception raised in RPC:
ORA-00283: recovery session canceled due to errors
ORA-00267: name of archived log file not needed
ORA-06512: at "SYS.DBMS_BACKUP_RESTORE", line 4655
RMAN-10031: RPC Error: ORA-00283 occurred during call to DBMS_BACKUP_RESTORE.BMRDOMEDIARECOVERY


After doing some analysis, we found that one ofthe current redo log files had a corrupt block reported in the alert log file as well:
ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 143704 change 7917341986636 time 09/20/2010 22:26:57
ORA-00334: archived log: '/u01/oradata/DB/redo01a.log'


To fix the above, we firstly checked that all the other redo log files were not having any corrupt blocks by running the following against every logfile:
SQL> alter system dump logfile 'full path of redo log' validate;

It was then noted that all the other logfiles were not reporting any errors on block corruption apart from the above. As part of good practice we had another copy of the redo log file in the same group in another location. Hence, the database was shutdown cleanly and the healthy redo log member was copied over the corrupted one and renamed accordingly. The database was then started.
After performing the above, the blockrecover was done again and this time it was successful.

Hope this helps for others out there.