Monday 20 August 2012

Change Network settings for Public Interface in 11g R2 RAC

To change the network adaptor name from eth1 to bond1 for the public interface, this is what needs to happen:

1. Stop Oracle Clusterware on all the Nodes
# ./crsctl stop crs

2. Start the bond1 network interface on all the nodes.
# ifup bond1
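
Optionally, confirm that the bonded interface is actually up and on the expected public subnet before going any further (a quick sanity check, assuming a standard Linux bonding setup):
# ip addr show bond1
# cat /proc/net/bonding/bond1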

3. If the eth1 interface is no longer available, the below command will not be able to start the VIPs and the SCAN listeners:
# crsctl start crs
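
You can check which resources came up and which are still offline with:
# ./crsctl check crs
# ./crsctl stat res -t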

4. Now let's change the network adaptor name in the gpnp profile so that CRS knows which network adaptor to use for the public network traffic (profile.xml.new below is simply a working copy of the current profile.xml):

# ./gpnptool edit -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -o=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -ovr -prf_sq=3 -net1:net_use=public -net1:net_ip=172.20.11.0 -net1:net_ada=bond1

# ./gpnptool sign -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -o=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -ovr -w=cw-fs:peer

# ./gpnptool put -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new
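
As a quick sanity check, gpnptool can dump the profile that is now in force so you can confirm it shows bond1 for the public network:
# ./gpnptool get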

5. Put the changes in the OCR as well
# ./oifcfg setif -global bond1/172.20.11.0:public
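
You can verify the interface definitions stored in the OCR, and if the old eth1 entry is still listed it can be dropped (only do this if eth1 is being retired completely):
# ./oifcfg getif
# ./oifcfg delif -global eth1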

6. Stop VIP on the node and make the changes
# ./srvctl stop vip -n testserver
# ./srvctl modify nodeapps -n testserver -A 172.20.11.182/255.255.255.0/bond1
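
Before restarting anything, the new VIP definition can be double-checked (testserver being the node name used in this example):
# ./srvctl config vip -n testserver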

7. Stop and start the clusterware
# ./crsctl stop crs
# ./crsctl start crs

8. Start the clusterware on all the other nodes as well
# ./crsctl start crs
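
Optionally, run a cluster-wide check to confirm the daemons are healthy on every node:
# ./crsctl check cluster -all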

Change Network settings for Private Interconnect in 11g R2 RAC

To change the network adaptor name from eth0 to bond0 for the private interconnect, this is what needs to happen:

1. Stop Oracle Clusterware on all the Nodes
# ./crsctl stop crs

2. Start the bond0 network interface on all the nodes.
# ifup bond0

3. If the eth0 interface is no longer available, the below command will only start the Oracle High Availability Services and the other services will still be down:
# crsctl start crs

4. Now let's change the network adaptor name in the gpnp profile so that CRS knows which network adaptor to use for the private interconnect traffic:

# ./gpnptool edit -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -o=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -ovr -prf_sq=2 -net2:net_use=cluster_interconnect -net2:net_ip=172.20.15.0 -net2:net_ada=bond0

# ./gpnptool sign -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -o=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -ovr -w=cw-fs:peer

# ./gpnptool put -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new

 
5. Put the changes in the OCR as well
# ./oifcfg setif -global bond0/172.20.15.0:cluster_interconnect

6. Stop and start the clusterware
# ./crsctl stop crs
# ./crsctl start crs

7. Start the clusterware on all the other nodes as well
# ./crsctl start crs
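
Once the instances are back up, you can cross-check from SQL*Plus which interface the database is actually using for the interconnect:
SQL> select name, ip_address from v$cluster_interconnects;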

Error CLSGPNP_DUPLICATE trying put profile

This error is generated when you are trying to put a profile using gpnptool.

To fix it, first open the profile.xml file, which is located at:
$ORA_CRS_HOME/gpnp/<server name>/profiles/peer/

There is an attribute called ProfileSequence, and its value identifies the current version of the profile. This error happens when the ProfileSequence in the new file you are trying to put is the same as the one already in profile.xml.
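
A quick way to see the sequence currently in force is to pull it straight out of the profile (using the same path as in the examples below):
# grep -o 'ProfileSequence="[0-9]*"' /u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml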

Hence, to fix it, set the -prf_sq value in the gpnptool command to a number higher than the current ProfileSequence.
For example:
I am putting a file called /u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new. The current value of ProfileSequence in profile.xml is 3, so the value for the -prf_sq parameter should be 4:
# ./gpnptool edit -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -o=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -ovr -prf_sq=4
# ./gpnptool sign -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -o=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new -ovr -w=cw-fs:peer
# ./gpnptool put -p=/u01/app/11.2.0.3/grid/gpnp/testserver/profiles/peer/profile.xml.new

The error should not happen now.

Sunday 29 July 2012

NFS Mount Causes ORA-27086: unable to lock file – already in use

The following error can occur, along with other errors, if you have a database on an NFS mount point:
ORA-27086: unable to lock file – already in use

To fix it, you can simply do the following as root user:

Check if 'nfslock' service is running:
# service nfslock status
rpc.statd is stopped


Start the 'nfslock' service:
# /etc/init.d/nfslock start
Starting NFS statd:                                        [  OK  ]


Check if 'nfslock' service is running:
# service nfslock status
rpc.statd (pid  XXXX) is running...


Ensure the 'nfslock' service starts up automatically when the server reboots:
# chkconfig nfslock on
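
You can confirm it is now registered for the usual runlevels:
# chkconfig --list nfslock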

Once the 'nfslock' service is up and running, you should be able to start your database without those locking errors.

Tuesday 13 December 2011

Block Recovery Fails : ORA-00283: recovery session canceled due to errors ; ORA-00267: name of archived log file not needed ; ORA-06512: at "SYS.DBMS_BACKUP_RESTORE", line 4655 ; ORA-00368: checksum error in redo log block

We had certain blocks in the database get corrupted due to issues with the attached SAN disks. Once the SAN disks were fixed, we had to recover the corrupted blocks and followed the procedure below:

1. Perform database logical check:
run {
allocate channel d1 type disk;
backup check logical validate database;

release channel d1;
}


2. Check whether there are corrupt blocks in the database
select * from v$database_block_corruption;

3. Recover all the corrupted blocks
run {
allocate channel d1 type disk;
blockrecover corruption list;

release channel d1;
}

Note: The syntax to recover blocks individually is:
RMAN> BLOCKRECOVER DATAFILE <file#> BLOCK <block#>,<block#> DATAFILE <file#> BLOCK <block#>;
eg. RMAN> BLOCKRECOVER DATAFILE 2 BLOCK 12, 13 DATAFILE 3 BLOCK 5, 98, 99 DATAFILE 4 BLOCK 19;

While the recovery was running, the following error was reported and the recovery was canceled:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at

RMAN-10035: exception raised in RPC:
ORA-00283: recovery session canceled due to errors
ORA-00267: name of archived log file not needed
ORA-06512: at "SYS.DBMS_BACKUP_RESTORE", line 4655
RMAN-10031: RPC Error: ORA-00283 occurred during call to DBMS_BACKUP_RESTORE.BMRDOMEDIARECOVERY


After doing some analysis, we found that one of the current redo log files had a corrupt block, which was reported in the alert log file as well:
ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 143704 change 7917341986636 time 09/20/2010 22:26:57
ORA-00334: archived log: '/u01/oradata/DB/redo01a.log'


To fix the above, we first checked that none of the other redo log files had any corrupt blocks by running the following against every logfile:
SQL> alter system dump logfile 'full path of redo log' validate;

None of the other logfiles reported any block corruption apart from the one above. As good practice we keep a second member of each redo log group in another location. Hence, the database was shut down cleanly, the healthy redo log member was copied over the corrupted one and renamed accordingly, and the database was then started.
After performing the above, the blockrecover was run again and this time it was successful.
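
As a final check, it is worth repeating the validation from step 1 and re-querying the corruption view, which should now return no rows:
run {
allocate channel d1 type disk;
backup check logical validate database;
release channel d1;
}

select * from v$database_block_corruption;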

Hope this helps others out there.

Monday 14 November 2011

Compressed Tables Improve Query Performance

I performed the following test to see how much performance improvement I could get if I compress a table and run a full scan on it.

1.  Ran the following to get the initial response time of a full scan on my table:

SQL> select count(*) from test;

Elapsed: 00:01:33.19

Execution Plan

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 | 72313   (1)| 00:14:28 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST |    24M| 72313   (1)| 00:14:28 |
-------------------------------------------------------------------

2.   Ran the below PL/SQL block against the above table to find its compression ratio. The compression ratio for the above table was 2.82:
set serveroutput on
declare
pragma autonomous_transaction;
-- sample percentage
pct number := 0.000099;
-- original block count (should be less than 10k)
blkcnt number := 0;
-- compressed block count
blkcntc number;
begin
execute immediate ' create table TEMP_UNCOMPRESSED pctfree 0
as select * from &&tabname where rownum < 1';
while ((pct < 100) and (blkcnt < 1000)) loop
execute immediate 'truncate table TEMP_UNCOMPRESSED';
execute immediate 'insert into TEMP_UNCOMPRESSED select * from &&tabname sample block (' || pct || ',10)';
execute immediate 'select
count(distinct(dbms_rowid.rowid_block_number(rowid)))
from TEMP_UNCOMPRESSED' into blkcnt;
pct := pct * 10;
end loop;
execute immediate 'create table TEMP_COMPRESSED compress as
select * from TEMP_UNCOMPRESSED';
execute immediate 'select
count(distinct(dbms_rowid.rowid_block_number(rowid)))
from TEMP_COMPRESSED' into blkcntc;
execute immediate 'drop table TEMP_COMPRESSED';
execute immediate 'drop table TEMP_UNCOMPRESSED';
dbms_output.put_line('Compression Ratio for &&tabname: '||round(blkcnt/blkcntc,2));
end;
/
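
When run in SQL*Plus, the block prompts for the table name via the &&tabname substitution variable; you can also set it beforehand, for example:
SQL> define tabname = TEST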

3.  Compressed the table:

alter table test move compress;
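
Note that moving the table marks any dependent indexes UNUSABLE, so check and rebuild them afterwards (TEST_PK below is just an example index name):
SQL> select index_name, status from user_indexes where table_name = 'TEST';
SQL> alter index TEST_PK rebuild;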

4.  Ran the full scan on the compressed table:

SQL> select count(*) from test;

Elapsed: 00:00:28.16

Execution Plan

--------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 | 22964   (1)| 00:04:36 |
|   1 |  SORT AGGREGATE    |       |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST  |    25M| 22964   (1)| 00:04:36 |
--------------------------------------------------------------------


With the above test, it is quite clear that if we compress a table with a high compression factor, the response time will definitely improve.
Hence, I have a golden rule: compress tables whose compression factor is more than 2 and which are fairly static. This is very helpful in a data warehouse environment.

Wednesday 26 October 2011

Apply Linux Kernel Update with ASMLib configured

Before the Linux kernel on your database server is patched, you need to ensure that you have downloaded the right asmlib rpm for the new kernel version.

The asmlib rpm can be downloaded from:
http://www.oracle.com/technetwork/server-storage/linux/downloads/rhel5-084877.html

After the kernel update is applied on the server, you need to install the new asmlib rpm as the root user. For example, for the below kernel version, you need to install the following rpm:
# uname -r
2.6.18-274.7.1.el5
# rpm -Uvh oracleasm-2.6.18-274.7.1.el5-2.0.5-1.el5.x86_64.rpm
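
It is worth double-checking that the installed oracleasm kernel package now matches the running kernel:
# rpm -qa | grep oracleasm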


Once the new rpm for the patched kernel is installed, you will need to restart the oracleasm service so that the disks can be rediscovered. To do this, issue the following as root:

# service oracleasm restart
Dropping Oracle ASMLib disks:                              [  OK  ]
Shutting down the Oracle ASMLib driver:                    [  OK  ]
Initializing the Oracle ASMLib driver:                     [  OK  ]
Scanning the system for Oracle ASMLib disks:               [  OK  ]

# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
VOL4
VOL5
VOL6
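
If any of the disks are missing from the list, check the driver state and force a rescan:
# service oracleasm status
# service oracleasm scandisks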

You should now be able to start your ASM instance successfully with all the required diskgroups mounted and ready to go.