Node eviction happens from time to time in Oracle RAC environments on any platform. Troubleshooting an eviction and finding its root cause is important for DBAs so that the same problem can be avoided in the future.
There can be many reasons behind a node eviction, such as:
- Network Heartbeat Missed
- Voting Heartbeat Missed
- CSSD Agent/Monitor hung
- RDBMS instance hung, leading to node eviction
Any of the problems above can cause a node eviction, but some parameters of CSS (Cluster Synchronization Services) can be adjusted to suit the network connectivity.
css misscount in Cluster: The CSS misscount is the maximum time, in seconds, that a cluster heartbeat (messages sent between nodes over the network interconnect or through the voting disk; the prime indicator of connectivity) can be missed before the cluster enters a reconfiguration to evict the node.
There are two types:
- css misscount, for the network heartbeat
- disk misscount, for the disk heartbeat
The default value of css misscount is 30 seconds.
To check the value of this parameter (the cluster in this example reports 60 rather than the default):
[oracle@db02 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 60 for Cluster Synchronization Services.
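When the value is needed inside a script, for example to compare it against an expected setting, the number can be pulled out of the CRS-4678 message. The following is only a minimal sketch: the function name parse_css_value is hypothetical, and it assumes the message has exactly the format shown in the output above.

```shell
#!/bin/sh
# parse_css_value: hypothetical helper (not part of Oracle Clusterware).
# Extracts the numeric value from a line of the form
# "CRS-4678: Successful get <param> <value> for Cluster Synchronization Services."
# Prints nothing if the line does not match that assumed format.
parse_css_value() {
    printf '%s\n' "$1" |
        sed -n 's/^CRS-4678: Successful get [a-z]* \([0-9][0-9]*\) for .*$/\1/p'
}

# Sample line copied from the output above.
sample='CRS-4678: Successful get misscount 60 for Cluster Synchronization Services.'
parse_css_value "$sample"   # prints 60
```

On a cluster node the same function could be fed the live output, e.g. parse_css_value "$(crsctl get css misscount)".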
reboottime: The amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted (i.e. how long the machine takes to shut down completely when you run reboot -f -n).
The default value is 3 seconds.
You can check the value of the reboottime parameter with:
[oracle@db02 ~]$ crsctl get css reboottime
CRS-4678: Successful get reboottime 3 for Cluster Synchronization Services.
disktimeout: The disk heartbeat timeout is calculated internally, and its value differs across Oracle releases. It should be set to the maximum time allowed for a voting disk I/O to complete.
The default value is 200 seconds.
To check the value of the disktimeout parameter:
[oracle@db02 ~]$ crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
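The three checks above can be combined into one small script. This is a sketch rather than an official tool: the function name show_css_settings and the CRSCTL override variable are assumptions made for illustration, and only the crsctl get css commands already shown in this post are used.

```shell
#!/bin/sh
# CRSCTL is a hypothetical override variable; by default the crsctl
# found in PATH is used.
CRSCTL="${CRSCTL:-crsctl}"

# show_css_settings: hypothetical helper that queries the three CSS
# settings discussed in this post, one after the other.
show_css_settings() {
    for param in misscount reboottime disktimeout; do
        # On success each call prints one CRS-4678 line.
        "$CRSCTL" get css "$param" || echo "could not read $param" >&2
    done
}

# Only run the queries where crsctl is actually available.
if command -v "$CRSCTL" >/dev/null 2>&1; then
    show_css_settings
else
    echo "crsctl not found in PATH; run this on a cluster node" >&2
fi
```

Keeping the command name in a variable makes it possible to point the script at a full path such as $CRS_HOME/bin/crsctl without editing the loop.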
With 11gR2, these settings can be changed online without taking any node down.
Execute crsctl as root to modify misscount, reboottime or disktimeout, where n is the new value in seconds:
$CRS_HOME/bin/crsctl set css misscount n
$CRS_HOME/bin/crsctl set css reboottime n
$CRS_HOME/bin/crsctl set css disktimeout n
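Because a typo in one of these commands affects the whole cluster, it can help to validate the parameter name and value before running anything. The sketch below is an illustration only: css_set_command is a hypothetical helper that prints the command to run as root and does not change any setting itself.

```shell
#!/bin/sh
# css_set_command: hypothetical helper. Validates its arguments and prints
# the crsctl command in the form shown above; it never executes crsctl.
css_set_command() {
    param=$1
    value=$2
    # Accept only the three CSS parameters covered in this post.
    case $param in
        misscount|reboottime|disktimeout) ;;
        *) echo "unknown CSS parameter: $param" >&2; return 1 ;;
    esac
    # The value must be a non-empty string of digits (seconds).
    case $value in
        ''|*[!0-9]*) echo "value must be a whole number of seconds: $value" >&2; return 1 ;;
    esac
    # $CRS_HOME is printed literally so the reviewed line can be pasted
    # into a root shell where CRS_HOME is set.
    printf '$CRS_HOME/bin/crsctl set css %s %s\n' "$param" "$value"
}

css_set_command misscount 60   # prints $CRS_HOME/bin/crsctl set css misscount 60
```

Printing the command instead of executing it keeps the actual change a deliberate, manual step.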
Suppose a node goes down for some reason. Can we find the reason from another node, from the cssd log, or from the alert file?
Yes, we are preparing a separate post on that.