Node evictions happen from time to time in Oracle RAC environments on any platform, and finding the root cause is important for DBAs who want to prevent a recurrence.

A node can be evicted for many reasons, including:

  • Network Heartbeat Missed
  • Voting Heartbeat Missed
  • CSSD agent or monitor hung
  • RDBMS instance hang that leads to node eviction

Any of the problems above can cause a node eviction, but some CSS (Cluster Synchronization Services) parameters can be adjusted to suit the network connectivity of the cluster.

CSS misscount: The maximum time, in seconds, that a cluster heartbeat (messages sent between nodes over the network interconnect or through the voting disk; the prime indicator of connectivity) can be missed before the cluster enters a reconfiguration to evict the node.

There are two types:

  1. CSS misscount for the network heartbeat
  2. Disk misscount for the disk heartbeat (the disktimeout setting described below)

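The default value of CSS misscount is 30 seconds. The cluster in the sample output below reports 60, so its value differs from this default.

To check the current value of this parameter: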

[oracle@db02 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 60 for Cluster Synchronization Services.
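
Since the sample output reports 60 rather than the 30-second default, it can be useful to flag such differences automatically. The following is a minimal sketch, not an Oracle-provided tool: misscount_status is a hypothetical helper name, and the field position of the value is assumed from the CRS-4678 line shown above.

```shell
# misscount_status is a hypothetical helper, not part of crsctl.
# $1: one line of output from `crsctl get css misscount`
# $2: expected value in seconds (defaults to 30, the default quoted in this article)
misscount_status() {
    expected=${2:-30}
    # Assumption: the value is the fifth whitespace-separated field of the
    # CRS-4678 line, as in the sample output above.
    actual=$(printf '%s\n' "$1" | awk '/^CRS-4678:/ { print $5 }')
    case "$actual" in
        ''|*[!0-9]*) echo "unparsed"; return 1 ;;
    esac
    if [ "$actual" -eq "$expected" ]; then
        echo "default"
    else
        echo "changed: $actual (expected $expected)"
    fi
}
```

On a cluster node it could be called as misscount_status "$(crsctl get css misscount)".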

reboottime: The amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted (that is, how long the machine takes to shut down completely when you run reboot -f -n).

The default value is 3 seconds.

You can check the value of the reboottime parameter with:

[oracle@db02 ~]$ crsctl get css reboottime
CRS-4678: Successful get reboottime 3 for Cluster Synchronization Services.

disktimeout: The disk heartbeat timeout is calculated internally, and its value differs between Oracle releases. It should be set to the maximum time allowed for a voting disk I/O to complete.

The default value is 200 seconds.

To check the value of the disktimeout parameter:

[oracle@db02 ~]$ crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
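
The three checks above can be gathered in one pass. This is a sketch under stated assumptions: show_css_timeouts and the CRSCTL override variable are hypothetical names introduced for illustration, and the parsing relies on the CRS-4678 output format shown in this article.

```shell
# CRSCTL is an assumed override hook; it defaults to the crsctl found on PATH.
CRSCTL=${CRSCTL:-crsctl}

# show_css_timeouts is a hypothetical helper: prints one "name=value" line
# for each of the three CSS settings discussed in this article.
show_css_timeouts() {
    for name in misscount reboottime disktimeout; do
        line=$("$CRSCTL" get css "$name") || return 1
        # Assumption: the value is the fifth field of the CRS-4678 line.
        value=$(printf '%s\n' "$line" | awk '/^CRS-4678:/ { print $5 }')
        printf '%s=%s\n' "$name" "$value"
    done
}
```

Calling show_css_timeouts on a cluster node prints the current value of each setting, which makes it easy to keep a record before changing anything.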

With 11gR2, these settings can be changed online without taking any node down.

Execute crsctl as root to modify the settings, replacing n with the new value in seconds:

$CRS_HOME/bin/crsctl set css misscount n
$CRS_HOME/bin/crsctl set css reboottime n
$CRS_HOME/bin/crsctl set css disktimeout n
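
Because a mistyped value affects eviction behaviour for the whole cluster, it may help to validate the input and record the value before and after the change. The wrapper below is only a sketch: set_css_timeout and the CRSCTL override variable are hypothetical names, and the script assumes it is run as root with crsctl on PATH (or CRSCTL pointing at $CRS_HOME/bin/crsctl).

```shell
# CRSCTL is an assumed override hook; it defaults to the crsctl found on PATH.
CRSCTL=${CRSCTL:-crsctl}

# set_css_timeout is a hypothetical wrapper around the commands shown above.
# $1: misscount | reboottime | disktimeout
# $2: new value in seconds (positive integer, no leading zero)
set_css_timeout() {
    case "$1" in
        misscount|reboottime|disktimeout) ;;
        *) echo "unknown CSS setting: $1" >&2; return 2 ;;
    esac
    case "$2" in
        ''|*[!0-9]*|0*) echo "value must be a positive integer: $2" >&2; return 2 ;;
    esac
    "$CRSCTL" get css "$1"                    # record the current value
    "$CRSCTL" set css "$1" "$2" || return 1   # apply the change
    "$CRSCTL" get css "$1"                    # confirm the new value
}
```

For example, set_css_timeout misscount 30 would print the old value, apply the change, and print the new value, while set_css_timeout misscount abc is rejected before crsctl is ever called.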
