PRIMECLUSTER Installation and Administration Guide 4.5
FUJITSU Software

7.6 CF and RMS Heartbeats

PRIMECLUSTER monitors nodes with two kinds of heartbeats: the CF heartbeat and the RMS heartbeat. The failure types that each heartbeat detects, and the default detection time of each, are as follows.

Table 7.3 Failures detected with a heartbeat and detection time of heartbeat timeout (CF and RMS)

Heartbeat   Failure type detected with a heartbeat                   Detection time of heartbeat timeout (default)

CF          • System hang at the kernel layer                        10 seconds
            • Failure of all cluster interconnect paths
            • Panic or reset of a remote node (*1)

RMS         • System hang at the user layer (application layer)      600 seconds
            • Abnormal stop of RMS on a remote node (*2 and *3)

(*1): When the monitoring agent of PRIMECLUSTER is used, the monitoring agent detects it immediately.

(*2): The ELM heartbeat (RMS heartbeat) detects it immediately.

(*3): A double fault is one example.
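
The default values in Table 7.3 keep the CF detection time (10 seconds) well below the RMS detection time (600 seconds), and this section notes that RMS outputs a warning at startup when the CF timeout exceeds the RMS timeout. The following Python sketch illustrates that comparison only. The function name and constants are hypothetical, and the sketch does not read any PRIMECLUSTER configuration file.

```python
# Default detection times from Table 7.3, in seconds.
DEFAULT_CF_TIMEOUT = 10
DEFAULT_RMS_TIMEOUT = 600


def cf_timeout_exceeds_rms(cf_timeout: int = DEFAULT_CF_TIMEOUT,
                           rms_timeout: int = DEFAULT_RMS_TIMEOUT) -> bool:
    """Return True when the CF timeout is longer than the RMS timeout.

    Hypothetical helper for illustration: it only compares two numbers
    supplied by the caller.
    """
    return cf_timeout > rms_timeout


if __name__ == "__main__":
    print(cf_timeout_exceeds_rms())          # defaults: False
    print(cf_timeout_exceeds_rms(700, 600))  # CF longer than RMS: True
```

With the default values the check returns False, so no warning would be expected. A CF timeout of 700 seconds against an RMS timeout of 600 seconds returns True.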

Note

  • An error detected by a CF heartbeat has a significant effect on operation. Therefore, the CF detection time of heartbeat timeout (detection time) is set shorter than the RMS detection time.

    If you set the detection time of CF longer than that of RMS, the following warning message is output during RMS startup.

    (BM, 4) The CF cluster timeout <cftimeout> exceeds the RMS timeout <rmstimeout>. This may result in RMS node elimination request before CF timeout is exceeded. Please check the CF timeout specified in /etc/default/cluster.config and the RMS heartbeat miss time specified by hvcm '-h' option.

  • In a configuration where both the I/O fencing function and the ICMP shutdown agent are used, the operational node can be neither panicked nor forcibly stopped if a subsystem hang with the following symptoms occurs. In this case, operation is not switched from the operational node to the standby node automatically:

    • The heartbeat of CF or the heartbeat of RMS is lost.

    • The operational node responds to ping on the network route that is specified for the ICMP shutdown agent.

    If these errors occur, take the corrective actions below to switch the application.

    • If the heartbeat of CF is lost

      CF enters the LEFTCLUSTER state. Recover from the LEFTCLUSTER state.

      For more information on the LEFTCLUSTER state, refer to "PRIMECLUSTER Cluster Foundation Configuration and Administration Guide."

    • If the heartbeat of RMS is lost

      The SysNode enters the Wait state. Clear the Wait state.

      To check the status of SysNode, use hvdisp -T <SysNode>.

      For information on how to clear the Wait state, refer to "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
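
The corrective actions in the note above can be summarized as a small lookup. The following Python sketch is illustrative only: the names are hypothetical, it assumes that the lost heartbeat and the ping response are observed together, and it does not call any PRIMECLUSTER command.

```python
from enum import Enum


class LostHeartbeat(Enum):
    CF = "CF"
    RMS = "RMS"


# Resulting state and corrective action for each lost heartbeat,
# taken from the note above.
CORRECTIVE_ACTIONS = {
    LostHeartbeat.CF: ("CF is in the LEFTCLUSTER state",
                       "Recover from the LEFTCLUSTER state"),
    LostHeartbeat.RMS: ("SysNode is in the Wait state",
                        "Clear the Wait state"),
}


def needs_manual_switch(heartbeat_lost: bool, responds_to_ping: bool) -> bool:
    """Return True for the case in the note: a heartbeat is lost while the
    operational node still responds to ping (assumed to hold together)."""
    return heartbeat_lost and responds_to_ping


def corrective_action(lost: LostHeartbeat) -> str:
    """Return the state and the corrective action as one line of text."""
    state, action = CORRECTIVE_ACTIONS[lost]
    return f"{state}: {action}."


if __name__ == "__main__":
    if needs_manual_switch(heartbeat_lost=True, responds_to_ping=True):
        print(corrective_action(LostHeartbeat.RMS))
```

The lookup only restates the note. The actual recovery procedures are described in the guides referenced above.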