PRIMECLUSTER Installation and Administration Guide 4.4
FUJITSU Software

7.6 CF and RMS Heartbeats

PRIMECLUSTER uses two kinds of heartbeat: the CF heartbeat and the RMS heartbeat. The types of failure that each heartbeat detects, and the default time until a heartbeat timeout is detected, are as follows.

Table 7.2 Failures detected by heartbeats and default detection time of heartbeat timeout (CF and RMS)

CF heartbeat

  Failure types detected:

  • System hang at the kernel layer level
  • Failure of all paths of the cluster interconnects
  • Panic or reset of a remote node (*1)

  Detection time of heartbeat timeout (default): 10 seconds

RMS heartbeat

  Failure types detected:

  • System hang at the user layer (application layer) level
  • Abnormal stop of RMS on a remote node (*2, *3)

  Detection time of heartbeat timeout (default):

  • 4.1A40 or earlier: 45 seconds
  • 4.2A00 or later: 600 seconds

(*1): When the PRIMECLUSTER monitoring agent is used, the monitoring agent detects this failure immediately.

(*2): In environments where the ELM heartbeat (an RMS heartbeat) is available, the ELM heartbeat detects this failure immediately. The ELM heartbeat is available by default in 4.2A00 or later.

(*3): A double fault is one example of this case.
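Table 7.2 amounts to a lookup from heartbeat type and PRIMECLUSTER version to a default detection time. The Python sketch below encodes only those default values. The function names and the version parsing (assuming version strings of the form 4.2A00) are hypothetical and not part of PRIMECLUSTER, and the sketch does not model the immediate detection described in (*1) and (*2).

```python
import re

# Default detection times of heartbeat timeout (seconds), from Table 7.2.
CF_DEFAULT_SECONDS = 10
RMS_DEFAULT_SECONDS_OLD = 45   # 4.1A40 or earlier
RMS_DEFAULT_SECONDS_NEW = 600  # 4.2A00 or later

# Assumed version format: <major>.<minor><letter><level>, e.g. "4.2A00".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)([A-Z])(\d+)$")


def parse_version(version: str) -> tuple:
    """Split a version string such as '4.2A00' into a comparable tuple."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, letter, level = m.groups()
    return (int(major), int(minor), letter, int(level))


def default_detection_time(heartbeat: str, version: str) -> int:
    """Return the default detection time of heartbeat timeout in seconds."""
    if heartbeat == "CF":
        # The table lists a single CF default, independent of version.
        return CF_DEFAULT_SECONDS
    if heartbeat != "RMS":
        raise ValueError(f"unknown heartbeat type: {heartbeat!r}")
    v = parse_version(version)
    if v <= parse_version("4.1A40"):
        return RMS_DEFAULT_SECONDS_OLD
    if v >= parse_version("4.2A00"):
        return RMS_DEFAULT_SECONDS_NEW
    # The table gives no value for versions between 4.1A40 and 4.2A00.
    raise ValueError(f"no RMS default listed for version {version!r}")


if __name__ == "__main__":
    print(default_detection_time("CF", "4.4A00"))   # 10
    print(default_detection_time("RMS", "4.1A40"))  # 45
    print(default_detection_time("RMS", "4.4A00"))  # 600
```

Raising an error for versions the table does not cover keeps the sketch from silently guessing a default.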

Note

Failures detected by the CF heartbeat have a large effect on operations. Therefore, the CF detection time of heartbeat timeout is set shorter than the RMS detection time.

If you set the CF detection time longer than the RMS detection time, the following warning message is output during RMS startup.

(BM, 4) The CF cluster timeout <cftimeout> exceeds the RMS timeout <rmstimeout>. This may result in RMS node elimination request before CF timeout is exceeded. Please check the CF timeout specified in /etc/default/cluster.config and the RMS heartbeat miss time specified by hvcm '-h' option.
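The condition behind this warning is a simple comparison: the CF cluster timeout must not exceed the RMS heartbeat miss time. The Python sketch below expresses that check for two values in seconds. The function name is hypothetical, the values are passed in directly rather than read from /etc/default/cluster.config or the hvcm '-h' option, and treating equal values as acceptable is an assumption based on the word "exceeds" in the message.

```python
def cf_timeout_exceeds_rms(cf_timeout: int, rms_timeout: int) -> bool:
    """Return True when the CF timeout exceeds the RMS heartbeat miss time.

    Hypothetical helper: both arguments are in seconds. A True result
    corresponds to the condition reported by the (BM, 4) warning.
    """
    if cf_timeout <= 0 or rms_timeout <= 0:
        raise ValueError("timeouts must be positive numbers of seconds")
    # Strictly greater: equal values are assumed not to trigger the warning.
    return cf_timeout > rms_timeout


if __name__ == "__main__":
    # Defaults for 4.2A00 or later (CF 10 s, RMS 600 s): no warning expected.
    print(cf_timeout_exceeds_rms(10, 600))  # False
    # A CF timeout longer than the RMS timeout matches the warning condition.
    print(cf_timeout_exceeds_rms(60, 45))   # True
```

With the default values in Table 7.2, the CF timeout is well below the RMS timeout in both version ranges, so the check only fails if one of the values has been changed.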