3.1.6 Setting Up the Cluster High-Speed Failover Function

Overview

If one of the nodes that make up a cluster system fails and its heartbeat is lost, the PRIMECLUSTER shutdown facility forcibly shuts down the failed node.

If the heartbeat is lost because of a panic, the failed node is forcibly shut down while it is still collecting a crash dump, and crash dump collection is cut off partway through. As a result, you may not be able to collect the information needed for troubleshooting.

The cluster high-speed failover function prevents a node from being forcibly shut down during crash dump collection, while still allowing the operations running on the failed node to be moved quickly to another node.

[Figure: kdump]

As shown in the above figure, when a heartbeat monitoring failure occurs, the cluster high-speed failover function allows the panic status to be set and referenced through the BMC (Baseboard Management Controller) or iRMC. The node that detects the failure can therefore treat the other node as stopped and take over ongoing operations without forcibly shutting down the node that is collecting a crash dump.

Note

If you reset a node while it is collecting a crash dump, collection of the crash dump fails.
After a panicked node finishes collecting the crash dump, its behavior follows the kdump settings.

Required setting for the kdump shutdown agent

Configure kdump
When using kdump, you must configure it.
For details on the configuration procedure, see the manual of your OS.
Note
If kdump was already configured when Red Hat Enterprise Linux was installed, configure it again.
Check kdump
[RHEL6]
Check if the kdump is available. If not, enable the kdump using the "runlevel(8)" and "chkconfig(8)" commands.
- Check the current runlevel using the "runlevel(8)" command.
  Example:
```
# /sbin/runlevel
N 3
```
  The above example shows that the current runlevel is 3.
- Check if the kdump is available using the "chkconfig(8)" command.
  Example:
```
# /sbin/chkconfig --list kdump
kdump           0:off   1:off   2:off   3:off   4:off   5:off   6:off
```
  The above example shows that the kdump in the current runlevel 3 is off.
- If the kdump is off in the current runlevel, enable the kdump by executing the "chkconfig(8)" command, and then start the kdump by executing the "service(8)" command.
```
# /sbin/chkconfig kdump on
# /sbin/service kdump start
```
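The RHEL6 checks above can be combined into a single script. The following is a minimal sketch, not part of PRIMECLUSTER: the function names `kdump_state_for_runlevel` and `ensure_kdump_rhel6` are hypothetical, and the parser assumes the `chkconfig --list` output has the `N:on` / `N:off` layout shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: print "on" or "off" for one runlevel, parsed from a
# single line of "chkconfig --list kdump" output.
#   $1 = output line, $2 = runlevel digit
kdump_state_for_runlevel() {
    printf '%s\n' "$1" | awk -v lvl="$2" '{
        gsub(/:[ \t]+/, ":")            # tolerate "3: off" as well as "3:off"
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[1] == lvl) { print kv[2]; exit }
        }
    }'
}

# Hypothetical wrapper: enable and start kdump only if it is not already on
# in the current runlevel. Run as root on the cluster node.
ensure_kdump_rhel6() {
    level=$(/sbin/runlevel | awk '{ print $2 }')
    state=$(kdump_state_for_runlevel "$(/sbin/chkconfig --list kdump)" "$level")
    if [ "$state" != "on" ]; then
        /sbin/chkconfig kdump on && /sbin/service kdump start
    fi
}
```

Defining the functions changes nothing on the system; calling `ensure_kdump_rhel6` on each node performs the check and, if needed, the two commands shown above.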
[RHEL7]
Check if the kdump is available. If not, enable the kdump using the "runlevel(8)" and "systemctl(1)" commands.
- Check the current runlevel using the "runlevel(8)" command.
  Example:
```
# /sbin/runlevel
N 3
```
  The above example shows that the current runlevel is 3.
- Check if the kdump is available using the "systemctl(1)" command.
  Example:
```
# /usr/bin/systemctl list-unit-files --type=service | grep kdump.service
kdump.service                               disabled
```
  The above example shows that the kdump in the current runlevel 3 is disabled.
- If the kdump is disabled in the current runlevel, enable the kdump by executing the "systemctl(1)" command, and then start the kdump.
```
# /usr/bin/systemctl enable kdump.service
# /usr/bin/systemctl start kdump.service
```
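The RHEL7 checks above can be scripted in the same way. This is a minimal sketch under stated assumptions: the function names `kdump_unit_file_state` and `ensure_kdump_rhel7` are hypothetical, and the parser assumes the two-column `list-unit-files` layout shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: read "systemctl list-unit-files" output on stdin and
# print the state column for kdump.service (for example "enabled").
kdump_unit_file_state() {
    awk '$1 == "kdump.service" { print $2; exit }'
}

# Hypothetical wrapper: enable kdump if it is not enabled, then start it.
# Starting a unit that is already running has no effect. Run as root.
ensure_kdump_rhel7() {
    state=$(/usr/bin/systemctl list-unit-files --type=service | kdump_unit_file_state)
    if [ "$state" != "enabled" ]; then
        /usr/bin/systemctl enable kdump.service
    fi
    /usr/bin/systemctl start kdump.service
}
```

As with the RHEL6 sketch, nothing is changed until `ensure_kdump_rhel7` is called on a node.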

Prerequisites for the other shutdown agent settings

After you have finished configuring the kdump shutdown agent, set up IPMI (Intelligent Platform Management Interface) or the BLADE server.

Information

The IPMI shutdown agent is used with the hardware device in which BMC or iRMC is installed.

Prerequisites for the IPMI shutdown agent settings

Set the following for BMC or iRMC.

IP address
User for the IPMI shutdown agent (*1)

For details, see the "User Guide" provided with the hardware and the ServerView Operations Manager manual.

*1) Assign this user as the administrator. Set the user password using only seven-bit ASCII characters, excluding the following characters.
> < " / \ = ! ? ; , &
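A candidate password can be checked against this restriction before it is registered. The following is a minimal sketch: the function name `is_valid_agent_password` is hypothetical, and it assumes that "seven-bit ASCII" means the printable range from space to `~` and that an empty password is not acceptable, neither of which is stated in this manual.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the password uses only printable 7-bit
# ASCII and none of the excluded characters, 1 otherwise.
is_valid_agent_password() {
    pw=$1
    # Assumption: an empty password is rejected.
    [ -n "$pw" ] || return 1
    # Delete every printable ASCII byte (space through ~); anything that
    # remains is outside the assumed permitted range.
    rest=$(printf '%s' "$pw" | LC_ALL=C tr -d ' -~')
    [ -z "$rest" ] || return 1
    # Reject the excluded characters:  > < " / \ = ! ? ; , &
    case $pw in
        *[\>\<\"/\\=\!\?\;,\&]*) return 1 ;;
    esac
    return 0
}
```

For example, `is_valid_agent_password 'Passw0rd-1'` succeeds, while a password containing `&` or a non-ASCII character fails.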

Prerequisites for the Blade shutdown agent settings

Set the following for the BLADE server:

Install ServerView
Set SNMP community for the management blade (*2)
Set an IP address of the management blade

For details, see the operation manual provided with the hardware and the ServerView Operations Manager manual.

*2) When configuring the cluster across multiple chassis, set the same SNMP community for all the management blades.

If an error occurs in one of the nodes of the cluster system where PRIMEQUEST is used, the PRIMECLUSTER shutdown facility uses the two methods described below to detect that error. For details, see "2.3.1.7 PRIMECLUSTER SF" in "PRIMECLUSTER Concepts Guide."

(1) Node status change detection through MMB units (asynchronous monitoring)

(2) Heartbeat failure between cluster nodes (NSM: node status monitoring) (cyclic monitoring)

The asynchronous monitoring of (1) allows node errors to be detected immediately, and failover occurs at a higher speed than when detected by the cyclic monitoring of (2).

As shown in the above figure, if a panic occurs, the cluster control facility receives the panic notice through the MMB units. This allows the system to detect the node panic status faster than it would by waiting for a heartbeat failure.

See

PRIMEQUEST allows you to set the panic environment so that a crash dump is collected if a panic occurs.

For details about the PRIMEQUEST dump function, setup method, and confirmation method, see the following manuals:

PRIMEQUEST 1000 Series
- "PRIMEQUEST 1000 Series Installation Manual"
- "PRIMEQUEST 1000 Series ServerView Mission Critical Option User Manual"
PRIMEQUEST 2000 Series
- "PRIMEQUEST 2000 Series Installation Manual"
- "PRIMEQUEST 2000 Series ServerView Mission Critical Option User Manual"

To use asynchronous monitoring (1), you must install software that controls the MMB and specify appropriate settings for the driver. This section describes procedures for installing the MMB control software and setting up the driver, which are required for realizing high-speed failover.

Installing the HBA blockage function and the PSA/SVmco
The HBA blockage function and the PSA/SVmco report node status changes through the MMB units to the shutdown facility. Install the HBA blockage function and the PSA/SVmco before setting up the shutdown facility. For installation instructions, see the following manuals:
- PRIMEQUEST 1000 Series
  - "PRIMEQUEST 1000 SERIES HBA blockage function USER'S GUIDE"
  - "PRIMEQUEST 1000 Series Installation Manual"
  - "PRIMEQUEST 1000 Series ServerView Mission Critical Option User Manual"
- PRIMEQUEST 2000 Series
  - "PRIMEQUEST 2000 Series HBA blockage function USER'S GUIDE"
  - "PRIMEQUEST 2000 Series Installation Manual"
  - "PRIMEQUEST 2000 Series ServerView Mission Critical Option User Manual"
Setting up the PSA/SVmco and the MMB units
The PSA/SVmco and the MMB units must be set up so that node status changes are reported properly to the shutdown facility through the MMB units. Set up the PSA/SVmco and the MMB units before setting up the shutdown facility. For setup instructions, see the following manuals:
- PRIMEQUEST 1000 Series
  - "PRIMEQUEST 1000 Series Installation Manual"
  - "PRIMEQUEST 1000 Series ServerView Mission Critical Option User Manual"
- PRIMEQUEST 2000 Series
  - "PRIMEQUEST 2000 Series Installation Manual"
  - "PRIMEQUEST 2000 Series ServerView Mission Critical Option User Manual"
You must create an RMCP user so that PRIMECLUSTER can link with the MMB units.
On every PRIMEQUEST that makes up the PRIMECLUSTER system, create a user who uses RMCP to control the MMB. To do so, log in to the MMB Web-UI and create the user from the "Remote Server Management" window of the "Network Configuration" menu. Create the user as shown below.
- Set [Privilege] to "Admin".
- Set [Status] to "Enabled".
Set the user password using only seven-bit ASCII characters, excluding the following characters.
```
>  <  "  /  \  =  !  ?  ;  ,  &
```
For details about creating a user who uses RMCP to control the MMB, see the following manuals:
- PRIMEQUEST 1000 Series
  "PRIMEQUEST 1000 Series Tool Reference"
- PRIMEQUEST 2000 Series
  "PRIMEQUEST 2000 Series Tool Reference"
The user name and password specified here are used when the shutdown facility is set up. Record both.
Note
The MMB units have two types of users:
- User who controls all MMB units
- User who uses RMCP to control the MMB
The user created here is the user who uses RMCP to control the MMB. Be sure to create the correct type of user.
Setting up the HBA blockage function
Note
Be sure to carry out this setup when using shared disks.
If a panic occurs, the HBA units that are connected to the shared disks are closed, and I/O processing to the shared disk is terminated. This operation maintains data consistency in the shared disk and enables high-speed failover.
On all the nodes, specify the device paths of the shared disks (GDS device paths if GDS is being used) in the HBA blockage function command, and add the shared disks as targets for which the HBA function is to be stopped. If GDS is being used, perform this setup after completing the GDS setup. For setup instructions, see the following manuals:
- PRIMEQUEST 1000 Series
  "PRIMEQUEST 1000 SERIES HBA blockage function USER'S GUIDE"
- PRIMEQUEST 2000 Series
  "PRIMEQUEST 2000 Series HBA blockage function USER'S GUIDE"
Setting the I/O completion wait time
To keep I/O processing to the shared disk consistent when a node failure (panic, etc.) causes a failover, some shared disk units require a fixed I/O completion wait time. This is the interval between the node failure and the point at which the new operational node starts operating.
The initial value of the I/O completion wait time is 0 seconds. Change it to an appropriate value if you are using shared disk units that require an I/O completion wait time.
Information
ETERNUS Disk storage systems do not require an I/O completion wait time. Therefore, this setting is not required.
Specify this setting after completing the CF setup. For setting instructions, see "5.1.2.4.5 Setting I/O Completion Wait Time."
Note
If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.
