3.1.6 Setting Up the Cluster High-Speed Failover Function

Overview

If one of the nodes that make up a cluster system fails and its heartbeat is lost, the PRIMECLUSTER shutdown facility forcibly shuts down the failed node.

If the heartbeat fails because of a panic, the failed node is forcibly shut down while it is still collecting a crash dump, so crash dump collection is cut short. As a result, you may not be able to collect the information needed for troubleshooting.

The cluster high-speed failover function prevents a node from being forcibly shut down during crash dump collection and, at the same time, allows the operations running on the failed node to be moved quickly to another node while the crash dump is being collected.

[Figure: kdump]

As shown in the above figure, the cluster high-speed failover function sets and refers to the panic status through the BMC (Baseboard Management Controller) or iRMC when a heartbeat monitoring failure occurs. The node that detects the failure can therefore treat the other node as stopped and take over its ongoing operations without forcibly shutting down the node that is collecting a crash dump.

Note

If you reset a node that has panicked while it is collecting a crash dump, crash dump collection fails. Do not reset the node during crash dump collection.
After a panicked node finishes collecting the crash dump, its behavior follows the kdump settings.

Required setting for the kdump shutdown agent

Configure kdump
To use kdump, you must configure it.
For details on the configuration procedure, see the manual of your OS.
Note
If kdump was already configured during the installation of Red Hat Enterprise Linux, configure it again.
Check kdump
[RHEL6]
Check if the kdump is available. If not, enable the kdump using the "runlevel(8)" and "chkconfig(8)" commands.
- Check the current runlevel using the "runlevel(8)" command.
  Example:
```
# /sbin/runlevel
N 3
```
  The above example shows that the current runlevel is 3.
- Check if the kdump is available using the "chkconfig(8)" command.
  Example:
```
# /sbin/chkconfig --list kdump
kdump  0:off 1:off 2:off 3:off 4:off 5:off 6:off
```
  The above example shows that the kdump in the current runlevel 3 is off.
- If the kdump is off in the current runlevel, enable the kdump by executing the "chkconfig(8)" command, and then start the kdump by executing the service command.
```
# /sbin/chkconfig kdump on
# /sbin/service kdump start
```
[RHEL7]
Check if the kdump is available. If not, enable the kdump using the "runlevel(8)" and "systemctl(1)" commands.
- Check the current runlevel using the "runlevel(8)" command.
  Example:
```
# /sbin/runlevel
N 3
```
  The above example shows that the current runlevel is 3.
- Check if the kdump is available using the "systemctl(1)" command.
  Example:
```
# /usr/bin/systemctl list-unit-files --type=service | grep kdump.service
kdump.service                               disabled
```
  The above example shows that the kdump in the current runlevel 3 is disabled.
- If the kdump is disabled in the current runlevel, enable the kdump by executing the "systemctl(1)" command, and then start the kdump.
```
# /usr/bin/systemctl enable kdump.service
# /usr/bin/systemctl start kdump.service
```

Prerequisites for the other shutdown agent settings

After you have completed configuring the kdump shutdown agent, set up IPMI (Intelligent Platform Management Interface) or the BLADE server.

Information

The IPMI shutdown agent is used with the hardware device in which BMC or iRMC is installed.

Prerequisites for the IPMI shutdown agent settings

Set the following for BMC or iRMC.

IP address
User for the IPMI shutdown agent (*1)

For details, see "User Guide" provided with the hardware and the ServerView Operations Manager manual.

*1) Assign administrator privileges to this user. Set the user password using seven-bit ASCII characters, excluding the following characters.
> < " / \ = ! ? ; , &
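The password rule can be checked before the password is entered for the BMC or iRMC. The following is a minimal sketch, not a PRIMECLUSTER tool: the function name `valid_agent_password` is hypothetical, and it assumes that "seven-bit ASCII" means printable characters only (space through tilde) and that an empty password is not acceptable.

```shell
# Hypothetical helper: return 0 if the password consists only of printable
# seven-bit ASCII characters (assumption: space through tilde) and contains
# none of the excluded characters  > < " / \ = ! ? ; , &
valid_agent_password() {
    pw="$1"
    # Assumption: an empty password is rejected.
    [ -n "$pw" ] || return 1
    # Count bytes that fall outside printable ASCII.
    nonprint=$(printf '%s' "$pw" | LC_ALL=C tr -d ' -~' | wc -c)
    # Count occurrences of the excluded characters.
    excluded=$(printf '%s' "$pw" | LC_ALL=C tr -cd '><"/\\=!?;,&' | wc -c)
    [ "$nonprint" -eq 0 ] && [ "$excluded" -eq 0 ]
}

# Possible use:
#   valid_agent_password "$candidate" || echo "password not allowed" >&2
```

The same character restrictions are stated for the RMCP and iRMC user passwords later in this section, so the check applies there as well.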

Prerequisites for the Blade shutdown agent settings

Set the following for the BLADE server:

Install ServerView
Set SNMP community for the management blade (*2)
Set an IP address of the management blade

For details, see the operation manual provided with the hardware and the ServerView Operations Manager manual.

*2) When configuring the cluster across multiple chassis, set the same SNMP community for all the management blades.

3.1.6.2 PRIMEQUEST 2000 series

If an error occurs in one of the nodes of the cluster system where PRIMEQUEST 2000 series is used, the PRIMECLUSTER shutdown facility uses the following two methods to detect that error. For details, see "2.3.5 PRIMECLUSTER SF" in "PRIMECLUSTER Concepts Guide."

Node status change detection through MMB units (asynchronous monitoring)
Heartbeat failure between cluster nodes (NSM: node status monitoring) (cyclic monitoring)

The asynchronous monitoring allows node errors to be detected immediately, and failover occurs at a higher speed than when detected by the cyclic monitoring.

As shown in the above figure, if a panic occurs, the cluster control facility uses the MMB units to receive the panic notice. This allows the system to detect the node panic status faster than it would through a heartbeat failure.

See

PRIMEQUEST allows you to set the panic environment so that a crash dump is collected if a panic occurs.

For details about the PRIMEQUEST dump function, setup method, and confirmation method, see the following manuals:

"PRIMEQUEST 2000 Series Installation Manual"
"PRIMEQUEST 2000 Series ServerView Mission Critical Option User Manual"

To use asynchronous monitoring, you must install software that controls the MMB units and specify appropriate settings for the driver. This section describes procedures for installing the MMB control software and setting up the driver, which are required for realizing high-speed failover.

Installing the HBA blockage function and the SVmco
The HBA blockage function and the SVmco report node status changes through the MMB units to the shutdown facility. Install the HBA blockage function and the SVmco before setting up the shutdown facility. For installation instructions, see the following manuals:
- "PRIMEQUEST 2000 Series HBA blockage function USER'S GUIDE"
- "PRIMEQUEST 2000 Series Installation Manual"
- "PRIMEQUEST 2000 Series ServerView Mission Critical Option User Manual"
Setting up the SVmco and the MMB units
The SVmco and the MMB units must be set up so that node status changes are reported properly to the shutdown facility through the MMB units. Set up the SVmco and the MMB units before setting up the shutdown facility. For setup instructions, see the following manuals:
- "PRIMEQUEST 2000 Series Installation Manual"
- "PRIMEQUEST 2000 Series ServerView Mission Critical Option User Manual"
You must create an RMCP user so that PRIMECLUSTER can link with the MMB units.
On all PRIMEQUEST 2000 instances that make up the PRIMECLUSTER system, be sure to create a user who uses RMCP to control the MMB units. To create this user, log in to the MMB Web-UI and create the user from the "Remote Server Management" window of the "Network Configuration" menu. Create the user as shown below.
- Set [Privilege] to "Admin".
- Set [Status] to "Enabled".
Set the user password with seven-bit ASCII characters except the following characters.
```
>  <  "  /  \  =  !  ?  ;  ,  &
```
For details about creating a user who uses RMCP to control the MMB units, see the following manual provided with the unit:
- "PRIMEQUEST 2000 Series Tool Reference"
The user name created here and the specified password are used when the shutdown facility is set up. Record the user name and the password.
Note
The MMB units have two types of users:
- User who controls all MMB units
- User who uses RMCP to control the MMB units
The user created here is the user who uses RMCP to control the MMB units.
Setting up the HBA blockage function
Note
Be sure to carry out this setup when using shared disks.
If a panic occurs, the HBA units that are connected to the shared disks are closed, and I/O processing to the shared disk is terminated. This operation maintains data consistency in the shared disk and enables high-speed failover.
On all the nodes, specify the device paths of the shared disks (GDS device paths if GDS is being used) in the HBA blockage function command, and add the shared disks as targets for which the HBA function is to be stopped. If GDS is being used, perform this setup after completing the GDS setup. For setup instructions, see the following manual:
- "PRIMEQUEST 2000 Series HBA blockage function USER'S GUIDE"
Setting the I/O completion wait time
To maintain consistent I/O processing to the shared disk if a node failure (panic, etc.) occurs and failover takes place, some shared disk units require a fixed I/O completion wait time, which is the duration after a node failure occurs until the new operation node starts operating.
The initial value of the I/O completion wait time is 0 seconds. If you are using shared disk units that require an I/O completion wait time, change it to an appropriate value.
Information
ETERNUS Disk storage systems do not require an I/O completion wait time. Therefore, this setting is not required.
Specify this setting after completing the CF setup. For setting instructions, see "5.1.2.4.5 Setting I/O Completion Wait Time."
Note
If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.

3.1.6.3 PRIMEQUEST 3000 series

If an error occurs in one of the nodes of the cluster system where PRIMEQUEST 3000 series is used, the PRIMECLUSTER shutdown facility uses the following two methods to detect that error. For details, see "2.3.5 PRIMECLUSTER SF" in "PRIMECLUSTER Concepts Guide."

Node status change detection through iRMC/MMB units (asynchronous monitoring)
Heartbeat failure between cluster nodes (NSM: node status monitoring) (cyclic monitoring)

The asynchronous monitoring allows node errors to be detected immediately, and failover occurs at a higher speed than when detected by the cyclic monitoring.

As shown in the above figure, if a panic occurs, the cluster control facility uses the iRMC/MMB units to receive the panic notice. This allows the system to detect the node panic status faster than it would through a heartbeat failure.

See

PRIMEQUEST allows you to set the panic environment so that a crash dump is collected if a panic occurs.

For details about the PRIMEQUEST dump function, setup method, and confirmation method, see the following manual:

"PRIMEQUEST 3000 Series Installation Manual"

To use asynchronous monitoring, you must install the required software and set up the driver appropriately. This section describes how to install the required software and set up the driver to enable high-speed failover.

Installing the HBA blockage function
The HBA blockage function reports the node status change through the iRMC/MMB units to the shutdown facility. Install the HBA blockage function before setting up the shutdown facility. For installation instructions, see the following manual:
- "PRIMEQUEST 3000 SERIES HBA blockage function USER'S GUIDE"
Setting up iRMC
iRMC must be set up so that the node status change is reported properly to the shutdown facility through iRMC. Set up iRMC before setting up the shutdown facility. For the setup instructions, see the following manual:
- "PRIMEQUEST 3000 Series Installation Manual"
You must create a user so that PRIMECLUSTER can link with iRMC. On all PRIMEQUEST 3000 instances that make up the PRIMECLUSTER system, make sure to create a user to control iRMC.
Set the user password with seven-bit ASCII characters except the following characters.
```
> < " / \ = ! ? ; , &
```
The created user name and the specified password are used when the shutdown facility is set up. Record the user name and the password.
- PRIMEQUEST 3000 (except B model)
  To create a user to control iRMC, use the "set irmc user" command.
  For how to use the "set irmc user" command, refer to the following manual:
  - "PRIMEQUEST 3000 Series Tool Reference (MMB)"
  When configuring the cluster system using extended partitions, PRIMECLUSTER and iRMC cannot link with each other if VGA/USB/rKVMS of the Home SB is "Free". Assign VGA/USB/rKVMS of the Home SB to any one of the extended partitions (it can also be an extended partition that is not part of the cluster system).
  Refer to the following manual for how to assign VGA/USB/rKVMS to the extended partitions:
  - "PRIMEQUEST 3000 Series Tool Reference (MMB)"
- PRIMEQUEST 3000 B model
  To create a user to control iRMC, log in to the iRMC Web Interface and create the user from the "User Management" page of the "Settings" menu.
  For how to use the iRMC Web Interface, refer to the following manual:
  - "FUJITSU Server PRIMEQUEST 3000 Series Business Model iRMC S5 Web Interface"
Setting up MMB (except B model)
MMB must be set up so that the node status change is reported properly to the shutdown facility through MMB.
You must create an RMCP user so that PRIMECLUSTER can link with the MMB units. On all PRIMEQUEST 3000 instances that make up the PRIMECLUSTER system, make sure to create a user who uses RMCP to control the MMB units. To create this user, log in to the MMB Web-UI and create the user from the "Remote Server Management" screen of the "Network Configuration" menu. Create the user as shown below:
- [Privilege]: "Admin"
- [Status]: "Enabled"
Set the user password with seven-bit ASCII characters except the following characters.
```
> < " / \ = ! ? ; , &
```
For details about creating a user who uses RMCP to control the MMB units, see the following manual provided with the unit:
- "PRIMEQUEST 3000 Series Operation and Management Manual"
The user name created here and the specified password are used when the shutdown facility is set up. Record the user name and the password.
Note
The MMB units have two types of users:
- User who controls all MMB units
- User who uses RMCP to control the MMB units
The user created here is the user who uses RMCP to control the MMB units.
Setting up the HBA blockage function
Note
Be sure to carry out this setup when using shared disks.
If a panic occurs, the HBA units that are connected to the shared disks are closed, and I/O processing to the shared disk is terminated. This operation maintains data consistency in the shared disk and enables high-speed failover.
On all the nodes, specify the device paths of the shared disks (GDS device paths if GDS is being used) in the HBA blockage function command, and add the shared disks as targets for which the HBA function is to be stopped. If GDS is being used, perform this setup after completing the GDS setup. For setup instructions, see the following manual:
- "PRIMEQUEST 3000 SERIES HBA blockage function USER'S GUIDE"
Setting the I/O completion wait time
To maintain consistent I/O processing to the shared disk if a node failure (panic, etc.) occurs and failover takes place, some shared disk units require a fixed I/O completion wait time, which is the duration after a node failure occurs until the new operation node starts operating.
The initial value of the I/O completion wait time is 0 seconds. If you are using shared disk units that require an I/O completion wait time, change it to an appropriate value.
Information
ETERNUS Disk storage systems do not require an I/O completion wait time. Therefore, this setting is not required.
Specify this setting after completing the CF setup. For setting instructions, see "5.1.2.5.5 Setting I/O Completion Wait Time."
Note
If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.