PRIMECLUSTER Installation and Administration Guide 4.2 (Linux)
3.1 Installation and Setup of Related Software
If heartbeat monitoring fails because of a node failure, PRIMECLUSTER shutdown facility removes the failed node. If this occurs during crash dump collection, you might not be able to acquire information for troubleshooting.
The cluster high-speed failover function prevents node elimination during crash dump collection and, at the same time, enables the ongoing operations of the failed node to be quickly taken over by another node.
The crash dump collection facility varies depending on the version of RHEL being used.
Version of Red Hat Enterprise Linux | Crash dump collection facility
---|---
RHEL-AS3 / RHEL-ES3 | Netdump
RHEL-AS3 batch correction U05011 / RHEL-ES3 batch correction U05011 | Netdump or Diskdump
RHEL-AS4 batch correction U05111 | Diskdump
As shown in the above figure, when a heartbeat failure occurs, the cluster high-speed failover function sets and references the panic status on the Netdump server. The node that detects the heartbeat error regards the failed node as being Offline, without forcibly powering off the node whose crash dump is being output, and takes over its transactions.
If the Netdump server stops, a crash dump cannot be collected, and the node will be forcibly shut down by the RSB shutdown agent of another node in the cluster system.
If an error (such as a network failure) occurs between the Netdump client and the Netdump server during crash dump collection, the crash dump cannot be collected and the node will be forcibly shut down by the RSB shutdown agent of another node in the cluster system.
If you reset either the Netdump server or the panicked node during crash dump collection, the crash dump will not be collected correctly. Therefore, do not perform a reset during crash dump collection.
The operation of the panicked node after crash dump collection is determined by the Netdump settings.
Netdump cannot be used with Diskdump.
You must prepare another node, independent of the cluster nodes, to be used as the Netdump server. It must be connected to the dedicated LAN used for Netdump. For example, when you build a cluster system configured with four nodes, you must prepare a total of five nodes, one of which will be used as the Netdump server.
To use the Netdump function, you must first set up the Netdump server and the Netdump clients.
Confirming the Netdump function
Confirm that the Netdump server function is available. If not, enable it.
Use the "runlevel(8)" command and the "chkconfig(8)" command to confirm the operation.
Confirm the current run level with the "runlevel(8)" command.
(Example) When the following is given, the current run level is 3.
# /sbin/runlevel
N 3
Confirm whether the Netdump server function is available with the "chkconfig(8)" command.
(Example) When the following is given, the Netdump server function at the current run level 3 is Off.
# /sbin/chkconfig --list netdump-server
netdump-server 0:Off 1:Off 2:Off 3:Off 4:Off 5:Off 6:Off
If the Netdump server function is Off at the current run level, change it to On with the "chkconfig(8)" command.
# /sbin/chkconfig netdump-server on
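If you want to verify the change, the same "chkconfig(8)" command can be used to list the state again. The output below is only a sketch; the run levels at which the service is turned On depend on the chkconfig defaults of your RHEL version.
# /sbin/chkconfig --list netdump-server
netdump-server 0:Off 1:Off 2:On 3:On 4:On 5:On 6:Off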
Confirming the NFS function
The Netdump shutdown agent uses NFS. Confirm that NFS is available. If it is not, enable it.
Use the "runlevel(8)" command and the "chkconfig(8)" command to confirm the operation.
Confirm the current run level with the "runlevel(8)" command.
(Example) When the following is given, the current run level is 3.
# /sbin/runlevel
N 3
Confirm whether the NFS function is available with the "chkconfig(8)" command.
(Example) When the following is given, the NFS function at the current run level 3 is Off.
# /sbin/chkconfig --list nfs
nfs 0:Off 1:Off 2:Off 3:Off 4:Off 5:Off 6:Off
If the NFS function is Off at the current run level, change it to On with the "chkconfig(8)" command.
# /sbin/chkconfig nfs on
Setting to avoid rebooting
By default, Netdump reboots a node after its crash dump has been collected. To prevent the node from rebooting after dump collection, set the following in "/etc/netdump.conf":
noreboot=true
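As a simple check that the setting is in place, you can search for the entry in the file (this assumes the default location "/etc/netdump.conf" shown above):
# grep noreboot /etc/netdump.conf
noreboot=true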
Setting the NFS function
Set up the following in "/etc/exports."
/var/crash/log/netdump_status NodeA(ro,no_root_squash) NodeB(ro,no_root_squash)
In "/var/crash/log/netdump_status," describe all mountable nodes that constitute the cluster system.
Specify the host names of the nodes that constitute the cluster system in NodeA and NodeB.
(Example) When there are three nodes constituting the cluster system, namely, NodeA, NodeB, and NodeC
/var/crash/log/netdump_status NodeA(ro,no_root_squash) NodeB(ro,no_root_squash) NodeC(ro,no_root_squash)
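Optionally, before rebooting, the export definition can be re-read and listed with the "exportfs(8)" command on the Netdump server. This is only a convenience check; the procedure below applies the setting by rebooting the system.
# /usr/sbin/exportfs -ra
# /usr/sbin/exportfs -v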
Rebooting the system
Reboot the system.
# shutdown -r now
Confirming the NFS function
Confirm that NFS is available. If it is not, enable it. This operation must be executed on all the nodes that constitute the cluster system.
Use the "runlevel(8)" command and the "chkconfig(8)" command to confirm the operation.
Confirm the current run level with the "runlevel(8)" command.
(Example) When the following is given, the current run level is 3.
# /sbin/runlevel
N 3
Confirm whether the NFS function is available with the "chkconfig(8)" command.
(Example) When the following is given, the NFS function at the current run level 3 is Off.
# /sbin/chkconfig --list nfs
nfs 0:Off 1:Off 2:Off 3:Off 4:Off 5:Off 6:Off
If the NFS function is Off at the current run level, change it to On with the "chkconfig(8)" command.
# /sbin/chkconfig nfs on
Setting the NFS function
This operation must be executed on all the nodes that constitute the cluster system.
Create the NFS mount point.
Create the mount point (/var/crash/panicinfo) as follows.
# mkdir -m 0444 -p /var/crash/panicinfo
Set /etc/fstab.
Set up the following in "/etc/fstab."
Netdump_server:/var/crash/log/netdump_status /var/crash/panicinfo nfs ro,fg,soft,noac 0 0
Specify the IP address or host name of the Netdump server in place of Netdump_server.
If a host name is specified, register the IP address of the Netdump server in "/etc/hosts."
(Example)
Node0:/var/crash/log/netdump_status /var/crash/panicinfo nfs ro,fg,soft,noac 0 0
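When a host name such as Node0 is used in "/etc/fstab," an entry similar to the following is also needed in "/etc/hosts" on each cluster node. The address 192.168.10.10 is only a placeholder; use the actual IP address of your Netdump server.
192.168.10.10    Node0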
Rebooting the system
Reboot the system.
This operation must be executed on all the nodes that constitute the cluster system.
# shutdown -r now
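After the reboot, you can confirm on each node that the export of the Netdump server is mounted on "/var/crash/panicinfo," for example with the "mount(8)" command. The mount options shown in the output depend on your environment.
# mount | grep /var/crash/panicinfo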
As shown in the above figure, when a heartbeat monitoring failure occurs, the cluster high-speed failover function sets and references the panic status through the RSB or BMC (Baseboard Management Controller). The node that detects the failure can regard the other node as stopped and takes over the ongoing operations without eliminating the node that is collecting the crash dump.
If you reboot the node that is collecting the crash dump, crash dump collection will fail.
After the panicked node finishes collecting the crash dump, its behavior is determined by the Diskdump settings.
Diskdump cannot be used with Netdump.
Configure Diskdump
When using Diskdump, you must configure it.
Check Diskdump
Check whether Diskdump is available. If not, enable it using the "runlevel(8)" and "chkconfig(8)" commands.
Check the current run level using the "runlevel(8)" command.
(Example)
# /sbin/runlevel
N 3
The above example shows that the run level is 3.
Check whether Diskdump is available using the "chkconfig(8)" command.
(Example)
# /sbin/chkconfig --list diskdump
diskdump 0:off 1:off 2:off 3:off 4:off 5:off 6:off
The above example shows that Diskdump is off at the current run level 3.
If Diskdump is off, enable it by executing the "chkconfig(8)" command.
# /sbin/chkconfig diskdump on
Then, start it by executing the service command.
# /sbin/service diskdump start
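To verify the result, the "chkconfig(8)" command can be used to list the state again. The output below is only a sketch; the run levels at which the service is turned on depend on the chkconfig defaults of your environment.
# /sbin/chkconfig --list diskdump
diskdump 0:off 1:off 2:on 3:on 4:on 5:on 6:off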
After you have finished configuring the Netdump shutdown agent or the Diskdump shutdown agent, set up the remote service board (RSB), IPMI (Intelligent Platform Management Interface), or BLADE server.
Set the following for the remote service board (RSB):
User ID
Password
IP address
For details, see the operation manual provided with the remote service board and the "ServerView User Guide."
Set the following for the IPMI user:
User ID
Password
IP address
For details, see the "User Guide" provided with the hardware and the "ServerView User Guide."
Set the following for the BLADE server:
Install ServerView
Set SNMP community
Set an IP address of the management blade
For details, see the operation manual provided with the hardware and the "ServerView User Guide."