This section describes the setup procedure of the shutdown facility for PRIMERGY, PRIMEQUEST, and virtual machine environments.
The setup procedure for the shutdown facility differs depending on the hardware model/configuration. Check the hardware model/configuration and set up the appropriate shutdown agent.
The following tables show the shutdown agents required for each hardware model/configuration.
Server model | SA_lkcd | SA_ipmi | SA_blade
---|---|---|---
BX920 series | Y | Y (*1)(*2) | Y
RX200/300/500/600/2520/2540/4770 series | Y | Y | -
TX200/300/2540 series | Y | Y | -
*1) When using the BX920 series in combination with ServerView Resource Orchestrator Virtual Edition, set SA_ipmi.
*2) The combination of the BMC or iRMC user name and password must be the same on all blades.
Server model | MMB: Panic | MMB: Reset
---|---|---
PRIMEQUEST 1000/2000 series | SA_mmbp | SA_mmbr
Cluster configuration: virtual machine function (Xen environment), vmSP shutdown agents

Server model | Guests in a unit: Panic | Guests in a unit: Reset | Guests in other units: Panic | Guests in other units: Reset
---|---|---|---|---
PRIMEQUEST 1000 series | SA_vmSPgp | SA_vmSPgr | SA_vmSPgp | SA_vmSPgr
Cluster configuration: virtual machine function (KVM environment), libvirt shutdown agents

Server model | Guests in a unit: Panic | Guests in a unit: Reset | Guests in other units: Panic | Guests in other units: Reset
---|---|---|---|---
PRIMERGY | SA_libvirtgp | SA_libvirtgr | SA_libvirtgp | SA_libvirtgr
PRIMEQUEST 1000/2000 series | SA_libvirtgp | SA_libvirtgr | SA_libvirtgp | SA_libvirtgr
Also, when using the Host OS failover function, set the following shutdown agents. The SA_vmSPgp, SA_vmSPgr, SA_libvirtgp, and SA_libvirtgr shutdown agents set on the guest OS are the same as those used with the virtual machine function. For details on SA_vmSPgp, SA_vmSPgr, SA_libvirtgp, and SA_libvirtgr, see "5.1.2.3.1 Setting up the shutdown daemon" and "5.1.2.5.2 libvirt". Set SA_vmchkhost according to the procedure described in "5.1.2.5.3 vmchkhost".
Cluster configuration: virtual machine function (Xen environment), guests in other units (Host OS failover function)

Server model | OS | MMB: Panic | MMB: Reset | vmSP: Panic | vmSP: Reset | vmchkhost: Checking the status
---|---|---|---|---|---|---
PRIMEQUEST 1000 series | Host OS | SA_mmbp | SA_mmbr | - | - | -
PRIMEQUEST 1000 series | Guest OS | - | - | SA_vmSPgp | SA_vmSPgr | SA_vmchkhost
Cluster configuration: virtual machine function (KVM environment), guests in other units (Host OS failover function)

Server model | OS | Shutdown agent depending on server model | libvirt: Panic | libvirt: Reset | vmchkhost: Checking the status
---|---|---|---|---|---
PRIMERGY | Host OS | See Table 5.2. | - | - | -
PRIMERGY | Guest OS | - | SA_libvirtgp | SA_libvirtgr | SA_vmchkhost
PRIMEQUEST 1000/2000 series | Host OS | See Table 5.3. | - | - | -
PRIMEQUEST 1000/2000 series | Guest OS | - | SA_libvirtgp | SA_libvirtgr | SA_vmchkhost
Note
When making the administrative LAN used by the shutdown facility redundant with GLS, use the logical IP address takeover function of NIC switching mode, and set a physical IP address for the administrative LAN used by the shutdown facility.
See
For details on the shutdown facility, see the following manuals:
"3.3.1.7 PRIMECLUSTER SF" in the "PRIMECLUSTER Concepts Guide"
"8 Shutdown Facility" in the "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide"
Check the information of the shutdown agent to be used.
Note
Check the shutdown agent information before cluster initialization.
MMB check items
If an MMB is being used, check the following settings:
The "Privilege" setting of the user is set to "Admin" so that the user can control the MMB with RMCP.
The "Status" setting of the user is set to "Enabled" so that the user can control the MMB with RMCP.
Check the settings for the user who uses RMCP to control the MMB. Log in to MMB Web-UI, and check the settings from the "Remote Server Management" window of the "Network Configuration" menu.
If the above settings have not been set, set up the MMB so that the above settings are set.
Jot down the following information related to the MMB:
- User name for controlling the MMB with RMCP (*1)
- User password for controlling the MMB with RMCP
*1) The user must be granted the Admin privilege.
Note
The MMB units have two types of users:
- A user who controls all MMB units
- A user who uses RMCP to control the MMB
The user to be checked here is the user who uses RMCP to control the MMB. Be sure to check the correct type of user.
Virtual machine check items (Xen environment)
When setting up vmSP (Virtual Machine Service Provider) as the shutdown facility in a Xen environment, the shutdown facility logs in to the host OS using SSH in order to forcibly stop the guest OS. To do this, set up the following information:
- IP address of the host OS
- User name for logging in to the host OS (FJSVvmSP)
- User password for logging in to the host OS
Record the user name and password for logging in to the host OS that you set up in one of the following:
- When building a cluster system between guest OSes on one host OS, see "3.2.1.2 Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using the Host OS failover function, see "3.2.2.2 Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using the Host OS failover function, see "3.2.3.1.4 Host OS setup (after installing the operating system on guest OS)."
Virtual machine check items (KVM environment)
When setting up the shutdown facility in a KVM environment, the shutdown facility logs in to the hypervisor using SSH in order to forcibly stop the guest OS. To do this, set up the following information:
- IP address of the hypervisor
- User for logging in to the hypervisor (*2)
- User password for logging in to the hypervisor
*2) To execute commands as the root user, specify a user who is configured to use the "sudo" command (a rough sudoers sketch follows the reference list below).
Record the user name and password for logging in to the hypervisor that you set up in one of the following:
- When building a cluster system between guest OSes on one host OS, see "3.2.1.2 Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using the Host OS failover function, see "3.2.2.2 Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using the Host OS failover function, see "3.2.3.1.4 Host OS setup (after installing the operating system on guest OS)."
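The exact sudo requirements are described in the host OS setup sections referenced above. Purely as a rough illustration of the idea (the user name "kvmuser" is a hypothetical placeholder, not a name from this manual), a sudoers entry added with visudo could look like this:

kvmuser ALL=(root) NOPASSWD: ALL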
If a cluster partition occurs due to a fault in the cluster interconnect, all nodes would still be in a state where they can access the user resources. For details on the cluster partition, see "1.2.2.1 Protecting data integrity" in "PRIMECLUSTER Concepts Guide."
To guarantee the consistency of data in the user resources, SF must determine which node group survives and which nodes must be forcibly stopped.
The weight assigned to each node group is referred to as "Survival priority" in PRIMECLUSTER.
The greater the weight of a node, the higher its survival priority. Conversely, the smaller the weight of a node, the lower its survival priority. If multiple node groups have the same survival priority, the node group that includes the node with the alphabetically earliest node name survives.
Survival priority can be calculated based on the following formula:
Survival priority = SF node weight + ShutdownPriority of userApplication
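As a worked example with illustrative values: in a two-node cluster where node1 has an SF node weight of 2 and runs a user application app1 with ShutdownPriority 10, while node2 has a weight of 1 and runs no application, the survival priorities are 2 + 10 = 12 for node1 and 1 + 0 = 1 for node2, so the node group containing node1 survives.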
Note
When SF calculates the survival priority, each node sends its survival priority to the remote nodes via the administrative LAN. If a communication problem occurs on the administrative LAN, the survival priorities cannot be exchanged. In this case, the survival priority is calculated from the SF node weight alone.
SF node weight : Weight of the node. Default value = 1. Set this value when configuring the shutdown facility.
ShutdownPriority of userApplication : Set this attribute when the userApplication is created. For details on how to change the settings, see "8.5 Changing the Operation Attributes of a userApplication."
See
For details on the ShutdownPriority attribute of userApplication, see "12.1 Attributes available to the user" in the "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
The typical scenarios that are implemented are shown below:
Largest node group survival:
- Set the weight of all nodes to 1 (default).
- Set the ShutdownPriority attribute of all user applications to 0 (default).

Specific node survival:
- Set the "weight" of the node to survive to a value more than double the total weight of the other nodes.
- Set the ShutdownPriority attribute of all user applications to 0 (default).
For example, node1 is to survive.

Specific application survival:
- Set the "weight" of all nodes to 1 (default).
- Set the ShutdownPriority attribute of the user application whose operation is to continue to a value more than double the total of the ShutdownPriority attributes of the other user applications and the weights of all nodes.
For example, the node on which app1 is operating is to survive.

Specific node order survival:
- Set the "weight" of the node to survive to a value more than double the total weight of the other nodes which have lower priority.
- Set the ShutdownPriority attribute of all user applications to 0 (default).
For example, node1, node2, node3, and node4 are to survive in this order.

Specific node order survival (in a virtual machine environment):
- Set the "weight" of the nodes to power-of-two values (1, 2, 4, 8, 16, ...) in ascending order of survival priority in each cluster system, as sketched below.
- The "weight" set on a guest OS should have the same order relation as the corresponding host OS. For example, when giving host1 a higher survival priority than host2 between host OSes, also give node1 (corresponding to host1) a higher survival priority than node2-4 (corresponding to host2) between guest OSes.
- Set the ShutdownPriority attribute of all user applications to 0 (default).
For example, node1, node2, node3, and node4 are to survive in this order.
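The following is a minimal sketch of how such power-of-two weights could look in rcsd.cfg (the file format is described in the following subsections; the node names, IP addresses, and the choice of SA_libvirtgp/SA_libvirtgr are illustrative assumptions):

node1,weight=8,admIP=10.20.30.100:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
node2,weight=4,admIP=10.20.30.101:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
node3,weight=2,admIP=10.20.30.102:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
node4,weight=1,admIP=10.20.30.103:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35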
This section describes the procedure for setting up the shutdown agent in PRIMERGY. To use this in a virtual machine environment, see "5.1.2.5 Setting up the shutdown agent in virtual machine environment."
Note
After setting up the shutdown agent, conduct a test for forced shutdown of cluster nodes to check that the correct nodes can be forcibly stopped. For details of the test for forced shutdown of cluster nodes, see "1.4 Test".
Create /etc/opt/SMAW/SMAWsf/rcsd.cfg on all nodes as shown below:
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxx,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxx,timeout=timeout

CFNameX : Specify the CF node name of the cluster host.
weight : Specify the weight of the SF node.
myadmIP : Specify the IP address of the administrative LAN on the local node.
          Available IP addresses are IPv4 and IPv6 addresses.
          IPv6 link local addresses are not available.
          When specifying an IPv6 address, enclose it in brackets "[ ]".
          (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
SA_xxx : Specify the name of the shutdown agent.
         For the IPMI shutdown agent, specify SA_ipmi.
         For the Blade shutdown agent, specify SA_blade.
timeout : Specify the timeout duration (seconds) of the shutdown agent.
          For the IPMI shutdown agent, specify 25 seconds.
          For the Blade shutdown agent, specify 20 seconds.
Example: IPMI Shutdown Agent
node1,weight=1,admIP=10.20.30.100:agent=SA_ipmi,timeout=25
node2,weight=1,admIP=10.20.30.101:agent=SA_ipmi,timeout=25
Example: Blade Shutdown Agent
node1,weight=1,admIP=10.20.30.100:agent=SA_blade,timeout=20
node2,weight=1,admIP=10.20.30.101:agent=SA_blade,timeout=20
Note
For the IPMI shutdown agent, set timeout to 25 seconds.
When using STP (Spanning Tree Protocol) in PRIMERGY, the SF timeout value must be increased to the current value plus 50 seconds, to account for the time STP needs to build the tree plus an extra cushion. For example, the 25-second SA_ipmi timeout becomes 75 seconds. Note that this setting also delays failover.
Information
When the "/etc/opt/SMAW/SMAWsf/rcsd.cfg" file is to be created, the "/etc/opt/SMAW/SMAWsf/rcsd.cfg.template" file can be used as a prototype.
For the server with the BMC (Baseboard Management Controller) or iRMC (integrated Remote Management Controller) installed, configure the IPMI shutdown agent. You must configure the IPMI shutdown agent before you configure the kdump shutdown agent.
Starting the IPMI service
Execute the following command on all nodes to check the startup status of the IPMI service.
# /sbin/service ipmi status
Execute the following command on all nodes in which the IPMI service is not activated to start the IPMI service.
# /sbin/service ipmi start
Starting ipmi drivers: [ OK ]
Setting the run level of the IPMI service
Execute the following command on all nodes so that the IPMI service is started on boot.
# /sbin/chkconfig --level 2345 ipmi on
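To confirm the setting, the service run levels can be listed; typical output on RHEL looks like the following (the exact format may vary):

# /sbin/chkconfig --list ipmi
ipmi            0:off   1:off   2:on    3:on    4:on    5:on    6:off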
Encrypting the password
Execute the sfcipher command to encrypt passwords of a user for the shutdown facility.
Example: If the password specified when making the IPMI (BMC and iRMC) setting is "bmcpwd$"
# sfcipher -c
Enter User's Password: <- enter bmcpwd$
Re-enter User's Password: <- enter bmcpwd$
/t1hXYb/Wno=
Note: In 4.3A30 or later, it is not necessary to insert '\' in front of special characters specified in the password.
For information on how to use the sfcipher command, see the "sfcipher" manual page.
Note
For the passwords specified when making IPMI (BMC and iRMC) settings, seven-bit ASCII characters are available.
Among them, do not use the following characters as they may cause a problem.
> < " / ¥ = ! ? ; , &
Setting the shutdown agent
Create /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg on all nodes as shown below:
For IPv4 address
CFName1 ip-address:user:passwd {cycle | leave-off}
CFName2 ip-address:user:passwd {cycle | leave-off}
For IPv6 address
CFName1 [ip-address]:user:passwd {cycle | leave-off}
CFName2 [ip-address]:user:passwd {cycle | leave-off}
CFNameX : Specify the CF node name of the cluster host.
ip-address : Specify the IP address of IPMI (BMC or iRMC).
             Available IP addresses are IPv4 and IPv6 addresses.
             IPv6 link local addresses are not available.
             When specifying an IPv6 address, enclose it in brackets "[ ]".
             (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
user : Specify the user name defined when IPMI (BMC or iRMC) was set up.
passwd : Specify the password defined when IPMI (BMC or iRMC) was set up. Specify the encrypted password created in step 1 ("Encrypting the password").
cycle : Reboot the node after forcibly stopping the node.
leave-off : Power off the node after forcibly stopping the node.
Example 1:
When the IP address of the iRMC of node1 is 10.20.30.50 and the IP address of the iRMC of node2 is 10.20.30.51:
node1 10.20.30.50:root:D0860AB04E1B8FA3 cycle
node2 10.20.30.51:root:D0860AB04E1B8FA3 cycle
Example 2:
When the IP address of the iRMC of node1 is 1080:2090:30a0:40b0:50c0:60d0:70e0:80f0 and the IP address of the iRMC of node2 is 1080:2090:30a0:40b0:50c0:60d0:70e0:80f1:
node1 [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0]:root:D0860AB04E1B8FA3 cycle
node2 [1080:2090:30a0:40b0:50c0:60d0:70e0:80f1]:root:D0860AB04E1B8FA3 cycle
Information
When the "/etc/opt/SMAW/SMAWsf/SA_ipmi.cfg" file is to be created, the "/etc/opt/SMAW/SMAWsf/SA_ipmi.cfg.template" file can be used as a prototype.
Note
- Check that the setting contents of the /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg file are correct. If there is an error in the settings, the shutdown facility cannot operate normally.
- Check that the IP address (ip-address) of IPMI (BMC or iRMC) corresponding to the cluster host's CF node name (CFNameX) is set correctly in the /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg file. If there is an error in the setting, a different node may be forcibly stopped.
- Change the permission of the /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg file to 600 by executing the following command.
# chmod 600 /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg
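You can verify the result with ls -l; permission 600 appears as read/write for the owner only (the size and timestamp shown here are illustrative):

# ls -l /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg
-rw------- 1 root root 98 Jan  1 00:00 /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg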
For the Blade server, configure the Blade shutdown agent. You must configure the Blade shutdown agent before you configure the kdump shutdown agent.
Create /etc/opt/SMAW/SMAWsf/SA_blade.cfg on all nodes as shown below:
(1) Cluster configuration within a single chassis
management-blade-ip IPaddress
community-string SNMPcommunity
CFName1 slot-no {cycle | leave-off}
CFName2 slot-no {cycle | leave-off}
(2) Cluster configuration across multiple chassis
community-string SNMPcommunity
management-blade-ip IPaddress
CFName1 slot-no {cycle | leave-off}
management-blade-ip IPaddress
CFName2 slot-no {cycle | leave-off}
IPaddress : Specify the IP address of the management blade.
            Available IP addresses are IPv4 and IPv6 addresses.
            IPv6 link local addresses are not available.
            When specifying an IPv6 address, enclose it in brackets "[ ]".
            (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
SNMPcommunity : Specify the SNMP community.
CFNameX : Specify the CF node name of the cluster host.
slot-no : Specify the slot number of the server blade.
cycle : Reboot the node after forcibly stopping the node.
leave-off : Power off the node after forcibly stopping the node.
Example 1:
When the IP address of the management blade for node1 and node2 is 10.20.30.50, the slot number of node1 is 1, and the slot number of node2 is 2:
management-blade-ip 10.20.30.50
community-string public
node1 1 cycle
node2 2 cycle
Example 2:
When the IP address of the management blade of node1 is 10.20.30.50 and its slot number is 1, and the IP address of the management blade of node2 is 10.20.30.51 and its slot number is 2:
community-string public
management-blade-ip 10.20.30.50
node1 1 cycle
management-blade-ip 10.20.30.51
node2 2 cycle
Information
When the "/etc/opt/SMAW/SMAWsf/SA_blade.cfg" file is to be created, the "/etc/opt/SMAW/SMAWsf/SA_blade.cfg.template" file can be used as a prototype.
Note
- Check that the setting contents of the /etc/opt/SMAW/SMAWsf/SA_blade.cfg file are correct. If there is an error in the settings, the shutdown facility cannot operate normally.
- Check that the IP address (IPaddress) of the management blade and the slot number (slot-no) of the server blade corresponding to the cluster host's CF node name (CFNameX) are set correctly in the /etc/opt/SMAW/SMAWsf/SA_blade.cfg file. If there is an error in the setting, a different node may be forcibly stopped.
- Change the permission of the /etc/opt/SMAW/SMAWsf/SA_blade.cfg file to 600 by executing the following command.
# chmod 600 /etc/opt/SMAW/SMAWsf/SA_blade.cfg
Note
The rcsd.cfg, SA_ipmi.cfg and SA_blade.cfg files must be the same on all nodes. If not, operation errors might occur.
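One quick way to confirm the files match is to compare checksums on every node (adjust the file list to the shutdown agents you use; identical output on all nodes indicates identical files):

# md5sum /etc/opt/SMAW/SMAWsf/rcsd.cfg /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg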
Set up the kdump shutdown agent to collect the crash dump.
Note
On PRIMERGY (except virtual machine environment), make sure to set the kdump shutdown agent.
To use the kdump shutdown agent on PRIMERGY in a RHEL5 environment, the extra_modules option cannot be used in /etc/kdump.conf, the configuration file of kdump.
Initializing the configuration file for the kdump
Execute the following command on any one of the cluster nodes.
# /etc/opt/FJSVcllkcd/bin/panicinfo_setup
Note
To execute the command, CF and CF services (CFSH and CFCP) must be activated. For details, see "5.1.1 Setting Up CF and CIP."
Setting crash dump collection
The procedures for setting up may differ depending on the hardware used for the node.
PRIMERGY RX200/300/500/600/2520/2540/4770 series, TX200/300/2540 series, and BX920 series (used in combination with ServerView Resource Orchestrator Virtual Edition)
Change /etc/opt/FJSVcllkcd/etc/SA_lkcd.tout as follows on all nodes.
Before change
PANICINFO_TIMEOUT 5
RSB_PANIC 0
After change
PANICINFO_TIMEOUT 10
RSB_PANIC 3
Change the timeout value of SA_lkcd in the /etc/opt/SMAW/SMAWsf/rcsd.cfg file as follows on all nodes.
Before change
agent=SA_lkcd,timeout=20
After change
agent=SA_lkcd,timeout=25
BLADE servers (BX920 series)
Change RSB_PANIC of /etc/opt/FJSVcllkcd/etc/SA_lkcd.tout as follows on all nodes.
Before change
RSB_PANIC 0
After change
RSB_PANIC 2
Start or restart the shutdown facility on all nodes.
If the shutdown daemon (rcsd) has not yet been started
Start the shutdown daemon (rcsd) with sdtool -b.
# sdtool -b
If the shutdown daemon (rcsd) is active
Stop the shutdown daemon (rcsd) with sdtool -e and then start it with sdtool -b.
# sdtool -e
# sdtool -b
Use sdtool -s to confirm whether the shutdown daemon (rcsd) is active.
# sdtool -s
By executing sdtool -s on all nodes, the composition of the shutdown facility can be confirmed.
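For reference, healthy output for a two-node IPMI configuration looks roughly like the following (illustrative; if the kdump shutdown agent is also set up, SA_lkcd rows appear as well):

# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_ipmi.so Idle Unknown TestWorked InitWorked
node2 SA_ipmi.so Idle Unknown TestWorked InitWorked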
Note
Use the output of the "sdtool -s" command to confirm that the shutdown facility operates normally.
Even after the setup of the shutdown facility is completed, there may be a mistake in the agent or hardware configuration if the output shows the following:
- "InitFailed" is displayed as the initial status.
- "Unknown" or "TestFailed" is displayed as the test status.
Check whether an error message has been output to the /var/log/messages file, and take corrective action according to the content of the message.
This section describes the procedure for setting up the shutdown agent in PRIMEQUEST. To use this in a virtual machine environment, see "5.1.2.5 Setting up the shutdown agent in virtual machine environment."
Note
After setting up the shutdown agent, conduct a test for forced shutdown of cluster nodes to check that the correct nodes can be forcibly stopped. For details of the test for forced shutdown of cluster nodes, see "1.4 Test".
This section describes the procedure for setting up the MMB in the shutdown facility.
Check the information of the shutdown agent before setting up the shutdown facility.
Setting up the MMB Shutdown Facility
Note
Carry out the MMB information registration described here after "5.1.1 Setting Up CF and CIP" and before "Setting Up the Shutdown Daemon", which is described later.
Execute the "clmmbsetup -a" command on all nodes, and register the MMB information.
For instructions on using the "clmmbsetup" command, see the "clmmbsetup" manual page.
# /etc/opt/FJSVcluster/bin/clmmbsetup -a mmb-user
Enter User's Password:
Re-enter User's Password:
For mmb-user and User's Password, enter the following values that were checked in "5.1.2.1 Checking the Shutdown Agent Information."
- User name for controlling the MMB with RMCP
- User password for controlling the MMB with RMCP
Note
Use alphanumerics only to set User's password. Do not use symbols.
Execute the "clmmbsetup -l" command on all nodes, and check the registered MMB information.
If the MMB information registered in Step 1 is not output on all nodes, start over from Step 1.
# /etc/opt/FJSVcluster/bin/clmmbsetup -l
cluster-host-name user-name
-----------------------------------
node1 mmb-user
node2 mmb-user
On all nodes, create /etc/opt/SMAW/SMAWsf/rcsd.cfg with the following information:
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxx,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxx,timeout=timeout

CFNameX : Specify the CF node name of the cluster host.
weight : Specify the weight of the SF node.
myadmIP : Specify the IP address of the administrative LAN for the local node.
          Available IP addresses are IPv4 and IPv6 addresses.
          IPv6 link local addresses are not available.
          When specifying an IPv6 address, enclose it in brackets "[ ]".
          (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
SA_xxx : Specify the name of the shutdown agent.
         To panic the node through the MMB, specify "SA_mmbp".
         To reset the node through the MMB, specify "SA_mmbr".
timeout : Specify the timeout duration (seconds) of the shutdown agent.
          Specify 20 seconds for "SA_mmbp" and "SA_mmbr".
Example) Shown below is a setup example for a 2-node configuration.
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg
node1,weight=2,admIP=fuji2:agent=SA_mmbp,timeout=20:agent=SA_mmbr,timeout=20
node2,weight=2,admIP=fuji3:agent=SA_mmbp,timeout=20:agent=SA_mmbr,timeout=20
Note
For the shutdown agents to be specified in the rcsd.cfg file, set both the SA_mmbp and SA_mmbr shutdown agents in that order.
Set the same contents in the rcsd.cfg file on all nodes. Otherwise, a malfunction may occur.
Information
When creating the /etc/opt/SMAW/SMAWsf/rcsd.cfg file, you can use the /etc/opt/SMAW/SMAWsf/rcsd.cfg.mmb.template file as a template.
Starting the MMB asynchronous monitoring daemon
Check that the MMB asynchronous monitoring daemon has been started on all nodes.
# /etc/opt/FJSVcluster/bin/clmmbmonctl
If "The devmmbd daemon exists." is displayed, the MMB asynchronous monitoring daemon has been started.
If "The devmmbd daemon does not exist." is displayed, the MMB asynchronous monitoring daemon has not been started. Execute the following command to start the MMB asynchronous monitoring daemon.
# /etc/opt/FJSVcluster/bin/clmmbmonctl start
Starting the shutdown facility
Check that the shutdown facility has been started on all nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following command to restart the shutdown facility on all nodes.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following command to start the shutdown facility on all nodes.
# sdtool -b
Checking the status of the shutdown facility
Check the status of the shutdown facility on all nodes.
# sdtool -s
Information
Display results of the sdtool -s command
If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, or network resources being used by that agent.
If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA initialization. "Unknown" is displayed temporarily until the actual status can be confirmed.
If "TestFailed" or "InitFailed" is displayed, check the SA log file or /var/log/messages. The log file records the reason why SA testing or initialization failed. After the failure-causing problem is resolved and SF is restarted, the status display changes to InitWorked or TestWorked.
Note
If "TestFailed" is displayed and the message 7210 is output to /var/log/messages at the same time when "sdtool -s" is executed after the shutdown facility was started, there may be an error in the settings as described below.
Make sure each setting is correctly set.
7210 An error was detected in MMB. (node:nodename mmb_ipaddress1:mmb_ipaddress1 mmb_ipaddress2:mmb_ipaddress2 node_ipaddress1:node_ipaddress1 node_ipaddress2:node_ipaddress2 status:status detail:detail)
- PSA/Svmco is not installed or not set up.
- A node was not restarted after installing Svmco manually.
- Incorrect PSA/Svmco settings
  Example: An incorrect IP address (such as the MMB IP address) is set as the IP address of the administrative LAN.
- The firewall settings necessary to activate PSA/Svmco have not been made.
- Incorrect MMB settings
  Example 1: An incorrect IP address is set.
  Example 2: Neither the virtual IP address of the MMB nor the physical IP address of the MMB is set.
If "sdtool -s" is executed immediately after the OS is started, "TestFailed" may be displayed in the Test State for the local node. However, this state is displayed because the snmptrapd daemon is still being activated and does not indicate a malfunction. If "sdtool -s" is executed 10 minutes after the shutdown facility is started, TestWorked is displayed in the Test State.
In the following example, "TestFailed" is displayed in the Test State for the local node (node1).
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_mmbp.so Idle Unknown TestFailed InitWorked
node1 SA_mmbr.so Idle Unknown TestFailed InitWorked
node2 SA_mmbp.so Idle Unknown TestWorked InitWorked
node2 SA_mmbr.so Idle Unknown TestWorked InitWorked
The following messages may be displayed right after the OS is started, for the same reason as described above.
3084: Monitoring another node has been stopped.
SA SA_mmbp.so to test host nodename failed
SA SA_mmbr.so to test host nodename failed
These messages are also displayed because the snmptrapd daemon is being activated and does not indicate a malfunction. The following message is displayed 10 minutes after the shutdown facility is started.
3083: Monitoring another node has been started.
If "sdtool -s" is executed when MMB asynchronous monitoring daemon is started for the first time, "TestFailed" may be displayed. This is a normal behavior because the settings are synchronizing between nodes. If "sdtool -s" is executed 10 minutes after the shutdown facility is started, "TestWorked "is displayed in Test State field.
If nodes are forcibly stopped by the SA_mmbr shutdown agent, the following messages may be displayed. These are displayed because it takes time to stop the nodes and do not indicate a malfunction.
Fork SA_mmbp.so(PID pid) to shutdown host nodename
:
SA SA_mmbp.so to shutdown host nodename failed
:
Fork SA_mmbr.so(PID pid) to shutdown host nodename
:
SA SA_mmbr.so to shutdown host nodename failed
:
MA SA_mmbp.so reported host nodename leftcluster, state MA_paniced_fsnotflushed
:
MA SA_mmbr.so reported host nodename leftcluster, state MA_paniced_fsnotflushed
:
Fork SA_mmbp.so(PID pid) to shutdown host nodename
:
SA SA_mmbp.so to shutdown host nodename succeeded
If "sdtool -s" is executed after the messages above were displayed, KillWorked is displayed in the Shut State for the SA_mmbp.so. Then, KillFailed is displayed in the Shut State for the SA_mmbr.so.
The following is the example of "sdtool -s" when the nodes (from node1 to node2) were forcibly stopped and the messages above were displayed.
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_mmbp.so Idle Unknown TestWorked InitWorked
node1 SA_mmbr.so Idle Unknown TestWorked InitWorked
node2 SA_mmbp.so Idle KillWorked TestWorked InitWorked
node2 SA_mmbr.so Idle KillFailed TestWorked InitWorked
To recover from KillFailed displayed by "sdtool -s", perform the following procedure.
# sdtool -e
# sdtool -b
# sdtool -s
Cluster Host Agent SA State Shut State Test State Init State
------------ ----- -------- ---------- ---------- ----------
node1 SA_mmbp.so Idle Unknown TestWorked InitWorked
node1 SA_mmbr.so Idle Unknown TestWorked InitWorked
node2 SA_mmbp.so Idle Unknown TestWorked InitWorked
node2 SA_mmbr.so Idle Unknown TestWorked InitWorked
Setting the I/O Completion Wait Time
Set the wait time until I/O completion (WaitForIOComp) during failover triggered by a node failure (panic, etc.) according to the procedure described below.
Prechecking the shared disk
The standard setting for the I/O completion wait time during failover triggered by a node failure (for example, if a panic occurs during MMB asynchronous monitoring) is 0 seconds. However, if a shared disk that requires an I/O completion wait time is being used, this setting must be set to an appropriate value.
Information
ETERNUS Disk storage systems do not require an I/O completion wait time. Therefore, this setting is not required.
Note
If an I/O completion wait time is set, the failover time when a node failure (panic, etc.) occurs increases by that amount of time.
Setting the I/O completion wait time
Execute the following command, and set the wait time until I/O completion (WaitForIOComp) during failover triggered by a node failure (panic, etc.). For details about the "cldevparam" command, see the "cldevparam" manual page.
Execute the command in any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/cldevparam -p WaitForIOComp value
After setting the wait time, execute the following command to check the currently set wait time until I/O completion (WaitForIOComp).
# /etc/opt/FJSVcluster/bin/cldevparam -p WaitForIOComp
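For example, to set a 45-second wait (an illustrative value; use the time your shared disk actually requires) and then display the setting:

# /etc/opt/FJSVcluster/bin/cldevparam -p WaitForIOComp 45
# /etc/opt/FJSVcluster/bin/cldevparam -p WaitForIOComp
45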
Starting the shutdown facility
Check that the shutdown facility has been started on all nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following command to restart the shutdown facility on all nodes:
# sdtool -r
If the shutdown facility has not been started, execute the following command to start the shutdown facility on all nodes.
# sdtool -b
Checking the status of the shutdown facility
Check the status of the shutdown facility on all nodes.
# sdtool -s
This section describes the procedure for setting up the shutdown agent in a virtual machine environment.
Note
After setting up the shutdown agent, conduct a test for forced shutdown of cluster nodes to check that the correct nodes can be forcibly stopped. For details of the test for forced shutdown of cluster nodes, see "1.4 Test".
This section describes the procedure for setting vmSP (Virtual Machine Service Provider) as the shutdown facility in a Xen environment.
Be sure to perform "5.1.2.1 Checking the Shutdown Agent Information" before setting up the shutdown facility.
Note
Be sure to perform the following operations from 1. to 6. on all guest OSes (nodes).
Encrypt the password
Execute the sfcipher command to encrypt the password of the account FJSVvmSP on every host OS where a guest OS configured as a cluster node exists.
For details on how to use the sfcipher command, see the manual page of "sfcipher."
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
#
Set up the shutdown agent
Set the shutdown agent. When the shutdown agent for PANIC (SA_vmSPgp) is used, create /etc/opt/SMAW/SMAWsf/SA_vmSPgp.cfg, and when the shutdown agent for RESET (SA_vmSPgr) is used, create /etc/opt/SMAW/SMAWsf/SA_vmSPgr.cfg as below.
CFNameX domainX ip-address user passwd
CFNameX domainX ip-address user passwd
CFNameX : Specify the CF node name of the cluster host.
domainX : Specify the domain name of the guest OS.
ip-address : Specify the IP address of the host OS. The available IP address format is an IPv4 address.
user : Specify the account FJSVvmSP of the host OS.
passwd : Specify the login password of the account FJSVvmSP of the host OS. Specify the encrypted password that you created in step 1.
Example) The following is a setup example.
When the guest OS domain name of node1 is domain1 and the IP address of the host OS on which node1 operates is 10.20.30.50, and the guest OS domain name of node2 is domain2 and the IP address of the host OS on which node2 operates is 10.20.30.51:
# cat /etc/opt/SMAW/SMAWsf/SA_vmSPgp.cfg
node1 domain1 10.20.30.50 FJSVvmSP D0860AB04E1B8FA3
node2 domain2 10.20.30.51 FJSVvmSP D0860AB04E1B8FA3
# cat /etc/opt/SMAW/SMAWsf/SA_vmSPgr.cfg
node1 domain1 10.20.30.50 FJSVvmSP D0860AB04E1B8FA3
node2 domain2 10.20.30.51 FJSVvmSP D0860AB04E1B8FA3
Note
Check if the setting contents of the /etc/opt/SMAW/SMAWsf/SA_vmSPgp.cfg file and the /etc/opt/SMAW/SMAWsf/SA_vmSPgr.cfg file are correct. If there is an error in the setting contents, the shutdown facility cannot be performed normally.
Check if the domain name (domainX) of the guest OS and the IP address (ip-address) of the host OS corresponding to the cluster host's CF node name (CFNameX) of the /etc/opt/SMAW/SMAWsf/SA_vmSPgp.cfg file and the /etc/opt/SMAW/SMAWsf/SA_vmSPgr.cfg file are set. If there is an error in the setting, a different node may be forcibly stopped.
Log in to the host OS
The shutdown facility accesses the target node with SSH. Therefore, you need to authenticate yourself (create the RSA key) in advance, which is required when using SSH for the first time.
On all guest OSes (nodes), log in to each host OS IP address (ip-address) set in step 2, using the host OS user name (user) set in step 2.
# ssh -l FJSVvmSP XXX.XXX.XXX.XXX
The authenticity of host 'XXX.XXX.XXX.XXX (XXX.XXX.XXX.XXX)' can't be established.
RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? yes <- Enter "yes."
#
Set up the shutdown daemon
Create /etc/opt/SMAW/SMAWsf/rcsd.cfg as below.
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxxx,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxxx,timeout=timeout
CFNameX : Specify the CF node name of the cluster host.
weight : Specify the weight of the SF node.
myadmIP : Specify the IP address of the administrative LAN for your guest OS (node).
SA_xxxx : Specify the name of the shutdown agent. Here, "SA_vmSPgp" or "SA_vmSPgr" is specified.
timeout : Specify the timeout duration (seconds) of the shutdown agent. Specify 35 seconds for "SA_vmSPgp" and "SA_vmSPgr".
Example) The following is a setup example.
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg
node1,weight=2,admIP=fuji2:agent=SA_vmSPgp,timeout=35:agent=SA_vmSPgr,timeout=35
node2,weight=2,admIP=fuji3:agent=SA_vmSPgp,timeout=35:agent=SA_vmSPgr,timeout=35
Note
For the shutdown agents set in the rcsd.cfg file, set both shutdown agents, in the order SA_vmSPgp first and then SA_vmSPgr.
The contents of the SA_vmSPgp.cfg, SA_vmSPgr.cfg, and rcsd.cfg files of all guest OSes (nodes) should be identical. If not, a malfunction will occur.
Start the shutdown facility
Check that the shutdown facility has already been started on all the nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following on all the nodes to restart it.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following on all the nodes to start it.
# sdtool -b
Check the state of the shutdown facility
Check the state of the shutdown facility.
# sdtool -s
Information
About the displayed results
If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, network resources, or the host OS being used by that agent.
When the maximum number of concurrent SSH connections is less than or equal to the number of cluster nodes, the status of the shutdown facility may be displayed as InitFailed or TestFailed. Change the configuration so that the maximum number of concurrent SSH connections is at least the number of cluster nodes + 1.
If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA initialization. "Unknown" will be displayed temporarily until the actual status can be confirmed.
If "TestFailed" or "InitFailed" is displayed, check the SA log file or /var/log/messages. The log file records the reason why SA testing or initialization failed. After the failure-causing problem is resolved and SF is restarted, the status display changes to InitWorked or TestWorked.
This section describes the procedure for setting libvirt as the shutdown facility in a KVM environment.
Be sure to perform "5.1.2.1 Checking the Shutdown Agent Information" before setting up the shutdown facility.
Note
Be sure to perform the following operations from 1. to 6. on all guest OSes (nodes).
Encrypt the password
Execute the sfcipher command to encrypt the password for a user for the shutdown facility.
For details on how to use the sfcipher command, see the manual page of "sfcipher."
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
#
Set up the shutdown agent
Set the shutdown agent. When the shutdown agent for PANIC (SA_libvirtgp) is used, create /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg, and when the shutdown agent for RESET (SA_libvirtgr) is used, create /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg as below.
CFNameX domainX ip-address user passwd
CFNameX domainX ip-address user passwd
CFNameX : Specify the CF node name of the cluster host.
domainX : Specify the domain name of the guest OS.
ip-address : Specify the IP address of the hypervisor.
             Available IP addresses are IPv4 and IPv6 addresses.
             IPv6 link local addresses are not available.
user : Specify the hypervisor account of the user for the shutdown facility.
passwd : Specify the login password of the account specified by "user". Specify the encrypted password that you created in step 1.
Example) The following is a setup example.
When the guest OS domain name of node1 is domain1 and the IP address of the hypervisor on which node1 operates is 10.20.30.50, and the guest OS domain name of node2 is domain2 and the IP address of the hypervisor on which node2 operates is 10.20.30.51:
# cat /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg
node1 domain1 10.20.30.50 user D0860AB04E1B8FA3
node2 domain2 10.20.30.51 user D0860AB04E1B8FA3
# cat /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg
node1 domain1 10.20.30.50 user D0860AB04E1B8FA3
node2 domain2 10.20.30.51 user D0860AB04E1B8FA3
Note
- Check that the setting contents of the /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg and /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg files are correct. If there is an error in the settings, the shutdown facility cannot operate normally.
- Check that the domain name (domainX) of the guest OS and the IP address (ip-address) of the hypervisor corresponding to the cluster host's CF node name (CFNameX) are set in the /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg and /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg files. If there is an error in the setting, a different node may be forcibly stopped.
- After SA_libvirtgp forcibly stops a node (guest OS), the guest OS may temporarily remain in the stopped state (for example, when there is not sufficient disk space in /var/crash on the host OS). In this case, forcibly stop the guest OS with the virsh destroy command, as in the example below.
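For example, using the guest domain name from the setup example above:

# virsh destroy domain1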
Log in to the hypervisor
The shutdown facility accesses the target node with SSH. Therefore, you need to authenticate yourself (create the RSA key) in advance, which is required when using SSH for the first time.
On all guest OSes (nodes), log in to each hypervisor IP address (ip-address) set in step 2, using the user for the shutdown facility set in step 2.
# ssh -l user XXX.XXX.XXX.XXX
The authenticity of host 'XXX.XXX.XXX.XXX (XXX.XXX.XXX.XXX)' can't be established.
RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? yes <- Enter "yes."
#
Set up the shutdown daemon
Create /etc/opt/SMAW/SMAWsf/rcsd.cfg as below.
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxxx,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxxx,timeout=timeout
CFNameX : Specify the CF node name of the cluster host.
weight : Specify the weight of the SF node.
myadmIP : Specify the IP address of the administrative LAN for your guest OS (node).
          Available IP addresses are IPv4 and IPv6 addresses.
          IPv6 link local addresses are not available.
          When specifying an IPv6 address, enclose it in brackets "[ ]".
          (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
SA_xxxx : Specify the name of the shutdown agent. Here, "SA_libvirtgp" or "SA_libvirtgr" is specified.
timeout : Specify the timeout duration (seconds) of the shutdown agent. Specify 35 seconds for "SA_libvirtgp" and "SA_libvirtgr".
Example) The following is a setup example.
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg
node1,weight=1,admIP=10.20.30.100:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
node2,weight=1,admIP=10.20.30.101:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35
Note
For the shutdown agents set in the rcsd.cfg file, set both shutdown agents, in the order SA_libvirtgp first and then SA_libvirtgr.
The contents of the SA_libvirtgp.cfg, SA_libvirtgr.cfg, and rcsd.cfg files of all guest OSes (nodes) should be identical. If not, a malfunction will occur.
Start the shutdown facility
Check that the shutdown facility has already been started on all the nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following on all the nodes to restart it.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following on all the nodes to start it.
# sdtool -b
Check the state of the shutdown facility
Check the state of the shutdown facility.
# sdtool -s
Information
About the displayed results
If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, network resources, or the host OS being used by that agent.
When the maximum number of concurrent SSH connections is less than or equal to the number of cluster nodes, the status of the shutdown facility may be displayed as InitFailed or TestFailed. Change the configuration so that the maximum number of concurrent SSH connections is at least the number of cluster nodes + 1.
If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA initialization. "Unknown" will be displayed temporarily until the actual status can be confirmed.
If "TestFailed" or "InitFailed" is displayed, check the SA log file, /var/log/messages, or /etc/sysconfig/libvirt-guests. The log file records the reason why SA testing or initialization failed. For /etc/sysconfig/libvirt-guests, check whether the following settings are made:
- When building a cluster system between guest OSes on one host OS, see "3.2.1.2 Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes without using the Host OS failover function, see "3.2.2.2 Host OS setup (after installing the operating system on guest OS)."
- When building a cluster system between guest OSes on multiple host OSes using the Host OS failover function, see "3.2.3.1.4 Host OS setup (after installing the operating system on guest OS)."
After the failure-causing problem is resolved and SF is restarted, the status display changes to InitWorked or TestWorked.
This section describes the procedure for setting up vmchkhost (Host OS check) as the shutdown agent in a virtual machine environment.
Configure this setting after setting up vmSP (Virtual Machine Service Provider) or libvirt in the shutdown facility.
Note
Be sure to perform the following operations from 1. to 6. on all guest OSes (nodes).
Information
About a log file
A log file of the vmchkhost shutdown agent is output at the following:
/var/opt/SMAWsf/log/SA_vmchkhost.log
Encrypt the password
In a Xen environment, use the encrypted passwords for the account FJSVvmSP of all host OSes, which were used when you set vmSP (Virtual Machine Service Provider) to the shutdown facility.
In a KVM environment, use the encrypted passwords of general users for the shutdown facility, which were used when you set libvirt to the shutdown facility.
Set up the shutdown agent
Create /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg as described in the following:
guest-cfname host-cfname ip-address user password
guest-cfname host-cfname ip-address user password
guest-cfname : Specify the CF node name of the guest OS.
host-cfname : Specify the CF node name of the host OS.
ip-address : Specify the IP address of the host OS.
             Available IP addresses are IPv4 and IPv6 addresses.
             IPv6 link local addresses are not available.
user : Specify the account of the host OS.
       For a Xen environment, this is fixed to FJSVvmSP.
       For a KVM environment, specify the user that was created when you set up libvirt for the shutdown facility.
password : Specify the login password of the account specified by "user". Specify the encrypted password that you checked in step 1.
Example) The following is a setup example.
When the CF node name of the host OS on which node1 (CF node name of the guest OS) operates is hostos1 and the IP address of that host OS is 10.20.30.50, and the CF node name of the host OS on which node2 (CF node name of the guest OS) operates is hostos2 and the IP address of that host OS is 10.20.30.51:
For Xen environment:
# cat /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg
node1 hostos1 10.20.30.50 FJSVvmSP 3CA1wxVXKD8a93077BaEkA==
node2 hostos2 10.20.30.51 FJSVvmSP 3CA1wxVXKD8a93077BaEkA==
For KVM environment:
# cat /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg
node1 hostos1 10.20.30.50 user D0860AB04E1B8FA3
node2 hostos2 10.20.30.51 user D0860AB04E1B8FA3
Note
- Check that the setting contents of the /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg file are correct. If there is an error in the settings, the shutdown facility cannot operate normally.
- Check that the CF node name of the host OS (host-cfname) and the IP address of the host OS (ip-address) corresponding to the CF node name of the guest OS (guest-cfname) are set in the /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg file. If there is an error in the setting, the shutdown facility cannot operate normally.
Log in to the host OS
The shutdown facility accesses the target node with SSH. Therefore, you need to authenticate yourself (create the RSA key) in advance, which is required when using SSH for the first time.
Check that you have already authenticated yourself (created the RSA key) when setting up vmSP (Virtual Machine Service Provider) or libvirt in the shutdown facility.
Set up the shutdown daemon
Create /etc/opt/SMAW/SMAWsf/rcsd.cfg as below.
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxxx,timeout=timeout
CFNameX,weight=weight,admIP=myadmIP:agent=SA_xxxx,timeout=timeout
CFNameX : Specify the CF node name of the cluster host.
weight : Specify the weight of the SF node.
myadmIP : Specify the IP address of the administrative LAN for your guest OS (node).
          Available IP addresses are IPv4 and IPv6 addresses.
          IPv6 link local addresses are not available.
          When specifying an IPv6 address, enclose it in brackets "[ ]".
          (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
SA_xxxx : Specify the name of the shutdown agent. Here, "SA_vmchkhost" is specified.
timeout : Specify the timeout duration (seconds) of the shutdown agent. Specify 35 seconds for "SA_vmchkhost".
Example) The following is a setup example.
For Xen environment:
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg
node1,weight=2,admIP=fuji2:agent=SA_vmSPgp,timeout=35:agent=SA_vmSPgr,timeout=35:agent=SA_vmchkhost,timeout=35
node2,weight=1,admIP=fuji3:agent=SA_vmSPgp,timeout=35:agent=SA_vmSPgr,timeout=35:agent=SA_vmchkhost,timeout=35
For KVM environment:
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg
node1,weight=2,admIP=fuji2:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35:agent=SA_vmchkhost,timeout=35
node2,weight=1,admIP=fuji3:agent=SA_libvirtgp,timeout=35:agent=SA_libvirtgr,timeout=35:agent=SA_vmchkhost,timeout=35
Note
For the shutdown agents set in the rcsd.cfg file, set SA_vmchkhost as the last shutdown agent, after the existing agents (SA_vmSPgp and SA_vmSPgr in a Xen environment, or SA_libvirtgp and SA_libvirtgr in a KVM environment), as shown in the examples above.
The contents of the SA_vmchkhost.cfg and rcsd.cfg files of all guest OSes (nodes) should be identical. If not, a malfunction will occur.
Start the shutdown facility
Check that the shutdown facility has already been started on all the nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following on all the nodes to restart it.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following on all the nodes to start it.
# sdtool -b
Check the state of the shutdown facility
Check the state of the shutdown facility.
# sdtool -s
Information
Display results of the sdtool -s command
If "InitFailed" is displayed as the initial status, it means that a problem occurred during initialization of that shutdown agent.
If "TestFailed" is displayed as the test status, it means that a problem occurred while the agent was testing whether or not the node displayed in the cluster host field could be stopped. Some sort of problem probably occurred in the software, hardware, network resources, or host OS being used by that agent.
When the maximum number of concurrent SSH connections is less than or equal to the number of cluster nodes, the status of the shutdown facility may be displayed as InitFailed or TestFailed. Change the configuration so that the maximum number of concurrent SSH connections is at least the number of cluster nodes + 1.
If "Unknown" is displayed as the stop or initial status, it means that the SF has still not executed node stop, path testing, or SA initialization. "Unknown" is displayed temporarily until the actual status can be confirmed.
If "TestFailed" or "InitFailed" is displayed, check the SA log file or /var/log/messages. The log file records the reason why SA testing or initialization failed. After the failure-causing problem is resolved and SF is restarted, the status display changes to InitWorked or B242TestWorked.
This section describes the procedure for setting up the Host OS failover function when using it in the PRIMEQUEST KVM environment.
Configure this setting after setting up libvirt and vmchkhost in the shutdown facility on the guest OS (node).
Note
Be sure to perform the following operations from 1. to 4. on all guest OSes (nodes).
Encrypt the password
Execute the sfcipher command to encrypt the password for logging in to the guest OS (node) as the root user.
For details on how to use the sfcipher command, see the manual page of "sfcipher."
# sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3
Create /etc/opt/FJSVcluster/etc/kvmguests.conf
Create /etc/opt/FJSVcluster/etc/kvmguests.conf with the following contents.
guest-name host-cfname guest-clustername guest-cfname guest_IP guest_user guest_passwd
:
- Create the kvmguests.conf file with system administrator access privileges and change the permission of the file to 600.
- Enter the information of one node per line.
- Delimit each item with a single space.
- The kvmguests.conf file must be the same on all cluster nodes.
guest-name : Specifies the domain name of the guest OS.
host-cfname : Specifies the CF node name of the host OS on which "guest-name" is running. You can confirm it by executing "cftool -l" on that host OS.
guest-clustername : Specifies the cluster name of the guest OS. You can confirm it by executing "cftool -c" on the guest OS.
guest-cfname : Specifies the CF node name of the guest OS. You can confirm it by executing "cftool -l" on the guest OS.
guest_IP : Specifies the IP address of the guest OS. Available IP address formats are IPv4 and IPv6 addresses. IPv6 link local addresses are not available.
guest_user : Specifies the user name for logging in to the guest OS. This is fixed to root.
guest_passwd : Specifies the password for logging in to the guest OS. Specify the password encrypted in step 1.
Example) In a two-node configuration between guest OSes, two cluster systems are configured
guest11 cfhost1 cluster1 cfguest11 10.20.30.50 root D0860AB04E1B8FA3
guest12 cfhost2 cluster1 cfguest12 10.20.30.51 root D0860AB04E1B8FA3
guest21 cfhost1 cluster2 cfguest21 10.20.30.60 root D0860AB04E1B8FA3
guest22 cfhost2 cluster2 cfguest22 10.20.30.61 root D0860AB04E1B8FA3
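As noted above, create the file with system administrator privileges and then set its permission to 600:

# chmod 600 /etc/opt/FJSVcluster/etc/kvmguests.conf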
Confirm login to the guest OS
The shutdown facility accesses the target node with SSH. Therefore, you need to authenticate yourself (create the RSA key) in advance, which is required when using SSH for the first time.
Check that you can connect to all the guest OSes (nodes) which are defined to /etc/opt/FJSVcluster/etc/kvmguests.conf via SSH as a root user.
# ssh -l root XXX.XXX.XXX.XXX
The authenticity of host 'XXX.XXX.XXX.XXX (XXX.XXX.XXX.XXX)' can't be established.
RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? yes <- Enter "yes."
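Rather than checking each guest by hand, connectivity to every guest can be confirmed in one pass. This is a minimal sketch that assumes the kvmguests.conf field order shown in step 2 (guest_IP is the fifth field) and simply runs "true" over SSH as root:

# while read g h c n ip u p; do [ -n "$ip" ] && { ssh -l root "$ip" true || echo "SSH to $ip failed"; }; done < /etc/opt/FJSVcluster/etc/kvmguests.conf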
Start the shutdown facility
Check that the shutdown facility has already been started on all the nodes.
# sdtool -s
If the shutdown facility has already been started, execute the following on all the nodes to restart it.
# sdtool -e
# sdtool -b
If the shutdown facility has not been started, execute the following on all the nodes to start it.
# sdtool -b