PRIMECLUSTER  Installation and Administration Guide 4.7

G.2.3 Building a Cluster

This section describes procedures for setting up a cluster with PRIMECLUSTER in a VMware environment.

G.2.3.1 Initial Setup of CF and CIP

Refer to "5.1.1 Setting Up CF and CIP" to set up CF and CIP on the guest OS.

G.2.3.2 Setting Up the Shutdown Facility (when using VMware vCenter Server Functional Cooperation)

For details on survival priority, see "5.1.2.1 Survival Priority."

In VMware environments, when a failure occurs in a guest OS, the virtual machine of the guest OS on which the failure was detected is forcibly powered off through cooperation with VMware vCenter Server. This allows the operation to be failed over.

This section explains the method for setting up the SA_vwvmr shutdown agent as the shutdown facility.

Note

Be sure to perform the following operations on all guest OSes (nodes).

  1. Encrypting the password

    Execute the sfcipher command to encrypt passwords for accessing VMware vCenter Server.

    For details on how to use the sfcipher command, see the manual page of "sfcipher."

    # sfcipher -c
    Enter User's Password:
    Re-enter User's Password:
    D0860AB04E1B8FA3
  2. Setting up the shutdown agent

    Specify the shutdown agent.

    Create /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg with the following contents on all guest OSes (nodes) of the cluster:

    # comment line
    CFName: cfname1
    VMName: vmname1
    vCenter_IP: ipaddress1
    vCenter_Port: port
    user: user
    passwd: passwd
    # comment line
    CFName: cfname2
    VMName: vmname2
    vCenter_IP: ipaddress2
    vCenter_Port: port2
    user: user
    passwd: passwd
    cfnameX    : Specify the CF node name of the cluster host.
    vmnameX    : Specify the name of the virtual machine on which the cluster host is running.
    ipaddressX : Specify the IP address of the VMware vCenter Server that manages the virtual
                 machine. Available IP addresses are IPv4 and IPv6 addresses.
                 IPv6 link local addresses are not available.
                 When specifying an IPv6 address, enclose it in brackets "[ ]".
                 (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
    portX      : Specify the port number of VMware vCenter Server.
                 When using the default value (443), write "vCenter_Port:" and leave the
                 value blank.
    user       : Specify the user of VMware vCenter Server created in "G.2.1.1 Installation
                 and Configuration of Related Software." When logging in with single sign-on
                 (SSO), specify user@SSO_domain_name.
    passwd     : The login password of the account specified by "user". Specify the password
                 encrypted in step 1.

    Note

    • Do not change the order of the items.

    • If the virtual machine name (VMName:) includes Japanese characters, use the character code UTF-8 to describe the machine name.

    • A one-byte space and a double-byte space are treated as different characters. Use a one-byte space when inserting a space in the file.

    • Only lines that start with "#" are treated as comments. If "#" appears in the middle of a line, it is treated as part of the setting value.

      In the following example, "vm1 # node1's virtual machine." is used as the virtual machine name.

      ...
      VMName: vm1 # node1's virtual machine.
      ...
    • The contents of SA_vwvmr.cfg must be the same on all the guest OSes. If not, the shutdown facility may not work correctly.

    Example

    • Log in with single sign-on

      When the IP address of VMware vCenter Server that manages all the virtual machines is 10.20.30.40, the port numbers are the default value, the user who connects to VMware vCenter Server is Administrator, SSO domain name is vsphere.local, and the password encrypted in step "1. Encrypting the password" is D0860AB04E1B8FA3:

      ##
      ## node1's information.
      ##
      CFName: node1
      VMName: vm1
      vCenter_IP: 10.20.30.40
      vCenter_Port:
      user: Administrator@vsphere.local
      passwd: D0860AB04E1B8FA3
      ##
      ## node2's information.
      ##
      CFName: node2
      VMName: vm2
      vCenter_IP: 10.20.30.40
      vCenter_Port:
      user: Administrator@vsphere.local
      passwd: D0860AB04E1B8FA3
    • Log in without single sign-on

      When the IP address of VMware vCenter Server that manages all the virtual machines is 10.20.30.40, the port numbers are the default value, the user who connects to VMware vCenter Server is root, and the password encrypted in step "1. Encrypting the password" is D0860AB04E1B8FA3:

      ##
      ## node1's information.
      ##
      CFName: node1
      VMName: vm1
      vCenter_IP: 10.20.30.40
      vCenter_Port:
      user: root
      passwd: D0860AB04E1B8FA3
      ##
      ## node2's information.
      ##
      CFName: node2
      VMName: vm2
      vCenter_IP: 10.20.30.40
      vCenter_Port:
      user: root
      passwd: D0860AB04E1B8FA3
  3. Setting up the shutdown daemon

    Create /etc/opt/SMAW/SMAWsf/rcsd.cfg with the following contents on all guest OSes (nodes) of the cluster:

    CFNameX,weight=weight,admIP=myadmIP:agent=SA_vwvmr,timeout=timeout
    CFNameX,weight=weight,admIP=myadmIP:agent=SA_vwvmr,timeout=timeout
    CFNameX        : CF node name of the cluster host. 
    weight         : Weight of the SF node. 
    myadmIP        : Specify the IP address of the administrative LAN for CFNameX. 
                     Available IP addresses are IPv4 and IPv6 addresses.
                     IPv6 link local addresses are not available.
                     When specifying the IPv6 address, enclose it in brackets "[ ]".
                     (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
                     If you specify a host name, please make sure it is listed in /etc/hosts.
    timeout        : Specify the timeout duration (seconds) of the Shutdown Agent. 
                     Specify 45 for the value.
    

    Note

    The rcsd.cfg file must be the same on all guest OSes (nodes). Otherwise, operation errors might occur.

    Example

    Below is a setting example:

    node1,weight=1,admIP=10.0.0.1:agent=SA_vwvmr,timeout=45
    node2,weight=1,admIP=10.0.0.2:agent=SA_vwvmr,timeout=45
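
    The SA_vwvmr.cfg and rcsd.cfg files must be identical on all guest OSes (nodes), so it can help to compare checksums before starting the shutdown facility. Below is a minimal sketch, assuming node names node1 and node2 and that ssh access between the nodes is available (adjust to your environment):

    # Run on node1: compare checksums of the shutdown facility
    # configuration files on both nodes (the node name "node2" and
    # ssh connectivity are assumptions of this sketch).
    for f in /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg /etc/opt/SMAW/SMAWsf/rcsd.cfg
    do
        md5sum $f
        ssh node2 md5sum $f
    done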
  4. Starting the shutdown facility

    Check that the shutdown facility has started.

    # sdtool -s

    If the shutdown facility has already started, execute the following command to restart the shutdown facility.

    # sdtool -r

    If the shutdown facility is not started, execute the following command to start the shutdown facility.

    # sdtool -b
  5. Checking the status of the shutdown facility

    Check that the status of the shutdown facility is either "InitWorked" or "TestWorked." If the displayed status is "TestFailed" or "InitFailed," check the shutdown daemon settings for any mistakes.

    # sdtool -s
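
    The output is similar to the following (illustrative only; the exact states displayed depend on your version and environment):

    # sdtool -s
    Cluster Host    Agent        SA State     Shut State   Test State   Init State
    ------------    -----        --------     ----------   ----------   ----------
    node1           SA_vwvmr     Idle         Unknown      TestWorked   InitWorked
    node2           SA_vwvmr     Idle         Unknown      TestWorked   InitWorked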

G.2.3.3 Setting Up the Shutdown Facility (when using I/O fencing function)

This section explains the method for setting up the SA_icmp shutdown agent as the shutdown facility.

Note

Be sure to perform the following operations on all guest OSes (nodes).

  1. Setting up the shutdown facility

    Specify the shutdown agent.

    Create /etc/opt/SMAW/SMAWsf/SA_icmp.cfg with the following contents on all guest OSes (nodes) of the cluster:

    TIME_OUT=value
    cfname:ip-address-of-node:NIC-name1,NIC-name2
    value              : Specify the interval (in seconds) for checking whether the node is
                         alive. The recommended value is "5" (s).
    cfname             : Specify the name of the CF node.
    ip-address-of-node : Specify the IP addresses of any one of the following networks
                         utilized for checking whether the cfname node is alive.
                          - Cluster interconnect (IP address of CIP)
                          - Administrative LAN
                          - Public LAN
                         Checking via multiple networks is also available.
                         In this case, add a line for each utilized network.
                         Available IP addresses are IPv4 and IPv6 addresses.
                         IPv6 link local addresses are not available.
                         When specifying the IPv6 address, enclose it in brackets "[ ]".
                         (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
                         Enter the IP address for all guest OSes (nodes) that configure the
                         cluster system.
                         
                         The behavior varies depending on the configuration of the
                         network used for checking, as described below. Select the
                         configuration after carefully examining these behaviors.
                         
                         - Using only the cluster interconnect:
                           If communication over the cluster interconnect fails, the
                           cluster application is switched automatically and immediately,
                           even if the operating node is still running.
                           With this setting, I/O fencing suppresses I/O to the shared disk.
                           Until the old operating node is stopped (approximately
                           25 seconds), a duplicate IP address may temporarily exist on
                           the network, and access from clients may be connected to the
                           old operating node.
                         
                         - Using only one route other than the cluster interconnect, or
                           using multiple routes:
                           Node survival is checked by communicating over all configured
                           routes.
                           If the operating node cannot be confirmed on any route, the
                           cluster application is switched automatically and immediately.
                           If the node is confirmed to be operating (for example, during
                           a temporary high load or an intermittent communication path
                           failure), PRIMECLUSTER suppresses automatic switching of the
                           cluster application and the node enters the LEFTCLUSTER state.
                           In this case, the operator checks the effect of the error on
                           the cluster application, then resolves the LEFTCLUSTER state
                           and switches the cluster application as necessary.
                           While in the LEFTCLUSTER state, the following functions do not
                           operate and may affect the cluster application. Resolve the
                           LEFTCLUSTER state immediately.
                           
                             (1) Some PRIMECLUSTER commands
                                 The following commands cannot be executed successfully:
                                  - Command execution between cluster nodes (clexec)
                                  - File distribution between cluster nodes (clsyncfile)
                                  - Display of configuration and status information for
                                    objects (*1) (sdxinfo)
                                    (*1) Classes, groups, disks, and volumes (slices).

                             (2) Operations on the shared disk
                                 The following operations on the shared disk cannot be
                                 performed successfully:

                                  - Symfoware archive log monitoring/switching, dbspace/audit
                                    logging, and execution of recovery, dump, and utility
                                    commands
                                  - Equivalent recovery copy (performed during automatic
                                    recovery after an I/O error)
                                  - Online/Offline operations on GDS resources contained in
                                    a cluster application
                                  - GUI operations for viewing and manipulating the GDS
                                    configuration
                                  - Execution of the following GDS commands:
                                    - Class operations (sdxclass -R)
                                    - Disk operations (sdxdisk -M (*2), -R (*2), -C, -D)
                                    - Group operations (sdxgroup -C, -D, -R)
                                    - Volume operations (sdxvolume -M, -R, -N, -F, -S)
                                    - Slice operations (sdxslice -M, -R, -N, -F, -T)
                                    - Disk replacement (sdxswap -O (*2), -I (*2))
                                    - Equivalent recovery copy (sdxcopy -B, -C, -I, -P)
                                    - Recovery of failed objects (sdxfix -C (*2), -D, -V)
                                    - Changing an object's attribute values
                                      (sdxattr -C (*2), -G, -D, -V, -S)
                                    - Object configuration operations
                                      (sdxconfig Remove (*2), Restore (*2), Backup)
                                    (*2) This applies to the local class in which the shared
                                         disk is registered.
                                  - I/O errors on the shared disk
    NIC-nameX          : Specify the network interface of the local guest OS (node) utilized 
                         for checking whether the node defined by ip-address-of-node is alive. 
                         If there is more than one, delimit them with commas (",").

    Note

    Registering network interfaces

    • For interfaces duplicated by GLS, define all redundant network interfaces. (Example: eth0,eth1)

    • If you are bonding NICs, define the bonding device after the IP address. (Example: bond0)

    • For registering the cluster interconnect, define all network interfaces that are used on all paths of the cluster interconnect. (Example: eth2,eth3)

    • Do not use the takeover IP address (takeover virtual interface).

    Example

    Below is a setting example for a two-node cluster between guest OSes on multiple ESXi hosts.

    • When cluster interconnects (eth2,eth3) are set

      TIME_OUT=5
      node1:192.168.1.1:eth2,eth3
      node2:192.168.1.2:eth2,eth3
    • When the public LAN (duplicated (eth0,eth1) by GLS) and the administrative LAN (eth4) are set

      TIME_OUT=5
      node1:10.20.30.100:eth0,eth1
      node1:10.20.40.200:eth4
      node2:10.20.30.101:eth0,eth1
      node2:10.20.40.201:eth4
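
    Since SA_icmp checks node survival by ICMP, it can be useful to verify beforehand that each IP address listed in SA_icmp.cfg responds over each listed network interface. Below is a minimal sketch for the GLS example above, run on node1 (the interface-binding option -I of the Linux ping command is an assumption of this sketch; adjust addresses and interfaces to your environment):

    # Check node2's public LAN address over each redundant GLS interface.
    ping -c 3 -I eth0 10.20.30.101
    ping -c 3 -I eth1 10.20.30.101
    # Check node2's administrative LAN address.
    ping -c 3 -I eth4 10.20.40.201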
  2. Setting up the shutdown daemon

    Create /etc/opt/SMAW/SMAWsf/rcsd.cfg with the following contents on all guest OSes (nodes) of the cluster:

    CFNameX,weight=weight,admIP=myadmIP:agent=SA_icmp,timeout=timeout
    CFNameX,weight=weight,admIP=myadmIP:agent=SA_icmp,timeout=timeout
    CFNameX        : CF node name of the cluster host. 
    weight         : Weight of the SF node. 
                     Specify 1, because this value has no effect when the I/O fencing function is used.
    myadmIP        : Specify the IP address of the administrative LAN for CFNameX. 
                     Available IP addresses are IPv4 and IPv6 addresses.
                     IPv6 link local addresses are not available.
                     When specifying the IPv6 address, enclose it in brackets "[ ]".
                     (Example: [1080:2090:30a0:40b0:50c0:60d0:70e0:80f0])
                     If you specify a host name, please make sure it is listed in /etc/hosts.
    timeout        : Specify the timeout duration (seconds) of the Shutdown Agent.
                     Specify the larger of the following two values:
                     (TIME_OUT + 2) x number of paths used for checking the survival
                     of a node, or 20.
                     TIME_OUT is the TIME_OUT value described in SA_icmp.cfg.

                         - When checking the survival of a node on one path
                           (any one of the administrative LAN, public LAN, or cluster
                            interconnect)
                           (1) TIME_OUT is 18 or larger
                               TIME_OUT + 2
                           (2) TIME_OUT is less than 18
                               20

                         - When checking the survival of a node on two paths
                           (any two of the administrative LAN, public LAN, or cluster
                            interconnect)
                           (1) TIME_OUT is 8 or larger
                               (TIME_OUT + 2) x 2
                           (2) TIME_OUT is less than 8
                               20

                         - When checking the survival of a node on three paths
                           (for example, the administrative LAN, a public LAN, and the
                            cluster interconnect)
                           (1) TIME_OUT is 5 or larger
                               (TIME_OUT + 2) x 3
                           (2) TIME_OUT is less than 5
                               20
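
    The rule above is equivalent to timeout = max(20, (TIME_OUT + 2) x number of paths). Below is a minimal shell sketch of this calculation; the TIME_OUT and PATHS values are placeholders to replace with your own:

    # Compute the rcsd.cfg timeout value from TIME_OUT and the number
    # of paths used for checking the survival of a node.
    TIME_OUT=10   # TIME_OUT value from SA_icmp.cfg (placeholder)
    PATHS=2       # number of survival-check paths (placeholder)
    T=$(( (TIME_OUT + 2) * PATHS ))
    [ $T -lt 20 ] && T=20
    echo "timeout=$T"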

    Note

    The rcsd.cfg file must be the same on all guest OSes (nodes). Otherwise, operation errors might occur.

    Example

    Below is a setting example for checking the survival of a node by using the administrative LAN and the public LAN in a two-node configuration, when the TIME_OUT value described in SA_icmp.cfg is 10.

    node1,weight=1,admIP=192.168.100.1:agent=SA_icmp,timeout=24 (*)
    node2,weight=1,admIP=192.168.100.2:agent=SA_icmp,timeout=24 (*)
    (*) timeout = (10 (TIME_OUT value) + 2) x 2 (administrative LAN, public LAN) = 24

  3. Starting the shutdown facility

    Check that the shutdown facility has started.

    # sdtool -s

    If the shutdown facility has already started, execute the following command to restart the shutdown facility.

    # sdtool -r

    If the shutdown facility is not started, execute the following command to start the shutdown facility.

    # sdtool -b
  4. Checking the status of the shutdown facility

    Check that the status of the shutdown facility is either "InitWorked" or "TestWorked." If the displayed status is "TestFailed" or "InitFailed," check the shutdown daemon settings for any mistakes.

    # sdtool -s

G.2.3.4 Initial Setup of the Cluster Resource Management Facility

Refer to "5.1.3 Initial Setup of the Cluster Resource Management Facility" to set up the resource database managed by the cluster resource management facility (hereafter referred to as "CRM") on the guest OS.

G.2.3.5 Setting Up Fault Resource Identification and Operator Intervention Request

Refer to "5.2 Setting up Fault Resource Identification and Operator Intervention Request" to make the settings for identifying fault resources and for requesting operator intervention.