This section explains the procedure for adding a system board by DR during PRIMECLUSTER system operation.
If a system board is added by DR, the PRIMECLUSTER monitoring facility may be affected, resulting in node elimination.
If DR needs to be used, stop the cluster monitoring facility beforehand with the following procedure:
Execute the "hvshut" command on each node to stop PRIMECLUSTER RMS as follows. Answer "yes," then only RMS will stop. The cluster application will remain running.
# hvshut -L
WARNING
-------
The '-L' option of the hvshut command will shut down the RMS
software without bringing down any of the applications.
In this situation, it would be possible to bring up the same
application on another node in the cluster which *may* cause
data corruption.
Do you wish to proceed ? (yes = shut down RMS / no = leave RMS running).
yes
NOTICE: User has been warned of 'hvshut -L' and has elected to proceed.
Add the following line to the end of the "/opt/SMAW/SMAWRrms/bin/hvenv.local" file on each node.
export HV_RCSTART=0
This setting is necessary so that RMS does not start automatically immediately after OS startup.
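For example, the entry can be appended and verified on each node as follows (a minimal sketch using standard shell commands):
# echo 'export HV_RCSTART=0' >> /opt/SMAW/SMAWRrms/bin/hvenv.local
# tail -1 /opt/SMAW/SMAWRrms/bin/hvenv.local
export HV_RCSTART=0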
Execute the "sdtool" command on each node to stop PRIMECLUSTER SF as follows.
# sdtool -e
LOG3.013806902801080028 11 6 30 4.3A30 SMAWsf : RCSD returned a
successful exit code for this command
Perform the following operation on each node to change the timeout value of PRIMECLUSTER CF:
Add the following line to the "/etc/default/cluster.config" file.
CLUSTER_TIMEOUT "600"
Execute the following command.
# cfset -r
Check that the timeout value has taken effect.
# cfset -g CLUSTER_TIMEOUT
>From cfset configuration in CF module:
Value for key: CLUSTER_TIMEOUT --->600
#
Use DR.
Perform the following operation on each node to return the timeout value of PRIMECLUSTER CF to the default value:
Change the value of CLUSTER_TIMEOUT defined earlier in the "/etc/default/cluster.config" file back to 10.
Before change
CLUSTER_TIMEOUT "600"
After change
CLUSTER_TIMEOUT "10"
Execute the following command.
# cfset -r
Check that the timeout value has taken effect.
# cfset -g CLUSTER_TIMEOUT
>From cfset configuration in CF module:
Value for key: CLUSTER_TIMEOUT --->10
#
Execute the "sdtool" command on each node to start the PRIMECLUSTER SF.
# sdtool -b
Check that PRIMECLUSTER SF is running. (The following shows an output example for a two-node configuration.)
# sdtool -s
Cluster Host   Agent        SA State   Shut State   Test State   Init State
------------   -----        --------   ----------   ----------   ----------
node0          SA_mmbp.so   Idle       Unknown      TestWorked   InitWorked
node0          SA_mmbr.so   Idle       Unknown      TestWorked   InitWorked
node1          SA_mmbp.so   Idle       Unknown      TestWorked   InitWorked
node1          SA_mmbr.so   Idle       Unknown      TestWorked   InitWorked
Execute the "hvcm" command on each node to start PRIMECLUSTER RMS.
# hvcm
Starting Reliant Monitor Services now
RMS must be running on all the nodes. Check that each icon indicating the node state is green (Online) in the RMS main window of Cluster Admin.
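As an alternative to the GUI check, the state of RMS resources on a node can also be listed from the command line with the "hvdisp" command (a sketch; the exact output depends on your configuration):
# hvdisp -a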
Finally, remove the following line from the "/opt/SMAW/SMAWRrms/bin/hvenv.local" file on each node.
export HV_RCSTART=0
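For example, the line can be deleted as follows (a sketch, assuming it was appended exactly as shown earlier):
# sed -i '/^export HV_RCSTART=0$/d' /opt/SMAW/SMAWRrms/bin/hvenv.local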
Note
If you plan to use DR, be sure to verify the cluster system with the above steps during cluster configuration.
If a node failure (such as a node panic or reset) or a hang-up occurs during steps 1 through 7, for example due to a hardware fault, follow the procedure below to start the cluster application that was running on the node where DR is used on a standby node.
If a hang-up occurs, stop the failed node forcibly, and then check that the node is stopped.
Mark the node DOWN by executing the "cftool" command on any node where no failure has occurred, specifying the node number and CF node name of the failed node. However, if the state of the failed node is not LEFTCLUSTER, wait until it becomes LEFTCLUSTER, and then execute the "cftool -k" command.
# cftool -n
Node    Number   State         Os      Cpu
node0   1        UP            Linux   EM64T
node1   2        LEFTCLUSTER   Linux   EM64T
# cftool -k
This option will declare a node down. Declaring an operational
node down can result in catastrophic consequences, including
loss of data in the worst case.
If you do not wish to declare a node down, quit this program now.
Enter node number: 2
Enter name for node #2: node1
cftool(down): declaring node #2 (node1) down
cftool(down): node node1 is down
# cftool -n
Node    Number   State   Os      Cpu
node0   1        UP      Linux   EM64T
node1   2        DOWN    Linux   EM64T
#
Perform steps 5 through 9 on all the nodes where no failure has occurred, and then start RMS. If the cluster application is in an active standby configuration, the following message is displayed. Answer "yes." (For details on this message, see "PRIMECLUSTER Messages.")
1421 The userApplication "userApplication" did not start automatically because not all of the nodes where it can run are online.
Forcing the userApplication online on the SysNode "SysNode" is possible.
Warning: When performing a forced online, confirm that RMS is started on all nodes in the cluster, manually shutdown any nodes where it is not started and then perform it. For a forced online, there is a risk of data corruption due to simultaneous access from several nodes. In order to reduce the risk, nodes where RMS is not started may be forcibly stopped.
Are you sure wish to force online? (yes/no)
Message No: number
Remark) The operator intervention request is disabled by default at initial installation.
This function needs to be set by performing "5.2 Setting Up Fault Resource Identification and Operator Intervention Request." If this function is not set, you need to execute the "hvswitch" command. For details on the "hvswitch" command, see the description of the -f option of the online manual page for the command.
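As a sketch, a forced switch with the -f option takes the cluster application name and the target SysNode; the names below are hypothetical examples:
# hvswitch -f userApp_0 node0RMS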
After restoring the failed node, perform steps 5 through 9 on the appropriate node to start RMS.