15.6 How to Check Errors

If an error on the primary master server causes a switch to the secondary master server (in a replicated configuration), or if an error on one side of a replicated network causes a switch to the other side, check the messages output to the system log (/var/log/messages).

The status of the HA cluster and network replication can be checked with the following commands.

Checking the status of the master server HA cluster

The status can be checked by running the hvdisp command on both the primary master server and the secondary master server.

Refer to the online help for hvdisp(1M) for details of the hvdisp command.

The following example shows the status of the HA cluster after it switches to the secondary master server due to an error on the primary master server.
Because the state of app1 on the primary master server is "Faulted", the cause of the error must be removed from that server.

Display the status of the HA cluster of the primary master server (example)

# hvdisp -a <Enter>

Local System:    master1RMS
Configuration:   /opt/SMAW/SMAWRrms/build/bdpp.us

Resource            Type    HostName            State        StateDetails
-----------------------------------------------------------------------------
master1RMS          SysNode                     Online       
master2RMS          SysNode                     Online       
app1                userApp                     Faulted      Failed Over
Machine001_app1     andOp   master1RMS                       
Machine000_app1     andOp   master2RMS          Offline      
ManageProgram000_Cmd_APP1 gRes                  Offline      
Ipaddress000_Gls_APP1 gRes                      Offline

Display the status of the HA cluster of the secondary master server (example)

# hvdisp -a <Enter>

Local System:    master2RMS
Configuration:   /opt/SMAW/SMAWRrms/build/bdpp.us

Resource            Type    HostName            State        StateDetails
-----------------------------------------------------------------------------
master1RMS          SysNode                     Online       
master2RMS          SysNode                     Online       
app1                userApp                     Online       
app1                userApp master1RMS          Online       
Machine001_app1     andOp   master2RMS          Online       
Machine000_app1     andOp   master1RMS                       
ManageProgram000_Cmd_APP1 gRes                  Online       
Ipaddress000_Gls_APP1 gRes                      Online

The example below shows the state after the cause of the error has been removed from the primary master server.
Because the state of app1 on the primary master server is now "Offline", failback to the primary master server is possible.

Display the status of the HA cluster of the primary master server (example)

# hvdisp -a <Enter>

Local System:    master1RMS
Configuration:   /opt/SMAW/SMAWRrms/build/bdpp.us

Resource            Type    HostName            State        StateDetails
-----------------------------------------------------------------------------
master1RMS          SysNode                     Online
master2RMS          SysNode                     Online
app1                userApp                     Offline
app1                userApp master2RMS          Online
Machine001_app1     andOp   master2RMS
Machine000_app1     andOp   master1RMS          Offline
ManageProgram000_Cmd_APP1 gRes                  Offline
Ipaddress000_Gls_APP1 gRes                      Offline
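
The check described above (reading the state of app1 on the local server) can also be scripted. The following is a minimal sketch in Python, not part of the product: it assumes the hvdisp -a output has the same layout as the examples in this section, and the function names, the KNOWN_STATES list, and the embedded SAMPLE text are introduced here for illustration only.

```python
# Sketch only: parses text shaped like the "hvdisp -a" examples in this section.
# KNOWN_STATES is an assumption based on the states shown in the examples plus
# a few other state names; extend it if your output shows other states.
KNOWN_STATES = {"Online", "Offline", "Faulted", "Standby", "Wait", "Warning", "Unknown"}

SAMPLE = """\
Local System:    master1RMS
Configuration:   /opt/SMAW/SMAWRrms/build/bdpp.us

Resource            Type    HostName            State        StateDetails
-----------------------------------------------------------------------------
master1RMS          SysNode                     Online
master2RMS          SysNode                     Online
app1                userApp                     Offline
app1                userApp master2RMS          Online
Machine001_app1     andOp   master2RMS
Machine000_app1     andOp   master1RMS          Offline
ManageProgram000_Cmd_APP1 gRes                  Offline
Ipaddress000_Gls_APP1 gRes                      Offline
"""


def parse_hvdisp(text):
    """Return one dict per resource row found below the dashed separator."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.strip().startswith("-----"):
            in_table = True
            continue
        tokens = line.split()
        if not in_table or len(tokens) < 2:
            continue
        resource, rtype, rest = tokens[0], tokens[1], tokens[2:]
        host, state, details = " ".join(rest), "", ""
        # The first token that is a known state splits HostName from StateDetails.
        for i, tok in enumerate(rest):
            if tok in KNOWN_STATES:
                host = " ".join(rest[:i])
                state = tok
                details = " ".join(rest[i + 1:])
                break
        rows.append({"resource": resource, "type": rtype,
                     "host": host, "state": state, "details": details})
    return rows


def local_app_state(rows, app="app1"):
    """Return (state, details) of the userApp row that has no HostName."""
    for row in rows:
        if row["resource"] == app and row["type"] == "userApp" and not row["host"]:
            return row["state"], row["details"]
    return None


if __name__ == "__main__":
    state, details = local_app_state(parse_hvdisp(SAMPLE))
    if state == "Faulted":
        print("app1 is Faulted here: remove the cause of the error first")
    elif state == "Offline":
        print("app1 is Offline here: failback to this server is possible")
    else:
        print("app1 state on this server:", state, details)
```

To use the sketch against a live system, replace SAMPLE with the captured output of hvdisp -a on the primary master server.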

Checking the status of communication between master servers

The communication status between the primary and secondary master servers can be checked by running the cftool command on each server.

Refer to the online help for cftool(1M) for details of the cftool command.

In the example below, an error has occurred in the cluster interconnect (CIP) and the status of the remote node cannot be determined. In this situation, a cluster partition is assumed to have occurred.

Display the communication status of the primary master server (example)

# cftool -n <Enter>
Node    Number State       Os      Cpu
master1 1      UP          Linux   EM64T
master2 2      LEFTCLUSTER Linux   EM64T

Display the communication status of the secondary master server (example)

# cftool -n <Enter>
Node    Number State       Os      Cpu
master1 1      LEFTCLUSTER Linux   EM64T
master2 2      UP          Linux   EM64T
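
Detecting a node that is not in the UP state can be scripted in the same way. The sketch below is illustrative only: it assumes the cftool -n output has the column layout shown in the examples above, and the function names and the embedded SAMPLE text are hypothetical helpers, not part of the product.

```python
# Sketch only: parses text shaped like the "cftool -n" examples in this section.
SAMPLE = """\
Node    Number State       Os      Cpu
master1 1      UP          Linux   EM64T
master2 2      LEFTCLUSTER Linux   EM64T
"""


def parse_cftool_n(text):
    """Return a list of {"node", "number", "state"} dicts, one per node row."""
    nodes = []
    for line in text.splitlines():
        tokens = line.split()
        # Node rows have a numeric second column; this skips the header line.
        if len(tokens) < 3 or not tokens[1].isdigit():
            continue
        nodes.append({"node": tokens[0],
                      "number": int(tokens[1]),
                      "state": tokens[2]})
    return nodes


def nodes_not_up(nodes):
    """Return the names of nodes whose State column is anything other than UP."""
    return [n["node"] for n in nodes if n["state"] != "UP"]


if __name__ == "__main__":
    for name in nodes_not_up(parse_cftool_n(SAMPLE)):
        print("node not UP as seen from this server:", name)
```

Because each server reports the other as LEFTCLUSTER in the situation described above, the check needs to be run on both master servers to see both views.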

Checking network replication status

Check the status using the dsphanet command. The status can be checked on both the master servers and the slave servers.

Refer to "7.4 dsphanet Command" under "Chapter 7 Command references" in the "PRIMECLUSTER Global Link Services Configuration and Administration Guide 4.3 Redundant Line Control Function" for information on the dsphanet command.

Display the status of network replication of the primary master server (example)

# /opt/FJSVhanet/usr/sbin/dsphanet <Enter>
[IPv4,Patrol / Virtual NIC]
 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
 sha0       Active    d   ON   eth5(ON),eth9(OFF)
 sha1       Active    p   OFF  sha0(ON)
[IPv6]
 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+

Display the status of network replication of the slave server (example)

# /opt/FJSVhanet/usr/sbin/dsphanet <Enter>
[IPv4,Patrol / Virtual NIC]
 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
 sha0       Active    e   OFF  eth5(ON),eth9(OFF)
 sha1       Active    p   OFF  sha0(ON)
[IPv6]
 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
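
The Device column of the dsphanet output can be read programmatically to see which status is reported for each device. The following sketch is illustrative only: it assumes the output layout shown in the examples above, the function names and SAMPLE text are hypothetical, and the meaning of each status value (for example ON and OFF) should be taken from the Global Link Services manual referenced above rather than from this code.

```python
import re

# Sketch only: parses text shaped like the dsphanet examples in this section.
SAMPLE = """\
[IPv4,Patrol / Virtual NIC]
 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
 sha0       Active    d   ON   eth5(ON),eth9(OFF)
 sha1       Active    p   OFF  sha0(ON)
[IPv6]
 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
"""

# Matches one "device(status)" entry such as "eth5(ON)".
DEVICE_RE = re.compile(r"([^,()\s]+)\(([^()]*)\)")


def parse_dsphanet(text):
    """Return one dict per interface row, with a {device: status} mapping."""
    interfaces = []
    for line in text.splitlines():
        tokens = line.split()
        # Interface rows have exactly five columns; skip the header row.
        if len(tokens) != 5 or tokens[0] == "Name":
            continue
        devices = dict(DEVICE_RE.findall(tokens[4]))
        if not devices:
            continue
        interfaces.append({"name": tokens[0], "status": tokens[1],
                           "mode": tokens[2], "cl": tokens[3],
                           "devices": devices})
    return interfaces


def devices_with_status(interface, status):
    """Return the device names of one interface that report the given status."""
    return [dev for dev, st in interface["devices"].items() if st == status]


if __name__ == "__main__":
    for iface in parse_dsphanet(SAMPLE):
        print(iface["name"], iface["status"], iface["devices"])
```

Comparing the list returned by devices_with_status before and after a network error is one way to confirm that a switch to the other device has taken place.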