PRIMECLUSTER Installation and Administration Guide 4.1 (for Solaris(TM) Operating System)
7.4.2 Corrective Action when Patrol Diagnosis Detects a Fault
Correct the faulted hardware according to the operation procedure below.
When a disk unit that is registered with GDS is to be exchanged, follow the steps described in the GDS disk replacement procedure. For information on GDS disk replacement, see "In Case of Disk Abnormality" in the "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
If the fault is still not corrected after the above procedure, continue with the following procedure:
Then, use the CRM main window to check whether the fault was corrected. If the fault was corrected, the ON state is displayed.
The "clsptl(1M)" command has two functions. One function diagnoses only the faulted hardware unit that you specify. The other function runs batch diagnosis of all shared disk units or all network interface cards. If faults occur in multiple hardware units, the batch diagnosis function is more convenient.
# /etc/opt/FJSVcluster/bin/clsptl -u generic -n c1t4d4
# /etc/opt/FJSVcluster/bin/clsptl -a DISK
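The two invocations above can be wrapped in a small helper when several devices need to be rediagnosed. The following is a minimal sketch, not a supported tool: it uses only the `clsptl` options shown in the examples above (`-u generic -n <device>` and `-a DISK`). The function names, the `CLSPTL` variable, and the `DRY_RUN` switch are hypothetical and were introduced for illustration. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Command path and options are copied from the examples in this
# section; everything else (names, DRY_RUN switch) is a hypothetical helper.
CLSPTL="${CLSPTL:-/etc/opt/FJSVcluster/bin/clsptl}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = actually execute it

# Print or execute a command line, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Diagnose each named device individually, stopping at the first failure.
diagnose_devices() {
    for dev in "$@"; do
        run "$CLSPTL" -u generic -n "$dev" || return 1
    done
}

# Run batch diagnosis of all shared disk units.
diagnose_all_disks() {
    run "$CLSPTL" -a DISK
}
```

For example, `diagnose_devices c1t4d4` prints the first command line shown above, and `DRY_RUN=0 diagnose_all_disks` would execute the batch diagnosis on a cluster node.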
Execute the "clgettree(1)" command to check whether the fault was corrected. If the fault was corrected, the ON state will be displayed for the hardware.
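The check for the ON state can be scripted roughly as follows. This sketch makes two assumptions that are not stated in this section: that "clgettree(1)" prints one whitespace-separated line per resource containing both the resource name and its state, and that the command is installed in the same directory as "clsptl(1M)". The function name `hardware_is_on` is hypothetical. Verify the actual output on your system before relying on it.

```shell
#!/bin/sh
# Sketch only. Reads resource-tree text on standard input and succeeds
# (exit status 0) if some line contains both the given resource name and
# the word ON as separate whitespace-delimited fields.
# ASSUMPTION: one resource per line, with name and state on the same line.
hardware_is_on() {
    awk -v name="$1" '
        {
            hit = 0; on = 0
            for (i = 1; i <= NF; i++) {
                if ($i == name) hit = 1
                if ($i == "ON")  on = 1
            }
            if (hit && on) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On a cluster node this might be used as `/etc/opt/FJSVcluster/bin/clgettree | hardware_is_on c1t4d4 && echo "c1t4d4 is ON"`, with the command path being the assumption noted above.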
Check the state of the cluster application to which the recovered hardware is registered, either in the RMS main window or with the "hvdisp(1M)" command.
If the cluster application is in the Faulted state, switch it from the failed state to the active state, either in the RMS main window or with the "hvutil(1M)" command. For information on the procedures related to the CRM main window, see "Bringing Faulted Cluster Application to Online State".
If operator intervention requests are enabled, a message is displayed through "syslogd(1M)" and Cluster Admin when RMS is started. By entering a response to this message, you can switch the cluster application from the failed state to the active state. For information on the setup procedure for operator intervention requests, see "Setting Up Fault Resource Identification and Operator Intervention Requests".
An example of an operator intervention request is shown below. For details on the messages requesting operator intervention, see "Failed Resource and Operator Intervention Messages (GUI)" and "Operator Intervention Messages."
1422 On the SysNode "node1RMS", the userApplication "app0" is the Faulted state due to a fault in the resource "apl1". Do you want to clear fault? (yes/no) Message number: 1001
If "Yes" is set for the "AutoStartUp" attribute, an operator intervention request message is displayed at node startup. Respond to the operator intervention message after executing step 4 of the procedure.