PRIMECLUSTER Installation and Administration Guide 4.3

7.4.1 Corrective Action when the resource state is Faulted

This section describes the corrective actions to take when the resource state becomes Faulted.

7.4.1.1 Failure Detection and Cause Identification if a Failure Occurs

If a failure occurs in a resource, you can use the functions of PRIMECLUSTER and the operating system to detect the failure and identify the faulted resource that caused the failure.

The labels (a) to (n) used below refer to the entries in the "Failure confirmation features list" that follows.

Failure detection

Normally, the RMS main window (b) is used to monitor the cluster applications.

In addition, you can use the features described in "Failure confirmation features" to detect the failure.

Cause identification

You can also use the function that detected the failure and the features listed in "Failure confirmation features" below to identify the faulted resource that caused the failure.

Failure confirmation features list

|     | Failure confirmation features | Manual reference |
|-----|-------------------------------|------------------|
| (a) | Message screen | C.3.1 Failed Resource Message |
| (b) | RMS main window. The RMS tree and the RMS cluster table can be used from this screen. | 7.1.3 RMS Main Window |
| (c) | CF main window. The CF tree can be used from this screen. | 7.1.1 CF Main Window |
| (d) | CRM main window. The CRM tree can be used from this screen. This screen is useful in detecting hardware resource faults. | 7.1.2 CRM Main Window |
| (e) | "Resource Fault History" screen. This screen is useful in detecting hardware resource faults. | C.3.2 Resource Fault History |
| (f) | Current list of resources in which a failure has occurred | C.3.3 Fault Resource List |
| (g) | MSG main window. The cluster control messages can be viewed in this screen. To display this screen, select the msg tab in the Cluster Admin screen. | - |
| (h) | Application log | 7.3.4.2 Viewing application logs |
| (i) | switchlog | 7.3.4.1 Viewing switchlogs |
| (j) | Syslog | - |
| (k) | Console. Messages that are displayed on the console can be checked. Viewing the "console problem" information on the console can help you identify the fault cause. | Appendix D Messages |
| (l) | Machine management GUI | Machine Administration Guide |
| (m) | MultiPathDisk view | Multipath Disk Control Load Balance option x.x Guide |
| (n) | GDS GUI | PRIMECLUSTER Global Disk Services Configuration and Administration Guide |

7.4.1.2 Corrective Action for Failed Resources

Take the following steps for failed resources:

  1. Correct the faulted resource

    Correct the problem in the failed resource. For details, see "PRIMECLUSTER Reliant Monitor Services (RMS) Reference Guide."

    If an error message of patrol diagnosis is displayed, see "7.4.2 Corrective Action when Patrol Diagnosis Detects a Fault."

    You can recognize a patrol diagnosis message by the name of the program that output it, which is shown as "hvdet_sptl".

    Note

    If you are using an operation management product other than a PRIMECLUSTER product, you may need to take corrective actions prescribed for that product.

    For details, see the manual provided with each operation management product.

    [Examples] Machine Administration, MultiPathDisk view, GDS

  2. Recover the cluster application

    In the RMS main window, check the state of the cluster application to which the corrected resource is registered. If the cluster application is in the Faulted state, execute the Fault clear operation.

    For details on the Fault clear operation, see "7.2.2.4 Bringing Faulted Cluster Application to Online State."
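
The two decisions in the steps above can be sketched in a few lines of Python. This is only an illustration, not part of PRIMECLUSTER: the function names, the sample message lines, and the dictionary of application states are hypothetical, and real message layouts may differ. The sketch assumes only what the steps state, namely that patrol diagnosis messages carry the program name "hvdet_sptl" and that a cluster application in the Faulted state needs the Fault clear operation.

```python
from typing import Dict, Iterable, List

# Program name that the guide says appears in patrol diagnosis messages.
PATROL_PROGRAM = "hvdet_sptl"


def is_patrol_diagnosis(message: str) -> bool:
    """Return True if the message text names the patrol diagnosis program."""
    return PATROL_PROGRAM in message


def split_messages(messages: Iterable[str]) -> Dict[str, List[str]]:
    """Group messages by which corrective-action section applies (step 1)."""
    groups: Dict[str, List[str]] = {"patrol_diagnosis": [], "other": []}
    for msg in messages:
        key = "patrol_diagnosis" if is_patrol_diagnosis(msg) else "other"
        groups[key].append(msg)
    return groups


def apps_needing_fault_clear(states: Dict[str, str]) -> List[str]:
    """Return cluster applications whose recorded state is Faulted (step 2)."""
    return sorted(app for app, state in states.items() if state == "Faulted")


if __name__ == "__main__":
    # Hypothetical message lines, invented for illustration only.
    sample_messages = [
        "node1 hvdet_sptl: (hypothetical) patrol diagnosis reported a fault",
        "node1 daemon: (hypothetical) unrelated message",
    ]
    groups = split_messages(sample_messages)
    print("See 7.4.2 for:", groups["patrol_diagnosis"])

    # Hypothetical states, as an operator might note them from the RMS main window.
    sample_states = {"app1": "Online", "app2": "Faulted"}
    print("Fault clear needed for:", apps_needing_fault_clear(sample_states))
```

The sketch only classifies information that an operator has already collected. The Fault clear operation itself is still performed as described in "7.2.2.4 Bringing Faulted Cluster Application to Online State."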