F.1.4 Class Status Abnormality

PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Solaris(TM) Operating System)


If the class status is one of the following statuses, take action as indicated for the relevant situation.

(1) Class becomes closed status during operation.

[Explanation]

A class becomes closed when it has too few configuration databases, which store the object configuration and object status information for the class, or when a communication error occurs between nodes in a cluster environment.

All objects within a closed class are inaccessible.

The number of configuration databases is insufficient under the following conditions:

With two or fewer disks in ENABLE status, when no disk can be accessed normally.
With three to five disks in ENABLE status, when one or fewer disks can be accessed normally.
With more than six disks in ENABLE status, when two or fewer disks can be accessed normally.

However, the root class is closed only when there are no accessible disks.

GDS configuration databases cannot be stored on BCV devices or target (R2) devices, because these devices are overwritten with data from the copy source disks. Therefore, GDS does not count BCV devices and target (R2) devices as "disks that can be accessed normally" in the conditions above.
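The conditions above can be sketched as a small shell function. This is only an illustration of the stated thresholds, not GDS code: the function name class_would_close and its arguments are hypothetical, and treating exactly six ENABLE disks as part of the top bracket is an assumption, since the conditions above do not cover that case explicitly. BCV and target (R2) devices must already be excluded from the accessible count.

```shell
# Sketch: would a class close for lack of configuration databases?
# Usage: class_would_close ENABLED ACCESSIBLE [root]
# Returns 0 (true) if the class would close, 1 otherwise.
class_would_close() {
    enabled=$1        # number of disks in ENABLE status
    accessible=$2     # number of disks that can be accessed normally
    class_type=${3:-local}

    # Root class: closed only when no disk is accessible.
    if [ "$class_type" = "root" ]; then
        [ "$accessible" -eq 0 ]
        return
    fi

    if [ "$enabled" -le 2 ]; then
        limit=0
    elif [ "$enabled" -le 5 ]; then
        limit=1
    else
        # Assumption: exactly six ENABLE disks is treated like "more than six".
        limit=2
    fi
    [ "$accessible" -le "$limit" ]
}
```

For example, `class_would_close 4 1` returns true (three to five ENABLE disks, one accessible), while `class_would_close 10 1 root` returns false.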

[Resolution]

1) Check whether a class was closed during operation as follows. Do not reboot the system or restart the sdxservd daemon before checking, because doing so makes the check impossible.

# /etc/opt/FJSVsdx/bin/sdxdcdown

CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- --------------------------
Class1  no   -       10  10   8    0 c1t1d0:c1t2d0:c1t3d0:c1t4d0:c2t1d0:c2t2d0:c2t3d0:c2t4d0
Class2  yes  Comm    10  10   8    0 c3t1d0:c3t2d0:c3t3d0:c3t4d0:c4t1d0:c4t2d0:c4t3d0:c4t4d0
Class3  yes  FewDB   10  10   1    7 c5t1d0
Class4  yes  NoDB    10  10   0    8 -

In this example, Class2, Class3, and Class4, which show "yes" in the DOWN field, are closed. The causes shown in the REASON field are as follows.

(Cause 1): Comm - Communication failure between nodes.
(Cause 2): FewDB - Insufficient number of valid configuration databases.
(Cause 3): NoDB - No valid configuration database.
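When many classes exist, the closed ones can be picked out of the sdxdcdown output with a short filter. The following is a minimal sketch that assumes the column layout shown in the example above (CLASS, DOWN, and REASON as the first three fields). The function name closed_classes is hypothetical, and the third output column is simply the step of this procedure that applies to each cause.

```shell
# Sketch: read sdxdcdown-style output on stdin and print
# "CLASS REASON STEP" for every class whose DOWN field is "yes".
closed_classes() {
    awk '
        $2 == "yes" {
            if ($3 == "Comm") step = "3a"
            else if ($3 == "FewDB" || $3 == "NoDB") step = "3b"
            else step = "unknown"
            print $1, $3, step
        }
    '
}
```

Typical use would be `/etc/opt/FJSVsdx/bin/sdxdcdown | closed_classes`.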

2) Depending on specific causes, recovery may be difficult.

First, collect the investigation material.

For information on how to collect the investigation material, see "Collecting Investigation Material."
Resolutions are described for the following two cases:

Closed due to a communication error
Closed due to an insufficient number of configuration databases

3a) In the event of (Cause 1), contact your local customer support.

3b) In the event of (Cause 2) or (Cause 3), all (or the majority) of the disks registered with the class have abnormalities.

You can check the disks registered with the class as follows.

# sdxinfo -D -c Class3

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk31  mirror Class3  Group1  c1t1d0   8847360 *                ENABLE
disk   Disk32  mirror Class3  Group1  c2t1d0   8847360 *                ENABLE
disk   Disk33  mirror Class3  Group2  c1t2d0   8847360 *                ENABLE
disk   Disk34  mirror Class3  Group2  c2t2d0   8847360 *                ENABLE
disk   Disk35  mirror Class3  Group3  c1t3d0  17793024 *                ENABLE
disk   Disk36  mirror Class3  Group3  c2t3d0  17793024 *                ENABLE
disk   Disk37  mirror Class3  Group4  c1t4d0  17793024 *                ENABLE
disk   Disk38  mirror Class3  Group4  c2t4d0  17793024 *                ENABLE
disk   Disk39  spare  Class3  Group1  c1t5d0  17793024 *                ENABLE
disk   Disk40  spare  Class3  *       c2t5d0  17727488 *                ENABLE

In this example, ten disks from Disk31 to Disk40 are registered with Class3.

Physical disk names are shown in the DEVNAM field. Identify the cause of the abnormality on these physical disks by referring to the disk driver log messages.
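To get just the physical disk names for use when searching the driver log messages, the DEVNAM column can be extracted from the sdxinfo output. This is a minimal sketch that assumes the column order shown in the example above, with DEVNAM as the sixth field. The function name class_devnames is hypothetical.

```shell
# Sketch: read "sdxinfo -D" style output on stdin and print the
# physical disk name (DEVNAM, sixth field) of every disk row.
class_devnames() {
    awk '$1 == "disk" { print $6 }'
}
```

Typical use would be `sdxinfo -D -c Class3 | class_devnames`.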

The cause of abnormality could be either of the following:

(Failure 1): Failed or defective non-disk component.
(Failure 2): Failed disk component.

4b) In the event of (Failure 1), recover the failed or defective non-disk component (such as I/O adapter, I/O cable, I/O controller, power supply, and fan).

5b) For a local class or a shared class, execute the sdxfix command to restore the class status.

# sdxfix -C -c Class3
SDX:sdxfix: INFO: Class3: class recovery completed successfully

If the sdxfix command ends normally, skip steps 6b) through 9b) and go on to step 10b).
If the sdxfix command does not end normally, go on to step 6b).
For the root class, go on to step 6b).
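The branching in step 5b) can be summarized as a small helper. This is only a sketch of the decision described above. The function name next_step is hypothetical, and treating an exit status of 0 as "ends normally" is an assumption; the completion message shown above remains the primary indicator.

```shell
# Sketch: print the step to go to after step 5b).
# Usage: next_step CLASS_TYPE [SDXFIX_STATUS]
#   CLASS_TYPE     root, local, or shared
#   SDXFIX_STATUS  exit status of sdxfix (assumed 0 on normal end)
next_step() {
    class_type=$1
    status=${2:-1}
    if [ "$class_type" != "root" ] && [ "$status" -eq 0 ]; then
        echo "10b"
    else
        echo "6b"
    fi
}
```

For example, `sdxfix -C -c Class3; next_step shared $?` prints the step to continue with for a shared class.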

6b) Open the GDS configuration parameter file with an editor.

# vi /etc/opt/FJSVsdx/sdx.cf

Add the following line at the end of the file.

SDX_DB_FAIL_NUM=0
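If you prefer not to edit the file by hand, the same change can be sketched as a function that appends the line only when it is not already present. This is an illustration, not a GDS tool: the function name add_db_fail_num and the .bak backup name are hypothetical, and the sketch assumes a grep that supports the -q and -x options and a file that ends with a newline.

```shell
# Sketch: append SDX_DB_FAIL_NUM=0 to the given file once.
# Usage: add_db_fail_num FILE
add_db_fail_num() {
    cf=$1
    # Keep a copy of the original file (hypothetical backup name).
    cp -p "$cf" "$cf.bak" || return 1
    # Append only if no line consists of exactly this setting.
    if ! grep -qx 'SDX_DB_FAIL_NUM=0' "$cf"; then
        printf '%s\n' 'SDX_DB_FAIL_NUM=0' >> "$cf"
    fi
}
```

Typical use would be `add_db_fail_num /etc/opt/FJSVsdx/sdx.cf`. Running it twice leaves a single copy of the line.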

7b) Reboot the system.

8b) Confirm that objects within the class are accessible.

# sdxinfo -c Class3

If nothing is displayed, recovery was unsuccessful. You will have to contact your local customer support. If information is displayed normally, proceed with the following procedures.

9b) In the event of (Failure 2), where a disk component has failed, follow the procedures in "Disk Swap," or "sdxswap - Swap disk," and swap the disks.

10b) After completing the recovery for both (Failure 1) and (Failure 2), check the number of valid configuration databases as described below.

# /etc/opt/FJSVsdx/bin/sdxdcdown

CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- --------------------------
Class1  no   -       10  10   8    0 c1t1d0:c1t2d0:c1t3d0:c1t4d0:c2t1d0:c2t2d0:c2t3d0:c2t4d0
Class2  no   -       10  10   8    0 c3t1d0:c3t2d0:c3t3d0:c3t4d0:c4t1d0:c4t2d0:c4t3d0:c4t4d0
Class3  no   -       10  10   8    0 c5t1d0:c5t2d0:c5t3d0:c5t4d0:c6t1d0:c6t2d0:c6t3d0:c6t4d0
Class4  no   -       10  10   8    0 c7t1d0:c7t2d0:c7t3d0:c7t4d0:c8t1d0:c8t2d0:c8t3d0:c8t4d0

The NLDB field shows how many configuration databases are lacking. If this value is "0," the problem is resolved. If this value is "1" or more, there are still disks that have not been recovered. In the above example, all NLDB fields display "0," indicating a successful recovery.

If step 6b) was not performed, the following steps are not required.
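The NLDB check can also be done with a short filter over the sdxdcdown output. This is a minimal sketch that assumes the column layout shown in the example above, with NLDB as the seventh field. The function name unrecovered_classes is hypothetical.

```shell
# Sketch: read sdxdcdown-style output on stdin and print "CLASS NLDB"
# for every class whose NLDB value is 1 or more.
# Returns 0 if no such class exists, 1 otherwise.
unrecovered_classes() {
    awk '
        ($2 == "yes" || $2 == "no") && $7 + 0 > 0 {
            print $1, $7
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Typical use would be `/etc/opt/FJSVsdx/bin/sdxdcdown | unrecovered_classes && echo "all classes recovered"`.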

11b) Open the GDS configuration parameter file with an editor.

# vi /etc/opt/FJSVsdx/sdx.cf

Remove the following line, which was added in step 6b).

SDX_DB_FAIL_NUM=0
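The removal can be sketched in the same way as a function that deletes only lines consisting of exactly this setting. Again this is an illustration: the function name remove_db_fail_num and the temporary file name are hypothetical, and the sketch assumes a grep that supports the -v and -x options.

```shell
# Sketch: remove every line that is exactly SDX_DB_FAIL_NUM=0.
# Usage: remove_db_fail_num FILE
remove_db_fail_num() {
    cf=$1
    tmp="$cf.tmp.$$"    # hypothetical temporary file name
    # grep -v exits with status 1 when it prints no lines at all,
    # so only a status above 1 is treated as an error here.
    grep -vx 'SDX_DB_FAIL_NUM=0' "$cf" > "$tmp" || [ $? -eq 1 ] || {
        rm -f "$tmp"
        return 1
    }
    # Write back in place so ownership and permissions are kept.
    cat "$tmp" > "$cf" && rm -f "$tmp"
}
```

Typical use would be `remove_db_fail_num /etc/opt/FJSVsdx/sdx.cf`, followed by the reboot in the next step.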

12b) Reboot the system.

If you cannot perform the recovery with the described procedures, contact your local customer support.
