F.1.4 Class Status Abnormality

PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Linux)


(1) A class enters the closed status during operation.

[Explanation]

The class becomes closed when the number of configuration databases, which store information on object configuration and object status within a class, is insufficient, or when a communication error occurs between nodes in a cluster environment.

All objects within a closed class are inaccessible.

An insufficient number of configuration databases occurs under the following conditions:

When two or fewer disks are in ENABLE status and none of them can be accessed normally.
When three to five disks are in ENABLE status and one or fewer of them can be accessed normally.
When six or more disks are in ENABLE status and two or fewer of them can be accessed normally.
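
The conditions above amount to a small lookup, which can be sketched as a shell function. This is only an illustration of the rule as listed: the function name class_would_close is hypothetical and not part of GDS, and treating exactly six ENABLE disks as belonging to the third condition is an assumption.

```shell
# Hypothetical helper (not a GDS command): succeeds when the listed conditions
# say the class would close.
# $1 = number of disks in ENABLE status, $2 = number accessible normally.
class_would_close() {
    enabled=$1
    accessible=$2
    if [ "$enabled" -le 2 ]; then
        limit=0    # closes only when no disk is accessible
    elif [ "$enabled" -le 5 ]; then
        limit=1    # closes when one or fewer disks are accessible
    else
        limit=2    # closes when two or fewer disks are accessible (assumes 6 belongs here)
    fi
    [ "$accessible" -le "$limit" ]
}
```

For example, class_would_close 10 1 succeeds (one accessible disk out of ten ENABLE disks is too few), while class_would_close 4 2 fails.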

[Resolution]

1) You can check whether a class was closed during operation as follows. Do not reboot the system or restart the sdxservd daemon before checking, because doing so makes the check impossible.

# /etc/opt/FJSVsdx/bin/sdxdcdown

CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- --------------------------
Class1  no   -       10  10   8    0 sda:sdb:sdc:sdd:sde:sdf:sdg:sdh
Class2  yes  Comm    10  10   8    0 sdi:sdj:sdk:sdl:sdm:sdn:sdo:sdp
Class3  yes  FewDB   10  10   1    7 sdq
Class4  yes  NoDB    10  10   0    8 -

In this example, Class2, Class3, and Class4, which show "yes" in the DOWN field, are closed. The causes shown in the REASON field are as follows.

(Cause 1) Comm: Communication failure between nodes.
(Cause 2) FewDB: Insufficient number of valid configuration databases.
(Cause 3) NoDB: No valid configuration database.
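
When many classes exist, the closed ones can be picked out of the sdxdcdown output with a small filter. The following is a minimal sketch; the function name list_closed_classes is hypothetical, and it assumes the column layout shown in the example above (CLASS first, DOWN second, REASON third).

```shell
# Hypothetical filter (not a GDS command): reads sdxdcdown output on stdin and
# prints "CLASS: REASON" for every class whose DOWN field is "yes".
list_closed_classes() {
    awk '$2 == "yes" { print $1 ": " $3 }'
}
```

It would be used as /etc/opt/FJSVsdx/bin/sdxdcdown | list_closed_classes. The header and separator lines are skipped because their second field is not "yes".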

2) Depending on the specific cause, recovery may be difficult.

First, collect the investigation material.

For information on how to collect the investigation material, see "Collecting Investigation Material."

Resolutions are described for the following two cases:

a) closed due to a communication error

b) closed due to an insufficient number of configuration databases

3a) In the case of Cause 1, contact your local customer support.

3b) In the case of Cause 2 or 3, all (or the majority) of the disks registered with the class have abnormalities.

You can check the disks registered with the class as follows.

# sdxinfo -D -c Class3

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk31  mirror Class3  Group1  sda      8847360 *                ENABLE
disk   Disk32  mirror Class3  Group1  sdb      8847360 *                ENABLE
disk   Disk33  mirror Class3  Group2  sde      8847360 *                ENABLE
disk   Disk34  mirror Class3  Group2  sdf      8847360 *                ENABLE
disk   Disk35  mirror Class3  Group3  sdc     17793024 *                ENABLE
disk   Disk36  mirror Class3  Group3  sdg     17793024 *                ENABLE
disk   Disk37  mirror Class3  Group4  sdd     17793024 *                ENABLE
disk   Disk38  mirror Class3  Group4  sdh     17793024 *                ENABLE
disk   Disk39  spare  Class3  Group1  sdr     17793024 *                ENABLE
disk   Disk40  spare  Class3  *       sds     17727488 *                ENABLE

In this example, ten disks from Disk31 to Disk40 are registered with Class3.

Physical disk names are shown in the DEVNAM field. Identify the cause of the abnormality on these physical disks by referring to the disk driver log messages.

The cause of abnormality could be either of the following:

(Failure 1): Failed or defective non-disk component.
(Failure 2): Failed disk component.
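
To look up the physical disks in the driver log, it can help to extract just the DEVNAM values from the sdxinfo -D output. The following is a minimal sketch; the function name list_physical_disks is hypothetical, and it assumes the column layout shown in the example above, where DEVNAM is the sixth field.

```shell
# Hypothetical filter (not a GDS command): reads "sdxinfo -D" output on stdin
# and prints the DEVNAM field (6th column) of every disk row.
list_physical_disks() {
    awk '$1 == "disk" { print $6 }'
}
```

It would be used as sdxinfo -D -c Class3 | list_physical_disks. Each printed name can then be searched for in whichever log the disk driver writes to; the log location depends on the system and is not specified here.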

4b) Open the GDS configuration parameter file with an editor.

# vi /etc/opt/FJSVsdx/sdx.cf

Add the following line at the end of the file.

SDX_DB_FAIL_NUM=0
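
If the edit is scripted instead of made in an editor, the append should avoid writing the line twice. The following is a minimal sketch under that assumption; the function name add_db_fail_num is hypothetical, and the file path is passed as an argument so the function can be tried on a copy first.

```shell
# Hypothetical helper: appends SDX_DB_FAIL_NUM=0 to the given file unless an
# identical line is already present. $1 = path to the parameter file.
add_db_fail_num() {
    cf=$1
    if ! grep -qx 'SDX_DB_FAIL_NUM=0' "$cf"; then
        # Make sure the existing last line is terminated before appending.
        if [ -n "$(tail -c 1 "$cf")" ]; then echo >> "$cf"; fi
        echo 'SDX_DB_FAIL_NUM=0' >> "$cf"
    fi
}
```

On the real system it would be called as add_db_fail_num /etc/opt/FJSVsdx/sdx.cf.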

5b) Shut down the system.

6b) In the event of Failure 1, recover the failed non-disk component (such as an I/O adapter, an I/O cable, a controller, a power supply, or a fan).

7b) Reboot the system.

8b) Confirm that objects within the class are accessible.

# sdxinfo -c Class3

If nothing is displayed, recovery was unsuccessful. In that case, contact your local customer support. If information is displayed normally, proceed with the following steps.
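
The "nothing is displayed" check can also be made in a script. The following is a minimal sketch; the function name prints_something is hypothetical, and it assumes that any error messages from the command go to standard error rather than standard output.

```shell
# Hypothetical helper: runs the given command and succeeds only if it prints
# something on standard output. Assumes error messages go to standard error.
prints_something() {
    [ -n "$("$@" 2>/dev/null)" ]
}
```

It would be used as prints_something sdxinfo -c Class3, with a failing status meaning that nothing was displayed for the class.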

9b) In the case of Failure 2, where a disk component has failed, follow the procedures in "Disk Swap" or "sdxswap - Swap disk" to swap the disks.

10b) After completing the recovery for Failure 1 or Failure 2, confirm the number of valid configuration databases as described below.

# /etc/opt/FJSVsdx/bin/sdxdcdown

CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- ---------------------------
Class1  no   -       10  10   8    0 sda:sdb:sdc:sdd:sde:sdf:sdg:sdh
Class2  no   -       10  10   8    0 sdi:sdj:sdk:sdl:sdm:sdn:sdo:sdp
Class3  no   -       10  10   8    0 sdq:sdr:sds:sdt:sdu:sdv:sdw:sdx
Class4  no   -       10  10   8    0 sdaa:sdab:sdac:sdad:sdae:sdaf:sdag:sdah

The NLDB field shows how many configuration databases the class is short of. If this value is "0," the problem is resolved. If this value is "1" or more, there are still disks that have not been recovered. In the above example, all NLDB fields display "0," indicating a successful recovery.
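
The NLDB check can be sketched as a filter over the sdxdcdown output. This is an illustration only; the function name report_db_shortfall is hypothetical, and it assumes the column layout shown in the example above, where NLDB is the seventh field.

```shell
# Hypothetical filter (not a GDS command): reads sdxdcdown output on stdin,
# prints "CLASS NLDB" for every class whose NLDB field (7th column) is 1 or
# more, and succeeds only when no such class exists.
report_db_shortfall() {
    awk '$7 ~ /^[0-9]+$/ && $7 + 0 > 0 { print $1, $7; bad = 1 }
         END { exit bad + 0 }'
}
```

It would be used as /etc/opt/FJSVsdx/bin/sdxdcdown | report_db_shortfall. An empty result with a successful exit status corresponds to every NLDB field being "0."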

11b) Open the GDS configuration parameter file with an editor.

# vi /etc/opt/FJSVsdx/sdx.cf

Remove the following line, which was added at the end of the file in procedure (4b).

SDX_DB_FAIL_NUM=0
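
If the removal is scripted instead of made in an editor, only the exact line should be deleted and the rest of the file left untouched. The following is a minimal sketch under that assumption; the function name remove_db_fail_num is hypothetical, and the file path is passed as an argument so the function can be tried on a copy first.

```shell
# Hypothetical helper: deletes every line that is exactly SDX_DB_FAIL_NUM=0
# from the given file, leaving other lines untouched. $1 = path to the file.
remove_db_fail_num() {
    cf=$1
    tmp=$(mktemp) || return 1
    # Filter into a temporary file, then write back in place so the original
    # file keeps its owner and permissions.
    if sed '/^SDX_DB_FAIL_NUM=0$/d' "$cf" > "$tmp" && cat "$tmp" > "$cf"; then
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

On the real system it would be called as remove_db_fail_num /etc/opt/FJSVsdx/sdx.cf.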

12b) Reboot the system.

If you cannot perform the recovery with the described procedures, contact your local customer support.
