PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.5
FUJITSU Software

F.1.4 Class Status Abnormality

If the class status is one of the following statuses, take action as indicated for the relevant situation.

(1) Class becomes closed during operation.

Explanation

A class becomes closed when the number of configuration databases, which store information on object configuration and object status within the class, is insufficient, or when a communication error occurs between nodes in a cluster environment. All objects within a closed class are inaccessible.

The number of configuration databases becomes insufficient under the following conditions:

  1. When no disks can be accessed normally, if there are two or fewer disks in ENABLE status.

  2. When one or fewer disks can be accessed normally, if there are three to five disks in ENABLE status.

  3. When two or fewer disks can be accessed normally, if there are more than six disks in ENABLE status.

However, a root class is closed only when there are no accessible disks.

GDS configuration databases cannot be stored in BCV devices or target (R2) devices of Dell EMC storage units, because these devices are overwritten with data from the copy source disks. Therefore, GDS does not count BCV devices and target (R2) devices as "disks that can be accessed normally" in the conditions above.
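
The conditions above can be sketched as a small shell function. This is only an illustration of the rule as listed, not GDS code: the function name `config_db_insufficient` and its arguments are hypothetical, the accessible count is assumed to already exclude BCV and target (R2) devices, and exactly six ENABLE disks, which the list does not cover, is grouped with the largest bracket as an assumption.

```shell
# Hypothetical helper, not part of GDS.
# Usage: config_db_insufficient ENABLE_COUNT ACCESSIBLE_COUNT [root]
# Returns 0 if the class would be closed under the listed conditions, 1 otherwise.
config_db_insufficient() {
    enabled=$1        # disks in ENABLE status
    accessible=$2     # disks that can be accessed normally (no BCV or R2 devices)
    class_type=${3:-local}

    # A root class is closed only when no disk is accessible.
    if [ "$class_type" = "root" ]; then
        [ "$accessible" -eq 0 ]
        return $?
    fi

    if [ "$enabled" -le 2 ]; then
        limit=0       # condition 1
    elif [ "$enabled" -le 5 ]; then
        limit=1       # condition 2
    else
        limit=2       # condition 3 (exactly six disks placed here by assumption)
    fi

    [ "$accessible" -le "$limit" ]
}
```

For example, `config_db_insufficient 4 1` succeeds (the class would be closed), while `config_db_insufficient 4 2` fails.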

Resolution

1) You can check whether a class was closed during operation as follows. Do not reboot the system or restart the sdxservd daemon, as doing so makes this check impossible.

# /etc/opt/FJSVsdx/bin/sdxdcdown
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- -------------------------------------------------------
Class1  no   -       10  10   8    0 c1t1d0:c1t2d0:c1t3d0:c1t4d0:c2t1d0:c2t2d0:c2t3d0:c2t4d0
Class2  yes  Comm    10  10   8    0 c3t1d0:c3t2d0:c3t3d0:c3t4d0:c4t1d0:c4t2d0:c4t3d0:c4t4d0
Class3  yes  FewDB   10  10   1    7 c5t1d0
Class4  yes  NoDB    10  10   0    8 -

In this example, Class2, Class3, and Class4, which show "yes" in the DOWN field, are closed. The causes shown in the REASON field are as follows.

(Cause 1)

Comm Communication failure between nodes.

(Cause 2)

FewDB Insufficient number of valid configuration databases.

(Cause 3)

NoDB No valid configuration database.
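
When many classes are defined, the closed ones can be picked out of the sdxdcdown output with a short filter. The following is a minimal sketch: the function name `closed_classes` is hypothetical, and it assumes the column order shown in the example above (CLASS first, DOWN second, REASON third).

```shell
# Hypothetical helper, not part of GDS.
# Reads sdxdcdown output on stdin and prints "CLASS REASON" for each
# class whose DOWN column is "yes". The header and separator lines are
# skipped automatically because their second column is never "yes".
closed_classes() {
    awk '$2 == "yes" { print $1, $3 }'
}

# Example (on a node where GDS is installed):
#   /etc/opt/FJSVsdx/bin/sdxdcdown | closed_classes
```

With the example output above, this would list Class2 with Comm, Class3 with FewDB, and Class4 with NoDB.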


2) Depending on the specific cause, recovery may be difficult. First, collect the investigation material.

For information on how to collect the investigation material, see "F.2 Collecting Investigation Material."

Resolutions are described for the following two cases:

  1. Closed due to a communication error

  2. Closed due to an insufficient number of configuration databases


3a) In the event of (Cause 1), contact field engineers.


3b) In the event of (Cause 2) or (Cause 3), all (or the majority) of the disks registered with the class have abnormalities.

You can check the disks registered with the class as follows.

# sdxinfo -D -c Class3
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk31  mirror Class3  Group1  c1t1d0   8847360 *                ENABLE
disk   Disk32  mirror Class3  Group1  c2t1d0   8847360 *                ENABLE
disk   Disk33  mirror Class3  Group2  c1t2d0   8847360 *                ENABLE
disk   Disk34  mirror Class3  Group2  c2t2d0   8847360 *                ENABLE
disk   Disk35  mirror Class3  Group3  c1t3d0  17793024 *                ENABLE
disk   Disk36  mirror Class3  Group3  c2t3d0  17793024 *                ENABLE
disk   Disk37  mirror Class3  Group4  c1t4d0  17793024 *                ENABLE
disk   Disk38  mirror Class3  Group4  c2t4d0  17793024 *                ENABLE
disk   Disk39  spare  Class3  Group1  c1t5d0  17793024 *                ENABLE
disk   Disk40  spare  Class3  *       c2t5d0  17727488 *                ENABLE

In this example, ten disks, Disk31 through Disk40, are registered with Class3. Physical disk names are shown in the DEVNAM field. Identify the cause of the abnormality on these physical disks by referring to the disk driver log messages. The cause could be either of the following:

(Failure 1)

Failed or defective non-disk component.

(Failure 2)

Failed disk component.
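
To get a compact list of physical disk names to look up in the disk driver log messages, the DEVNAM field can be extracted from the sdxinfo output. The following is a minimal sketch: the function name `disk_devnames` is hypothetical, and it assumes the column order shown in the example above (NAME second, DEVNAM sixth).

```shell
# Hypothetical helper, not part of GDS.
# Reads "sdxinfo -D -c CLASS" output on stdin and prints
# "DISKNAME DEVNAM" for each disk row.
disk_devnames() {
    awk '$1 == "disk" { print $2, $6 }'
}

# Example (on a node where GDS is installed):
#   sdxinfo -D -c Class3 | disk_devnames
```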


4b) In the event of (Failure 1), recover the failed or defective non-disk component (such as an I/O adapter, I/O cable, I/O controller, power supply, or fan).


5b) For a local class or a shared class, execute the sdxfix command to restore the class status.

# sdxfix -C -c Class3
SDX:sdxfix: INFO: Class3: class recovery completed successfully

6b) Open the GDS configuration parameter file with an editor.

# vi /etc/opt/FJSVsdx/sdx.cf

Add the following line at the end of the file.

SDX_DB_FAIL_NUM=0
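
If you prefer to script this edit rather than use an editor, the following sketch appends the line only when it is not already present. The function name `add_db_fail_num` and the `.bak` backup copy are assumptions added here as a precaution; they are not part of the GDS procedure.

```shell
# Hypothetical helper, not part of GDS.
# Usage: add_db_fail_num /etc/opt/FJSVsdx/sdx.cf
add_db_fail_num() {
    cf=$1

    # Keep a copy of the original file (assumed precaution).
    cp -p "$cf" "$cf.bak" || return 1

    # Nothing to do if the exact line already exists.
    if grep -qx 'SDX_DB_FAIL_NUM=0' "$cf"; then
        return 0
    fi

    # If the file does not end with a newline, add one first so the
    # new parameter starts on its own line.
    if [ -n "$(tail -c 1 "$cf")" ]; then
        printf '\n' >> "$cf"
    fi

    printf '%s\n' 'SDX_DB_FAIL_NUM=0' >> "$cf"
}
```

Running the function twice leaves a single `SDX_DB_FAIL_NUM=0` line in the file.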


7b) Reboot the system.


8b) Confirm that objects within the class are accessible.

# sdxinfo -c Class3

If nothing is displayed, recovery was unsuccessful, and you must contact field engineers. If information is displayed normally, proceed with the following steps.


9b) In the event of (Failure 2), where a disk component has failed, swap the disks by following the procedures in "5.3.4 Disk Swap" or "D.8 sdxswap - Swap disk."


10b) After completing the recovery for both (Failure 1) and (Failure 2), check the number of valid configuration databases as described below.

# /etc/opt/FJSVsdx/bin/sdxdcdown
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- -------------------------------------------------------
Class1  no   -       10  10   8    0 c1t1d0:c1t2d0:c1t3d0:c1t4d0:c2t1d0:c2t2d0:c2t3d0:c2t4d0
Class2  no   -       10  10   8    0 c3t1d0:c3t2d0:c3t3d0:c3t4d0:c4t1d0:c4t2d0:c4t3d0:c4t4d0
Class3  no   -       10  10   8    0 c5t1d0:c5t2d0:c5t3d0:c5t4d0:c6t1d0:c6t2d0:c6t3d0:c6t4d0
Class4  no   -       10  10   8    0 c7t1d0:c7t2d0:c7t3d0:c7t4d0:c8t1d0:c8t2d0:c8t3d0:c8t4d0

The NLDB field shows how many configuration databases are lacking. If this value is "0," the problem is resolved. If this value is "1" or more, there are still disks that have not been recovered. In the above example, all NLDB fields display "0," indicating successful recovery.
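
The NLDB check can also be done with a short filter. The following is a minimal sketch: the function name `classes_lacking_db` is hypothetical, and it assumes NLDB is the seventh column, as in the example output above.

```shell
# Hypothetical helper, not part of GDS.
# Reads sdxdcdown output on stdin and prints "CLASS NLDB" for each class
# whose NLDB value is 1 or more. The exit status is 1 if any such class
# was found and 0 if every NLDB value is 0.
classes_lacking_db() {
    awk '$7 ~ /^[0-9]+$/ && $7 > 0 { print $1, $7; bad = 1 }
         END { exit bad + 0 }'
}

# Example (on a node where GDS is installed):
#   /etc/opt/FJSVsdx/bin/sdxdcdown | classes_lacking_db && echo "all NLDB values are 0"
```
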

If step 6b) was not performed, the following steps are not required.


11b) Open the GDS configuration parameter file with an editor.

# vi /etc/opt/FJSVsdx/sdx.cf

Remove the following line, which was added in step 6b).

SDX_DB_FAIL_NUM=0


12b) Reboot the system.


If you cannot perform the recovery with the described procedures, contact field engineers.

(2) Class cannot be started when booting the system

Explanation

When booting the system, if the configuration database that contains the configurations and status of objects in the class is not accessible due to an I/O error of the disk or similar causes, the class remains unstarted.

All objects in a class that remains unstarted are inaccessible. In addition, the sdxinfo command does not display the objects or any other information for a class that remains unstarted.

Resolution

1) Identify the cause by referring to the sfdsk and disk driver message output on the console, and recover the class.


2) See (Cause a) and (Cause b) in "(1) Disk is in DISABLE status." in "F.1.2 Disk Status Abnormality," and take the corresponding resolution measures.

In any of the following situations, contact field engineers, as the corresponding class must be forcibly removed and then recreated.