D.1.4 Class Status Abnormality

If the class is in one of the following statuses, take the action indicated for that situation.

(1) Class becomes closed status during operation.

Explanation

A class becomes closed when there are too few configuration databases, which store information on object configuration and object status within the class, or when a communication error occurs between nodes in a cluster environment. All objects within a closed class are inaccessible.

The number of configuration databases is insufficient under the following conditions:

When there are no disks that can be accessed normally, if there are two or fewer disks in ENABLE status.
When there are one or fewer disks that can be accessed normally, if there are three to five disks in ENABLE status.
When there are two or fewer disks that can be accessed normally, if there are more than six disks in ENABLE status.

However, the root class is not closed unless there are no accessible disks at all.
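
The conditions above can be written as a small decision function. This is only a sketch for reasoning about the rule, not part of GDS: the function name and its arguments are hypothetical, and because the conditions as written do not cover a class with exactly six disks in ENABLE status, the sketch reports that case as unspecified instead of guessing.

```shell
#!/bin/sh
# Hypothetical helper, not a GDS command.
#   $1 = number of disks in ENABLE status
#   $2 = number of disks that can be accessed normally
#   $3 = "root" for the root class (optional)
# Prints "closed", "open", or "unspecified".
class_close_state() {
    enabled=$1
    accessible=$2
    kind=${3:-}

    # The root class is closed only when no disk is accessible.
    if [ "$kind" = "root" ]; then
        if [ "$accessible" -eq 0 ]; then echo closed; else echo open; fi
        return
    fi

    if [ "$enabled" -le 2 ]; then
        limit=0     # closed when no disk is accessible
    elif [ "$enabled" -le 5 ]; then
        limit=1     # closed when one or fewer disks are accessible
    elif [ "$enabled" -gt 6 ]; then
        limit=2     # closed when two or fewer disks are accessible
    else
        # Exactly six ENABLE disks: not covered by the stated conditions.
        echo unspecified
        return
    fi

    if [ "$accessible" -le "$limit" ]; then echo closed; else echo open; fi
}

class_close_state 10 1       # ten ENABLE disks, one accessible
class_close_state 2 1        # two ENABLE disks, one accessible
class_close_state 4 1 root   # root class with one accessible disk
```

BCV devices and target (R2) devices should not be counted in the second argument, since GDS does not treat them as disks that can be accessed normally.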

GDS configuration databases cannot be stored in Dell EMC storage unit BCV devices and target (R2) devices since the devices are overwritten by data in copy source disks. Therefore, GDS does not regard BCV devices and target (R2) devices as "disks that can be accessed normally" described in the above conditions.

Resolution

1) You can check whether a class was closed during operation as follows. Do not reboot the system or restart the sdxservd daemon, because doing so makes this check impossible.

# /etc/opt/FJSVsdx/bin/sdxdcdown
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- -------------------------------
Class1  no   -       10  10   8    0 sda:sdb:sdc:sdd:sde:sdf:sdg:sdh
Class2  yes  Comm    10  10   8    0 sdi:sdj:sdk:sdl:sdm:sdn:sdo:sdp
Class3  yes  FewDB   10  10   1    7 sdq
Class4  yes  NoDB    10  10   0    8 -

In this example, Class2, Class3, and Class4, which show "yes" in the DOWN field, are closed. The causes shown in the REASON field are as follows.

(Cause 1): Comm Communication failure between nodes.

(Cause 2): FewDB Insufficient number of valid configuration databases.

(Cause 3): NoDB No valid configuration database.
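
When many classes exist, the closed ones can be filtered out of the sdxdcdown output. The following is a minimal sketch that assumes the column layout shown in the example above (class name in the first column, DOWN in the second, REASON in the third); the function name closed_classes is hypothetical.

```shell
#!/bin/sh
# Reads sdxdcdown output on standard input and prints "CLASS REASON"
# for every class whose DOWN field is "yes".
closed_classes() {
    awk '$2 == "yes" { print $1, $3 }'
}

# Sample input copied from the example above. On a GDS node the input
# would instead come from:
#   /etc/opt/FJSVsdx/bin/sdxdcdown | closed_classes
closed_classes <<'EOF'
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- -------------------------------
Class1  no   -       10  10   8    0 sda:sdb:sdc:sdd:sde:sdf:sdg:sdh
Class2  yes  Comm    10  10   8    0 sdi:sdj:sdk:sdl:sdm:sdn:sdo:sdp
Class3  yes  FewDB   10  10   1    7 sdq
Class4  yes  NoDB    10  10   0    8 -
EOF
```

Like the check itself, this must be run before the system is rebooted or the sdxservd daemon is restarted.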

2) Depending on specific causes, recovery may be difficult. First, collect the investigation material.

For information on how to collect the investigation material, see "D.2 Collecting Investigation Material."

Resolutions are described for the following two cases:

Closed due to a communication error
Closed due to an insufficient number of configuration databases

3a) In the event of (Cause 1), contact field engineers.

3b) In the event of (Cause 2) or (Cause 3), all (or the majority) of the disks registered with the class have abnormalities.

You can check the disks registered with the class as follows.

# sdxinfo -D -c Class3
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk31  mirror Class3  Group1  sda      8847360 *                ENABLE
disk   Disk32  mirror Class3  Group1  sdb      8847360 *                ENABLE
disk   Disk33  mirror Class3  Group2  sde      8847360 *                ENABLE
disk   Disk34  mirror Class3  Group2  sdf      8847360 *                ENABLE
disk   Disk35  mirror Class3  Group3  sdc     17793024 *                ENABLE
disk   Disk36  mirror Class3  Group3  sdg     17793024 *                ENABLE
disk   Disk37  mirror Class3  Group4  sdd     17793024 *                ENABLE
disk   Disk38  mirror Class3  Group4  sdh     17793024 *                ENABLE
disk   Disk39  spare  Class3  Group1  sdr     17793024 *                ENABLE
disk   Disk40  spare  Class3  *       sds     17727488 *                ENABLE

In this example, ten disks from Disk31 to Disk40 are registered with Class3. Physical disk names are shown in the DEVNAM field. Identify the cause of abnormality with these physical disks by referring to disk driver log messages. The cause of abnormality could be either of the following:

(Failure 1): Failed or defective non-disk component.

(Failure 2): Failed disk component.
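
To look up each physical disk in the disk driver log messages, it can help to extract the DEVNAM column first. The sketch below assumes the sdxinfo -D column layout shown above, with DEVNAM as the sixth whitespace-separated field; the function name class_devnames is hypothetical.

```shell
#!/bin/sh
# Reads "sdxinfo -D -c <class>" output on standard input and prints the
# physical disk name (DEVNAM, sixth field) of every row of type "disk".
class_devnames() {
    awk '$1 == "disk" { print $6 }'
}

# Sample rows copied from the example above. On a GDS node the input
# would instead come from:
#   sdxinfo -D -c Class3 | class_devnames
class_devnames <<'EOF'
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk31  mirror Class3  Group1  sda      8847360 *                ENABLE
disk   Disk39  spare  Class3  Group1  sdr     17793024 *                ENABLE
disk   Disk40  spare  Class3  *       sds     17727488 *                ENABLE
EOF
```

Each name printed can then be used as a search term in the disk driver log messages.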

4b) In the event of (Failure 1), recover the failed or defective non-disk component (such as I/O adapter, I/O cable, I/O controller, power supply, and fan).

5b) For a local class or a shared class, execute the sdxfix command to restore the class status.

# sdxfix -C -c Class3
SDX:sdxfix: INFO: Class3: class recovery completed successfully

If the sdxfix command ends normally, skip steps 6b) through 9b) and go on to step 10b).
If the sdxfix command does not end normally, go on to step 6b).
For the root class, go on to step 6b).

6b) Open the GDS configuration parameter file with an editor.

# vim /etc/opt/FJSVsdx/sdx.cf

Add the following line at the end of the file.

SDX_DB_FAIL_NUM=0
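
If the edit is scripted instead of made in an editor, it is worth making it idempotent so that the line is never added twice. The following is a sketch only: the function name add_db_fail_num is hypothetical, it assumes the file already ends with a newline, and the demonstration runs on a scratch file, not on /etc/opt/FJSVsdx/sdx.cf.

```shell
#!/bin/sh
# Appends SDX_DB_FAIL_NUM=0 to the file given as $1 unless that exact
# line is already present.
add_db_fail_num() {
    file=$1
    if ! grep -qx 'SDX_DB_FAIL_NUM=0' "$file"; then
        printf '%s\n' 'SDX_DB_FAIL_NUM=0' >> "$file"
    fi
}

# Demonstration on a scratch file so that nothing on the system changes.
demo=$(mktemp)
printf '%s\n' '# scratch stand-in for sdx.cf' > "$demo"
add_db_fail_num "$demo"
add_db_fail_num "$demo"     # the second call adds nothing
cat "$demo"
rm -f "$demo"
```

On a GDS node the call would be add_db_fail_num /etc/opt/FJSVsdx/sdx.cf, run as root.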

7b) Reboot the system.

8b) Confirm that objects within the class are accessible.

# sdxinfo -c Class3

If nothing is displayed, recovery was unsuccessful; contact field engineers. If information is displayed normally, proceed with the following steps.

9b) In the event of (Failure 2), where a disk component has failed, follow the procedures in "7.3.1.2 Operation Procedure," or "B.1.8 sdxswap - Swap Disk," and swap the disks.

10b) After completing the recovery for both (Failure 1) and (Failure 2), check the number of valid configuration databases as described below.

# /etc/opt/FJSVsdx/bin/sdxdcdown
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- ---------------------------------------
Class1  no   -       10  10   8    0 sda:sdb:sdc:sdd:sde:sdf:sdg:sdh
Class2  no   -       10  10   8    0 sdi:sdj:sdk:sdl:sdm:sdn:sdo:sdp
Class3  no   -       10  10   8    0 sdq:sdr:sds:sdt:sdu:sdv:sdw:sdx
Class4  no   -       10  10   8    0 sdaa:sdab:sdac:sdad:sdae:sdaf:sdag:sdah

The NLDB field shows how many configuration databases are lacking. If this value is "0," the problem is resolved. If this value is "1" or more, there are still disks that have not been recovered. In the above example, all NLDB fields display "0," indicating a successful recovery.
If step 6b) was not performed, the following steps are not required.
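
The NLDB check in step 10b) can also be scripted. The sketch below assumes the sdxdcdown column layout shown above, with NLDB as the seventh whitespace-separated field; the function name unrecovered_classes is hypothetical.

```shell
#!/bin/sh
# Reads sdxdcdown output on standard input and prints "CLASS NLDB" for
# every class whose NLDB value is 1 or more. The exit status is 0 when
# no such class exists and 1 otherwise.
unrecovered_classes() {
    awk '$7 ~ /^[0-9]+$/ && $7 + 0 > 0 { print $1, $7; bad = 1 }
         END { exit bad + 0 }'
}

# Sample rows copied from the example above. On a GDS node the input
# would instead come from /etc/opt/FJSVsdx/bin/sdxdcdown.
sample='CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- ---------------------------------------
Class1  no   -       10  10   8    0 sda:sdb:sdc:sdd:sde:sdf:sdg:sdh
Class3  no   -       10  10   8    0 sdq:sdr:sds:sdt:sdu:sdv:sdw:sdx'

if printf '%s\n' "$sample" | unrecovered_classes; then
    echo "NLDB is 0 for every class"
fi
```

Any class the function prints still has disks that have not been recovered.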

11b) Open the GDS configuration parameter file with an editor.

# vim /etc/opt/FJSVsdx/sdx.cf

Remove the following line, which was added in step 6b).

SDX_DB_FAIL_NUM=0
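
A scripted removal could look like the following sketch. The function name remove_db_fail_num is hypothetical; it deletes only lines that match SDX_DB_FAIL_NUM=0 exactly and writes the result back into the same file so that its owner and mode are kept. The demonstration runs on a scratch file, not on /etc/opt/FJSVsdx/sdx.cf.

```shell
#!/bin/sh
# Removes every line that is exactly SDX_DB_FAIL_NUM=0 from the file
# given as $1 and leaves all other lines untouched.
remove_db_fail_num() {
    file=$1
    tmp=$(mktemp) || return 1
    sed '/^SDX_DB_FAIL_NUM=0$/d' "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # write back in place to keep owner and mode
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demonstration on a scratch file so that nothing on the system changes.
demo=$(mktemp)
printf '%s\n' '# scratch stand-in for sdx.cf' 'SDX_DB_FAIL_NUM=0' > "$demo"
remove_db_fail_num "$demo"
cat "$demo"
rm -f "$demo"
```

On a GDS node the call would be remove_db_fail_num /etc/opt/FJSVsdx/sdx.cf, run as root before the reboot in step 12b).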

12b) Reboot the system.

If you cannot perform the recovery with the described procedures, contact field engineers.

(2) Class cannot be started when booting the system.

Explanation

When booting the system, if the configuration database that contains the configurations and status of objects in the class is not accessible due to an I/O error of the disk or similar causes, the class remains unstarted.

All objects in a class that remains unstarted are inaccessible. In addition, the sdxinfo command displays neither the related objects nor any other information for a class that remains unstarted.

Resolution

1) Identify the cause by referring to the sfdsk and disk driver message output on the console, and recover the class.

2) See (Cause a) and (Cause b) in section "(1) Disk is in DISABLE status." in "D.1.2 Disk Status Abnormality", and take the corresponding resolution measures.

Note

Shutting down the node

If the class is registered in the cluster application, perform the following procedure before shutting down the node.

If the following procedure is not performed, the Offline process of userApplication may not complete and the shutdown process may hang while waiting for it.

1. Check the status of userApplication on each node.
# hvdisp -a
If no userApplication is in the "Unknown" status, step 2. and the subsequent steps do not need to be performed. Shut down the node as usual.

2. Stop RMS and then shut down the node.
The procedure differs depending on the class status.

a) If unstarted classes exist on all the nodes
a-1) Stop every userApplication whose status is not "Unknown" on all the nodes.
# hvutil -f userApplication_name
a-2) Stop RMS on all the nodes.
Execute the following command on all the nodes.
# hvshut -L
Enter "yes" for the warning message displayed during execution.
a-3) Shut down all the nodes.

b) If there is a node on which all the classes are started
b-1) If a userApplication whose status is not "Unknown" exists on the node where no class is started, switch that userApplication to the node where the class is started.
# hvswitch userApplication_name SysNode_name_where_the_class_is_started
b-2) Stop RMS.
Execute the following command on the node where no class is started.
# hvshut -L
Enter "yes" for the warning message displayed during execution.
b-3) Shut down the node where no class is started.

For details of each command, see the manual of each command.

In any of the following situations, contact field engineers, as the corresponding class must be forcibly removed and then recreated.

If you want to use a disk other than the normal hard disk (sdX):
Add the following text to the /etc/opt/FJSVsdx/sdx.cf file on all nodes.
```
       SDX_DEVLABEL_USE=off
```
If you have mistakenly formatted a disk that is registered with GDS when operating an ETERNUS Disk storage system or other server case.