PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Solaris(TM) Operating System)
Appendix F Troubleshooting > F.1 Resolving Problems
If the class status is one of the following statuses, take action as indicated for the relevant situation.
A class is closed when it has too few configuration databases, which store the configuration and status of the objects in the class, or when a communication error occurs between nodes in a cluster environment.
All objects within a closed class are inaccessible.
The number of configuration databases is insufficient under the following conditions:
The class has two or fewer disks in ENABLE status, and none of them can be accessed normally.
The class has three to five disks in ENABLE status, and one or fewer of them can be accessed normally.
The class has six or more disks in ENABLE status, and two or fewer of them can be accessed normally.
However, a root class is not closed unless it has no accessible disks at all.
GDS configuration databases cannot be stored in BCV devices and target (R2) devices since the devices are overwritten by data in copy source disks. Therefore, GDS does not regard BCV devices and target (R2) devices as "disks that can be accessed normally" described in the above conditions.
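The closure conditions above can be expressed as a small shell function. This is only a sketch for reasoning about a configuration, not part of GDS: the function name `class_closes` and its arguments are hypothetical, and treating a class with exactly six ENABLE disks as part of the last tier is an assumption.

```shell
# Sketch of the closure rule described above; not a GDS command.
# Usage: class_closes ENABLED ACCESSIBLE [root]
#   ENABLED    - number of disks in ENABLE status
#   ACCESSIBLE - number of those disks that can be accessed normally
#                (BCV and target (R2) devices must not be counted)
# Returns 0 if the class would be closed, 1 otherwise.
class_closes() {
    enabled=$1
    accessible=$2
    kind=${3:-local}

    # A root class is closed only when no disk is accessible.
    if [ "$kind" = "root" ]; then
        [ "$accessible" -eq 0 ]
        return $?
    fi

    if [ "$enabled" -le 2 ]; then
        limit=0     # closed when no disk is accessible
    elif [ "$enabled" -le 5 ]; then
        limit=1     # closed when one or fewer disks are accessible
    else
        limit=2     # six or more disks (exactly six is assumed to belong here)
    fi
    [ "$accessible" -le "$limit" ]
}
```

For example, `class_closes 10 2` succeeds (the class would be closed), while `class_closes 10 2 root` fails because a root class stays open as long as one disk is accessible.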
1) You can check whether a class was closed during operation as follows. Do not reboot the system or restart the sdxservd daemon before checking, because doing so makes the check impossible.
# /etc/opt/FJSVsdx/bin/sdxdcdown
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- --------------------------
Class1  no   -       10  10   8    0 c1t1d0:c1t2d0:c1t3d0:c1t4d0:c2t1d0:c2t2d0:c2t3d0:c2t4d0
Class2  yes  Comm    10  10   8    0 c3t1d0:c3t2d0:c3t3d0:c3t4d0:c4t1d0:c4t2d0:c4t3d0:c4t4d0
Class3  yes  FewDB   10  10   1    7 c5t1d0
Class4  yes  NoDB    10  10   0    8 -
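In this example, Class2, Class3, and Class4, which show "yes" in the DOWN field, are closed. The causes shown in the REASON field are as follows.
(Cause 1) Comm: Communication error between nodes
(Cause 2) FewDB: Insufficient number of valid configuration databases
(Cause 3) NoDB: No valid configuration database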
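To pull only the closed classes and their REASON values out of the sdxdcdown output, a filter along the following lines may help. It is a sketch that assumes the column order shown in the example (CLASS, DOWN, REASON first); the function name `closed_classes` is hypothetical.

```shell
# Read sdxdcdown-style output on standard input and print
# "CLASS REASON" for every class whose DOWN field is "yes".
# The header and separator lines never have "yes" in field 2,
# so they are skipped automatically.
closed_classes() {
    awk '$2 == "yes" { print $1, $3 }'
}
```

It would typically be used as `/etc/opt/FJSVsdx/bin/sdxdcdown | closed_classes`. An empty result means no class is reported as closed.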
2) Depending on specific causes, recovery may be difficult.
First, collect the investigation material.
For information on how to collect the investigation material, see "Collecting Investigation Material."
Resolutions are described for the following two cases:
Closed due to a communication error
Closed due to an insufficient number of configuration databases
3a) In the event of (Cause 1), contact your local customer support.
3b) In the event of (Cause 2) or (Cause 3), all (or the majority) of the disks registered with the class have abnormalities.
You can check the disks registered with the class as follows.
# sdxinfo -D -c Class3
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk31  mirror Class3  Group1  c1t1d0   8847360 *                ENABLE
disk   Disk32  mirror Class3  Group1  c2t1d0   8847360 *                ENABLE
disk   Disk33  mirror Class3  Group2  c1t2d0   8847360 *                ENABLE
disk   Disk34  mirror Class3  Group2  c2t2d0   8847360 *                ENABLE
disk   Disk35  mirror Class3  Group3  c1t3d0  17793024 *                ENABLE
disk   Disk36  mirror Class3  Group3  c2t3d0  17793024 *                ENABLE
disk   Disk37  mirror Class3  Group4  c1t4d0  17793024 *                ENABLE
disk   Disk38  mirror Class3  Group4  c2t4d0  17793024 *                ENABLE
disk   Disk39  spare  Class3  Group1  c1t5d0  17793024 *                ENABLE
disk   Disk40  spare  Class3  *       c2t5d0  17727488 *                ENABLE
In this example, ten disks from Disk31 to Disk40 are registered with Class3.
Physical disk names are shown in the DEVNAM field. Refer to the disk driver log messages to identify the cause of the abnormality in these physical disks.
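When a class has many disks, it can be convenient to reduce the sdxinfo output to the SDX disk name and its physical disk name before searching the driver log. The following is a sketch that assumes the column order shown in the example above (DEVNAM as the sixth field); the function name `disk_devnames` is hypothetical.

```shell
# Read "sdxinfo -D" style output on standard input and print
# "NAME DEVNAM" for each disk line. Header and separator lines
# do not have "disk" in the first field, so they are skipped.
disk_devnames() {
    awk '$1 == "disk" { print $2, $6 }'
}
```

A typical invocation would be `sdxinfo -D -c Class3 | disk_devnames`, which yields one line per disk, such as `Disk31 c1t1d0`.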
The cause of abnormality could be either of the following:
(Failure 1) Failure of a non-disk component, such as an I/O adapter, I/O cable, I/O controller, power supply, or fan
(Failure 2) Failure of a disk component
4b) In the event of (Failure 1), recover the failed or defective non-disk component (such as I/O adapter, I/O cable, I/O controller, power supply, and fan).
5b) For a local class or a shared class, execute the sdxfix command to restore the class status.
# sdxfix -C -c Class3
If the sdxfix command ends normally, skip steps 6b) through 9b) and go on to step 10b).
If the sdxfix command does not end normally, go on to step 6b).
For the root class, go on to step 6b).
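The branching in step 5b) can be summarized as a small helper. This is a sketch only: the function name `next_step_after_sdxfix` is hypothetical, and it assumes that "ends normally" corresponds to an exit status of 0 from sdxfix.

```shell
# Print the step to continue with after step 5b).
# Usage: next_step_after_sdxfix CLASS_TYPE SDXFIX_STATUS
#   CLASS_TYPE    - root, local, or shared
#   SDXFIX_STATUS - exit status of sdxfix (ignored for a root class);
#                   0 is assumed to mean the command ended normally
next_step_after_sdxfix() {
    class_type=$1
    status=${2:-1}
    if [ "$class_type" != "root" ] && [ "$status" -eq 0 ]; then
        echo "10b"      # sdxfix succeeded: skip steps 6b) through 9b)
    else
        echo "6b"       # root class, or sdxfix did not end normally
    fi
}
```

For a local or shared class this could be called right after the command, for example `sdxfix -C -c Class3; next_step_after_sdxfix local $?`.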
6b) Open the GDS configuration parameter file with an editor.
# vi /etc/opt/FJSVsdx/sdx.cf
Add the following line at the end of the file.
SDX_DB_FAIL_NUM=0
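If you prefer not to edit the file interactively, the same change can be made with a short function. This is a sketch, not a GDS tool: the function name `add_db_fail_param` and the `.bak` backup copy are assumptions added for safety, and the function does nothing if the line is already present.

```shell
# Append SDX_DB_FAIL_NUM=0 to the GDS configuration parameter file
# unless that exact line already exists. A backup copy is saved
# as <file>.bak before the file is modified.
# Usage: add_db_fail_param [FILE]   (default: /etc/opt/FJSVsdx/sdx.cf)
add_db_fail_param() {
    conf=${1:-/etc/opt/FJSVsdx/sdx.cf}
    if grep '^SDX_DB_FAIL_NUM=0$' "$conf" > /dev/null; then
        return 0                        # already present, nothing to do
    fi
    cp -p "$conf" "$conf.bak" || return 1
    echo 'SDX_DB_FAIL_NUM=0' >> "$conf"
}
```

Running it twice leaves a single SDX_DB_FAIL_NUM=0 line at the end of the file.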
7b) Reboot the system.
8b) Confirm that objects within the class are accessible.
# sdxinfo -c Class3
If nothing is displayed, recovery was unsuccessful. You will have to contact your local customer support. If information is displayed normally, proceed with the following procedures.
9b) In the event of (Failure 2), where a disk component has failed, follow the procedures in "Disk Swap," or "sdxswap - Swap disk," and swap the disks.
10b) After completing the recovery for both (Failure 1) and (Failure 2), check the number of valid configuration databases as described below.
# /etc/opt/FJSVsdx/bin/sdxdcdown
CLASS   DOWN REASON NDK NEN NDB NLDB DEVNAM
------- ---- ------ --- --- --- ---- ---------------------------
Class1  no   -       10  10   8    0 c1t1d0:c1t2d0:c1t3d0:c1t4d0:c2t1d0:c2t2d0:c2t3d0:c2t4d0
Class2  no   -       10  10   8    0 c3t1d0:c3t2d0:c3t3d0:c3t4d0:c4t1d0:c4t2d0:c4t3d0:c4t4d0
Class3  no   -       10  10   8    0 c5t1d0:c5t2d0:c5t3d0:c5t4d0:c6t1d0:c6t2d0:c6t3d0:c6t4d0
Class4  no   -       10  10   8    0 c7t1d0:c7t2d0:c7t3d0:c7t4d0:c8t1d0:c8t2d0:c8t3d0:c8t4d0
The NLDB field shows how many configuration databases are still lacking. If this value is "0," the problem is resolved. If this value is "1" or more, there are still disks that have not been recovered. In the above example, every NLDB field displays "0," indicating a successful recovery.
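The NLDB check can also be done mechanically. The sketch below assumes the column order shown in the example (NLDB as the seventh field); the function name `unrecovered_classes` is hypothetical.

```shell
# Read sdxdcdown-style output on standard input and print
# "CLASS NLDB" for every class whose NLDB value is 1 or more.
# Only data lines (DOWN field "yes" or "no") are examined.
unrecovered_classes() {
    awk '($2 == "yes" || $2 == "no") && $7 + 0 > 0 { print $1, $7 }'
}
```

With `/etc/opt/FJSVsdx/bin/sdxdcdown | unrecovered_classes`, empty output corresponds to every NLDB field being "0"; any line printed names a class that still has unrecovered disks.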
If you did not perform step 6b), the following steps are not required.
11b) Open the GDS configuration parameter file with an editor.
# vi /etc/opt/FJSVsdx/sdx.cf
Remove the following line, which was added in step 6b).
SDX_DB_FAIL_NUM=0
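As an alternative to editing the file by hand, the line can be removed with a short function. This is a sketch under the assumption that the line appears exactly as written above; the function name `remove_db_fail_param` is hypothetical.

```shell
# Delete every line that is exactly SDX_DB_FAIL_NUM=0 from the
# GDS configuration parameter file, leaving other lines untouched.
# The result is written back with cat so that the original file's
# ownership and permissions are preserved.
# Usage: remove_db_fail_param [FILE]   (default: /etc/opt/FJSVsdx/sdx.cf)
remove_db_fail_param() {
    conf=${1:-/etc/opt/FJSVsdx/sdx.cf}
    tmp="$conf.tmp.$$"
    sed '/^SDX_DB_FAIL_NUM=0$/d' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$conf" || { rm -f "$tmp"; return 1; }
    rm -f "$tmp"
}
```

After running it, `grep SDX_DB_FAIL_NUM /etc/opt/FJSVsdx/sdx.cf` should print nothing before you go on to reboot.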
12b) Reboot the system.
If you cannot perform the recovery with the described procedures, contact your local customer support.