PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Linux)
Appendix F Troubleshooting > F.1 Resolving Problems
You can check the status of the slices that make up the volume as follows.

    # sdxinfo -S -o Volume1
    OBJ    CLASS   GROUP   DISK    VOLUME   STATUS
    ------ ------- ------- ------- -------- --------
    slice  Class1  Group1  Object1 Volume1  ACTIVE
    slice  Class1  Group1  Object2 Volume1  INVALID
In this example, among the slices that exist in volume Volume1, the slice within Object2 is in INVALID status, as shown in the STATUS field. Object2 is a disk or lower level group connected to the highest level mirror group Group1.
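When a volume has many slices, the same check can be scripted. The following is a minimal sketch, assuming the output keeps the six whitespace-separated columns shown in the example above. The function name `non_active_slices` is hypothetical and not part of GDS; the sample input is copied from this section so the sketch runs without GDS installed.

```shell
#!/bin/sh
# Print "DISK STATUS" for every slice row whose STATUS is not ACTIVE.
# Input: the text of `sdxinfo -S -o <volume>` on standard input.
# Assumption: six whitespace-separated columns, as in the example above.
non_active_slices() {
    awk '$1 == "slice" && $6 != "ACTIVE" { print $4, $6 }'
}

# Sample copied from this section. On a live system one would pipe the
# real command instead:  sdxinfo -S -o Volume1 | non_active_slices
non_active_slices <<'EOF'
OBJ    CLASS   GROUP   DISK    VOLUME   STATUS
------ ------- ------- ------- -------- --------
slice  Class1  Group1  Object1 Volume1  ACTIVE
slice  Class1  Group1  Object2 Volume1  INVALID
EOF
```

With the sample input, the only row reported is the slice within Object2.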
Any of the following five events can cause the INVALID status of the mirror slice Volume1.Object2.
1) Identify the physical disk name of a faulty disk using the sdxinfo command.
(Example A1)
    # sdxinfo -G -o Volume1
    OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
    ------ ------- ------- ------------------- -------- -------- -----
    group  Group1  Class1  Object1:Object2     17596416 17498112     0

    # sdxinfo -D -o Volume1 -e long
    OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
    ------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
    disk   Object1 mirror Class1  Group1  sda     17596416 *        node1:node2      ENABLE  0
    disk   Object2 mirror Class1  Group1  sdb     17596416 *        node1:node2      ENABLE  1
In this example, Object2 is a disk connected with the highest level group Group1. As indicated in the E field, an I/O error occurred on the disk Object2, and the possible cause is (Cause a) or (Cause b). The physical disk name of the disk Object2 is sdb as shown in the DEVNAM field.
In example A1, if the value 0 is in the E field of the disk Object2 including a slice in the INVALID status and if the value 1 is in the E field of the disk Object1 that is mirrored with the disk Object2, it indicates an I/O error occurred on the disk Object1 and the possible cause is (Cause a') or (Cause b'). In such a case, see "(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying" and perform restoration.
If the E field of any disk does not contain the value 1 in example A1, the possible cause is (Cause c).
(Example B1)
    # sdxinfo -G -o Volume1
    OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
    ------ ------- ------- ------------------- -------- -------- -----
    group  Group1  Class1  Object1:Object2     35127296 35028992     0
    group  Object1 Class1  Disk1:Disk2         35127296 *            *
    group  Object2 Class1  Disk3:Disk4         35127296 *            *

    # sdxinfo -D -o Volume1 -e long
    OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
    ------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
    disk   Disk1   stripe Class1  Object1 sda     17596416 *        node1:node2      ENABLE  0
    disk   Disk2   stripe Class1  Object1 sdb     17596416 *        node1:node2      ENABLE  0
    disk   Disk3   stripe Class1  Object2 sdc     17682084 *        node1:node2      ENABLE  1
    disk   Disk4   stripe Class1  Object2 sdd     17682084 *        node1:node2      ENABLE  0

In this example, Object2 is a lower level group connected with the highest level group Group1. As indicated in the E field, an I/O error occurred on the disk Disk3 connected with Object2, and the possible cause is (Cause a) or (Cause b). The physical disk name corresponding to Disk3 is sdc, as shown in the DEVNAM field.
In example B1, if the value 0 is in the E field of the disks (Disk3 and Disk4) connected with the lower level group Object2 including a slice in the INVALID status and if the value 1 is in the E field of a disk (Disk1 or Disk2) connected with the lower level group Object1 that is mirrored with the lower level group Object2, it indicates an I/O error occurred on the disk connected with Object1 and the possible cause is (Cause a') or (Cause b'). In such a case, see "(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying" and perform restoration.
In example B1, if there are no disks with "1" in the E field, the possible cause of the INVALID status is (Cause c).
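The E field rules in procedure 1) can be sketched as a small filter. This is an illustration under stated assumptions, not a GDS tool: the function name `classify_cause` is hypothetical, and the column positions (NAME in column 2, GROUP in column 5, E in the last column) are taken from the `sdxinfo -D -e long` examples in this section. The case in which both sides show 1 is not covered by the text; the sketch reports the INVALID side first, which is an assumption. The result is only a first guess, because step 2) can still lead to (Cause c) when no hardware failure is found.

```shell
#!/bin/sh
# classify_cause OBJECT
# OBJECT is the disk or lower level group that holds the INVALID slice.
# Input: the text of `sdxinfo -D -o <volume> -e long` on standard input.
# Assumption: 11 whitespace-separated columns as shown in this section.
classify_cause() {
    awk -v obj="$1" '
        $1 == "disk" {
            # A disk is on the INVALID side if it is OBJECT itself
            # or is connected to the lower level group OBJECT.
            if ($2 == obj || $5 == obj) {
                if ($NF == "1") { own = 1 }
            } else {
                if ($NF == "1") { peer = 1 }
            }
        }
        END {
            if (own) {
                print "a or b"
            } else if (peer) {
                print "a-prime or b-prime"
            } else {
                print "c"
            }
        }'
}

# Sample rows copied from example B1; Object2 holds the INVALID slice.
classify_cause Object2 <<'EOF'
disk   Disk1   stripe Class1  Object1 sda     17596416 *  node1:node2  ENABLE  0
disk   Disk2   stripe Class1  Object1 sdb     17596416 *  node1:node2  ENABLE  0
disk   Disk3   stripe Class1  Object2 sdc     17682084 *  node1:node2  ENABLE  1
disk   Disk4   stripe Class1  Object2 sdd     17682084 *  node1:node2  ENABLE  0
EOF
```

For the B1 sample the sketch reports "a or b", because Disk3, which is connected with Object2, has 1 in the E field.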
2) Refer to disk driver log messages and check for physical disk abnormalities.
Disk hardware failures can be caused by failures or defects in components other than the disks themselves, such as I/O adapters, I/O cables, I/O controllers, power supplies, and fans.
Contact your local customer support to identify which component has failed or might be defective.
If there are no failed or defective components, the possible cause of the INVALID status is (Cause c).
If the possible cause is (Cause a), follow procedures 3a to 5a, and 6. If the possible cause is (Cause b), follow procedures 3b and 6. If the possible cause is (Cause c), follow procedures 3c and 6.
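The mapping from cause to procedure steps can be written down as a lookup, which may be convenient in an operator checklist script. The function name `steps_for_cause` is hypothetical; the step labels are the ones used in this section.

```shell
#!/bin/sh
# steps_for_cause CAUSE: print the procedure steps this section assigns
# to (Cause a), (Cause b), or (Cause c). Any other value is rejected.
steps_for_cause() {
    case $1 in
        a) echo "3a 4a 5a 6" ;;
        b) echo "3b 6" ;;
        c) echo "3c 6" ;;
        *) echo "unknown cause: $1" >&2; return 1 ;;
    esac
}

steps_for_cause a
```

(Cause a') and (Cause b') are deliberately not handled here, because their restoration is described in section (2).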
3a) If the possible cause is (Cause a), you must perform the following operations before and after the disk swap. For procedures on swapping disks from Operation Management View, see "Disk Swap."
Before swapping the disks, execute the following command.
(Example A3)
    # sdxswap -O -c Class1 -d Object2
In the example, disk Object2 connected to the highest level group Group1 will be swapped.
(Example B3)
    # sdxswap -O -c Class1 -d Disk3
In the example, disk Disk3 will be swapped. Disk3 is a disk connected to lower level group Object2, which is a lower level group of the highest level group Group1.
4a) Swap disks.
5a) After swapping disks, execute the following command.
(Example A5)
    # sdxswap -I -c Class1 -d Object2
or
(Example B5)
    # sdxswap -I -c Class1 -d Disk3
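Steps 3a) through 5a) always use the same class and disk name on both sides of the physical swap, so a small wrapper can help avoid typing a different name after the swap. The following is a sketch only: the helper names `run`, `swap_out`, and `swap_in` and the `DRY_RUN` variable are hypothetical, and the only GDS commands used are the two `sdxswap` forms shown above. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```shell
#!/bin/sh
# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to execute them
# on a system where GDS is installed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# swap_out CLASS DISK: step 3a), executed before the disk is swapped.
swap_out() { run sdxswap -O -c "$1" -d "$2"; }

# swap_in CLASS DISK: step 5a), executed after the disk is swapped.
swap_in() { run sdxswap -I -c "$1" -d "$2"; }

swap_out Class1 Disk3
# ... physically swap the disk here (step 4a) ...
swap_in Class1 Disk3
```

Keeping the dry-run mode as the default is a design choice for this sketch, so that the command lines can be reviewed before anything is changed.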
3b) If the possible cause is (Cause b), you must shut down the system. Recover the failed component and boot your system. Mirroring status will be recovered by automatic synchronization copying.
3c) If the possible cause is (Cause c), perform synchronization copying of the mirror volume.

    # sdxcopy -B -c Class1 -v Volume1
6) You can confirm the recovery of the slices that make up the volume, as shown below.

    # sdxinfo -S -o Volume1
    OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
    ------ ------- ------- ------- ------- --------
    slice  Class1  Group1  Object1 Volume1 ACTIVE
    slice  Class1  Group1  Object2 Volume1 ACTIVE
In this example, the slices within Object1 and Object2 are both in ACTIVE status. This indicates that the recovery process was completed successfully.
When an I/O error occurs on the copy source slice during synchronization copying, the copy destination slice becomes INVALID while the source slice is still ACTIVE.
The following two events are possible causes.
For details about determining whether the status is relevant to one of these events and identifying the physical disk name of a faulty disk, see [Explanation] and procedure 1) of [Resolution] described in "(1) Mirror slice configuring the mirror volume is in INVALID status."
First, check for physical disk abnormalities by referring to disk driver log messages and similar sources. Then contact your local customer support to locate the disabled or faulty part.
If the possible cause is (Cause b'), shut down the system, repair the disabled part, and reboot the system. Synchronization copying is then performed and the mirroring status is restored.
If the possible cause is (Cause a'), perform the following restoration procedure.
The procedure below uses the following example names: the class is Class1, the volume is Volume1, the faulty copy source disk is Disk1, and the copy destination disk is Disk2.
    # sdxinfo -S -o Volume1
    OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
    ------ ------- ------- ------- ------- --------
    slice  Class1  Group1  Disk1   Volume1 ACTIVE
    slice  Class1  Group1  Disk2   Volume1 INVALID

    # sdxinfo -G -o Volume1
    OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
    ------ ------- ------- ------------------- -------- -------- -----
    group  Group1  Class1  Disk1:Disk2         17596416 17498112     0

    # sdxinfo -D -o Volume1 -e long
    OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
    ------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
    disk   Disk1   mirror Class1  Group1  sda     17596416 *        node1:node2      ENABLE  1
    disk   Disk2   mirror Class1  Group1  sdb     17596416 *        node1:node2      ENABLE  0
1) Exit applications using the volume.
2) Unmount the file system on the volume if it is mounted.
In this example, the volume has been used as an ext3 file system.
    # umount /dev/sfdsk/Class1/dsk/Volume1

Depending on the location or severity of the failure in the disks that compose the volume, the umount command may fail due to an I/O error. In this event, unmount the file system by performing steps 2-1) through 2-3).
2-1) If class Class1 is registered with a cluster application, remove the cluster application.
2-2) Comment out the /dev/sfdsk/Class1/dsk/Volume1 line in the /etc/fstab file to prevent mounting of the volume after the system is rebooted.
2-3) Reboot the system.
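Step 2-2) can be done in any editor. As an illustration, the following sketch comments out the entry with awk and keeps a backup copy so that the edit can be undone in step 7). The function name `comment_out_fstab` is hypothetical, and the sample file contents (mount point, options, and the /dev/sda1 line) are invented for the demonstration. The demonstration works on a temporary file, not on /etc/fstab.

```shell
#!/bin/sh
# comment_out_fstab DEVICE FILE
# Prefix each line of FILE whose first field is DEVICE with "#".
# The original is kept as FILE.bak so the edit can be undone later.
comment_out_fstab() {
    dev=$1 file=$2
    cp -p "$file" "$file.bak" || return 1
    awk -v dev="$dev" '$1 == dev { print "#" $0; next } { print }' \
        "$file.bak" > "$file"
}

# Demonstration on a temporary file with hypothetical contents.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/dev/sda1 / ext3 defaults 1 1
/dev/sfdsk/Class1/dsk/Volume1 /mnt/vol1 ext3 noauto 0 0
EOF
comment_out_fstab /dev/sfdsk/Class1/dsk/Volume1 "$tmp"
cat "$tmp"
rm -f "$tmp" "$tmp.bak"
```

Lines that are already commented out are left alone, because their first field then begins with "#" and no longer equals the device name. Copying the .bak file back restores the original entry.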
3) Stop the volume.
    # sdxvolume -F -c Class1 -v Volume1 -e allnodes
4) Restore the status of the copy destination slice in the INVALID status.
    # sdxfix -V -c Class1 -d Disk2 -v Volume1

5) Verify that the restored copy destination slice is in the STOP status and that the copy source slice is now in the INVALID status.

    # sdxinfo -S -o Volume1
    OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
    ------ ------- ------- ------- ------- --------
    slice  Class1  Group1  Disk1   Volume1 INVALID
    slice  Class1  Group1  Disk2   Volume1 STOP
6) Start the volume.
    # sdxvolume -N -c Class1 -v Volume1

7) The consistency of the volume data may have been lost. Restore the backup data or repair the file system with the fsck(8) command if necessary.
If the I/O error occurred on the copy source slice during resynchronization copying after the system went down, it may be possible to restore the data with the fsck(8) command.
When step 2-2) was performed, undo the edit that was made in the /etc/fstab file.
When step 2-1) was performed, re-create the cluster application removed in step 2-1).
8) Remove the faulty copy source disk from GDS management to make it replaceable.

    # sdxswap -O -c Class1 -d Disk1

9) Swap the faulty copy source disk.
10) Return the swapped disk to GDS management to make it available.

    # sdxswap -I -c Class1 -d Disk1
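The command steps of this procedure can be collected into an ordered plan for review before any of them is run. The sketch below only prints text and executes no GDS command. The function name `restoration_plan` is hypothetical; the command lines are the ones shown in steps 2) through 10), with the class, volume, and disk names substituted. The manual steps 1), 7), and 9) are left out.

```shell
#!/bin/sh
# restoration_plan CLASS VOLUME SRC_DISK DST_DISK
# Print the command steps for (Cause a') in order, prefixed with the
# step number. SRC_DISK is the faulty copy source disk and DST_DISK is
# the copy destination disk holding the INVALID slice.
restoration_plan() {
    class=$1 volume=$2 src=$3 dst=$4
    cat <<EOF
2) umount /dev/sfdsk/$class/dsk/$volume
3) sdxvolume -F -c $class -v $volume -e allnodes
4) sdxfix -V -c $class -d $dst -v $volume
5) sdxinfo -S -o $volume
6) sdxvolume -N -c $class -v $volume
8) sdxswap -O -c $class -d $src
10) sdxswap -I -c $class -d $src
EOF
}

restoration_plan Class1 Volume1 Disk1 Disk2
```

The point worth noticing in the printed plan is that sdxfix is given the copy destination disk, while both sdxswap commands are given the faulty copy source disk.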
The slice was not attached after it was detached with the sdxslice command. Alternatively, [Attach Slice] was not performed after [Detach Slice] was performed from Operation Management View.
Attach the slice again with the sdxslice command, or perform [Attach Slice] from Operation Management View as necessary.
The slice was not activated after it was stopped with the sdxslice command, or the node on which the slice was detached is not the current node.
Alternatively, [Activate Slice] was not performed after [Stop Slice] was performed from Operation Management View.
Activate the slice or take over the slice with the sdxslice command as needed. Alternatively, perform [Activate Slice] from Operation Management View.
Synchronization copying is in progress in order to attach the slice.
Alternatively, synchronization copying is in progress between the master and the proxy.
Wait until synchronization copying is complete. Note that a slice in the process of synchronization copying will not restrict you from accessing an active volume.
When a disk related to the slice is in DISABLE or SWAP status, the slice becomes NOUSE to inhibit slice operations.
Recover the disk in DISABLE or SWAP status. For details, see "Disk Status Abnormality."