F.1.1 Slice Status Abnormality

PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Linux)


(1) Mirror slice configuring the mirror volume is in INVALID status.

[Explanation]

You can check the status of the slice configuring the volume as follows.

# sdxinfo -S -o Volume1

OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Object1 Volume1 ACTIVE
slice  Class1  Group1  Object2 Volume1 INVALID

In this example, among the slices that exist in volume Volume1, the slice within Object2 is in INVALID status, as shown in the STATUS field. Object2 is a disk or lower level group connected to the highest level mirror group Group1.
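When a volume has many slices, the same check can be scripted. The following is a minimal sketch and not part of GDS: the helper name list_abnormal_slices is hypothetical, and it assumes the six-column layout of the sdxinfo -S output shown above (DISK in the fourth column, STATUS in the last column).

```shell
# list_abnormal_slices (hypothetical helper): read `sdxinfo -S` output on
# stdin and print "DISK STATUS" for every slice that is not ACTIVE.
# Assumes the layout shown above: DISK is field 4, STATUS is the last field.
list_abnormal_slices() {
    awk '$1 == "slice" && $NF != "ACTIVE" { print $4, $NF }'
}
```

Piping the output of "sdxinfo -S -o Volume1" into list_abnormal_slices would print "Object2 INVALID" for the listing above. The header and separator lines are skipped because their first field is not "slice".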

The following five events could possibly cause the INVALID status of the mirror slice Volume1.Object2.

An I/O error occurred on the mirror slice Volume1.Object2.

(Cause a): A disk component relevant to Object2 failed to operate properly, and an I/O error occurred on the mirror slice Volume1.Object2.
(Cause b): A component other than disks relevant to Object2 (such as an I/O adapter, an I/O cable, an I/O controller, a power supply, and a fan) failed to operate properly, and an I/O error occurred on the mirror slice Volume1.Object2.

An I/O error occurred on the mirror slice Volume1.Object1 during synchronization copying to the mirror slice Volume1.Object2.

(Cause a'): A disk component relevant to Object1 failed to operate properly during synchronization copying to the mirror slice Volume1.Object2, and an I/O error occurred on the copy source slice Volume1.Object1.
(Cause b'): A component other than disks relevant to Object1 (such as an I/O adapter, an I/O cable, an I/O controller, a power supply, and a fan) failed to operate properly during synchronization copying to the mirror slice Volume1.Object2, and an I/O error occurred on the mirror slice Volume1.Object1 that is the copy source.

Others

(Cause c): Synchronization copying to the mirror slice Volume1.Object2 was cancelled as the result of a cause such as [Cancel Copying] selection from the GDS Management View, sdxcopy command execution, or a power outage.

[Resolution]

1) Identify the physical disk name of a faulty disk using the sdxinfo command.

(Example A1)

# sdxinfo -G -o Volume1

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Object1:Object2     17596416 17498112     0

# sdxinfo -D -o Volume1 -e long

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Object1 mirror Class1  Group1  sda     17596416 *        node1:node2      ENABLE  0
disk   Object2 mirror Class1  Group1  sdb     17596416 *        node1:node2      ENABLE  1

In this example, Object2 is a disk connected with the highest level group Group1. As indicated in the E field, an I/O error occurred on the disk Object2, and the possible cause is (Cause a) or (Cause b). The physical disk name of the disk Object2 is sdb as shown in the DEVNAM field.

In example A1, if the E field of the disk Object2, which includes the slice in INVALID status, contains the value 0 and the E field of the disk Object1, which is mirrored with Object2, contains the value 1, an I/O error occurred on the disk Object1 and the possible cause is (Cause a') or (Cause b'). In such a case, see "(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying" and perform restoration.

If the E field of any disk does not contain the value 1 in example A1, the possible cause is (Cause c).

(Example B1)

# sdxinfo -G -o Volume1

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Object1:Object2     35127296 35028992     0
group  Object1 Class1  Disk1:Disk2         35127296 *            *
group  Object2 Class1  Disk3:Disk4         35127296 *            *

# sdxinfo -D -o Volume1 -e long

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Disk1   stripe Class1  Object1 sda     17596416 *        node1:node2      ENABLE  0
disk   Disk2   stripe Class1  Object1 sdb     17596416 *        node1:node2      ENABLE  0
disk   Disk3   stripe Class1  Object2 sdc     17682084 *        node1:node2      ENABLE  1
disk   Disk4   stripe Class1  Object2 sdd     17682084 *        node1:node2      ENABLE  0

In this example, Object2 is a lower level group connected with the highest level group Group1. As indicated in the E field, an I/O error occurred on the disk Disk3 connected with Object2, and the possible cause is (Cause a) or (Cause b). The physical disk name corresponding to Disk3 is sdc, as shown in the DEVNAM field.

In example B1, if the E field contains the value 0 for both disks (Disk3 and Disk4) connected with the lower level group Object2, which includes the slice in INVALID status, and contains the value 1 for a disk (Disk1 or Disk2) connected with the lower level group Object1, which is mirrored with Object2, an I/O error occurred on the disk connected with Object1 and the possible cause is (Cause a') or (Cause b'). In such a case, see "(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying" and perform restoration.

In example B1, if there are no disks with "1" in the E field, the possible cause of the INVALID status is (Cause c).
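The decision described for examples A1 and B1 can be sketched as a small filter. This is an illustration only: classify_cause is a hypothetical helper, not a GDS command, and it assumes the sdxinfo -D ... -e long layout shown above (NAME in the second column, GROUP in the fifth, E in the last). It cannot tell (Cause a) from (Cause b), or (Cause a') from (Cause b'); that still requires the hardware check in step 2).

```shell
# classify_cause OBJECT (hypothetical helper): read the output of
# `sdxinfo -D -o <volume> -e long` on stdin. OBJECT is the disk or lower
# level group that holds the INVALID slice (Object2 in examples A1 and B1).
# Assumes NAME is field 2, GROUP is field 5, and E is the last field.
classify_cause() {
    awk -v obj="$1" '
        $1 == "disk" && $NF == "1" {
            # The errored disk is on the INVALID side if it is OBJECT itself
            # (example A1) or is connected to the group OBJECT (example B1).
            if ($2 == obj || $5 == obj) { own = 1 } else { other = 1 }
        }
        END {
            # If both sides report errors, the INVALID side is reported first;
            # the text above does not cover that combination.
            if (own)        { print "a or b" }
            else if (other) { print "a\047 or b\047" }
            else            { print "c" }
        }'
}
```

For example, piping the output of "sdxinfo -D -o Volume1 -e long" from example B1 into "classify_cause Object2" prints "a or b", because Disk3 belongs to Object2 and has 1 in the E field.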

2) Refer to disk driver log messages and check for physical disk abnormalities.

Disk hardware failures can also be caused by failures or defects in components other than the disks, such as I/O adapters, I/O cables, I/O controllers, power supplies, and fans.

Contact your local customer support to identify which component has failed or might be defective.

If there are no failed or defective components, the possible cause of the INVALID status is (Cause c).

If the possible cause is (Cause a), follow procedures 3a to 5a, and 6. If the possible cause is (Cause b), follow procedures 3b and 6. If the possible cause is (Cause c), follow procedures 3c and 6.

3a) If the possible cause is (Cause a), you must perform the following operations before and after the disk swap. For the procedure for swapping disks from Operation Management View, see "Disk Swap."

Before swapping the disks, execute the following command.

(Example A3)

# sdxswap -O -c Class1 -d Object2

In the example, disk Object2 connected to the highest level group Group1 will be swapped.

(Example B3)

# sdxswap -O -c Class1 -d Disk3

In the example, disk Disk3 will be swapped. Disk3 is a disk connected to lower level group Object2, which is a lower level group of the highest level group Group1.

4a) Swap disks.

5a) After swapping disks, execute the following command.

(Example A5)

# sdxswap -I -c Class1 -d Object2

(Example B5)

# sdxswap -I -c Class1 -d Disk3
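Steps 3a to 5a can be wrapped so that the two sdxswap commands are always issued as a pair. The following is a sketch under stated assumptions: swap_disk and the RUN variable are hypothetical names introduced here, and only the sdxswap invocations shown in the examples above are used.

```shell
# swap_disk CLASS DISK (hypothetical wrapper around steps 3a to 5a).
# RUN is an assumed hook, not a GDS feature: leave it unset to execute the
# commands, or set RUN=echo to print them without touching any disk.
swap_disk() {
    swap_class=$1
    swap_name=$2
    # Step 3a: command to execute before swapping the disk.
    ${RUN:-} sdxswap -O -c "$swap_class" -d "$swap_name" || return 1
    # Step 4a: wait for the operator to swap the physical disk.
    printf 'Swap the physical disk for %s, then press Enter: ' "$swap_name" >&2
    read -r swap_reply || return 1
    # Step 5a: command to execute after swapping the disk.
    ${RUN:-} sdxswap -I -c "$swap_class" -d "$swap_name"
}
```

Running "RUN=echo swap_disk Class1 Disk3" first is a way to preview the commands for example B3 and B5 before executing them. The function stops without running the second command if the first sdxswap fails or if no confirmation is read.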

3b) If the possible cause is (Cause b), you must shut down the system. Recover the failed component and boot your system. Mirroring status will be recovered by automatic synchronization copying.

3c) If the possible cause is (Cause c), perform synchronization copying of the mirror volume.

# sdxcopy -B -c Class1 -v Volume1

6) You can confirm the recovery of the slice configuring the volume, as shown below.

# sdxinfo -S -o Volume1

OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Object1 Volume1 ACTIVE
slice  Class1  Group1  Object2 Volume1 ACTIVE

In this example, the slices within Object1 and Object2 are both in ACTIVE status. This indicates that the recovery process was completed successfully.

(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying.

[Explanation]

When an I/O error occurs on the copy source slice during synchronization copying, the copy destination slice becomes INVALID while the source slice is still ACTIVE.

The following two events are possible causes.

(Cause a'): A disk component of the copy source failed to operate properly, and an I/O error occurred on the copy source slice.
(Cause b'): A component other than the copy source disk (such as an I/O adapter, an I/O cable, an I/O controller, a power supply, and a fan) failed to operate properly during synchronization copying, and an I/O error occurred on the copy source slice.

For details about determining whether the status is relevant to one of these events and identifying the physical disk name of a faulty disk, see [Explanation] and procedure 1) of [Resolution] described in "(1) Mirror slice configuring the mirror volume is in INVALID status."

[Resolution]

First examine the physical disk abnormalities referring to disk driver log messages and so on. Then contact your local customer support and locate the disabled or faulty part.

If the possible cause is (Cause b'), shut down the system once, repair the disabled part, and reboot the system. Consequently, synchronization copying is performed and the mirroring status is restored.

If the possible cause is (Cause a'), take the following procedures for restoration.

The following illustrates the restoration procedure, taking as an example the case where the class name is Class1, the volume name is Volume1, the faulty copy source disk is Disk1, and the copy destination disk is Disk2.

# sdxinfo -S -o Volume1

OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Disk1   Volume1 ACTIVE
slice  Class1  Group1  Disk2   Volume1 INVALID

# sdxinfo -G -o Volume1

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Disk1:Disk2         17596416 17498112     0

# sdxinfo -D -o Volume1 -e long

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Disk1   mirror Class1  Group1  sda     17596416 *        node1:node2      ENABLE  1
disk   Disk2   mirror Class1  Group1  sdb     17596416 *        node1:node2      ENABLE  0

1) Exit applications using the volume.

2) Unmount the file system on the volume if it is mounted.
In this example, the volume has been used as an ext3 file system.

# umount /dev/sfdsk/Class1/dsk/Volume1

Depending on the location or the severity of the failure in the disks that compose the volume, the umount command may fail due to an I/O error. In this event, unmount the file system by performing steps 2-1) through 2-3).

2-1) If class Class1 is registered with a cluster application, remove the cluster application.

2-2) Comment out the /dev/sfdsk/Class1/dsk/Volume1 line in the /etc/fstab file to prevent mounting of the volume after the system is rebooted.

2-3) Reboot the system.
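The edit in step 2-2) can be made with a small helper that also keeps a backup for the later undo. This is a sketch only: comment_out_fstab_entry is a hypothetical name, and it assumes the /etc/fstab entry names the volume by its /dev/sfdsk path in the first field.

```shell
# comment_out_fstab_entry DEVICE [FILE] (hypothetical helper for step 2-2).
# Prefixes "#" to every line of FILE (default /etc/fstab) whose first field
# is DEVICE, after saving the original as FILE.bak.
comment_out_fstab_entry() {
    fstab_dev=$1
    fstab_file=${2:-/etc/fstab}
    # Keep a copy so that the edit can be undone after restoration.
    cp -p "$fstab_file" "$fstab_file.bak" || return 1
    awk -v dev="$fstab_dev" '$1 == dev { print "#" $0; next } { print }' \
        "$fstab_file.bak" > "$fstab_file"
}
```

The optional second argument makes it possible to try the helper on a copy of the file first, for example "comment_out_fstab_entry /dev/sfdsk/Class1/dsk/Volume1 /tmp/fstab.test".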

3) Stop the volume.

# sdxvolume -F -c Class1 -v Volume1 -e allnodes

4) Restore the status of the copy destination slice in the INVALID status.

# sdxfix -V -c Class1 -d Disk2 -v Volume1

5) Verify that the restored copy destination slice is now in the STOP status and the copy source slice is now in the INVALID status.

# sdxinfo -S -o Volume1

OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Disk1   Volume1 INVALID
slice  Class1  Group1  Disk2   Volume1 STOP

6) Start the volume.

# sdxvolume -N -c Class1 -v Volume1

7) The consistency of the volume data may have been lost. Restore the backup data or perform repair using the fsck(8) command if necessary.

If the I/O error occurred on the copy source slice during resynchronization copying after a system failure, restoration with the fsck(8) command may be possible.

When step 2-2) was performed, undo the edit that was made in the /etc/fstab file.
When step 2-1) was performed, re-create the cluster application removed in step 2-1).

8) Remove the faulty copy source disk from GDS management to make it replaceable.

# sdxswap -O -c Class1 -d Disk1

9) Swap the faulty copy source disk.

10) Put the swapped disk back under GDS management to make it available.

# sdxswap -I -c Class1 -d Disk1
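The check in step 5) of the procedure above can be automated so that the volume is started in step 6) only when the slice statuses have actually been exchanged. The following is a minimal sketch: verify_fix is a hypothetical helper, and it assumes the sdxinfo -S layout shown in step 5) (DISK in the fourth column, STATUS in the last).

```shell
# verify_fix SRC_DISK DST_DISK (hypothetical helper for step 5).
# Reads `sdxinfo -S` output on stdin and succeeds only if the slice on
# SRC_DISK is INVALID and the slice on DST_DISK is STOP.
verify_fix() {
    awk -v src="$1" -v dst="$2" '
        $1 == "slice" && $4 == src && $NF == "INVALID" { s = 1 }
        $1 == "slice" && $4 == dst && $NF == "STOP"    { d = 1 }
        END { exit((s && d) ? 0 : 1) }'
}
```

A possible use is "sdxinfo -S -o Volume1 | verify_fix Disk1 Disk2 && sdxvolume -N -c Class1 -v Volume1", which starts the volume only if the check passes.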

(3) Slice configuring the volume is in TEMP status.

[Explanation]

The slice has not been attached since it was detached with the sdxslice command. Or else, you have not performed [Attach Slice] after performing [Detach Slice] from Operation Management View.

[Resolution]

Attach the slice again with the sdxslice command, or perform [Attach Slice] from Operation Management View as necessary.

(4) Slice configuring the volume is in TEMP-STOP status.

[Explanation]

The slice has not been activated since it was stopped with the sdxslice command, or the node on which the slice was detached is not the current node.

Or else, you have not performed [Activate Slice] after performing [Stop Slice] from Operation Management View.

[Resolution]

Activate the slice or take over the slice with the sdxslice command as needed. Alternatively, perform [Activate Slice] from Operation Management View.

(5) Slice configuring the volume is in COPY status.

[Explanation]

Synchronization copying is currently in progress in order to attach a slice.

Or, synchronization copying is in progress between a master and a proxy.

[Resolution]

Wait until synchronization copying is complete. Note that a slice in the process of synchronization copying will not restrict you from accessing an active volume.
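Waiting for completion can be scripted by polling the slice status. This is a sketch only: copy_in_progress and wait_for_copy are hypothetical helpers, the 30-second default interval is an arbitrary choice, and the sdxinfo -S invocation and column layout are those shown in "(1) Mirror slice configuring the mirror volume is in INVALID status."

```shell
# copy_in_progress (hypothetical helper): read `sdxinfo -S` output on stdin
# and succeed if at least one slice is in COPY status.
copy_in_progress() {
    awk '$1 == "slice" && $NF == "COPY" { found = 1 }
         END { exit(found ? 0 : 1) }'
}

# wait_for_copy VOLUME [INTERVAL] (hypothetical helper): poll every INTERVAL
# seconds (default 30) until no slice of VOLUME is in COPY status.
# Note: the loop also ends if sdxinfo prints nothing, for example on error.
wait_for_copy() {
    while sdxinfo -S -o "$1" | copy_in_progress; do
        sleep "${2:-30}"
    done
}
```

For example, "wait_for_copy Volume1 60" returns once no slice of Volume1 is reported as COPY. It does not confirm that the copy succeeded, so the final slice status should still be checked with sdxinfo.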

(6) Slice configuring the volume is in NOUSE status.

[Explanation]

When the disk related to the slice is in either DISABLE or SWAP status, the slice becomes NOUSE to inhibit slice operations.

[Resolution]

Recover the disk in DISABLE or SWAP status. For details, see "Disk Status Abnormality."
