D.1.1 Slice Status Abnormality

If the slice status is one of the following statuses, take the actions as indicated for the relevant situation.

(1) Mirror slice configuring the mirror volume is in INVALID status.

Explanation

You can check the status of the slice configuring the volume as follows.

# sdxinfo -S -o Volume1

OBJ    CLASS   GROUP   DISK      VOLUME   STATUS
------ ------- ------- -------   -------- --------
slice  Class1  Group1  Object1   Volume1  ACTIVE
slice  Class1  Group1  Object2   Volume1  INVALID

In this example, among the slices that exist in volume Volume1, the slice within Object2 is in INVALID status, as shown in the STATUS field. Object2 is a disk or lower level group connected to the highest level mirror group Group1.
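The STATUS check can also be done mechanically. The following is a minimal sketch, not part of GDS: the function name non_active_slices is hypothetical, and it assumes the six-column layout (OBJ CLASS GROUP DISK VOLUME STATUS) shown in the output above.

```shell
# Hypothetical helper, not a GDS command: list slices whose STATUS is not ACTIVE.
# Input: output of "sdxinfo -S -o <volume>" on standard input.
# Assumes the column order OBJ CLASS GROUP DISK VOLUME STATUS.
non_active_slices() {
    # Print the DISK and STATUS fields of every slice row that is not ACTIVE.
    awk '$1 == "slice" && $6 != "ACTIVE" { print $4, $6 }'
}
```

Used as "sdxinfo -S -o Volume1 | non_active_slices", the sketch would print "Object2 INVALID" for the output shown above and nothing when all slices are ACTIVE.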

The following five events could possibly cause the INVALID status of the mirror slice Volume1.Object2.

An I/O error occurred on the mirror slice Volume1.Object2.
(Cause a)
A disk component relevant to Object2 failed to operate properly, and an I/O error occurred on the mirror slice Volume1.Object2.

(Cause b)
A component other than disks relevant to Object2 (such as an I/O adapter, an I/O cable, an I/O controller, a power supply, and a fan) failed to operate properly, and an I/O error occurred on the mirror slice Volume1.Object2.

An I/O error occurred on the mirror slice Volume1.Object1 during synchronization copying to the mirror slice Volume1.Object2.
(Cause a')
A disk component relevant to Object1 failed to operate properly during synchronization copying to the mirror slice Volume1.Object2, and an I/O error occurred on the copy source slice Volume1.Object1.

(Cause b')
A component other than disks relevant to Object1 (such as an I/O adapter, an I/O cable, an I/O controller, a power supply, and a fan) failed to operate properly during synchronization copying to the mirror slice Volume1.Object2, and an I/O error occurred on the mirror slice Volume1.Object1 that is the copy source.

Others
(Cause c)
- Synchronization copying to the mirror slice Volume1.Object2 was canceled as the result of a cause such as [Cancel Copying] selection from the GDS Management View, sdxcopy command execution, or a power outage.
- An I/O error occurred due to a SCSI timeout. Whether a SCSI timeout has occurred can be determined from the following message:
```
kernel: mptscsih: iocn: attempting task abort! (sc=e00001401a0dce80)
```
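A search for this message can be scripted. The sketch below is illustrative only: the function name has_scsi_task_abort is hypothetical, and the pattern is taken from the single sample message above, so other driver versions may log a different text.

```shell
# Hypothetical helper: succeed (exit status 0) if the input contains the
# SCSI task-abort message quoted above. The pattern is derived only from
# that sample message and is an assumption for other driver versions.
has_scsi_task_abort() {
    grep -q 'mptscsih: .*attempting task abort'
}
```

For example, "has_scsi_task_abort < /var/log/messages && echo found" reports whether the message is present. The log file path is an assumption and depends on the system's logging configuration.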

Resolution

1) Identify the physical disk name of a faulty disk using the sdxinfo command.

(Example A1)

# sdxinfo -G -o Volume1
OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Object1:Object2     17596416 17498112 0

# sdxinfo -D -o Volume1 -e long
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Object1 mirror Class1  Group1  sda     17596416        * node1:node2      ENABLE      0
disk   Object2 mirror Class1  Group1  sdb     17596416        * node1:node2      ENABLE      1

In this example, Object2 is a disk connected with the highest level group Group1. As indicated in the E field, an I/O error occurred on the disk Object2, and the possible cause is (Cause a) or (Cause b). The physical disk name of the disk Object2 is sdb as shown in the DEVNAM field.

In example A1, if the value 0 is in the E field of the disk Object2 including a slice in the INVALID status and if the value 1 is in the E field of the disk Object1 that is mirrored with the disk Object2, it indicates an I/O error occurred on the disk Object1 and the possible cause is (Cause a') or (Cause b'). In such a case, see "(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying." and perform restoration.

If the E field of any disk does not contain the value 1 in example A1, the possible cause is (Cause c).

(Example B1)

# sdxinfo -G -o Volume1
OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Object1:Object2     35127296 35028992 0
group  Object1 Class1  Disk1:Disk2         35127296        * *
group  Object2 Class1  Disk3:Disk4         35127296        * *

# sdxinfo -D -o Volume1 -e long
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Disk1   stripe Class1  Object1 sda     17596416        * node1:node2      ENABLE      0
disk   Disk2   stripe Class1  Object1 sdb     17596416        * node1:node2      ENABLE      0
disk   Disk3   stripe Class1  Object2 sdc     17682084        * node1:node2      ENABLE      1
disk   Disk4   stripe Class1  Object2 sdd     17682084        * node1:node2      ENABLE      0

In this example, Object2 is a lower level group connected with the highest level group Group1. As indicated in the E field, an I/O error occurred on the disk Disk3 connected with Object2 and the possible cause is (Cause a) or (Cause b). The physical disk name corresponding to Disk3 is sdc as shown in the DEVNAM field.

In example B1, if the value 0 is in the E field of the disks (Disk3 and Disk4) connected with the lower level group Object2 including a slice in the INVALID status and if the value 1 is in the E field of a disk (Disk1 or Disk2) connected with the lower level group Object1 that is mirrored with the lower level group Object2, it indicates an I/O error occurred on the disk connected with Object1 and the possible cause is (Cause a') or (Cause b'). In such a case, see "(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying." and perform restoration.

In example B1, if there are no disks with "1" in the E field, the possible cause of the INVALID status is (Cause c).
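The E field check in step 1) can be sketched as a small filter. The function name disks_with_io_error is hypothetical, and the sketch assumes that the E field is the last column of each "disk" row of the "-e long" output, as in examples A1 and B1.

```shell
# Hypothetical helper: print NAME, GROUP and DEVNAM of every disk whose
# E field is 1. Input: output of "sdxinfo -D -o <volume> -e long".
# Assumes NAME, GROUP and DEVNAM are columns 2, 5 and 6, and that the
# E field is the last column of each "disk" row.
disks_with_io_error() {
    awk '$1 == "disk" && $NF == "1" { print $2, $5, $6 }'
}
```

For the output in example B1, "sdxinfo -D -o Volume1 -e long | disks_with_io_error" would print "Disk3 Object2 sdc". Empty output means that no disk has the value 1 in the E field, which points to (Cause c).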

2) Refer to disk driver log messages and check the physical disk abnormalities.

Disk hardware failures can be caused not only by the disks themselves but also by failures or defects in other components such as I/O adapters, I/O cables, I/O controllers, power supplies, and fans.

Contact field engineers and identify which component failed or might be defective.

If there are no failed or defective components, the possible cause of the INVALID status is (Cause c).

The resolution procedure is illustrated below for each of the three causes a, b, and c.

a. For (Cause a)

a1) Perform the following operations before and after disk swapping. For the procedures for swapping disks from Operation Management View, see "7.3.1.2 Operation Procedure."

Before swapping the disks, execute the following command.

(Example A1)

# sdxswap -O -c Class1 -d Object2

In the example, disk Object2 connected to the highest level group Group1 will be swapped.

(Example B1)

# sdxswap -O -c Class1 -d Disk3

In the example, disk Disk3 will be swapped. Disk3 is connected to Object2, which is a lower level group of the highest level group Group1.

a2) Swap disks.

a3) After swapping the disks, execute the following command.

(Example A3)

# sdxswap -I -c Class1 -d Object2

(Example B3)

# sdxswap -I -c Class1 -d Disk3

a4) Check the slice status according to step 3).
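The command order in steps a1) through a3) can be wrapped as follows. This is a sketch only: the function names run, swap_out, and swap_in and the DRY_RUN variable are hypothetical. By default the commands are printed rather than executed, so the sequence can be reviewed first.

```shell
# Hypothetical wrapper around the swap steps; not part of GDS.
# With DRY_RUN=1 (the default in this sketch) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step a1): execute before the physical disk swap.
# $1 = class name, $2 = disk name
swap_out() { run sdxswap -O -c "$1" -d "$2"; }

# Step a3): execute after the physical disk swap.
# $1 = class name, $2 = disk name
swap_in() { run sdxswap -I -c "$1" -d "$2"; }
```

For example A1, "swap_out Class1 Object2" prints "sdxswap -O -c Class1 -d Object2". Setting DRY_RUN=0 would execute the command instead of printing it.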

b. For (Cause b)

b1) Shut down the system once, repair the disabled part, and reboot the system. Consequently, synchronization copying is performed and the mirroring status is restored.

b2) Check the slice status according to step 3).

c. For (Cause c)

c1) Perform synchronization copying of mirror volume.

# sdxcopy -B -c Class1 -v Volume1

c2) Check the slice status according to step 3).

3) You can confirm the recovery of the slice configuring the volume, as shown below.

# sdxinfo -S -o Volume1
OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Object1 Volume1 ACTIVE
slice  Class1  Group1  Object2 Volume1 ACTIVE

In this example, the slices within Object1 and Object2 are both in ACTIVE status. This indicates that the recovery process was completed successfully.
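The recovery check can be expressed as an exit status for use in scripts. The function name all_slices_active is hypothetical, and the sketch assumes the same six-column "sdxinfo -S" layout as above.

```shell
# Hypothetical helper: exit status 0 only if at least one slice row is
# present and every slice row has STATUS ACTIVE; otherwise exit status 1.
# Input: output of "sdxinfo -S -o <volume>" on standard input.
all_slices_active() {
    awk '$1 == "slice" { n++; if ($6 != "ACTIVE") bad++ }
         END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}
```

A possible use is "sdxinfo -S -o Volume1 | all_slices_active && echo recovered". While synchronization copying is still running, the check fails and can simply be repeated later.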

(2) The copy destination slice was made INVALID due to an I/O error generated on the copy source slice during synchronization copying.

Explanation

When an I/O error occurs on the copy source slice during synchronization copying, the copy destination slice becomes INVALID while the source slice is still ACTIVE.

The following two events are possible causes.

(Cause a): A disk component of the copy source failed to operate properly, and an I/O error occurred on the copy source slice.

(Cause b): A component other than the copy source disk (such as an I/O adapter, an I/O cable, an I/O controller, a power supply, and a fan) failed to operate properly during synchronization copying, and an I/O error occurred on the copy source slice.

For details on determining whether the status is relevant to one of these events and identifying the physical disk name of a faulty disk, see [Explanation] and step 1) of [Resolution] described in "(1) Mirror slice configuring the mirror volume is in INVALID status."

Resolution

First examine the physical disk abnormalities referring to disk driver log messages and so on. Then contact field engineers and locate the disabled or faulty part.

When the possible cause is (Cause b), shut down the system once, repair the disabled part, and reboot the system. Consequently, synchronization copying is performed and the mirroring status is restored.

When the possible cause is (Cause a), follow the procedures below and repair the slice. The procedure is illustrated for each of the following three situations.

A. For /(root), /usr, or /var [EFI]

B. For the swap area [EFI]

C. For others (other than /(root), /usr, /var, swap)

The following illustrates restoration procedures when the class name is Class1, the volume name is Volume1, the name of a faulty disk of the copy source is Disk1, and the name of a disk of the copy destination is Disk2 as examples.

# sdxinfo -S -o Volume1
OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Disk1   Volume1 ACTIVE
slice  Class1  Group1  Disk2   Volume1 INVALID

# sdxinfo -G -o Volume1
OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Disk1:Disk2         17596416 17498112 0

# sdxinfo -D -o Volume1 -e long
OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Disk1   mirror Class1  Group1  sda     17596416        * node1:node2      ENABLE      1
disk   Disk2   mirror Class1  Group1  sdb     17596416        * node1:node2      ENABLE      0

A. For /(root), /usr, or /var [EFI]

Follow "(4) System cannot be booted. (Failure of all boot disks)" in "D.1.5 System Disk Abnormality [EFI]" for restoration. In the procedure of "Resolution", replace only the faulty copy source disk. Do not replace all the disks that are registered in the root class.

B. For the swap area [EFI]

B.1) Check the swap volume.

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sfdsk/gdssys32               partition       4194296 0       -1
                  (*1)

(*1) In RHEL8.6 or later, the path for the GDS logical volume is /dev/sfdsksys32.

B.2) Remove the volume from the swap area.

# swapoff /dev/sfdsk/gdssys32

Depending on the part or severity of failure in disks that constitute the volume, the swapoff(8) command may fail due to an I/O error. In this event, remove the volume from the swap area by performing steps B.2.1) through B.2.2).

B.2.1) Comment out the swap line to prevent use of the volume as a swap area after the system is rebooted.

# vim /etc/fstab

Before edit:
/dev/sfdsk/gdssys32   swap            swap    defaults     0 0

After edit:
#/dev/sfdsk/gdssys32   swap            swap    defaults     0 0

B.2.2) Reboot the system.

# shutdown -r now

B.3) Stop the volume.

# sdxvolume -F -c Class1 -v Volume1

B.4) Restore the status of the copy destination slice in INVALID status.

# sdxfix -V -c Class1 -d Disk2 -v Volume1

B.5) Verify that the restored copy destination slice is in STOP status and the copy source slice is in INVALID status now.

# sdxinfo -S -o Volume1
OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Disk1   Volume1 INVALID
slice  Class1  Group1  Disk2   Volume1 STOP
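The expected slice states after sdxfix can be verified with a small check before the volume is started. This is a sketch under the assumption of the six-column "sdxinfo -S" layout shown above. The function name check_after_sdxfix is hypothetical.

```shell
# Hypothetical helper: verify the slice states expected after "sdxfix -V".
# $1 = copy source disk (expected INVALID)
# $2 = copy destination disk (expected STOP)
# Input: output of "sdxinfo -S -o <volume>" on standard input.
check_after_sdxfix() {
    awk -v src="$1" -v dst="$2" '
        $1 == "slice" && $4 == src && $6 == "INVALID" { s = 1 }
        $1 == "slice" && $4 == dst && $6 == "STOP"    { d = 1 }
        END { if (s && d) exit 0; exit 1 }'
}
```

For this example, "sdxinfo -S -o Volume1 | check_after_sdxfix Disk1 Disk2" succeeds for the output shown above and fails if either slice is in any other status.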

B.6) Start the volume.

# sdxvolume -N -c Class1 -v Volume1

B.7) Add the volume to the swap area again.

# swapon /dev/sfdsk/gdssys32

When step B.2.1) was performed, undo the edit that was made in the /etc/fstab file.

# vim /etc/fstab

Before edit:

#/dev/sfdsk/gdssys32  swap            swap    defaults     0 0

After edit:

/dev/sfdsk/gdssys32  swap            swap    defaults     0 0
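The /etc/fstab edits in steps B.2.1) and B.7) can also be made with a filter instead of an editor. The sketch below is illustrative: the function names comment_swap_line and uncomment_swap_line are hypothetical, and both functions only transform text from standard input to standard output without touching /etc/fstab themselves.

```shell
# Hypothetical helpers that filter fstab content from stdin to stdout.
# $1 = device path of the swap volume, for example /dev/sfdsk/gdssys32

# Step B.2.1): comment out the line whose first field is the device path.
comment_swap_line() {
    awk -v dev="$1" '$1 == dev { print "#" $0; next } { print }'
}

# Step B.7): remove the leading "#" from a line commented out as above.
uncomment_swap_line() {
    awk -v dev="#$1" '$1 == dev { sub(/#/, ""); print; next } { print }'
}
```

One cautious way to use them is to write the result to a temporary file, for example "comment_swap_line /dev/sfdsk/gdssys32 < /etc/fstab > /tmp/fstab.new", compare it with the original, and only then replace /etc/fstab. The temporary file name is an assumption.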

B.8) Remove the faulty copy source disk from GDS management to give it a replaceable status.

# sdxswap -O -c Class1 -d Disk1

B.9) Swap the faulty copy source disk.

B.10) Put the swapped disk back in control of GDS management to make it available.

# sdxswap -I -c Class1 -d Disk1

C. For others (other than /(root), /usr, /var, and swap)

C.1) Exit applications using the volume.

C.2) Unmount the file system on the volume when it has been mounted.

In this example, the volume has been used as an ext4 file system.

# umount /dev/sfdsk/Class1/dsk/Volume1

Depending on the part or the severity of failure in disks that compose the volume, the umount command may fail due to an I/O error. In this event, unmount the file system performing steps C.2.1) through C.2.3).

C.2.1) If class Class1 is registered with a cluster application, remove the cluster application.

C.2.2) Comment out the /dev/sfdsk/Class1/dsk/Volume1 line in the /etc/fstab file to prevent mounting of the volume after the system is rebooted.

C.2.3) Reboot the system.

C.3) Stop the volume.

# sdxvolume -F -c Class1 -v Volume1

C.4) Restore the status of the copy destination slice in the INVALID status.

# sdxfix -V -c Class1 -d Disk2 -v Volume1

C.5) Verify that the restored copy destination slice is in the STOP status and the copy source slice is in the INVALID status now.

# sdxinfo -S -o Volume1
OBJ    CLASS   GROUP   DISK    VOLUME  STATUS
------ ------- ------- ------- ------- --------
slice  Class1  Group1  Disk1   Volume1 INVALID
slice  Class1  Group1  Disk2   Volume1 STOP

C.6) Start the volume.

# sdxvolume -N -c Class1 -v Volume1

C.7) The consistency of volume data may have been lost. Restore the backup data or perform repair using the fsck(8) command if necessary.

Note

If the I/O error on the copy source slice occurred during the resynchronization copying performed just after the system went down, restoration may be possible with the fsck(8) command.

When step C.2.2) was performed, undo the edit that was made in the /etc/fstab file.

When step C.2.1) was performed, re-create the cluster application removed in step C.2.1).

C.8) Remove the faulty copy source disk from GDS management to give it a replaceable status.

# sdxswap -O -c Class1 -d Disk1

C.9) Swap the faulty copy source disk.

C.10) Put the swapped disk back in control of GDS management to make it available.

# sdxswap -I -c Class1 -d Disk1

(3) Slice configuring the volume is in TEMP status.

Explanation

The slice was not attached after it was detached with the sdxslice command. Alternatively, you have not performed [Attach Slice] after performing [Detach Slice] from Operation Management View.

Resolution

Attach the slice again with the sdxslice command, or perform [Attach Slice] from Operation Management View as necessary.

(4) Slice configuring the volume is in TEMP-STOP status.

Explanation

The slice was not activated after it was stopped with the sdxslice command, or the node on which the slice was detached is not the current node. Alternatively, you have not performed [Activate Slice] after performing [Stop Slice] from Operation Management View.

Resolution

Activate the slice or take over the slice with the sdxslice command as needed. Alternatively, perform [Activate Slice] from Operation Management View.

(5) Slice configuring the volume is in COPY status.

Explanation

Synchronization copying is currently in progress in order to attach a slice, or synchronization copying is in progress between a master and a proxy.

Resolution

Wait until synchronization copying is complete. Note that a slice in the process of synchronization copying will not restrict you from accessing an active volume.

(6) Slice configuring the volume is in NOUSE status.

Explanation

When the disk related to the slice is in either DISABLE or SWAP status, the slice becomes NOUSE to inhibit slice operation.

Resolution

Recover the disk in DISABLE or SWAP status. For details, see "D.1.2 Disk Status Abnormality."