PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Solaris(TM) Operating System)

F.1.3 Volume Status Abnormality

If the volume status is one of the following statuses, take action as indicated for the relevant situation.


 

(1) Mirror volume is in INVALID status.

 

[Explanation]

You can confirm the status of the volume as shown below.

# sdxinfo -V -o Volume1

OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Group1  *    *          0    65535    65536 PRIVATE
volume Volume1 Class1  Group1  off  on     65536 17596415 17530880 INVALID


In this example, volume Volume1 that exists in the highest level group Group1 is in INVALID status, as shown in the STATUS field.

If none of the mirror slices that make up the mirror volume contains valid data (that is, none is in ACTIVE or STOP status), the mirror volume becomes INVALID. You cannot start a volume in INVALID status.
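When many volumes are listed, the STATUS column can be filtered mechanically. The following is a minimal sketch, not part of GDS: the function name invalid_volumes and the variable sample_v are hypothetical, and the input is assumed to have the same column layout as the sdxinfo -V example above (STATUS as the last field).

```shell
#!/bin/sh
# Print the names of volumes whose STATUS column (last field) is INVALID.
# Reads text in the `sdxinfo -V` layout from standard input.
# Rows whose NAME is "*" (PRIVATE or FREE areas) are skipped.
invalid_volumes() {
    awk '$1 == "volume" && $2 != "*" && $NF == "INVALID" { print $2 }'
}

# Sample text copied from the example in this section. On a live system
# the input would instead come from: sdxinfo -V -o Volume1
sample_v='OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Group1  *    *          0    65535    65536 PRIVATE
volume Volume1 Class1  Group1  off  on     65536 17596415 17530880 INVALID'

printf '%s\n' "$sample_v" | invalid_volumes
```

With the sample text, the function prints Volume1.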

There are two reasons that may cause this INVALID status.

(Cause a)
Disk is in DISABLE status.

(Cause b)
Master-proxy relationship was cancelled forcibly while master data was being copied to proxy.

 

[Resolution]

1) Check whether there is a disk in DISABLE status within the group with which the volume is associated, as follows.

 

(Example A1)

# sdxinfo -G -o Volume1

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Disk1:Disk2         17596416        0     0

 

# sdxinfo -D -o Volume1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   mirror Class1  Group1  c1t1d0  17596416 node1            ENABLE
disk   Disk2   mirror Class1  Group1  c2t3d0  17596416 node1            DISABLE


In this example, disks Disk1 and Disk2 are connected to the highest level mirror group Group1. The disk Disk2 is in DISABLE status as shown in the STATUS field.

 

(Example B1)

# sdxinfo -G -o Volume1

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE
------ ------- ------- ------------------- -------- -------- -----
group  Group1  Class1  Group2:Group3       35127296 17530880     0
group  Group2  Class1  Disk1:Disk2         35127296 *            0
group  Group3  Class1  Disk3:Disk4         35127296 *            0

 

# sdxinfo -D -o Volume1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   stripe Class1  Group2  c1t1d0  17596416 node1            ENABLE  
disk   Disk2   stripe Class1  Group2  c1t2d0  17596416 node1            DISABLE
disk   Disk3   stripe Class1  Group3  c2t3d0  17682084 node1            ENABLE 
disk   Disk4   stripe Class1  Group3  c2t4d0  17682084 node1            ENABLE 


In this example, the lower level stripe groups Group2 and Group3 are connected to the highest level mirror group Group1. Disk Disk2, which is connected to Group2, is in DISABLE status, as shown in the STATUS field.
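The check in step 1) can also be scripted. The sketch below is an illustration only: the function name disabled_disks and the variable sample_d are hypothetical, and the input is assumed to follow the sdxinfo -D column layout shown in Example B1 (GROUP as the fifth field, DEVNAM as the sixth, STATUS as the last).

```shell
#!/bin/sh
# Print "disk-name group physical-device" for every disk in DISABLE status.
# Reads text in the `sdxinfo -D` layout from standard input.
disabled_disks() {
    awk '$1 == "disk" && $NF == "DISABLE" { print $2, $5, $6 }'
}

# Sample text copied from Example B1. On a live system the input would
# instead come from: sdxinfo -D -o Volume1
sample_d='OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   stripe Class1  Group2  c1t1d0  17596416 node1            ENABLE
disk   Disk2   stripe Class1  Group2  c1t2d0  17596416 node1            DISABLE
disk   Disk3   stripe Class1  Group3  c2t3d0  17682084 node1            ENABLE
disk   Disk4   stripe Class1  Group3  c2t4d0  17682084 node1            ENABLE'

printf '%s\n' "$sample_d" | disabled_disks
```

With the sample text, the function prints one line naming Disk2, its group Group2, and its physical device c1t2d0.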

 

2) If the possible cause is (Cause a), restore the disk by following the procedures in "Disk Status Abnormality."

 

3) From the disks and lower level groups connected to the highest level mirror group, determine the disk or lower level group to which the slice you will use to recover data belongs. Then, execute the sdxfix command to recover data.

 

(Example A3)

# sdxfix -V -c Class1 -d Disk1 -v Volume1


In this example, Volume1 is recovered based on the slice in disk Disk1.

 

(Example B3)

# sdxfix -V -c Class1 -g Group3 -v Volume1


In this example, Volume1 is recovered based on the slice in the lower level stripe group Group3.

 

4) Start the volume.

# sdxvolume -N -c Class1 -v Volume1 -e nosync

 

5) Access Volume1 and check its contents. Restore backup data or run fsck to regain data integrity as necessary.

 

6) Perform synchronization copying on volume.

# sdxcopy -B -c Class1 -v Volume1
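Steps 3) to 6) always run in the same order, so it can help to print the sequence for review before typing it. The following is a dry-run sketch under stated assumptions: the function name mirror_recovery_plan is hypothetical, it only echoes the commands used in this section and never executes them, and the content check in step 5) remains a manual task.

```shell
#!/bin/sh
# Print (do not execute) the recovery sequence for an INVALID mirror volume.
# Arguments: class, source option (-d for a disk, -g for a lower level
# group), source name, volume name.
mirror_recovery_plan() {
    class=$1
    source_opt=$2
    source_name=$3
    volume=$4
    case $source_opt in
        -d|-g) ;;
        *) echo "source option must be -d or -g" >&2; return 1 ;;
    esac
    echo "sdxfix -V -c $class $source_opt $source_name -v $volume"
    echo "sdxvolume -N -c $class -v $volume -e nosync"
    echo "# check the volume contents; restore backup data or run fsck as necessary"
    echo "sdxcopy -B -c $class -v $volume"
}

# Example A3: recover Volume1 based on the slice in Disk1.
mirror_recovery_plan Class1 -d Disk1 Volume1
```

Calling the function with -g Group3 instead prints the sequence for Example B3.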

 


 

(2) Single volume is in INVALID status.

 

[Explanation]

You can confirm the status of the volume as shown below.

# sdxinfo -V -o Volume1

OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Disk1   *    *          0    32767    32768 PRIVATE
volume Volume1 Class1  Disk1   off  on     32768    65535    32768 INVALID
volume *       Class1  Disk1   *    *      65536  8421375  8355840 FREE


In the example, the single volume Volume1 that exists on single disk Disk1 is in INVALID status, as shown in the STATUS field.

You cannot start a volume in INVALID status.

There are two reasons that may cause this INVALID status.

(Cause a)
Single disk is in DISABLE status. In this case, the single slice enters NOUSE status.

(Cause b)
Master-proxy relationship was cancelled forcibly while master data was being copied to proxy.

 

[Resolution]

1) Confirm that the single disk is in DISABLE status as shown below.

# sdxinfo -D -o Volume1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   single Class1  *       c1t11d0  8355840 node1            DISABLE


In this example, the single disk Disk1 is in DISABLE status, as shown in the STATUS field.

 

2) If the possible cause is (Cause a), restore the disk by following the procedures given in section "Disk Status Abnormality."

 

3) Execute the sdxfix command to recover the single volume's data.

# sdxfix -V -c Class1 -d Disk1 -v Volume1

 

4) Start the volume.

# sdxvolume -N -c Class1 -v Volume1

 

5) Access Volume1 and check its content. Restore backup data or run the fsck command to regain data integrity as necessary.

 


 

(3) Stripe volume or volume in concatenation group is in INVALID status.

 

[Explanation]

You can confirm the status of the volume as shown below.

# sdxinfo -V -o Volume1

OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Group1  *    *          0    65535    65536 PRIVATE
volume Volume1 Class1  Group1  off  on     65536 17596415 17530880 INVALID


In this example, volume Volume1 that exists in the highest level group Group1 is in INVALID status, as shown in the STATUS field.

If any of the disks related to the volume is in DISABLE status, the slices that make up the volume enter NOUSE status, and the volume becomes INVALID. You cannot start a volume in INVALID status.

 

[Resolution]

1) You can confirm the status of the disk related to the volume as shown below.

# sdxinfo -G -o Volume1 -e long

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE MASTER TYPE   WIDTH
------ ------- ------- ------------------- -------- -------- ----- ------ ------ -----
group  Group1  Class1  Group2:Group3       70189056 65961984     * *      stripe 32
group  Group2  Class1  Disk1:Disk2         35127296 *            * *      concat *
group  Group3  Class1  Disk3:Disk4         35127296 *            * *      concat *

 

# sdxinfo -D -o Volume1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   concat Class1  Group2  c1t1d0  17596416 node1            ENABLE
disk   Disk2   concat Class1  Group2  c1t2d0  17596416 node1            DISABLE
disk   Disk3   concat Class1  Group3  c2t3d0  17682084 node1            ENABLE
disk   Disk4   concat Class1  Group3  c2t4d0  17682084 node1            ENABLE

In this example, the lower level concatenation groups Group2 and Group3 are connected to the highest level stripe group Group1, and Disk2 connected to Group2 is in DISABLE status as shown in the STATUS field.

 

2) Follow the procedures in "Disk Status Abnormality" and restore the disk status.

 

3) Execute the sdxfix command to recover the volume's data. With the -g option, specify the highest level group name (in this example, Group1).

# sdxfix -V -c Class1 -g Group1 -v Volume1
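The highest level group needed for the -g option can be derived from the group listing: it is the group that does not appear in any other group's DISKS column. The sketch below illustrates that rule; the function name highest_level_group and the variable sample_g are hypothetical, and the input is assumed to follow the sdxinfo -G layout shown in step 1), with DISKS as the fourth field and members separated by colons.

```shell
#!/bin/sh
# Print every group that is not a member of another group.
# Reads text in the `sdxinfo -G` layout from standard input.
highest_level_group() {
    awk '
        $1 == "group" {
            names[++n] = $2
            # Remember every object listed as a member of this group.
            m = split($4, members, ":")
            for (i = 1; i <= m; i++) child[members[i]] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(names[i] in child)) print names[i]
        }'
}

# Sample text copied from step 1). On a live system the input would
# instead come from: sdxinfo -G -o Volume1 -e long
sample_g='OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE MASTER TYPE   WIDTH
------ ------- ------- ------------------- -------- -------- ----- ------ ------ -----
group  Group1  Class1  Group2:Group3       70189056 65961984     * *      stripe 32
group  Group2  Class1  Disk1:Disk2         35127296 *            * *      concat *
group  Group3  Class1  Disk3:Disk4         35127296 *            * *      concat *'

printf '%s\n' "$sample_g" | highest_level_group
```

With the sample text, the function prints Group1, which matches the name passed to sdxfix in step 3).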

 

4) Start the volume.

# sdxvolume -N -c Class1 -v Volume1

 

5) Access Volume1 and check its content. Restore backup data or run the fsck command to regain data integrity as necessary.

 


 

(4) Master volume is in INVALID status.

 

[Explanation]

If the copying process fails while copying data from the proxy volume to the master volume because of an I/O error or a similar problem, the status of the master volume to which the data is being copied becomes INVALID.

 

[Resolution]

1) Check if there is a DISABLE status disk in the group to which the volume belongs with the following command.

# sdxinfo -D -o Volume1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   mirror Class1  Group1  c1t1d0   8421376 *                ENABLE
disk   Disk2   mirror Class1  Group1  c1t2d0   8421376 *                DISABLE


In this example, Disk2 is in DISABLE status.

If there is a disk in DISABLE status, see section "(1) Disk is in DISABLE status" in "Disk Status Abnormality," and check which of the causes listed in that section applies. If the possible cause is (Cause a) or (Cause b), follow the procedures and restore the disk.

 

2) Follow the procedures given in section "(1) Mirror slice configuring the mirror volume is in INVALID status" in "Slice Status Abnormality," and check if there is a disk hardware abnormality. If there is, identify the faulty part. When the abnormality is caused by a failed or defective non-disk component, repair the faulty part.

 

3) Procedures to restore the data for different scenarios are given below.

Non-disk component failure

When caused by a disk component failure:

a) Procedures to recover master volume data using proxy volume.

a1) In order to check if the proxy volume that will be used to recover data is separated from the master volume, execute the sdxinfo -V -e long command, and check the PROXY field.

a2) If the proxy volume is not separated, execute the following command.

# sdxproxy Part -c Class1 -p Volume2


a3) Exit all applications accessing the proxy volume. When using the proxy volume as a file system, execute unmount.

a4) If the proxy volume is started, execute the following command.

# sdxvolume -F -c Class1 -v Volume2


a5) Recover master volume data using the proxy volume's data.

# sdxproxy RejoinRestore -c Class1 -p Volume2

 

b) Procedures to recover data using backup data.

b1) When the volume is in INVALID status, you must first change it to STOP status. Decide on the disk (slice) you wish to use to recover data, and execute the sdxfix command.

# sdxfix -V -c Class1 -d Disk1 -v Volume1


In this example, Volume1 is restored based on the slice in Disk1.

b2) When the volume to be restored is stopped, start it with the following command.

# sdxvolume -N -c Class1 -v Volume1 -e nosync


b3) Access the volume to be restored and check its contents. Restore backup data or run fsck to regain data integrity as necessary.

b4) When mirroring is configured with the volume, perform synchronization copying.

# sdxcopy -B -c Class1 -v Volume1

 

c) Procedures to swap some disks connected to the group.

c1) If you restore the INVALID master volume later using data of a proxy volume related to the master volume, or use data of proxy volumes related to the master volume after restoring it, part the proxy volumes using the sdxproxy Part command.

# sdxproxy Part -c Class1 -p Volume2


c2) When there is a volume in INVALID status in the group, change it to STOP status with the sdxfix -V command. With the -d option, specify the disk that has no abnormality.

# sdxfix -V -c Class1 -d Disk1 -v Volume1


c3) Follow the procedures and swap the disks. For details, see "sdxswap - Swap disk" and "Disk Swap."

c4) Recover the master volume data. If data will be recovered using the proxy volume, follow procedures described in a). If data will be recovered using backup data on media such as tapes, follow procedures described in b).

 

d) Procedures to swap all disks connected to the group.

d1) Exit all applications accessing the master volume and the proxy volume that will be used to recover data. When using the proxy volume or the master volume as a file system, execute unmount.

d2) Stop the master volume and proxy volume in d1).

# sdxvolume -F -c Class1 -v Volume1
# sdxvolume -F -c Class1 -v Volume2


d3) Execute the sdxproxy RejoinRestore command and restore the master volume data using the proxy volume in d1). If the command terminates normally and the master volume is not in INVALID status, the restoration process is complete, and you do not need to perform step d4) and the subsequent steps.

# sdxproxy RejoinRestore -c Class1 -p Volume2


d4) Execute the sdxproxy Swap command and swap the slices of the master volume with the proxy volume in d1).

# sdxproxy Swap -c Class1 -p Volume2

d5) After step d4) is performed, the master volume is no longer in INVALID status, and the proxy volume is in INVALID status instead. Follow the procedures given in section "(5) Proxy volume is in INVALID status" in "Volume Status Abnormality," and restore the proxy volume in INVALID status.

d6) Execute the sdxproxy Swap command and swap the slices of the master volume and the proxy volume you swapped in step d4).

# sdxproxy Swap -c Class1 -p Volume2

 

e) Procedures to swap disks connected to the master group.

e1) Exit all applications accessing the master group, and volumes in the proxy group that will be used to recover data. When using the volume as a file system, execute unmount.

e2) Stop all volumes in the master group and the proxy group in e1).

# sdxvolume -F -c Class1 -v Volume1
# sdxvolume -F -c Class1 -v Volume2


e3) Execute the sdxproxy RejoinRestore command and restore the master group data using the proxy group in e1). If the command terminates normally and none of the master volumes is in INVALID status, the restoration process is complete, and you do not need to perform step e4) and the subsequent steps.

# sdxproxy RejoinRestore -c Class1 -p Volume2


e4) Execute the sdxproxy Swap command and swap the slices of the master group and the proxy group in e1).

# sdxproxy Swap -c Class1 -p Group2


e5) After step e4) is performed, the master volume is no longer in INVALID status, and the proxy volume is in INVALID status instead. Follow the procedures given in section "(5) Proxy volume is in INVALID status" in "Volume Status Abnormality," and restore the proxy volume in INVALID status.

e6) Execute the sdxproxy Swap command and swap the slices of the master group and the proxy group you swapped in step e4).

# sdxproxy Swap -c Class1 -p Group2

 


 

(5) Proxy volume is in INVALID status.

 

[Explanation]

If the copying process fails while copying data from the master volume to the proxy volume because of an I/O error or a similar problem, the status of the proxy volume to which the data is being copied becomes INVALID.

[Resolution]

1) Check if there is a DISABLE status disk in the group to which the volume belongs with the following command.

# sdxinfo -D -o Volume1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   mirror Class1  Group1  c1t1d0   8421376 *                ENABLE
disk   Disk2   mirror Class1  Group1  c1t2d0   8421376 *                DISABLE


In this example, Disk2 is in DISABLE status.

If there is a disk in DISABLE status, see section "(1) Disk is in DISABLE status" in "Disk Status Abnormality," and check which of the three causes listed in that section (a, b, and c) applies. If it is due to (Cause a) or (Cause b), follow the procedures and restore the disk.

 

2) Follow the procedures given in "(1) Mirror slice configuring the mirror volume is in INVALID status" in "Slice Status Abnormality," and check if there is a disk hardware abnormality. If there is, identify the faulty part. When the abnormality was caused by a failed or defective non-disk component, repair the faulty part.

 

3) Procedures to restore the data for different scenarios are given below.

a) Procedures to recover proxy volume data using the master volume.

a1) In order to check if the proxy volume is separated from the master volume, execute the sdxinfo -V -e long command, and check the PROXY field.

a2) If the proxy volume is not separated, execute the following command.

# sdxproxy Part -c Class1 -p Volume2


a3) Rejoin the proxy volume with the master volume.

# sdxproxy Rejoin -c Class1 -p Volume2

 

b) Procedures to swap some disks connected to the group.

b1) Cancel the relationship with master volume using the sdxproxy Break command.

# sdxproxy Break -c Class1 -p Volume2

b2) Separate the volumes that are in INVALID status in the group with the sdxfix -V command, and change them to STOP status. With the -d option, specify the disk that has no abnormality.

# sdxfix -V -c Class1 -d Disk1 -v Volume2


b3) Follow the procedures and swap the disks. For details, see "sdxswap - Swap disk," or section "Disk Swap."

b4) Join the master and the proxy again with the sdxproxy Join command.

# sdxproxy Join -c Class1 -m Volume1 -p Volume2

 

c) Procedures to swap all disks connected to the group.

c1) Cancel the relationship with the master using the sdxproxy Break command.

# sdxproxy Break -c Class1 -p Volume2


c2) Exit all applications accessing the volume in the group. When using the volume as a file system, execute unmount.

c3) Stop all volumes in the group.

# sdxvolume -F -c Class1 -v Volume2


c4) Check the volume configuration of the group (such as volume names and sizes) with the sdxinfo command, and keep a note of it.

c5) Remove all volumes in the group.

# sdxvolume -R -c Class1 -v Volume2


c6) Follow the procedures and swap the disks. For details, see "sdxswap - Swap disk," or section "Disk Swap."

c7) Create the volume that you removed in step c5) again.

# sdxvolume -M -c Class1 -g Group2 -v Volume2 -s size


c8) Stop the volume you created in step c7).

# sdxvolume -F -c Class1 -v Volume2


c9) Join the master volume and the proxy volume again, with the sdxproxy Join command.

# sdxproxy Join -c Class1 -m Volume1 -p Volume2

 

d) Procedures to swap disks connected to the proxy group.

d1) Cancel the relationship with the master using the sdxproxy Break command.

# sdxproxy Break -c Class1 -p Group2

d2) Exit all applications accessing the volume in the group. When using the volume as a file system, execute unmount.

d3) Stop all volumes in the group.

# sdxvolume -F -c Class1 -v Volume2


d4) Remove all volumes in the group.

# sdxvolume -R -c Class1 -v Volume2


d5) Follow the procedures and swap the disks. For details, see "sdxswap - Swap disk," or section "Disk Swap."

d6) Join the master group and the proxy group again with the sdxproxy Join command.

# sdxproxy Join -c Class1 -m Group1 -p Group2 -a Volume1=Volume2:on

 


 

(6) Volume is in STOP status.

 

[Explanation]

Normally, volumes automatically start when the system is booted and become ACTIVE. The volume status will change to STOP when the volume is stopped with the Stop Volume menu in the GDS Management View or the sdxvolume -F command.

In a cluster system, among volumes within GDS shared classes registered with cluster applications, volumes other than proxy volumes start or stop according to the cluster application modes. If a cluster application is in Offline mode, volumes other than proxy volumes are in STOP status.

Accessing a volume in STOP status will result in an EIO error (I/O error) or an ENXIO error (No such device or address).


If volumes in a shared class that is not registered with a cluster application do not start at node startup in a cluster system, see "(4) The GFS Shared File System is not mounted on node startup" in "Cluster System Related Error."

 

[Resolution]

Start the volumes with the Start Volume menu in GDS Management View or the sdxvolume -N command as necessary.

To start volumes within a GDS shared class registered with a cluster application, change the cluster application mode to Online.

 


 

(7) I/O error occurs although mirror volume is in ACTIVE status.

 

[Explanation]

A mirror volume consists of multiple slices. In the event of an I/O error, the failed slice is detached, so access to the volume still completes normally.

However, if an I/O error occurs when only one of the slices that make up the volume is ACTIVE, access to the volume results in an error. In that case, the status of the slice and the volume remains ACTIVE.

Probable situations resulting in such a problem will be described using a two-way multiplex mirroring configuration, where two disks or two lower level groups are connected to a group. As an example, means to circumvent such problems will also be described.

(Situation 1)
One of the slices was detached with the sdxslice -M command in order to back up volume data. While the volume was being accessed, an I/O error occurred on the other slice.
(Prevention 1)
Before executing the sdxslice -M command, connect a reserved disk and temporarily configure a three-way multiplex mirroring, or make the mirrored volume available for backup.

 

(Situation 2)
While restoring a slice with an I/O error, an I/O error also occurred on another slice.
(Prevention 2)
By securing a spare disk within the class, the effects of a delay in restoring the slice can be avoided to a certain degree.

 

[Resolution]

Identify the cause of I/O error occurrence in the last ACTIVE slice, by referring to the disk driver log message.

Resolutions are described below assuming the following three circumstances:

  1. Error occurred due to a disk component failure. Will attempt recovery using backup data.

  2. Error occurred due to a disk component failure. Will attempt data recovery from a slice in INVALID status.

  3. Error occurred due to a failed or defective non-disk component.

     

a. When the error cause is a disk component failure and recovery is performed using backup data

 

a1) When the error was caused by a disk component failure, no slice with valid data exists. Restore data from the backup data following the procedures given below.

 

a2) Exit the application accessing the volume. When the volume is used as a file system, execute the unmount command.

If an I/O error occurs during the unmount command, execute the unmount command again with the -f option.

 

a3) Stop the volume with the sdxvolume command.

# sdxvolume -F -c Class1 -v Volume1


a4) If there is a TEMP status slice within the volume, attempt recovery following the procedures given in "Slice Status Abnormality."

 

a5) If there is a NOUSE status slice within the volume, attempt recovery following the procedures given in "Slice Status Abnormality."

 

a6) Record the volume size which can be checked as follows.

# sdxinfo -V -o Volume1

OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Group1  *    *          0    32767    32768 PRIVATE
volume Volume1 Class1  Group1  off  on     32768  4161535  4128768 STOP


In this example, the volume size would be 4128768 blocks given in Volume1 BLOCKS field.
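The size to record can be read out of the listing and cross-checked, because in the examples in this section BLOCKS equals LASTBLK - 1STBLK + 1 (here 4161535 - 32768 + 1 = 4128768). The sketch below is illustrative only: the function name volume_blocks and the variable sample_stop are hypothetical, and the input is assumed to follow the sdxinfo -V layout shown above.

```shell
#!/bin/sh
# Print the BLOCKS value for the named volume after checking that it
# agrees with the 1STBLK and LASTBLK columns of the same row.
# Reads text in the `sdxinfo -V` layout from standard input.
volume_blocks() {
    awk -v vol="$1" '
        $1 == "volume" && $2 == vol {
            # Fields: 7 = 1STBLK, 8 = LASTBLK, 9 = BLOCKS
            if ($8 - $7 + 1 != $9) exit 1
            print $9
        }'
}

# Sample text copied from step a6). On a live system the input would
# instead come from: sdxinfo -V -o Volume1
sample_stop='OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Group1  *    *          0    32767    32768 PRIVATE
volume Volume1 Class1  Group1  off  on     32768  4161535  4128768 STOP'

printf '%s\n' "$sample_stop" | volume_blocks Volume1
```

With the sample text, the function prints 4128768, the value later passed to the -s option when the volume is created again.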

 

a7) Remove the volume with the sdxvolume command.

# sdxvolume -R -c Class1 -v Volume1


a8) Swap disks following the procedures given in "Disk Swap" and "sdxswap - Swap disk."

 

a9) Create a volume with the sdxvolume command again. For the number_of_blocks, use the size recorded in a6), in this example, 4128768.

# sdxvolume -M -c Class1 -g Group1 -v Volume1 -s number_of_blocks


a10) Finally, restore the backup data to Volume1.

 

b. When the error cause is a disk component failure and data is restored from a slice in INVALID status

 

b1) When the error was caused by a disk failure and either no backup data exists or the backup data is too old, restore data from the detached INVALID status slice, following the procedures given below.

 

b2) Exit the application accessing the volume. When the volume is used as a file system, execute the unmount command.

If an I/O error occurs during the unmount command, execute the unmount command again with the -f option.

 

b3) Stop the volume with the sdxvolume command.

# sdxvolume -F -c Class1 -v Volume1

 

b4) If there is a TEMP status slice within the volume, attempt recovery following the procedures given in "Slice Status Abnormality."

 

b5) If there is a NOUSE status slice within the volume, attempt recovery following the procedures given in "Slice Status Abnormality."

 

b6) Decide which mirror slice will serve as the source when the volume is recovered. Then, execute the sdxfix command.

(Example 1)

# sdxfix -V -c Class1 -d Disk2 -v Volume1

In this example, data is recovered from a mirror slice in the disk Disk2 which is connected to the highest level mirror group.

 

(Example 2)

# sdxfix -V -c Class1 -g Group2 -v Volume1

In this example, data is recovered from a mirror slice in the lower level group Group2 which is connected to the highest level mirror group.

 

b7) Start the volume.

# sdxvolume -N -c Class1 -v Volume1 -e nosync

 

b8) Create a backup of Volume1, and regain data integrity by running fsck as necessary.

 

b9) Lastly, swap disks following the procedures given in "Disk Swap" and "sdxswap - Swap disk."

 

c. When the error cause is a non-disk component failure or defect

A slice with valid data still exists on the disk. Shut down the system once, recover the failed component, and then reboot the system. Synchronization copying is performed automatically, and the mirroring status is recovered.

 


 

(8) An I/O error occurs on a single volume.

 

[Explanation]

Since a single volume consists of only one slice, accessing the volume at the time of an I/O error will result in an error. However, the status of slice and volume will remain ACTIVE.

 

[Resolution]

Identify the cause of I/O error occurrence by referring to the disk driver log message.

How to resolve the problem is described in two cases:

  1. When the error cause is a disk component failure and recovery is performed using backup data

  2. When the error cause is a non-disk component failure or defect

     

a. When the error cause is a disk component failure and recovery is performed using backup data

 

a1) In the event of a disk component failure, there will be no slice with valid data. Follow the procedures below and restore the data using the backup data. In this example, Disk1 (c1t11d0) has a failure.

# sdxinfo -D -o Disk1

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  DEVCONNECT       STATUS
------ ------- ------ ------- ------- ------- -------- ---------------- -------
disk   Disk1   single Class1  *       c1t11d0  8493876 node1            ENABLE

 

a2) Search for the volumes within the faulty disk using the sdxinfo command, and record the volume sizes.

# sdxinfo -V -o Disk1

OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Disk1   *    *          0    32767    32768 PRIVATE
volume Volume1 Class1  Disk1   off  on     32768    65535    32768 ACTIVE
volume Volume2 Class1  Disk1   off  on     65536  4194303  4128768 ACTIVE
volume *       Class1  Disk1   *    *    4194304  8421375  4227072 FREE

In this example, Volume1 and Volume2 are within the faulty Disk1. The size of Volume1 would be 32,768 blocks as shown in the BLOCKS field. The size of Volume2 would be 4,128,768 blocks as shown in the BLOCKS field.

 

a3) Exit the application accessing the volume. When the volume is used as a file system, execute the unmount command. If an I/O error occurs during the unmount command, execute the unmount command again with the -f option.

 

a4) Stop the volume with the sdxvolume command.

# sdxvolume -F -c Class1 -v Volume1,Volume2

 

a5) Remove the volumes with the sdxvolume command.

# sdxvolume -R -c Class1 -v Volume1
# sdxvolume -R -c Class1 -v Volume2

 

a6) Before swapping the disks, execute the following command.

# sdxswap -O -c Class1 -d Disk1


If the disk is the only remaining disk in the disk class, the command results in an error as shown below.

In that event, follow the steps a6'), a7') and a8').

SDX:sdxswap: ERROR: Disk1: The last ENABLE disk in class cannot be swapped

 

a7) Swap the disks.

 

a8) After swapping the disks, execute the following command.

# sdxswap -I -c Class1 -d Disk1

 

a6') Before swapping the disks, execute the following command.


If no error is output in a6), the steps a6'), a7'), and a8') are not required.

# sdxdisk -R -c Class1 -d Disk1

 

a7') Swap the disks.

 

a8') After swapping the disks, execute the following command.

# sdxdisk -M -c Class1 -d c1t11d0=Disk1:single

 

a9) Create volumes with the sdxvolume command again. For the -s option, use the sizes recorded in a2) (in this example, 32768 and 4128768).

# sdxvolume -M -c Class1 -d Disk1 -v Volume1 -s 32768
# sdxvolume -M -c Class1 -d Disk1 -v Volume2 -s 4128768
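If the listing from step a2) was saved to a file before the volumes were removed, the commands for step a9) can be generated from it rather than retyped. This is a sketch under stated assumptions: the function name recreate_commands and the variable sample_saved are hypothetical, the function only prints the commands, and it assumes the sdxinfo -V layout shown in a2), where the GROUP column holds the single disk name.

```shell
#!/bin/sh
# Print one "sdxvolume -M" command per named volume in a saved listing.
# Argument: -d when the volumes sit on a single disk, -g when they sit
# in a group. Reads text in the `sdxinfo -V` layout from standard input.
recreate_commands() {
    awk -v opt="$1" '$1 == "volume" && $2 != "*" {
        # Fields: 2 = NAME, 3 = CLASS, 4 = GROUP (or single disk), 9 = BLOCKS
        print "sdxvolume -M -c " $3 " " opt " " $4 " -v " $2 " -s " $9
    }'
}

# Sample text copied from step a2), as it would look in a saved file.
sample_saved='OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Disk1   *    *          0    32767    32768 PRIVATE
volume Volume1 Class1  Disk1   off  on     32768    65535    32768 ACTIVE
volume Volume2 Class1  Disk1   off  on     65536  4194303  4128768 ACTIVE
volume *       Class1  Disk1   *    *    4194304  8421375  4227072 FREE'

printf '%s\n' "$sample_saved" | recreate_commands -d
```

With the sample text, the two printed lines are the same commands shown in step a9). Review the output before running it.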

 

a10) Finally, restore the backup data to Volume1 and Volume2.

 

b. When the error cause is a non-disk component failure or defect

 

Shut down the system once, recover the failed component, and then reboot the system. Slice data is valid and there is no need to restore the data.

 


 

(9) An I/O error occurs on a stripe volume or a volume in a concatenation group.

 

[Explanation]

Since a stripe volume or a volume within a concatenation group consists of only one slice, accessing the volume at the time of an I/O error will also result in an error. However, the status of slice and volume will remain ACTIVE.

 

[Resolution]

Identify the cause of I/O error occurrence by referring to the disk driver log message.

You can confirm the error status of the disk related to volume and the physical disk name as shown below.

# sdxinfo -D -o Volume1 -e long

OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Disk1   concat Class1  Group2  c1t1d0  17596416 *        node1            ENABLE  0
disk   Disk2   concat Class1  Group2  c1t2d0  17596416 *        node1            ENABLE  1
disk   Disk3   concat Class1  Group3  c2t3d0  17682084 *        node1            ENABLE  0
disk   Disk4   concat Class1  Group3  c2t4d0  17682084 *        node1            ENABLE  0


In this example, an I/O error occurs on Disk2, as shown in the E field. The physical disk name corresponding to Disk2 is c1t2d0, as shown in the DEVNAM field.
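The E field can be scanned in the same way when a group contains many disks. The following is a minimal sketch: the function name io_error_disks and the variable sample_e are hypothetical, the input is assumed to follow the sdxinfo -D -e long layout shown above (DEVNAM as the sixth field, E as the last), and treating any non-zero E value as an error is an assumption drawn from the example, not a documented rule.

```shell
#!/bin/sh
# Print "disk-name physical-device" for every disk whose E field is
# non-zero. Reads text in the `sdxinfo -D -e long` layout from stdin.
io_error_disks() {
    awk '$1 == "disk" && $NF + 0 != 0 { print $2, $6 }'
}

# Sample text copied from the example above. On a live system the input
# would instead come from: sdxinfo -D -o Volume1 -e long
sample_e='OBJ    NAME    TYPE   CLASS   GROUP   DEVNAM  DEVBLKS  FREEBLKS DEVCONNECT       STATUS  E
------ ------- ------ ------- ------- ------- -------- -------- ---------------- ------- -----
disk   Disk1   concat Class1  Group2  c1t1d0  17596416 *        node1            ENABLE  0
disk   Disk2   concat Class1  Group2  c1t2d0  17596416 *        node1            ENABLE  1
disk   Disk3   concat Class1  Group3  c2t3d0  17682084 *        node1            ENABLE  0
disk   Disk4   concat Class1  Group3  c2t4d0  17682084 *        node1            ENABLE  0'

printf '%s\n' "$sample_e" | io_error_disks
```

With the sample text, the function prints Disk2 together with its physical disk name c1t2d0.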

How to resolve the problem is described in two cases:

  1. When the error cause is a disk component failure and recovery is performed using backup data

  2. When the error cause is a non-disk component failure or defect

     

a. When the error cause is a disk component failure and recovery is performed using backup data

a1) In the event of a disk component failure, there will be no slices with valid data. Follow the procedures below and restore the data using the backup data.

a2) Record the configuration information of the group that was related to the failed disk using the sdxinfo command.

# sdxinfo -G -o Disk2 -e long

OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE MASTER TYPE   WIDTH
------ ------- ------- ------------------- -------- -------- ----- ------ ------ -----
group  Group1  Class1  Group2:Group3       70189056 65961984     * *      stripe 32
group  Group2  Class1  Disk1:Disk2         35127296 *            * *      concat *
group  Group3  Class1  Disk3:Disk4         35127296 *            * *      concat *


In this example, the lower level concatenation groups Group2 and Group3 are connected to the highest level stripe group Group1. The disks Disk1 and Disk2 are connected to Group2, and the disks Disk3 and Disk4 are connected to Group3. The stripe width for Group1 is 32 blocks.

a3) Search the volumes that exist in the highest level group that are related to the faulty disk using the sdxinfo command.

# sdxinfo -V -o Disk2

OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume *       Class1  Group1  *    *          0    65535    65536 PRIVATE
volume Volume1 Class1  Group1  *    *      65536    98303    32768 ACTIVE
volume Volume2 Class1  Group1  *    *      98304  4227071  4128768 ACTIVE
volume *       Class1  Group1  *    *    4227072 70189055 65961984 FREE


In this example, Volume1 and Volume2 exist in the highest level group Group1, that is related to the faulty disk Disk2. The size of Volume1 is 32768 blocks, and the size of Volume2 is 4128768 blocks as shown in the BLOCKS field.

a4) Exit the application accessing the volume. When the volume is used as a file system, execute the unmount command. If an I/O error occurs during the unmount command, execute the unmount command again with the -f option.

a5) Stop the volume with the sdxvolume command.

# sdxvolume -F -c Class1 -v Volume1,Volume2


a6) Remove the volumes with the sdxvolume command.

# sdxvolume -R -c Class1 -v Volume1
# sdxvolume -R -c Class1 -v Volume2


a7) Disconnect the faulty disk from the group. If the group is in a hierarchical structure, disconnect from the higher group in descending order.

# sdxgroup -D -c Class1 -h Group1 -l Group2
# sdxdisk -D -c Class1 -g Group2 -d Disk2


In this example, the faulty disk Disk2 is connected to Group2, and Group2 is connected to Group1. Therefore, you should disconnect Group2 first, and then Disk2.
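The disconnect order follows from the group listing recorded in a2): walk upward from the faulty disk to the highest level group, then disconnect the links starting from the top. The sketch below shows that walk; the function name disconnect_order and the variable sample_h are hypothetical, and the input is assumed to follow the sdxinfo -G layout from a2), with colon-separated members in the DISKS column.

```shell
#!/bin/sh
# Print the "higher -> lower" links between the named object and the
# highest level group, listed from the highest level downward.
# Reads text in the `sdxinfo -G` layout from standard input.
disconnect_order() {
    awk -v obj="$1" '
        $1 == "group" {
            m = split($4, members, ":")
            for (i = 1; i <= m; i++) parent[members[i]] = $2
        }
        END {
            # Walk upward from the faulty disk and remember each link.
            depth = 0
            while (obj in parent) {
                lower[++depth] = obj
                higher[depth] = parent[obj]
                obj = parent[obj]
            }
            # Print the links starting from the highest level.
            for (i = depth; i >= 1; i--) print higher[i] " -> " lower[i]
        }'
}

# Sample text copied from step a2). On a live system the input would
# instead come from: sdxinfo -G -o Disk2 -e long
sample_h='OBJ    NAME    CLASS   DISKS               BLKS     FREEBLKS SPARE MASTER TYPE   WIDTH
------ ------- ------- ------------------- -------- -------- ----- ------ ------ -----
group  Group1  Class1  Group2:Group3       70189056 65961984     * *      stripe 32
group  Group2  Class1  Disk1:Disk2         35127296 *            * *      concat *
group  Group3  Class1  Disk3:Disk4         35127296 *            * *      concat *'

printf '%s\n' "$sample_h" | disconnect_order Disk2
```

With the sample text, the first line printed is the Group1 to Group2 link and the second is the Group2 to Disk2 link, which is the order of the sdxgroup -D and sdxdisk -D commands in step a7). Reading the same output from bottom to top gives the reconnect order.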

a8) Before swapping the disks, execute the following command.

# sdxswap -O -c Class1 -d Disk2



If the disk is the only remaining disk in the disk class, the command results in an error as shown below. In that event, follow the steps a8'), a9') and a10').

SDX:sdxswap: ERROR: Disk2: The last ENABLE disk in class cannot be swapped

a9) Swap the disks.

a10) After swapping the disks, execute the following command.

# sdxswap -I -c Class1 -d Disk2


a8') Before swapping the disks, execute the following command.


If no error is output in a8), the steps a8'), a9'), and a10') are not required.

# sdxdisk -R -c Class1 -d Disk2


a9') Swap the disks.

 

a10') After swapping the disks, execute the following command.

# sdxdisk -M -c Class1 -d c1t2d0=Disk2


a11) Connect the swapped disk to the group, referring to the group information recorded in a2). If the groups were in a hierarchical structure, connect the groups in ascending order.

# sdxdisk -C -c Class1 -g Group2 -d Disk2
# sdxgroup -C -c Class1 -h Group1 -l Group2 -a type=stripe,width=32


a12) Create volumes with the sdxvolume command again. For the -s option, use the size recorded in a3), in this example, 32768 and 4128768.

# sdxvolume -M -c Class1 -g Group1 -v Volume1 -s 32768 -a pslice=off
# sdxvolume -M -c Class1 -g Group1 -v Volume2 -s 4128768 -a pslice=off


a13) Finally, restore the backup data to Volume1 and Volume2.

 

b. When the error cause is a non-disk component failure or defect

Shut down the system once, recover the failed component, and then reboot the system. Slice data is valid and there is no need to restore the data.




All Rights Reserved, Copyright(C) FUJITSU LIMITED 2005