8.4.2 Troubleshooting

The following figure shows the troubleshooting flow when a hardware or similar fault occurs.

Figure 8.3 Troubleshooting Flow (When Fault Occurs During Replication)

Reference ahead 1: 8.4.2.1 Hardware Error on Replication Volume
Reference ahead 2: 7.10 When Recovering Storage Cluster Continuous Copy Function
Reference ahead 3: 8.4.2.2 Troubleshooting If Bad Sector Occurred in Copy Source Volume
Reference ahead 4: 8.4.2.3 Troubleshooting When Lack of Free Physical Space Has Occurred in Copy Destination Volume
Reference ahead 5: 8.4.2.4 Error (halt) on Remote Copy Processing

Note

Refer to "8.4.1 Overview" for details about the Status column and "Fault location".
If the Status column is "?????", check if the copy processing is in the error suspend state ("failed") or the hardware suspend state ("halt") using ETERNUS Web GUI.
If the copy processing is in either of these states, take the action indicated in the above troubleshooting flow.
Otherwise, check the following points and take the indicated action.
- If a device is not accessible:
  Check if the device exists.
- If there is anything unusual with Managed Server, switches, etc.:
  Contact Fujitsu Technical Support.

Check the error codes using either of the following two methods.

- Checking with the swsrpstat command
  Execute the command with the -O option.
- Checking with ETERNUS Web GUI
  1. On the [Display status] menu, click [Advanced Copy status display] in the status display.
  2. At "Session status", click the "Number of active sessions" link for the relevant copy type.
  3. Refer to the value in the "Error code" column of the relevant copy process.

The following table shows the meanings of the error codes.

Table 8.8 Meanings of Error Codes
Error Code	Meaning
0xB2	The Storage Cluster Continuous Copy Sessions of the Primary Storage and the Secondary Storage were not able to synchronize, and Advanced Copy was not able to continue.
0xBA	If a) or b) below applies, a bad sector was created in the transaction volume. a) QuickOPC has not yet performed physical copying and tracking is in progress. b) EC/REC is in the suspend status (replication established status). Note: If a bad sector is created in a transaction volume when a) or b) applies, the ETERNUS Disk storage system automatically changes the copy processing to the error suspend state. This prevents a restart of QuickOPC or a resume of EC/REC, and so prevents the copy destination volume from being overwritten with invalid copy source volume data.
0x1E, 0x2E, 0xBB	A lack of free space has occurred in the copy destination volume.
Other than above	An error other than the above occurred.
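
The table above can be sketched as a small lookup, which may be useful when scripting around the error code obtained from the swsrpstat command or ETERNUS Web GUI. This is a minimal sketch, not part of the product: the names ErrorInfo, ERROR_CODES, and lookup_error_code are hypothetical, and the section pairings are inferred from the section titles only, because the flow in Figure 8.3 is not reproduced here. Codes whose section cannot be inferred return None.

```python
from typing import Dict, NamedTuple, Optional


class ErrorInfo(NamedTuple):
    meaning: str
    see: Optional[str]  # section inferred from its title (assumption), or None


_NO_SPACE = ErrorInfo(
    "A lack of free space has occurred in the copy destination volume.",
    "8.4.2.3",
)

# Entries follow Table 8.8; the "see" values are assumptions based on titles.
ERROR_CODES: Dict[int, ErrorInfo] = {
    0xB2: ErrorInfo(
        "Storage Cluster Continuous Copy sessions could not synchronize.",
        "7.10",
    ),
    0xBA: ErrorInfo("A bad sector was created in the transaction volume.", "8.4.2.2"),
    0x1E: _NO_SPACE,
    0x2E: _NO_SPACE,
    0xBB: _NO_SPACE,
}

_OTHER = ErrorInfo("An error other than the above occurred.", None)


def lookup_error_code(code: int) -> ErrorInfo:
    """Return the Table 8.8 meaning for a numeric error code."""
    return ERROR_CODES.get(code, _OTHER)
```

A code read as text such as "0xBA" can be converted with int("0xBA", 16) before the lookup. How the code is actually displayed by each tool is not specified here, so any parsing step is the caller's responsibility.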

8.4.2.1 Hardware Error on Replication Volume

When a hardware error occurs in a replication volume, repair the error using the following procedure.
If the Storage Cluster Continuous Copy function is used, refer to "7.9.1 Recovery from Hardware Failure".

1. Execute the swsrpcancel command to cancel the processing in which the error occurred. If the processing cannot be cancelled from the operation server when inter-server replication is performed, cancel it from a non-operational server.
   If the processing cannot be cancelled by using the command, use ETERNUS Web GUI to cancel it.
2. Execute the swsrprecoverres command.
3. Execute the swsrpstat command to verify that no other errors have occurred.
4. Execute the swsrpdelvol command to delete the replication volume in which the error occurred.
5. Execute the swsrpsetvol command to register a new replication volume. If the replication volume on which the error occurred is to be repaired, execute the swsrpsetvol command after executing the stgxfwcmsetdev command on the Management Server.
6. Re-execute the processing in which the error occurred.

8.4.2.2 Troubleshooting If Bad Sector Occurred in Copy Source Volume

If a bad sector occurred in the copy source volume, use the following procedure to restore the copy source volume:
If the Storage Cluster Continuous Copy function is used, refer to "7.9.1 Recovery from Hardware Failure".

1. Execute the swsrpcancel command to cancel the processing in which the error occurred.
   If inter-server replication was being performed and cancellation is not possible from the active server, cancel processing from the inactive server.
   If processing cannot be cancelled using commands, use ETERNUS Web GUI to cancel it.
2. Execute the swsrpstat command to check for other errors.
3. Restoration is performed by overwriting the area containing the bad sector. Select the appropriate method, in accordance with the usage or use status of the copy source volume, from the methods below.
   - Restoration method 1:
     If the area can be reconstructed from high-level software (file system, DBMS, or similar), reconstruct the area.
   - Restoration method 2:
     If the area containing the bad sector is an area that is not being used, such as an unused area or a temporary area, use a system command (for example, the UNIX dd command or the Windows format command) to write to the area.
   - Restoration method 3:
     Execute the swsrpmake command to restore the data from the copy destination volume. (Restoration is also possible from the copy destination volume of the copy process for which the bad sector occurred.)

8.4.2.3 Troubleshooting When Lack of Free Physical Space Has Occurred in Copy Destination Volume

Use the following procedure to recover the copy destination volume:

1. Execute the swsrpcancel command to cancel the copy session in which the error occurred.
   If the cancellation is not possible from the operation server when an inter-server replication is being performed, cancel it from a non-operation server.
   If the copy session cannot be cancelled using the command, use ETERNUS Web GUI to cancel it.
2. Check the status of the copy destination volume and initialize it.
   - If the copy destination volume is TPV
     Use Storage Cruiser or ETERNUS Web GUI to check the status of the copy destination volume and initialize it.
     For the operation procedure when using Storage Cruiser, refer to "Display Volume" and "Delete Reserved Volume or Forcible Delete/Format Volume" in the Web Console Guide.
   - If the copy destination volume is FTV
     Use Storage Cruiser or ETERNUS Web GUI to check the status of the copy destination volume and initialize it.
     For the operation procedure when using Storage Cruiser, refer to "Display FTV" and "Format FTV" in the Web Console Guide.
   - If the copy destination volume is SDV
     Use Storage Cruiser, the swstsdv command, or ETERNUS Web GUI to check the status of the copy destination volume and initialize it.
     For the operation procedure when using Storage Cruiser, refer to "Display Volume" and "Delete Reserved Volume or Forcible Delete/Format Volume" in the Web Console Guide.
     The operation procedure when using the swstsdv command is as follows:
     1. Execute the command with the "stat" subcommand and check the status of the SDV.
     2. Execute the command with the "init" subcommand and initialize the SDV.
3. Recreate partitions (slices) in the copy destination volume.

The following factors may have caused the shortage of physical space in the copy destination volume:

a) The estimate of the physical space required for the copy destination volume was inadequate.
b) The estimate of the physical space required for the copy destination volume was adequate, but a large number of updates were made to the copy destination volume while no copy session existed for it, so its physical space was consumed unnecessarily.

If cause a) above applies, re-estimate the physical space required for the copy destination volume and consider expanding the disk.

- If the copy destination volume is TPV
  Use Storage Cruiser or ETERNUS Web GUI to check the status of the Thin Provisioning pool and expand the capacity of the Thin Provisioning pool.
  For the operation procedure when using Storage Cruiser, refer to "Display Thin Provisioning Pool" and "Expand Capacity of/Format/Change Threshold Value of/Delete Thin Provisioning Pool" in the Web Console Guide.
- If the copy destination volume is FTV
  Use Storage Cruiser or ETERNUS Web GUI to check the status of the Tier pool and expand the sub-pool capacity of the Tier pool.
  For the operation procedure when using Storage Cruiser, refer to "Display Tier Pool" and "Expand Capacity of Sub-Pool in Tier Pool" in the Web Console Guide.
- If the copy destination volume is SDV
  Use the swstsdv command or ETERNUS Web GUI to check the status of the SDP and expand the SDP capacity.
  Creating an exclusive volume named Snap Data Pool Volume (SDPV) enables the SDP and the created SDPV is automatically incorporated in the SDP. Creating an SDPV of the physical capacity that is assigned to a copy destination volume expands the SDP capacity.
  The operation procedure when using the swstsdv command is as follows:
  1. Execute the command with the "poolstat" subcommand and check the status of the SDP.
  2. Create the SDPV with ETERNUS Web GUI.
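
The per-volume-type choices in this section can be summarized as data. The following is a minimal sketch for quick reference only: the names INITIALIZE_TOOLS, EXPAND_TARGET, and recovery_summary are hypothetical, and the sketch records only which tools and pools the text names. It does not call any product command.

```python
from typing import Dict, Tuple

# Tools named in this section for checking and initializing the
# copy destination volume, by volume type.
INITIALIZE_TOOLS: Dict[str, Tuple[str, ...]] = {
    "TPV": ("Storage Cruiser", "ETERNUS Web GUI"),
    "FTV": ("Storage Cruiser", "ETERNUS Web GUI"),
    "SDV": ("Storage Cruiser", "swstsdv command", "ETERNUS Web GUI"),
}

# What is expanded when the space estimate was inadequate, by volume type.
EXPAND_TARGET: Dict[str, str] = {
    "TPV": "Thin Provisioning pool",
    "FTV": "sub-pool of the Tier pool",
    "SDV": "SDP (by creating an SDPV with ETERNUS Web GUI)",
}


def recovery_summary(volume_type: str) -> str:
    """Return a one-line reminder for the given volume type (TPV, FTV, SDV)."""
    key = volume_type.upper()
    if key not in INITIALIZE_TOOLS:
        raise ValueError(f"unknown volume type: {volume_type!r}")
    tools = ", ".join(INITIALIZE_TOOLS[key])
    return f"{key}: initialize with {tools}; expand the {EXPAND_TARGET[key]}"
```

For example, recovery_summary("sdv") returns a line that lists the swstsdv command among the initialization tools and names the SDP as the capacity to expand.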

8.4.2.4 Error (halt) on Remote Copy Processing

The REC restart (Resume) method varies, depending on the halt status.

Execute the swsrpstat command with the -H option specified to check the halt status, and then implement the relevant countermeasure.

- For "halt(use-disk-buffer)" or "halt(use-buffer)"
  This status means that data is saved to the REC Disk buffer or REC buffer because data cannot be transferred due to a path closure (halt).
  In order to restart REC, perform path recovery before a space shortage occurs for the REC Disk buffer or REC buffer.
  After recovery, the ETERNUS Disk storage system restarts REC automatically.
  If a space shortage has already occurred for the REC Disk buffer or REC buffer, the "halt(sync)" or "halt(equivalent)" status shown below occurs. Implement the countermeasures for that status.
- For "halt(sync)" or "halt(equivalent)"
  This status means that data transfer processing was discontinued due to a path closure (halt).

The REC restart method depends on the REC Recovery mode.

For Automatic Recovery Mode

1. Remove the cause that made all paths close (halt).
2. The ETERNUS Disk storage system automatically restarts (Resume) REC.

For Manual Recovery Mode

1. Remove the cause that made all paths close (halt).

2. Execute the swsrpmake command to forcibly suspend the REC that is in the halt status.

   [For volume units]
   swsrpmake -j <replication source volume name> <replication destination volume name>

   [For group units]
   swsrpmake -j -Xgroup <group name>

3. Execute the swsrpstartsync command to restart (Resume) the REC. The -t option must be specified if REC is being restarted after a forcible suspend.

   [For volume units]
   swsrpstartsync -t <replication source volume name> <replication destination volume name>

   [For group units]
   swsrpstartsync -t -Xgroup <group name>
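
The halt handling described in this section can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the function names halt_action and manual_recovery_commands are hypothetical, the status strings are taken as written in this section (the exact layout of the swsrpstat -H output is not assumed), and the second helper only builds the argument lists shown above for Manual Recovery Mode. It does not execute anything, and removing the cause of the path closure remains a manual first step.

```python
from typing import List, Optional


def halt_action(status: str) -> str:
    """Classify a halt status string into the countermeasure to take."""
    s = status.replace(" ", "")
    if s in ("halt(use-disk-buffer)", "halt(use-buffer)"):
        # Data is held in the REC Disk buffer or REC buffer; recover the
        # path before the buffer runs short, and REC restarts automatically.
        return "recover-path"
    if s in ("halt(sync)", "halt(equivalent)"):
        # Transfer was discontinued; the restart method depends on the
        # REC Recovery mode (Automatic or Manual).
        return "restart-by-recovery-mode"
    raise ValueError(f"not a known halt status: {status!r}")


def manual_recovery_commands(
    source: Optional[str] = None,
    destination: Optional[str] = None,
    group: Optional[str] = None,
) -> List[List[str]]:
    """Build the forcible-suspend and resume commands for Manual Recovery Mode.

    Pass either a source and destination volume name, or a group name.
    """
    if group is not None:
        if source is not None or destination is not None:
            raise ValueError("specify either volumes or a group, not both")
        target = ["-Xgroup", group]
    elif source is not None and destination is not None:
        target = [source, destination]
    else:
        raise ValueError("specify source and destination, or a group")
    return [
        ["swsrpmake", "-j", *target],       # forcibly suspend the halted REC
        ["swsrpstartsync", "-t", *target],  # -t is required after a forcible suspend
    ]
```

Returning argument lists rather than a single string keeps volume and group names from being re-split by a shell if the lists are later passed to a process launcher.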