ETERNUS SF AdvancedCopy Manager V16.9A Operation Guide
FUJITSU Storage

7.4.2 Troubleshooting

The following figure shows the troubleshooting flow when a hardware or similar fault occurs.

Figure 7.3 Troubleshooting Flow (When Fault Occurs During Replication)

Reference ahead 1: 7.4.2.1 Hardware Error on Replication Volume
Reference ahead 2: 7.4.2.2 Troubleshooting If Bad Sector Occurred in Copy Source Volume
Reference ahead 3: 7.4.2.3 Troubleshooting When Lack of Free Physical Space Has Occurred in Copy Destination Volume
Reference ahead 4: 7.4.2.4 Error (halt) on Remote Copy Processing

Note

  • Refer to "7.4.1 Overview" for details about the Status column and "Fault location".

    If the Status column is "?????", check if the copy processing is in the error suspend state ("failed") or the hardware suspend state ("halt") using ETERNUS Web GUI.
    If the copy processing is in either of these states, take the action indicated in the above troubleshooting flow.

    In other cases, check the following points and take the indicated action.

    • If a device is not accessible:

      Check if the device exists.

    • If there is anything unusual with Managed Server, switches, etc.:

      Contact Fujitsu Technical Support.

  • Use either of the following two methods to check the error codes.

    • Checking with the swsrpstat command

      Execute the command with the -O option.

    • Checking with ETERNUS Web GUI

      1. On the [Display status] menu, click [Advanced Copy status display] in the status display.

      2. At "Session status", click the "Number of active sessions" link for the relevant copy type.

      3. Refer to the value in the "Error code" column of the relevant copy process.

    The following table shows the meanings of the error codes.

    Table 7.7 Meanings of Error Codes

    Error Code

    Meaning

    0xBA

    If a) or b) below applies, a bad sector was created in the transaction volume.

    a) QuickOPC has not yet performed physical copying and tracking is in progress

    b) EC/REC is in the suspend status (replication established status)

      Note:
      If a bad sector is created in a transaction volume when a) or b) applies, the ETERNUS Disk storage system automatically changes the copy processing to the error suspend state. This prevents QuickOPC from being restarted or EC/REC from being resumed, so the copy destination volume is not overwritten with invalid copy source volume data.

    0x1E, 0x2E, 0xBB

    A lack of free space has occurred in the copy destination volume.

    Other than above

    An error other than the above occurred.
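As a quick reference, Table 7.7 can be expressed as a small lookup. The following is a minimal sketch, not part of the product: the function name describe_error_code is hypothetical, and the section hints are inferred from the section titles only, so the troubleshooting flow in Figure 7.3 remains the authority.

```python
# Meanings transcribed from Table 7.7. The section hints are an inference
# from the section titles, not a substitute for the flow in Figure 7.3.
# Note: 0xBA indicates a bad sector only when condition a) or b) applies.
BAD_SECTOR = "Bad sector created in the transaction volume"
NO_FREE_SPACE = "Lack of free space in the copy destination volume"
OTHER = "An error other than the above occurred"

ERROR_CODES = {
    0xBA: (BAD_SECTOR, "7.4.2.2"),
    0x1E: (NO_FREE_SPACE, "7.4.2.3"),
    0x2E: (NO_FREE_SPACE, "7.4.2.3"),
    0xBB: (NO_FREE_SPACE, "7.4.2.3"),
}


def describe_error_code(code):
    """Return (meaning, section hint or None) for an error code.

    `code` may be an int or a hexadecimal string such as "0xBA" or "ba".
    """
    if isinstance(code, str):
        code = int(code, 16)
    return ERROR_CODES.get(code, (OTHER, None))


if __name__ == "__main__":
    for raw in ("0xBA", "0x2E", "0x10"):
        meaning, section = describe_error_code(raw)
        print(raw, "->", meaning, "| see:", section or "Figure 7.3")
```

Codes that are not listed fall through to the "Other than above" row and return no section hint, because this page does not say which procedure applies to them.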

7.4.2.1 Hardware Error on Replication Volume

When a hardware error occurs in a replication volume, repair it using the following procedure.

  1. Execute the swsrpcancel command to cancel the processing in which the error occurred. If inter-server replication is being performed and the processing cannot be cancelled from the operation server, cancel it from a non-operation server.
    If the processing cannot be cancelled by using the command, use ETERNUS Web GUI to cancel it.

  2. Execute the swsrprecoverres command.

  3. Execute the swsrpstat command to verify that no other errors have occurred.

  4. Execute the swsrpdelvol command to delete the replication volume in which the error occurred.

  5. Execute the swsrpsetvol command to register a new replication volume. If the replication volume on which the error occurred is to be repaired, execute the swsrpsetvol command after executing the stgxfwcmsetdev command on the Management Server.

  6. Re-execute the processing in which the error occurred.
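The order of the commands in steps 1 to 5 can be sketched as a plan that is printed for review rather than executed. This is an illustrative sketch under assumptions: hardware_error_plan and the volume names are hypothetical, the volume arguments passed to swsrpcancel, swsrpdelvol, and swsrpsetvol are assumed to be the replication source and destination volume names, and no options are shown. Check the command reference before running anything.

```python
def hardware_error_plan(src, dst, repaired=False):
    """Return the commands of steps 1 to 5 in order, as argument lists.

    Assumption: swsrpcancel, swsrpdelvol and swsrpsetvol take the
    replication source and destination volume names. Options are omitted.
    """
    plan = [
        ["swsrpcancel", src, dst],  # step 1: cancel the failed processing
        ["swsrprecoverres"],        # step 2
        ["swsrpstat"],              # step 3: verify no other errors
        ["swsrpdelvol", src, dst],  # step 4: delete the replication volume
    ]
    if repaired:
        # Step 5, repaired volume only: run on the Management Server first.
        # Its arguments are not given in this section, so none are shown.
        plan.append(["stgxfwcmsetdev"])
    plan.append(["swsrpsetvol", src, dst])  # step 5: register the volume
    return plan


if __name__ == "__main__":
    # SRC_VOL and DST_VOL are placeholder names, not real volume names.
    for argv in hardware_error_plan("SRC_VOL", "DST_VOL", repaired=True):
        print(" ".join(argv))
```

Step 6 (re-executing the processing in which the error occurred) is left out because it depends on which operation failed.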

7.4.2.2 Troubleshooting If Bad Sector Occurred in Copy Source Volume

If a bad sector occurred in the copy source volume, use the following procedure to restore the copy source volume:

  1. Execute the swsrpcancel command to cancel the processing in which the error occurred.
    If inter-server replication was being performed and cancellation is not possible from the operation server, cancel the processing from a non-operation server.
    If the processing cannot be cancelled using the command, use ETERNUS Web GUI to cancel it.

  2. Execute the swsrpstat command to check for other errors.

  3. Restoration is performed by overwriting the area containing the bad sector. From the methods below, select the one that suits how the copy source volume is used and its current state.

    • Restoration method 1:
      If the area can be reconstructed from high-level software (file system, DBMS, or similar), reconstruct the area.

    • Restoration method 2:
      If the area containing the bad sector is an area that is not being used, such as an unused area or a temporary area, use a system command (for example, the UNIX dd command or the Windows format command) to write to the area.

    • Restoration method 3:
      Execute the swsrpmake command to restore the data from the copy destination volume. (Restoration is also possible from the copy destination volume of the copy process for which the bad sector occurred.)

7.4.2.3 Troubleshooting When Lack of Free Physical Space Has Occurred in Copy Destination Volume

Use the following procedure to recover the copy destination volume:

  1. Execute the swsrpcancel command to cancel the copy session in which the error occurred.
    If inter-server replication is being performed and cancellation is not possible from the operation server, cancel it from a non-operation server.
    If the copy session cannot be cancelled using the command, use ETERNUS Web GUI to cancel it.

  2. Check the status of the copy destination volume and initialize it.

    • If the copy destination volume is TPV

      Use Storage Cruiser or ETERNUS Web GUI to check the status of the copy destination volume and initialize it.
      For the operation procedure when using Storage Cruiser, refer to "Display Volume" and "Delete Reserved Volume or Forcible Delete/Format Volume" in the Web Console Guide.

    • If the copy destination volume is FTV

      Use Storage Cruiser or ETERNUS Web GUI to check the status of the copy destination volume and initialize it.
      For the operation procedure when using Storage Cruiser, refer to "Display FTV" and "Format FTV" in the Web Console Guide.

    • If the copy destination volume is SDV

      Use Storage Cruiser, the swstsdv command, or ETERNUS Web GUI to check the status of the copy destination volume and initialize it.
      For the operation procedure when using Storage Cruiser, refer to "Display Volume" and "Delete Reserved Volume or Forcible Delete/Format Volume" in the Web Console Guide.
      The operation procedure when using the swstsdv command is as follows:

      1. Execute the command with the "stat" subcommand and check the status of the SDV.

      2. Execute the command with the "init" subcommand and initialize the SDV.

  3. Recreate partitions (slices) in the copy destination volume.
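The tools named in step 2 differ only by volume type, which the following lookup summarizes. It is a minimal sketch: the names INIT_TOOLS, SWSTSDV_STEPS, and init_tools are hypothetical and exist only for illustration.

```python
# Tools that step 2 names for checking and initializing the copy
# destination volume, keyed by volume type.
INIT_TOOLS = {
    "TPV": ("Storage Cruiser", "ETERNUS Web GUI"),
    "FTV": ("Storage Cruiser", "ETERNUS Web GUI"),
    "SDV": ("Storage Cruiser", "swstsdv command", "ETERNUS Web GUI"),
}

# Order of the swstsdv subcommands for an SDV: check the status, then initialize.
SWSTSDV_STEPS = ("stat", "init")


def init_tools(volume_type):
    """Return the tools listed in step 2 for the given volume type."""
    key = volume_type.strip().upper()
    if key not in INIT_TOOLS:
        raise ValueError("unknown volume type: " + repr(volume_type))
    return INIT_TOOLS[key]


if __name__ == "__main__":
    for vtype in ("TPV", "FTV", "SDV"):
        print(vtype, "->", ", ".join(init_tools(vtype)))
```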

The following factors may have caused a capacity shortage of the physical space in the copy destination volume:

  1. The estimate of the required physical space for the copy destination volume is not adequate.

  2. The estimate of the required physical space for the copy destination volume is adequate, but a large number of updates were made to the copy destination volume while no copy session existed, so its physical space is being wasted.

If cause 1 above applies, re-estimate the physical space required for the copy destination volume and consider expanding the disk.

7.4.2.4 Error (halt) on Remote Copy Processing

The REC restart (Resume) method varies, depending on the halt status.

Execute the swsrpstat command with the -H option specified to check the halt status, and then implement the relevant countermeasure.

The REC restart method also depends on the REC Recovery mode.

For Automatic Recovery Mode
  1. Remove the cause that closed all paths (halt).

  2. The ETERNUS Disk storage system automatically restarts (resumes) REC.

For Manual Recovery Mode
  1. Remove the cause that closed all paths (halt).

  2. Execute the swsrpmake command to forcibly suspend the REC that is in the halt status.

    [For volume units]
    swsrpmake -j <replication source volume name> <replication destination volume name>
    
    [For group units]
    swsrpmake -j -Xgroup <group name>

  3. Execute the swsrpstartsync command to restart (Resume) the REC. The -t option must be specified if REC is being restarted after a forcible suspend.

    [For volume units]
    swsrpstartsync -t <replication source volume name> <replication destination volume name>
    
    [For group units]
    swsrpstartsync -t -Xgroup <group name>
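For Manual Recovery Mode, steps 2 and 3 use the same target (a volume pair or a group) in both commands. The following sketch builds the two command lines shown above without executing them. The function name manual_recovery_commands and the example names are hypothetical; only the options that appear in this section (-j, -t, -Xgroup) are used.

```python
def manual_recovery_commands(source=None, destination=None, group=None):
    """Build the forcible-suspend and resume commands for a halted REC.

    Pass either a source/destination volume pair or a group name.
    """
    if group is not None:
        if source is not None or destination is not None:
            raise ValueError("specify a volume pair or a group, not both")
        target = ["-Xgroup", group]
    elif source is not None and destination is not None:
        target = [source, destination]
    else:
        raise ValueError("specify both volumes, or a group name")
    return [
        ["swsrpmake", "-j"] + target,       # step 2: forcibly suspend the REC
        ["swsrpstartsync", "-t"] + target,  # step 3: restart (Resume) the REC
    ]


if __name__ == "__main__":
    # SRC_VOL, DST_VOL and GRP1 are placeholder names.
    for argv in manual_recovery_commands("SRC_VOL", "DST_VOL"):
        print(" ".join(argv))
    for argv in manual_recovery_commands(group="GRP1"):
        print(" ".join(argv))
```

Remove the cause of the halt (step 1) before running either command. For Automatic Recovery Mode no command is needed, because the storage system resumes REC by itself.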