ETERNUS SF Storage Cruiser 14.0 User's Guide

8.1 Windows Displayed in the Event of a Fault and Troubleshooting

If a fault occurs on a device whose support level is A or B, a description of the fault is output to the event log and to the device events in the Resource View, and the device icon changes to red [Error] or yellow [Warning]. (The criteria for red and yellow color-coding depend on the device definition.) If this software can identify an access path affected by the fault, the status of that access path changes to "Access Path Error" (red).

For a device whose support level is I, by default a description of a fault on the device is not output to the event log and device events, and the device icon color does not change. However, if all of the following conditions are satisfied, the fault description can be output to the event log and device events, and the device icon color can be changed:

Take action according to the displayed status, following the steps below.
For middleware errors that do not involve server node HBA or multipath faults, only events are reported. To troubleshoot such errors, refer to the relevant middleware manuals.

  1. If the storage system, library, or bridge is in the [Error] or [Warning] state:

    Double-click the storage system, library, or bridge in the GUI window to display its view. Right-click the icon of the target device, and select [Call management software] from the pop-up menu. In the Storage Maintenance window that is displayed, identify the failed part of the storage system. To replace the failed component, contact a FUJITSU engineer (CE) as necessary.

    If a channel adapter (CA) fails and a refresh is performed, "Error" is displayed for the CA in the "Storage view" (ETERNUS and GR devices only). Likewise, if a robot or tape in a library device fails and a refresh is performed, "Error" is displayed for the robot or tape (LT270/LT250/LT160 only).

  2. If [Error] or [Warning] is indicated for a Fibre Channel switch or hub:

    Double-click the Fibre Channel switch or hub in the GUI window to move to the Fibre Channel Switch/Hub view. Right-click the icon of the target Fibre Channel switch or hub, and select [Call management software] from the pop-up menu. In the Device Maintenance window that is displayed, identify the failed component of the Fibre Channel switch or hub. To replace the failed component, contact a FUJITSU engineer (CE) as necessary.

    If a port is in an abnormal state because of a GBIC failure or another error, [Error] is indicated for the port in the Fibre Channel Switch/Hub view, and the status of the affected access path is displayed as [Access Path Error]. To identify the affected special file name (device file name), refer to the properties of the access path.

    When an access path is defined using the access path settings of this software, the zoning of the Fibre Channel switch is defined as WWPN zoning. Because WWPN zoning identifies devices by their port WWNs rather than by switch port, if a port of the Fibre Channel switch fails, the Fibre Channel cable connected to it can be moved to another Fibre Channel switch port and operation can be resumed.

    Also, see "A warning for a device is displayed (in yellow)" in "4.2.1.3 Problem-handling (FAQ)".

  3. If a server node is in the [Error] or [Warning] state:

    In the "Server Node view" of this software, check the physical route of the access path shown as [Access Path Error]. If there is no faulty device on the route, check whether a Fibre Channel cable along the route is disconnected. If you cannot identify the cause, contact a FUJITSU engineer.

    On a server node where msdsm (the multipath driver built into the Windows operating system as standard since Windows Server 2008) is enabled, the server node also enters the [Warning] state when a single path (an access path without a redundant configuration) exists. This is due to the msdsm specification: because the OS discards the information for an access path in the [Access Path Error] state, this software cannot distinguish such a path from a single path. If an access path is configured as a single path, disable msdsm.

    Also, see "A warning for a device is displayed (in yellow)" in "4.2.1.3 Problem-handling (FAQ)".

  4. If an access path is in the [Access Path Error] state:

    In the "Server Node view", refer to the properties of the access path in the [Access Path Error] state and identify the affected special file name. On Solaris OS, the affected special file name has the form /dev/rdsk/cXtYd?s?, where X is the controller number shown in the HBA properties and Y is the target ID shown in the access path properties.
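The mapping from the two property values to the Solaris special file name can be sketched as a small shell function. This is a hypothetical helper (`special_file_pattern` is not part of this software); it assumes X and Y are plain decimal numbers as read from the HBA and access path properties, and it turns the unknown LUN and slice placeholders (d?s?) into shell wildcards.

```shell
#!/bin/sh
# special_file_pattern CONTROLLER TARGET_ID
# Hypothetical helper: builds the glob for /dev/rdsk/cXtYd?s? where
#   X = controller number from the HBA properties
#   Y = target ID from the access path properties
# The LUN (d) and slice (s) numbers are left as '*' wildcards.
special_file_pattern() {
    ctrl=$1
    target=$2
    # Both values must be non-negative integers.
    for v in "$ctrl" "$target"; do
        case $v in
            ''|*[!0-9]*)
                echo "usage: special_file_pattern CONTROLLER TARGET_ID" >&2
                return 1 ;;
        esac
    done
    printf '/dev/rdsk/c%st%sd*s*\n' "$ctrl" "$target"
}
```

For example, `ls -l $(special_file_pattern 1 3)` (command substitution left unquoted on purpose so the shell expands the glob) lists the special files under controller 1, target 3.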

    [Solaris OS server node] For an access path configured using multipath disk control (ETMPD, MPLB, MPHD, or GRMPD), refer to the multipath disk control manual listed below for how to carry out the server node actions that follow. To replace an HBA, follow the HBA replacement procedure in Step 5.

    Type of multipath disk control   Manual name
    ------------------------------   -------------------------------------------------------
    ETMPD                            ETERNUS Multipath Driver User's Guide
    MPHD                             Multipath Disk Control Guide
    MPLB                             Multipath Disk Control Load Balance Option (MPLB) Guide
    GRMPD                            GR Multipath Driver User's Guide

    • Obtaining path statuses (iompadm info)

      Check the current path configuration and status. If the multipath disk has not been accessed, a path failure (FAIL) may not be detected.

    • Stopping I/O on a failed access path (iompadm change)

      ETMPD, MPLB, and GRMPD automatically stop I/O on access paths affected by the failed part.

    • Replacement of failed components (if necessary) and troubleshooting

      To replace a failed component, contact a FUJITSU engineer (CE).

    • Restarting I/O on a stopped access path (iompadm restart)

    For an access path not configured using multipath disk control (ETMPD, MPLB, MPHD, or GRMPD), jobs that use the affected special file while it is mounted are affected. Stop such jobs and take appropriate action for the fault. To replace a failed component, contact a FUJITSU engineer (CE). To replace an HBA, follow the procedure in Step 5 or Step 6, "Host bus adapter (HBA) replacement procedure".
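Regarding the path status check above (a failed path may not be detected until the multipath disk is accessed), one generic way to generate I/O is a single small read with dd. The sketch below is a hypothetical helper (`probe_read`); it assumes you pass the raw special file of the multipath disk yourself and that a 512-byte read suits the LUN's sector size. After the read, check the path status (iompadm info) as described in the multipath disk control manual.

```shell
#!/bin/sh
# probe_read DEVICE
# Hypothetical helper: reads one block from DEVICE so that the multipath
# driver sees I/O and has a chance to detect a failed path.
# Assumption: DEVICE is the raw special file of the multipath disk.
probe_read() {
    dev=$1
    if [ -z "$dev" ] || [ ! -r "$dev" ]; then
        echo "probe_read: cannot read '$dev'" >&2
        return 1
    fi
    # Read a single 512-byte block (assumed sector size) and discard it;
    # only the I/O itself matters here.
    dd if="$dev" of=/dev/null bs=512 count=1 2>/dev/null
}
```

The function returns dd's exit status, so a read error on the probed device shows up as a non-zero return.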

  5. [For Solaris OS server node] Host bus adapter (HBA) replacement procedure

    If the server node does not support hot system replacement, power off the server node to replace the HBA. During the replacement, do not restart or shut down the server node in reconfiguration mode, because executing reboot -r, boot -r, or a similar command changes the special file names. Only on a server node that supports hot system replacement can operation continue while the HBA is replaced, by following the procedure in "6.3.5 Access path inheritance". For more information on hot system replacement, refer to the appropriate device manuals.

    1. Shutting down the server node

      Confirm that the /reconfigure file does not exist, and then turn off the server node.

      # ls /reconfigure
      /reconfigure: No such file or directory

      The output above confirms that the file does not exist. If the /reconfigure file is found, delete it, re-create it after the HBA has been replaced, and then reboot the server node.

      To delete the "reconfigure" file, enter the following command:

      # rm /reconfigure

      After deleting the file, run the ls command again to verify that it no longer exists.

      Shut down the server node without reconfiguration.

      # /usr/sbin/shutdown -y -i0 -g0

    2. Replacing the HBA

      To replace a failed component, contact a FUJITSU engineer (CE).

    3. Activating the server node

      Activate the server node.

      At this point, however, no access path is recovered, because the zoning defined on the storage system and Fibre Channel switch is still based on the old HBA information.

    4. Inheriting access paths (resetting security)

      Use the "inheriting an access path" process of this software to redefine the zoning on the storage system and Fibre Channel switch so that access based on the new HBA information is the same as for the previous access path.

    5. Rebooting the server node

      Reboot the server node without reconfiguration (do not use reboot -r, boot -r, or similar commands).

      # /usr/sbin/shutdown -i6

      The access path is now recovered, and so is the corresponding multipath disk control path.
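The /reconfigure check before the shutdown in sub-step 1 can be scripted. The sketch below is a hypothetical helper (`ensure_no_reconfigure`), not part of this software: it deletes the file if it is present, repeats the "ls /reconfigure" style check to confirm the file is gone, and only prints the shutdown command so that the shutdown itself stays a deliberate manual action. The `RECONFIGURE_FILE` variable is an assumption added so the logic can be tried on a scratch file; by default it checks /reconfigure.

```shell
#!/bin/sh
# ensure_no_reconfigure
# Hypothetical helper: make sure the reconfiguration flag file is absent
# before shutting down for HBA replacement (run as root).
# RECONFIGURE_FILE overrides the path for testing; default is /reconfigure.
ensure_no_reconfigure() {
    f=${RECONFIGURE_FILE:-/reconfigure}
    if [ -e "$f" ]; then
        echo "$f found; deleting it"
        rm -f "$f" || return 1
    fi
    # Same check as "ls /reconfigure": the file must not exist now.
    if [ -e "$f" ]; then
        echo "$f still exists; do not shut down" >&2
        return 1
    fi
    echo "$f does not exist; next: /usr/sbin/shutdown -y -i0 -g0"
}
```

Printing rather than running the shutdown keeps the helper safe to execute on a live server node while checking the precondition.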

  6. [For Windows, Linux, HP-UX and other server nodes] Host bus adapter (HBA) replacement procedure

    If the server node does not support hot system replacement, power off the server node to replace the HBA. Only on a server node that supports hot system replacement can operation continue while the HBA is replaced and the access path is inherited. For more information on hot system replacement, refer to the appropriate device manuals.

    1. Shutting down the server node

    2. Replacing the HBA

    3. Activating the server node

      At this point, however, no access path is recovered, because the zoning defined on the storage system and Fibre Channel switch is still based on the old HBA information.

    4. Inheriting access paths (resetting security)

      Use the "inheriting an access path" process of this software to redefine the zoning on the storage system and Fibre Channel switch so that access based on the new HBA information is the same as for the previous access path.

    5. Reactivating the server node
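The old zoning is based on the previous HBA information, which for WWPN zoning means the HBA port WWN. On a Linux server node, one way to read the new HBA's WWPN before inheriting the access path is the Fibre Channel transport class in sysfs (/sys/class/fc_host/hostN/port_name). The sketch below assumes Linux only; Windows and HP-UX server nodes need their own tools. `format_wwpn` and `list_hba_wwpns` are hypothetical helper names, and `FC_SYSFS` is an override added only so the script can be tried on a scratch directory.

```shell
#!/bin/sh
# format_wwpn RAW
# Converts a sysfs WWPN such as 0x21000024ff123456 into the
# colon-separated form 21:00:00:24:ff:12:34:56.
format_wwpn() {
    hex=$(printf '%s' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    # Insert a colon after every two hex digits, then drop the trailing one.
    printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}

# list_hba_wwpns
# Prints "hostN WWPN" for each FC host port found under sysfs.
# Assumption: Linux FC transport class; FC_SYSFS overrides the base path.
list_hba_wwpns() {
    base=${FC_SYSFS:-/sys/class/fc_host}
    for f in "$base"/host*/port_name; do
        # If no FC host exists, the glob stays unexpanded: skip it.
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$host" "$(format_wwpn "$(cat "$f")")"
    done
}
```

Comparing this output with the WWPNs registered in the switch zoning shows which entries still refer to the old HBA.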

  7. Actions to take on this software after component replacement

    Select [View] - [Refresh] from the menu to obtain the current device status. If the replacement has resolved the fault, the device is displayed in the normal state.

    The statuses of LT120, LT130, LT200, LT210, LT220, LT230, and NR1000 devices, and any device status edited in the Manual Configuration window, are not restored automatically. After such a device has recovered, return its status to normal manually by either of the following methods:

    • Right-click the device icon in the Resource View and select [Property] from the pop-up menu. Click the <Change> button for "Device Status", and then select "normal" from the list in the "Change Device Status" dialog.

    • Right-click the device icon in the Manual Configuration window, select [Change Device Information] from the pop-up menu, and then select "normal" from the "Device Status" list.

    If the server node remains in the [Error] or [Warning] state even though the storage system or Fibre Channel switch has returned to the normal state:

    Recover the failed path by following the instructions in the relevant multipath disk control manual (ETMPD, GRMPD, MPLB, or MPHD). After recovering the multipath disk control path, select [Refresh] again to check the current device status.

    Type of multipath disk control   Manual name
    ------------------------------   -------------------------------------------------------
    ETMPD                            ETERNUS Multipath Driver User's Guide
    MPHD                             Multipath Disk Control Guide
    MPLB                             Multipath Disk Control Load Balance Option (MPLB) Guide
    GRMPD                            GR Multipath Driver User's Guide