ETERNUS SF Storage Cruiser User's Guide 13.2 - Solaris (TM) Operating System / Linux / Microsoft(R) Windows(R) -

Appendix E Troubleshooting

This appendix explains the most common problems that might be encountered during ETERNUS SF Storage Cruiser operation. Further, it explains basic troubleshooting and also provides information on where to find further help.

E.1 Troubleshooting information

If a failure occurs in this product, the following troubleshooting information will be required:

Information for initial investigation

This is the information required for the initial analysis of a problem.

The size of data should be small so that it can easily be sent by email. Collect troubleshooting information immediately when a problem occurs. Sometimes, important information is lost over time.
Information for further investigation

The information for initial investigation can sometimes determine the cause of problems. However, some problems might require further, detailed troubleshooting information.

If the cause of a problem cannot be determined from the initial investigation alone, more detailed information is required. Collect the information for further investigation and submit it to Fujitsu technical staff if requested.

Sometimes, important information is lost over time. Collect troubleshooting information immediately when a problem occurs.
Before storing the collected information in the following file systems, check that the directory has enough free space so that system resources are not depleted and system operation is not affected. Take the same care when a different directory is specified with the command option.

[Solaris/Linux/HP-UX]
When using the "/", "/var", and "/tmp" file systems

[Windows]
When using file systems configured in the TEMP environment variable
If troubleshooting information is collected using the managersnap command or agentsnap command, the collected information will be 200 KB - 5 MB in size (up to 30 MB with the "-all" option). In addition, 5 - 30 MB (30 - 80 MB with the "-all" option) must be reserved in a temporary directory to store the information. The space required varies depending on the operating environment; the figures above are for reference only.
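
The free-space check described above can be scripted before a collection command is run. The following is a minimal sketch, assuming a POSIX-style df that accepts the -P and -k options (on Solaris this may be /usr/xpg4/bin/df). The function names free_kb and has_free_mb are hypothetical and are not part of this product. The 30 MB and 80 MB thresholds are the upper reference values given above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ETERNUS SF Storage Cruiser).

# Print the free space, in KB, of the file system that holds directory $1.
# Assumes POSIX "df -Pk" output: the 4th column of line 2 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when directory $1 has at least $2 MB free.
has_free_mb() {
    avail=$(free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge $(( $2 * 1024 )) ]
}

# Example: 30 MB covers a default run; 80 MB covers a run with -all.
if has_free_mb /tmp 80; then
    echo "/tmp: enough space for collection with -all"
elif has_free_mb /tmp 30; then
    echo "/tmp: enough space for default collection only"
else
    echo "/tmp: free space is below the reference values" >&2
fi
```

If a different directory is passed with the -dir option, run the same check against that directory instead of /tmp.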

E.2 Information for initial investigation

When a system failure occurs, determine and analyze the cause of the problem as soon as possible.

This section explains information required for initial investigation.

+Advantage

The information used for initial problem analysis can be acquired immediately, and is small enough to be sent by email.

E.2.1 Collecting information from the administrative server (Solaris)

Follow the procedures below to collect information on the administrative server.

Collecting error code (swsagXXXX)

If an error dialog box is displayed, make a note of the error code (swsagXXXX).
Log in to the server with OS administrator permissions.

OS administrator permissions (root) are required to execute the managersnap command.
Execute the managersnap command.

# /opt/FJSVssmgr/sbin/managersnap [-dir directory] <RETURN>

Use the -dir option to specify the directory to store the information. If this option is not used, information will be saved to the /tmp directory.
Submit the information to the Fujitsu technical staff.

In cluster configurations, collect information and execute the command on both the primary and secondary nodes of the administrative server.
Although the following message is output to /var/adm/messages when the managersnap command or agentsnap command is executed on Solaris 10 OS, it does not affect command processing and no corrective action needs to be taken.

WARNING: The <if>:ip*_forwarding ndd variables are obsolete and may be removed in a future release of Solaris. Use ifconfig (1M) to manipulate the forwarding status of an interface.

E.2.2 Collecting information from the administrative server (Linux)

Follow the procedures below to collect information on the administrative server.

Collecting error code (swsagXXXX)

If an error dialog box is displayed, make a note of the error code (swsagXXXX).
Log in to the server with OS administrator permissions.

OS administrator permissions (root) are required to execute the managersnap command.
Execute the managersnap command.

# /opt/FJSVssmgr/sbin/managersnap [-dir directory] <RETURN>

Use the -dir option to specify the directory to store the information. If this option is not used, information will be saved to the /tmp directory.
Submit the information to the Fujitsu technical staff.

In cluster configurations, collect information and execute the command on both the primary and secondary nodes of the administrative server.
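
The procedure above can be wrapped in a small script. The following is a sketch only: the function names are hypothetical, the command path and the -dir option are those shown in this section, and "id -u" is assumed to be available for the root check.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented managersnap call.
MANAGERSNAP=${MANAGERSNAP:-/opt/FJSVssmgr/sbin/managersnap}

# Print the command line for an optional output directory ($1).
# With no argument, managersnap stores its output in /tmp as documented.
managersnap_cmdline() {
    if [ -n "$1" ]; then
        printf '%s -dir %s\n' "$MANAGERSNAP" "$1"
    else
        printf '%s\n' "$MANAGERSNAP"
    fi
}

# Run managersnap only as root and only when the command is installed.
run_managersnap() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "managersnap requires OS administrator (root) permissions" >&2
        return 1
    fi
    if [ ! -x "$MANAGERSNAP" ]; then
        echo "managersnap not found: $MANAGERSNAP" >&2
        return 1
    fi
    if [ -n "$1" ]; then
        "$MANAGERSNAP" -dir "$1"
    else
        "$MANAGERSNAP"
    fi
}

# Example:
#   run_managersnap /var/tmp/esc
```

In a cluster configuration, the same call would be repeated on both the primary and secondary nodes.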

E.2.3 Collecting information from the administrative server (Windows)

For a Windows administrative server, there is no command for collecting the troubleshooting information. Collect the information directly using Explorer or other tools, as follows.

Collecting error code (swsagXXXX)

If an error dialog box is displayed, make a note of the error code (swsagXXXX).
Collecting trace log on administrative server

Collect all files from the directories below (including sub-directories).
- Administrative-server-work-directory\Manager\var\opt\FJSVssmgr\
- Administrative-server-work-directory\Manager\var\opt\FJSVrcxmr\
Collecting setting files for administrative server

Collect the files below.
- Administrative-server-environment-setting-directory\Manager\etc\opt\FJSVssmgr\current\*.conf

E.2.4 Collecting information on managed server node (Solaris)

Log in as the superuser (root).

Only the OS administrator (root) can execute this command.
Execute the agentsnap command.

# /opt/FJSVssage/sys/agentsnap [-dir directory] <RETURN>

Use the -dir option to specify the directory in which to store the information.
If the option is omitted, information will be stored in /tmp.
Submit the file to Fujitsu technical staff.

Although the following message is output to /var/adm/messages when the rcxmgrsnap command or agentsnap command is executed on Solaris 10 OS, it does not affect command processing and no corrective action needs to be taken.

WARNING: The <if>:ip*_forwarding ndd variables are obsolete and may be removed in a future release of Solaris. Use ifconfig (1M) to manipulate the forwarding status of an interface.

E.2.5 Collecting information on managed server node (Windows)

There is no command for collecting the troubleshooting information. Collect the information directly using Explorer or other tools, as follows. When collecting files, also record each file's creation and update times and its properties.

Collecting trace log on administrative server

Collect all files from the directory below (including sub-directories).
- Administrative-server-work-directory\Agent\var
Collecting setting files for administrative server

Collect the files below.
- Administrative-server-environment-setting-directory\Agent\etc
Collecting the Windows event log with Event Viewer, a standard Windows application.

Run "Event Viewer", select [Action] - [Save Log File As], and save the following Windows event log files.
- System files
- Application files
Submit the file to Fujitsu technical staff.

E.2.6 Collecting information on managed server node (Linux)

Log in as the superuser (root).

Only the OS administrator (root) can execute this command.
Execute the agentsnap command.

# /opt/FJSVssage/sys/agentsnap [-dir directory] <RETURN>

Use the -dir option to specify the directory in which to store the information.
If the option is omitted, information will be stored in /tmp.
Submit the file to Fujitsu technical staff.

E.2.7 Collecting information on managed server node (HP-UX)

There is no command for collecting the troubleshooting information. Collect the information directly with the cp command, as follows.

Collecting system log

Collect the files below.
- /var/adm/syslog/syslog.log
- /var/adm/syslog/OLDsyslog.log (if it exists)
Collecting trace log on administrative server

Collect the files below (including sub-directories).
- /var/opt/FJSVssage
Collecting setting files for administrative server

Collect the files below (including sub-directories).
- /etc/opt/FJSVssage
Collecting core file

Collect the file below (if it exists).
- /core
Submit the file to Fujitsu technical staff.
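
The copy steps above can be sketched as a small shell function. This is an illustrative sketch, not a supplied tool: collect_existing is a hypothetical name, the destination directory in the example is arbitrary, and cp is assumed to accept the -p and -R options. Each path is copied under the destination with its original directory structure, so that /var/opt/FJSVssage and /etc/opt/FJSVssage do not overwrite each other. Paths that do not exist are skipped.

```shell
#!/bin/sh
# Hypothetical helper: copy every existing path into a collection directory.
# $1 = destination directory, remaining arguments = absolute paths to collect.
collect_existing() {
    dest=$1
    shift
    for p in "$@"; do
        if [ -e "$p" ]; then
            # Recreate the parent directory under $dest, then copy with
            # timestamps and permissions preserved (-p), recursively (-R).
            parent=$(dirname "$p")
            mkdir -p "$dest$parent" && cp -pR "$p" "$dest$parent/"
        else
            echo "skipped (not found): $p" >&2
        fi
    done
}

# Example (paths are those listed in this section):
#   collect_existing /var/tmp/esc_collect \
#       /var/adm/syslog/syslog.log /var/adm/syslog/OLDsyslog.log \
#       /var/opt/FJSVssage /etc/opt/FJSVssage /core
```

Check the free space of the destination file system before copying, because a core file can be large.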

E.2.8 Collect information about the client environment

For the ESC Client program, there is no command for collecting the troubleshooting information. Collect the information on the client directly using Explorer or other tools, as follows.

If the correlation window cannot be started independently after the client is installed, it means that the following folders do not exist. There is no need to collect the resources in the following folders.

<Windows Vista>

<Drive name where Windows is installed>:\ETERNUS-SSC\Client\var

<Non-Windows Vista>

client_Installation_directory\Client\var

Starting the correlation window "independently" means opening it by one of the following methods:

-Double-click the <Storage Cruiser Correlation Window> icon on the desktop

-Select [Start] - [All Programs] - [ETERNUS SF Storage Cruiser] - [Storage Cruiser Correlation Window] from Windows start menu

Collecting setting files for Client.

Collect the files below (including sub-directories).

If the directory doesn't exist, there is no need to collect the files.

<Windows Vista>
- <Drive name where Windows is installed>:\ETERNUS-SSC\Client\workspace
- <Drive name where Windows is installed>:\ETERNUS-SSC\Client\etc
- <Drive name where Windows is installed>:\ETERNUS-SSC\Client\var
<Non-Windows Vista>
- client_Installation_directory\Client\workspace
- client_Installation_directory\Client\etc
- client_Installation_directory\Client\var
Collect a screen shot of the window at the time the trouble occurred.

Perform the following on the client that troubleshooting information is being collected for.
Press the Print Screen key, paste the image from the clipboard into an image editing tool, and then save it as a file (for example, a bitmap file).
Collect the system version.

Using the same method as for the screen shot above, obtain a hard copy showing the OS, version level, and service pack.
Select "My Computer" and then select [Properties] from the popup menu. Display the [General] window, and then take hard copies of each of the tabs.
Submit the information to Fujitsu technical staff.

E.3 Information for further investigation

If it is not possible to determine the cause of the problem in the initial investigation, further detailed information may be needed.

This section explains how to collect required information to resolve the problem.

+Data feature

Because a variety of information is needed to determine the cause of the problem, the data is larger than that collected for the initial investigation.

E.3.1 Collecting information from the administrative server (Solaris)

Follow the procedures below to collect information on the administrative server.

Log in to the server with OS administrator permissions

OS administrator permissions (root) are required to collect the information.
Use the fjsnap command to collect information. If for some reason fjsnap cannot be executed, execute the managersnap command to collect information.
1. fjsnap command
  
  # /opt/FJSVsnap/bin/fjsnap -a output <RETURN>
  
  Replace the variable 'output' with the name of the output medium or specific file name to store the collected information.
2. managersnap command
  
  # /opt/FJSVssmgr/sbin/managersnap [-dir directory] -all <RETURN>
  
  Use the -dir option to specify the directory to store the information. If this option is not used, information will be saved to the /tmp directory.
Submit the collected investigation resources to Fujitsu technical staff.

To execute the managersnap command on a cluster configuration administrative server, collect information and execute the command on both the primary and secondary nodes.
Although the following message is output to /var/adm/messages when the managersnap command or agentsnap command is executed on Solaris 10 OS, it does not affect command processing and no corrective action needs to be taken.

WARNING: The <if>:ip*_forwarding ndd variables are obsolete and may be removed in a future release of Solaris. Use ifconfig (1M) to manipulate the forwarding status of an interface.
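
The fallback from fjsnap to managersnap can be sketched as follows. This is a hedged example rather than a supplied script: collect_detailed is a hypothetical function, and treating a missing or failing fjsnap as "cannot be executed" is an assumption. The command paths and options are those shown above.

```shell
#!/bin/sh
# Hypothetical fallback wrapper for detailed collection.
FJSNAP=${FJSNAP:-/opt/FJSVsnap/bin/fjsnap}
MANAGERSNAP=${MANAGERSNAP:-/opt/FJSVssmgr/sbin/managersnap}

# $1 = fjsnap output medium or file name
# $2 = directory in which managersnap stores its information
collect_detailed() {
    # Try fjsnap first; stop here if it is installed and succeeds.
    if [ -x "$FJSNAP" ] && "$FJSNAP" -a "$1"; then
        return 0
    fi
    echo "fjsnap unavailable or failed; using managersnap" >&2
    if [ ! -x "$MANAGERSNAP" ]; then
        echo "managersnap not found: $MANAGERSNAP" >&2
        return 1
    fi
    # Fall back to managersnap with the -all option.
    "$MANAGERSNAP" -dir "$2" -all
}

# Example (run as root):
#   collect_detailed /var/tmp/fjsnap.out /var/tmp/esc
```

Both output locations should be checked for free space first, because collection with -all produces more data than the initial investigation.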

E.3.2 Collecting information from the administrative server (Linux)

Follow the procedures below to collect information on the administrative server.

Log in to the server with OS administrator permissions

OS administrator permissions (root) are required to collect the information.
Execute the fjsnap and managersnap commands to collect information.

If the fjsnap command cannot be used for some reason, the managersnap command can be used on its own.
1. fjsnap command
  
  # /opt/FJSVsnap/bin/fjsnap -a output <RETURN>
  
  Replace the variable 'output' with the name of the output medium or specific file name to store the collected information.
2. managersnap command
  
  # /opt/FJSVssmgr/sbin/managersnap [-dir directory] -all <RETURN>
  
  Use the -dir option to specify the directory to store the information. If this option is not used, information will be saved to the /tmp directory.
Submit the collected investigation resources to Fujitsu technical staff.

To execute the managersnap command on a cluster configuration administrative server, collect information and execute the command on both the primary and secondary nodes.

E.3.3 Collecting information from the administrative server (Windows)

For the ESC Windows Manager program, there is no command for collecting the troubleshooting information. Collect the information using the following procedure.

Collect the information which is required by Fujitsu technical staff.
Submit the file to Fujitsu technical staff.

E.3.4 Collecting information on managed server node (Solaris)

For this method, execute the agentsnap command on the managed server node.

Log in as the superuser (root).

Only the OS administrator (root) can execute this command.
Execute the agentsnap command.

# /opt/FJSVssage/sys/agentsnap [-dir directory] -all <RETURN>

Use the -dir option to specify the directory in which to store the information.
If the option is omitted, information will be stored in /tmp.
Submit the file to Fujitsu technical staff.

E.3.5 Collecting information on managed server node (Windows)

For the ESC Windows Agent program, there is no command for collecting the troubleshooting information. Collect the information using the following procedure.

Collect the information which is required by Fujitsu technical staff.
Submit the file to Fujitsu technical staff.

E.3.6 Collecting information on managed server node (Linux)

Log in as the superuser (root).

Only the OS administrator (root) can execute this command.
Execute the agentsnap command.

# /opt/FJSVssage/sys/agentsnap [-dir directory] -all <RETURN>

Use the -dir option to specify the directory in which to store the information.
If the option is omitted, information will be stored in /tmp.
Submit the file to Fujitsu technical staff.

E.3.7 Collecting information on managed server node (HP-UX)

For the ESC HP-UX Agent program, there is no command for collecting the troubleshooting information. Collect the information using the following procedure.

Collect the information which is required by Fujitsu technical staff.
Submit the file to Fujitsu technical staff.

E.3.8 Collect information about the client environment

For the ESC Client program, there is no command for collecting the troubleshooting information. Collect the information using the following procedure.

Collect the information which is required by Fujitsu technical staff.
Submit the file to Fujitsu technical staff.

E.4 Configuration management functionality

The actions to be taken for problems related to configuration management functionality are shown below.

E.4.1 The device cannot be detected

+Phenomenon

Message swsag0018 ("The selected IP device cannot be found.") is displayed and the device cannot be registered.

+Action

If the server node cannot be detected
- Check that the device power supply is on, and that there are no problems on the LAN. If the network load is high, reduce it and then re-execute.
- Check that the Agent is running.
- Use the Agent information change command (setagtip) to check the Agent start IP address.
- For details about the Agent information change command (setagtip), refer to the following:
  Solaris OS
  C.5.1.2 Agent information change command (setagtip)
  Windows
  C.6.1.2 Agent information change command (setagtip)
  Linux
  C.7.1.2 Agent information change command (setagtip)
  HP-UX
  C.8.1.1 Agent information change command (setagtip)
If the Fibre Channel switch cannot be detected
- Check that the device power supply is on, and that there are no problems on the LAN. If the network load is high, reduce it and then re-execute.
- Refer to the following for the Fibre Channel switch:
  - SN200 (Brocade SilkWorm) Fibre Channel switches, PRIMERGY BX600 Fibre Channel switchblades
    
    4.2.1.3 Problem-handling (FAQ)
  - VS900 Virtualization switches
    4.2.2.3 Problem-handling (FAQ)
  - SN200 MDS (Cisco MDS) Fibre Channel switches
    4.2.4.2 Problem-handling (FAQ)
  - McDATA Fibre Channel switches
    4.2.5.2 Problem-handling (FAQ)
If the disk array device cannot be detected
- Check that the device power supply is on, and that there are no problems on the LAN. If the network load is high, reduce it and then re-execute.
- Refer to 4.3.1.2 Problem-handling (FAQ) in Chapter 4 Setup of the Operating Environment.

E.4.2 The Fibre Channel card (HBA) is not displayed

+Phenomenon

The server node Fibre Channel card (HBA) is not displayed.

+Action

Check that the hardware and the driver are running correctly.
Check that the Fibre Channel card (HBA) is a supported target.
Check that the SNIA HBA API library is set correctly. For details about the setting method, refer to the section for the relevant OS in 4.1 Server Node (Host).
If zoning information has not been set in the Fibre Channel switch, the Fibre Channel card (HBA) may not be displayed. Create the zoning information in the Fibre Channel switch, and then check the Fibre Channel card display.

E.4.3 If the device cannot communicate

+Phenomenon

The device cannot communicate or a timeout occurs

+Action

Check that the device power supply is on, and that there are no problems on the LAN.
Refer to "9.1.2 Changing the operating environment" before modifying the IP address.
For devices that communicate using the SNMP protocol, such as Fibre Channel switches and storage devices, the community names of the target device and the administrative server may not match. To modify the community name of the target device, set the community name as described in "D.2 sanma.conf Parameter", and then reflect the settings file.
In server nodes, check that the Agent is running.
To modify the host name after the server node device has been registered, follow the procedures in "9.1.2.4 Changing the name of a server node (host)".
Check that there are no problems with the communication mode settings in the network environment. If one side is set to "Auto Negotiation" while the other is set to "Full Duplex", communications may be slow. Correct the settings so that both sides match.

E.4.4 The access path displays an ERROR (red)

+Phenomenon

The access path displays an ERROR (red)

+Action

Check that the device power supply is on, and that there are no problems on the LAN.
For details about Fibre Channel switch problems, refer to "4.2.1.3 Problem-handling (FAQ)".
Check the "Access path error" in "6.3.2.4 Access path status display".

E.4.5 The device displays a WARNING (yellow) or an ERROR (red)

+Phenomenon

The device displays a WARNING (yellow) or an ERROR (red)

+Action

Server node
- Check whether an access path error or multi path error has occurred.
- Check whether the HBA is faulty.
- The server node also displays a WARNING (yellow) if access path inheritance is required.
- Refer to "8.1 Windows Displayed in the Event of a Fault and Troubleshooting".
Fibre Channel switch
If the Fibre Channel switch cannot be detected, refer to the following for the Fibre Channel switch:
- SN200 (Brocade SilkWorm) Fibre Channel switches, PRIMERGY BX600 Fibre Channel switchblades
  4.2.1.3 Problem-handling (FAQ)
- VS900 Virtualization switches
  4.2.2.3 Problem-handling (FAQ)
- SN200 MDS (Cisco MDS) Fibre Channel switches
  4.2.4.2 Problem-handling (FAQ)
- McDATA Fibre Channel switches
  4.2.5.2 Problem-handling (FAQ)
- Common
  Refer to "8.1 Windows Displayed in the Event of a Fault and Troubleshooting".
Disk array devices
- During the removal of the parts that configure the device, depending on the timing, a device error is detected and the device may display an ERROR (red). From the menu, select [View] and then [Refresh], or press the [F5] key to update the information.
- In all other cases, refer to "8.1 Windows Displayed in the Event of a Fault and Troubleshooting".
Devices that are edited in the manual configuration window
The state of devices that are edited in the manual configuration window is not recovered automatically. After the device is recovered, right-click the device icon in the manual configuration window, select [Change Device Information] from the pop-up menu, and manually restore the state of the device.

E.4.6 Managing cascaded Fibre Channel switches (SN200)

+Phenomenon

Managing cascaded Fibre Channel switches (SN200)

+Action

To manage cascaded Fibre Channel switches (SN200), register all cascaded switches in this product. If all of the switches are not registered, the SAN environment cannot be managed correctly. For example, Fibre Channel switch information cannot be displayed correctly.

For details about dealing with problems that occur when all cascaded switches are registered correctly, refer to "4.3.1.2 Problem-handling (FAQ)".

E.4.7 The setting of the access path fails

+Phenomenon

The setting of the access path fails.

+Action

If an error occurs in the monitoring state for a single (or multiple) device from the server node, HBA, storage, CA, bridge, or switch, take the following action:
- If the monitoring state is "timeout"
  
  Refer to "E.4.3 If the device cannot communicate".
- If the monitoring state is "undefined"
  
  Register the unit (device) using the physical resource management screen.
- If the monitoring state is "Invalid password" (storage, CA, bridge, or switch)
  
  Modify the device password that is maintained in the product. Use [Change account for device management] in the document view [Device] menu.
- If the monitoring state is "The access path must be inherited" (HBA)
  
  Refer to "8.3.1 HBA state is "The access path must be inherited" after the HBA on the server node is replaced".

E.5 Performance management functionality

The actions to be taken for problems related to performance management functionality are shown below.

E.5.1 Cannot register devices in the performance management window (swsag0609)

+Phenomenon

When a device icon is dragged and dropped from the resource management screen onto the performance management window, the message swsag0609 ("Uncompleted to register the device %IP ADDRESS% ") is displayed.

+Action

The message swsag0609 will appear if the configuration file for the target device cannot be found or read. Follow the procedure below to create the configuration file.

Select the target device name in the performance management window tree.
Execute [Device] - [Create Device Configuration File] from the performance management window menu bar.

If the creation of the configuration information file fails, check the state of the device or network according to the following procedure.

All supported storage models
- Using the ping command, perform a communication check to make sure that the device can be recognized from the operation control server.
- Log into ETERNUSmgr, and set the administrative server IP address in "Setting"(Main Menu) - "Set IP Address(Others)".
ETERNUS8000/6000/4000 (except M80,100)/2000
- If you have logged into ETERNUSmgr, log out.

E.5.2 Performance monitoring does not recover from the recovery state

+Phenomenon

Performance monitoring does not recover from the recovery state.

Note: "recovery state" means that the P mark at the top right of the device icon in the resource management screen is yellow.

+Action

Take one of the following actions:

ETERNUS8000/6000/4000 (except M80,100)/2000
If you have logged into ETERNUSmgr, log out.
If the network is too busy (is experiencing very high traffic levels) and the administrative server is on a different subnet than the device being monitored, it may not be possible to obtain performance monitoring information within the interval that has been set. Change the performance monitoring interval accordingly.

E.5.3 Cannot display the performance graph (after the device configuration is modified)

+Phenomenon

The device configuration information is modified using [Device] - [Create Configuration Device File] from the performance management window menu bar, but the performance information for a LUN that has been added or has had its configuration modified cannot be displayed.

+Action

Performance information cannot be displayed because the performance monitoring process cannot read or access the new configuration information.

Update the new device configuration information for performance monitoring by following the procedures in "7.2.11 Updating configuration information".

From the point at which the device configuration is modified until the new device configuration is updated in performance monitoring processing using the above procedure, performance information will be obtained and saved using the previous device configuration information.

E.5.4 Cannot display the performance graph

+Phenomenon

Performance information for the date and time on which performance monitoring was performed cannot be displayed as a graph.

+Action

Take one of the following actions:

Performance information for the date and time specified for the graph may have been deleted if the period set to save information has passed.
Refer to "D.4 perf.conf Parameter", and check the period for saving performance information.
Performance monitoring for a single device may have been executed from more than one administrative server at the same time. Softek Storage Cruiser and ETERNUS SF Storage Cruiser/Systemwalker Resource Coordinator administrative servers are also target administrative servers.
Check whether performance monitoring for a single device was executed from more than one administrative server at the same time.

E.6 Linkage with other software

E.6.1 Cannot display performance information in Systemwalker Service Quality Coordinator

+Phenomenon

Performance monitoring information is not displayed in Systemwalker Service Quality Coordinator, even though monitoring has been started from the resource management screen.

+Action

If performance management information is displayed in the performance management window, check the Systemwalker Service Quality Coordinator settings. If not, follow the procedures in "E.5 Performance management functionality".

E.7 Installation and uninstallation

E.7.1 [Windows Version] Cannot uninstall ETERNUS SF Storage Cruiser

+Phenomenon

When uninstalling [Windows Version] ETERNUS SF Storage Cruiser (manager, client, or agent) from the <Add or Remove Programs> dialog, the uninstallation fails and an error dialog displays "Error reading setup initialization file".

+Action

Insert the ETERNUS SF Storage Cruiser program CD-ROM, and execute the setup.exe that was used for installation.
