PRIMECLUSTER Installation and Administration Guide 4.2 (Linux for Itanium)
Appendix C Troubleshooting > C.1 Collecting Troubleshooting Information
The "pclsnap" command collects information for troubleshooting PRIMECLUSTER. If a failure occurs in the PRIMECLUSTER system, use this tool to collect the information required to investigate the cause of the problem. Execute the command as follows:
Log in with system administrator authority.
Execute the "pclsnap" command.
/opt/FJSVpclsnap/bin/pclsnap -a output
or
/opt/FJSVpclsnap/bin/pclsnap -h output
If -a is specified, the amount of data becomes large because all detailed information is collected. If -h is specified, only cluster control information is collected.
For output, specify either a special file name (e.g., /dev/st0) or an output file name. The collected information is written to the specified file when the "pclsnap" command is executed.
If you are specifying a relative path from the current directory to an output file name that contains a directory, begin the path specification with "./".
For details on the "pclsnap" command, see the "README" file included in the "FJSVpclsnap" package.
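The "./" rule for relative output paths can be applied mechanically before the command is called. The following is a minimal sketch, not part of the FJSVpclsnap package: the function name normalize_output is hypothetical, and it assumes that absolute paths, special files such as /dev/st0, and plain file names without a directory can be passed through unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not provided by PRIMECLUSTER): make a relative output
# path that contains a directory begin with "./", as the text requires.
normalize_output() {
    case "$1" in
        /*|./*) printf '%s\n' "$1" ;;    # absolute path, or already begins with "./"
        */*)    printf './%s\n' "$1" ;;  # relative path containing a directory
        *)      printf '%s\n' "$1" ;;    # plain file name in the current directory
    esac
}

# Usage (as system administrator):
#   out=$(normalize_output "snap/node1")
#   /opt/FJSVpclsnap/bin/pclsnap -a "$out"
```

With this helper, snap/node1 becomes ./snap/node1, while /dev/st0 and output are left as they are.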
Execution timings for the pclsnap command
For problems that occur during operation, for example, if an error message is output, execute the "pclsnap" command immediately after the problem occurs.
If the "pclsnap" command cannot be executed because the system hangs, collect a crash dump. Then start the system in single-user mode and execute the "pclsnap" command.
For information on how to collect a crash dump, see "Crash Dump."
After an error occurs, if a node restarts automatically (the node could not be started in single-user mode) or if the node is mistakenly started in multi-user mode, execute the "pclsnap" command.
If investigation information cannot be collected because the "pclsnap" command results in an error or the "pclsnap" command does not return, then collect a system dump.
Free space required for the execution of the pclsnap command
The approximate amount of free space required for the execution of the "pclsnap" command is listed in the following table:
Directory | Default directory | Free space (approximate) (MB) |
---|---|---|
Output directory | Current directory during the execution of the command | 300 |
Temporary directory | /tmp | 500 |
The listed values for the amount of free space (300 MB, 500 MB) may be insufficient depending on the system environment.
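The free space can be checked before the command is run. The sketch below is one possible way to do this with the standard df command. The function names free_mb and has_space are hypothetical, and the 300 MB and 500 MB thresholds are only the approximate values from the table above, so passing the check does not guarantee that collection will succeed.

```shell
#!/bin/sh
# free_mb DIR: print the free space, in MB, of the file system that holds DIR.
# Uses POSIX "df -Pk" (1 KB units, one line per file system) and converts to MB.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# has_space DIR REQUIRED_MB: succeed if DIR has at least REQUIRED_MB of free space.
has_space() {
    [ "$(free_mb "$1")" -ge "$2" ]
}

# Usage with the approximate values from the table:
#   has_space . 300    || echo "output directory may be too small"
#   has_space /tmp 500 || echo "temporary directory may be too small"
```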
If troubleshooting information cannot be collected successfully because a directory has insufficient free space, the "pclsnap" command outputs an error message or a warning message when it terminates. In this case, re-execute the command according to the corrective action given below:
Corrective action when the amount of free space in the output directory is insufficient
The "pclsnap" command outputs the error message shown below when the creation of the output file has failed:
ERROR: failed to generate the output file "xxx".
DIAG: ...
Corrective action:
Change the output directory to one with a large amount of free space, and then re-execute the command.
Example:
When the output directory is changed to /var/crash
# /opt/FJSVpclsnap/bin/pclsnap -a /var/crash/output
Corrective action when the amount of free space in the temporary directory is insufficient
The "pclsnap" command may output the following warning message upon the termination of the command execution:
WARNING: The output file "xxx" may not contain some data files.
DIAG: ...
The output of this warning message indicates that the output file of the "pclsnap" command has been created. However, part of the information to be collected may not be included in the output file.
Corrective action:
Change the temporary directory to one with sufficient free space, and then re-execute the command.
Example:
When the temporary directory is to be changed to /var/crash
# /opt/FJSVpclsnap/bin/pclsnap -a -T/var/crash output
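The two messages and their corrective actions can be told apart in a wrapper script. The following sketch only matches the message texts quoted in this section. The function name suggest_action and the action labels it prints are hypothetical, and how the messages are captured (for example, whether they appear on standard output or standard error) is an assumption that should be verified on the actual system.

```shell
#!/bin/sh
# suggest_action TEXT: map captured pclsnap messages to the corrective action
# described in this section. The labels printed here are illustrative only.
suggest_action() {
    case "$1" in
        *'ERROR: failed to generate the output file'*)
            echo "change-output-directory" ;;
        *'WARNING: The output file'*'may not contain some data files'*)
            echo "change-temporary-directory" ;;
        *)
            echo "no-action" ;;
    esac
}

# Usage (assumes the messages can be captured with 2>&1):
#   msgs=$(/opt/FJSVpclsnap/bin/pclsnap -a output 2>&1)
#   suggest_action "$msgs"
```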
If the same warning message continues to be output even after you change the temporary directory, the error may be caused by one of the following:
(1) A timeout occurs for a specific information collecting command due to the state of the system.
(2) The file from which information is to be collected is larger than the amount of free space in the temporary directory.
In the case of (1), the timeout is logged in the pclsnap.elog file, which is contained in the pclsnap output file. If possible, collect a crash dump in addition to the pclsnap output file.
In the case of (2), confirm that the sizes of (a) and (b), below, are not larger than the amount of free space in the temporary directory:
(a) Log file size
/var/log/messages
Log files (SMAWsf/log/rcsd.log etc.) placed under /var/opt/SMAW*/log/
(b) Total size of the core files
GFS core file: /var/opt/FJSVsfcfs/cores/*
GDS core file: /var/opt/FJSVsdx/*core/*
If these are larger than the amount of free space in the temporary directory, move the relevant files to another partition that contains neither the output directory nor the temporary directory, and then re-execute the "pclsnap" command. Do not delete the moved files. Instead, save them.
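The size comparison for (a) and (b) can be sketched with du and df as shown below. The function name total_kb is hypothetical. Note that du reports disk usage rather than the exact file length, so the result is only an approximation, and the temporary directory is assumed to be the default /tmp.

```shell
#!/bin/sh
# total_kb PATH...: print the total disk usage, in KB, of the paths that exist.
# Paths that do not exist (for example, unmatched glob patterns) are skipped.
total_kb() {
    total=0
    for p in "$@"; do
        [ -e "$p" ] || continue
        size=$(du -sk "$p" | awk '{ print $1 }')
        total=$((total + size))
    done
    echo "$total"
}

# Usage: compare (a) and (b) with the free space of the temporary directory.
#   logs=$(total_kb /var/log/messages /var/opt/SMAW*/log/)
#   cores=$(total_kb /var/opt/FJSVsfcfs/cores/* /var/opt/FJSVsdx/*core/*)
#   free=$(df -Pk /tmp | awk 'NR == 2 { print $4 }')
#   [ "$logs" -le "$free" ]  || echo "log files exceed free space in /tmp"
#   [ "$cores" -le "$free" ] || echo "core files exceed free space in /tmp"
```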