5.2.4 Fatal error messages

Advertisement server: Data received will be discarded due to receive error on socket. errno = errno

Content:

The shutdown daemon failed to send data over the network. While this message is being output, an unintended node may be forcibly stopped when nodes are forcibly stopped.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Agent Shutdown Agent uninitialization for host nodename failed

Content:

The shutdown agent was not uninitialized properly.

Corrective action:

Check the logs of the shutdown agent and then contact field engineers.

cannot determine the port on which the advertisement server should be started

Content:

The network communication port number for the shutdown daemon could not be obtained.

Corrective action:

Execute the following command.

# /opt/SMAW/bin/sdtool -s

When "The RCSD is not running" is displayed, execute the following command to start the shutdown facility.

# /opt/SMAW/bin/sdtool -b

Execute the following command and check the result.

# /opt/SMAW/bin/sdtool -s
  • When Init State is InitWorked and Test State is TestWorked

    No action is required.

  • When Init State is InitFailed or Test State is TestFailed

    (Solaris)
    Check if the following line is set in the /etc/inet/services file.

    sfadv           2316/udp                        # SMAWsf package

    (Linux)
    Check if the following line is set in the /etc/services file.

    4.3A20 or earlier

    sfadv           2316/udp                        # SMAWsf package

    4.3A30 or later

    sfadv           9382/udp                        # SMAWsf package

    If the line is missing, add it to the above file. Take the following actions after correcting the file.

    • Solaris

      Execute the following commands to restart the shutdown facility.

      # /opt/SMAW/bin/sdtool -e
      # /opt/SMAW/bin/sdtool -b
    • Linux

      Restart the cluster node after correcting the file.

  • Other than the above

    Wait for a while and then execute the above command again. Check the result.

If these corrective actions do not resolve the problem, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Could not correctly read the rcsd.cfg file.

Content:

Either the rcsd.cfg file does not exist or the syntax in rcsd.cfg is incorrect.

Corrective action:

Create the rcsd.cfg file or correct the syntax error; a sketch of a typical file follows.
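For reference, entries in /etc/opt/SMAW/SMAWsf/rcsd.cfg generally take the following one-line-per-node form (the values below are illustrative assumptions, not taken from this manual; see "PRIMECLUSTER Installation and Administration Guide" for the exact syntax for your shutdown agent):

node1,weight=2,admIP=192.168.1.1:agent=SA_ipmi,timeout=25
node2,weight=2,admIP=192.168.1.2:agent=SA_ipmi,timeout=25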

Decryption of SecretAccesskey failed.

Corrective action:

Check the following:

  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg) is specified correctly.

    For the SecretAccessKey, set the information encrypted by the sfcipher command, as in the sketch below.

    For settings of the configuration file, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."
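As a sketch only (the prompts and the output string below are illustrative, not actual product output), the encrypted value is generated as follows and the result is set for the SecretAccessKey:

# /opt/SMAW/bin/sfcipher -c
Enter User's Password:
Re-enter User's Password:
D0860AB04E1B8FA3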

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

/etc/sysconfig/libvirt-guests is not configured on Hypervisor of host nodename. rcsd died abnormally.

Content:

The /etc/sysconfig/libvirt-guests file on the hypervisor of the node <nodename> is not configured. The shutdown daemon (rcsd) ended abnormally.

Corrective action:

Take the corrective action for the applicable event below.

Event 1

This message is output due to the shutdown or restart of the hypervisor, and the node was not forcibly stopped successfully.

  1. Check if the forcibly stopped guest OS is in the LEFTCLUSTER state. Do not recover the LEFTCLUSTER state at this time.

  2. Wait until the activated guest OSes enter the LEFTCLUSTER state.

  3. Check that all the nodes are in the LEFTCLUSTER state, and then check if /etc/sysconfig/libvirt-guests is properly configured on all the hypervisors. See "PRIMECLUSTER Installation and Administration Guide (Linux)" for the configuration of /etc/sysconfig/libvirt-guests.

  4. Stop all the guest OSes on which the cluster applications will not be in the Online state.

  5. Check that all the guest OSes are stopped, and then execute the following command on the guest OSes on which the cluster applications will be in the Online state. This operation recovers the LEFTCLUSTER state.

    # cftool -k
  6. To start the shutdown facility, execute the following command on the guest OSes on which the cluster applications will be Online state.

    # /opt/SMAW/bin/sdtool -b

Event 2

Any event other than event 1 above

  1. Check if /etc/sysconfig/libvirt-guests on the hypervisor of the node <nodename> is properly configured. See "PRIMECLUSTER Installation and Administration Guide (Linux)" for the configuration of /etc/sysconfig/libvirt-guests; a brief sketch is also shown at the end of this entry.

  2. Execute the following command to start the shutdown facility.

    # /opt/SMAW/bin/sdtool -b

    If these corrective actions do not resolve the problem, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."
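For reference, a minimal sketch of /etc/sysconfig/libvirt-guests using standard libvirt-guests options (the values shown are illustrative assumptions; use the values given in "PRIMECLUSTER Installation and Administration Guide (Linux)"):

ON_BOOT=ignore
ON_SHUTDOWN=shutdown
SHUTDOWN_TIMEOUT=300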

Failed to create a signal handler for SIGCHLD
Failed to create a signal handler for SIGUSR1

Content:

An internal error occurred in the program.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Failed to get kernel parameter kernel.panic.

Corrective action:

The value could not be obtained when the "sysctl -n kernel.panic" command was executed.

Record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Failed to get kernel parameter kernel.sysrq.

Corrective action:

The value could not be obtained when the "sysctl -n kernel.sysrq" command was executed.

Record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Failed to get kernel parameter kernel.unknown_nmi_panic.

Corrective action:

The value could not be obtained when the "sysctl -n kernel.unknown_nmi_panic" command was executed.

Record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Failed to unlink/create/open CLI Pipe

Content:

An internal error occurred in the program.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Failed to open CFSF device, reason (value)string

Content:

The CFSF device cannot be opened.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Fail to post LEFTCLUSTER event:string

Content:

When RCI detected a node error, transmission of the LEFTCLUSTER event failed.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

FATAL: rcsd died too frequently.It will not be started by rcsd_monitor.

Content:

The shutdown daemon (rcsd) cannot be restarted after it abnormally ended.

Corrective action:

Check if error messages of the shutdown facility were output before this message. If such messages were output, follow them for the corrective action.

When no error messages of the shutdown facility are output, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

fopen of /etc/opt/SMAW/SMAWsf/rcsd.cfg failed, errno errno

Content:

The shutdown facility cannot be started because it has not been configured.

Corrective action:

Check the configuration of the shutdown facility again and then restart the shutdown daemon.

This action is not required for the xen kernel environment on Red Hat Enterprise Linux 5 (for Intel64).

Forced to re open rcsd net pipe due to an invalid pipe name

Content:

The network communication pipe of the shutdown daemon had an error. It has been recreated.

Corrective action:

The shutdown facility recreates the network communication pipe that had an error. No corrective action is required because this error does not affect other processes that are operated by the system.

Forced to re-open rcsd net pipe due to a missing pipe name

Content:

The network communication pipe of the shutdown daemon had an error. It has been recreated.

Corrective action:

The shutdown facility recreates the network communication pipe that had an error. No corrective action is required because this error does not affect other processes that are operated by the system.

Forced to re-open rcsd net pipe due to failed stat pipe name errno: errno

Content:

The network communication pipe of the shutdown daemon had an error. It has been recreated.

Corrective action:

The shutdown facility recreates the network communication pipe that had an error. No corrective action is required because this error does not affect other processes that are operated by the system.

function of file failed, errno errno

Content:

An internal error occurred in the program.

Corrective action:

Check if related messages are additionally output.
If additional messages are output, follow them for the corrective action.

If no additional messages are output, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

h_cfsf_get_leftcluster() failed. reason: (value)string

Content:

Failed to call cfsf_get_leftcluster.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

HostList empty

Content:

The rcsd.cfg file is not configured properly, or the rcsd.cfg file could not be read.

Corrective action:

Execute the following command.

# /opt/SMAW/bin/sdtool -s

When "The RCSD is not running" is displayed, execute the following command to start the shutdown facility.

# /opt/SMAW/bin/sdtool -b

Execute the following command and check the result.

# /opt/SMAW/bin/sdtool -s
  • When Init State is InitWorked and Test State is TestWorked

    No action is required.

  • When Init State is InitFailed or Test State is TestFailed

    Check the configuration of the rcsd.cfg and correct the configuration if it is not correct. Then, execute the following commands to restart the shutdown facility.

    # /opt/SMAW/bin/sdtool -e
    # /opt/SMAW/bin/sdtool -b
  • Other than the above status

    Wait for a while and then execute the above commands again. Check the result.

If these corrective actions do not resolve the problem, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Host <nodename> ICF communication failure detected

Content:

The rcsd was notified that the node <nodename> has lost its heartbeat.

Host nodename MA_exec: string failed, errno errno

Content:

An error has occurred while the MA executing thread for the node <nodename> was executed.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Illegal /etc/kdump.conf file. default option is not found.

Corrective action:

Add the default setting to the kdump configuration file.

For details on the items of the configuration file, refer to "Setting up kdump" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/kdump.conf file. default option setting is incorrect.

Content:

The default setting in the kdump configuration file is not correct.

Either no value is set for default, or a value other than poweroff is set.

Corrective action:

Set default to poweroff in the kdump configuration file (see the sketch below).

For details on the items of the configuration file, refer to "Setting up kdump" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."
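For reference, a minimal excerpt of /etc/kdump.conf with this setting (a sketch assuming an otherwise standard kdump configuration; after editing, reflect the change, for example with "systemctl restart kdump" on systemd-based distributions):

default poweroff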

Illegal /etc/kdump.conf file. kdump_post option is not found.

Corrective action:

Add the kdump_post setting to the kdump configuration file.

For details on the items of the configuration file, refer to "Setting up kdump" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/kdump.conf file. kdump_post option setting is incorrect.

Content:

The kdump_post setting in the kdump configuration file is not correct.

Either no value is set for kdump_post, or the full path of the poff.sh script is incorrect.

Corrective action:

Set kdump_post to the following value in the kdump configuration file (see the sketch below).

/opt/SMAW/SMAWsf/bin/poff.sh

For details on the items of the configuration file, refer to "Setting up kdump" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."
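For reference, the corresponding /etc/kdump.conf line looks as follows (a sketch; kdump_post is a standard kdump.conf directive, and the script it points to must exist and be executable):

kdump_post /opt/SMAW/SMAWsf/bin/poff.sh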

Illegal /etc/opt/SMAW/SMAWsf/SA_vmawsReset.cfg file. CFName=nodename is not found.
Illegal /etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg file. CFName=nodename is not found.

Content:

The configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent is not correct. The CF node name <nodename> is not found.

Corrective action:

Add the required settings in the configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent.

For details on the items of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmawsReset.cfg file. itemname is not found.
Illegal /etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg file. itemname is not found.

Content:

The configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent is not correct. The required item name <itemname> is not found.

Corrective action:

Add the required settings in the configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent.

For details on the items of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmawsReset.cfg file. The invalid data is included.
Illegal /etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg file. The invalid data is included.

Content:

Invalid data is included in the configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent.

Corrective action:

Delete the invalid data from the configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent.

For details on the items of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg file. CFName=nodename is not found.

Content:

The configuration file of the Azure (SA_vmazureReset) shutdown agent is not correct. The CF node name <nodename> is not found.

Corrective action:

Add the required settings in the configuration file of the Azure (SA_vmazureReset) shutdown agent.

For details on the items of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg file. itemname is not found.

Content:

The configuration file of the Azure (SA_vmazureReset) shutdown agent is not correct. The required item name <itemname> is not found.

Corrective action:

Add the required settings in the configuration file of the Azure (SA_vmazureReset) shutdown agent.

For details on the items of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

If InstanceID is displayed in itemname, replace it with ResourceID.

Illegal /etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg file. The invalid data is included.

Content:

Invalid data is included in the configuration file of the Azure (SA_vmazureReset) shutdown agent.

Corrective action:

Delete the invalid data from the configuration file of the Azure (SA_vmazureReset) shutdown agent.

For details on the items of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg file. CFName=nodename is not found.

Content:

The configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent is not correct. The CF node name <nodename> is not found or is not correct.

Corrective action:

Set the correct CF node name <nodename> in the configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent.

For details on the items of the configuration file, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg file. itemname is not found.

Content:

The configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent is not correct. The item name <itemname> is not found.

Corrective action:

Add the required settings in the configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent.

For details on the items of the configuration file, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg file. The invalid data is included.

Content:

Invalid data is included in the configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent.

Corrective action:

Delete the invalid data from the configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent.

For details on the items of the configuration file, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

Illegal /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg file. CFName=nodename is not found.

Content:

The configuration file of the OpenStack API (SA_vmosr) shutdown agent is not correct. The CF node name <nodename> is not found.

Corrective action:

Add the item "cfname" if it is not set in the configuration file of the OpenStack API (SA_vmosr) shutdown agent. If the item "cfname" is already set in the file, the specified CF node name is not correct. Modify the CF node name.

Illegal /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg file. itemname is not found.

Content:

The configuration file of the OpenStack API (SA_vmosr) shutdown agent is not correct. The required item name is not found.

Corrective action:

Add the required settings in the configuration file of the OpenStack API (SA_vmosr) shutdown agent.

Illegal /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg file. "itemname" is not found.

Content:

The setting of the RHOSP environment information file is not correct. The required item name is not found.

Corrective action:

Add the required settings in the RHOSP environment information file.

Illegal /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg file. The invalid character string of "/vX.X" is included in "itemname".

Content:

The invalid character string of "/vX.X" is included in the endpoint URL of the Identity service or the endpoint URL of the Compute service.

Corrective action:

Do not include the character string "/vX.X" in the endpoint URLs of either the Identity service or the Compute service.

Illegal configfile file. item is not found

Content:

The item item is not found in the configuration file configfile on the node on which the message is displayed.

Corrective action:

Describe the information of item in the configuration file configfile.

See the references below and modify the config file.

  • If the configfile is /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg

    "Creating the FJcloud-O Environment Information File" in "PRIMECLUSTER Installation and Administration Guide Cloud Services"

  • If the configfile is /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg

    "Setting up the Shutdown Facility" in "Part 1 FJcloud-O Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services"

If this error cannot be corrected by the above action, record this message and collect information for an investigation.

Then, contact field engineers.

For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

Illegal kernel parameter. kernel.panic setting is incorrect.

Corrective action:

A value other than 0 is set for the kernel parameter kernel.panic. Set it to 0 (see the sketch below).

For details on the items of the kernel parameter, refer to "Checking and Setting the Kernel Parameters" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."
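As a sketch, the value can be checked and corrected at runtime as follows (persist the setting in /etc/sysctl.conf or a file under /etc/sysctl.d so that it survives a reboot):

# sysctl -n kernel.panic
# sysctl -w kernel.panic=0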

Illegal kernel parameter. kernel.sysrq setting is incorrect.

Corrective action:

0 is set for the kernel parameter kernel.sysrq. Set a value other than 0 (see the sketch below).

For details on the items of the kernel parameter, refer to "Checking and Setting the Kernel Parameters" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."
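As a sketch, the following enables the SysRq functions at runtime (1 is one example of a non-zero value; persist the setting in /etc/sysctl.conf):

# sysctl -w kernel.sysrq=1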

Illegal kernel parameter. kernel.unknown_nmi_panic setting is incorrect.

Corrective action:

A value other than 1 is set for the kernel parameter kernel.unknown_nmi_panic. Set it to 1 (see the sketch below).

For details on the items of the kernel parameter, refer to "Checking and Setting the Kernel Parameters" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."
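As a sketch, the runtime change is made as follows (persist the setting in /etc/sysctl.conf):

# sysctl -w kernel.unknown_nmi_panic=1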

Malloc failed during function

Content:

Not enough memory.

Corrective action:

Increase the virtual memory size (ulimit -v) or add system memory; a sketch follows. If the problem persists, contact field engineers.
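As a sketch, the current limit can be checked and raised in the shell from which the affected process is started (ulimit is a shell built-in and affects only that shell and its child processes):

# ulimit -v
# ulimit -v unlimited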

Node id number ICF communication failure detected

Content:

The CF layer has detected a lost heartbeat.

Corrective action:

The rcsd will take the corrective action.

rcsd died abnormally. Restart it.

Content:

After an abnormal termination, the shutdown daemon (rcsd) has restarted.

Corrective action:

The shutdown daemon recovers automatically, so no action is required.

SA_lkcd: FJSVossn is not installed.

Content:

The kdump shutdown agent cannot be used because the OS status notification feature (FJSVossn) of the software bundled with PRIMERGY is not installed.

Corrective action:

Install the OS status notification feature (FJSVossn).

SA SA_blade to test host nodename failed
SA SA_ipmi to test host nodename failed
SA SA_lkcd to test host nodename failed

Content:

The node on which the message is displayed failed to connect to the BMC (iRMC) of the node nodename or to the blade server.

Corrective action:

Check the following:

<Common>

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s

<SA_ipmi is output>

  • Check if the following settings are valid: the IP address of iRMC/BMC that is specified in the /etc/opt/SMAW/SMAWsf/SA_ipmi.cfg, and the user name and the password to log in to iRMC/BMC.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" for the settings to specify in the /etc/opt/SMAW/SMAWsf/ SA_ipmi.cfg.

  • Check if iRMC/BMC is configured properly.

    For how to configure iRMC/BMC and how to check the configuration, see "User's Guide" or "ServerView User's Guide" of each model.

  • Check if iRMC/BMC is turned on.

  • Check if the normal lamp of the port connected to the HUB and the LAN cable is ON.

  • Check if LAN cables are properly connected to the iRMC/BMC connector or the HUB-side connector.

  • Check if the IP address of iRMC/BMC belongs to the same segment as the cluster node.

  • Check if iRMC/BMC has not restarted or updated the firmware.

  • Check if the LAN access privilege of the user for logging in to iRMC/BMC specified in the SA_ipmi.cfg is set as Administrator.

  • Check if IPMI (IPMI over LAN) of iRMC/BMC is enabled.

<SA_blade is output>

  • Check if the IP address of the management blade and the slot number of the server blade, which are specified in the /etc/opt/SMAW/SMAWsf/SA_blade.cfg, are valid.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" for 'timeout' configuration to specify the /etc/opt/SMAW/SMAWsf/ SA_ipmi.cfg.

  • Check if the blade server is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

    For how to configure the blade server and how to check the configuration, see "ServerView User's Guide" and each hardware guide provided with a machine.

  • Check if the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • Check if LAN cables are properly connected to the management blade connector or the HUB connector.

  • Check if the management blade has not restarted or updated the firmware, or check if the management blade in the master mode has not been switched.

<SA_lkcd is output>

  • Check if the timeout value of SA_lkcd that is specified in the /etc/opt/SMAW/SMAWsf/rcsd.cfg is correct.

    Check if the /etc/opt/SMAW/SMAWsf/SA_lkcd.tout is also specified properly. To specify these files, see "PRIMECLUSTER Installation and Administration Guide (Linux)." Correct the errors if needed. Check and correct the errors on all the nodes.

  • Check if the following command is executed when the shutdown agent is configured.

    # /etc/opt/FJSVcllkcd/bin/panicinfo_setup

    When using the Diskdump shutdown agent or the Kdump shutdown agent, execute the above command after the IPMI shutdown agent or the Blade shutdown agent is configured. When the configuration of the IPMI shutdown agent or the Blade shutdown agent is changed, execute the above command again.

  • When the Diskdump shutdown agent is used, check if Diskdump is configured properly. When the Kdump shutdown agent is used, check if kdump is configured properly. For how to configure Diskdump or kdump and how to check the configuration, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

  • When the IPMI shutdown agent is used, check the above <SA_ipmi is output>. When the Blade shutdown agent is used, check the above <SA_blade is output>.

If the error cause was any one of the above check items, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node which output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, a network failure or a failure of either BMC (iRMC), blade server, or HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_icmp to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to the node nodename.

Corrective action:

Check the following:

  • If events such as panic or hang-up have not occurred on the host OS.

  • If the guest OSes, the host OS, or the network are not under high load.

  • If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s

  • If the node nodename is not stopped.

  • If the IP address or the network interface that is specified for /etc/opt/SMAW/SMAWsf/SA_icmp.cfg is valid.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" to specify the /etc/opt/SMAW/SMAWsf/SA_icmp.cfg.

  • If the virtual IP addresses or the network interface that are allocated for guest OSes are valid.

  • If the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • If the cables are properly connected.

  • If the error cause was any one of the above check items, restart the shutdown facility by executing the following commands on the node which output the above message.

    # /opt/SMAW/bin/sdtool -e
    # /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, a network failure or a failure of HUB may be the cause of the error. In this case, contact field engineers.

SA SA_ilomp.so to test host nodename failed
SA SA_ilomr.so to test host nodename failed
SA SA_rccu.so to test host nodename failed
SA SA_xscfp.so to test host nodename failed
SA SA_xscfr.so to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to XSCF, RCCU, or ILOM on the node nodename.

Corrective action:

Check the following:

<Common>

  • Check if the system or the network has high load.

    • Solaris PRIMECLUSTER 4.2A00 or earlier

      Execute the following commands to restart the console monitoring agent and the shutdown facility.

      # /opt/SMAW/bin/sdtool -e
      # /etc/opt/FJSVcluster/bin/clrccumonctl stop
      # /etc/opt/FJSVcluster/bin/clrccumonctl start
      # /opt/SMAW/bin/sdtool -b
    • Solaris PRIMECLUSTER 4.3A10 or later

      If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

      # /opt/SMAW/bin/sdtool -s

<When RCCU is used for consoles>

  • Check if the console information of RCCU such as the IP address or the node name is correct.

    Use clrccusetup(1M) to check the configured console information. If the console information is not correct, use clrccusetup(1M) to register the console information again.

  • Check if RCCU is configured properly.

    For how to configure RCCU and how to check the configuration, see an operation manual provided with RCCU.

  • Check if RCCU is turned on.

  • Check if the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • Check if LAN cables are properly connected to the RCCU connector or the HUB connector.

  • Check if the IP address of RCCU belongs to the same segment as the administrative LAN.

<When XSCF is used for consoles>

  • Check if the console information of XSCF such as the IP address or the connection protocols (telnet and SSH) is correct.

    Use clrccusetup(1M) to check the configured console information. If the console information is not correct, use clrccusetup(1M) to register the console information again.

  • Check if XSCF is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."
    For how to configure XSCF and how to check the configuration, see "XSCF (eXtended System Control Facility) User's Guide."

  • When SSH is used to connect to XSCF

    • Check if the login user account for the shutdown facility is used to connect the cluster node with XSCF by SSH, and the user authentication (such as creating RSA key) has been completed to connect by SSH for the first time.

    • Check if a password authentication is used as a user authentication to connect the cluster node with XSCF by SSH.

      If an auto password authentication such as a public key authentication is used for XSCF, disable it. For how to configure XSCF or how to check the configuration, see "XSCF (eXtended System Control Facility) User's Guide."

  • Check if the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • Check if LAN cables are properly connected to the XSCF-LAN port connector of XSCF or the HUB connector.

  • Check that the XSCF shell port (the telnet port) is not being connected to from outside the cluster.

    Connect to the XSCF shell port through a serial port (tty-a) to check the connection status. For how to connect and how to check the connection, see "XSCF (eXtended System Control Facility) User's Guide."

  • Check if the IP address of XSCF belongs to the same segment as the administrative LAN.

  • Check if the firmware in XSCF has not been restarted or updated, or if an event such as XSCF failover has not occurred.

<When ILOM is used for consoles>

  • Check if the console information of ILOM such as the IP address or the node name is correct.

    Use clrccusetup(1M) to check the configured console information. If the console information is not correct, use clrccusetup(1M) to register the console information again.

  • Check if ILOM is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)." See the manuals below for how to configure ILOM and how to check the configuration.

    • ILOM 2.x

      "Integrated Lights Out Manager User's Guide"

    • ILOM 3.0

      "Integrated Lights Out Manager (ILOM) 3.0 Concepts Guide"

      "Integrated Lights Out Manager (ILOM) 3.0 Web Interface Procedures Guide"

      "Integrated Lights Out Manager (ILOM) 3.0 CLI Procedures Guide"

      "Integrated Lights Out Manager (ILOM) 3.0 Getting Started Guide"

  • Check if the login user account for the shutdown facility is used to connect the cluster node with ILOM by SSH, and the user authentication (such as creating RSA key) has been completed to connect by SSH for the first time.

  • For ILOM 3.0, check if a password authentication is used as a user authentication to connect the cluster node with ILOM by SSH.

    If an auto password authentication such as a public key authentication or host key based authentication is used for ILOM, disable it. For how to configure ILOM or how to check the configuration, see the above manuals for ILOM 3.0.

  • Check if the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • Check if LAN cables are properly connected to network control (NET MGT) port connector of ILOM or the HUB connector.

  • Check if ILOM has not restarted or updated the firmware.

If the error cause was any one of the above check items, take the corrective action. Then, restart the console monitoring agent and the shutdown facility by executing the following commands on the node which output the above message.

# /opt/SMAW/bin/sdtool -e
# /etc/opt/FJSVcluster/bin/clrccumonctl stop
# /etc/opt/FJSVcluster/bin/clrccumonctl start
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, a network failure or a failure of either RCCU, XSCF, ILOM, or HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_irmcf.so to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to MMB on the node nodename.

Corrective action:

Check if the following messages have been output just before the above message is output. If they have been output, refer to "4.5 Error Messages" and take the corrective actions.

  • 7210 An error was detected in MMB.

  • 7214 The username or password to login to the MMB is incorrect.

  • 7606 The snmptrapd is not running.

  • 7609 The IPMI service is not running.

  • 7610 The authority of user to login to MMB is incorrect.

If these messages have not been output, record this message, collect information for an investigation, and then contact field engineers.

For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_irmcp.so to test host nodename failed
SA SA_irmcr.so to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to iRMC on the node nodename.

Corrective action:

Check if the following messages have been output just before the above message is output. If they have been output, refer to "4.5 Error Messages" and take the corrective actions.

  • 7602 The user name or password to login to iRMC is incorrect.

  • 7603 The authority of user to login to iRMC is incorrect.

  • 7604 An error has been detected in the transmission route to iRMC.

  • 7605 An error has been detected in iRMC.

  • 7606 The snmptrapd is not running.

  • 7609 The IPMI service is not running.

If these messages have not been output, record this message, collect information for an investigation, and then contact field engineers.

For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_kzchkhost to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to ILOM or XSCF on the node nodename.

Corrective action:

Check the following:

<Common>

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the normal lamp of the port which is connected by HUBs and LAN cables is on.

<ILOM (SPARC T4, T5, T7, S7)>

  • Check if the global zone host information such as the IP address or node name of ILOM is correct.

    Use the SF Wizard to check the configured global zone host information.

    If the global zone host information is not correct, use the SF Wizard to register the information again.

  • Check if ILOM is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

    See the manuals below for how to configure ILOM and how to check the configuration:

    "Integrated Lights Out Manager (ILOM) 3.0 Concepts Guide"

    "Integrated Lights Out Manager (ILOM) 3.0 Web Interface Procedures Guide"

    "Integrated Lights Out Manager (ILOM) 3.0 CLI Procedures Guide"

    "Integrated Lights Out Manager (ILOM) 3.0 Getting Started Guide"

  • Check if the login user account for the shutdown facility is used to connect Kernel Zone with ILOM by SSH, and the user authentication (such as creating RSA key) has been completed to connect by SSH for the first time.

  • Check if a password authentication is used as a user authentication to connect Kernel Zone with ILOM by SSH.

    If an auto password authentication such as a public key authentication or host key based authentication is used for ILOM, disable it. For how to configure ILOM or how to check the configuration, see the above manuals for ILOM 3.0.

  • Check if LAN cables are properly connected to network control (NET MGT) port connector of ILOM or the HUB connector.

  • Check if ILOM has not restarted or updated the firmware.

<XSCF (SPARC M10, M12)>

  • Check if the configuration information of the logical domains has been saved.

  • Check if the states of the logical domains can be checked in XSCF.

    When either of the above is found to be the cause of the error, use the ldm add-spconfig command to save the configuration information of the logical domains.

  • Check if the system or the network has high load.

  • Check if the global zone host information such as the IP address, node name, or the connection protocols (telnet and SSH) in XSCF is correct.

    Use the SF Wizard to check the configured global zone host information.

    If the global zone host information is not correct, use the SF Wizard to register the information again.

  • Check if XSCF is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

    For how to configure XSCF and how to check the configuration, see "Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide."

  • When SSH is used to connect to XSCF

    • Check if the login user account for the shutdown facility is used to connect Kernel Zone with XSCF by SSH, and the user authentication (such as creating RSA key) has been completed to connect by SSH for the first time.

    • Check if a password authentication is used as a user authentication to connect Kernel Zone with XSCF by SSH.

      If an auto password authentication such as a public key authentication is used for XSCF, disable it.

      For how to configure XSCF and how to check the configuration, see "Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide."

  • Check if LAN cables are properly connected to the XSCF-LAN port connector of XSCF or the HUB connector.

  • Check that the XSCF shell port is not being connected to from outside the cluster.

    Connect to the XSCF shell port through a serial port to check the connection status. For how to connect and how to check the connection, see "Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide."

  • Check if the IP address of XSCF belongs to the same segment as the administrative LAN or the asynchronous monitoring sub-LAN.

  • Check if the firmware in XSCF has not been restarted or updated, or if an event such as XSCF failover has not occurred.

If the error cause was any one of the above check items, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node which output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, a network fault or a hardware failure of ILOM, XSCF, or HUB may be the error cause. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_kzonep to test host nodename failed
SA SA_kzoner to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to Kernel Zone on the node nodename.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • Check if LAN cables are properly connected to the administrative LAN port connector or the HUB connector.

  • Check if the Kernel Zone information such as the zone name or the global zone host name is correct.

    Use the SF Wizard to check the configured Kernel Zone information.

    If the Kernel Zone information is not correct, use the SF Wizard to register the information again.

  • Check if the login user account for the shutdown facility is used to connect Kernel Zone with the global zone host by SSH, and the user authentication (such as creating RSA key) has been completed to connect by SSH for the first time.

  • Check if XSCF is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

If the error cause was any one of the above check items, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node which output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, a network fault or a hardware failure of HUB may be the error cause. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_libvirtgp to test host nodename failed
SA SA_libvirtgr to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to the hypervisor on the node nodename.

Corrective action:

Check the following:

  • If the guest OSes, the hypervisor, or the network do not have high load.

  • If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • If the following configurations are correct.

    • /etc/opt/SMAW/SMAWsf/SA_libvirtgp.cfg

    • /etc/opt/SMAW/SMAWsf/SA_libvirtgr.cfg

    Specify a user password for the shutdown facility that is encrypted by the /opt/SMAW/bin/sfcipher command.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" to specify these files.

  • If a passphrase is not configured to connect to the hypervisor.

  • If the hypervisor or the guest OSes are configured properly.

  • If the virtual IP addresses that are allocated for guest OSes are valid.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" for configuration of the hypervisor or the guest OSes.

  • If the sudo command is configured for a login user account of the shutdown facility.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" for configuration of the sudo command.

  • Check if the login user account for the shutdown facility is used to connect the guest OS (node) with the hypervisor by SSH, and the user authentication (such as creating an RSA key) has been completed to connect by SSH for the first time.

  • Check if a password authentication is used as a user authentication to connect the guest OS (node) with the hypervisor by SSH.

    If an auto password authentication such as a public key authentication is used for the hypervisor, disable it.

  • If the normal lamp of the port which is connected by HUBs and LAN cables is on.

  • If the cables are connected properly.

  • If events such as panic or hang-up have not occurred on the hypervisor.

If the error cause was any one of the above check items, restart the shutdown facility by executing the following commands on the node which output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, hardware failure such as a network failure or a failure of HUB may be the cause of the error. In this case, contact field engineers.

SA SA_mmbp.so to test host nodename failed
SA SA_mmbr.so to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to MMB on the node nodename.

Corrective action:

If this message is output only right after the OS is started, and the 3083 message below is output about 10 minutes later, no corrective action is required. The message was displayed only because the snmptrapd daemon was still starting.

3083: Monitoring another node has been started.

If the error does not correspond to the above, check the following settings.

  • Check if the IP address of MMB is properly specified. Also check if the IP address of the administrative LAN on the local node, which is configured for the shutdown daemon of the MMB shutdown facility, is properly specified.

  • Check if the user name and the password to control MMB by RMCP, which are registered for the MMB information of the MMB shutdown facility, are valid.

  • Check if the user's [Privilege] to control MMB by RMCP, which is configured for the MMB shutdown facility, is "Admin".

  • Check if the user's [Status] to control MMB by RMCP, which is configured for the MMB shutdown facility, is "Enabled".

For how to check the configuration to control MMB by RMCP, which is configured for the MMB shutdown facility, see the manual provided with the machine.

If the IP address of MMB is changed, or the MMB information of the MMB shutdown facility should be changed because it has an error, see the description of changing the administration configuration in "PRIMECLUSTER Installation and Administration Guide (Linux)" for the corrective action. If the IP address of MMB is changed, the procedure to change the IP address of MMB, which is described in the manual, is not required.

If other configurations have errors, correct them and then execute the following commands on all the nodes to restart the shutdown facility.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

If the configuration does not have errors, check the following:

  • Check if the snmptrapd daemon is started (check the snmptrapd process with the ps(1) command or other commands; a sketch follows this list).

  • Check if the normal lamp of the port connected to the HUB and the LAN cable is ON.

  • Check if LAN cables are properly connected to the MMB port connector or the HUB-side connector.

  • Check if MMB has not restarted or updated the firmware, or if an event such as a switch of Active MMB has not occurred.

  • Check if MMB does not have failures.

  • Check that the nodes and the network are not under high load.
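As a quick sketch of the snmptrapd check (the systemctl line assumes a systemd-based system where the daemon runs as the snmptrapd service):

# ps -ef | grep snmptrapd
# systemctl status snmptrapd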

If one of the above is found to be the cause of the error, the MMB monitoring agent will recover automatically after the corrective action is taken. This automatic recovery can take up to 10 minutes.

When the connection fails again even after the above action is taken, a network failure or a failure in hardware such as MMB or HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers.

For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_pprcip.so to test host nodename failed
SA SA_pprcir.so to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to RCI on the node nodename.

Corrective action:

Check the following:

  • If the system does not have load.

    • Solaris PRIMECLUSTER 4.2A00 or earlier

      Execute the following commands to restart the RCI asynchronous monitoring function and the shutdown facility.

      # /opt/SMAW/bin/sdtool -e
      # /etc/opt/FJSVcluster/bin/clrcimonctl stop
      # /etc/opt/FJSVcluster/bin/clrcimonctl start
      # /opt/SMAW/bin/sdtool -b
    • Solaris PRIMECLUSTER 4.3A10 or later

      If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

      # /opt/SMAW/bin/sdtool -s
  • If RCI is properly configured and the RCI address does not have the following errors:

    • If the RCI address is configured.

    • If the RCI address is not duplicated.

    • If the RCI address of another node is changed while the RCI asynchronous monitoring function is active.

  • If the cable is connected properly.

  • If the monitoring timeout through SCF/RCI is properly configured in the /etc/system file. For how to configure the monitoring timeout, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

  • If the RCI device has not restarted or upgraded the firmware.

If the error cause was any one of the above check items, restart the RCI asynchronous monitoring function and the shutdown facility by executing the following commands on the node which output the above message.

# /opt/SMAW/bin/sdtool -e
# /etc/opt/FJSVcluster/bin/clrcimonctl stop
# /etc/opt/FJSVcluster/bin/clrcimonctl start
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, hardware failure such as a failure of RCI cables or a failure of the System Control Facility (SCF) may be the cause of the error. In this case, record this message and collect both the SCF dump and information for investigation before contacting field engineers. For how to collect the SCF dump and information for investigation, see "PRIMECLUSTER Installation and Administration Guide."

SA SA_rpdu to test host nodename failed

Content:

Connection from the node where the error message is output to the Remote Power Distribution Unit connected to the unit of the node nodename failed.

Corrective action:

Check the following points:

Corrective action 1
  • Check if any error occurred in the Remote Power Distribution Unit.

    See the manual of the Remote Power Distribution Unit to check whether any error occurred, and follow the manual to take the corrective action.

Corrective action 2
  • The system or the network is not under load.

    If this message is no longer displayed 10 minutes after the first message is displayed, the connection may be recovered.

    Execute the following command to check that the Shutdown Facility is working correctly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the following items of the Remote Power Distribution Unit which is specified in the RPDU Shutdown Agent are set correctly: IP address, outlet number, user name, and password.

    Check the setting of the RPDU Shutdown Agent.

    For details, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

  • The outlet connected to the node is ON.

    Execute the following command to check the status of the Remote Power Distribution Unit.

    # /opt/SMAW/SMAWsf/bin/sfrpdupoweron -l

    If the displayed status of the outlet is OFF, execute the following command to switch the outlet to ON.

    # /opt/SMAW/SMAWsf/bin/sfrpdupoweron -p CF nodename
  • The normal lamp of the port connected to HUB and LAN cable is ON.

  • The LAN cable is correctly connected to the HUB connector or the LAN port connector of the Remote Power Distribution Unit.

  • The firmware has not been restarted or updated in the Remote Power Distribution Unit.

  • The user who accesses the Remote Power Distribution Unit has the required authority.

    The required authority may not be given to the user specified for the setting of the RPDU Shutdown Agent.
    For details, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

If any of the status above is the cause of the connection failure, take the corrective action first.
After that, execute the following commands on the node where the message is output, and then restart the Shutdown Facility.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

If the above corrective actions cannot solve this error, record this message and collect information for an investigation. Then, contact field engineers.

For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_sunF to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to ALOM on the node nodename.

Corrective action:

Check the following:

  • If the system has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • If the IP address of ALOM, and the user name and password used to log in to ALOM, which are specified in /etc/opt/SMAW/SMAWsf/SA_sunF.cfg, are correct.

    For how to specify the /etc/opt/SMAW/SMAWsf/SA_sunF.cfg, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

  • If ALOM is properly configured.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

    For how to configure the ALOM and how to check the configuration, see "Advanced Lights Out Management (ALOM) CMT guide."

  • If the normal lamp of the port to which the HUB and the LAN cable are connected is on.

  • If LAN cables are properly connected to the network control port (NET MGT) connector of ALOM or the HUB connector (a simple connectivity check is shown after this list).

  • If ALOM has not been restarted and its firmware has not been updated.

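As a simple connectivity check (the address is a placeholder, not a value from this manual), confirm from the node that output the message that ALOM responds on the network:

# ping <ALOM-IP-address>
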
If any of the above check items was the cause of the error, restart the shutdown facility by executing the following commands on the node that output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, hardware failure such as a network failure or a failure of ALOM or HUB may be the cause of the error. In this case, contact field engineers.

SA SA_vmawsReset to test host nodename failed
SA SA_vmawsAsyncReset to test host nodename failed

Content:

The node on which the message is displayed failed to connect to the node nodename.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmawsReset.cfg, or /etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg) is specified correctly.

    For settings of the configuration file, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the AWS Command Line Interface (hereinafter AWS CLI) is set correctly.

    Make sure that the credentials of an IAM user set with the aws configure command are correct.

  • Check if version 1 (1.16 or later) of the AWS CLI is installed.

  • Check if the AWS CLI is installed for the root user (/root/.local/bin), not in /usr/local/aws or /usr/local/bin (see the example after this list).

  • Check if the instance on which the cluster host is running can communicate with the AWS endpoint.

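As a quick check (a sketch only; the path assumes the installation for the root user described above), the following commands display the AWS CLI version and location and list the configured credentials:

# /root/.local/bin/aws --version
# /root/.local/bin/aws configure list
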
If any of the above check items was the cause of the error, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node where the above message is output.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

If the problem is not resolved by the above action, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_vmazureReset to test host nodename failed

Content:

The node on which the message is displayed failed to connect to the node nodename.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg) is specified correctly.

    • For nodename, set the information of the environment where PRIMECLUSTER is installed.

    • For settings of nodename, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the virtual machine on which the cluster host is running can communicate with the Azure endpoint.

  • Check if the Azure CLI is installed and the service principal is registered.

  • Check that CertPath in the configuration file of the Azure (SA_vmazureReset) shutdown agent specifies the path of the certificate file of the service principal (see the example after this list).

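For example (a sketch; the application ID, certificate path, and tenant ID are placeholders, not values from this manual), the following commands check the Azure CLI installation and that the service principal can log in with the certificate specified in CertPath:

# az --version
# az login --service-principal -u <application-id> -p <certificate-path> --tenant <tenant-id>
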
If the problem is not resolved by the above actions, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_vmchkhost to test host nodename failed
SA SA_vmgp to test host nodename failed
SA SA_vmSPgp to test host nodename failed
SA SA_vmSPgr to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to the host OS on the node nodename.

Corrective action:

Check the following:

  • If events such as a panic or hang-up have not occurred on the host OS.

  • If the guest OS, host OS, or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s

  • Check if the following configuration files are correct when the output message is for either SA_vmchkhost, SA_vmSPgp, or SA_vmSPgr.

    • /etc/opt/SMAW/SMAWsf/SA_vmchkhost.cfg

    • /etc/opt/SMAW/SMAWsf/SA_vmSPgp.cfg

    • /etc/opt/SMAW/SMAWsf/SA_vmSPgr.cfg

    In these files, specify the account of the host OS and the login password of FJSVvmSP encrypted by the /opt/SMAW/bin/sfcipher command.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" to specify these files.

  • Check if the host OS information (the IP address, user name, and node name of the host OS) is correct when the output message is for SA_vmgp.

    Use clrccusetup(1M) to check the configured console information. If there is any error in the information, use clrccusetup(1M) to register the console information again.

  • If a passphrase is not configured to connect to the hypervisor.

  • If the virtual IP addresses that are allocated for guest OSes are valid.

  • If the host OS or the guest OSes are configured properly.

    See "PRIMECLUSTER Installation and Administration Guide (Linux)" for configuration of the host OS or the guest OSes.

  • Check that the login user account for the shutdown facility is used for the SSH connection from the guest OS (node) to the host OS, and that the user authentication for the first SSH connection (such as RSA key generation) has been completed.

  • Check if password authentication is used as the user authentication for the SSH connection from the guest OS (node) to the host OS. If an automatic authentication method such as public key authentication is configured on the host OS, disable it.

  • If the normal lamp of the port to which the HUB and the LAN cable are connected is on.

  • If LAN cables are properly connected to the PRIMEQUEST connector.

If any of the above check items was the cause of the error, restart the shutdown facility by executing the following commands on the node that output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, hardware failure such as a network failure or a failure of HUB may be the cause of the error. In this case, contact field engineers.

SA SA_vmk5r to test host nodename failed

Content:

The node on which the message is displayed failed to connect to the node nodename.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the environment information of FJcloud-O specified in the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg is correct.

    For how to specify the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg, see "Creating the FJcloud-O Environment Information File" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the items specified in /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg, such as the virtual server name and the user name and password for forcibly stopping the virtual server, are correct.

    For how to specify the /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg, see "Setting up the Shutdown Facility" in "Part 1 FJcloud-O Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check that the password of the user for forcibly stopping the virtual server, which is specified in /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg, has not expired (90 days).

    If it has expired, see "Changing a Password Periodically" in "PRIMECLUSTER Installation and Administration Guide Cloud Services" and change the password. Change the password periodically thereafter.

  • Check if the virtual server on which the cluster host is running can communicate with the endpoint for the regional user management or the endpoint for the compute (standard service) in FJcloud-O.

If this error cannot be corrected by the above action, record this message and collect information for an investigation.

Then, contact field engineers.

For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_vmnifclAsyncReset to test host nodename failed.

Content:

The node on which the message is displayed failed to connect to the node nodename.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg) is specified correctly.

    For settings of the configuration file, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the server on which the cluster host is running can communicate with the NIFCLOUD endpoint (a simple check is shown after this list).

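As a simple reachability test (the URL is a placeholder; use the endpoint configured for your environment), the following command can confirm that the server can reach the NIFCLOUD endpoint:

# curl -v https://<NIFCLOUD-endpoint>/
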
If any of the above check items was the cause of the error, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node where the above message is output.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_vmosr to test host nodename failed

Content:

The node on which the message is displayed failed to connect to the node nodename.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the following configurations are correct.

    • /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg

    • /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg

    Specify the user password for instance control in RHOSP encrypted by the /opt/SMAW/bin/sfcipher command (see the example after this list).

    For details on specifying these files, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

  • Check if the admin role is given to the user for instance control in RHOSP.

  • Check if the instance where the cluster host is running can communicate with both Identity and Compute services in RHOSP.

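For example (a sketch; the user name is a placeholder, not a value from this manual), the password can be encrypted with sfcipher, and, if the OpenStack client is available on the node, the role assignment of the user for instance control can be checked:

# /opt/SMAW/bin/sfcipher -c
# openstack role assignment list --user <user-name> --names

The sfcipher -c command prompts for the password and outputs the encrypted string to be set in the configuration file.
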
If any of the above check items was the cause of the error, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node that output the above message.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, hardware failure such as a network failure or a failure of HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_vwvmr to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to vCenter Server.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s

  • Make sure that the setting items of vCenter Server including IP address, port number, user name, and password specified in /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg are valid.

  • Make sure that the VM name specified in /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg is valid.

  • If /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg contains Japanese characters, make sure that the character code of SA_vwvmr.cfg is UTF-8.

    For details on specifying /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

  • Make sure that the permission or role necessary to power off the VM is given to the user specified in /etc/opt/SMAW/SMAWsf/SA_vwvmr.cfg.

    For details on user permissions, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

  • Make sure that the network interfaces allocated to the guest OSes are valid.

  • Make sure that the normal lamp of the port to which the HUB and the LAN cable are connected is on.

  • Make sure that the cables are properly connected.

If any of the status above is the cause of the connection failure, take the corrective action first.
After that, execute the following commands on the node where the message is output, and then restart the shutdown facility.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

If the connection still cannot be established after checking the above items, the cause may be a hardware failure (for example, a network or HUB failure). In that case, contact field engineers.

When this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA SA_xscfsnmpg0p.so to test host nodename failed
SA SA_xscfsnmpg1p.so to test host nodename failed
SA SA_xscfsnmpg0r.so to test host nodename failed
SA SA_xscfsnmpg1r.so to test host nodename failed
SA SA_xscfsnmp0r.so to test host nodename failed
SA SA_xscfsnmp1r.so to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to XSCF on the node nodename.

Corrective action:

Check the following:

<Corrective action 1>
  • If the required operation after Migration has been performed for the cluster.

  • If the guest domain, which was stopped by Cold Migration, has been started.

When either of the above is found to be the cause of the error, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)" to perform the operation after Migration.

<Corrective action 2>
  • If the configuration information of the logical domains has been saved.

  • If the states of the logical domains can be checked in XSCF.

When either of the above is found to be the cause of the error, use the ldm add-spconfig command to save the configuration information of the logical domains (an example follows).
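
For example (the configuration name is a placeholder), the configuration can be saved to the service processor and then verified:

# ldm add-spconfig <config-name>
# ldm list-spconfig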

<Corrective action 3>
  • Whether a load is placed on the system or the network.

  • If the console information of XSCF, such as the IP address, node name, or the connection protocol (telnet or SSH), is correct.

    Use clrccusetup(1M) to check the configured SNMP asynchronous monitoring information. If there is any error in the information, use clrccusetup(1M) to register the console information again.

  • If XSCF is configured properly.

    For the special notes to configure the shutdown facility, see "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."

    For how to configure XSCF and how to check the configuration, see "Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide."

  • When SSH is used to connect to XSCF.

    • Check that the login user account for the shutdown facility is used for the SSH connection from the cluster node to XSCF, and that the user authentication for the first SSH connection (such as RSA key generation) has been completed.

    • Check if password authentication is used as the user authentication for the SSH connection from the cluster node to XSCF. If an automatic authentication method such as public key authentication is configured for XSCF, disable it. For how to configure XSCF and how to check the configuration, see "Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide."

  • If the normal lamp of the port to which the HUB and the LAN cable are connected is on.

  • If LAN cables are properly connected to the XSCF-LAN port connector of XSCF or the HUB connector.

  • If the XSCF shell port (the telnet port) is not connected to from outside the cluster.

    Connect to the XSCF shell port through a serial port to check the connection status. For how to connect and how to check the connection, see "Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide."

  • If the IP address of XSCF belongs to the same segment as the administrative LAN or the asynchronous monitoring sub-LAN.

  • If the firmware in XSCF has not been restarted or updated, or if an event such as XSCF failover has not occurred.

If any of the above check items was the cause of the error, restart the SNMP asynchronous monitoring function and the shutdown facility by executing the following commands on the node that output the above message.

# /opt/SMAW/bin/sdtool -e
# /etc/opt/FJSVcluster/bin/clsnmpmonctl stop
# /etc/opt/FJSVcluster/bin/clsnmpmonctl start
# /opt/SMAW/bin/sdtool -b

<Corrective action 4>
  • If the port number (9385) of the SNMP trap receiver daemon (snmptrapd) used by the shutdown facility conflicts with the port number of another product (a check example is shown below).

When the above is found to be the cause of the error, see "Changing Port Numbers for SNMP" in "PRIMECLUSTER Installation and Administration Guide (Oracle Solaris)."
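
To check whether another product is already using port 9385 on the node, a command such as the following can be used (the output format differs depending on the OS version):

# netstat -an | grep 9385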

<Corrective action 5>
  • If the status of the SMF service (svc:/milestone/fjsvcldevtrap) in the shutdown facility is online.

    Execute the following command to check the status of the fjsvcldevtrap service.

    # svcs svc:/milestone/fjsvcldevtrap

If the status is either disabled or maintenance, take the following actions.

# /opt/SMAW/bin/sdtool -e
# /etc/opt/FJSVcluster/bin/clsnmpmonctl stop
# /etc/opt/FJSVcluster/bin/clsnmpmonctl start
# /opt/SMAW/bin/sdtool -b

<Corrective action 6>
  • If the SNMP agent of XSCF works correctly.

    Log in to XSCF and execute the showsnmp command. Check the execution result.

    XSCF> showsnmp

    [Checklist]

    • "Agent Status: Enabled" is output

    • "Enabled MIB Modules: None" is not output

If the execution result does not match the expected output results in the checklist above, execute the following commands on XSCF.

XSCF> setsnmp disable
XSCF> setsnmp enable

After that, take the following actions on all cluster nodes.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

<Other corrective actions>

If none of the above check items was the cause of the error, a network fault or a hardware fault, such as an XSCF or HUB failure, may be the cause. In this case, contact field engineers.

If the problem is not resolved by the above action, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA Shutdown Agent to init host nodename failed

Content:

Initializing the shutdown agent <Shutdown Agent> for the node <nodename> failed.

Corrective action:

Check the following points:

<Corrective action 1>

Make sure that the shutdown agent is properly configured. For how to configure the shutdown agent, see "PRIMECLUSTER Installation and Administration Guide."

<Corrective action 2>

Make sure that the asynchronous monitoring agent is started. For how to verify that the asynchronous monitoring agent is started, refer to "PRIMECLUSTER Installation and Administration Guide." If the asynchronous monitoring agent is stopped, take the following procedure:

  1. Stop the shutdown facility.

    # /opt/SMAW/bin/sdtool -e
  2. Start the asynchronous monitoring daemon according to your environment. For how to start the asynchronous monitoring daemon, refer to "PRIMECLUSTER Installation and Administration Guide."

  3. Start the shutdown facility.

    # /opt/SMAW/bin/sdtool -b

If the failure continues after taking the above actions, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA Shutdown Agent to test host nodename failed

Content:

The node on which the message is displayed failed to check the connection to the node nodename.

The connection destination varies depending on each shutdown agent.

Corrective action:

Check the following. The connection destination varies depending on each shutdown agent.

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration of the shutdown daemon (/etc/opt/SMAW/SMAWsf/rcsd.cfg) or the configuration of the shutdown agent is correct.

  • Check if the configuration of the connection destination is correct.

  • When SSH is used to connect to the connection destination, check that the login user account for the shutdown facility is used for the SSH connection from the cluster node, and that the user authentication for the first SSH connection (such as RSA key generation) has been completed (see the example after this list).

  • Check if password authentication is used as the user authentication for the SSH connection from the cluster node to the connection destination.

    If an automatic authentication method such as public key authentication is configured for the connection destination, disable it.

  • Check if the cables are properly connected to the connection destination.

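For example (the user and destination are placeholders), log in once manually over SSH from the cluster node so that the first-time user inquiry, such as registering the host key, is completed:

# ssh <login-user>@<connection-destination>
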
If any of the above check items was the cause of the error, take the corrective action. Then, restart the shutdown facility by executing the following commands on the node that output the above message. Some shutdown agents also require restarting the asynchronous monitoring function.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b

When the connection fails again even after the above action is taken, hardware failure such as a network failure, a failure of the connection destination or HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

SA Shutdown Agent to unInit host nodename failed

Content:

The uninitialization processing of the shutdown agent <Shutdown Agent> for the node <nodename> failed.

Corrective action:

Collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

select of CLI Pipe & RCSDNetPipe failed, errno errno

Content:

The select function in the shutdown daemon returned abnormally.

Corrective action:

The shutdown facility automatically restarts the CLI pipe. No action is required because this does not affect other processes running on the system.

string in file file around line number

Content:

The syntax in rcsd.cfg is incorrect.

Corrective action:

Correct the syntax error.

The attempted shutdown of cluster host nodename has failed

Content:

An internal error occurred in the program.

Corrective action:

Record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The authentication request failed.

Content:

  • When using SA_vmk5r

    The authentication request failed on the node on which the message is displayed.

  • When using SA_vmosr

    The authentication request of the token to the endpoint of the Identity service failed.

Corrective action:

  • When using SA_vmk5r

    Check the following:

    • Check if the system or the network has high load.

    • Check if the environment information of FJcloud-O specified in the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg is correct.

      For how to specify the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg, see "Creating the FJcloud-O Environment Information File" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

    • Check if the user name and password for forcibly stopping the virtual server that are specified in /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg are correct.

      For how to specify the /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg, see "Setting up the Shutdown Facility" in "Part 1 FJcloud-O Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

    • Check if the virtual server on which the cluster host is running can communicate with the endpoint for the regional user management in FJcloud-O.

  • When using SA_vmosr

    Check the following:

    • Check if the system or the network has high load.

      If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

      # /opt/SMAW/bin/sdtool -s
    • Check if the following configurations are correct.

      • /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg

      • /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg

      Specify the user password for instance control in RHOSP encrypted by the /opt/SMAW/bin/sfcipher command.

      For details on specifying these files, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

    • Check if the instance where the cluster host is running can communicate with the Identity service in RHOSP.

    When the connection fails again even after the above action is taken, a network failure or a failure in hardware such as HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The AWS CLI execution failed.

Corrective action:

Check if the AWS CLI is installed as the root user.

For installation of the AWS CLI, refer to "Installing the AWS Command Line Interface" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

If the problem is not resolved by the above action, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The Azure CLI execution failed.

Corrective action:

Check if the Azure CLI is installed as the root user.

For installation of the Azure CLI, refer to "Installing the Azure Command-Line Interface" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

If the problem is not resolved by the above action, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The configuration file /etc/opt/SMAW/SMAWsf/SA_vmawsReset.cfg does not exist.
The configuration file /etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg does not exist.

Content:

The configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent does not exist.

Corrective action:

Create the configuration file of the AWS CLI (SA_vmawsReset, SA_vmawsAsyncReset) shutdown agent.

For details on creating the configuration file, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

The configuration file /etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg does not exist.

Content:

The configuration file of the Azure (SA_vmazureReset) shutdown agent does not exist.

Corrective action:

Create the configuration file of the Azure (SA_vmazureReset) shutdown agent.

For details on creating the configuration file, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

The configuration file /etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg does not exist.

Content:

The configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent does not exist.

Corrective action:

Create the configuration file of the NIFCLOUD API (SA_vmnifclAsyncReset) shutdown agent.

For details on creating the configuration file, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

The configuration file /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg does not exist.

Content:

The configuration file of the OpenStack API (SA_vmosr) shutdown agent does not exist.

Corrective action:

Create the configuration file of the OpenStack API (SA_vmosr) shutdown agent.

The configuration file /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg does not exist.

Content:

The RHOSP environment information file does not exist.

Corrective action:

Create the RHOSP environment information file.

The configuration file configfile does not exist

Content:

The configuration file configfile does not exist on the node on which the message is displayed.

Corrective action:

See the references below and create the configuration file configfile.

  • If the configfile is /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg

    "Creating the FJcloud-O Environment Information File" in "PRIMECLUSTER Installation and Administration Guide Cloud Services"

  • If the configfile is /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg

    "Setting up the Shutdown Facility" in "Part 1 FJcloud-O Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services"

The information acquisition request of the virtual machine instance-id failed.

Content:

The information acquisition request of the virtual machine for the AWS endpoint failed.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg) is specified correctly.

    • For instance-id, set the information of the environment where PRIMECLUSTER is installed.

    • For settings of instance-id, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the instance on which the cluster host is running can communicate with the AWS endpoint.

  • Check if the AWS CLI is installed and the credentials are set (see the example after this list).

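As an illustration only (the instance ID is a placeholder, not a value from this manual), the same information acquisition can be tried manually with the AWS CLI; the error output usually indicates whether the cause is the credentials, the instance ID, or the connection to the endpoint:

# /root/.local/bin/aws ec2 describe-instances --instance-ids <instance-id>
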
If the problem is not resolved by the above action, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The information acquisition request of the virtual machine instancename failed.

Content:

The information acquisition request of the virtual machine to the endpoint of the Compute service failed.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the following configurations are correct.

    • /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg

    • /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg

    Specify the user password for instance control in RHOSP encrypted by the /opt/SMAW/bin/sfcipher command.

    For details on specifying these files, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

  • Check if the admin role is given to the user for instance control in RHOSP.

  • Check if the instance where the cluster host is running can communicate with the Compute service in RHOSP.

When the connection fails again even after the above action is taken, hardware failure such as a network failure or a failure of HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The information acquisition request of the virtual machine resource-id failed.

Content:

The information acquisition request of the virtual machine for the Azure endpoint failed.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg) is specified correctly.

    • For resource-id, set the resource ID of the virtual machine where PRIMECLUSTER is installed.

    • For settings of resource-id, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the virtual machine on which the cluster host is running can communicate with the Azure endpoint.

  • Check if the Azure CLI is installed and the service principal is registered.

  • Check that CertPath in the configuration file of the Azure (SA_vmazureReset) shutdown agent specifies the path of the certificate file of the service principal (see the example after this list).

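As an illustration only (the resource ID is a placeholder), the same information acquisition can be tried manually with the Azure CLI:

# az vm show --ids <resource-id>
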
If the problem is not resolved by the above actions, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The information acquisition request of the virtual machine ServerName failed.

Content:

[When this message is output in an FJcloud-O environment]

The information acquisition request of the virtual server ServerName failed on the node on which the message is displayed.

[When this message is output in a NIFCLOUD environment]

The information acquisition request of the server ServerName for the NIFCLOUD endpoint failed.

Corrective action:

Check the following:

[In an FJcloud-O environment]
  • Check if the system or the network has high load.

  • Check if the environment information of FJcloud-O specified in the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg is correct.

    For how to specify the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg, see "Creating the FJcloud-O Environment Information File" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the virtual server name, and the user name and password for forcibly stopping the virtual server, that are specified in /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg are correct.

    For how to specify the /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg, see "Setting up the Shutdown Facility" in "Part 1 FJcloud-O Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the instance where the cluster host is running can communicate with the endpoint for the compute (standard service) in FJcloud-O.

[In a NIFCLOUD environment]
  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg) is specified correctly.

    For ServerName, set the information of the environment where PRIMECLUSTER is installed. For settings, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the server on which the cluster host is running can communicate with the NIFCLOUD endpoint.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The SF-CF event processing failed string, status value

Content:

When the "status 6147" is displayed during the shutdown processing, this means that the shutdown notification is received from other nodes.

When other messages are displayed, this means that an internal error occurred in the program.

Corrective action:

No corrective action is required when "status 6147" is displayed during the shutdown processing. For any other status, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The SF-CF has failed to locate host nodename

Content:

The node name in the rcsd.cfg is not the CF node name.

Corrective action:

Specify the CF node name for the node name in rcsd.cfg. Check the CF node name with the cftool -n command (see the example below). Correct rcsd.cfg and execute the following commands to restart the shutdown facility.

# /opt/SMAW/bin/sdtool -e
# /opt/SMAW/bin/sdtool -b
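
For example, the CF node names and the node names written in rcsd.cfg can be compared as follows:

# cftool -n
# cat /etc/opt/SMAW/SMAWsf/rcsd.cfg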

If the above action does not work, record this message and collect information for an investigation. Then, contact field engineers.

The SF-CF initialization failed, status value

Content:

CF is not configured or CF may not be loaded.

Corrective action:

Configure CF.
See the manual below for configuration.

"Example of creating a cluster" in "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide"

The specified guest domain cannot be connected. (nodename:nodename)

Content:

The specified guest domain nodename cannot be connected.

Corrective action:

Check the following points:

  • Whether a load is placed on the system or the network.

    If it is a temporary load, no corrective action is required.

  • Whether the configured guest OS information is correct.

    • For Linux 4.3A30 or later

      Check whether the guest OS information set in /etc/opt/FJSVcluster/etc/kvmguests.conf is correct.
      If the guest OS information is incorrect, set the information in the kvmguests.conf file again.

    • For Solaris 4.3A20 or later

      Execute the clovmmigratesetup -l command to check whether the guest OS information is correct.
      If the guest OS information is incorrect, set the correct information again.
      If the guest OS information is output when executing the clovmmigratesetup command, check whether the content is correct.

  • Whether the specified guest OS can be connected.

    If the guest OS cannot be connected, review the network settings.

    Check whether the general privileged user of the shutdown facility is registered in the guest OS.

  • Whether the guest OS is connected from the host OS via SSH with the root user or the general privileged user of the shutdown facility, and whether the user inquiry of the first SSH connection (such as RSA key generation) has been completed.

    If the user inquiry of the first SSH connection (such as RSA key generation) has not been completed, complete it.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The stop request of the virtual machine instance-id failed.

Content:

The stop request of the virtual machine for the AWS endpoint failed.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmawsAsyncReset.cfg) is specified correctly.

    • For instance-id, set the information of the environment where PRIMECLUSTER is installed.

    • For settings of instance-id, refer to "Setting up the Shutdown Facility" in "Part 4 AWS Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the instance on which the cluster host is running can communicate with the AWS endpoint.

  • Check if the AWS CLI is installed and the credentials are set.

If the problem is not resolved by the above action, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The stop request of the virtual machine instancename failed.

Content:

The stop request of the virtual machine to the endpoint of the Compute service failed.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

  • Check if the following configurations are correct.

    • /opt/SMAW/SMAWRrms/etc/os_endpoint.cfg

    • /etc/opt/SMAW/SMAWsf/SA_vmosr.cfg

    Specify the user password for instance control in RHOSP encrypted by the /opt/SMAW/bin/sfcipher command.

    For details on specifying these files, see "PRIMECLUSTER Installation and Administration Guide (Linux)."

  • Check if the admin role is given to the user for instance control in RHOSP.

  • Check if the instance where the cluster host is running can communicate with the Compute service in RHOSP.

When the connection fails again even after the above action is taken, hardware failure such as a network failure or a failure of HUB may be the cause of the error. In this case, contact field engineers.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The stop request of the virtual machine resource-id failed.

Content:

The stop request of the virtual machine for the Azure endpoint failed.

Corrective action:

Check the following:

  • Check if the system or the network has high load.

  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmazureReset.cfg) is specified correctly.

    • For resource-id, set the resource ID of the virtual machine where PRIMECLUSTER is installed.

    • For settings of resource-id, refer to "Setting up the Shutdown Facility" in "Part 5 Azure Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the virtual machine on which the cluster host is running can communicate with the Azure endpoint.

  • Check if the Azure CLI is installed and the service principal is registered.

  • Check that CertPath in the configuration file of the Azure (SA_vmazureReset) shutdown agent specifies the path of the certificate file of the service principal.

If the problem is not resolved by the above actions, record this message, collect information for an investigation, and then contact field engineers. For details on how to collect information, refer to "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."

The stop request of the virtual machine ServerName failed.

Content:

[When this message is output in an FJcloud-O environment]

The stop request of the virtual server ServerName failed on the node on which the message is displayed.

[When this message is output in a NIFCLOUD environment]

The stop request of the server ServerName for the NIFCLOUD endpoint failed.

Corrective action:

Check the following:

[In an FJcloud-O environment]
  • Check if the system or the network has high load.

  • Check if the environment information of FJcloud-O specified in the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg is correct.

    For how to specify the /opt/SMAW/SMAWRrms/etc/k5_endpoint.cfg, see "Creating the FJcloud-O Environment Information File" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the virtual server name, and the user name and password for forcibly stopping the virtual server, that are specified in /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg are correct.

    For how to specify the /etc/opt/SMAW/SMAWsf/SA_vmk5r.cfg, see "Setting up the Shutdown Facility" in "Part 1 FJcloud-O Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the virtual server on which the cluster host is running can communicate with the endpoint for the compute (standard service) in FJcloud-O.

[In a NIFCLOUD environment]
  • Check if the system or the network has high load.

    If this message is no longer displayed after 10 minutes, the error may have been solved. Execute the following command to check if the shutdown facility works properly.

    # /opt/SMAW/bin/sdtool -s
  • Check if the configuration file (/etc/opt/SMAW/SMAWsf/SA_vmnifclAsyncReset.cfg) is specified correctly.

    For ServerName, set the information of the environment where PRIMECLUSTER is installed. For settings, see "Setting up the Shutdown Facility" in "Part 2 NIFCLOUD Environment" in "PRIMECLUSTER Installation and Administration Guide Cloud Services."

  • Check if the server on which the cluster host is running can communicate with the NIFCLOUD endpoint.

If this corrective action does not work, record this message and collect information for an investigation. Then, contact field engineers. For details on how to collect information, see "Troubleshooting" in "PRIMECLUSTER Installation and Administration Guide."