PRIMECLUSTER Messages
FUJITSU Software

6.1.2 Warning Messages

This chapter contains a detailed list of all RMS warnings that appear in the switchlog.

Check the component name displayed in the message, and then see the table below to find the corresponding reference. Within each component, the messages are explained in numerical order.

Component name   Reference

ADC              "6.1.2.1 ADC: Admin configuration"
ADM              "6.1.2.2 ADM: Admin, command, and detector queues"
BAS              "6.1.2.3 BAS: Startup and configuration errors"
BM               "6.1.2.4 BM: Base monitor"
CTL              "6.1.2.5 CTL: Controllers"
CUP              "6.1.2.6 CUP: userApplication contracts"
DET              "6.1.2.7 DET: Detectors"
SCR              "6.1.2.8 SCR: Scripts"
SWT              "6.1.2.9 SWT: Switch requests (hvswitch command)"
SYS              "6.1.2.10 SYS: SysNode objects"
UAP              "6.1.2.11 UAP: userApplication objects"
US               "6.1.2.12 US: us files"
WLT              "6.1.2.13 WLT: Wait list"
WRP              "6.1.2.14 WRP: Wrappers"

6.1.2.1 ADC: Admin configuration

(ADC, 19) Clearing the cluster Waitstate for SysNode <sysnode>, by faking a successful host elimination! If <sysnode> is actually still Online, and/or if any applications are Online, this hvutil -u command may result in data corruption!

Content:

Information message.

Corrective action:

No action is required.

(ADC, 23) File <filename> can't be opened: <errortext>.

Content:

A file to be sent to a remote node cannot be opened.

Corrective action:

Check the error text <errortext> or other WARNING/ERROR messages.

(ADC, 24) File cannot be open for read.

Content:

A file to be sent to a remote node cannot be read.

Corrective action:

Message (ADC, 23) is also output. Check the error text <errortext> of (ADC, 23) or other WARNING/ERROR messages.

(ADC, 51) hvshut utility has timed out.

Content:

The hvshut command timed out.

When the hvshut command is executed with either -l/-s/-a option, some resources that are included in cluster applications may fail to stop.

Corrective action:

To prevent the hvshut command from timing out, increase the value of RELIANT_SHUT_MIN_WAIT, the RMS global environment variable, to suit your environment.

See

For details on RELIANT_SHUT_MIN_WAIT, see "RELIANT_SHUT_MIN_WAIT" in "Global environment variables" of the following manual:

  • For PRIMECLUSTER 4.3A30 or later: "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."

See "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide" for how to refer to and change the RMS environment variables.

Take one of the following actions, depending on the option with which the hvshut command was executed.

  • With -l option

    Shut down the OS on the node on which the command has been executed, or stop the node forcibly.

  • With -s option

    Shut down the OS on the target node of the command, or stop the node forcibly.

  • With -a option

    Shut down the OS on all the nodes except a node on which RMS has ended normally, or stop the node forcibly.

  • With -L option

    When the BM (base monitor) process does not stop on the node on which the command has been executed, execute the hvshut -f command to stop RMS forcibly. No action is required when the BM process stops.

  • With -A option

    When the BM process does not stop on some nodes, execute the hvshut -f command on these nodes to stop RMS forcibly. No action is required when the BM process stops on all the nodes.
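Checking and raising RELIANT_SHUT_MIN_WAIT can be done from the command line. A minimal sketch, assuming the default RELIANT_PATH of /opt/SMAW/SMAWRrms and that the variable is set in hvenv.local (verify both against your environment before use):

```sh
# Check the current value of the RMS global environment variable
# (hvenv.local under RELIANT_PATH/bin; the path below is an assumption).
grep RELIANT_SHUT_MIN_WAIT /opt/SMAW/SMAWRrms/bin/hvenv.local

# Raise the timeout, e.g. to 300 seconds, by editing hvenv.local:
#   export RELIANT_SHUT_MIN_WAIT=300
# The new value takes effect after RMS is restarted on every node.
```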

(ADC, 65) Since RMS on this host has already encountered other Online nodes, it will remain running. However, no nodes reporting incorrect checksums will be brought Online.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.2.2 ADM: Admin, command, and detector queues

(ADM, 61) object is deactivated. Switch request skipped.

Content:

A switch request cannot be performed for a userApplication in the Deact state.

Corrective action:

Activate the userApplication and issue the switch request again.
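The corrective action above can be performed from the command line. A sketch, assuming 'app1' is a placeholder name for the deactivated userApplication and that 'hvutil -a' performs the activation (confirm the option against your RMS version):

```sh
hvutil -a app1      # activate the deactivated userApplication (app1 is a placeholder)
hvswitch app1       # then issue the switch request again
```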

(ADM, 65) System hostname is currently down !!!!

Content:

The hvswitch command has been executed for a node that is currently shut down.

Corrective action:

Start the target node and execute the switch request again, or select another node.

(ADM, 69) Shutting down RMS while resource resource is not offline.

Content:

RMS is shut down even though a resource is not offline.

Corrective action:

Shut down the OS or stop the node forcibly.

(ADM, 80) Application <userapplication> has a not null attribute ControlledSwitch. Therefore, it should be switched from the controller. 'hvswitch' command ignored.

Content:

Information message.

Corrective action:

No action is required.

(ADM, 105) Shutdown on targethost <sysnode> in progress. Switch request for application <object> skipped!

Content:

The target node of the switch request is responding to an earlier shutdown request. The switch request is cancelled.

Corrective action:

No action is required.

(ADM, 110) Sysnode <node> has been marked as going down, but failed to become Offline. Check for a possibly hanging shutdown. Note that this SysNode cannot re-join the cluster without having finished its shutdown to avoid cluster inconsistency!

Content:

Information message.

Corrective action:

No action is required.

(ADM, 111) Timeout occured for local hvshut request. Reporting a failure back to the command now!

Content:

The hvshut command timed out.

When the hvshut command is executed with either -l/-s/-a option, some resources that are included in cluster applications may fail to stop.

Corrective action:

To prevent the hvshut command from timing out, increase the value of RELIANT_SHUT_MIN_WAIT, the RMS global environment variable, to suit your environment.

See

For details on RELIANT_SHUT_MIN_WAIT, see "RELIANT_SHUT_MIN_WAIT" in "Global environment variables" of the following manual:

  • For PRIMECLUSTER 4.3A30 or later: "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."

See "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide" for how to refer to and change the RMS environment variables.

Take one of the following actions, depending on the option with which the hvshut command was executed.

  • With -l option

    Shut down the OS on the node on which the command has been executed, or stop the node forcibly.

  • With -s option

    Shut down the OS on the target node of the command, or stop the node forcibly.

  • With -a option

    Shut down the OS on all the nodes except a node on which RMS has ended normally, or stop the node forcibly.

  • With -L option

    When the BM (base monitor) process does not stop on the node on which the command has been executed, execute the hvshut -f command to stop RMS forcibly. No action is required when the BM process stops.

  • With -A option

    When the BM process does not stop on some nodes, execute the hvshut -f command on these nodes to stop RMS forcibly. No action is required when the BM process stops on all the nodes.

(ADM, 113) Terminating due to a timeout of RMS shutdown. All running scripts will be killed!

Content:

Information message.

Corrective action:

No action is required.

(ADM, 114) userapplication: Shutdown in progress. AutoSwitchOver (ShutDown) attribute is set, but the userApplication failed to reach a settled Offline state. SwitchOver must be skipped!

Content:

The userApplication <userapplication> failed to switch to the Offline state while RMS was being shut down. In this case, the switch request is cancelled even if the ShutDown option is specified for the AutoSwitchOver attribute.

Corrective action:

Confirm that RMS has been shut down and that the switch request was cancelled. After that, switch the userApplication <userapplication> manually. Check the logs to see why the userApplication failed to switch to the Offline state.

(ADM, 115) Received "old style" shutdown contract, though no host with RMS 4.0 is member of the cluster. Discarding it!

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 116) Received "new style" shutdown contract, though at least one host with RMS 4.0 is member of the cluster. Discarding it!

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 129) Shutdown on targethost <sysnode> in progress. Switch request for resource <resource> skipped!

Content:

The target node of the switch request is responding to an earlier shutdown request. The switch request is cancelled.

Corrective action:

No action is required.

6.1.2.3 BAS: Startup and configuration errors

(BAS, 1) Object <object> is not offline!

Content:

The offline processing for the object <object> failed, and the object is still partially online, so the switch request is cancelled.

Corrective action:

Check the log files to see why the offline processing of the object <object> failed.

(BAS, 8) Object <object> has no rName attribute. The rName attribute is normally used by the generic detector to determine which resource to monitor. Be sure that your detector can function without an rName attribute.

Content:

The object <object> has no rName attribute. This attribute is required for a generic RMS detector; however, it may not exist in a custom detector.

Corrective action:

No action is required if the corresponding custom detector is properly designed. However, if the generic detector is used or will be used with this object, specify the rName attribute.

(BAS, 22) DetectorStartScript for kind <kind> is not defined in either .us or hvgdstartup files, therefore RMS will be using default <gkind -kkind -ttimeperiod>.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.2.4 BM: Base monitor

(BM, 4) The CF cluster timeout <cftimeout> exceeds the RMS timeout <rmstimeout>. This may result in RMS node elimination request before CF timeout is exceeded. Please check the CF timeout specified in "/etc/default/cluster.config" and the RMS heartbeat miss time specified by hvcm '-h' option.

(BM, 8) Failed sending message <message> to object <object> on host <host>.

Content:

RMS prints this message when it encounters a problem transmitting the message <message> to another host in the cluster. This can occur because RMS on the other host is down, or because of a network problem.

Corrective action:

Make sure that RMS is running on the other hosts in the cluster and that no network issues exist.

This message may also be printed when the fjsnap, pclsnap, or hvdump command is executed while some of the nodes that make up the cluster are stopped. In this case, no action is required.

(BM, 28) Application <userapplication> has a not null attribute ControlledHvswitch. Therefore, it should be switched on/off from the controller. 'hvutil -f/-c' command ignored.

Content:

The hvutil -f/-c command is ignored because the userApplication <userapplication> is controlled by a scalable application.

Corrective action:

Use the command for a scalable application.

(BM, 30) Ignoring dynamic modification failure for object <object>:attribute <attribute> is invalid.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 31) Ignoring dynamic modification failure at line linenumber: cannot modify attribute <attribute> of object <object> with value <value> because the attribute does not exist.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 53) The RMS-CF-CIP mapping cannot be determined for any host due to the CIP configuration file <configname> cannot be opened. Please verify all entries in <configfilename> are correct and that CF and CIP are fully configured.

(BM, 70) Some messages were not sent out during RMS shutdown.

Content:

Information message.

Corrective action:

No action is required.

(BM, 76) Failed to find "rmshb" port address in /etc/services. The "hvutil -A" command will fail until a port entry for "rmshb" is made in the /etc/services file and RMS is restarted.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 77) Failed to allocate a socket for "rmshb" port monitoring.

Content:

The socket() call failed to allocate a port for rmshb.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 78) The reserved port for "rmshb" appears to be in use. The "rmshb" port is reserved in the /etc/services file but another process has it bound already. Select another port by editing the /etc/services file and propagate this change to all nodes in the cluster and then restart RMS.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 79) Failed to listen() on the "rmshb" port.

Content:

The listen() system call failed on the rmshb port.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 82) A message to host <remotehost> failed to reach that host after <count> delivery attempts. Communication with that host has been broken.

Content:

A message to the host <remotehost> failed to reach that host after <count> delivery attempts, and communication with that host has been broken.

Corrective action:

Make sure that <remotehost> is running and that communication between the two hosts is fully established, for example by checking reachability with a standard method such as ping. After that, restart RMS on the local node.
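Before restarting RMS, communication with the remote host can be checked with standard tools, as the description suggests. A sketch, with 'node2RMS' as a placeholder for the remote host name:

```sh
ping -c 3 node2RMS     # verify basic reachability of the remote host
# If the host responds, restart RMS on the local node, for example with:
#   hvcm
# (hvcm options depend on your configuration; check them before use.)
```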

(BM, 83) Failed to execute the fcntl system call.

Content:

RMS was unable to set the close-on-exec flag using fcntl.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 85) Application <userapplication> has a not null attribute attribute. Therefore, it should be deactivated from the controller. 'hvutil -d' command ignored.

Content:

The hvutil -d command is ignored because the userApplication <userapplication> is controlled by a scalable application.

Corrective action:

Use the command for a scalable application.

(BM, 112) Controller <controller> has its attribute Follow set to 1, while its ClusterExclusive attribute is set to 0. However, it is controlling, directly or indirectly via a chain of Follow controllers, an application <application> -- that application contains a resource named <resource> which attribute ClusterExclusive is set to 1. This is not allowed due to a potential problem of that resource becoming Online on more than one host. Cluster exclusive resources must be controlled by cluster exclusive Follow controllers.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 119) The RMS base monitor failed to be locked in memory via mlockall() - <errortext>.

Content:

When the environment variable HV_MLOCKALL is set to 1, the base monitor process and the memory it allocates are locked in physical memory. This message indicates that the base monitor failed to lock the memory. In this case, RMS continues running with unlocked memory.

Corrective action:

Check the error text <errortext> to find the cause, and verify that sufficient memory is available.

6.1.2.5 CTL: Controllers

(CTL, 6) Controller <controller> has detected more than one controlled application Online.

Content:

Information message.

Corrective action:

No action is required.

(CTL, 7) Controller <controller> has its attribute <IgnoreOnlineRequest> set to 1 and its OnlineScript is empty. Therefore, a request Online to the controller might fail to bring the controlled application Online.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CTL, 8) Controller <controller> has its attribute <IgnoreOfflineRequest> set to 1 and its OfflineScript is empty. Therefore, a request Offline to the controller might fail to bring the controlled application Offline.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CTL, 11) Controller <controller> has its attributes StandbyCapable set to 1, its attribute <IgnoreStandbyRequest> set to 1 and its OnlineScript is empty. Therefore, a request Standby to the controller might fail to bring the controlled application Standby.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.2.6 CUP: userApplication contracts

(CUP, 1) userApplication: priority list conflict detected, trying again ...

Content:

The priority list defines the priority of the destination nodes of a userApplication. When the priority list is updated by RMS starting or stopping, it is synchronized so that it is consistent between the nodes. This message is printed when the priority list is temporarily inconsistent between the nodes.

Corrective action:

No action is required.

(CUP, 9) userApplication: Switch Request skipped, processing of current online host contract is not yet settled.

Content:

A switch request was cancelled because processing of a current online host contract is not yet settled.

Corrective action:

If the userApplication did not go online, invoke a manual switch request.

(CUP, 11) userapplication offline processing failed! The application is still partially online. The switch request is being skipped.

Content:

The offline processing for the userApplication failed, and the userApplication is still partially online, so the switch request is cancelled.

Corrective action:

Check the log files to see why the offline processing failed.

(CUP, 12) userApplication switch request skipped, required target node is not ready to go online!

Content:

A switch request to a SysNode has been executed. However, the userApplication <userapplication> cannot be switched to Online on that SysNode.

Corrective action:

Execute the switch request again when the userApplication <userapplication> can be switched to Online on the SysNode.

(CUP, 13) userApplication switch request skipped, no available node is ready to go online!

Content:

A switch request has been executed. However, no SysNode exists on which the userApplication <userapplication> can be switched to the Online state.

Corrective action:

Execute the switch request again when a SysNode on which the userApplication <userapplication> can be switched to the Online state exists.

(CUP, 14) userApplication did not get a response from <sender>.

Content:

A timeout occurred during the contract processing.

Corrective action:

If the userApplication did not eventually go online, make sure that the userApplication is not online on any of the other nodes, and then invoke a manual switch request.

(CUP, 15) userApplication: targethost <host> is no longer available.

Content:

Information message.

Corrective action:

No action is required.

(CUP, 16) userapplication offline processing failed! The application is still partially online. The switch request is being skipped.

Content:

The offline processing for the userApplication failed and the userApplication is still partially online, so the switch request is cancelled.

Corrective action:

Check the log files to see why the offline processing failed.

(CUP, 17) userApplication: current online host request of host "host" accepted, local inconsistency has been overridden with the forced flag.

Content:

Although a local Inconsistent state existed, the current online host request with the forced switch option ('hvswitch -f') has been accepted. The local inconsistency has been overridden.

Corrective action:

No action is required.

(CUP, 18) userApplication: current online host request of host "host" denied due to a local inconsistent state.

Content:

The current online host request is denied due to a local Inconsistent state.

Corrective action:

Clear the Inconsistent state first.

(CUP, 19) userApplication: is locally online, but is inconsistent on another host
Trying to force a CurrentOnlineHost contract ...

Content:

The application is currently online on the local host but is inconsistent on another host. The application is switched to another host with the forced switch option to override the inconsistency.

Corrective action:

No action is required.

(CUP, 20) userApplication: AutoStart skipped, application is inconsistent on host "hostname".

Content:

The AutoStartUp processing is cancelled due to the Inconsistent state.

Corrective action:

Clear the Inconsistent state.

(CUP, 21) userApplication: FailOver skipped, application is inconsistent on host "hostname".

Content:

The failover processing is cancelled due to the Inconsistent state.

Corrective action:

Clear the Inconsistent state.

(CUP, 22) userApplication: Switch Request skipped, application is inconsistent on host "hostname".

Content:

The switch request is cancelled due to the Inconsistent state.

Corrective action:

Clear the Inconsistent state.

(CUP, 23) userApplication: Switch Request skipped, application is inconsistent on local host.

Content:

The switch request is cancelled due to the Inconsistent state.

Corrective action:

Clear the Inconsistent state.

(CUP, 24) userApplication: Switch Request processed, local inconsistency has been overridden with the forced flag.

Content:

Although a state is inconsistent, a switch request with the forced switch option ('hvswitch -f') is accepted and the local inconsistency has been overridden.

Corrective action:

No action is required.

(CUP, 25) userApplication is currently in an inconsistent state.
The switch request is being skipped.
Clear inconsistency first or you may override this restriction by using the forced switch option.

Content:

The userApplication is currently in an Inconsistent state on the local host. The application cannot be switched until the inconsistency is resolved, so the switch request is cancelled.

Corrective action:

Clear the Inconsistent state.

(CUP, 26) userApplication: LastOnlineHost conflict detected. Processing an AutoStart or PrioSwitch CurrentOnlineHost Contract with OnlinePriority enabled. TargetHost of Switch request is host "host", but the local host is the LastOnlineHost. Denying the request.

Content:

A LastOnlineHost conflict is detected and the local host is the LastOnlineHost, so the application will be brought online on the local host.

Corrective action:

No action is required.

(CUP, 27) userApplication: LastOnlineHost conflict occurred. Skipping local Online request, because host "host" has a conflicting LastOnlineHost entry.

Content:

A LastOnlineHost conflict is detected and the local host is not the LastOnlineHost, so the application will be brought online on the other host.

Corrective action:

No action is required.

(CUP, 28) userApplication: priority switch skipped, cannot get a deterministic information about the LastOnlineHost. Tried to switch to "hostname", but "loh" claims to be the LastOnlineHost.
Conflict may be resolved by system administrator intervention (specifying explicitly the targethost in the hvswitch call).

Content:

A LastOnlineHost conflict is detected, and RMS cannot determine the LastOnlineHost, so the application will not go online anywhere.

Corrective action:

Invoke a switch request specifying the target host.

(CUP, 29) userApplication: LastOnlineHost conflict occurred. Timestamps of conflicting LastOnlineHosts entries do not allow a safe decision, because their difference is lower than time seconds.
Conflict must be resolved by system administrator intervention (invalidate LastOnlineHost entry via "hvutil -i userApplication" and invoke an explicate hvswitch call).

Content:

A LastOnlineHost conflict is detected, and the timestamps of conflicting LastOnlineHost entries do not allow a safe decision because their difference is lower than HV_LOH_INTERVAL. Therefore, the application will not go online anywhere.

Corrective action:

Invalidate the LastOnlineHost entry with 'hvutil -i <userapplication>', and then invoke a switch request specifying the target host.
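The two-step recovery described above might look as follows, with 'app1' and 'node1RMS' as placeholder names for the userApplication and the target SysNode:

```sh
hvutil -i app1            # invalidate the LastOnlineHost entry
hvswitch app1 node1RMS    # switch, explicitly specifying the target host
```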

(CUP, 30) userApplication: Denying maintenance mode request. userApplication is busy or is in stateFaulted.

Content:

A maintenance mode request ('hvutil -m on/off') is denied because the userApplication is busy or is in the Faulted state.

Corrective action:

Clear the Faulted state and retry the maintenance mode request.
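Assuming 'hvutil -c' clears the Faulted state of the application (confirm the option against your RMS version), the retry could look like this, with 'app1' as a placeholder name:

```sh
hvutil -c app1        # clear the Faulted state of the userApplication
hvutil -m on app1     # then retry the maintenance mode request
```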

(CUP, 31) userApplication: maintenance mode request was denied by the remote SysNode "SysNode" because userApplication is busy or is in stateFaulted or not ready to leave Maintenance Mode. See remote switchlog for details

Content:

A maintenance mode request, i.e., 'hvutil -m on/off', is denied because the userApplication is busy, is in the Faulted state, or is not ready to leave maintenance mode.

Corrective action:

See the remote switchlog for details.

(CUP, 32) userApplication: Denying maintenance mode request. The following object(s) are not in an appropriate state for safely returning to normal operation: <resource>

Content:

A maintenance mode request ('hvutil -m on/off') is denied because the resources are not in an appropriate state for safely returning to normal operation.

Corrective action:

Fix the states of the listed resources.

(CUP, 33) userApplication: Denying maintenance mode request. The initialization of the state of the userApplication is not yet complete.

Content:

A maintenance mode request ('hvutil -m on/off') is denied because the initialization of the state of the userApplication is not yet complete.

Corrective action:

Wait for the initialization of the state of the userApplication and retry the maintenance mode request.

(CUP, 34) userApplication: LastOnlineHost conflict detected. Processing an AutoStart or PrioSwitch CurrentOnlineHost Contract with OnlinePriority enabled. TargetHost of Switch request is host "host", but the local host is the LastOnlineHost. The local host takes over Switch request.

Content:

A LastOnlineHost conflict is detected, and the local host is the LastOnlineHost, so the application will be brought online on the local host.

Corrective action:

No action is required.

6.1.2.7 DET: Detectors

(DET, 29) Resource <resource>: received detector report DetReportsOnlineWarn. The WarningScript "warningscript" will be run.

Content:

Information message.

Corrective action:

No action is required.

(DET, 31) Resource <resource> received detector report "DetReportsOfflineFaulted", the posted state will become <offlinefault> until one of the subsequent reports "DetReportsOffline", "DetReportsOnline", "DetReportsStandby" or "DetReportsFaulted"

Content:

Information message.

Corrective action:

No action is required.

(DET, 35) Resource <resource> received detector report "DetReportsOnlineWarn", the WarningScript is not defined and will not be run.

Content:

Information message.

Corrective action:

No action is required.

6.1.2.8 SCR: Scripts

(SCR, 17) Resource <resource> WarningScript has failed with status status.

Content:

The WarningScript of the resource <resource> has ended abnormally with the status <status>.

Corrective action:

Investigate if the WarningScript that is set for the resource <resource> has problems.

(SCR, 25) Controller <resource> StateChangeScript has failed with status status.

Content:

The StateChangeScript of the controller <resource> has ended abnormally with the exit code <status>.

Corrective action:

Investigate whether the StateChangeScript that is set for the controller <resource> has problems, referring to the reported exit code <status>.

(SCR, 31) AppStateScript of userApplication userapplication has failed with status status.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.2.9 SWT: Switch requests (hvswitch command)

(SWT, 1) The 'AutoStartUp' attribute is set and the HV_AUTOSTART_WAIT time for the user application <appli> has expired, without an automatic start up having yet taken place. Reason: not all necessary cluster hosts are online!

Content:

This message indicates that AutoStartup=1, PartialCluster=0, and not all the nodes were started within the HV_AUTOSTART_WAIT time.

Corrective action:

No action is required.

(SWT, 5) AutoStartUp skipped by object. Reason: object is faulted!

Content:

The AutoStartUp is cancelled due to the Faulted state.

Corrective action:

Clear the Faulted state.

(SWT, 6) AutoStartUp skipped by object. Reason: Fault occurred during initialization!

Content:

The AutoStartUp is cancelled due to the Faulted state.

Corrective action:

Clear the Faulted state.

(SWT, 7) AutoStartUp skipped by object. Reason: object is deactivated!

Content:

The AutoStartUp is cancelled because the userApplication is in the Deact state.

Corrective action:

Activate the userApplication and start the application manually.

(SWT, 8) AutoStartUp skipped by object. Reason: not all necessary cluster hosts are online!

Content:

The AutoStartUp is cancelled because the PartialCluster attribute is set to 0 and not all necessary cluster hosts are online.

Corrective action:

Start RMS on all necessary cluster hosts and then start the application manually if necessary.

(SWT, 11) object: no responsible node available, switch request skipped.

Content:

The switch request is cancelled because no node to which a userApplication can be switched is available.

Corrective action:

Execute the switch request again after a node to which the userApplication can be switched becomes available. On the destination node, the userApplication must be in the Offline or Standby state.

(SWT, 12) object is busy or locked, switch request skipped.

Content:

The switch request is cancelled because <object> is either busy or locked.

Corrective action:

Wait until <object> is in a switchable state and then issue the request again.

(SWT, 13) Not all necessary cluster hosts for application <userapplication> are online, switch request is being skipped. If the application should be brought online anyway, use the force flag. Be aware, however, that forcing the application online could result in an inconsistent cluster if the application is online somewhere else!

Content:

The switch request is cancelled because not all necessary cluster hosts for the application are online.

Corrective action:

If the application should be brought online anyway, use the forced switch option ('hvswitch -f').

Note

A forced application switch overrides all safety checks and could therefore result in data corruption or other inconsistencies. In PRIMECLUSTER 4.3A10 or later (Solaris) or PRIMECLUSTER 4.3A30 or later (Linux), RMS may kill the node on which RMS is not running before starting the application to reduce the risk of data corruption when the Forced switch request of an application is issued.

(SWT, 14) object is deactivated, switch request skipped.

Content:

The switch request is cancelled because the application has been deactivated.

Corrective action:

Activate the application and then issue the request again.

(SWT, 16) Switch request skipped, no target host found or target host is not ready to go online!

Content:

The target host was either not found or not ready to go online, so the switch request is cancelled.

Corrective action:

Wait for the target host to go online or start the target host.

(SWT, 18) object: is not ready to go online on local host, switch request skipped!

Content:

The switch request is cancelled because the application or the local host is in a transitional state.

Corrective action:

Wait until both the application and the local host are online and then issue the request again.

(SWT, 19) object: is not ready to go online on local host, trying to find another host.

Content:

For a priority or 'last online host' switch, if the target host is the node where the application is faulted, the switch request is denied and forwarded to another host in the cluster.

Corrective action:

No action is required.

(SWT, 21) object: local node has faulted or offlinefaulted descendants, no other node is ready to go online, switchover skipped.

Content:

The switch request is cancelled because the local node has Faulted or OfflineFaulted descendants and no other node is ready to go online.

Corrective action:

Clear the Faulted/OfflineFaulted state.

(SWT, 22) object: local node has faulted or offlinefaulted descendants, forwarding switchover request to next host: targethost.

Content:

Information message.

Corrective action:

No action is required.

(SWT, 23) object is busy or locked, deact request skipped.

Content:

The Deact request cannot be processed because the target application is busy or locked.

Corrective action:

Wait until the target application status is changed, and then execute the Deact request again.

(SWT, 24) object is deactivated, switch request skipped.

Content:

The switch request cannot be processed because the target application is in the Deact state.

Corrective action:

Activate the target application.

(SWT, 28) hostname is unknown locally!

Content:

Information message.

Corrective action:

No action is required.

(SWT, 30) <object> was Online on <onlinehost>, which is not reachable. Switch request must be skipped to ensure data integrity. This secure mechanism may be overridden with the forced flag (-f) of the hvswitch command. WARNING: Ensure, that no further access to the data is performed by <onlinehost>, otherwise the use of the -f flag may break data consistency!

Content:

The object was Online on the remote node <onlinehost>, which is currently unreachable. This can occur after a previous shutdown of <onlinehost> via 'hvshut -f', or it may be a timing issue. The switch request is cancelled to protect data integrity.

Corrective action:

If the application should be brought online anyway, use the forced switch option ('hvswitch -f').

Note

A forced application switch overrides all safety checks and could therefore result in data corruption or other inconsistencies.

In PRIMECLUSTER 4.3A10 or later (Solaris) or PRIMECLUSTER 4.3A30 or later (Linux), RMS may kill the node on which RMS is not running before starting the application to reduce the risk of data corruption when the Forced switch request of an application is issued. If the previous shutdown of onlinehost was not via 'hvshut -f', this could be a timing issue, so wait a moment and try it again.

(SWT, 31) <object> was Online on <onlinehost>, which is not reachable. Caused by the use of the force flag the RMS secure mechanism has been overridden, Switch request is processed.

Content:

The object was Online on the remote node <onlinehost>, which is currently unreachable. However, the switch request is processed because the forced switch option ('hvswitch -f') is used.

Corrective action:

No action is required.

(SWT, 32) <object> is currently in an inconsistent state on local host. The switch request is being skipped. Clear inconsistency first or you may override this restriction by using the forced switch option.

Content:

The application is currently in an Inconsistent state on the local host. The application cannot be switched until the inconsistency is resolved, so the switch request is cancelled.

Corrective action:

You can either clear the inconsistency first, or you can override this restriction by using the forced switch option ('hvswitch -f').

Note

A forced application switch overrides all safety checks and could therefore result in data corruption or other inconsistencies.

In PRIMECLUSTER 4.3A10 or later (Solaris) or PRIMECLUSTER 4.3A30 or later (Linux), RMS may kill the node on which RMS is not running before starting the application to reduce the risk of data corruption when the Forced switch request of an application is issued.

(SWT, 33) <object> is not ready to go online on the local host. Due to a local inconsistent state no remote targethost may be used. The switch request is being skipped.

Content:

The application is currently in an Inconsistent state on the local host. The application cannot be switched until the inconsistency is resolved, so the switch request is cancelled.

Corrective action:

You can either clear the inconsistency first, or you can override this restriction by using the forced switch option ('hvswitch -f').

Note

A forced application switch overrides all safety checks and could therefore result in data corruption or other inconsistencies.

In PRIMECLUSTER 4.3A10 or later (Solaris) or PRIMECLUSTER 4.3A30 or later (Linux), RMS may kill the node on which RMS is not running before starting the application to reduce the risk of data corruption when the Forced switch request of an application is issued.

(SWT, 34) <object> is not ready to go online on local host
trying to find another host.

Content:

The userApplication is not ready to go online on the local host, so RMS forwards the switch request to the next host in its priority list.

Corrective action:

No action is required.

(SWT, 35) object is not ready to go online on local host switch request skipped.

Content:

The userApplication is not ready to go online on the local host so the direct switch request is cancelled.

Corrective action:

Make sure that the userApplication is in Offline or Standby state on the local host.

(SWT, 36) <sysnode> is in Wait state, switch request skipped.

Content:

The sysnode is in the Wait state, so the switch request is cancelled.

Corrective action:

Wait for the node to get out of the Wait state and try the switch request again.

(SWT, 37) AutoStartUp for application <userapplication> is ignored since hvmod had been invoked with the flag '-i'.

Content:

Information message.

Corrective action:

No action is required.

(SWT, 58) Processing policy switch request for application userapplication. The cluster host sysnode is in a Wait state, no switch request can be processed. The application will go offline now.

Content:

Information message.

Corrective action:

No action is required.

(SWT, 59) Processing policy switch request for application userapplication. No cluster host is available to take over this application. The application will go offline now.

Content:

Information message.

Corrective action:

No action is required.

(SWT, 60) Processing policy switch request for application userapplication which is in state Standby. The application will go offline now.

Content:

During a policy switch, if an exclusive application switches to a node, then all applications in the Standby state must go offline because they have a lower priority. This message simply warns the user that the application is in the Standby state and will be going offline due to the above reason.

Corrective action:

No action is required.

(SWT, 69) AutoStartUp for application <userapplication> is ignored since, the environment variable HV_AUTOSTARTUP is set to 0.

Content:

The application does not start automatically because the environment variable HV_AUTOSTARTUP is set to 0, which overrides each application's AutoStartUp attribute.

Corrective action:

To allow application startup according to each application's AutoStartUp attribute, set the environment variable HV_AUTOSTARTUP to 1.
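The precedence described above can be illustrated with a small shell sketch. This only mirrors the documented behaviour (HV_AUTOSTARTUP=0 suppressing every application's AutoStartUp attribute); it is not RMS code, and the AutoStartUp value shown is a hypothetical per-application attribute:

```shell
# Sketch of the documented precedence: HV_AUTOSTARTUP=0 suppresses
# automatic startup regardless of each application's AutoStartUp attribute.
HV_AUTOSTARTUP=0
AutoStartUp=1   # per-application attribute (hypothetical value)

if [ "${HV_AUTOSTARTUP:-1}" -eq 0 ]; then
    echo "AutoStartUp ignored (HV_AUTOSTARTUP=0)"
elif [ "$AutoStartUp" -eq 1 ]; then
    echo "application starts automatically"
else
    echo "application must be started manually"
fi
```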

(SWT, 72) userapplication received Maintenance Mode request from the controlling userApplication. The request is denied, because the state is either Faulted or Deact or the application is busy or locked.

Content:

The maintenance mode request from the controlling userApplication is denied because the state is either Faulted or Deact or the application is busy or locked.

Corrective action:

Clear the Faulted or Deact state, or wait until the application is no longer busy or locked, and then try again.

6.1.2.10 SYS: SysNode objects

(SYS, 16) The RMS internal SysNode name "sysnode" is not compliant with the naming convention of the Reliant Cluster product. A non-compliant setting is possible, but will cause all RMS commands to accept only the SysNode name, but not the HostName (uname -n) of the cluster nodes!

Content:

The RMS SysNode name does not follow the format <`uname -n`>RMS. In Oracle Solaris zone environments where PRIMECLUSTER is used, this message may be printed when RMS in the non-global zone is started if the host name in the non-global zone includes capital letters.

Corrective action:

No action is required.

(SYS, 18) The SysNode <sysnode> does not follow the RMS naming convention for SysNodes. To avoid seeing this message in the future, please rename the SysNode to use the CF-based name of the form "<CFname>RMS" and restart the RMS monitor.

Content:

The SysNode name of RMS is not consistent with <CFname>RMS.

Corrective action:

Change the SysNode name to <CFname>RMS.
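The convention above can be sketched as a simple suffix check. This is only an illustration of the documented naming rule (CF node name plus the literal suffix "RMS"), not how RMS itself validates names:

```shell
# Sketch of the naming convention: a compliant SysNode name is the
# CF node name with the literal suffix "RMS" (e.g. "node1" -> "node1RMS").
check_name() {
    case "$1" in
        ?*RMS) echo "compliant" ;;
        *)     echo "rename to <CFname>RMS" ;;
    esac
}

check_name "node1RMS"   # compliant
check_name "node1"      # rename to <CFname>RMS
```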

(SYS, 88): No heartbeat from cluster host sysnode within the last 10 seconds. This may be a temporary problem caused by high system load. RMS will react if this problem persists for time seconds more.

Content:

The heartbeat between RMS instances was lost; if no response is returned for <time> seconds or more, a forcible stop is executed.

Corrective action:

The following causes are possible. Take the necessary action according to the cause.

  • The cluster interconnect cannot communicate because of a hardware failure. Remove the cause of the hardware failure, for example by replacing the LAN card or cable.

  • A high CPU load has continued for so long that RMS cannot process heartbeats. Review the processes causing the high load on the host corresponding to SysNode <sysnode>.

  • The clock was set back rapidly with NTP. Adjust the clock slowly with NTP.

(SYS, 99) The attribute <alternateip> specified for SysNode <sysnode> should not be used in CF mode. Ignoring it.

Content:

Information message.

Corrective action:

No action is required.

6.1.2.11 UAP: userApplication objects

(UAP, 2) object got token token from node node.
TOKEN SKIPPED! Reason: reason.

Content:

This message gives the reason for skipping a particular action. It is output when a request is issued from Cluster Admin or another function while some other process is already running. For example, it is generated to show that Offline processing requested during the PreCheck processing prior to the transition of the userApplication to Standby is ignored because the userApplication is currently executing Standby processing.

Corrective action:

No action is required.

(UAP, 3) object: double fault occurred and Halt attribute is set. Halt attribute will be ignored, because no other cluster host is available.

Content:

The HaltFlag attribute will be ignored if there are no more available hosts.

Corrective action:

Make sure that there is a sufficient number of available cluster hosts.

(UAP, 4) object has become online, but is also in the HV_AUTOSTARTUP_IGNORE list of cluster hosts to be ignored on startup! The Cluster may be in an inconsistent condition!

Content:

Information message.

Corrective action:

No action is required.

(UAP, 11) object is not ready to go online on local node. Online processing skipped.

Content:

The userApplication is not ready to go online on the local node because it is busy or in the Faulted state.

Corrective action:

If the userApplication is Faulted, clear the Faulted state; if it is busy, wait until the current processing completes.

(UAP, 12) object: targethost of switch request: <host> no longer available, request skipped.

Content:

Information message.

Corrective action:

No action is required.

(UAP, 14) object: is not ready to go online on local host. Switch request skipped.

Content:

Information message.

Corrective action:

No action is required.

(UAP, 18) SendUAppLockContract(): invalid token: token.

Content:

During contract processing, the invalid token is received.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 25) AutoStartUp skipped by object. Reason: not all necessary, cluster hosts are online!

Content:

The userApplication didn't start up automatically because not all necessary cluster hosts are online.

Corrective action:

Make all necessary cluster hosts Online.

(UAP, 30) object is not ready to go online on local host. Trying to find another host.

Content:

The userApplication is not ready to go online on the local host, so RMS tries to find another host.

Corrective action:

No action is required.

(UAP, 52) userapplication: double fault occurred and Halt attribute is set. Halt attribute will be ignored, because attribute AutoSwitchOver is set to attrvalue.

Content:

Information message.

Corrective action:

No action is required.

6.1.2.12 US: us files

(US, 10) object: userApplication transitions into stateOnline, though it was faulted before according to the persistent Fault info. Check for possible inconsistencies

Content:

Information message.

Corrective action:

No action is required.

(US, 23) appli: double fault occurred, processing terminated.

Content:

Further processing for <appli> will be stopped because of the double fault.

Corrective action:

Check the other messages in the switchlog to determine the reason for the double fault. Clear the double fault.

(US, 28) object: PreCheck failed
switch request will be canceled now and not be forwarded to another host, because this was a directed switch request, where the local host has explicitely been specified as targethost.

Content:

A PreCheckScript failed during a directed switch request, i.e., the target host of the request was explicitly specified. In this case the switch request is cancelled, so it is not forwarded to the next host in the priority list.

Corrective action:

Invoke a new switch request specifying the next host as target host. If you want RMS to forward the request automatically, you should invoke a priority switch (hvswitch without a specified target host).

(US, 29) object: PreCheck failed
trying to find another host ...

Content:

A PreCheckScript failed during a priority switch request. In this case the switch request is forwarded to the next host in the priority list.

Corrective action:

No action is required.

(US, 43) object: PreCheck failed
Standby request canceled.

Content:

Execution of the PreCheckScript has failed and standby processing will be stopped.

Corrective action:

Check to see why the PreCheckScript has failed and correct the script if necessary.

(US, 45) object: PreCheck failed
switch request will be canceled now and not be forwarded to another host, because AutoSwitchOver=ResourceFailure is not set.

Content:

A PreCheckScript failed and the AutoSwitchOver attribute did not include the ResourceFailure option. In this case RMS will not take automatic action in the event of a script failure. The switch request is cancelled, and it is not forwarded to the next host in the priority list.

Corrective action:

Invoke a new switch request specifying the next host as the target. If you want RMS to forward the request automatically, turn on the ResourceFailure option of the AutoSwitchOver attribute.

(US, 47) userapplication: Processing of Clear Request resulted in a Faulted state. Resuming Maintenance Mode nevertheless.
It is highly recommended to analyse and clear the fault condition before leaving Maintenance Mode!

Content:

A Clear request ('hvutil -c') was issued for an application <userapplication> in maintenance mode. It failed to clear the state of the graph and resulted in a Faulted state of the application.

Corrective action:

Check the switchlog for the origin of the failure. Fix the failure condition and re-run 'hvutil -c'. Do not leave maintenance mode until the fault condition has been cleared.

(US, 55) object: PreCheck failed, because the controller userApplication of type LOCAL "userapplication" is not ready to perform a PreCheck.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.2.13 WLT: Wait list

(WLT, 6) Resource resource's script did not terminate gracefully after receiving SIGTERM.

Content:

A resource script failed to end normally.

Corrective action:

Check if the timeout has occurred in the script.

6.1.2.14 WRP: Wrappers

(WRP, 11) Message send failed, queue id <queueid>, process <process>, <name>, to host <node>.

Content:

RMS exchanges messages between processes and hosts to maintain inter-host communication. If the delivery of a message has failed then this error is printed. This can occur if one or more hosts in the cluster are not active or if there is a problem with the network.

Corrective action:

(i) Check the other hosts in the cluster. If any are not alive, check the switchlog for information regarding why RMS has died on those hosts. Perform the following steps in order:

  1. 'hvdisp -a'

  2. In the output of step 1, check whether the state of any resource whose type is SysNode is Offline. If so, RMS is not running on that node.

  3. Check the switchlogs of all the nodes that are offline to determine the reason why RMS on that node is not active.

(ii) If the other hosts that are part of the cluster are alive then that means there is some problem with the network.

This message may also be printed when the fjsnap, pclsnap, or hvdump command is executed while some of the nodes that make up the cluster are stopped. In this case, no action is required.
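Step 2 of the procedure above can be sketched with awk over sample output. The three-column layout below (resource, type, state) is a hypothetical abridgement for illustration; the real 'hvdisp -a' output format differs:

```shell
# Abridged, hypothetical 'hvdisp -a' output: resource, type, state.
cat > /tmp/hvdisp.sample <<'EOF'
node1RMS  SysNode  Online
node2RMS  SysNode  Offline
app1      userApp  Online
EOF

# Step 2 of the procedure: list SysNode resources whose state is Offline;
# RMS is not running on those nodes.
awk '$2 == "SysNode" && $3 == "Offline" { print $1 }' /tmp/hvdisp.sample
```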

(WRP, 39) The RMS base monitor has not been able to process timer interrupts for the last n seconds. This delay may have been caused by an unusually high OS load. The differences between respective values of the times() system call are for tms_utime utime, for tms_stime stime, for tms_cutime cutime, and for tms_cstime cstime. If this condition persists, then normal RMS operations can no longer be guaranteed; it can also lead to a loss of heartbeats with remote hosts and to an elimination of the current host from the cluster.

Content:

RMS was unable to operate for a certain period (<n> seconds).

Corrective action:

This message may be generated by a temporarily high CPU load. RMS will return to normal operation when the load is reduced. You can usually ignore this message if the high CPU load lasts only a short time.

(WRP, 41) The interconnect entry <interconnect> specified for SysNode <sysnode> has the same IP address as that of the interface <existinginterconnect>.

Content:

The interconnect <interconnect> has the same IP address as the existing interconnect <existinginterconnect>.

Corrective action:

Make sure that a different IP address is specified for each interconnect.

(WRP, 51) The 'echo' service for udp may not have been turned on, on the local host. Please ensure that the echo service is turned on.

Content:

The UDP echo service may not be enabled on the local node.

Corrective action:

Check that the echo service is enabled and running.
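As a hedged illustration, the well-known port assignment for the udp echo service can be checked in an /etc/services-style file. A sample file is used here so the sketch is self-contained; on a real host, inspect /etc/services and the inetd/xinetd (or, on Solaris, inetadm) configuration to confirm the service is actually enabled:

```shell
# Minimal sketch: verify that an /etc/services-style file registers the
# udp echo service on its well-known port 7 (sample file used here).
cat > /tmp/services.sample <<'EOF'
echo   7/tcp
echo   7/udp
ftp   21/tcp
EOF

if grep -Eq '^echo[[:space:]]+7/udp' /tmp/services.sample; then
    echo "udp echo registered"
else
    echo "udp echo missing: enable the echo service"
fi
```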