6.1.3 Non-fatal Error Messages

Component name	Reference
ADC	"6.1.3.1 ADC: Admin configuration"
ADM	"6.1.3.2 ADM: Admin, command, and detector queues"
BAS	"6.1.3.3 BAS: Startup and configuration error"
BM	"6.1.3.4 BM: Base monitor"
CML	"6.1.3.5 CML: Command line"
CRT	"6.1.3.6 CRT: Contracts and contract jobs"
CTL	"6.1.3.7 CTL: Controllers"
CUP	"6.1.3.8 CUP: userApplication contracts"
DET	"6.1.3.9 DET: Detectors"
GEN	"6.1.3.10 GEN: Generic detector"
INI	"6.1.3.11 INI: init script"
MIS	"6.1.3.12 MIS: Miscellaneous"
QUE	"6.1.3.13 QUE: Message queues"
SCR	"6.1.3.14 SCR: Scripts"
SWT	"6.1.3.15 SWT: Switch requests (hvswitch command)"
SYS	"6.1.3.16 SYS: SysNode objects"
UAP	"6.1.3.17 UAP: userApplication objects"
US	"6.1.3.18 US: us files"
WLT	"6.1.3.19 WLT: Wait list"
WRP	"6.1.3.20 WRP: Wrappers"

6.1.3.1 ADC: Admin configuration

(ADC, 1) Since this host <hostname> has been online for no more than time seconds and due to the previous error, it will shut down now.

Content:

time is the value of the environment variable HV_CHECKSUM_INTERVAL, if set, or 120 seconds otherwise. This message could appear when the checksums of the configurations of the local and the remote host are different, no more than time seconds have elapsed, and one of the following is true:

The remote host is joining the cluster, and all the applications on the local host are either Offline or Faulted. RMS exits with exit code 60.
The configuration for the local host does not include the remote host, but the configuration for the remote host does include the local host. The local host hostname will shut down with exit code 60.

Corrective action:

The local and the remote hosts are running different configurations. Make sure that both of them are running the same configuration.

(ADC, 2) Since not all of the applications are offline or faulted on this host <hostname>, and due to the previous error, it will remain online, but neither automatic nor manual switchover will be possible on this host until <detector> detector will report offline or faulted.

Content:

The checksums of the configurations of the local and the remote hosts are different, no more than the number of seconds determined by the value of the environment variable HV_CHECKSUM_INTERVAL have passed, and not all of the applications are offline or faulted. RMS will continue to remain online, but neither automatic nor manual switchover will be possible on this host until the detector detector reports offline or faulted.

Corrective action:

Make sure that both the local and remote hosts are running with the same RMS configuration file.

(ADC, 3) Remote host <hostname> reported the checksum (remotechecksum) which is different from the local checksum (localchecksum).

Content:

This message is output in the following situations:

The checksum of the configuration file reported by the remote host hostname differs from the checksum of the configuration file on the local host.
The setting of an RMS global environment variable differs between nodes.

Corrective action:

Take the following actions depending on the situation.

The checksum of the configuration file reported by the remote host hostname differs from the checksum of the configuration file on the local host.
The most likely cause is that the local host and the remote host are running with different configuration files. Make sure that the local host and the remote host are running the same configuration file.
The setting of an RMS global environment variable differs between nodes.
Correct hvenv.local on all nodes so that the settings are consistent, and then restart RMS.
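
When the settings differ between nodes, it can help to compare the contents of hvenv.local from two nodes side by side. The following is a minimal sketch of such a comparison, not an RMS tool. It assumes that each setting appears on its own line as NAME=value, optionally preceded by "export"; the function names parse_env_text and differing_settings are hypothetical.

```python
def parse_env_text(text):
    """Parse NAME=value lines (optionally prefixed with 'export') into a dict.

    Assumption: one setting per line; blank lines and '#' comments are ignored.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].strip()
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip().strip('"')
    return settings


def differing_settings(local_text, remote_text):
    """Return {name: (local_value, remote_value)} for settings that differ."""
    local = parse_env_text(local_text)
    remote = parse_env_text(remote_text)
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(set(local) | set(remote))
        if local.get(name) != remote.get(name)
    }
```

A setting that is present on only one node is reported with None for the other node.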

(ADC, 4) Host <hostname> is not in the local configuration.

Content:

This message is output when the RMS configuration file reported by the remote node <hostname> and the RMS configuration file on the local node are different. This message is also output when the setting of an RMS global environment variable differs between nodes.

Corrective action:

The most likely cause for this would be that the local host and the remote host are running with different configuration files. Make sure that the local host and the remote host are running the same configuration file.

When the setting of an RMS global environment variable differs between nodes, correct hvenv.local so that it is consistent on all nodes, and then restart RMS.

(ADC, 5) Since this host <hostname> has been online for more than time seconds, and due to the previous error, it will remain online, but neither automatic nor manual switchover will be possible on this host until <detector> detector will report offline or faulted.

Content:

If the checksums of the configurations of the local and the remote host are different and if more than time seconds have elapsed since this host has gone online (time is the value of the environment variable HV_CHECKSUM_INTERVAL if set, or 120 seconds otherwise), then RMS prints the above message.

Corrective action:

Make sure that all the hosts in the cluster are running with the same configuration file.
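
The conditions described for (ADC, 1), (ADC, 2), and (ADC, 5) can be summarized in a short sketch. This is an illustration of the documented conditions only, not the RMS implementation. The function and parameter names are hypothetical, and the sketch assumes that the configuration checksums are already known to differ. Because this section does not state which message takes precedence when more than one condition holds, the sketch returns every message whose conditions are met.

```python
import os

DEFAULT_CHECKSUM_INTERVAL = 120  # seconds, used when HV_CHECKSUM_INTERVAL is not set


def checksum_interval(env=os.environ):
    """Return HV_CHECKSUM_INTERVAL in seconds if set, or 120 otherwise."""
    value = env.get("HV_CHECKSUM_INTERVAL")
    return int(value) if value else DEFAULT_CHECKSUM_INTERVAL


def matching_messages(seconds_online, interval, all_apps_offline_or_faulted,
                      remote_is_joining, remote_missing_from_local_config):
    """List the ADC messages whose documented conditions hold.

    Assumes the local and remote configuration checksums differ.
    """
    within_interval = seconds_online <= interval
    matches = []
    if within_interval and (
        (remote_is_joining and all_apps_offline_or_faulted)
        or remote_missing_from_local_config
    ):
        matches.append("ADC 1")  # host shuts down with exit code 60
    if within_interval and not all_apps_offline_or_faulted:
        matches.append("ADC 2")  # host remains online, switchover disabled
    if not within_interval:
        matches.append("ADC 5")  # host remains online, switchover disabled
    return matches
```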

(ADC, 15) Global environment variable <envattribute> is not set in hvenv file.

Content:

RMS was unable to set the global environment variable envattribute because it has not been set in hvenv.

envattribute can be any one of the following: RELIANT_LOG_LIFE, RELIANT_SHUT_MIN_WAIT, HV_CHECKSUM_INTERVAL, HV_LOG_ACTION_THRESHOLD, HV_LOG_WARNING_THRESHOLD, HV_WAIT_CONFIG or HV_RCSTART. This will eventually cause RMS to exit with exit code 1.

Corrective action:

Set the value of the environment variable to an appropriate value.

(ADC, 17) <SysNode> is not in the Wait state, hvutil -u request skipped!

Content:

The 'hvutil -u' command has been invoked on a node, but its SysNode is not in the Wait state (internal option).

Corrective action:

Reissue the command after the SysNode has reached the Wait state.

(ADC, 18) Local environmental variable <envattribute> is not set up in hvenv file.

Content:

RMS was unable to set the local environment variable envattribute because it has not been set in hvenv. envattribute can be any one of the following:

SCRIPTS_TIME_OUT, RELIANT_INITSCRIPT, RELIANT_STARTUP_PATH, HV_CONNECT_TIMEOUT, HV_MAXPROC or HV_SYSLOG_USE. This will eventually cause RMS to exit with exit code 1.

Corrective action:

Set the value of the local environment variable in the /opt/SMAW/SMAWRrms/bin/hvenv.local file to an appropriate value.

(ADC, 20) <SysNode> is not in the Wait state. hvutil -o request skipped!

Content:

The 'hvutil -o' command has been invoked on a node, but its SysNode is not in the Wait state.

Corrective action:

Reissue the command after the SysNode has reached the Wait state.

(ADC, 25) Application <userapplication> is locked or busy, modification request skipped.

Content:

This message is generated if hvmod has been invoked without the -l option and the application is processing some requests.

Corrective action:

Reissue the hvmod command when the userApplication has completed the current switch request.

(ADC, 27) Dynamic modification failed.

Content:

Dynamic modification has failed. The exact reason for the failure is displayed in the message preceding this one.

Corrective action:

Check the switchlog for the error message that precedes this message to find out the exact cause of the failure.

(ADC, 30) HV_WAIT_CONFIG value <seconds> is incorrect, using 120 instead.

Content:

If the value of the environment variable HV_WAIT_CONFIG is 0 or has not been set, the default value of 120 is used instead.

Corrective action:

Set the value of HV_WAIT_CONFIG in the /opt/SMAW/SMAWRrms/bin/hvenv.

(ADC, 31) Cannot get the NET_SEND_Q queue.

Content:

RMS uses the NET_SEND_Q queue for transmitting contract information. If there is some problem with this queue, the operation is aborted. The operation can be any one of the following: hvrcp, hvcopy.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 32) Message send failed during the file copy of file <filename>.

Content:

An error occurred while transferring file filename across the network.

Corrective action:

Check if there are any problems with the network.

(ADC, 33) Dynamic modification timeout.

Content:

The time taken for dynamic modification is greater than the timeout limit. The timeout limit is the greater of the value of the environment variable MODIFYTIMEOUTLIMIT (if defined) and 0; that is, if the value of the environment variable is 0 or less, the timeout limit is 0. If the variable is not defined, the default timeout limit is 120 seconds.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
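
The timeout rule described for (ADC, 33) and (ADC, 34) can be written out as a small sketch. This is only an illustration of the rule as stated in this manual, not RMS code; the function name modify_timeout_limit is hypothetical, and the sketch assumes the variable holds an integer number of seconds.

```python
DEFAULT_MODIFY_TIMEOUT = 120  # seconds, used when MODIFYTIMEOUTLIMIT is not defined


def modify_timeout_limit(env):
    """Return the dynamic modification timeout limit in seconds.

    env is a mapping of environment variable names to string values.
    """
    raw = env.get("MODIFYTIMEOUTLIMIT")
    if raw is None:
        return DEFAULT_MODIFY_TIMEOUT
    # The limit is the greater of the configured value and 0.
    return max(int(raw), 0)
```

For example, passing os.environ evaluates the rule against the environment of the current process.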

(ADC, 34) Dynamic modification timeout during start up - bm will exit.

Content:

The time taken for dynamic modification during bm startup is greater than the timeout limit. The timeout limit is the greater of the value of the environment variable MODIFYTIMEOUTLIMIT (if defined) and 0; that is, if the value of the environment variable is 0 or less, the timeout limit is 0. If the variable is not defined, the default timeout limit is 120 seconds. RMS exits with exit code 63.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 35) Dynamic modification timeout, bm will exit.

Content:

Critical internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 37) Dynamic modification failed: cannot make a non-critical resource <resource> critical by changing its attribute MonitorOnly to 0 since this resource is not online while it belongs to an online application <userapplication>; switch the application offline before making this resource critical.

Content:

During dynamic modification, if there is an attempt to make a non-critical resource resource critical (by changing its MonitorOnly attribute to 0) while the resource is not online and the userApplication userapplication is Online, this message is printed and dynamic modification is aborted.

Corrective action:

Switch the userApplication Offline before making the resource critical.

(ADC, 38) Dynamic modification failed: application <userapplication> has no children, or its children are not valid resources.

Content:

If RMS finds that the userApplication userapplication will have no children while performing dynamic modification, this message is printed to the switchlog and dynamic modification is aborted.

Corrective action:

Make sure that the userApplication has valid children while performing dynamic modification.

(ADC, 39) The putenv() has failed (failurereason)

Content:

The wizards use the environment variable HVMOD_HOST during dynamic modification. This variable holds the name of the host on which hvmod has been invoked. If this variable cannot be set with the function putenv(), then this message is printed to the switchlog along with the reason failurereason.

Corrective action:

Check the reason failurereason in the switchlog to find out why this operation has failed and take corrective action based on this.

(ADC, 41) The Wizard action failed (command)

Content:

Wizards make use of an action file during hvmod. If the execution of this action file (command) has failed due to the process exiting by using an exit call, this message is printed to the switchlog along with the reason for this failure.

Corrective action:

Check the switchlog to find the reason for this failure, and rectify it before reissuing the hvmod command.

(ADC, 43) The file transfer for <filename> failed in "command". The dynamic modification will be aborted.

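Content:

During dynamic modification, files containing modification information are transferred between the hosts of the cluster. If, for any reason, a file transfer fails, the dynamic modification is aborted.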

Corrective action:

Make sure that host and cluster conditions are such that command can be safely executed.

(ADC, 44) The file transfer for <filename> failed in "command". The join will be aborted.

Content:

When a host joins a cluster, it receives a cluster configuration file. If, for any reason, the file transfer fails, the join is aborted.

Corrective action:

Make sure that host and cluster conditions are such that command can be safely executed.

(ADC, 45) The file transfer for <filename> failed in "command" with errno <errno> - errorreason.The dynamic modification will be aborted.

Content:

During dynamic modification, files containing modification information are transferred between the hosts of the cluster. If, for any reason, a file transfer fails, the dynamic modification is aborted. The specific reason for the failure is given by the OS error code errno and its explanation errorreason. The error codes are also listed in "Appendix B Solaris/Linux ERRNO table" in this manual.

Corrective action:

Make sure that host and cluster conditions are such that command can be safely executed.

(ADC, 46) The file transfer for <filename> failed with unequal write byte count, expected expectedvalue actual actualvalue. The dynamic modification will be aborted.

Content:

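During dynamic modification, files containing modification information are transferred between the hosts of the cluster. If the number of bytes written (actualvalue) differs from the expected number (expectedvalue), the dynamic modification is aborted.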

Corrective action:

Make sure that host, cluster and network conditions are such that command can be safely executed.

(ADC, 47) RCP fail:can't open file filename.

Content:

If the file filename that has been specified as the file to be copied from the local host to the remote host cannot be opened for reading, this message is printed.

Corrective action:

Make sure that the file filename is readable.

(ADC, 48) RCP fail:fseek errno errno.

Content:

During a file transfer between the hosts, RMS encountered a problem indicated by the OS error code errno.

Corrective action:

Make sure that the host, cluster and network conditions are such that file transfer proceeds without errors.

(ADC, 49) Error checking hvdisp temporary file <filename>, errno <errno>, hvdisp process pid <processid> is restarted.

Content:

The RMS base monitor periodically checks the integrity and size of the temporary file used to transfer configuration data to the hvdisp process. If this file cannot be checked, the hvdisp process is restarted automatically, though some data may be lost and not displayed at this time. The specific OS error code for the error encountered is displayed in errno.

Corrective action:

Make sure that the host conditions are such that the temporary file can be checked. In some cases, you may need to restart the hvdisp process manually.

(ADC, 57) An error occurred while writing out the RMS configuration for the joining host. The hvjoin operation is aborted.

Content:

When a remote host joins a cluster, this host attempts to dump its own configuration for a subsequent transfer to the remote host. If the configuration cannot be saved, the hvjoin operation is aborted.

Corrective action:

One of the previous messages contains a detailed explanation of the error that occurred while saving the configuration. Correct the host environment according to the explanation, or contact field engineers.

(ADC, 58) Failed to prepare configuration files for transfer to a joining host. Command used <command>.

Content:

When a remote host joins a cluster, this host attempts to prepare its own configuration for a subsequent transfer to the remote host. For that, it uses the command command. If the command fails, the hvjoin operation is aborted.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 59) Failed to store remote configuration files on this host. Command used <command>.

Content:

When this host joins a cluster, this host attempts to store remote configuration files for a subsequent dynamic modification on this host. For that, it uses the command command. If the command fails, the hvjoin operation is aborted.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 60) Failed to compress file <file>. Command used <command>.

Content:

File transfer is a part of some RMS operations such as dynamic modification and hvjoin. Before transferring a file file to a remote host, it must be compressed with the command command. If the command fails, the operation that requires the file transfer is aborted.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 61) Failed to shut down RMS on host <host>.

Content:

While performing RMS cluster-wide shutdown, RMS on host host failed to shut down.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 62) Failed to shut down RMS on this host, attempting to exit RMS.

Content:

While performing RMS cluster-wide shutdown, RMS on this host failed to shut down. Another attempt to shut down this host is automatically initiated.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 63) Error <errno> while reading file <file>, reason: <reason>.

Content:

While reading file file, an error errno occurred. The reason is explained in reason. File reading errors may occur during dynamic modification or during the hvjoin operation.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 68) Error <errno> while opening file <file>, reason:<reason>.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 70) Message sequence # is out of sync - File transfer of file <filename> has failed.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.3.2 ADM: Admin, command, and detector queues

(ADM, 3) Dynamic modification failed: some resource(s) supposed to come offline failed.

Content:

During dynamic modification, when new resource(s) are to be added to an offline parent object and cannot be brought offline, this message is printed.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 4) Dynamic modification failed: some resource(s) supposed to come online failed.

Content:

During dynamic modification, when new resource(s) are to be added to an online parent object by executing the online scripts and cannot be brought online, dynamic modification is aborted.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 5) Dynamic modification failed: object <object> is not linked to any application.

Content:

During dynamic modification, if there is an attempt to add an object object that does not have a parent (and hence is not linked to any userApplication), this message is printed and dynamic modification is aborted.

Corrective action:

Make sure that every object to be added during dynamic modification is linked to a userApplication.

(ADM, 6) Dynamic modification failed: cannot add new resource <resource> since another existing resource with this name will remain in the configuration.

Content:

When RMS receives a directive to add a new resource resource with the same name as that of an existing resource, this message is printed to the switchlog and dynamic modification aborts.

Corrective action:

Make sure that when adding a new resource, its name does not match the name of any other existing resource.

(ADM, 7) Dynamic modification failed: cannot add new resource <resource> since another existing resource with this name will not be deleted.

Content:

When RMS receives a directive to add a new resource resource with the name of an existing resource, it prints out this message and dynamic modification aborts.

Corrective action:

Make sure that when adding a new resource, its name does not match the name of any other existing resource.

(ADM, 8) Dynamic modification failed: cycle of length <cycle_length> detected in resource <resource> -- <cycle>.

Content:

In the overall structure of the graph of the RMS resources, no cycles are allowed along the chains of parent/child links. If a cycle is found, dynamic modification fails and this message is printed to the switchlog.

Corrective action:

Remove the cycles from the resource graph.
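
To locate a cycle before attempting dynamic modification, the parent/child links can be checked with a depth-first search. The following is a minimal sketch under stated assumptions, not the check that RMS itself performs: the graph is given as a dictionary that maps each object name to a list of its children, and the function name find_cycle is hypothetical.

```python
def find_cycle(children):
    """Return one cycle as a list of object names, or None if there is none.

    children maps each object name to a list of its child object names.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for child in children.get(node, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                # The child is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(children):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

The returned list starts and ends with the same object, which makes the offending chain of links easy to read.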

(ADM, 9) Dynamic modification failed: cannot modify resource <resource> since it is going to be deleted.

Content:

Deleting a resource also deletes all of its children that have no other parents. If a resource is deleted and the attributes of that resource, or of a child of that resource that has no other parents, are then modified, dynamic modification is aborted and this message is printed to the switchlog.

Corrective action:

While performing dynamic modification, make sure that the resource being modified is not also being deleted.
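
The rule that deleting a resource also deletes its children that have no other parents can be illustrated with a short sketch. It is an illustration of the rule as described here, not RMS code. The graph representation (a dictionary mapping each object to a list of its children) and the function name deleted_closure are assumptions.

```python
def deleted_closure(children, roots):
    """Return the set of objects removed when the objects in roots are deleted.

    A child is removed only when every one of its parents has been removed.
    """
    # Build the reverse mapping: child -> set of parents.
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(parent)

    deleted = set(roots)
    queue = list(roots)
    while queue:
        node = queue.pop()
        for kid in children.get(node, ()):
            if kid not in deleted and parents[kid] <= deleted:
                deleted.add(kid)
                queue.append(kid)
    return deleted
```

Any object in the returned set should not also appear in a modification directive within the same dynamic modification.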

(ADM, 11) Dynamic modification failed: cannot delete object <resource> since it is a descendant of another object that is going to be deleted.

Content:

When there is an attempt to delete a child object when the parent object has been deleted, the above message appears in the switchlog and dynamic modification is aborted.

Corrective action:

Make sure that when an object is being deleted explicitly, its parents have not already been deleted because that means this object has also been deleted.

(ADM, 12) Dynamic modification failed: cannot delete <resource> since its children will be deleted.

Content:

When there is an attempt to delete a resource resource whose children have already been deleted, the above message appears in the switchlog and dynamic modification is aborted.

Corrective action:

Make sure that when a resource is being deleted explicitly, its children have not already been deleted.

(ADM, 13) dynamic modification failed:object <resource> is in state <state> while needs to be in one of stateOnline, stateStandby, stateOffline, stateFaulted, or stateUnknown.

Content:

Every resource has to be in one of the following states: stateOnline, stateOffline, stateFaulted, stateUnknown, or stateStandby. If the resource resource is not in any of these states, RMS prints the above message and dynamic modification is aborted. In theory, this situation cannot occur.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 14) Dynamic modification failed: cannot link to or unlink from an application <userapplication>.

Content:

If the parent of the resource is a userApplication, then linking to or unlinking a child from that parent is not possible. If there is an attempt to perform this, then the above message will be printed to the switchlog and dynamic modification will be aborted.

Corrective action:

Do not link or unlink a resource from a userApplication.

(ADM, 15) Dynamic modification failed: parent object <parentobject> is not a resource.

Content:

When RMS gets a directive to link existing resources during dynamic modification, and the parent object parentobject to which the child object is being linked is not a resource, then dynamic modification fails, and this message is printed.

Corrective action:

Make sure that while linking 2 objects, the parent of the child object is a resource.

(ADM, 16) Dynamic modification failed: child object <childobject> is not a resource.

Content:

When RMS gets a directive to link existing resources during dynamic modification, if the child object childobject that is being linked to a parent object is not a resource, then dynamic modification fails and this message is printed.

Corrective action:

Make sure that while linking 2 objects, the child of the parent object is a resource.

(ADM, 17) Dynamic modification failed: cannot link parent <parentobject> and child <childobject> since they are already linked.

Content:

An attempt was made to link a parent parentobject and a child childobject that are already linked. This message is printed, and dynamic modification is aborted.

Corrective action:

While trying to perform dynamic modification, make sure that the parent and the child that are to be linked are not already linked.

(ADM, 18) Dynamic modification failed: cannot link a faulted child <childobject> to parent <parentobject> which is not faulted.

Content:

While creating a new link between 2 existing objects, during dynamic modification, a faulted child childobject cannot be linked to a parent parentobject that is not faulted. The child first needs to be brought to the state of the parent. If this condition is violated, the aforementioned message will be printed to the switchlog. Dynamic modification is aborted.

Corrective action:

Bring the faulted child to the state of the parent before linking them.

(ADM, 19) Dynamic modification failed: cannot link child <childobject> which is not online to online parent <parentobject>.

Content:

While linking 2 existing objects during dynamic modification, the combination of states parent Online and child not Online is not allowed. When this happens, dynamic modification is aborted and a message is printed to the switchlog.

Corrective action:

The child childobject first needs to be brought to the online state before linking it to the online parent parentobject.

(ADM, 20) Dynamic modification failed: cannot link child <childobject> which is neither offline nor standby to offline or standby parent <parentobject>.

Content:

Any attempt to link 2 existing objects in which the child is neither in the Offline nor the Standby state, and the parent is in the Offline or Standby state, is prohibited. This message is printed in the switchlog, and dynamic modification is aborted.

Corrective action:

The child needs to be first brought to offline or standby state before linking it to the parent that is in offline or standby state.
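
The state restrictions described for (ADM, 18), (ADM, 19), and (ADM, 20) can be collected into one check. The following is a sketch of those three rules only, not the RMS implementation. The function name link_violation and the plain state names are hypothetical, and the order in which the rules are tested is an assumption, since this section does not say which message is reported when more than one rule is violated.

```python
def link_violation(parent_state, child_state):
    """Return the message that a new parent/child link would trigger, or None.

    States are given as "Online", "Offline", "Standby", or "Faulted".
    """
    if child_state == "Faulted" and parent_state != "Faulted":
        return "ADM 18"  # faulted child, parent not faulted
    if parent_state == "Online" and child_state != "Online":
        return "ADM 19"  # online parent, child not online
    if (parent_state in ("Offline", "Standby")
            and child_state not in ("Offline", "Standby")):
        return "ADM 20"  # offline/standby parent, child neither offline nor standby
    return None
```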

(ADM, 21) Dynamic modification failed: Cannot unlink parent <parentobject> and child <childobject> since they are not linked.

Content:

An attempt to unlink object parentobject from object childobject when they are not linked results in this message, and dynamic modification is aborted.

Corrective action:

If you want to unlink 2 objects, make sure that they have a parent-child relationship.

(ADM, 22) Dynamic modification failed: child <childobject> will be unlinked but not linked back to any of the applications.

Content:

Unlinking a child childobject so that no links remain linking it to any userApplication is not allowed.

Corrective action:

Make sure that the child is still linked to a userApplication.

(ADM, 23) Dynamic modification failed: sanity check did not pass for linked or unlinked objects.

Content:

Dynamic modification performs some sanity checks to ensure that all of the following are true:

The HostName attribute is present only for children of userApplication objects.
The child of a userApplication does not have another parent.
Each object belongs to only one userApplication.
Leaf objects have detectors.
Leaf objects that have the DeviceName attribute have it set to a valid value.
The length of the attribute rName for the leaf objects is smaller than the maximum.
There are no duplicate lines in the hvgdstartup file.
The kind argument for the detector in the hvgdstartup is specified.
All detectors can be loaded.
A valid value has been specified for the rKind attribute.
The ScriptTimeout value is greater than the detector cycle time.
No object is both an "and" object and an "or" object at the same time.
ClusterExclusive and LieOffline, which are mutually exclusive, are not used together.

If some of these sanity checks fail, then this message will be printed and dynamic modification is aborted.

A FATAL message is also printed to the switchlog with more details as to why the sanity check failed.

Corrective action:

Make sure that the sanity checks mentioned above pass.
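
Two of the checks listed above lend themselves to a quick pre-check of a planned change. The sketch below is illustrative only and covers just the ClusterExclusive/LieOffline rule and the ScriptTimeout rule. The function name sanity_problems, the dictionary representation of attributes, and the detector_cycle_time parameter are hypothetical; they are not part of RMS.

```python
def sanity_problems(name, attrs, detector_cycle_time):
    """Return a list of problem descriptions for one object.

    attrs is a dict of attribute names to values; times are in seconds.
    """
    problems = []
    # ClusterExclusive and LieOffline are mutually exclusive.
    if attrs.get("ClusterExclusive") and attrs.get("LieOffline"):
        problems.append(f"{name}: ClusterExclusive and LieOffline are both set")
    # ScriptTimeout must be greater than the detector cycle time.
    timeout = attrs.get("ScriptTimeout")
    if timeout is not None and timeout <= detector_cycle_time:
        problems.append(
            f"{name}: ScriptTimeout {timeout} is not greater than "
            f"the detector cycle time {detector_cycle_time}"
        )
    return problems
```

An empty list means that neither of the two rules is violated; it says nothing about the remaining checks.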

(ADM, 24) Dynamic modification failed: object <object> that is going to be linked or unlinked will be either deleted, or unlinked from all applications.

Content:

Any attempt to perform the operations of deleting an object object from the RMS resource graph and then trying to unlink it from its parent object or vice versa results in dynamic modification being aborted and the above message being printed to the switchlog.

Corrective action:

Make sure that the operations of deletion and unlinking are not performed on an object at the same time.

(ADM, 25) Dynamic modification failed: parent object <parentobject> is absent.

Content:

When a new object is being added to an existing configuration, it must have an existing object parentobject as its parent. If it does not, dynamic modification is aborted and this message is printed to the switchlog.

Corrective action:

Make sure that the parent specified for a new object that is being added exists.

(ADM, 26) Dynamic modification failed: parent object <parentobject> is neither a resource nor an application.

Content:

When a new object is being added to an existing configuration, if the parent object parentobject that has been specified is neither a resource nor a userApplication, this message is printed and dynamic modification is aborted.

Corrective action:

Make sure that the parent object specified for a new object is a resource or a userApplication.

(ADM, 27) Dynamic modification failed -- child object <childobject> is absent.

Content:

Any attempt to link to a child object childobject that is non-existent leads to this message and dynamic modification aborts.

Corrective action:

Make sure that the child object to be linked to exists.

(ADM, 28) Dynamic modification failed: child object <childobject> is not a resource.

Content:

When a new object childobject being added to an existing configuration is not a resource, this message is printed, and dynamic modification is aborted.

Corrective action:

Make sure that the child object specified is a resource.

(ADM, 29) Dynamic modification failed -- parent object <parentobject> is absent.

Content:

A critical error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 30) Dynamic modification failed: parent object <parentobject> is not a resource.

Content:

During dynamic modification if there is a request to add a new parent object parentobject that is not a resource, this message is printed, and dynamic modification aborts.

Corrective action:

Make sure that the object being added as a parent object is a resource.

(ADM, 31) Dynamic modification failed: child object <childobject> is absent.

Content:

As part of dynamic modification, if the specified child object childobject does not exist, then this message is printed, and dynamic modification is aborted.

Corrective action:

Make sure that the child object that has been specified exists.

(ADM, 32) Dynamic modification failed: child object <childobject> is not a resource.

Content:

When adding a new object to the RMS resource graph, if the child childobject of this new object is not a resource, dynamic modification aborts.

Corrective action:

Make sure that when adding a new object, its child is a resource.

(ADM, 33) Dynamic modification failed: object <object> cannot be deleted since either it is absent or it is not a resource.

Content:

If RMS gets a directive to delete an object object that either does not exist or is not a resource, this message is printed and dynamic modification fails.

Corrective action:

Make sure that the object to be deleted exists and is a resource.

(ADM, 34) Dynamic modification failed: deleted object <object> is neither a resource nor an application nor a host.

Content:

An object deleted during dynamic modification is neither a resource type object, nor a userApplication nor a SysNode object. Only resources, applications and hosts (SysNode objects) can be deleted during dynamic modification.

Corrective action:

Do not delete this object, or delete another object.

(ADM, 37) Dynamic modification failed: resource <object> cannot be brought online and offline/standby at the same time.

Content:

When a resource object is added to an existing RMS resource graph and it is linked as a child to two parent objects, one of which is online and the other offline/standby, this message is printed: a child object needs to be brought to the state of its parent.

Corrective action:

Make sure that both the parents of the resource to be added are in the same state before adding it.
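
The rule behind (ADM, 37) can be illustrated with a minimal Python sketch. This is a model of the check only, not RMS code: the function name and the state strings "Online", "Offline", and "Standby" are assumptions made for the illustration.

```python
def conflicting_parent_states(parent_states):
    """Return True if the parents would pull a shared child into different states.

    parent_states maps a parent object name to its state string. The state
    names used here are assumptions made for this sketch.
    """
    states = list(parent_states.values())
    wants_online = any(s == "Online" for s in states)
    wants_down = any(s in ("Offline", "Standby") for s in states)
    # A child cannot follow an online parent and an offline/standby parent at once.
    return wants_online and wants_down
```

A result of True corresponds to the situation the message describes; bringing both parents to the same state first makes the function return False.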

(ADM, 38) Dynamic modification failed: existing parent resource <parentobject> is in state <state> but needs to be in one of stateOnline, stateStandby, stateOffline, stateFaulted, or stateUnknown.

Content:

During dynamic modification, if the state state of a parent resource parentobject is not one of stateOnline, stateStandby, stateOffline, stateFaulted, or stateUnknown, dynamic modification aborts.

Corrective action:

Make sure that the state of the parent resource is one of the states mentioned above.

(ADM, 39) Dynamic modification failed: new resource object which is a child of application <userapplication> has its HostName <hostname> the same as another child of application <userapplication>.

Content:

When a new object object is being added as a child of userapplication and the value of its HostName attribute is the same as the value of the HostName attribute of an existing child of userapplication, this message is printed, and dynamic modification is aborted.

Corrective action:

Make sure that the HostName attribute of an object that is being added to userApplication is different from the values of the HostName attributes of other first level children of userapplication.

(ADM, 40) Dynamic modification failed: a new child <child_object> of existing application <userapplication> does not have its HostName set to a name of any SysNode.

Content:

When a new child object childobject is added to an application userapplication during dynamic modification, if the HostName attribute is missing for this object, this message is printed, and dynamic modification is aborted.

Corrective action:

The first level object under userapplication must have a HostName attribute.

(ADM, 41) Dynamic modification failed: existing child <childobject> is not online, but needs to be linked with <parentobject> which is supposed to be brought online.

Content:

If both the parent parentobject and the child childobject have detectors associated with them, and the child is not online but must be linked to a parent that is supposed to be brought online, this message is printed and dynamic modification is aborted.

Corrective action:

Make sure that the parent and the child are in a similar state.

(ADM, 42) Dynamic modification failed: existing child <childobject> is online, but needs to be linked with <parentobject> which is supposed to be brought offline.

Content:

Trying to link a child childobject that is online to a parent object, which is supposed to go offline, is not allowed, and dynamic modification is aborted.

Corrective action:

Make sure that the parent and the child are in a similar state.

(ADM, 43) Dynamic modification failed: linking the same resource <childobject> to different applications <userApplication1> and <userApplication2>.

Content:

When RMS gets a directive to add a new child object childobject whose parent and child resources belong to different applications userapplication1 and userapplication2, this message is printed and dynamic modification aborts.

Corrective action:

When adding a new resource, make sure that its parent and its children do not belong to different applications.

(ADM, 44) Dynamic modification failed: object <object> does not have an existing parent.

Content:

Any attempt to create an object object that does not have an existing parent leads to this message and dynamic modification aborts.

Corrective action:

Make sure that the object object has an existing object as its parent.

(ADM, 45) Dynamic modification failed: HostName is absent or invalid for resource <object>.

Content:

If the HostName attribute of object object has an invalid value, this message is printed and dynamic modification is aborted. A missing HostName attribute is reported by (ADM, 40) instead.

Corrective action:

Set the HostName attribute of resource object to the name of a valid SysNode.
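
The HostName conditions described for (ADM, 39), (ADM, 40), and (ADM, 45) can be summarized in a short Python sketch. The data model (a list of name/HostName pairs and a set of SysNode names) and the function name are hypothetical; the sketch only illustrates the three conditions and makes no claim about how RMS implements them.

```python
def check_hostnames(children, sysnodes):
    """Classify HostName problems among the first-level children of one application.

    children: list of (object_name, host_name_or_None) tuples (hypothetical model).
    sysnodes: set of SysNode names known to the configuration.
    Returns a list of (object_name, problem) tuples in input order.
    """
    problems = []
    seen_hosts = set()
    for name, host in children:
        if host is None:
            problems.append((name, "missing"))      # compare (ADM, 40)
        elif host not in sysnodes:
            problems.append((name, "invalid"))      # compare (ADM, 45)
        elif host in seen_hosts:
            problems.append((name, "duplicate"))    # compare (ADM, 39)
        else:
            seen_hosts.add(host)
    return problems
```

An empty result means that every first-level child names a distinct, known SysNode.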

(ADM, 46) Dynamic modification failed: linking the same resource <object> to different applications <userapplication1> and <userapplication2>.

Content:

RMS received a directive to add a new child object object by linking it to parent objects belonging to different applications userapplication1 and userapplication2. Dynamic modification is aborted.

Corrective action:

When adding a new child resource, make sure that it does not have as its parents resources belonging to different applications.

(ADM, 47) Dynamic modification failed: parent object <parentobject> belongs to a deleted application.

Content:

Any attempt to add a new node having as its parent parentobject fails if the parent parentobject is the child of an object that has been deleted, because deleting an object automatically causes its children to be deleted as well if they don't have any other parents. This causes dynamic modification to fail.

Corrective action:

When adding a new object, make sure that its parent has not already been deleted.

(ADM, 48) Dynamic modification failed: child object <childobject> belongs to a deleted application.

Content:

Any attempt to delete an object childobject belonging to a deleted application elicits this response from RMS because deleting an application automatically causes all its children to be deleted as well.

Corrective action:

Do not try to delete an object belonging to an already deleted application.

(ADM, 49) Dynamic modification failed: deleted object <objectname> belongs to a deleted application.

Content:

Any attempt to delete an object objectname that belongs to a deleted application leads to this error because deleting an application deletes all its children including objectname.

Corrective action:

Make sure that before an object is deleted, it does not belong to an application that is being deleted.

(ADM, 50) Dynamic modification failed: cannot delete object <object> since it is a descendant of a new object.

Content:

When RMS gets a directive to delete an object object, which is a descendant of a new object, this message is printed, and dynamic modification is aborted.

Corrective action:

Make sure that when an object is being deleted, it is not a descendant of a new object.

(ADM, 51) Dynamic modification failed: cannot link to child <childobject> since it will be deleted.

Content:

When RMS gets a directive to link to a child childobject that is going to be deleted, dynamic modification aborts.

Corrective action:

Do not link to a child object that is to be deleted.

(ADM, 52) Dynamic modification failed: cannot link to parent <parentobject> since it will be deleted as a result of deletion of object <object>.

Content:

If there is an attempt to delete an object object and use its descendants (which should be deleted as a result of deleting the parent) as the parent for a new resource that is being added to the RMS resource graph, this error message is printed and dynamic modification aborts.

Corrective action:

Do not attempt to delete an object and use its descendant as the parent for a new resource.

(ADM, 53) Dynamic modification failed: <node> is absent.

Content:

An attempt was made to modify the attribute of a node node that is absent. This message is printed and dynamic modification is aborted.

Corrective action:

Modify the attributes of an existing node.

(ADM, 54) Dynamic modification failed: NODE <object>, attribute <attribute> is invalid.

Content:

When RMS receives a directive to modify a node object with attribute attribute that has an invalid value, this message is printed, and dynamic modification is aborted.

Corrective action:

Specify a valid value for the attribute attribute.

(ADM, 55) Cannot create admin queue.

Content:

RMS uses Unix queues internally for interprocess communication. The admin queue is one such queue; it is used for communication between RMS and utilities such as hvutil, hvmod, hvshut, hvswitch, and hvdisp. If RMS cannot create this queue for any reason, it exits with exit code 50.

Corrective action:

Restart RMS.

(ADM, 57) hvdisp - open failed - filename.

Content:

If RMS is unable to open the file /opt/SMAW/SMAWRrms/locks/.rms.<pid> for writing when 'hvdisp -m' has been invoked, this message is printed.

Corrective action:

Verify that the directory /opt/SMAW/SMAWRrms/locks exists and allows files to be created (correct permissions, free space in the file system, free inodes). If one of these problems exists, fix it via the appropriate administrator operation. If none of these problems apply, but the RMS failure still occurs, contact field engineers.
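
The checks listed above (directory exists, permissions, free space, free inodes) can be gathered with a small Python helper. This is a sketch for an administrator's convenience, not part of RMS; the function name is hypothetical, and os.statvfs is available on Unix-like systems only.

```python
import os

def check_lock_dir(path):
    """Report whether files can plausibly be created in the directory `path`."""
    report = {"exists": os.path.isdir(path), "writable": False,
              "free_bytes": None, "free_inodes": None}
    if report["exists"]:
        # Creating a file needs write and search permission on the directory.
        report["writable"] = os.access(path, os.W_OK | os.X_OK)
        st = os.statvfs(path)
        report["free_bytes"] = st.f_bavail * st.f_frsize  # space for unprivileged users
        report["free_inodes"] = st.f_favail               # inodes for unprivileged users
    return report
```

For example, check_lock_dir("/opt/SMAW/SMAWRrms/locks") returns a dictionary whose fields can be compared against the conditions above. Some file systems do not report inode counts, so a zero in free_inodes needs to be interpreted with care.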

(ADM, 58) hvdisp - open failed - filename : errormsg.

Content:

When hvdisp is unable to open the file filename (/opt/SMAW/SMAWRrms/locks/.rms.<pid>) for writing, it prints the reason errormsg.

Corrective action:

(ADM, 59) userapplication: modification is in progress, switch request skipped.

Content:

This message is printed to the switchlog because commands such as hvswitch, hvutil, and hvshut cannot run in parallel with a non-local hvmod.

Corrective action:

Make sure that before a hvswitch is performed, hvmod is not operating on userapplication.

(ADM, 60) <resource> is not a userApplication object, switch request skipped!

Content:

While performing a switch, hvswitch requires a userApplication as its argument. If the resource resource is not a userApplication, this message is printed.

Corrective action:

Check the man page for hvswitch for usage information.

(ADM, 62) The attribute <ShutdownScript> may not be specified for object <object>.

Content:

The attribute ShutdownScript is a hidden attribute of a SysNode. The RMS base monitor automatically defines its value -- users cannot change it in any way.

Corrective action:

Do not attempt to change the built-in value of the ShutdownScript attribute.

(ADM, 63) System name <sysnode> is unknown.

Content:

This message can occur in these scenarios:

The name of the SysNode specified in hvswitch is not included in the current configuration. ('hvswitch [-f] userapplication [sysnode]')
The name of the SysNode specified for 'hvshut -s sysnode' is not a valid one, i.e., sysnode is not included in the current configuration.
The name of the SysNode specified for 'hvutil -ou' is unknown (hidden options).

Corrective action:

Specify a SysNode that is included in the current configuration, i.e., appears in the configname.us file.

(ADM, 67) sysnode Cannot shut down.

Content:

This message could appear if 'hvshut -a' was invoked and not all of the nodes replied with an acknowledgement.

Corrective action:

Log in to the remote hosts. If RMS is still running, perform 'hvutil -f <userapplication>' to shut down each application one at a time. If this fails, refer to the switchlog and userapplication log files to find the reason for the problem. If all applications have been shut down correctly, perform a forced RMS shutdown with 'hvshut -f'. Report the problem to field engineers.

(ADM, 70) NOT ready to shut down.

Content:

The node on which 'hvshut -a' has been invoked is not yet ready to be shut down because the application is busy on the node.

Corrective action:

Wait until the ongoing action (e.g., switchover, dynamic reconfiguration) has terminated.

(ADM, 75) Dynamic modification failed: child <resource> of userApplication object <userapplication> has HostName attribute <hostname> common with other children of the same userApplication.

Content:

This message occurs if the RMS internal sanity-check functions detect a severe configuration problem. This message should not occur if the configuration has been set up using RMS configuration wizards.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 76) Modification of attribute <attribute> is not allowed within existing object <object>.

Content:

The attribute attribute is constant and can only be set in a configuration file.

Corrective action:

Make sure that there is no attempt to modify attribute within object.

(ADM, 77) Dynamic modification failed: cannot delete object object since its state is currently being asserted.

Content:

This message can appear in the switchlog if dynamic modification is being performed on an object that is being asserted.

Corrective action:

Perform the modification after the assertion has been fulfilled.

(ADM, 78) Dynamic modification failed: PriorityList <prioritylist> does not include all the hosts where the application <userapplication> may become Online. Make sure that PriorityList contains all hosts from the HostName attribute of the application's children.

Content:

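The PriorityList attribute of userapplication does not include all the host names listed in the HostName attributes of the application's children.

Corrective action:

Set PriorityList for userapplication to include all the host names from the HostName attributes of the application's children. No duplicate host names should be present in the PriorityList.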

(ADM, 79) Dynamic modification failed: PriorityList <prioritylist> includes hosts where the application <userapplication> may never become Online. Make sure PriorityList contains only hosts from the HostName attributes of the application's children.

Content:

The PriorityList attribute of userapplication includes one or more hosts that are not specified in the HostName attribute of any of the application's children.

Corrective action:

Set the PriorityList attribute of userapplication to include all the host names listed in the HostName attributes of the application's children. No duplicate host names should be present in the PriorityList.
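
The relationship between PriorityList and the children's HostName attributes that (ADM, 78) and (ADM, 79) enforce can be sketched as a set comparison. The function and its input format are hypothetical and serve only as an illustration of the rule stated in the two messages.

```python
def check_priority_list(priority_list, child_hostnames):
    """Compare an application's PriorityList with its children's HostName values.

    priority_list: list of host names in priority order.
    child_hostnames: iterable of HostName values of the application's children.
    Returns a dict with the hosts that are missing, extra, or duplicated.
    """
    listed = set(priority_list)
    needed = set(child_hostnames)
    duplicates = sorted({h for h in priority_list if priority_list.count(h) > 1})
    return {
        "missing": sorted(needed - listed),   # condition of (ADM, 78)
        "extra": sorted(listed - needed),     # condition of (ADM, 79)
        "duplicates": duplicates,             # duplicate host names are not allowed
    }
```

A PriorityList is consistent under this model when all three lists in the result are empty.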

(ADM, 81) Dynamic modification failed: application <userapplication> may not have more than <maxcontroller> parent controllers as specified in its attribute MaxControllers.

Content:

If userapplication uses more parent controllers than specified by the attribute MaxControllers (maxcontroller), this message is printed, and dynamic modification is aborted.

Corrective action:

Make sure that the number of parent controllers used by an application does not exceed the number specified in the MaxControllers attribute, or increase the value of MaxControllers.

(ADM, 82) Dynamic modification failed: cannot delete type <object> unless its state is one of Unknown, Wait, Offline or Faulted.

Content:

This message may appear in the switchlog if there is an attempt to delete a SysNode from a running configuration if the node is not in one of the states Unknown, Wait, Offline or Faulted.

Corrective action:

Shut down RMS on that host and then do the deletion.

(ADM, 83) Dynamic modification failed: cannot delete SysNode <sysnode> since this RMS monitor is running on this SysNode.

Content:

During dynamic modification, an attempt was made to delete the local SysNode sysnode.

Corrective action:

Make sure dynamic modification does not contain 'delete sysnode;' where sysnode is the name of the local node.

(ADM, 84) Dynamic modification failed: cannot add SysNode <sysnode> since its name is not valid.

Content:

This message appears in the switchlog if the name sysnode specified as part of the dynamic modification is not resolvable to any known host name.

Corrective action:

Specify a host name that is resolvable to a network address.

(ADM, 85) Dynamic modification failed: timeout expired, timeout symbol is <symbol>.

Content:

If the dynamic modification takes too much time, this message is printed.

Corrective action:

Make sure that the network connection between the hosts is functional. Also verify that the scripts of newly added resources do not take too long to execute, that the dynamic modification does not add too many new nodes, and that the modification file is not too big or too complex.

(ADM, 86) Dynamic modification failed: application <userapplication> cannot be deleted since it is controlled by the controller <controller>.

Content:

A controlled application userapplication cannot be deleted while its controller controller retains the application's name in its Resource attribute.

Corrective action:

Remove the name of the deleted application from the controller's Resource attribute, or add a new application with the same name, or delete the controller together with its controlled application, or change the controller's NullDetector attribute to 1.

(ADM, 87) Dynamic modification failed: only local attributes such as ScriptTimeout, DetectorStartScript, NullDetector or MonitorOnly can be modified during local modification (hvmod -l).

Content:

The reason for this message is that only the modification of local attributes is allowed during local modification.

Corrective action:

Make a non-local modification, or modify different attributes.

(ADM, 88) Dynamic modification failed: attribute <attribute> is modified more than once for object <object>.

Content:

This message may appear because an attribute of a particular object can be modified only once in the same modification file, but attribute has been modified more than once for <object>.

Corrective action:

Modify the attribute only once per object.
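
Before submitting a modification, repeated changes to the same attribute can be detected with a sketch like the following. The list of (object, attribute, value) tuples is a hypothetical in-memory form of the statements in one modification file; parsing the file itself is not shown.

```python
from collections import Counter

def repeated_modifications(changes):
    """Return the (object, attribute) pairs that are modified more than once.

    changes: list of (object_name, attribute_name, new_value) tuples.
    """
    counts = Counter((obj, attr) for obj, attr, _value in changes)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Any pair returned by the function would trigger (ADM, 88) under the rule described above.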

(ADM, 89) Dynamic modification failed: cannot rename existing object <sysnode> to <othersysnode> because either there is no object named <sysnode>, or another object with the name <othersysnode> already exists, or a new object with that name is being added, or the object is not a resource, or it is a SysNode, or it is a controlled application which state will not be compatible with its controller.

Content:

This message appears when an attempt is made to rename an existing object sysnode to othersysnode, but one of the following conditions was encountered:

othersysnode is not a valid name.
othersysnode is already used by some other host in the cluster.
othersysnode is not a resource.
othersysnode is a controlled application.

Corrective action:

Choose another valid host name.

(ADM, 90) Dynamic modification failed: cannot change attribute Resource of the controller object <controllerobject> from <oldresource> to <newresource> because some of <oldresource> are going to be deleted.

Content:

This message appears when the user tries to rename a resource that is controlled by a controller object and is going to be deleted.

Corrective action:

Make sure that deleted applications are not referenced by any controller.

(ADM, 91) Dynamic modification failed: controller <controller> has its Resource attribute set to <resource>, but application named <userapplication> is going to be deleted.

Content:

This message appears when the user tries to control a resource resource with a controller controller but the application associated with that resource is going to be deleted.

Corrective action:

Make sure the controller's Resource attribute does not refer to a deleted application.

(ADM, 95) Cannot retrieve information about command line used when starting RMS. Start on remote host must be skipped. Please start RMS manually on remote hosts.

Content:

RMS was started with the -a option but due to some internal error RMS could not be started on the remote host. This is a critical internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide." For temporary workaround, try again or start RMS manually on each host.

(ADM, 96) Remote startup of RMS failed <startupcommand>. Reason: errorreason.

Content:

When RMS cannot be started on remote hosts because the command startupcommand failed, this message is printed.

Corrective action:

This may occur when some of the hosts are not reachable or the network is down.

Check the network and the remote hosts for abnormalities, remove the cause of the problem, and retry.

(ADM, 98) Dynamic modification failed: controller <controller> has its Resource attribute set to <resource>, but some of the controlled applications from this list do not exist.

Content:

This message appears when the controller could not match the applications it controls against the applications running on the host.

Corrective action:

Correct your modification file so that the controllers refer only to the existing applications.

(ADM, 99) Dynamic modification failed: cannot change attribute Resource of the controller object <controller> from <oldresource> to <newresource> because one or more of the applications listed in <newresource> is not an existing application or its state is incompatible with the state of the controller, or because the list contains duplicate elements.

Content:

This message appears when the user tries to change the Resource attribute of the controller object controller from oldresource to newresource, but one or more of the applications listed in newresource is not an existing application, its state is incompatible with the state of the controller, or the list contains duplicate elements.

Corrective action:

Make sure that every application listed in newresource is a valid, existing application and is not listed more than once.

(ADM, 100) Dynamic modification failed: because a controller <controller> has AutoRecover set to 1, its controlled application <userapplication> cannot have PreserveState set to 0 or AutoSwitchOver set to 1.

Content:

If a controller has its AutoRecover attribute set to 1, each application it controls must have PreserveState set to 1 and AutoSwitchOver set to No.

Corrective action:

Check the PreserveState and AutoSwitchOver attributes of the application.

(ADM, 106) The total number of SysNodes specified in the configuration for this cluster is hosts. This exceeds the maximum allowable number of SysNodes in a cluster which is maxhosts.

Content:

The total number of SysNode objects in the cluster has exceeded the maximum allowable limit.

Corrective action:

Make sure that the total number of SysNode objects in the cluster does not exceed maxhosts.

(ADM, 107) The cumulative length of the SysNode names specified in the configuration for the userApplication <userapplication> is length. This exceeds the maximum allowable length which is maxlength.

Content:

The cumulative length of the SysNode names specified in the configuration for application userapplication exceeds the maximum allowable limit.

Corrective action:

Limit the length of the SysNode names so that they fit within the maximum allowable limit.
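
The length condition in (ADM, 107) amounts to a simple sum, as in the sketch below. The limit is passed in by the caller because the actual value is the maxlength printed in the message. Whether RMS also counts separators between the names is not stated here, so this sketch counts only the characters of the names, which is an assumption.

```python
def names_exceed_limit(sysnode_names, max_length):
    """Return (total_length, exceeded) for the SysNode names of one application."""
    total = sum(len(name) for name in sysnode_names)
    return total, total > max_length
```

The node names passed to the function are whatever the configuration specifies for the application.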

(ADM, 125) Dynamic modification failed: The <attr> entry <value> for SysNode <sysnode> matches the <attr> entry or the SysNode name for another SysNode.

Content:

The entry attr must be unique.

Corrective action:

Ensure that the attr entry is unique.

(ADM, 126) userapplication: This application is controlled by controller object. That controller is defined as a LOCAL controller and as such switching this application must be done by switching the controlling application userapplication

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 128) <resource> is neither a userApplication nor a resource object, switch request skipped!

Content:

While performing a switch, hvswitch requires a userApplication or a resource as its argument. If the resource resource is not a userApplication or a resource, this message is printed.

Corrective action:

Check the man page for hvswitch for usage information.

6.1.3.3 BAS: Startup and configuration error

(BAS, 2) Duplicate line in hvgdstartup.

Content:

If RMS detects a duplicated line in the hvgdstartup file, it prints this error message and exits with exit code 23.

Corrective action:

Only unique lines are allowed in hvgdstartup. Remove all the duplicate entries.
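
Duplicate entries can be located before starting RMS with a sketch such as the one below. It compares lines after stripping surrounding whitespace and ignores blank lines; both choices are assumptions of this illustration and may differ from the comparison RMS performs.

```python
def find_duplicate_lines(lines):
    """Return the 1-based line numbers of lines that repeat an earlier line."""
    seen = set()
    duplicates = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are not treated as entries in this sketch
        if text in seen:
            duplicates.append(number)
        else:
            seen.add(text)
    return duplicates
```

The function can be fed the result of reading the hvgdstartup file line by line; each number it returns identifies an entry to remove.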

(BAS, 3) No kind specified in hvgdstartup.

Content:

In the hvgdstartup file, the entry for the detector is not of the form 'g<n> -t<n> -k<n>', or the -k<n> option is missing. Since RMS is unable to start, it exits with exit code 23.

Corrective action:

Modify the entry for the detector so that the kind (-k<n> option) for the detector is specified properly.
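
The entry form quoted above can be checked with a regular expression, as in this sketch. It recognizes only the exact form 'g<n> -t<n> -k<n>' given in the description; real entries may carry further options that this hypothetical helper does not handle.

```python
import re

# Pattern for the form 'g<n> -t<n> -k<n>' quoted in the message description.
ENTRY = re.compile(r"^g(\d+)\s+-t(\d+)\s+-k(\d+)$")

def kind_of(entry):
    """Return the kind number from a detector entry, or None if it does not match."""
    match = ENTRY.match(entry.strip())
    return int(match.group(3)) if match else None
```

An entry for which kind_of returns None either lacks the -k<n> option or deviates from the quoted form.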

(BAS, 6) DetectorStartScript for kind <kind> cannot be redefined while detector is running.

Content:

During dynamic modification, there was an attempt to redefine the DetectorStartScript for kind kind while the detector was running.

Corrective action:

Do not attempt to redefine the DetectorStartScript when the detector is already running.

(BAS, 9) ERROR IN CONFIGURATION FILE: message.

Content:

The message can be any one of the following:

Check for SanityCheckErrorPrint
Object <object> cannot have its HostName attribute set since it is not a child of any userApplication. Only the direct descendants of userApplication can have the HostName attribute set.
In basic.C:parentsCount(...)
The node <node> belongs to more than one userApplication, app1 and app2. Nodes must be children of one and only one userApplication node.
The node <node> is a leaf node and this type <type> does not have a detector. Leaf nodes must have detectors.
The node <node> has an empty DeviceName attribute. This node uses a detector and therefore it needs a valid DeviceName attribute.
The rName is <rname>, its length length is larger than max length maxlength.
The DuplicateLineInHvgdstartup is <number>, so the hvgdstartup file has a duplicate line.
The NoKindSpecifiedForGdet is <number>, so no kind specified in hvgdstartup.
Failed to load a detector of kind <kind>.
The node <node> has an invalid rKind attribute. Nodes of type gResource must have a valid rKind attribute.
The node <node> has a ScriptTimeout value that is less than its detector report time. This will cause a script timeout error to be reported before the detector can report the state of the resource. Increase the ScriptTimeout value for objectname (currently value seconds) to be greater than the detector cycle time (currently value seconds).
Node <node> has no detector while all its children's "MonitorOnly" attributes are set to 1.
The node <node> has both attributes "LieOffline" and "ClusterExclusive" set. These attributes are incompatible; only one of them may be used.
The type of object <object> cannot be or and at the same time.
Object <object> is of type and, its state is online, but not all children are online.

Corrective action:

Verify the above description and change the configuration appropriately.

(BAS, 14) ERROR IN CONFIGURATION FILE: The object <object> belongs to more than one userApplication, userapplication1 and userapplication2. Objects must be children of one and only one userApplication object.

Content:

An object was encountered as part of more than one userApplication.

RMS applications cannot have common objects.

Corrective action:

Redesign your configuration so that no two applications have common objects.
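
Objects shared between applications can be found by walking the resource graph from each userApplication, as the following sketch shows. The dictionary-based graph model and the function name are hypothetical; RMS performs its own check internally.

```python
def shared_objects(children, applications):
    """Find objects reachable from more than one userApplication.

    children: dict mapping an object name to a list of its child names.
    applications: list of userApplication names (roots of the graph).
    Returns {object: sorted list of owning applications} for shared objects.
    """
    owners = {}
    for app in applications:
        stack = list(children.get(app, []))
        visited = set()
        while stack:
            obj = stack.pop()
            if obj in visited:
                continue
            visited.add(obj)
            owners.setdefault(obj, set()).add(app)
            stack.extend(children.get(obj, []))
    return {obj: sorted(apps) for obj, apps in owners.items() if len(apps) > 1}
```

An empty result means that no two applications in the modeled graph have common objects.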

(BAS, 15) ERROR IN CONFIGURATION FILE: The object <object> is a leaf object and this type <type> does not have a detector. Leaf objects must have detectors.

Content:

An object that has no child objects (i.e., a leaf object) is of type type, which has no detector in RMS. All leaf objects in RMS configurations must have detectors.

Corrective action:

Redesign your configuration so that all leaf objects have detectors.
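
The leaf-object rule can be checked on a model of the configuration with a sketch like this one. Both inputs are hypothetical: a dictionary of parent-to-children links and a dictionary saying which objects have a detector.

```python
def leaves_without_detectors(children, has_detector):
    """Return the leaf objects that have no detector.

    children: dict mapping an object name to a list of its child names.
    has_detector: dict mapping an object name to True if it has a detector.
    """
    all_objects = set(children)
    for kids in children.values():
        all_objects.update(kids)
    return sorted(
        obj for obj in all_objects
        if not children.get(obj)              # no children: a leaf object
        and not has_detector.get(obj, False)  # and no detector
    )
```

Every object returned by the function would violate the rule that leaf objects must have detectors.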

(BAS, 16) ERROR IN CONFIGURATION FILE: The object object has an empty DeviceName attribute. This object uses a detector and therefore it needs a valid DeviceName attribute.

Content:

A critical internal error has occurred. If this message appears in switchlog, it indicates a severe problem in the base monitor.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BAS, 17) ERROR IN CONFIGURATION FILE: The rName is <rname>, its length length is larger than max length maxlength.

Content:

The value of the rName attribute exceeds the maximum length of maxlength characters.

Corrective action:

Specify a shorter rName that does not exceed the upper limit.

(BAS, 18) ERROR IN CONFIGURATION FILE: The duplicate line number is <linenumber>.

Content:

This message reports the line number of the duplicate line in the hvgdstartup file.

Corrective action:

Make sure that file hvgdstartup has no duplicate lines.

(BAS, 19) ERROR IN CONFIGURATION FILE: The NoKindSpecifiedForGdet is <kind>, so no kind specified in hvgdstartup.

Content:

The kind has not been specified for the generic detector in the hvgdstartup file.

Corrective action:

Specify the kind for the generic detector in hvgdstartup.

(BAS, 23) ERROR IN CONFIGURATION FILE: DetectorStartScript for object object is not defined. Objects of type type should have a valid DetectorStartScript attribute.

Content:

Object object does not have its DetectorStartScript defined.

Corrective action:

Make sure that the DetectorStartScript is defined for object object.

(BAS, 24) ERROR IN CONFIGURATION FILE: The object object has an invalid rKind attribute. Objects of type gResource must have a valid rKind attribute.

Content:

Object object has an invalid rKind attribute.

Corrective action:

Make sure that the object object has a valid rKind attribute.

(BAS, 25) ERROR IN CONFIGURATION FILE: The object object has a ScriptTimeout value that is less than its detector report time. This will cause a script timeout error to be reported before the detector can report the state of the resource. Increase the ScriptTimeout value for object (currently seconds seconds) to be greater than the detector cycle time (currently detectorcycletime seconds).

Content:

The ScriptTimeout value is less than the detector cycle time. This will cause the resource to appear faulted when being brought Online or Offline.

Corrective action:

Make the value of ScriptTimeout greater than the detector report time.
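
The comparison behind this message can be sketched as follows. The function name and the dictionary layout are hypothetical. Because the corrective action asks for a value greater than the detector time, the sketch also flags objects whose two values are equal, which is an assumption about the boundary case.

```python
def objects_with_short_script_timeout(objects):
    """Return names of objects whose ScriptTimeout is not greater than
    the detector cycle time.

    `objects` maps an object name to a (script_timeout, detector_cycle_time)
    pair, both in seconds. This layout is hypothetical.
    """
    return sorted(
        name
        for name, (script_timeout, cycle_time) in objects.items()
        if script_timeout <= cycle_time
    )
```

Every name returned needs a larger ScriptTimeout before the configuration is likely to be accepted.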

(BAS, 26) ERROR IN CONFIGURATION FILE:The type of object <object> cannot be 'or' and 'and' at the same time.

Content:

Each RMS object must be of a type derived from either the or type or the and type, but not both. If this message appears in the switchlog, it indicates severe corruption of the RMS executable.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BAS, 27) ERROR IN CONFIGURATION FILE:object <object> is of type 'and', its state is online, but not all children are online.

Content:

This message may appear during dynamic modification, when the existing configuration is checked before applying the modification. If this message appears, the dynamic modification will not proceed.

Corrective action:

Make sure that all children of every online object of type and are in the online state, and only then apply the dynamic modification.

(BAS, 29) ERROR IN CONFIGURATION FILE:object <object> cannot have its HostName attribute set since it is not a child of any userApplication.

Content:

An object that is not a child of a userApplication has its HostName attribute set. Only children of the userApplication object can, and must, have their HostName attribute set.

Corrective action:

Eliminate the HostName attribute from the definition of the object, or disconnect the userApplication object from this object, making this object a child of another, non-userApplication object.

(BAS, 30) ERROR IN CONFIGURATION FILE:The object object has both attributes "LieOffline" and "ClusterExclusive" set.These attributes are incompatible; only one of them may be used.

Content:

Both attributes LieOffline and ClusterExclusive are set for the same RMS object. Only one of them can be set for the same object.

Corrective action:

Eliminate one or both settings from the RMS object object.

(BAS, 31) ERROR IN CONFIGURATION FILE:Failed to load a detector of kind <kind>.

Content:

The RMS base monitor was unable to start a detector.

Corrective action:

Make sure that the detector executable is present in the correct location and has execute permission.

(BAS, 32) ERROR IN CONFIGURATION FILE:Object <object> has no detector while all its children's <MonitorOnly> attributes are set to 1.

Content:

An object without a detector has all its children's MonitorOnly attributes set to 1. An object without a detector must have at least one child for which MonitorOnly is set to 0.

Corrective action:

Change the configuration so that each object without a detector has at least one child with its MonitorOnly set to 0.
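
The rule can be expressed as a small predicate. This is a minimal sketch with a hypothetical function name and argument layout. It assumes that an object with no children at all is not covered by this rule.

```python
def violates_monitor_only_rule(has_detector, children_monitor_only):
    """Return True when the (BAS, 32) condition holds.

    `children_monitor_only` is a list holding the MonitorOnly value (0 or 1)
    of each child. Assumption: an object without children does not violate
    this particular rule.
    """
    if has_detector or not children_monitor_only:
        return False
    return all(value == 1 for value in children_monitor_only)
```

Setting MonitorOnly to 0 on any one child of the flagged object is enough to clear the condition.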

(BAS, 36) ERROR IN CONFIGURATION FILE:The object object has both attributes "MonitorOnly" and "ClusterExclusive" set. These attributes are incompatible; only one of them may be used.

Content:

Both attributes MonitorOnly and ClusterExclusive are set for the same RMS object. Only one of them can be set for the same object.

Corrective action:

Eliminate one or both settings from the RMS object object.

(BAS, 43) ERROR IN CONFIGURATION FILE: The object object has both attributes "MonitorOnly" and "NonCritical" set. These attributes are incompatible; only one of them may be used.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.3.4 BM: Base monitor

(BM, 13) no symbol for object <object> in .inp file, line = linenumber.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 14) Local queue is empty on read directive in line:linenumber.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 15) destination object <object> is absent in line: linenumber.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 16) sender object <object> is absent in line:linenumber.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 17) Dynamic modification failed: line linenumber, cannot build an object of unknown type <symbol>.

Content:

An object of unknown type is added during dynamic modification.

Corrective action:

Use only objects of known types in configuration files.

(BM, 18) Dynamic modification failed: line linenumber, cannot set value for attribute <attribute> since object <object> does not exist.

Content:

An attribute of a non-existing object cannot be modified.

Corrective action:

Modify attributes only for existing objects.

(BM, 19) Dynamic modification failed: line linenumber, cannot modify attribute <attribute> of object <object> with value <value>.

Content:

Invalid attribute is specified for modification.

Corrective action:

Modify only valid attributes.

(BM, 20) Dynamic modification failed: line linenumber, cannot build object <object> because its type <symbol> is not a user type.

Content:

An object object of a system type symbol is specified during dynamic modification.

Corrective action:

Use only valid resource types when adding new objects to configuration.

(BM, 21) Dynamic modification failed: cannot delete object <object> because its type <symbol> is not a user type.

Content:

An object object of a system type symbol is specified for deletion.

Corrective action:

Delete only objects that are valid resource types.

(BM, 23) Dynamic modification failed: The <Follow> attribute for controller <controller> is set to 1, but the content of a PriorityList of the controlled application <controlleduserApplication> is different from the content of the PriorityList of the application <userapplication> to which <controller> belongs.

Content:

This message appears when the PriorityList of the controlled application controlleduserapplication is different from the content of the PriorityList of the application userapplication to which the controller controller belongs.

Corrective action:

Make sure that the PriorityList of the controlled application is the same as the PriorityList of the application to which the controller belongs.

(BM, 24) Dynamic modification failed: some resource(s) supposed to come standby failed.

Content:

During dynamic modification, an attempt was made to add new resource(s) to a resource that was in Standby mode, but the new resource(s) could not be brought into Standby mode.

Corrective action:

Analyze your configuration to make sure that standby capable resources can get to the Standby state.

(BM, 25) Dynamic modification failed: standby capable controller <controller> cannot control application <userapplication> which has no standby capable resources on host <sysnode>.

Content:

In order for an application userapplication to be controlled by a controller controller the application userapplication has to have at least one standby capable resource on host sysnode.

Corrective action:

Make sure that the controlled application has at least one standby capable resource, or make sure that the controller is not standby capable.

(BM, 26) Dynamic modification failed: controller <controller> cannot have attributes StandbyCapable and IgnoreStandbyRequest both set to 0.

Content:

This message appears when the user sets both controller attributes StandbyCapable and IgnoreStandbyRequest to 0.

Corrective action:

Make sure that only one of the two attributes is set to 1 and the other is set to 0.

(BM, 29) Dynamic modification failed: controller object <controller> cannot have its attribute 'Follow' set to 1 while one of OnlineTimeout or StandbyTimeout is not null.

Content:

For the attribute Follow to be set to 1, both OnlineTimeout and StandbyTimeout of the controller controller must be null.

Corrective action:

Set the attributes accordingly and try again.

(BM, 42) Dynamic modification failed: application <userapplication> is not controlled by any controller, but has one of its attributes ControlledSwitch or ControlledShutdown set to 1.

Content:

This message appears when the application userapplication is not controlled by any controller, but at least one of its attributes ControlledSwitch or ControlledShutdown is set to 1.

Corrective action:

Set the attributes accordingly and try again.

(BM, 46) Dynamic modification failed: cannot modify a global attribute <attribute> locally on host <hostname>.

Content:

Global attributes such as DetectorStartScript, NullDetector, or NonCritical cannot be modified locally on the host hostname.

Corrective action:

Modify the attribute globally, or modify a different attribute locally.

(BM, 54) The RMS-CF-CIP mapping cannot be determined for any host due to the CIP configuration file <configfilename> missing entries.Please verify all entries in <configfilename> are correct and that CF and CIP are fully configured.

Content:

The CIP configuration file has missing entries.

Corrective action:

Make sure that the CIP configuration has entries for all the RMS hosts that are running in a cluster.

(BM, 59) Error errno while reading line <linenumber> of .dob file -- <errorreason>.

Content:

During dynamic modification, the base monitor reads its configuration from a '.dob' file. When this file cannot be read, this message appears in the switchlog. The specific OS error is indicated in errno and errorreason.

Corrective action:

Make sure that conditions on the host allow the .dob file to be read without errors.

(BM, 68) Cannot get message queue parameters using sysdef, errno = <errno>, reason: <reason>.

Content:

While obtaining message queue parameters, sysdef was not able to communicate them back to the base monitor. The values of errno and reason indicate the kind of error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 71) Dynamic modification failed: Controller <controller> has its attribute Follow set to 1. Therefore, its attribute IndependentSwitch must be set to 0, and its controlled application <application> must have attributes
    AutoSwitchOver == "No"
    StandbyTransitions="No"
    AutoStartUp=0
    ControlledSwitch = 1
    ControlledShutdown = 1
    PartialCluster = 0.
However, the real values are
    IndependentSwitch = <isw>
    AutoSwitchOver = <asw>
    StandbyTransitions = <str>
    AutoStartUp = <asu>
    ControlledSwitch = <csw>
    ControlledShutdown = <css>
    PartialCluster = <pcl>.

Content:

When the controller's Follow attribute is set, the attributes IndependentSwitch, AutoSwitchOver, StandbyTransitions, AutoStartUp, ControlledSwitch, ControlledShutdown, and PartialCluster must have the values 0, No, No, 0, 1, 1, and 0 respectively. However, this condition is violated in the configuration file.

Corrective action:

Supply a valid combination of attributes for the controller and its controlled user application.
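
The required values listed in the message can be checked mechanically. The following is a minimal sketch: the function name and the use of plain dictionaries for the controller and the controlled application are hypothetical, and only the attribute names and required values are taken from the message text.

```python
# Values required by the (BM, 71) message text when Follow is 1.
REQUIRED_CONTROLLER_VALUES = {"IndependentSwitch": 0}
REQUIRED_APPLICATION_VALUES = {
    "AutoSwitchOver": "No",
    "StandbyTransitions": "No",
    "AutoStartUp": 0,
    "ControlledSwitch": 1,
    "ControlledShutdown": 1,
    "PartialCluster": 0,
}


def follow_violations(controller, application):
    """Return (attribute, required, actual) tuples for every mismatch.

    Both arguments are dictionaries of attribute name to value
    (a hypothetical representation). Nothing is reported unless
    the controller has Follow set to 1.
    """
    if controller.get("Follow") != 1:
        return []
    violations = []
    checks = (
        (controller, REQUIRED_CONTROLLER_VALUES),
        (application, REQUIRED_APPLICATION_VALUES),
    )
    for attributes, required in checks:
        for name, expected in required.items():
            actual = attributes.get(name)
            if actual != expected:
                violations.append((name, expected, actual))
    return violations
```

Each tuple in the result names an attribute to correct, which corresponds to the "real values" portion of the message.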

(BM, 72) Dynamic modification failed: Controller <controller> with the <Follow> attribute set to 1 belongs to an application <application> which PersistentFault is <appfault>, while its controlled application <controlledapplication> has its PersistentFault <_fault>.

Content:

If a controller has its Follow attribute set to 1, then all of its controlled applications must have the same value for the attribute PersistentFault as the parent application of the controller.

Corrective action:

Check and correct the RMS configuration file.

(BM, 73) The RMS-CF interface is inconsistent and will require operator intervention. The routine "routine" failed with error code errorcode - "errorreason".

Content:

This is a generic message indicating that the execution of the routine routine failed due to the reason errorreason and hence the RMS-CF interface is inconsistent.

Corrective action:

No action is required if this message is output when RMS is stopped on multiple nodes at the same time. In other cases, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 74) The attribute DetectorStartScript and hvgdstartup file cannot be used together.The hvgdstartup file is for backward compatibility only and support for it may be withdrawn in future releases.Therefore it is recommended that only the attribute DetectorStartScript be used for setting new configurations.

Content:

The attribute DetectorStartScript and the file hvgdstartup are mutually exclusive.

Corrective action:

Use only the DetectorStartScript attribute when setting up new configurations, because support for hvgdstartup may be discontinued in future releases.

(BM, 75) Dynamic modification failed: controller <controller> has its attributes SplitRequest, IgnoreOnlineRequest, and IgnoreOfflineRequest set to 1. If SplitRequest is set to 1, then at least one of IgnoreOfflineRequest or IgnoreOnlineRequest must be set to 0.

Content:

An invalid combination of controller attributes was encountered. If both IgnoreOfflineRequest and IgnoreOnlineRequest are set to 1, then no request will be propagated to the controlled application(s), so no request can be split.

Corrective action:

Provide a valid combination of the controller attributes.
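
The constraint stated in the message can be written as a single predicate. This is a sketch with a hypothetical function name; attribute values are assumed to be the integers 0 and 1.

```python
def split_request_combination_is_valid(split_request,
                                       ignore_online_request,
                                       ignore_offline_request):
    """Return True unless SplitRequest is 1 while both Ignore* flags are 1."""
    if split_request != 1:
        # The rule only applies when SplitRequest is set.
        return True
    return ignore_online_request == 0 or ignore_offline_request == 0
```

Only the combination 1, 1, 1 is rejected by this rule; other rules may still restrict the remaining combinations.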

(BM, 80) Dynamic modification failed: controller <controller> belongs to the application <application> which AutoSwitchOver attribute has "ShutDown" option set, but its controlled application <controlled> has not.

Content:

If a controlling application has its AutoSwitchOver attribute set with the option "Shutdown", then all applications controlled by the controllers that belong to this controlling application must also have their AutoSwitchOver attributes having the option "Shutdown" set as well.

Corrective action:

Provide correct settings for the AutoSwitchOver attributes.

(BM, 81) Dynamic modification failed: local controller attributes such as NullDetector or MonitorOnly cannot be modified during local modification (hvmod -l).

Content:

This message appears because local controller attributes such as NullDetector or MonitorOnly can be modified only during global modification.

Corrective action:

Make a non-local modification, or modify different attributes.

(BM, 90) Dynamic modification failed: The length of object name <object> is length. This is greater than the maximum allowable length name of maxlength.

Content:

The length of object name is greater than the maximum allowable length.

Corrective action:

Ensure that the length of the object name does not exceed maxlength.

(BM, 92) Dynamic modification failed: a non-empty value <value> is set to <ApplicationSequence> attribute of a non-scalable controller <controller>.

Content:

A non-scalable controller cannot have its ApplicationSequence attribute set to a non-empty value.

Corrective action:

Provide correct settings for the ApplicationSequence and Scalable attributes.

(BM, 94) Dynamic modification failed: the ApplicationSequence attribute of a scalable controller <controller> includes application name <hostname>, but this name is absent from the list of controlled applications set to the value of <resource> in the attribute <Resource>.

Content:

The ApplicationSequence attribute of a scalable controller includes an application name absent from the list of the controlled applications.

Corrective action:

Provide correct settings for ApplicationSequence and Resource attributes of the controller.

(BM, 96) Dynamic modification failed: a scalable controller <controller> has its attributes <Follow> set to 1 or <IndependentSwitch> set to 0.

Content:

A scalable controller must have its attribute Follow set to 0 and IndependentSwitch set to 1.

Corrective action:

Provide correct settings for the Follow, IndependentSwitch, and Scalable attributes.

(BM, 97) Dynamic modification failed: controller <controller> attribute <ApplicationSequence> is set to <applicationsequence> which refers to application(s) not present in the configuration.

Content:

A scalable controller must list only existing applications in its ApplicationSequence attribute.

Corrective action:

Provide correct settings for attribute ApplicationSequence.

(BM, 98) Dynamic modification failed: two scalable controllers <controller1> and <controller2> control the same application <application>.

Content:

Only one scalable controller can control an application.

Corrective action:

Correct the RMS configuration.

(BM, 99) Dynamic modification failed: controlled application <controlledapp> runs on host <hostname>, but it is controlled by a scalable controller <scontroller> which belongs to an application <controllingapp> that does not run on that host.

Content:

Hostname mismatch between controlled and controlling applications. Controlling application must run on all the hosts where the controlled applications are running.

Corrective action:

Fix RMS configuration.
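
The host requirement amounts to a subset test: every host of the controlled application must also be a host of the controlling application. A minimal sketch follows; the function name and the host names used to call it are hypothetical.

```python
def hosts_missing_from_controlling_app(controlled_hosts, controlling_hosts):
    """Return the hosts on which the controlled application runs but the
    controlling application does not, in sorted order."""
    return sorted(set(controlled_hosts) - set(controlling_hosts))
```

An empty result means that the host lists are consistent with this rule. Any host returned must be added to the controlling application or removed from the controlled one.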

(BM, 101) Dynamic modification failed: controlled application <controlledapp> runs on host <hostname>, but it is controlled by a scalable controller <scontroller> which belongs to a controlling application <controllingapp> that does not allow for the controller to run on that host.

Content:

Hostname mismatch between controlled and controlling applications. Controlling application must run on all the hosts where the controlled applications are running.

Corrective action:

Fix RMS configuration.

(BM, 103) Dynamic modification failed: Controller <controller> has its attribute Follow set to 1 and the controlled application <application> has StandbyCapable resources. Therefore the controller itself must have StandbyCapable set to 1 and IgnoreStandbyRequest must be set to 0.

Content:

When the controller's Follow attribute is set and the controlled application has StandbyCapable resources, the controller must have StandbyCapable set and IgnoreStandbyRequest must be disabled. Otherwise, Standby requests will not be properly propagated to the controlled application.

Corrective action:

Supply a valid combination of attributes for the controller and its controlled user application.

(BM, 105) Dynamic modification failed: Invalid kind of generic resource specified in DetectorStartScript <script> for object <object>.

Content:

An invalid value was supplied for the -k flag in the detector startup script.

Corrective action:

Fix RMS configuration.

(BM, 106) The rKind attribute of object <object> does not match the value of the '-k' flag of its associated detector.

Content:

Values for rKind attribute and flag -k of the detector startup line do not match.

Corrective action:

Correct the RMS configuration.

(BM, 107) Illegal different values for rKind attribute in object <object>.

Content:

Different values for rKind attribute are encountered within the same object.

Corrective action:

Fix RMS configuration.

(BM, 108) Dynamic modification failed: Scalable controller <object> cannot have its attribute <SplitRequest> set to 1.

Content:

The controller attributes Scalable and SplitRequest are mutually exclusive.

Corrective action:

Fix RMS configuration.

(BM, 109) Dynamic modification failed: Application <application> has its attribute PartialCluster set to 1 or is controlled, directly or indirectly, via a Follow controller that belongs to another application that has its attribute PartialCluster set to 1 -- this application <application> cannot have a cluster exclusive resource <resource>.

Content:

An exclusive resource cannot belong to an application with the attribute PartialCluster set to 1, or cannot be controlled, directly or indirectly, by a Follow controller from an application with the attribute PartialCluster set to 1.

Corrective action:

Fix RMS configuration.

(BM, 110) Dynamic modification failed: Application <application> is controlled by a scalable controller <controller>, therefore it cannot have its attribute <ControlledShutdown> set to 1 while its attribute <AutoSwitchOver> includes option <ShutDown>.

Content:

An application controlled by a scalable controller cannot have ControlledShutdown set to 1 and AutoSwitchOver including the option ShutDown at the same time.

Corrective action:

Correct the RMS configuration file.

(BM, 111) Dynamic modification failed: Line #line is too big.

Content:

A line in a configuration file is too big.

Corrective action:

Fix RMS configuration, so that each line takes less than 2000 bytes.
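
Overlong lines can be located with a short scan. This is a sketch under stated assumptions: the function name is hypothetical, the length is measured in UTF-8 bytes, and the line terminator is not counted. How RMS measures the line is not described here.

```python
MAX_LINE_BYTES = 2000  # limit quoted in the corrective action


def oversized_lines(text, limit=MAX_LINE_BYTES, encoding="utf-8"):
    """Return 1-based numbers of lines that take `limit` bytes or more."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line.encode(encoding)) >= limit
    ]
```

Because the corrective action asks for less than 2000 bytes per line, a line of exactly 2000 bytes is reported as well.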

(BM, 113) Base monitor has reported 'Faulted' for host <Sysnode>.

Content:

This message indicates that the RMS on the node Sysnode has terminated unexpectedly.

Corrective action:

Investigate why RMS terminated unexpectedly and take the necessary action. When RMS terminates unexpectedly, the affected node must be forcibly stopped, so the message (US, 12) is also output.

This message may also be output when RMS is started while the SysNode <Sysnode> is stopped. In this case, no action is required.

(BM, 122) getaddrinfo failed, reason: errorreason, errno <errno>. Failed to allocate a socket for "rmshb" port monitoring.

Content:

The getaddrinfo call failed to allocate a port for rmshb.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.3.5 CML: Command line

(CML, 11) Option (option) requires an operand.

Content:

Certain options for hvcm require an argument. If hvcm has been invoked without the argument, this message appears along with the usage and RMS exits with exit code 3.

Corrective action:

Refer to "PRIMECLUSTER Installation and Administration Guide" for the correct usage of hvcm.

(CML, 12) Unrecognized option option.

Content:

The option provided is not a valid one.

Corrective action:

Refer to "PRIMECLUSTER Installation and Administration Guide" for the correct usage of hvcm.

(CML, 17) Incorrect range argument with -l option.

Content:

The number for the -l option is not correct. Check the range.

Corrective action:

Check the man page for hvcm for range argument with -l option.

Refer to "PRIMECLUSTER Installation and Administration Guide" for the correct usage of hvcm.

(CML, 18) Log level <loglevel> is too large. The valid range is 1..maxloglevel with the -l option.

Content:

If the loglevel loglevel specified with -l option for hvcm or hvutil is greater than the maximum possible loglevel maxloglevel, this message is printed.

Corrective action:

Specify a loglevel between 1 and maxloglevel for 'hvcm -l' or 'hvutil -l'.

(CML, 19) Invalid range <low - high>.Within the '-l' option, the end range value must be larger than the first one.

Content:

When a range of loglevels has been specified with -l option for hvcm or hvutil, if the value of the end range high is smaller than the value of low, this message appears.

Corrective action:

Specify an end range value that is higher than the start range value.

(CML, 20) Log level must be numeric.

Content:

This message is output under any one of the following conditions:

The log level specified with -l option of hvutil is neither a numeric value, nor "off", nor "display".
The log level specified with -l option of hvcm is neither a numeric value nor "off".

Corrective action:

Refer to the manual page for the correct use of hvutil/hvcm command.

(CML, 21) 0 is an invalid range value. 0 implies all values. If a range is desired, the valid range is 1..maxloglevel with the -l option.

Content:

If the log level specified with the -l option of hvcm or hvutil is outside the valid range, this message is printed.

Corrective action:

The valid range for the -l option of hvcm or hvutil is 1..maxloglevel.
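
The checks described for (CML, 18) through (CML, 21) can be summarized in one validation routine. This is only a sketch: the function name is hypothetical, and it assumes that the argument is a keyword, a single level, or one "low-high" range, which may not match the full syntax accepted by hvcm or hvutil. It also assumes that a single 0 is accepted as "all values", and that a range is rejected only when its end is smaller than its start, as stated in the Content of (CML, 19).

```python
def check_log_level_spec(spec, max_log_level, keywords=("off",)):
    """Return None if spec looks acceptable, else a (component, number) pair
    identifying the message this sketch associates with the problem.

    `keywords` lists accepted non-numeric values; per (CML, 20) this would be
    ("off",) for hvcm and ("off", "display") for hvutil.
    """
    if spec in keywords:
        return None
    parts = spec.split("-")
    if len(parts) > 2 or not all(p.isascii() and p.isdigit() for p in parts):
        return ("CML", 20)  # log level must be numeric
    values = [int(p) for p in parts]
    if len(values) == 2 and 0 in values:
        return ("CML", 21)  # 0 is not valid inside a range
    if max(values) > max_log_level:
        return ("CML", 18)  # log level too large
    if len(values) == 2 and values[1] < values[0]:
        return ("CML", 19)  # end of range smaller than start
    return None
```

The value passed as max_log_level stands for maxloglevel in the messages; its actual value is not given in this section.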

6.1.3.6 CRT: Contracts and contract jobs

(CRT, 1) FindNextHost: local host not found in priority list of nodename.

Content:

The RMS base monitor maintains a priority list of all the hosts in the cluster. Under normal circumstances, the local host should always be present in the list. If this is not the case, this message is printed.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CRT, 2) cannot obtain the NET_SEND_Q queue.

Content:

RMS uses internal queues for sending contracts. (Contracts are messages that are transmitted between the hosts in a cluster that ensure the hosts are synchronized with respect to a particular operation. The messages may be transmitted between processes on the same host or processes on different hosts.) If there is a problem with the queue NET_SEND_Q that is being used to transmit these contracts, this message is printed in the switchlog.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CRT, 3) Message send failed.

Content:

When RMS tries to send a message to another host in the cluster, if the delivery of this message over the queue NET_SEND_Q has failed, this message is printed. This could be due to the fact that the host that is to receive the message has gone down or there is a problem with the cluster interconnect.

Corrective action:

Check to make sure that the other hosts in the cluster are all alive and make sure that none of them are experiencing any network problems.

(CRT, 4) object: type Contract retransmit failed: Message Id = messageid
see bmlog for contract details.

Content:

When RMS on one host sends a contract over the queue NET_SEND_Q to another host (or itself, if there is only one host in the cluster), it tries to transmit this contract a certain number of times that is determined internally. If the message transmission fails after all attempts, this message is printed to the switchlog and the contract is discarded. (Note: UAP contracts are not discarded.)

Corrective action:

Make sure that there is no problem with the cluster interconnect or with the integrity of the cluster; that is, make sure that cluster applications are not Online on multiple nodes and that no SysNode is in the Wait state.

If these checks reveal a problem, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CRT, 5) The contract <crtname> is being dropped because the local host <crthost> has found the host originator <otherhost> in state <state>. That host is expected to be in state Online. Please check the interhost communication channels and make sure that these hosts see each other Online.

Content:

The local host crthost sees the host that originated the contract, otherhost, in state state when it is expected to be in state Online.

Corrective action:

Make sure that the interhost communication channels are working correctly and that the hosts see each other online.

6.1.3.7 CTL: Controllers

(CTL, 1) Controller <controller> will not operate properly since its controlled resource <resource> is not in the configuration.

Content:

This message appears when a resource that is controlled by a controller is not in the RMS configuration file and the controller's NullDetector attribute is set to off.

Corrective action:

The controlled resource must be present in the RMS configuration file for the controller to work properly.

Configure the resource properly.

(CTL, 2) Controller <controller> detected more than one controlled application Online. This has lead to the controller fault. Therefore, all the online controlled application will now be switched offline.

Content:

If the controller controller has two or more of the controlled applications Online on one or more hosts, then the controller faults.

Corrective action:

Make sure that no more than one controlled application of a controller is Online.

6.1.3.8 CUP: userApplication contracts

(CUP, 2) object: cluster is in inconsistent condition
current online host conflict,
received: host, local: onlinenode.

Content:

The cluster hosts were unable to determine which host is responsible for a particular userApplication. The most likely reason for this is an erroneous system administrator intervention (e.g., a forced hvswitch request) that left the userApplication Online on more than one host simultaneously.

Corrective action:

Analyze the cluster inconsistency and perform the appropriate action to resolve it. If the application is online on more than one host, shut down ('hvutil -f') the userApplication on all but one host.

(CUP, 3) object is already waiting for an event cannot set timer!

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CUP, 5) object received unknown contract.

Content:

The contract received by the node from the application is not recognizable. This is a critical internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CUP, 7) userApplication is locally online, but is also online on another host.

Content:

The userApplication is online on the current host and is also online on another host.

Corrective action:

The userApplication can only be online on one host. Make sure the application is offline on all but one of the hosts. If this is not the case, use 'hvutil -f' to bring the userApplication to the offline state on the superfluous hosts.

(CUP, 8) object: could not get an agreement about the current online host; cluster may be in an inconsistent condition!

Content:

Note: This message corresponds to (CUP, 2). While (CUP, 8) is printed on the contract originator, (CUP, 2) is printed on the non-originator hosts.

Corrective action:

6.1.3.9 DET: Detectors

(DET, 1) FAULT REASON: Resource <resource> transitioned to a Faulted state due to a child fault.

Content:

This message appears when a child resource faults unexpectedly, which causes the resource to fault.

Corrective action:

Check to see why the child resource has faulted and based on this take corrective action.

(DET, 2) FAULT REASON: Resource <resource> transitioned to a Faulted state due to a detector report.

Content:

A detector unexpectedly reported the Faulted state.

Corrective action:

Check to see why the resource has faulted and take appropriate action.

(DET, 3) FAULT REASON: Resource <resource> transitioned to a Faulted state due to a script failure.

Content:

This message appears when the detector failed to execute the script for a resource.

Corrective action:

Ensure that there is nothing wrong with the script and also check the resource for any problems.

(DET, 4) FAULT REASON: Resource <resource> transitioned to a Faulted state due to a FaultScript failure.This is a double fault.

Content:

A failure occurred in a resource and its FaultScript was executed, but execution of the FaultScript failed for the resource shown in resource.

Corrective action:

Investigate the cause of the failure in the resource that triggered execution of the FaultScript. Also check how the failed FaultScript execution for the resource shown in resource affects the system. If the cause of the problem or the effect of the FaultScript failure on the system cannot be determined, collect the investigation information and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 5) FAULT REASON: Resource <resource> transitioned to a Faulted state due to the resource failing to come Offline after running its OfflineScript (offlineScript).

Content:

After a resource executes its Offline script, it is expected to come Offline. If it does not change its state, or transitions to a state other than Offline within the period of seconds specified by its ScriptTimeout attribute, the resource is considered as being Faulted.

Corrective action:

Make sure the Offline script moves the resource into Offline state.

(DET, 6) FAULT REASON: Resource <resource> transitioned to a Faulted state due to the resource failing to come Online after running its OnlineScript (onlinescript).

Content:

After a resource executes its online script, it is expected to come Online. If it does not change its state, or transitions to a state other than Online within the period of seconds specified by its ScriptTimeout attribute, the resource is considered as being Faulted.

Corrective action:

Make sure the Online script moves the resource into Online state.
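
The ScriptTimeout rule described above can be sketched as follows. This is a simplified model under stated assumptions, not the RMS state engine: the function name and the `reports` list are hypothetical, only state changes are assumed to be reported, and a report arriving exactly at the timeout is treated as being in time.

```python
def state_after_script(reports, target, script_timeout):
    """Classify a resource after its script has run.

    reports:        hypothetical list of (seconds_since_script_start, state)
                    state-change reports in chronological order.
    target:         state the script is expected to produce, e.g. "Online".
    script_timeout: value of the ScriptTimeout attribute, in seconds.
    """
    in_window = [state for elapsed, state in reports if elapsed <= script_timeout]
    if not in_window:
        return "Faulted"  # no state change within ScriptTimeout
    # The first state change inside the window decides the outcome.
    return target if in_window[0] == target else "Faulted"
```

With a ScriptTimeout of 30, a single report `(5, "Online")` yields "Online", while `(45, "Online")` or no report at all yields "Faulted".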

(DET, 7) FAULT REASON: Resource <resource> transitioned to a Faulted state due to the resource unexpectedly becoming Offline.

Content:

This message appears when the resource becomes Offline unexpectedly. When the detector stops responding to the base monitor (BM), the resource is judged to be faulty.

Corrective action:

Determine why the resource suddenly transitioned to the Offline state. If the cause cannot be identified, collect the investigation information and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 11) DETECTOR STARTUP FAILED: Corrupted command line <commandline>.

Content:

This message occurs when the command line is empty or has some incorrect value.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 12) DETECTOR STARTUP FAILED <detector>.REASON: errorreason.

Content:

If the detector detector could not be started due to errorreason, this message is printed. The reason errorreason could be any one of the following:

The detector detector does not exist.
The detector detector does not have execute permission.
The process for the detector could not be spawned.
The number of processes created by the base monitor at the same time is greater than 128.

Corrective action:

Take appropriate action depending on the reason for the error.
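
The first two reasons in the list above can be checked before RMS is started. The following Python sketch is a hypothetical helper, not part of RMS; it covers only the "does not exist" and "no execute permission" cases and cannot predict spawn failures or the process limit.

```python
import os


def detector_startup_problem(path):
    """Return a short reason string if the file at `path` cannot be executed, else None."""
    if not os.path.isfile(path):
        return "does not exist"
    if not os.access(path, os.X_OK):
        return "no execute permission"
    return None
```

The check uses the permissions of the calling user, so it should be run as the user under which the base monitor runs.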

(DET, 13) Failed to execute script <script>.

Content:

The detector script is faulty or its format is invalid.

Corrective action:

Check the detector startup script.

(DET, 24) FAULT REASON: Resource <resource> transitioned to a Faulted state due to the resource failing to come Standby after running its OnlineScript (onlinescript).

Content:

After a resource executes its online script during standby request, it is expected to come Standby. If it does not change its state, or transitions to a state other than Standby or Online within the period of seconds specified by its ScriptTimeout attribute, the resource is considered as being Faulted.

Corrective action:

Make sure the Online script moves the resource into Standby or Online state during standby request.

(DET, 26) FAULT REASON: Resource <resource> transitioned to a Faulted state due to the resource failing to come Online.

Content:

This message appears when the resource fails to come Online after executing its Online scripts, which may cause the resource to transition to the Faulted state.

Corrective action:

Check to see what prevented the resource resource from coming Online.

(DET, 28) <object>: CalculateState() was invoked for a non-local object! This must never happen. Check for possible configuration errors!

Content:

During the processing of a request within the state engine, a "request or response token" was delivered to an object that is not defined for the local host. This is a critical internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 33) DETECTOR STARTUP FAILED: Restart count exceeded.

Content:

When a detector dies, RMS attempts to restart it. This message appears when the number of restart attempts has exceeded the predetermined value without success. The detector is considered faulty.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 34) No heartbeat has been received from the detector with pid <pid>, <startupcommand>, during the last <seconds> seconds. The base monitor will send the process a SIGALRM to interrupt the detector if it is currently stalled waiting for the alarm.

Content:

In order to avoid stalling of RMS detectors, each detector periodically sends a heartbeat message to the base monitor. When the heartbeat is missing for a period of time, the base monitor prints this message into switchlog. The base monitor will send an alarm signal to the stalled process to ensure the detector will properly handle its main loop responsibilities. If the amount of time stated since the last time the base monitor had received the heartbeat from the detector exceeds 300 seconds, then the message may indicate the base monitor is not allowed to run. Currently, the base monitor is a real-time process, but not locked in memory. This message may also occur because the bm process has been swapped out and has not had a chance to run again.

Corrective action:

Make sure that the base monitor and detector are active using system tools such as truss(1) or strace(1). If the loss of heartbeat greatly exceeds the 300 second timeout, this may indicate that system swap or main memory is insufficient.

6.1.3.10 GEN: Generic detector

(GEN, 1) Usage: command -t time_interval -k kind [-d]

Content:

command has been invoked in a way that does not conform to its expected usage.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(GEN, 3) Cannot open command log file.

Content:

The file command log used for logging could not be opened.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(GEN, 4) failed to create mutex: directory

Content:

The various RMS commands like hvdisp, hvswitch, hvutil and hvdump utilize the lock files from the directory directory for signal handling purposes. These files are deleted after these commands are completed. The locks directory is also cleaned when RMS starts up. If they are not cleaned for some reason, this message is printed, and RMS exits with exit code 99.

Corrective action:

Make sure that the locks directory directory exists. If so, delete it.

(GEN, 5) command: failed to get information about RMS base monitor bm!

Content:

The generic detector command was unable to get any information about the base monitor.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(GEN, 7) command: failed to lock virtual memory pages, errno = value, reason: reason.

Content:

The generic detector command was not able to lock its virtual memory pages in physical memory.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.3.11 INI: init script

(INI, 1) Cannot open file dumpfile, errno = errno: explanation.

Content:

This message appears when the file dumpfile could not be opened because of the error code errno, explained in explanation.

Corrective action:

Correct the problem according to explanation.
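
The relationship between errno and explanation can be reproduced with a short Python sketch. The helper name is hypothetical, and the explanation text comes from the operating system, so its exact wording may differ from what RMS prints.

```python
import os


def try_open(path):
    """Try to open `path` for reading; return (errno, explanation) on failure, else None."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        # os.strerror() maps the numeric error code to the system's message text.
        return exc.errno, os.strerror(exc.errno)
```

A missing file or directory, for example, is reported with the error code ENOENT.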

(INI, 9) Cannot close file dumpfile, errno = errno: explanation.

Content:

This message appears when the file dumpfile could not be closed because of the error code errno, explained in explanation.

Corrective action:

Correct the problem according to explanation.

6.1.3.12 MIS: Miscellaneous

(MIS, 1) No space for object.

Content:

An internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.3.13 QUE: Message queues

(QUE, 13) RCP fail: filename is being copied.

Content:

This message is printed if an attempt was made to copy the file filename when there was another copy in progress.

Corrective action:

Make sure that concurrent copies of the same file do not occur.

(QUE, 14) RCP fail: fwrite errno errno.

Content:

There was a problem while transferring files from one cluster host to the other.

Corrective action:

Take action based on the errno.

6.1.3.14 SCR: Scripts

(SCR, 8) Invalid script termination for controller <controller>.

Content:

The controller script is incorrect or invalid.

Corrective action:

Check the controller script.

(SCR, 9) REASON: failed to execute script <script> with resource <resource>: errorreason.

Content:

When the PreCheckScript is set in the script <script>, multiple cluster applications with exclusive relationship might have been activated on the same node.
The created script may have an error.

Corrective action:

When the script is PreOnlineScript and the SControllerOf_ScalableCtr_* is output in the resource <resource>, the cluster application in Standby state, which is controlled by the scalable application, cannot be activated.
If this message is output when some nodes that configure a cluster are activated, start the cluster application in Standby state, or start RMS on all the nodes.
If this message is output when RMS is about to be stopped right after it was activated on all the nodes, no action is required.
When exclusive relationship is configured among multiple cluster applications, and the job priorities are the same, or the higher priority cluster application is in operation, the startup of another cluster application with the exclusive relationship is stopped and this message is printed. No action is required when exclusive relationship is configured among multiple cluster applications.
Check the error reason errorreason. Review the created script.

When the above actions do not solve this error, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(SCR, 20) The attempt to shut down the cluster host host has failed: errorreason.

Content:

The cluster host could not be killed for one of the following reasons:

Script exited with a non-zero status.
Script exited due to signal caught.
Other unknown failure.

Corrective action:

Verify the status of the node, make any necessary corrections to the script, potentially correct the node state manually if possible and issue appropriate 'hvutil -{o, u}' as needed.
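
When testing a corrected script by hand, the three failure reasons listed above can be told apart from its return code. The sketch below assumes the script is run through Python's subprocess module on a POSIX system, where termination by signal N is reported as the negative value -N; the function name is hypothetical and the wording of the results is not RMS output.

```python
import signal


def classify_exit(returncode):
    """Map a subprocess-style return code to a failure reason."""
    if returncode == 0:
        return "success"
    if returncode > 0:
        return f"exited with non-zero status {returncode}"
    # Negative values mean the process was terminated by a signal (POSIX only).
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"terminated by {name}"
```

For example, `classify_exit(subprocess.run([script_path]).returncode)` classifies one run, where `script_path` is a placeholder for the script under test.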

(SCR, 21) Failed to execute the script <script>, errno = <errno>, error reason: <errorreason>.

Content:

If the script cannot be executed, this message is printed along with the errorreason.

Corrective action:

Take action based on the errorreason.

(SCR, 26) The sdtool notification script has failed with status status after dynamic modification.

Content:

After dynamic modification, the Shutdown Facility is notified via sdtool about the changes in the current configuration. If sdtool exits abnormally, then the base monitor must exit.

Corrective action:

Verify that sdtool and the Shutdown Facility are operating properly.

Red Hat Enterprise Linux 5 (for Intel64)

No action is required in the xen kernel environment.

6.1.3.15 SWT: Switch requests (hvswitch command)

(SWT, 4) object is online locally, but is also online on onlinenode.

Content:

The object <object> is online both on the local node and onlinenode. When the object <object> manages shared disks, data corruption may occur.

Corrective action:

Make sure that the object object is online on only one host in the cluster.

(SWT, 20) Could not remove host <hostname> from local priority list.

Content:

A host has left the cluster, but RMS was unable to remove the corresponding entry from its internal Priority List. This is an internal problem in the program stack and memory management.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(SWT, 25) objectname: outstanding switch request of dead host was denied; cluster may be in an inconsistent condition!

Content:

A host died during the processing of a switch request. The host that takes over the responsibility for that particular userApplication tried to proceed with the partly-done switch request, but another host does not agree. This indicates a severe cluster inconsistency and critical internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(SWT, 26) object: dead host <hostname> was holding an unknown lock. Lock will be skipped!

Content:

This message appears when the dead host hostname was holding a lock that is unknown to the new responsible host.

Corrective action:

Allow time for the cluster to clean up.

(SWT, 45) hvshut aborted because of a busy uap <userapplication>.

Content:

The hvshut request was aborted because the application is busy.

Corrective action:

Do not shut down RMS when its applications are busy. Make sure the application finishes its processing before shutting down RMS.

(SWT, 46) hvshut aborted because modification is in progress.

Content:

The hvshut request was aborted because dynamic modification is in progress.

Corrective action:

Do not shut down RMS while dynamic modification is in progress. Wait until dynamic modification finishes before shutting down RMS.

(SWT, 84) The userApplication application is in an Inconsistent state on multiple hosts. hvswitch cannot be processed until this situation is resolved by bringing the userApplication Offline on all hosts - use hvutil -f application to achieve this.

Content:

The userApplication application is in an Inconsistent state on more than one host in the cluster, so the switch request was denied.

Corrective action:

Clear the Inconsistent state.

6.1.3.16 SYS: SysNode objects

(SYS, 1) Error on SysNode: object. It failed to send the kill success message to the cluster host: host.

Content:

When a cluster host is killed, the host that requested the kill must send a success message to the surviving hosts. This message appears in the switchlog when sending that success message fails.

Corrective action:

Make sure the cluster and network conditions are such that the message can be sent across the network.

(SYS, 8) RMS failed to shut down the host host via a Shutdown Facility, no further kill functionality is available. The cluster is now hung. An operator intervention is required.

Content:

This message appears when RMS sent a kill request to the Shutdown Facility and did not receive the elimination acknowledgement.

Corrective action:

If CF is in LEFTCLUSTER state, clear the LEFTCLUSTER state. See "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide" for LEFTCLUSTER state.

If CF state is not LEFTCLUSTER, check the status of SysNode.

If SysNode is in Wait state, clear the Wait state. See "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide" for how to clear the Wait state.

(SYS, 13) Since this host <hostname> has been online for no more than time seconds, and due to the previous error, it will shut down now.

Content:

One possible reason for this message is that the configuration checksum of this host differs from that of the other hosts in the cluster.

Corrective action:

Check the configuration on all the cluster hosts and verify that the same configuration is running on all of them.

(SYS, 14) Neither automatic nor manual switchover will be possible on this host until <detector> detector will report offline or faulted.

Content:

This message appears when different configurations are encountered in a cluster where one host is offline and the other is online.

Corrective action:

Run the same configuration on all hosts of a single cluster, and make sure that different clusters do not have hosts in common.

(SYS, 15) The uname() system call returned with Error. RMS will be unable to verify the compliance of the RMS naming convention!

Content:

This message appears when uname() system call returned with a non-zero value.

Corrective action:

Make sure that the SysNode name is valid and restart RMS as needed.

(SYS, 17) The RMS internal SysNode name "sysnode" is ambiguous with the name "name". Please adjust names compliant with the RMS naming convention "SysNode = `uname -n`RMS"

Content:

The RMS naming convention 'sysnodename = `uname -n`RMS' is intended to allow use of the CF-name with and without trailing "RMS" whenever an RMS command expects a SysNode reference. This rule creates an ambiguity if one SysNode is named "xxxRMS" and another is named "xxx", because 'rms_command xxx' could refer to either SysNode. Therefore, ambiguous SysNode names are not allowed.

Corrective action:

Use non-ambiguous SysNode names and adhere to the RMS naming conventions.
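
A planned set of SysNode names can be checked for this ambiguity in advance. The following Python sketch is a hypothetical helper, not an RMS command; it only looks for pairs of the form "xxx" and "xxxRMS" and does not validate the names in any other way.

```python
def ambiguous_sysnode_pairs(names):
    """Return sorted (short, long) pairs where long == short + "RMS"."""
    known = set(names)
    return sorted((name, name + "RMS") for name in known if name + "RMS" in known)
```

For the names "fuji2RMS", "fuji2", and "fuji3RMS" (example values only), the function reports the pair ("fuji2", "fuji2RMS"); an empty result means no such ambiguity was found.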

(SYS, 48) Remote host <hostname> replied the checksum <remotechecksum> which is different from the local checksum <localchecksum>. The sysnode of this host will not be brought online.

Content:

This message appears when the remote host hostname is running a different configuration than the local host, or when different loads of the RMS package are installed on these hosts.

Corrective action:

Make sure all the hosts are running the same configuration and the configuration is distributed on all hosts. Make sure that the same RMS package is installed on all hosts (same load).

(SYS, 49) Since this host <hostname> has been online for more than time seconds, and due to the previous error, it will remain online, but neither automatic nor manual switchover will be possible on this host until <detector> detector will report offline or faulted.

Content:

One possible reason for this message is that the configuration checksum of this host differs from that of the other hosts in the cluster.

Corrective action:

Check the configuration on all the cluster hosts and verify that the same configuration is running on all of them.

(SYS, 50) Since this host <hostname> has been online for no more than time seconds, and due to the previous error, it will shut down now.

Content:

One possible reason for this message is that the configuration checksum of this host differs from that of the other hosts in the cluster.

Corrective action:

Check the configuration on all the cluster hosts and verify that the same configuration is running on all of them.
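
The difference between the "shut down" and "remain online" variants of this message can be sketched in Python. The sketch assumes that time is derived as described for (ADC, 1), that is, from the environment variable HV_CHECKSUM_INTERVAL if set, or 120 seconds otherwise; that this also holds for the SYS messages is an assumption, and the function name and return strings are hypothetical.

```python
import os


def checksum_mismatch_action(online_seconds, environ=None):
    """Return the reaction to a checksum mismatch for a host online for `online_seconds`."""
    environ = os.environ if environ is None else environ
    # Assumption: HV_CHECKSUM_INTERVAL holds a whole number of seconds.
    interval = int(environ.get("HV_CHECKSUM_INTERVAL", 120))
    if online_seconds <= interval:
        return "shut down"  # online for no more than `interval` seconds
    return "remain online without switchover"
```

With the default of 120 seconds, a host that has been online for 60 seconds shuts down, while one that has been online for 121 seconds stays online with switchover disabled.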

(SYS, 84) Request <hvshut -a> timed out. RMS will now terminate! Note: some cluster hosts may still be online!

Content:

The hvshut -a command has timed out. RMS may end abnormally on some nodes and some resources that are included in the cluster applications may fail to end.

Corrective action:

Shut down the OSes on all the nodes except the nodes on which RMS has ended normally or shut down the nodes forcibly. To prevent the timeout of the hvshut command, depending on your environment, change RELIANT_SHUT_MIN_WAIT, which is the global environment variable of RMS, to a larger value.

See

For details on RELIANT_SHUT_MIN_WAIT, see "RELIANT_SHUT_MIN_WAIT" in "Global environment variables" in the following manual:

For PRIMECLUSTER 4.3A30 or later: "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."

See "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide" for how to refer to and change the RMS environment variables.

(SYS, 90) hostname internal WaitList addition failure! Cannot set timer for delayed detector report action!

Content:

System Error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(SYS, 93) The cluster host nodename is not in the Wait state. The hvutil command request failed!

Content:

This message appears when the user issues the hvutil command ('hvutil -o' or 'hvutil -u') and the cluster host nodename is not in the Wait state.

Corrective action:

Reissue hvutil -{o, u} only when the host is in a Wait state or configure so that this command is not issued.

(SYS, 94) The last detector report for the cluster host hostname is not online. The hvutil command request failed!

Content:

This message appears when the user issues the hvutil command ('hvutil -o sysnode') to clear the Wait state of the SysNode, but the SysNode remains in the Wait state because the last detector report for the cluster host hostname is not Online. In other words, the SysNode may have transitioned to the Wait state from a state other than Online.

Corrective action:

Issue 'hvutil -o' only when the host has transitioned from the Online state to the Wait state.

(SYS, 97) Cannot access the NET_SEND_Q queue.

Content:

When a new host comes Online, the other hosts in the cluster try to determine if the new host has been started with -C option. The host that has just come online uses the queue NET_SEND_Q to send the necessary information to the other hosts in the cluster. If this host is unable to access the queue NET_SEND_Q this message is printed.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(SYS, 98) Message send failed in SendJoinOk.

Content:

When a new host comes Online, the other hosts in the cluster try to determine if the new host has been started with -C option. The host that has just come online uses the queue NET_SEND_Q to send the necessary information to the other hosts in the cluster. If this host is unable to send the necessary information to the other hosts in the cluster, this message is printed.

Corrective action:

Check if there is a problem with the network.

(SYS, 100) The value of the attribute <attr> specified for SysNode <sysnode> is <invalidvalue> which is invalid. Ensure that the entry for <attr> is resolvable to a valid address.

Content:

The value of attr is not resolvable to a valid network address.

Corrective action:

Ensure that a valid interface is specified for attr.
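
Whether a candidate value resolves can be checked on the host itself. The Python sketch below uses the standard resolver through socket.getaddrinfo; the function name is hypothetical, and a successful lookup here does not guarantee that RMS resolves the name in the same way.

```python
import socket


def resolves(value):
    """Return True if `value` resolves to at least one network address."""
    try:
        return bool(socket.getaddrinfo(value, None))
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver found no address; UnicodeError: malformed host name.
        return False
```

Run the check on every cluster host, because name resolution can differ from host to host.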

(SYS, 101): Unable to start RMS on the remote SysNode <SysNode> using cfsh, rsh or ssh.

Content:

RMS on the local node could not start RMS on the remote node <SysNode> by using cfsh, rsh, or ssh.

Corrective action:

Make cfsh available. rsh and ssh are not supported.

This message may be output when the hvcm -a command is executed on multiple nodes. In this case, execute the hvcm -a command on only one of the nodes that make up the cluster.

6.1.3.17 UAP: userApplication objects

(UAP, 1) Request to go online will not be granted for application <userapplication> since the host <sysnode> runs a different RMS configuration.

Content:

This message appears when a request is made for the application userapplication to go Online, but the host sysnode is running a different configuration.

Corrective action:

Make sure that the same configuration is running on all hosts.

(UAP, 5) object: cmp_Prio: list.

Content:

This message is printed when invalid entries exist in the priority list list.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 6) Could not add new entry to priority list.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 7) Could not remove entries from priority list.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 8) object: cpy_Prio failed, source list corrupted.

Content:

This message appears when either the PriorityList is empty or the list is corrupted. A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 9) object: Update of PriorityList failed, cluster may be in inconsistent condition.

Content:

If a contract that is supposed to be present in the internal list does not exist, this message is printed. The cluster may be in an inconsistent condition.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 15) sysnode: PrepareStandAloneContract() processing unknown contract.

Content:

This message appears when there is only one application sysnode Online and it has to process a contract that is not supported. A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 16) object::SendUAppLockContract: local host doesn't hold a lock -- Contract processing denied.

Content:

This message appears when the contract is processed by the local host, which does not have the lock for that application contract. A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 19) object::SendUAppLockContract: LOCK Contract cannot be sent.

Content:

This message appears when the LOCK contract cannot be sent over the network.

Corrective action:

The network may be down. Check the network for abnormalities.

(UAP, 21) object::SendUAppUnLockContract: UNLOCK Contract cannot be sent.

Content:

This message appears when the UNLOCK contract cannot be sent over the network.

Corrective action:

The network may be down. Check the network for abnormalities.

(UAP, 22) object unlock processing failed, cluster may be in an inconsistent condition!

Content:

This message appears when the local node receives an UNLOCK contract but is unable to perform the follow-up processing that was committed in the contract.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 23) object failed to process UNLOCK contract.

Content:

A host was unable to propagate the received UNLOCK contract, e.g., because of networking problems or memory problems.

Corrective action:

This message should appear with an additional ERROR message specifying the origin of the problem.

Refer to the ERROR message.

(UAP, 24) Deleting of local contractUAP object failed, cannot find object.

Content:

This message appears when the local contract node has completed the contract and has sent it to the local node, but the local node was not able to find it.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 27) object received a DEACT contract in state: state.

Content:

The corresponding userApplication on a remote host is in the DeAct state, but the local userApplication is not. This is an error that should not occur.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 28) object failed to update the priority list. Cluster may be in an inconsistent state.

Content:

When the local host receives a contract for unlocking the hosts in the cluster with respect to a particular operation and finds that a particular host has died, it updates its priority list to reflect this. This message is printed if the update cannot be performed for some reason. This is a critical internal error that indicates a problem in memory management.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 29) object: contract data section is corrupted.

Content:

This message appears when the application is unable to read the data section of the contract.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 32) object received unknown contract.

Content:

This message appears when the application is unable to unlock the contract because it could not find the kind of contract request that it expected in its code. A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 33) object unknown task in list of outstanding contracts.

Content:

This message appears when an unknown task is found in the list of outstanding contracts. A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 35) object: inconsistency occurred. Any further switch request will be denied (except forced requests). Clear inconsistency before invoking further actions!

Content:

The state of the application is Offline or Standby and some of its resources are Online or Faulted.

Corrective action:

Clear the inconsistency with the appropriate command (usually 'hvutil -c').
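
The condition described for (UAP, 35) can be written as a simple predicate. The following is a minimal sketch only: the function name is_inconsistent and the use of plain strings for states are assumptions made for illustration, and the code is not part of RMS.

```python
# Hypothetical illustration of the (UAP, 35) condition; not RMS code.
def is_inconsistent(app_state, resource_states):
    """Return True if the application is Offline or Standby while
    at least one of its resources is Online or Faulted."""
    app_is_down = app_state in ("Offline", "Standby")
    resource_is_active_or_faulted = any(
        state in ("Online", "Faulted") for state in resource_states
    )
    return app_is_down and resource_is_active_or_faulted


print(is_inconsistent("Offline", ["Offline", "Online"]))
print(is_inconsistent("Online", ["Online", "Online"]))
```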

(UAP, 41) cannot open file filename. Last Online Host for userApplication cannot be stored into non-volatile device.

Content:

File open error.

Corrective action:

Check the environment variable RELIANT_PATH.

(UAP, 42) found incorrect entry in status file:<entry>

Content:

This message appears when the status_info file contains an incorrect entry.

This error should not occur unless the status_info file was edited manually.

Corrective action:

Check the status_info file for manually edited incorrect entries. If this is not the case, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 43) <object>: could not insert <host> into local priority list.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 44) <object>: could not remove <host> from local priority list.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 45) <object>: could not remove <host> from priority list.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(UAP, 51) Failed to execute the fcntl system call to <flags> the file descriptor flags for file <filename>: errno = <errornumber>: <errortext>.

Content:

RMS is unable to execute the fcntl() system call to <flags> the file descriptor flags of file <filename> because of error code <errornumber> as explained by <errortext>.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
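
To show how an fcntl() failure surfaces with an error number and an explanatory text, as in (UAP, 51), here is a minimal sketch using Python's standard fcntl module (available on Unix-like systems only). The helper names set_close_on_exec and describe_failure are hypothetical; this is not how RMS itself calls fcntl().

```python
import fcntl
import os


def set_close_on_exec(fd):
    """Set FD_CLOEXEC on fd and return the resulting descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return fcntl.fcntl(fd, fcntl.F_GETFD)


def describe_failure(fd):
    """Return (errno, text) if fcntl fails on fd, or None on success."""
    try:
        fcntl.fcntl(fd, fcntl.F_GETFD)
    except OSError as exc:
        return exc.errno, os.strerror(exc.errno)
    return None


read_fd, write_fd = os.pipe()
print(bool(set_close_on_exec(read_fd) & fcntl.FD_CLOEXEC))
os.close(read_fd)
os.close(write_fd)
# read_fd is now closed, so the call is expected to fail with an errno.
print(describe_failure(read_fd))
```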

6.1.3.18 US: us files

(US, 5) The cluster host hostname is no longer reachable! Please check the status of the host and the Ethernet connection.

Content:

The local cluster host detected that another cluster host <hostname> was no longer reachable. In other words, this cluster host sees the other host <hostname> as faulted. The other host <hostname> may have gone down, or there may be a problem with the cluster interconnect.

Corrective action:

Check whether the host <hostname> is actually down. If it is not, check whether there is a problem with the network connection.

(US, 6) RMS has died unexpectedly on the cluster host hostname!

Content:

When the detector on the local host detects that the host <hostname> has transitioned from online to offline unexpectedly, it attempts to kill the host <hostname>.

Corrective action:

Check the syslog on the host <hostname> to determine the reason why it went down.

(US, 31) FAULT REASON: Resource resource transitioned to a Faulted state due to a detector report.

Content:

This message is output when a detector reports an unexpected Faulted state.

Corrective action:

Investigate the cause of the resource failure and take necessary actions.

6.1.3.19 WLT: Wait list

(WLT, 1) FAULT REASON: Resource resource's script (scriptexecd) has exceeded the ScriptTimeout of timeout seconds.

Content:

The detector script for the resource has exceeded the ScriptTimeout limit.

Corrective action:

Check whether the resource has failed. If it has, take the necessary corrective action.

If it has not, set the resource's ScriptTimeout attribute to a value longer than the Online/Offline script execution time.
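
When choosing a ScriptTimeout value, it can help to measure how long the script actually takes. The following is a minimal sketch, assuming the script can safely be run by hand outside RMS; the helper name measure_script is hypothetical, and the Python one-liner stands in for a real Online/Offline script.

```python
import subprocess
import sys
import time


def measure_script(command, script_timeout):
    """Run command and return (elapsed_seconds, timed_out).

    timed_out is True when the command ran longer than script_timeout;
    subprocess.run() kills the child in that case.
    """
    start = time.monotonic()
    try:
        subprocess.run(command, timeout=script_timeout, check=False)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.monotonic() - start, timed_out


# Stand-in for an Online/Offline script (assumption for illustration).
quick_script = [sys.executable, "-c", "pass"]
elapsed, timed_out = measure_script(quick_script, script_timeout=30)
print(f"elapsed={elapsed:.2f}s timed_out={timed_out}")
```

A measured time close to the configured limit suggests that the timeout leaves too little margin.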

(WLT, 3) Cluster host hostname's Shutdown Facility invoked via (script) has not finished in the last time seconds. An operator intervention is required!

Content:

The Shutdown Facility that is killing host hostname has not terminated yet. Operator intervention may be required. This message will appear periodically (with the period equal to the node's ScriptTimeout value), until either the script terminates on its own, or until the script is terminated by the Unix kill command. If terminated by the kill command, the host being killed will not be considered killed.

Corrective action:

Wait until the script terminates, or terminate the script using the kill command if it cannot terminate on its own.

(WLT, 5) CONTROLLER FAULT: Controller <object> has propagated <request> request to its controlled application(s) <applications>, but the request has not been completed within the period of <timeout> seconds.

Content:

When a controller propagates a request to its controlled applications, it waits for the request to complete for a period that should be sufficient for the controlled applications to process it. If the request is not completed within this period, the controller faults.

Corrective action:

Fix the controller's scripts and/or the scripts of the controlled applications, or repair the resources of the controlled applications. For user-defined controller scripts, increase their ScriptTimeout values.

(WLT, 9) sdtool notification timed out after <timeout> seconds.

Content:

After dynamic modification, the Shutdown Facility is notified via sdtool about the changes in the current configuration. If this notification does not finish within the period specified by the local SysNode ScriptTimeout value, the base monitor must exit.

Corrective action:

Verify that sdtool and the Shutdown Facility are operating properly. Increase the ScriptTimeout value if needed.

6.1.3.20 WRP: Wrappers

(WRP, 1) Failed to set script to TS.

Content:

The script could not be made into a time sharing process.

Corrective action:

Take the necessary action based on the cause.

(WRP, 2) Illegal flag for process wrapper creation.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 3) Failed to execv: command.

Content:

This message could occur in any of the following scenarios:

A detector cannot be started because RMS is unable to create the detector process with the command command.
'hvcm -a' has been invoked and the RMS base monitor cannot be started on the individual hosts comprising the cluster with the command command.
A script cannot be started because RMS is unable to create the script process with the command command.

RMS shuts down on the node where this message appears and returns an error number errno, which is the error number returned by the operating system.

Corrective action:

Check whether the cause of the problem can be identified from the error number errno, referring to the manual page of the system or "Appendix B Solaris/Linux ERRNO table." If the cause cannot be determined, contact field engineers.
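
As a quick alternative to looking up the error number by hand, the symbolic name and description of an errno value can be obtained from the operating system. This is a minimal sketch using Python's standard errno and os modules; the helper name describe_errno is hypothetical, and the values returned are those of the system on which the code runs.

```python
import errno
import os


def describe_errno(number):
    """Return (symbolic_name, description) for an OS error number."""
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# Two error numbers that commonly accompany exec failures.
print(describe_errno(errno.ENOENT))
print(describe_errno(errno.EACCES))
```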

(WRP, 4) Failed to create a process: command.

Content:

This message could occur in any of the following scenarios:

A detector cannot be started because RMS is unable to create the detector process to execute the command command.
'hvcm -a' has been invoked and the RMS base monitor cannot be started on the individual hosts comprising the cluster with the command command.
A script cannot be started because RMS is unable to create the script process with the command command.

RMS shuts down on the node where this message appears and returns an error number errno, which is the error number returned by the operating system.

Corrective action:

(WRP, 5) No handler for this signal event <signal>.

Content:

There is no signal handler associated with the signal signal.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 6) Cannot find process (pid=processid) in the process wrappers.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 7) getservbyname failed for service name: servicename.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 8) gethostbyname failed for remote host: host.

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 9) Socket open failed.

Content:

This message occurs if RMS is unable to create a datagram endpoint for communication.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 12) Failed to bind port to socket.

Content:

This could occur if RMS is unable to bind the endpoint for communication.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 13) Cannot allocate memory, errno <errno> - strerrno.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 14) No available slot to create a new host instance.

Content:

When the base monitor for RMS starts up, it creates a slot in an internal data structure for every host in the cluster.

When hvdet_node is started up, RMS sends it a list of the SysNode objects that are put into different slots in the internal data structure. If the data structure has run out of slots (16) to put the SysNode name in, RMS generates this error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 15) gethostbyname(hostname): host name should be in /etc/hosts

Content:

When the host name hostname specified as a SysNode does not have an entry in /etc/hosts, this message is printed to the switchlog.

Corrective action:

Correct the host name hostname so that it has an entry in /etc/hosts.
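
One way to confirm that a SysNode host name has an entry is to list the names defined in the hosts file. The following is a minimal sketch that parses hosts-file text; the function name hosts_names, the sample addresses, and the names node1RMS and node2RMS are assumptions made for illustration.

```python
def hosts_names(hosts_text):
    """Return the set of host names and aliases defined in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2:
            names.update(fields[1:])  # fields[0] is the address
    return names


# Sample content; on a real host, read /etc/hosts instead.
SAMPLE = """\
127.0.0.1   localhost
192.0.2.11  node1RMS node1   # example entry
"""

print("node1RMS" in hosts_names(SAMPLE))
print("node2RMS" in hosts_names(SAMPLE))
```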

(WRP, 16) No available slot for host hostname

Content:

This message is output with the node name hostname when the slots for cluster interfaces (64) are exhausted.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 17) Size of integer or IP address is not 4-bytes

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 18) Not enough memory in <processinfo>

Content:

A critical internal error has occurred.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 23) The child process <cmd> with pid <pid> could not be killed due to errno <errno>, reason: reason.

Content:

The child process cmd with pid pid could not be killed due to reason: reason.

Corrective action:

Take action based on the reason reason.

(WRP, 24) Unknown flag option set for 'killChild'.

Content:

The killChild routine accepts one of two flags: KILL_CHILD and DONTKILL_CHILD. If an option other than these two has been specified, this message is printed.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 25) Child process <cmd> with pid <pid> has exceeded its timeout period. Will attempt to kill the child process.

Content:

The child process cmd has exceeded its timeout period.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 29) RMS on the local host has received a message from host host, but the local host is unable to resolve the sending host`s address. This could be due to a misconfiguration. This message will be dropped. Further such messages will appear in the switchlog.

Content:

RMS on the local host has received a message from host host whose address is not resolvable by the local host.

Corrective action:

Make sure that the local host is able to resolve the remote host host's address by checking for any misconfigurations.

(WRP, 30) RMS on the local host has received a message from host host, but the local host is unable to resolve the sending host`s address. This message will be dropped. Please check for any misconfiguration.

Content:

RMS on the local host has received a message from host host whose address is not resolvable by the local host.

Corrective action:

Make sure that the local host is able to resolve the remote host host's address by checking for any misconfigurations.

(WRP, 31) RMS has received a message from host host with IP address receivedip. The local host has calculated the IP address of that host to be calcip. This may be due to a misconfiguration in /etc/hosts. Further such messages will appear in the switchlog.

Content:

The local host has received a message from host host with IP address receivedip, which is different from the locally calculated IP address for that host.

Corrective action:

Check /etc/hosts and the RMS configuration file for any misconfiguration.

(WRP, 32) RMS has received a message from host host with IP address receivedip. The local host has calculated the IP address of that host to be calcip. This may be due to a misconfiguration in /etc/hosts.

Content:

The local host has received a message from host host with IP address receivedip, which is different from the locally calculated IP address for that host. While fewer than 500 such messages have been received, this message is printed in the switchlog for every 25th message received. After that, it is printed for every 250th such message received.

Corrective action:

Check /etc/hosts and the RMS configuration file for any misconfiguration.
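
The logging frequency described for (WRP, 32) can be sketched as follows. This is one reading of the description, not RMS code: the function name should_log is hypothetical, and the sketch assumes that messages are counted from 1 and that the switch to every 250th message happens once the count reaches 500.

```python
def should_log(count):
    """Decide whether the count-th mismatched message is written to the switchlog.

    Assumed reading: every 25th message while fewer than 500 have been
    received, every 250th message from then on.
    """
    if count < 500:
        return count % 25 == 0
    return count % 250 == 0


# Counts at which a message would be logged among the first 1000 received.
logged = [n for n in range(1, 1001) if should_log(n)]
print(logged)
```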

(WRP, 33) Error while creating a message queue with the key <id>, errno = <errno>, explanation: <explanation>.

Content:

An abnormal OS condition occurred while creating a message queue.

Corrective action:

Check OS conditions that affect memory allocation for message queues, such as the size of swap space and the values of the parameters msgmax, msgmnb, msgmni, and msgtql. Check whether the maximum number of message queues has already been allocated.
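
On Linux, some of these limits can be read from files under /proc/sys/kernel. The following is a minimal sketch under that assumption; the helper name read_msg_limits is hypothetical, msgtql is not included because it is not assumed to be exposed there, and on other platforms the values must be obtained by other means.

```python
import os


def read_msg_limits(base_dir="/proc/sys/kernel"):
    """Return a dict of message queue limits read from base_dir.

    A value is None when the file is missing or cannot be parsed.
    """
    limits = {}
    for name in ("msgmax", "msgmnb", "msgmni"):
        path = os.path.join(base_dir, name)
        try:
            with open(path) as handle:
                limits[name] = int(handle.read().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


print(read_msg_limits())
```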

(WRP, 34) Cluster host host is no longer in time sync with local node. Sane operation of RMS can no longer be guaranteed. Further out-of-sync messages will appear in the syslog.

Content:

The time on host is not in sync with the time on the local node.

Corrective action:

Synchronize the time of host with that of the local node.
Also, check whether the NTP server is connected to the network and whether the NTP settings are correct.

(WRP, 35) Cluster host host is no longer in time sync with local node. Sane operation of RMS can no longer be guaranteed.

Content:

The time on the cluster host host differs significantly (more than 25 seconds) from the local node.

Corrective action:

Synchronize the time.
Also, check whether the NTP server is connected to the network and whether the NTP settings are correct.
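
The 25-second threshold mentioned for (WRP, 35) amounts to a simple comparison of two timestamps. The following is a minimal sketch: the function name out_of_sync and the sample times are assumptions, and how the remote time is obtained is left open.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW_SECONDS = 25  # threshold stated in the (WRP, 35) description


def out_of_sync(local_time, remote_time, limit=MAX_SKEW_SECONDS):
    """Return True if the two clocks differ by more than limit seconds."""
    skew = abs((remote_time - local_time).total_seconds())
    return skew > limit


local = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(out_of_sync(local, local + timedelta(seconds=10)))
print(out_of_sync(local, local - timedelta(seconds=40)))
```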

(WRP, 52) The operation func failed with error code errorcode.

Content:

The operation func failed with error code errorcode.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 60) The elm heartbeat detects that the cluster host <hostname> has become offline.

Content:

The ELM heartbeat has stopped.

Corrective action:

No action is required, because a node is normally stopped forcibly when the ELM heartbeat stops.

(WRP, 68) Unable to update the RMS lock file, function <function>, errno <errno> - errorreason.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 69) function failed, reason: errorreason, errno <errno>.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 71) Both IPv4 and IPv6 addresses are assigned to <SysNode> in /etc/hosts.

Content:

Both IPv4 and IPv6 addresses are assigned to SysNode in the host name database files (/etc/hosts, /etc/inet/ipnodes).

Corrective action:

Correct the host name database files so that only an IPv4 address or only an IPv6 address is assigned to SysNode.
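
To locate names that have both address families assigned, the host name database text can be scanned. The following is a minimal sketch using Python's standard ipaddress module; the function names, sample addresses, and the names node1RMS and node2RMS are assumptions made for illustration, and lines whose first field is not a parsable IP address are skipped.

```python
import ipaddress


def address_families(hosts_text):
    """Map each host name to the set of IP versions (4, 6) assigned to it."""
    families = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split fields
        if len(fields) < 2:
            continue
        try:
            version = ipaddress.ip_address(fields[0]).version
        except ValueError:
            continue  # first field is not a parsable IP address
        for name in fields[1:]:
            families.setdefault(name, set()).add(version)
    return families


def dual_stack_names(hosts_text):
    """Return the names that have both an IPv4 and an IPv6 address."""
    found = address_families(hosts_text)
    return sorted(name for name, versions in found.items() if versions == {4, 6})


# Sample content; on a real host, read the host name database files instead.
SAMPLE_HOSTS = """\
192.0.2.11    node1RMS
2001:db8::11  node1RMS
192.0.2.12    node2RMS
"""

print(dual_stack_names(SAMPLE_HOSTS))
```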