This chapter contains a detailed list of non-fatal RMS error messages output in the switchlog file.
Check the component name displayed in a message, then see the table below to find the corresponding reference. For each component, the messages are explained in numerical order.
Component | Reference |
---|---|
ADC | |
ADM | |
BAS | |
BM | |
CML | |
CRT | |
CTL | |
CUP | |
DET | |
GEN | |
INI | |
MIS | |
QUE | |
SCR | |
SWT | |
SYS | |
UAP | |
US | |
WLT | |
WRP | |
Content:
time is the value of the environment variable HV_CHECKSUM_INTERVAL, if set, or 120 seconds otherwise. This message could appear when the checksums of the configurations of the local and the remote host are different, no more than time seconds have elapsed, and one of the following is true:
When the remote host is joining the cluster, and all the applications on the local host are either Offline or Faulted. RMS exits with exit code 60.
The configuration for the local host does not include the remote host, but the configuration for the remote host does include the local host. The local host hostname will shut down with exit code 60.
Corrective action:
The local and the remote hosts are running different configurations. Make sure that both of them are running the same configuration.
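The timing condition described above can be sketched as follows. This is only an illustration, assuming HV_CHECKSUM_INTERVAL holds a whole number of seconds; the function names are hypothetical and are not part of RMS.

```python
import os

DEFAULT_CHECKSUM_INTERVAL = 120  # seconds, used when HV_CHECKSUM_INTERVAL is not set


def checksum_interval(env=None):
    """Return the checksum interval in seconds (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("HV_CHECKSUM_INTERVAL")
    # Assumption: an empty value is treated the same as an unset variable.
    if raw is None or raw.strip() == "":
        return DEFAULT_CHECKSUM_INTERVAL
    return int(raw)


def within_checksum_window(elapsed_seconds, env=None):
    """True while no more than the interval has elapsed."""
    return elapsed_seconds <= checksum_interval(env)
```

With no variable set, `within_checksum_window(90, {})` is true and `within_checksum_window(121, {})` is false.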
Content:
The checksums of the configurations of the local and the remote hosts are different, no more than the number of seconds determined by the value of the environment variable HV_CHECKSUM_INTERVAL have passed, and not all of the applications are offline or faulted. RMS will continue to remain online, but neither automatic nor manual switchover will be possible on this host until the detector detector reports offline or faulted.
Corrective action:
Make sure that both the local and remote hosts are running with the same RMS configuration file.
Content:
This message is output in the following situations.
The checksum of the configuration file reported by the remote host hostname is different from the checksum of the configuration file on the local host.
Setting of the RMS global environment variable differs depending on each node.
Corrective action:
Take the following actions depending on the situation.
The checksum of the configuration file reported by the remote host hostname is different from the checksum of the configuration file on the local host.
The most likely cause for this would be that the local host and the remote host are running with different configuration files. Make sure that the local host and the remote host are running the same configuration file.
Setting of the RMS global environment variable differs depending on each node.
Correct the hvenv.local on all the nodes and then restart RMS.
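One way to find settings that differ between nodes is to compare the variable assignments from each node's hvenv.local. The sketch below assumes the file consists of `export NAME=value` or `NAME=value` lines, which may not match every installation; both helper functions are hypothetical.

```python
def parse_env_settings(text):
    """Parse 'export NAME=value' or 'NAME=value' lines into a dict (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].strip()
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def diff_settings(local, remote):
    """Return {name: (local_value, remote_value)} for every variable that differs."""
    names = set(local) | set(remote)
    return {
        n: (local.get(n), remote.get(n))
        for n in sorted(names)
        if local.get(n) != remote.get(n)
    }
```

Feed the contents of each node's file to `parse_env_settings` and pass the two results to `diff_settings`; an empty result means the parsed settings match.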
Content:
This message is output when the RMS configuration file reported by the remote node <hostname> and the RMS configuration file on the local node are different. This message is also output when the setting of an RMS global environment variable differs between nodes.
Corrective action:
The most likely cause for this would be that the local host and the remote host are running with different configuration files. Make sure that the local host and the remote host are running the same configuration file.
When the setting of an RMS global environment variable differs between nodes, correct hvenv.local so that it is consistent on all the nodes, and then restart RMS.
Content:
If the checksums of the configurations of the local and the remote host are different and if more than time seconds have elapsed since this host has gone online (time is the value of the environment variable HV_CHECKSUM_INTERVAL if set, or 120 seconds otherwise), then RMS prints the above message.
Corrective action:
Make sure that all the hosts in the cluster are running with the same configuration file.
Content:
RMS was unable to set the global environment variable envattribute because it has not been set in hvenv.
envattribute can be any one of the following: RELIANT_LOG_LIFE, RELIANT_SHUT_MIN_WAIT, HV_CHECKSUM_INTERVAL, HV_LOG_ACTION_THRESHOLD, HV_LOG_WARNING_THRESHOLD, HV_WAIT_CONFIG or HV_RCSTART. This will eventually cause RMS to exit with exit code 1.
Corrective action:
Set the value of the environment variable to an appropriate value.
Content:
The 'hvutil -u' command has been invoked on a node, but its SysNode is not in the Wait State (internal option).
Corrective action:
Reissue the command after the SysNode has reached the Wait state.
Content:
RMS was unable to set the local environment variable envattribute because it has not been set in hvenv. envattribute can be any one of the following:
SCRIPTS_TIME_OUT, RELIANT_INITSCRIPT, RELIANT_STARTUP_PATH, HV_CONNECT_TIMEOUT, HV_MAXPROC or HV_SYSLOG_USE. This will eventually cause RMS to exit with exit code 1.
Corrective action:
Set the local environment variable in the /opt/SMAW/SMAWRrms/bin/hvenv.local file to an appropriate value.
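Before starting RMS, it can help to check which of the variables named in these messages are missing from the environment. The following is a minimal sketch; the variable lists are copied from the message descriptions, and `missing_variables` is a hypothetical helper that takes any mapping of names to values.

```python
# Global variables named in the description of the global-variable message.
GLOBAL_VARS = [
    "RELIANT_LOG_LIFE", "RELIANT_SHUT_MIN_WAIT", "HV_CHECKSUM_INTERVAL",
    "HV_LOG_ACTION_THRESHOLD", "HV_LOG_WARNING_THRESHOLD",
    "HV_WAIT_CONFIG", "HV_RCSTART",
]

# Local variables named in the description of the local-variable message.
LOCAL_VARS = [
    "SCRIPTS_TIME_OUT", "RELIANT_INITSCRIPT", "RELIANT_STARTUP_PATH",
    "HV_CONNECT_TIMEOUT", "HV_MAXPROC", "HV_SYSLOG_USE",
]


def missing_variables(env, names):
    """Return the names from `names` that are not set in `env`, in order."""
    return [name for name in names if name not in env]
```

For example, `missing_variables(os.environ, GLOBAL_VARS)` (after `import os`) lists the global variables that are not set in the current process environment.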
Content:
The 'hvutil -o' command has been invoked on a node, but its SysNode is not in the Wait State.
Corrective action:
Reissue the command after the SysNode has reached the Wait state.
Content:
This message is generated if hvmod has been invoked without the -l option and the application is processing some requests.
Corrective action:
Reissue the hvmod command when the userApplication has completed the current switch request.
Content:
Dynamic modification has failed. The exact reason for the failure is displayed in the message preceding this one.
Corrective action:
Check the switchlog for the error message that precedes this one to find the exact cause of the failure.
Content:
If the value of the environment variable HV_WAIT_CONFIG is 0 or has not been set, the default value of 120 is used instead.
Corrective action:
Set the value of HV_WAIT_CONFIG in the /opt/SMAW/SMAWRrms/bin/hvenv.
Content:
RMS uses the NET_SEND_Q queue for transmitting contract information. If there is some problem with this queue, the operation is aborted. The operation can be any one of the following: hvrcp, hvcopy.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
An error occurred while transferring file filename across the network.
Corrective action:
Check if there are any problems with the network.
Content:
The time taken for dynamic modification is greater than the timeout limit. The timeout limit is the greater of the environment variable MODIFYTIMEOUTLIMIT (if defined) or 0. If the value for the environment variable is 0 or less, the timeout limit is 0. If the variable is not defined, the default timeout limit is 120 seconds.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
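The rule for deriving the timeout limit can be sketched as below. This only illustrates the rule as described, assuming MODIFYTIMEOUTLIMIT holds an integer number of seconds; the function name is hypothetical.

```python
def modify_timeout_limit(env):
    """Return the dynamic modification timeout limit in seconds."""
    raw = env.get("MODIFYTIMEOUTLIMIT")
    if raw is None:
        return 120           # variable not defined: default limit
    return max(int(raw), 0)  # defined: the greater of its value and 0
```

An undefined variable gives 120, a value of 300 gives 300, and a value of 0 or less gives 0.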
Content:
The time taken for dynamic modification during bm startup is greater than the timeout limit. The timeout limit is the greater of the environment variable MODIFYTIMEOUTLIMIT (if defined) or 0. If the value for the environment variable is 0 or less, the timeout limit is 0. If the variable is not defined, the default timeout limit is 120 seconds. RMS exits with exit code 63.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
Critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
During dynamic modification, if there is an attempt to make a non-critical resource resource MonitorOnly while it is not online and the userApplication userapplication is Online, this message is printed and dynamic modification is aborted.
Corrective action:
Switch the userApplication Offline before making the resource critical.
Content:
If RMS finds that the userApplication userapplication will have no children while performing dynamic modification, this message is printed to the switchlog and dynamic modification is aborted.
Corrective action:
Make sure that the userApplication has valid children while performing dynamic modification.
Content:
The wizards use the environment variable HVMOD_HOST during dynamic modification. This variable holds the name of the host on which hvmod has been invoked. If this variable cannot be set with the function putenv(), then this message is printed to the switchlog along with the reason failurereason.
Corrective action:
Check the reason failurereason in the switchlog to find out why this operation has failed and take corrective action based on this.
Content:
Wizards make use of an action file during hvmod. If the execution of this action file (command) has failed due to the process exiting by using an exit call, this message is printed to the switchlog along with the reason for this failure.
Corrective action:
Check the switchlog to find the reason for this failure, and rectify it before reissuing the hvmod command.
Content:
During dynamic modification, files containing modification information are transferred between the hosts of the cluster. If, for any reason, a file transfer fails, the dynamic modification is aborted.
Corrective action:
Make sure that host and cluster conditions are such that the command can be safely executed.
Content:
When a host joins a cluster, it receives a cluster configuration file. If, for any reason, a file transfer fails, the dynamic modification is aborted.
Corrective action:
Make sure that host and cluster conditions are such that the command can be safely executed.
Content:
During dynamic modification, files containing modification information are transferred between the hosts of the cluster. If, for any reason, a file transfer fails, the dynamic modification is aborted. A specific reason for this failure is referred to by the OS error code ERRNO and its explanation in ERRORREASON. The list is also available at "Appendix B Solaris/Linux ERRNO table" in this manual.
Corrective action:
Make sure that host and cluster conditions are such that the command can be safely executed.
Content:
During dynamic modification, files containing modification information are transferred between the hosts of the cluster.
Corrective action:
Make sure that host, cluster and network conditions are such that the command can be safely executed.
Content:
If the file filename that has been specified as the file to be copied from the local host to the remote host cannot be opened for reading, this message is printed.
Corrective action:
Make sure that the file filename is readable.
Content:
During a file transfer between the hosts, RMS encountered a problem indicated by the OS error code ERRNO.
Corrective action:
Make sure that the host, cluster and network conditions are such that file transfer proceeds without errors.
Content:
The RMS base monitor periodically checks the integrity and size of the temporary file used to transfer configuration data to the hvdisp process. If this file cannot be checked, the hvdisp process is restarted automatically, though some data may be lost and not displayed at this time. The specific OS error code for the error encountered is displayed in ERRNO.
Corrective action:
Make sure that the host conditions are such that the temporary file can be checked. Sometimes, you may need to restart the hvdisp process by hand.
Content:
When a remote host joins a cluster, this host attempts to dump its own configuration for a subsequent transfer to the remote host. If the configuration cannot be saved, the hvjoin operation is aborted.
Corrective action:
One of the previous messages contains a detailed explanation of the error that occurred while saving the configuration. Correct the host environment according to the explanation, or contact field engineers.
Content:
When a remote host joins a cluster, this host attempts to prepare its own configuration for a subsequent transfer to the remote host. For that, it uses the command command. If the command fails, the hvjoin operation is aborted.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When this host joins a cluster, this host attempts to store remote configuration files for a subsequent dynamic modification on this host. For that, it uses the command command. If the command fails, the hvjoin operation is aborted.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
File transfer is a part of some RMS operations such as dynamic modification and hvjoin. Before transferring a file file to a remote host, it must be compressed with the command command. If the command fails, the operation that requires the file transfer is aborted.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
While performing RMS cluster-wide shutdown, RMS on host host failed to shut down.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
While performing RMS clusterwide shutdown, RMS on this host failed to shut down. Another attempt to shut down this host is automatically initiated.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
While reading file file, an error errno occurred. The reason is explained in reason. File reading errors may occur during dynamic modification or during the hvjoin operation.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
During dynamic modification, when a new resource is to be added to an offline parent object and that resource cannot be brought offline, this message is printed.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
During dynamic modification, when a new resource is to be added to an online parent object by executing the online scripts and that resource cannot be brought online, dynamic modification is aborted.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
During dynamic modification, if there is an attempt to add an object object that does not have a parent (and hence not linked to any userApplication), this message is printed and dynamic modification is aborted.
Corrective action:
Make sure that every object to be added during dynamic modification is linked to a userApplication.
Content:
When RMS receives a directive to add a new resource resource with the same name as that of an existing resource, this message is printed to the switchlog and dynamic modification aborts.
Corrective action:
Make sure that when adding a new resource, its name does not match the name of any other existing resource.
Content:
When RMS receives a directive to add a new resource resource with the name of an existing resource, it prints out this message and dynamic modification aborts.
Corrective action:
Make sure that when adding a new resource, its name does not match the name of any other existing resource.
Content:
In the overall structure of the graph of the RMS resources, no cycles are allowed along the chains of parent/child links. If this is not the case then dynamic modification fails and the message specified above will be printed to the switchlog.
Corrective action:
Remove the cycles from the parent/child links of the configuration.
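To locate a cycle in a resource graph before applying a modification, a depth-first search over the parent/child links is sufficient. The sketch below is a generic cycle check, not the algorithm RMS itself uses; it assumes the graph is given as a dictionary mapping each object name to a list of its children, and the object names are hypothetical.

```python
def find_cycle(children):
    """Return one cycle as a list of node names, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for child in children.get(node, ()):
            state = color.get(child, WHITE)
            if state == GREY:
                # The child is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(children):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For a graph in which `r1` is a parent of `r2` and `r2` is again a parent of `r1`, the function returns the offending chain so that one of the links can be removed.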
Content:
Because deleting a resource also deletes all of its children that have no other parents, deleting a resource and then modifying the attributes of that resource, or of a child of that resource that has no other parents, causes dynamic modification to be aborted and this message to be printed to the switchlog.
Corrective action:
While performing dynamic modification of a resource, make sure that the resource that is being modified has not been deleted.
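The cascading effect of a deletion can be worked out in advance. The following sketch computes which objects disappear when one object is deleted, following the rule that a child is removed only when all of its parents are removed. It assumes the same dictionary representation of the graph (object name to list of children); the function name is hypothetical.

```python
def cascade_delete(children, root):
    """Return the set of objects removed when `root` is deleted."""
    # Build the reverse mapping: child -> set of parents.
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(parent)

    deleted = {root}
    queue = [root]
    while queue:
        node = queue.pop()
        for kid in children.get(node, ()):
            # A child is deleted only when every one of its parents is deleted.
            if kid not in deleted and parents[kid] <= deleted:
                deleted.add(kid)
                queue.append(kid)
    return deleted
```

Any object in the returned set must not be modified, linked, or deleted again in the same dynamic modification.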
Content:
When there is an attempt to delete a child object whose parent object has already been deleted, the above message appears in the switchlog and dynamic modification is aborted.
Corrective action:
Make sure that when an object is being deleted explicitly, its parents have not already been deleted because that means this object has also been deleted.
Content:
When there is an attempt to delete a resource resource whose children have already been deleted, the above message appears in the switchlog and dynamic modification is aborted.
Corrective action:
Make sure that when a resource is being deleted explicitly, its children have not already been deleted.
Content:
Every resource has to be in either one of the states: stateOnline, stateOffline, stateFaulted, stateUnknown or stateStandby. If the resource resource is not in any of the states mentioned above, it prints the above message and dynamic modification is aborted. Theoretically this is not possible.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If the parent of the resource is a userApplication, then linking to or unlinking a child from that parent is not possible. If there is an attempt to perform this, then the above message will be printed to the switchlog and dynamic modification will be aborted.
Corrective action:
Do not link or unlink a resource from a userApplication.
Content:
When RMS gets a directive to link existing resources during dynamic modification, and the parent object parentobject to which the child object is being linked is not a resource, then dynamic modification fails, and this message is printed.
Corrective action:
Make sure that while linking 2 objects, the parent of the child object is a resource.
Content:
When RMS gets a directive to link existing resources during dynamic modification, if the child object childobject that is being linked to a parent object is not a resource, then dynamic modification fails and this message is printed.
Corrective action:
Make sure that while linking 2 objects, the child of the parent object is a resource.
Content:
An attempt was made to link a parent parentobject and a child childobject that are already linked. This message is printed, and dynamic modification is aborted.
Corrective action:
While trying to perform dynamic modification, make sure that the parent and the child that are to be linked are not already linked.
Content:
While creating a new link between 2 existing objects, during dynamic modification, a faulted child childobject cannot be linked to a parent parentobject that is not faulted. The child first needs to be brought to the state of the parent. If this condition is violated, the aforementioned message will be printed to the switchlog. Dynamic modification is aborted.
Corrective action:
Bring the faulted child to the state of the parent before linking them.
Content:
While linking 2 existing objects during dynamic modification, the combination of states parent Online and child not Online is not allowed. When this happens, dynamic modification is aborted and a message is printed to the switchlog.
Corrective action:
The child childobject first needs to be brought to the online state before linking it to the online parent parentobject.
Content:
Any attempt to link 2 existing objects in which the child is neither in the Offline nor the Standby state, and the parent is in the Offline or Standby state, is prohibited. This message is printed in the switchlog, and dynamic modification is aborted.
Corrective action:
The child needs to be first brought to offline or standby state before linking it to the parent that is in offline or standby state.
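The three state rules for linking existing objects can be combined into a single check. This is a sketch of only the rules stated in the messages above; treating every other state combination as allowed is an assumption, and the state names are written as plain strings for illustration.

```python
def link_allowed(parent_state, child_state):
    """Return True if linking is not prohibited by the three rules described above."""
    # A faulted child cannot be linked to a parent that is not faulted.
    if child_state == "Faulted" and parent_state != "Faulted":
        return False
    # Parent Online with child not Online is not allowed.
    if parent_state == "Online" and child_state != "Online":
        return False
    # Parent Offline or Standby requires the child to be Offline or Standby.
    if parent_state in ("Offline", "Standby") and child_state not in ("Offline", "Standby"):
        return False
    return True
```

For example, an Offline child may be linked to a Standby parent, but an Online child may not be linked to an Offline parent.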
Content:
Trying to unlink object parentobject from object childobject when they are not already linked results in this message with dynamic modification aborted.
Corrective action:
If you want to unlink 2 objects, make sure that they share a parent/child relationship.
Content:
Unlinking a child childobject so that no links remain linking it to any userApplication is not allowed.
Corrective action:
Make sure that the child is still linked to a userApplication.
Content:
Dynamic modification performs some sanity checks to ensure that all of the following are true:
The HostName attribute is present only for children of userApplication objects.
The child of a userApplication does not have another parent.
Each object belongs to only one userApplication.
Leaf objects have detectors.
Leaf objects that have the DeviceName attribute have it set to a valid value.
The length of the attribute rName for the leaf objects is smaller than the maximum.
There are no duplicate lines in the hvgdstartup file.
The kind argument for the detector in the hvgdstartup is specified.
All detectors can be loaded.
A valid value has been specified for the rKind attribute.
The ScriptTimeout value is greater than the detector cycle time.
No object is both an "and" object and an "or" object at the same time.
ClusterExclusive and LieOffline, which are mutually exclusive, are not used together.
If any of these sanity checks fails, this message is printed and dynamic modification is aborted.
A FATAL message is also printed to the switchlog with more details as to why the sanity check failed.
Corrective action:
Make sure that the sanity checks mentioned above pass.
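A few of the checks in the list above can be expressed as simple predicates over an object's attributes. The sketch below covers only three of them and is not how RMS implements the checks; the data layout (attributes as a dictionary, owning applications as a list) and the function name are assumptions made for illustration.

```python
def sanity_errors(name, attrs, owning_apps, detector_cycle_time):
    """Return a list of human-readable failures for one object (illustrative only)."""
    errors = []
    # Each object belongs to only one userApplication.
    if len(set(owning_apps)) > 1:
        errors.append(f"{name}: belongs to more than one userApplication")
    # ClusterExclusive and LieOffline are mutually exclusive.
    if attrs.get("ClusterExclusive") and attrs.get("LieOffline"):
        errors.append(f"{name}: ClusterExclusive and LieOffline are both set")
    # The ScriptTimeout value must be greater than the detector cycle time.
    timeout = attrs.get("ScriptTimeout")
    if timeout is not None and timeout <= detector_cycle_time:
        errors.append(f"{name}: ScriptTimeout is not greater than the detector cycle time")
    return errors
```

An empty list means that none of the three conditions was violated for that object.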
Content:
Any attempt to perform the operations of deleting an object object from the RMS resource graph and then trying to unlink it from its parent object or vice versa results in dynamic modification being aborted and the above message being printed to the switchlog.
Corrective action:
Make sure that the operations of deletion and unlinking are not performed on an object at the same time.
Content:
When a new object is being added to an existing configuration, it must have an existing object parentobject as its parent. If it does not, dynamic modification is aborted and this message is printed to the switchlog.
Corrective action:
Make sure that the parent specified for a new object that is being added exists.
Content:
When a new object is being added to an existing configuration, if the parent object parentobject that has been specified is not a resource, this message is printed and dynamic modification is aborted.
Corrective action:
Make sure that the parent object specified for a new object is a resource.
Content:
Any attempt to link to a child object childobject that is non-existent leads to this message and dynamic modification aborts.
Corrective action:
Make sure that the child object to be linked to exists.
Content:
When a new object childobject being added to an existing configuration is not a resource, this message is printed, and dynamic modification is aborted.
Corrective action:
Make sure that the child object specified is a resource.
Corrective action:
A critical error has occurred. Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
During dynamic modification if there is a request to add a new parent object parentobject that is not a resource, this message is printed, and dynamic modification aborts.
Corrective action:
Make sure that the object being added as a parent object is a resource.
Content:
As part of dynamic modification, if the specified child object childobject does not exist, then this message is printed, and dynamic modification is aborted.
Corrective action:
Make sure that the child object that has been specified exists.
Content:
When adding a new object to the RMS resource graph, if the child childobject of this new object is not a resource, dynamic modification aborts.
Corrective action:
Make sure that when adding a new object, its child is a resource.
Content:
If RMS gets a directive to delete an object object that is either non-existent or not a resource, this message is printed along with the failure of dynamic modification.
Corrective action:
Make sure that you don't try to delete an object that does not exist.
Content:
An object deleted during dynamic modification is neither a resource type object, nor a userApplication nor a SysNode object. Only resources, applications and hosts (SysNode objects) can be deleted during dynamic modification.
Corrective action:
Do not delete this object, or delete another object.
Content:
When a resource object is added to an existing RMS resource graph and it is linked as a child to two parent objects, one of which is online and the other offline/standby, this message is printed: a child object needs to be brought to the state of its parent.
Corrective action:
Make sure that both the parents of the resource to be added are in the same state before adding it.
Content:
During dynamic modification, if the state state of a parent resource parentobject is not one of the states stateOnline, stateOffline, stateFaulted, or stateUnknown, dynamic modification aborts.
Corrective action:
Make sure that the state of the parent resource is one of the states mentioned above.
Content:
When a new object object is being added as a child of userapplication and the value of its HostName attribute is the same as the value of the HostName attribute of an existing child of userapplication, this message is printed, and dynamic modification is aborted.
Corrective action:
Make sure that the HostName attribute of an object that is being added to userApplication is different from the values of the HostName attributes of other first level children of userapplication.
Content:
When a new child object childobject is added to an application userapplication during dynamic modification, if the HostName attribute is missing for this object, this message is printed, and dynamic modification is aborted.
Corrective action:
The first level object under userapplication must have a HostName attribute.
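Both HostName conditions for first-level children (the attribute must be present and must not repeat) can be checked together. The following is a minimal sketch under the assumption that the children of one userApplication are available as (name, attributes) pairs; the function and the example names are hypothetical.

```python
def hostname_problems(first_level_children):
    """Check HostName on first-level children given as (name, attrs) pairs."""
    problems = []
    seen = {}  # HostName value -> name of the first child that uses it
    for name, attrs in first_level_children:
        host = attrs.get("HostName")
        if host is None:
            problems.append(f"{name}: HostName attribute is missing")
        elif host in seen:
            problems.append(f"{name}: HostName {host} is already used by {seen[host]}")
        else:
            seen[host] = name
    return problems
```

An empty result means that every first-level child has a HostName and that no value is used twice.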
Content:
If both the parent parentobject and the child childobject have detectors associated with them, and the child is not online but needs to be linked to a parent that is supposed to be online, this message is printed and dynamic modification is aborted.
Corrective action:
Make sure that the parent and the child are in a similar state.
Content:
Trying to link a child childobject that is online to a parent object, which is supposed to go offline, is not allowed, and dynamic modification is aborted.
Corrective action:
Make sure that the parent and the child are in a similar state.
Content:
When RMS gets a directive to add a new child object childobject having as parent and child resources belonging to different applications userapplication1 and userapplication2, the above message is printed and dynamic modification aborts.
Corrective action:
When adding a new resource make sure that it does not have as its parent and children, resources belonging to different applications.
Content:
Any attempt to create an object object that does not have an existing parent leads to this message and dynamic modification aborts.
Corrective action:
Make sure that the object object has an existing object as its parent.
Content:
If the HostName attribute of object object is set to an invalid value, this message is printed and dynamic modification is aborted. If the HostName attribute is missing, (ADM, 40) will take care of it.
Corrective action:
Set the HostName attribute of resource object to the name of a valid SysNode.
Content:
RMS received a directive to add a new child object object by linking it to parent objects belonging to different applications userapplication1 and userapplication2. Dynamic modification is aborted.
Corrective action:
When adding a new child resource, make sure that it does not have as its parents resources belonging to different applications.
Content:
Any attempt to add a new node having as its parent parentobject fails if the parent parentobject is the child of an object that has been deleted, because deleting an object automatically causes its children to be deleted as well if they don't have any other parents. This causes dynamic modification to fail.
Corrective action:
When adding a new object, make sure that its parent has not already been deleted.
Content:
Any attempt to delete an object childobject belonging to a deleted application elicits this response from RMS because deleting an application automatically causes all its children to be deleted as well.
Corrective action:
Do not try to delete an object belonging to an already deleted application.
Content:
Any attempt to delete an object objectname that belongs to a deleted application leads to this error because deleting an application deletes all its children including objectname.
Corrective action:
Make sure that before an object is deleted, it does not belong to an application that is being deleted.
Content:
When RMS gets a directive to delete an object object, which is a descendant of a new object, this message is printed, and dynamic modification is aborted.
Corrective action:
Make sure that when an object is being deleted, it is not a descendant of a new object.
Content:
When RMS gets a directive to link to a child childobject that is going to be deleted, dynamic modification aborts.
Corrective action:
Do not link to a child object that is to be deleted.
Content:
If there is an attempt to delete an object object and use its descendants (which should be deleted as a result of deleting the parent) as the parent for a new resource that is being added to the RMS resource graph, this error message is printed and dynamic modification aborts.
Corrective action:
Do not attempt to delete an object and use its descendant as the parent for a new resource.
Content:
An attempt was made to modify the attribute of a node node that is absent. This message is printed and dynamic modification is aborted.
Corrective action:
Modify the attributes of an existing node.
Content:
When RMS receives a directive to modify a node object with attribute attribute that has an invalid value, this message is printed, and dynamic modification is aborted.
Corrective action:
Specify a valid value for the attribute attribute.
Content:
RMS uses Unix queues internally for interprocess communication. The admin queue is one such queue; it is used for communication between RMS and utilities such as hvutil, hvmod, hvshut, hvswitch and hvdisp. If RMS cannot create this queue for any reason, RMS exits with exit code 50.
Corrective action:
Restart RMS.
Content:
If RMS is unable to open the file /opt/SMAW/SMAWRrms/locks/.rms.<pid> for writing when 'hvdisp -m' has been invoked, this message is printed.
Corrective action:
Verify that the directory /opt/SMAW/SMAWRrms/locks exists and allows files to be created (correct permissions, free space in the file system, free inodes). If one of these problems exists, fix it via the appropriate administrator operation. If none of these problems apply, but the RMS failure still occurs, contact field engineers.
Content:
When hvdisp is unable to open the file file (/opt/SMAW/SMAWRrms/locks/.rms.<pid>) for writing, it prints out the reason errormsg.
Corrective action:
Verify that the directory /opt/SMAW/SMAWRrms/locks exists and allows files to be created (correct permissions, free space in the file system, free inodes). If one of these problems exists, fix it via the appropriate administrator operation. If none of these problems apply, but the RMS failure still occurs, contact field engineers.
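The checks named in this corrective action (directory exists, files can be created, free space, free inodes) can be scripted. The following is a minimal Python sketch, not an RMS tool: the function name check_lock_dir is hypothetical, and it assumes a Unix host where os.statvfs is available.

```python
import os

def check_lock_dir(path="/opt/SMAW/SMAWRrms/locks"):
    """Return a list of problems that would prevent creating files in path."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # Creating a file needs write and search (execute) permission on the directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    st = os.statvfs(path)
    if st.f_bavail == 0:
        problems.append(f"no free space in the file system holding {path}")
    # Some file systems report f_files == 0 because they have no fixed inode table.
    if st.f_files > 0 and st.f_favail == 0:
        problems.append(f"no free inodes in the file system holding {path}")
    return problems
```

An empty list means none of the listed problems was found. Run the check as the same user that invokes hvdisp, because the permission test depends on the calling user.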
Content:
This message is printed to the switchlog because commands like hvswitch, hvutil and hvshut cannot run in parallel with a non-local hvmod.
Corrective action:
Make sure that before a hvswitch is performed, hvmod is not operating on userapplication.
Content:
While performing a switch, hvswitch requires a userApplication as its argument. If the resource resource is not a userApplication, this message is printed.
Corrective action:
Check the man page for hvswitch for usage information.
Content:
The attribute ShutdownScript is a hidden attribute of a SysNode. The RMS base monitor automatically defines its value -- users cannot change it in any way.
Corrective action:
Do not attempt to change the built-in value of the ShutdownScript attribute.
Content:
This message can occur in these scenarios:
The name of the SysNode specified in hvswitch is not included in the current configuration. ('hvswitch [-f] userapplication [sysnode]')
The name of the SysNode specified for 'hvshut -s sysnode' is not a valid one, i.e., sysnode is not included in the current configuration.
The name of the SysNode specified for 'hvutil -ou' is unknown (hidden options).
Corrective action:
Specify a SysNode that is included in the current configuration, i.e., appears in the configname.us file.
Content:
This message could appear if 'hvshut -a' was invoked and not all of the nodes replied with an acknowledgement.
Corrective action:
Log in to the remote hosts. If RMS is still running, perform 'hvutil -f <userapplication>' to shut down each application one at a time. If this fails, refer to the switchlog and userapplication log files to find the reason for the problem. If all applications have been shut down correctly, perform a forced RMS shutdown with 'hvshut -f'. Report the problem to field engineers.
Content:
The node on which 'hvshut -a' has been invoked is not yet ready to be shut down because the application is busy on the node.
Corrective action:
Wait until the ongoing action (e.g., switchover, dynamic reconfiguration) has terminated.
Content:
This message occurs if the RMS internal sanity-check functions detect a severe configuration problem. This message should not occur if the configuration has been set up using RMS configuration wizards.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The attribute attribute is constant and can only be set in a configuration file.
Corrective action:
Make sure that there is no attempt to modify attribute within object.
Content:
This message can appear in the switchlog if dynamic modification is being performed on an object that is being asserted.
Corrective action:
Perform the modification after the assertion has been fulfilled.
Content:
Set PriorityList for userapplication to include all the host names from the HostName attribute of the application's children.
Corrective action:
No duplicate host names should be present in the PriorityList.
Content:
The HostName attribute of one or more of the children specifies hosts that are not in the parent's PriorityList attribute.
Corrective action:
Set the PriorityList attribute of userapplication to include all the host names listed in the HostName attributes of the application's children. No duplicate host names should be present in the PriorityList.
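The two conditions in this corrective action (every child HostName appears in the PriorityList, and the PriorityList has no duplicates) can be expressed as a small check. This is an illustrative Python sketch only; the function name and the host names in the usage example are hypothetical, and the attribute values are assumed to have been extracted from the configuration already.

```python
def check_priority_list(priority_list, child_hostnames):
    """Return (missing, duplicates) for a userApplication's PriorityList.

    missing:    child HostName values that are absent from the PriorityList
    duplicates: host names that occur more than once in the PriorityList
    """
    seen = set()
    duplicates = []
    for host in priority_list:
        if host in seen and host not in duplicates:
            duplicates.append(host)
        seen.add(host)
    missing = sorted(set(child_hostnames) - seen)
    return missing, duplicates
```

For example, check_priority_list(["node1RMS", "node2RMS", "node1RMS"], ["node1RMS", "node3RMS"]) reports node3RMS as missing and node1RMS as duplicated. Both lists must be empty for the configuration to satisfy the rule.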
Content:
If userapplication uses more parent controllers than specified by the attribute MaxControllers (maxcontroller), this message is printed, and dynamic modification is aborted.
Corrective action:
Make sure that the number of parent controllers used by an application does not exceed the number specified by the MaxControllers attribute, or modify MaxControllers to increase the number.
Content:
This message may appear in the switchlog if there is an attempt to delete a SysNode from a running configuration if the node is not in one of the states Unknown, Wait, Offline or Faulted.
Corrective action:
Shut down RMS on that host and then do the deletion.
Content:
During dynamic modification the local SysNode sysnode was going to be deleted.
Corrective action:
Make sure dynamic modification does not contain 'delete sysnode;' where sysnode is the name of the local node.
Content:
This message appears in the switchlog if the name sysnode specified as part of the dynamic modification is not resolvable to any known host name.
Corrective action:
Specify a host name that is resolvable to a network address.
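Before running the dynamic modification, it can help to confirm that the name resolves on the host. The sketch below is a hedged illustration in Python: is_resolvable is a hypothetical helper, and it only asks the operating system's resolver, which is an assumption about how the name is looked up and may not match every lookup path that RMS itself uses.

```python
import socket

def is_resolvable(hostname):
    """Return True if the OS resolver can map hostname to a network address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True
```

A False result for the SysNode name means that the name should be corrected, or added to the host's name resolution sources, before the modification is retried.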
Content:
If the dynamic modification takes too much time, this message is printed.
Corrective action:
Make sure that the network connection between the hosts is functional. Also verify that the scripts from newly added resources do not take too much time to execute, that the dynamic modification does not add too many new nodes, and that the modification file is not too big or too complex.
Content:
A controlled application userapplication cannot be deleted while its controller controller retains the application's name in its Resource attribute.
Corrective action:
Remove the name of the deleted application from the controller's Resource attribute, or add a new application with the same name, or delete the controller together with its controlled application, or change the controller's NullDetector attribute to 1.
Content:
The reason for this message is that only the modification of local attributes is allowed during local modification.
Corrective action:
Make a non-local modification, or modify different attributes.
Content:
This message may appear because an attribute of a particular object can be modified only once in the same modification file, but attribute has been modified more than once for <object>.
Corrective action:
Modify the attribute only once per object.
Content:
This message appears when an attempt is made to rename an existing object sysnode to othersysnode, but one of the following conditions is encountered:
othersysnode is not a valid name.
othersysnode is already used by some other host in the cluster.
othersysnode is not a resource.
othersysnode is a controlled application.
Corrective action:
Choose another valid host name.
Content:
This message appears when the user tries to rename a resource that is controlled by a controller object and is going to be deleted.
Corrective action:
Make sure deleted applications are not referenced by any controller.
Content:
This message appears when the user tries to control a resource resource with a controller controller but the application associated with that resource is going to be deleted.
Corrective action:
Make sure the controller's Resource attribute does not refer to a deleted application.
Content:
RMS was started with the -a option but due to some internal error RMS could not be started on the remote host. This is a critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide." As a temporary workaround, try again or start RMS manually on each host.
Content:
When RMS cannot be started on remote hosts because the command startupcommand failed, this message is printed.
Corrective action:
This may occur when some of the hosts are not reachable or the network is down.
Check the network and remote host for abnormalities, remove the cause of the problem, and retry.
Content:
This message appears when the controller node was not able to find the applications it controls among the applications running on the host.
Corrective action:
Correct your modification file so that the controllers refer only to the existing applications.
Content:
This message appears when an attempt to change the Resource attribute of the controller object controller from oldresource to newresource fails because one or more of the applications listed in newresource is not an existing application or its state is incompatible with the state of the controller, or because the list contains duplicate elements.
Corrective action:
Make sure that the applications listed in newresource are valid and that none of them is listed more than once.
Content:
If an application needs to be controlled by a controller, then the application's attributes PreserveState and AutoSwitchOver need to be 1 and No respectively if the controller has its AutoRecover set to 1.
Corrective action:
Check the PreserveState and AutoSwitchOver attributes of the application.
Content:
The total number of SysNode objects in the cluster has exceeded the maximum allowable limit.
Corrective action:
Make sure that the total number of SysNode objects in the cluster does not exceed maxhosts.
Content:
The cumulative length of the SysNode names specified in the configuration for application userapplication exceeds the maximum allowable limit.
Corrective action:
Limit the length of the SysNode names so that they fit within the maximum allowable limit.
Content:
The entry attr must be unique.
Corrective action:
Ensure that the attr entry is unique.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
While performing a switch, hvswitch requires a userApplication or a resource as its argument. If the resource resource is not a userApplication or a resource, this message is printed.
Corrective action:
Check the man page for hvswitch for usage information.
Content:
If RMS detects that a line has been duplicated in the hvgdstartup, it prints this error message. The end result of this is that RMS will exit with exit code 23.
Corrective action:
Only unique lines are allowed in hvgdstartup. Remove all the duplicate entries.
Content:
In the hvgdstartup file, the entry for the detector is not of the form 'g<n> -t<n> -k<n>', or the -k<n> option is missing. Since RMS is unable to start, it exits with exit code 23.
Corrective action:
Modify the entry for the detector so that the kind (-k<n> option) for the detector is specified properly.
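The two hvgdstartup problems described above (duplicate lines and entries that do not match 'g<n> -t<n> -k<n>') can be detected before RMS is started. The following Python sketch is illustrative only. It checks strictly against the form quoted in the message description; the treatment of blank lines and '#' comment lines is an assumption, and real entries may permit options that this pattern rejects.

```python
import re

# Form quoted in the message description: g<n> -t<n> -k<n>
ENTRY_RE = re.compile(r"^g\d+\s+-t\d+\s+-k\d+$")

def check_hvgdstartup(lines):
    """Return a list of (line_number, problem) tuples for the given lines."""
    problems = []
    first_seen = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' are not entries.
        if not line or line.startswith("#"):
            continue
        if line in first_seen:
            problems.append((number, f"duplicate of line {first_seen[line]}"))
        else:
            first_seen[line] = number
        if not ENTRY_RE.match(line):
            problems.append((number, "not of the form 'g<n> -t<n> -k<n>'"))
    return problems
```

Passing the lines of the file (for example, the result of readlines() on an open file) returns an empty list when neither problem is present.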
Content:
During dynamic modification, there was an attempt to redefine the kind for the DetectorStartScript.
Corrective action:
Do not attempt to redefine the DetectorStartScript when the detector is already running.
Content:
The message can be any one of the following:
Check for SanityCheckErrorPrint
Object <object> cannot have its HostName attribute set since it is not a child of any userApplication. Only the direct descendants of userApplication can have the HostName attribute set.
In basic.C:parentsCount(...)
The node <node> belongs to more than one userApplication, app1 and app2. Nodes must be children of one and only one userApplication node.
The node <node> is a leaf node and this type <type> does not have a detector. Leaf nodes must have detectors.
The node <node> has an empty DeviceName attribute. This node uses a detector and therefore it needs a valid DeviceName attribute.
The rName is <rname>, its length length is larger than max length maxlength.
The DuplicateLineInHvgdstartup is <number>, so the hvgdstartup file has a duplicate line.
The NoKindSpecifiedForGdet is <number>, so no kind specified in hvgdstartup.
Failed to load a detector of kind <kind>.
The node <node> has an invalid rKind attribute. Nodes of type gResource must have a valid rKind attribute.
The node <node> has a ScriptTimeout value that is less than its detector report time. This will cause a script timeout error to be reported before the detector can report the state of the resource. Increase the ScriptTimeout value for objectname (currently value seconds) to be greater than the detector cycle time (currently value seconds).
Node <node> has no detector while all its children's "MonitorOnly" attributes are set to 1.
The node <node> has both attributes "LieOffline" and "ClusterExclusive" set. These attributes are incompatible; only one of them may be used.
The type of object <object> cannot be "or" and "and" at the same time.
Object <object> is of type "and", its state is online, but not all children are online.
Corrective action:
Verify the above description and change the configuration appropriately.
Content:
An object was encountered as a part of more than one user application.
RMS applications cannot have common objects.
Corrective action:
Redesign your configuration so that no two applications have common objects.
Content:
An object that has no children objects (i.e. a leaf object) is of type type that has no detectors in RMS. All leaf objects in RMS configurations must have detectors.
Corrective action:
Redesign your configuration so that all leaf objects have detectors.
Content:
A critical internal error has occurred. If this message appears in switchlog, it indicates a severe problem in the base monitor.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The value of the rName attribute exceeds the maximum length of maxlength characters.
Corrective action:
Specify a shorter rName that does not exceed the upper limit.
Content:
This message prints out the line number of the duplicate line in the hvgdstartup file.
Corrective action:
Make sure that file hvgdstartup has no duplicate lines.
Content:
The kind has not been specified for the generic detector in the hvgdstartup file.
Corrective action:
Specify the kind for the generic detector in hvgdstartup.
Content:
Object object does not have its DetectorStartScript defined.
Corrective action:
Make sure that the DetectorStartScript is defined for object object.
Content:
Object object has an invalid rKind attribute.
Corrective action:
Make sure that the object object has a valid rKind attribute.
Content:
The ScriptTimeout value is less than the detector cycle time. This will cause the resource to appear faulted when being brought Online or Offline.
Corrective action:
Make the value of ScriptTimeout greater than the detector report time.
Content:
Each RMS object must be of a type derived from either the "or" type or the "and" type, but not both. If this message appears in the switchlog, it indicates severe corruption of the RMS executable.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message may appear during dynamic modification, when the existing configuration is checked before applying the modification. If this message appears, the dynamic modification will not proceed.
Corrective action:
Make sure that all children of online objects of type "and" are in online states, and only then apply dynamic modification.
Content:
An object that is not a child of a userApplication has its HostName attribute set. Only children of the userApplication object can and must have their HostName attribute set.
Corrective action:
Eliminate the HostName attribute from the definition of the object, or disconnect the userApplication object from this object, making this object a child of another, non-userApplication object.
Content:
Both attributes LieOffline and ClusterExclusive are set for the same RMS object. Only one of them can be set for the same object.
Corrective action:
Eliminate one or both settings from the RMS object object.
Content:
A detector was not able to be started by the RMS base monitor.
Corrective action:
Make sure the detector executable is present in the right place and has executable privileges.
Content:
An object without a detector has all its children's MonitorOnly attributes set to 1. An object without a detector must have at least one child for which MonitorOnly is set to 0.
Corrective action:
Change the configuration so that each object without a detector has at least one child with its MonitorOnly set to 0.
Content:
Both attributes MonitorOnly and ClusterExclusive are set for the same RMS object. Only one of them can be set for the same object.
Corrective action:
Eliminate one or both settings from the RMS object object.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
An object of unknown type is added during dynamic modification.
Corrective action:
Use only objects of known types in configuration files.
Content:
An attribute of a non-existing object cannot be modified.
Corrective action:
Modify attributes only for existing objects.
Content:
An invalid attribute is specified for modification.
Corrective action:
Modify only valid attributes.
Content:
An object object of a system type symbol is specified during dynamic modification.
Corrective action:
Use only valid resource types when adding new objects to configuration.
Content:
An object object of a system type symbol is specified for deletion.
Corrective action:
Delete only objects that are valid resource types.
Content:
This message appears when the PriorityList of the controlled application controlleduserapplication is different from the content of the PriorityList of the application userapplication to which the controller controller belongs.
Corrective action:
Make sure that the PriorityList of the application to which the controller belongs and the PriorityList of the controlled application are the same.
Content:
During dynamic modification, an attempt was made to add new resource(s) to a resource that was in Standby mode, but the resources could not also be brought into Standby mode.
Corrective action:
Analyze your configuration to make sure that standby capable resources can get to the Standby state.
Content:
In order for an application userapplication to be controlled by a controller controller the application userapplication has to have at least one standby capable resource on host sysnode.
Corrective action:
Make sure that the controlled application has at least one standby capable controller or make sure that the controllers are not standby capable.
Content:
This message appears when the user sets both controller attributes StandbyCapable and IgnoreStandbyRequest to 1.
Corrective action:
Make sure that only one of them is set to 1 and the other to 0.
Content:
The controller node controller should have one of its attributes OnlineTimeout or StandbyTimeout be null to allow the attribute Follow to be 1.
Corrective action:
Set the attributes accordingly and try again.
Content:
This message appears when the user wants the application userapplication to be controlled by a controller but one or more of the application's attributes ControlledSwitch or ControlledShutdown is set to 1.
Corrective action:
Set the attributes accordingly and try again.
Content:
The user cannot modify global attributes attribute like DetectorStartScript or NullDetector or NonCritical locally on a host hostname.
Corrective action:
Modify the attribute globally, or modify a different attribute locally.
Content:
The CIP configuration file has missing entries.
Corrective action:
Make sure that the CIP configuration has entries for all the RMS hosts that are running in a cluster.
Content:
During dynamic modification, the base monitor reads its configuration from a '.dob' file. When this file cannot be read, this message appears in the switchlog. The specific OS error is indicated in errno and errorreason.
Corrective action:
Make sure the host conditions are such that the .dob file can be read without errors.
Content:
While obtaining message queue parameters, sysdef was not able to communicate them back to the base monitor. The values of errno and reason indicate the kind of error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When the controller's Follow attribute is set, other attributes such as IndependentSwitchover, AutoSwitchOver, StandbyTransitions, AutoStartUp, ControlledSwitch, ControlledShutdown and PartialCluster must have the values 0, No, No, 0, 1, 1 and 0 respectively. However, this condition is violated in the configuration file.
Corrective action:
Supply a valid combination of attributes for the controller and its controlled user application.
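The required values listed in the description lend themselves to a table-driven check. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the attribute values are assumed to be available as one merged mapping (the description does not say which attribute belongs to the controller and which to the controlled application), and attributes absent from the mapping are skipped because their defaults are not given here.

```python
# Required values when Follow is set, as listed in the message description.
REQUIRED_WITH_FOLLOW = {
    "IndependentSwitchover": "0",
    "AutoSwitchOver": "No",
    "StandbyTransitions": "No",
    "AutoStartUp": "0",
    "ControlledSwitch": "1",
    "ControlledShutdown": "1",
    "PartialCluster": "0",
}

def follow_violations(attributes):
    """Return {name: (actual, required)} for attributes that break the Follow rule."""
    if str(attributes.get("Follow", "0")) != "1":
        return {}  # the rule applies only when Follow is set
    violations = {}
    for name, required in REQUIRED_WITH_FOLLOW.items():
        if name not in attributes:
            continue  # assumption: unspecified attributes are not checked
        actual = str(attributes[name])
        if actual != required:
            violations[name] = (actual, required)
    return violations
```

An empty result means that the supplied attributes are consistent with the combination described above.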
Content:
If controller has its Follow set to 1 then all its controlled applications must have the same value for the attribute PersistentFault as the parent application of the controller.
Corrective action:
Check and correct the RMS configuration file.
Content:
This is a generic message indicating that the execution of the routine routine failed due to the reason errorreason and hence the RMS-CF interface is inconsistent.
Corrective action:
No action is required if this message is output when RMS is stopped on multiple nodes at the same time. In other cases, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The attribute DetectorStartScript and the file hvgdstartup are mutually exclusive.
Corrective action:
Use the DetectorStartScript attribute when setting up new configurations, because support for hvgdstartup may be discontinued in future releases.
Content:
An invalid combination of controller attributes is encountered. If both IgnoreOfflineRequest and IgnoreOnlineRequest are set to 1, then no request will be propagated to the controlled application(s), so no request can be split.
Corrective action:
Provide a valid combination of the controller attributes.
Content:
If a controlling application has its AutoSwitchOver attribute set with the option "Shutdown", then all applications controlled by the controllers that belong to this controlling application must also have the option "Shutdown" set in their AutoSwitchOver attributes.
Corrective action:
Provide correct settings for the AutoSwitchOver attributes.
Content:
The reason for this message is that the modification of controller attributes such as NullDetector or MonitorOnly is allowed only during global modification.
Corrective action:
Make a non-local modification, or modify different attributes.
Content:
The length of object name is greater than the maximum allowable length.
Corrective action:
Ensure that the length of the object name is smaller than maxlength.
Content:
A non-scalable controller cannot have its ApplicationSequence attribute set to a non-empty value.
Corrective action:
Provide correct settings for the ApplicationSequence and Scalable attributes.
Content:
The ApplicationSequence attribute of a scalable controller includes an application name absent from the list of the controlled applications.
Corrective action:
Provide correct settings for ApplicationSequence and Resource attributes of the controller.
Content:
A scalable controller must have its attribute Follow set to 0 and IndependentSwitch set to 1.
Corrective action:
Provide correct settings for the Follow, IndependentSwitch, and Scalable attributes.
Content:
A scalable controller must list only existing applications in its ApplicationSequence attribute.
Corrective action:
Provide correct settings for attribute ApplicationSequence.
Content:
Only one scalable controller can control an application.
Corrective action:
Correct the RMS configuration.
Content:
Hostname mismatch between controlled and controlling applications. Controlling application must run on all the hosts where the controlled applications are running.
Corrective action:
Fix RMS configuration.
Content:
Hostname mismatch between controlled and controlling applications. Controlling application must run on all the hosts where the controlled applications are running.
Corrective action:
Fix RMS configuration.
Content:
When the controller's Follow attribute is set and the controlled application has StandbyCapable resources, the controller must have StandbyCapable set and IgnoreStandbyRequest must be disabled. Otherwise Standby requests will not be properly propagated to the controlled application.
Corrective action:
Supply a valid combination of attributes for the controller and its controlled user application.
Content:
A wrong value is supplied for the -k flag in the detector startup script.
Corrective action:
Fix RMS configuration.
Content:
Values for rKind attribute and flag -k of the detector startup line do not match.
Corrective action:
Correct the RMS configuration.
Content:
Different values for rKind attribute are encountered within the same object.
Corrective action:
Fix RMS configuration.
Content:
The controller attributes Scalable and SplitRequest are mutually exclusive.
Corrective action:
Fix RMS configuration.
Content:
An exclusive resource cannot belong to an application with the attribute PartialCluster set to 1, or cannot be controlled, directly or indirectly, by a Follow controller from an application with the attribute PartialCluster set to 1.
Corrective action:
Fix RMS configuration.
Content:
An application controlled by a scalable controller cannot have ControlledShutdown set to 1 and AutoSwitchOver including the option ShutDown at the same time.
Corrective action:
Correct the RMS configuration file.
Content:
A line in a configuration file is too big.
Corrective action:
Fix RMS configuration, so that each line takes less than 2000 bytes.
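Overlong lines can be located before the configuration is used. This is a minimal Python sketch, with a hypothetical function name; it assumes that the 2000-byte limit from the corrective action applies to the line content without the line terminator, which the source does not state.

```python
def overlong_lines(path, limit=2000):
    """Return [(line_number, length_in_bytes)] for lines of `limit` bytes or more."""
    result = []
    # Read in binary mode so that lengths are counted in bytes, not characters.
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            length = len(line.rstrip(b"\r\n"))
            if length >= limit:
                result.append((number, length))
    return result
```

Each reported line should be split or shortened so that it takes less than 2000 bytes.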
Content:
This message indicates that the RMS on the node Sysnode has terminated unexpectedly.
Corrective action:
Investigate why RMS has terminated unexpectedly and then take the necessary actions. When RMS terminates unexpectedly, the affected node must be forcibly stopped, so the message (US, 12) is also output.
This message may also be output when RMS is activated while the SysNode <Sysnode> is stopped. In this case, no action is required.
Content:
The getaddrinfo call failed to allocate a port for rmshb.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
Certain options for hvcm require an argument. If hvcm has been invoked without the argument, this message appears along with the usage and RMS exits with exit code 3.
Corrective action:
Refer to "PRIMECLUSTER Installation and Administration Guide" for the correct usage of hvcm.
Content:
The option provided is not a valid one.
Corrective action:
Refer to "PRIMECLUSTER Installation and Administration Guide" for the correct usage of hvcm.
Content:
The number for the -l option is not correct. Check the range.
Corrective action:
Check the man page for hvcm for the valid range of the -l option argument.
Refer to "PRIMECLUSTER Installation and Administration Guide" for the correct usage of hvcm.
Content:
If the loglevel loglevel specified with -l option for hvcm or hvutil is greater than the maximum possible loglevel maxloglevel, this message is printed.
Corrective action:
Specify a loglevel between 1 and maxloglevel for 'hvcm -l' or 'hvutil -l'.
Content:
When a range of loglevels has been specified with -l option for hvcm or hvutil, if the value of the end range high is smaller than the value of low, this message appears.
Corrective action:
Specify an end-of-range value that is not smaller than the start-of-range value.
Content:
This message is output under any one of the following conditions:
The log level specified with -l option of hvutil is neither a numeric value, nor "off", nor "display".
The log level specified with -l option of hvcm is neither a numeric value nor "off".
Corrective action:
Refer to the manual page for the correct use of hvutil/hvcm command.
Content:
If the log level specified with the -l option of hvcm or hvutil is outside the valid range, this message is printed.
Corrective action:
The valid range for the -l option of hvcm or hvutil is 1..maxloglevel.
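The log level rules from the preceding entries (numeric value or an accepted keyword, within 1..maxloglevel, end of range not smaller than start) can be combined into one validation routine. The Python sketch below is an illustration, not the logic of hvcm or hvutil: the function name is hypothetical, the 'low-high' range syntax is an assumption, and the value of maxloglevel must be supplied by the caller because it is not stated here.

```python
def check_loglevel(spec, max_loglevel, command="hvutil"):
    """Return an error string for an invalid -l argument, or None if it looks valid."""
    # Per the descriptions above: hvutil accepts "off" and "display", hvcm only "off".
    keywords = {"off", "display"} if command == "hvutil" else {"off"}
    if spec in keywords:
        return None
    # Assumption: a range is written as 'low-high'.
    parts = spec.split("-")
    if len(parts) > 2 or not all(p.isdecimal() for p in parts):
        return "not a numeric value or an accepted keyword"
    levels = [int(p) for p in parts]
    if any(level < 1 or level > max_loglevel for level in levels):
        return f"outside the valid range 1..{max_loglevel}"
    if len(levels) == 2 and levels[1] < levels[0]:
        return "end of range is smaller than start of range"
    return None
```

With a hypothetical maxloglevel of 10, check_loglevel("3-7", 10) returns None, while "7-3", "11" and, for hvcm, "display" each return the corresponding error string.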
Content:
The RMS base monitor maintains a priority list of all the hosts in the cluster. Under normal circumstances, the local host should always be present in the list. If this is not the case, this message is printed.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS uses internal queues for sending contracts. (Contracts are messages that are transmitted between the hosts in a cluster that ensure the hosts are synchronized with respect to a particular operation. The messages may be transmitted between processes on the same host or processes on different hosts.) If there is a problem with the queue NET_SEND_Q that is being used to transmit these contracts, this message is printed in the switchlog.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When RMS tries to send a message to another host in the cluster, if the delivery of this message over the queue NET_SEND_Q has failed, this message is printed. This could be due to the fact that the host that is to receive the message has gone down or there is a problem with the cluster interconnect.
Corrective action:
Check to make sure that the other hosts in the cluster are all alive and make sure that none of them are experiencing any network problems.
Content:
When RMS on one host sends a contract over the queue NET_SEND_Q to another host (or itself, if there is only one host in the cluster), it tries to transmit this contract a certain number of times that is determined internally. If the message transmission fails after all attempts, this message is printed to the switchlog and the contract is discarded. (Note: UAP contracts are not discarded.)
Corrective action:
Make sure that there is no problem with the cluster interconnect or with the integrity of the cluster. (In other words, confirm that no cluster application is Online on multiple nodes and that no SysNode is in the Wait state.)
If these checks confirm a problem, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
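The retry behavior described above (a bounded number of transmission attempts, after which the contract is discarded unless it is a UAP contract) can be sketched as follows. This is only an illustration under stated assumptions: `send_with_retries`, `transmit`, and `max_attempts` are hypothetical names, and the real attempt limit is determined internally by RMS and is not documented here.

```python
def send_with_retries(contract, transmit, max_attempts=3, is_uap=False):
    """Attempt to transmit a contract a bounded number of times.

    transmit is a caller-supplied callable that returns True on success.
    max_attempts is a placeholder; the real limit is internal to RMS.
    Returns "sent", "discarded", or "retained" (UAP contracts are kept).
    """
    for _attempt in range(max_attempts):
        if transmit(contract):
            return "sent"
    # All attempts failed: this is the point at which the message is logged.
    return "retained" if is_uap else "discarded"
```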
Content:
The local host crthost sees the contract host originator in state state when it is expected to be in state Online.
Corrective action:
Make sure that the interhost communication channels are working correctly and that the hosts see each other online.
Content:
This message appears when a resource that is controlled by a controller is not in the RMS configuration file and the controller's NullDetector attribute is set to off.
Corrective action:
The controlled resource must be present in the RMS configuration file for the controller to work properly.
Configure the resource properly.
Content:
If the controller controller has two or more of the controlled applications Online on one or more hosts, then the controller faults.
Corrective action:
Make sure that no more than one controlled application for a controller is Online.
Content:
The cluster hosts were unable to determine which host is responsible for a particular userApplication. The most likely reason for this is an erroneous system administrator intervention (e.g., a forced hvswitch request) that left the userApplication Online on more than one host simultaneously.
Corrective action:
Analyze the cluster inconsistency and perform the appropriate action to resolve it. If the application is Online on more than one host, shut down ('hvutil -f') the userApplication on all but one host.
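When analyzing this kind of inconsistency, it can help to list which applications are reported Online on more than one host. The sketch below assumes the administrator has already collected the per-host application states (for example, by hand from status displays) into a dictionary; the function name and the input layout are hypothetical and are not part of RMS.

```python
def online_on_multiple_hosts(states):
    """Return {application: [hosts]} for applications Online on 2+ hosts.

    states maps host name -> {application name: state string}.
    The layout is an assumption made for this illustration.
    """
    online = {}
    for host, apps in states.items():
        for app, state in apps.items():
            if state == "Online":
                online.setdefault(app, []).append(host)
    return {app: sorted(hosts) for app, hosts in online.items()
            if len(hosts) > 1}
```

Any application returned by this check should be left Online on exactly one host.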
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The contract received by the node from the application is not recognizable. This is a critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The userApplication is already Online on another host and is also Online on the current host.
Corrective action:
The userApplication can only be Online on one host. Make sure the application is Offline on all but one of the hosts. If this is not the case, use 'hvutil -f' to bring the userApplication to the Offline state on the superfluous hosts.
Content:
The cluster hosts were unable to determine which host is responsible for a particular userApplication. The most likely reason for this is an erroneous system administrator intervention (e.g., a forced hvswitch request) that left the userApplication Online on more than one host simultaneously.
Note: This message corresponds to (CUP, 2). While (CUP, 8) is printed on the contract originator, (CUP, 2) is printed on the non-originator hosts.
Corrective action:
Analyze the cluster inconsistency and perform the appropriate action to resolve it. If the application is Online on more than one host, shut down ('hvutil -f') the userApplication on all hosts except one.
Content:
This message appears when a child resource faulted unexpectedly, thereby causing the resource to fault.
Corrective action:
Check why the child resource has faulted and take corrective action accordingly.
Content:
A detector unexpectedly reported the Faulted state.
Corrective action:
Check to see why the resource has faulted and take appropriate action.
Content:
This message appears when the detector failed to execute the script for a resource.
Corrective action:
Ensure that there is nothing wrong with the script and also check the resource for any problems.
Content:
FaultScript was executed because a failure occurred in a resource, but execution of FaultScript for the resource shown in resource failed.
Corrective action:
Investigate the cause of the failure in the resource that triggered execution of FaultScript. Also check how the failed execution of FaultScript for the resource shown in resource affects the system. If the cause of the problem or the effect of the FaultScript failure on the system is unknown, collect the investigation information and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
After a resource executes its Offline script, it is expected to come Offline. If it does not change its state, or transitions to a state other than Offline within the period of seconds specified by its ScriptTimeout attribute, the resource is considered as being Faulted.
Corrective action:
Make sure the Offline script moves the resource into Offline state.
Content:
After a resource executes its online script, it is expected to come Online. If it does not change its state, or transitions to a state other than Online within the period of seconds specified by its ScriptTimeout attribute, the resource is considered as being Faulted.
Corrective action:
Make sure the Online script moves the resource into Online state.
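The ScriptTimeout rule for Offline and Online scripts can be sketched as a polling loop. This is a simplified illustration and not the RMS state engine: `wait_for_state`, `get_state`, and `poll_interval` are hypothetical names, and the clock and sleep functions are injectable only so that the sketch can be exercised without real delays.

```python
import time

def wait_for_state(get_state, target, script_timeout, poll_interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Return target if the resource reaches it in time, else "Faulted".

    get_state is a caller-supplied callable returning the current state.
    The resource is treated as Faulted if it moves to any state other
    than target, or does not change state within script_timeout seconds.
    """
    initial = get_state()
    deadline = clock() + script_timeout
    while clock() < deadline:
        state = get_state()
        if state == target:
            return target
        if state != initial:
            return "Faulted"      # transitioned to some other state
        sleep(poll_interval)
    return "Faulted"              # no state change within ScriptTimeout
```

For an Offline script the target would be "Offline", and for an Online script it would be "Online".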
Content:
This message appears when the resource becomes Offline unexpectedly. When a detector stops responding to the base monitor (BM), the resource is judged to be faulty.
Corrective action:
Determine why the resource suddenly transitioned to the Offline state. If the cause cannot be identified, collect the investigation information and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message occurs when the command line is empty or has some incorrect value.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If the detector detector could not be started due to errorreason, this message is printed. The reason errorreason could be any one of the following:
The detector detector does not exist.
The detector detector does not have execute permission.
The process for the detector could not be spawned.
The number of processes created by the base monitor at the same time is greater than 128.
Corrective action:
Take appropriate action depending on the reason for the error.
Content:
The detector script is invalid or its format is incorrect.
Corrective action:
Check the detector startup script.
Content:
After a resource executes its online script during standby request, it is expected to come Standby. If it does not change its state, or transitions to a state other than Standby or Online within the period of seconds specified by its ScriptTimeout attribute, the resource is considered as being Faulted.
Corrective action:
Make sure the Online script moves the resource into Standby or Online state during standby request.
Content:
This message appears when the resource fails to come Online after executing its Online scripts, which may transition the state of the resource to Faulted.
Corrective action:
Check to see what prevented the resource resource from coming Online.
Content:
During the processing of a request within the state engine, a "request or response token" was delivered to an object that is not defined for the local host. This is a critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
After a detector died, restarts were attempted more than the predetermined number of times without success. The detector is considered faulty.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
In order to avoid stalling of RMS detectors, each detector periodically sends a heartbeat message to the base monitor. When the heartbeat is missing for a period of time, the base monitor prints this message into switchlog. The base monitor will send an alarm signal to the stalled process to ensure the detector will properly handle its main loop responsibilities. If the amount of time stated since the last time the base monitor had received the heartbeat from the detector exceeds 300 seconds, then the message may indicate the base monitor is not allowed to run. Currently, the base monitor is a real-time process, but not locked in memory. This message may also occur because the bm process has been swapped out and has not had a chance to run again.
Corrective action:
Make sure that the base monitor and detector are active using system tools such as truss(1) or strace(1). If the loss of heartbeat greatly exceeds the 300 second timeout, this may indicate that system swap or main memory is insufficient.
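The interpretation of the reported heartbeat gap can be summarized in a small helper. This is only a reading aid based on the 300 second figure mentioned above; the function name and the returned hint strings are hypothetical and do not correspond to any RMS output.

```python
BM_STARVATION_HINT_SECONDS = 300  # threshold mentioned in the description

def interpret_heartbeat_gap(gap_seconds):
    """Suggest where to look first, given the gap stated in the message."""
    if gap_seconds > BM_STARVATION_HINT_SECONDS:
        # The base monitor itself may not have been allowed to run,
        # for example because it was swapped out.
        return "check base monitor scheduling, swap, and main memory"
    return "check whether the detector process is stalled"
```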
Content:
command has been invoked in a way that does not conform to its expected usage.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The file command log used for logging could not be opened.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The various RMS commands like hvdisp, hvswitch, hvutil and hvdump utilize the lock files from the directory directory for signal handling purposes. These files are deleted after these commands are completed. The locks directory is also cleaned when RMS starts up. If they are not cleaned for some reason, this message is printed, and RMS exits with exit code 99.
Corrective action:
Make sure that the locks directory directory exists. If so, delete it.
Content:
The generic detector command was unable to get any information about the base monitor.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The generic detector command was not able to lock its virtual memory pages in physical memory.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the file dumpfile failed to open because of the error code errno, explained in explanation.
Corrective action:
Correct the problem according to explanation.
Content:
This message appears when the file dumpfile failed to close because of the error code errno, explained in explanation.
Corrective action:
Correct the problem according to explanation.
Content:
An internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message is printed if an attempt was made to copy the file filename when there was another copy in progress.
Corrective action:
Make sure that concurrent copies of the same file do not occur.
Content:
There was a problem while transferring files from one cluster host to the other.
Corrective action:
Take action based on the errno.
Content:
The controller script is incorrect or invalid.
Corrective action:
Check the controller script.
Content:
When the PreCheckScript is set in the script <script>, multiple cluster applications with exclusive relationship might have been activated on the same node.
The created script may have an error.
Corrective action:
When the script is PreOnlineScript and the SControllerOf_ScalableCtr_* is output in the resource <resource>, the cluster application in Standby state, which is controlled by the scalable application, cannot be activated.
If this message is output when some nodes that configure a cluster are activated, start the cluster application in Standby state, or start RMS on all the nodes.
If this message is output when RMS is about to be stopped right after RMS is activated on all the nodes, no action is required once RMS is stopped.
When exclusive relationship is configured among multiple cluster applications, and the job priorities are the same, or the higher priority cluster application is in operation, the startup of another cluster application with the exclusive relationship is stopped and this message is printed. No action is required when exclusive relationship is configured among multiple cluster applications.
Check the error reason errorreason. Review the created script.
When the above actions do not solve this error, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The cluster host could not be killed because of one of the following reasons:
Script exited with a non-zero status.
Script exited due to signal caught.
Other unknown failure.
Corrective action:
Verify the status of the node, make any necessary corrections to the script, potentially correct the node state manually if possible and issue appropriate 'hvutil -{o, u}' as needed.
Content:
If the script cannot be executed, this message is printed along with the errorreason.
Corrective action:
Take action based on the errorreason.
Content:
After dynamic modification, the Shutdown Facility is notified via sdtool about the changes in the current configuration. If sdtool exits abnormally, then the base monitor must exit.
Corrective action:
Verify that sdtool and the Shutdown Facility are operating properly.
Red Hat Enterprise Linux 5 (for Intel64)
No action is required in the xen kernel environment.
Content:
The object <object> is online both on the local node and onlinenode. When the object <object> manages shared disks, data corruption may occur.
Corrective action:
Make sure that the object object is online on only one host in the cluster.
Content:
A host has left the cluster, but RMS was unable to remove the corresponding entry from its internal Priority List. This is an internal problem in the program stack and memory management.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A host died during the processing of a switch request. The host that takes over the responsibility for that particular userApplication tried to proceed with the partly-done switch request, but another host does not agree. This indicates a severe cluster inconsistency and critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the dead host hostname was holding a lock that is unknown to the new responsible host.
Corrective action:
Allow time for the cluster to clean up.
Content:
The hvshut request was aborted because the application is busy.
Corrective action:
Do not shut down RMS when its applications are busy. Make sure the application finishes its processing before shutting down RMS.
Content:
The hvshut request was aborted because dynamic modification is in progress.
Corrective action:
Do not shut down RMS while dynamic modification is in progress. Wait until dynamic modification finishes before shutting down RMS.
Content:
The userApplication application is in an Inconsistent state on more than one host in the cluster, so the switch request was denied.
Corrective action:
Clear the Inconsistent state.
Content:
When a cluster host is killed, the host that requested the kill must send a success message to the surviving hosts. This message appears in the switchlog when sending that success message fails.
Corrective action:
Make sure the cluster and network conditions are such that the message can be sent across the network.
Content:
This message appears when RMS sent a kill request to the Shutdown Facility and did not receive the elimination acknowledgement.
Corrective action:
If CF is in LEFTCLUSTER state, clear the LEFTCLUSTER state. See "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide" for LEFTCLUSTER state.
If CF state is not LEFTCLUSTER, check the status of SysNode.
If SysNode is in Wait state, clear the Wait state. See "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide" for how to clear the Wait state.
Content:
This message appears when the checksum of this host is different from the hosts in the cluster (one of the possible reasons).
Corrective action:
Check the configuration on all the cluster hosts and verify that the same configuration is running on all of them.
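One way to verify that all hosts run the same configuration is to compare digests of the configuration file contents collected from each host. The sketch below is an assumption-laden illustration: it uses SHA-256 as a stand-in, because the checksum algorithm that RMS uses internally is not described in this section, and the function names and input layout are hypothetical.

```python
import hashlib

def config_digest(config_bytes):
    """Digest of a configuration file's contents (SHA-256 as a stand-in)."""
    return hashlib.sha256(config_bytes).hexdigest()

def hosts_with_different_config(local_host, configs):
    """Return the hosts whose configuration differs from the local host's.

    configs maps host name -> configuration file contents as bytes,
    gathered by the administrator beforehand.
    """
    local = config_digest(configs[local_host])
    return sorted(host for host, data in configs.items()
                  if config_digest(data) != local)
```

An empty result means that every collected configuration matches the local one.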
Content:
This message appears when different configurations are encountered in a cluster where one host is offline and the other is online.
Corrective action:
Run the same configuration on all hosts in a single cluster, or make sure that different clusters do not have common hosts.
Content:
This message appears when uname() system call returned with a non-zero value.
Corrective action:
Make sure that the SysNode name is valid and restart RMS as needed.
Content:
The RMS naming convention '_sysnodename_ = `uname -n`RMS' is intended to allow use of the CF-name with and without trailing "RMS" whenever an RMS command expects a SysNode reference. This rule creates an ambiguity if one SysNode is named "xxxRMS" and another is named "xxx", because '_rms_command_ xxx' could refer to either SysNode. Therefore, ambiguous SysNode names are not allowed.
Corrective action:
Use non-ambiguous SysNode names and adhere to the RMS naming conventions.
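The ambiguity rule described above can be checked mechanically before a configuration is built. The following sketch assumes the list of SysNode names is already known; the function name is hypothetical and is not part of any RMS tool.

```python
def ambiguous_sysnode_names(names, suffix="RMS"):
    """Return (name, name + suffix) pairs that are both used as SysNode names.

    Such a pair is ambiguous because a command given "xxx" could refer
    to either "xxx" or "xxxRMS".
    """
    name_set = set(names)
    return sorted((n, n + suffix) for n in name_set if n + suffix in name_set)
```

For example, the names "xxx" and "xxxRMS" together yield one ambiguous pair, while "xxxRMS" and "yyyRMS" yield none.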
Content:
This message appears when the remote host hostname is running a different configuration than the local host, or when different loads of the RMS package are installed on these hosts.
Corrective action:
Make sure all the hosts are running the same configuration and the configuration is distributed on all hosts. Make sure that the same RMS package is installed on all hosts (same load).
Content:
This message appears when the checksum of this host is different from the hosts in the cluster (one of the possible reasons).
Corrective action:
Check the configuration on all the cluster hosts and verify that the same configuration is running on all of them.
Content:
This message appears when the checksum of this host is different from the hosts in the cluster (one of the possible reasons).
Corrective action:
Check the configuration on all the cluster hosts and verify that the same configuration is running on all of them.
Content:
The hvshut -a command has timed out. RMS may end abnormally on some nodes and some resources that are included in the cluster applications may fail to end.
Corrective action:
Shut down the OSes on all the nodes except the nodes on which RMS has ended normally or shut down the nodes forcibly. To prevent the timeout of the hvshut command, depending on your environment, change RELIANT_SHUT_MIN_WAIT, which is the global environment variable of RMS, to a larger value.
For details on RELIANT_SHUT_MIN_WAIT, see "RELIANT_SHUT_MIN_WAIT" in "Global environment variables" of the following manual:
For PRIMECLUSTER 4.3A30 or later: "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide."
See "PRIMECLUSTER Reliant Monitor Services (RMS) with Wizard Tools Configuration and Administration Guide" for how to refer to and change the RMS environment variables.
Content:
System Error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the user issues the hvutil command ('hvutil -o' or 'hvutil -u') and the cluster host nodename is not in the Wait state.
Corrective action:
Reissue hvutil -{o, u} only when the host is in a Wait state or configure so that this command is not issued.
Content:
This message appears when the user issues the hvutil command ('hvutil -o sysnode') to clear the Wait state of the SysNode, but the SysNode remains in the Wait state because the last detector report for the cluster host hostname is not Online. That is, the SysNode may have transitioned to the Wait state from a state other than Online.
Corrective action:
Issue 'hvutil -o' only when the host has transitioned from the Online state to the Wait state.
Content:
When a new host comes Online, the other hosts in the cluster try to determine if the new host has been started with -C option. The host that has just come online uses the queue NET_SEND_Q to send the necessary information to the other hosts in the cluster. If this host is unable to access the queue NET_SEND_Q this message is printed.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When a new host comes Online, the other hosts in the cluster try to determine if the new host has been started with -C option. The host that has just come online uses the queue NET_SEND_Q to send the necessary information to the other hosts in the cluster. If this host is unable to send the necessary information to the other hosts in the cluster, this message is printed.
Corrective action:
Check if there is a problem with the network.
Content:
The value of attr is not resolvable to a valid network address.
Corrective action:
Ensure that a valid interface is specified for attr.
Content:
RMS on the local node could not start RMS on the remote node <SysNode> by using cfsh, rsh, or ssh.
Corrective action:
Make cfsh available. rsh and ssh are not supported.
This message may be output when the hvcm -a command is executed on multiple nodes. In this case, execute the hvcm -a command on only one of the nodes that make up the cluster.
Content:
This message appears when a request is made for the application userapplication to go Online, but the host sysnode is running a different configuration.
Corrective action:
Make sure that all hosts are running the same configuration.
Content:
This message is printed when invalid entries exist in the priority list list.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when either the PriorityList is empty or the list is corrupted. A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If a contract that is supposed to be present in the internal list does not exist, this message is printed. The cluster may be in an inconsistent condition.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when only one application sysnode is Online and it has to process a contract that is not supported. A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the contract is processed by the local host, which does not have the lock for that application contract. A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the LOCK contract cannot be sent over the network.
Corrective action:
The network may be down. Check the network for abnormalities.
Content:
This message appears when the UNLOCK contract cannot be sent over the network.
Corrective action:
The network may be down. Check the network for abnormalities.
Content:
This message appears when the local node receives an UNLOCK contract but is unable to perform the follow-up processing that was committed in the contract.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A host was unable to propagate the received UNLOCK contract, e.g., because of networking problems or memory problems.
Corrective action:
This message should appear with an additional ERROR message specifying the origin of the problem.
Refer to the ERROR message.
Content:
This message appears when the local contract node has completed the contract and has sent it to the local node, but the local node was not able to find it.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The correspondent userApplication on a remote host is in the DeAct state, but the local userApplication is not. This is an error that should not occur.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When the local host receives a contract for unlocking the hosts in the cluster with respect to a particular operation, if the local host finds that a particular host has died, it updates its priority list to reflect this, but if it is unable to perform this operation due to some reason, this message is printed. This indicates a critical internal problem in memory management. This is a critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the application is unable to read the data section of the contract.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when the application is unable to unlock the contract because it could not find the expected kind of contract request in its code. A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message appears when an unknown task is found in the list of outstanding contracts. A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The state of the application is Offline or Standby and some of its resources are Online or Faulted.
Corrective action:
Clear the inconsistency with the appropriate command (usually 'hvutil -c').
Content:
File open error.
Corrective action:
Check the environment variable RELIANT_PATH.
Content:
This message appears when the status_info file contains an incorrect entry.
This error should not occur unless the status_info file was edited manually.
Corrective action:
Check the status_info file for manually edited incorrect entries. If this is not the case, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS is unable to execute the fcntl() system call to <flags> the file descriptor flags of file <filename> because of error code <errornumber> as explained by <errortext>.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The local cluster host detected that another cluster host <hostname> was no longer reachable. In other words, this cluster host sees the other host <hostname> as faulted. The other host <hostname> may have gone down, or there may be some problem with the cluster interconnect.
Corrective action:
See if the host <hostname> is indeed dead. If not, see if there is a problem with the network connection.
Content:
When the detector on the local host detects that the host <hostname> has transitioned from online to offline unexpectedly, it attempts to kill the host <hostname>.
Corrective action:
Check the syslog on the host <hostname> to determine the reason why it went down.
Content:
This message is output when a detector reports an unexpected Faulted state.
Corrective action:
Investigate the cause of the resource failure and take necessary actions.
Content:
The detector script for the resource has exceeded the ScriptTimeout limit.
Corrective action:
Check whether the resource has failed. If it has, take the necessary corrective action.
If it has not, change the resource's ScriptTimeout attribute so that its value is longer than the Online/Offline script execution time.
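To judge whether a ScriptTimeout value is long enough, it can help to measure how long a script actually takes. The following Python sketch times a command and compares the result with a candidate timeout. The function name, the stand-in command, and the safety margin of 1.5 are hypothetical and are not part of RMS.

```python
import subprocess
import sys
import time


def fits_timeout(command, script_timeout, margin=1.5):
    """Run `command` and return (elapsed_seconds, fits).

    `fits` is True when elapsed * margin is within script_timeout seconds.
    `margin` is an illustrative safety factor, not an RMS setting.
    """
    start = time.monotonic()
    subprocess.run(command, check=False)
    elapsed = time.monotonic() - start
    return elapsed, elapsed * margin <= script_timeout


if __name__ == "__main__":
    # Stand-in for an Online/Offline script: a process that sleeps 0.2 s.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    print(fits_timeout(cmd, script_timeout=60))
```

Run the measurement several times under realistic load before choosing a value, because a single run may not reflect the slowest case.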
Content:
The Shutdown Facility that is killing host hostname has not terminated yet. Operator intervention may be required. This message will appear periodically (with the period equal to the node's ScriptTimeout value), until either the script terminates on its own, or until the script is terminated by the Unix kill command. If terminated by the kill command, the host being killed will not be considered killed.
Corrective action:
Wait until the script terminates, or terminate the script using the kill command if the script cannot terminate on its own.
Content:
When a controller propagates a request to its controlled applications, it waits for the request to complete for a period long enough for the controlled applications to process it. If the request is not completed within this period, the controller faults.
Corrective action:
Fix the controller's scripts and/or the scripts of the controlled applications, or repair the resources of the controlled applications. For user-defined controller scripts, increase their ScriptTimeout values.
Content:
After dynamic modification, the Shutdown Facility is notified via sdtool about the changes in the current configuration. If this notification does not finish within the period specified by the local SysNode ScriptTimeout value, the base monitor must exit.
Corrective action:
Verify that sdtool and the Shutdown Facility are operating properly. Increase the ScriptTimeout value if needed.
Content:
The script could not be made into a time sharing process.
Corrective action:
Take the necessary action based on the cause.
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message could occur in any of the following scenarios:
A detector cannot be started because RMS is unable to create the detector process with the command command.
'hvcm -a' has been invoked and the RMS base monitor cannot be started on the individual hosts comprising the cluster with the command command.
A script cannot be started because RMS is unable to create the script process with the command command.
RMS shuts down on the node where this message appears and returns an error number errno, which is the error number returned by the operating system.
Corrective action:
Check whether the cause of the problem can be identified from the error number errno by referring to the manual page of the system or to "Appendix B Solaris/Linux ERRNO table." If the cause cannot be determined, contact field engineers.
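The error number errno can also be translated into its symbolic name and description on the node itself. A minimal Python sketch, assuming the number is taken from the message (2 is used here only as a sample value, and describe_errno is a hypothetical helper name):

```python
import errno
import os


def describe_errno(number):
    """Return 'NAME: description' for an operating system error number."""
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


print(describe_errno(2))
```

The description text comes from the operating system, so its wording can differ between platforms and locales.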
Content:
This message could occur in any of the following scenarios:
A detector cannot be started because RMS is unable to create the detector process to execute the command command.
'hvcm -a' has been invoked and the RMS base monitor cannot be started on the individual hosts comprising the cluster with the command command.
A script cannot be started because RMS is unable to create the script process with the command command.
RMS shuts down on the node where this message appears and returns an error number errno, which is the error number returned by the operating system.
Corrective action:
Check whether the cause of the problem can be identified from the error number errno by referring to the manual page of the system or to "Appendix B Solaris/Linux ERRNO table." If the cause cannot be determined, contact field engineers.
Content:
There is no signal handler associated with the signal signal.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This message occurs if RMS is unable to create a datagram endpoint for communication.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This could occur if RMS is unable to bind the endpoint for communication.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
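The two preceding messages correspond to the two steps of opening a datagram endpoint: creating the socket and binding it to an address. The following Python sketch shows where each step can fail and how the operating system error is obtained. It is an illustration only; the function name, address, and port are placeholders, not the values RMS uses.

```python
import socket


def open_datagram_endpoint(address="127.0.0.1", port=0):
    """Create and bind a UDP socket.

    Returns (socket, None) on success or (None, error_text) on failure.
    """
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create step
    except OSError as exc:
        return None, f"create failed: {exc.strerror} ({exc.errno})"
    try:
        sock.bind((address, port))  # bind step; port 0 lets the OS pick a port
    except OSError as exc:
        sock.close()
        return None, f"bind failed: {exc.strerror} ({exc.errno})"
    return sock, None


if __name__ == "__main__":
    sock, error = open_datagram_endpoint()
    print(sock.getsockname() if sock else error)
    if sock:
        sock.close()
```

A bind failure typically points to an address that is not configured on the node or a port that is already in use.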
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When the base monitor for RMS starts up, it creates a slot in an internal data structure for every host in the cluster.
When hvdet_node is started up, RMS sends it a list of the SysNode objects that are put into different slots in the internal data structure. If the data structure has run out of slots (16) to put the SysNode name in, RMS generates this error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When the host name hostname specified as a SysNode does not have an entry in /etc/hosts, this message is printed to the switchlog.
Corrective action:
Correct the host name hostname so that it has an entry in /etc/hosts.
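A quick way to confirm that a SysNode host name has an entry is to scan the hosts file for it. The sketch below parses text in /etc/hosts format; the function name and the sample host names and addresses are hypothetical.

```python
def hosts_entries(hosts_text, hostname):
    """Return the addresses listed for `hostname` in /etc/hosts-style text."""
    addresses = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]      # drop comments
        fields = line.split()
        # fields[0] is the address; the remaining fields are names and aliases
        if len(fields) >= 2 and hostname in fields[1:]:
            addresses.append(fields[0])
    return addresses


sample = """
127.0.0.1   localhost
192.0.2.10  node1 node1RMS   # cluster node
"""
print(hosts_entries(sample, "node1RMS"))  # ['192.0.2.10']
print(hosts_entries(sample, "node2RMS"))  # []
```

To check a real node, pass the contents of /etc/hosts instead of the sample text. An empty result means that the name has no entry.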
Content:
When the slots for cluster interfaces (64) are exhausted, this message is output with the node name hostname.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The child process cmd with pid pid could not be killed due to reason: reason.
Corrective action:
Take action based on the reason reason.
Content:
The killChild routine accepts one of two flags: KILL_CHILD and DONTKILL_CHILD. If an option other than these two has been specified, this message is printed.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The child process cmd has exceeded its timeout period.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS on the local host has received a message from host host whose address is not resolvable by the local host.
Corrective action:
Make sure that the local host is able to resolve the remote host host's address by checking for any misconfigurations.
Content:
RMS on the local host has received a message from host host whose address is not resolvable by the local host.
Corrective action:
Make sure that the local host is able to resolve the remote host host's address by checking for any misconfigurations.
Content:
The local host has received a message from host host with IP address receivedip, which is different from the locally calculated IP address for that host.
Corrective action:
Check /etc/hosts and the RMS configuration file for any misconfiguration.
Content:
The local host has received a message from host host with IP address receivedip, which is different from the locally calculated IP address for that host. While fewer than 500 such messages have been received, this message is printed to the switchlog once for every 25 of them; after that, it is printed once for every 250th such message received.
Corrective action:
Check /etc/hosts and the RMS configuration file for any misconfiguration.
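The reporting frequency described for this message can be sketched as follows. This is an interpretation of the description, not RMS source code. In particular, treating the 500th message as the start of the second phase is an assumption, and should_report is a hypothetical name.

```python
def should_report(count):
    """Return True if the `count`-th mismatched message would be logged."""
    if count < 500:
        return count % 25 == 0    # every 25th message while below 500
    return count % 250 == 0       # every 250th message from then on


reported = [n for n in range(1, 1001) if should_report(n)]
print(len(reported))    # 22
print(reported[-3:])    # [500, 750, 1000]
```

Under this reading, a single line in the switchlog can stand for many received messages, so the number of log entries understates how often the mismatch occurred.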
Content:
An abnormal OS condition occurred while creating a message queue.
Corrective action:
Check OS conditions that affect memory allocation for message queues, such as the size of swap space and the values of the parameters msgmax, msgmnb, msgmni, and msgtql. Check whether the maximum number of message queues has already been allocated.
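On Linux, the current values of msgmax, msgmnb, and msgmni can be read from /proc/sys/kernel. The following Python sketch collects them. It assumes a Linux node; msgtql is not exposed there, and other platforms use different mechanisms. The function name read_msg_limits is hypothetical.

```python
from pathlib import Path


def read_msg_limits(base="/proc/sys/kernel",
                    names=("msgmax", "msgmnb", "msgmni")):
    """Return {parameter: value}, with None for parameters that cannot be read."""
    limits = {}
    for name in names:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None   # file missing, unreadable, or not a number
    return limits


if __name__ == "__main__":
    for name, value in read_msg_limits().items():
        print(f"{name} = {value}")
```

A value of None means that the parameter could not be read on this system, not that the limit is zero.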
Content:
The time on host is not in sync with the time on the local node.
Corrective action:
Synchronize the time of host with that of the local node.
Also, check that the NTP server is connected to the network and that the NTP settings are correct.
Content:
The time on the cluster host host differs significantly (by more than 25 seconds) from the time on the local node.
Corrective action:
Synchronize the time.
Also, check that the NTP server is connected to the network and that the NTP settings are correct.
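The 25-second threshold mentioned for this message can be expressed as a simple comparison of two clock readings. The Python sketch below is illustrative only: how the remote reading is obtained is outside its scope, and the function name is hypothetical.

```python
import time


def clock_skew_exceeds(local_epoch, remote_epoch, limit_seconds=25):
    """Return True if two clock readings differ by more than the limit."""
    return abs(local_epoch - remote_epoch) > limit_seconds


now = time.time()
print(clock_skew_exceeds(now, now + 30))  # True
print(clock_skew_exceeds(now, now - 10))  # False
```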
Content:
The operation func failed with error code errorcode.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The ELM heartbeat has stopped.
Corrective action:
No action is required. Normally, nodes are forcibly stopped when the ELM heartbeat stops.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
Both IPv4 and IPv6 addresses are assigned to a SysNode in the host name database files (/etc/hosts, /etc/inet/ipnodes).
Corrective action:
Correct the host name database files so that either an IPv4 address or an IPv6 address, but not both, is assigned to the SysNode.
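Names that carry both address families can be found by scanning the host name database files. The following Python sketch does this for text in /etc/hosts format. The function name and the sample names and addresses are hypothetical.

```python
import ipaddress
from collections import defaultdict


def dual_stack_names(hosts_text):
    """Return host names that have both IPv4 and IPv6 entries."""
    versions = defaultdict(set)
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, then split
        if len(fields) < 2:
            continue
        try:
            version = ipaddress.ip_address(fields[0]).version  # 4 or 6
        except ValueError:
            continue   # first field is not an IP address; skip the line
        for name in fields[1:]:
            versions[name].add(version)
    return sorted(name for name, seen in versions.items() if seen == {4, 6})


sample = """
192.0.2.10    node1RMS
2001:db8::10  node1RMS
192.0.2.11    node2RMS
"""
print(dual_stack_names(sample))  # ['node1RMS']
```

Names such as localhost often appear with both address families in a real hosts file, so only the results that correspond to SysNode names are relevant here.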