6.1.4 Fatal Error Messages

Component name	Reference
ADC	"6.1.4.1 ADC: Admin configuration"
ADM	"6.1.4.2 ADM: Admin, command, and detector queues"
BM	"6.1.4.3 BM: Base monitor"
CML	"6.1.4.4 CML: Command line"
CMM	"6.1.4.5 CMM: Communication"
CRT	"6.1.4.6 CRT: Contracts and contract jobs"
DET	"6.1.4.7 DET: Detectors"
INI	"6.1.4.8 INI: init script"
MIS	"6.1.4.9 MIS: Miscellaneous"
QUE	"6.1.4.10 QUE: Message queues"
SCR	"6.1.4.11 SCR: Scripts"
SYS	"6.1.4.12 SYS: SysNode objects"
UAP	"6.1.4.13 UAP: userApplication objects"
US	"6.1.4.14 US: us files"
WRP	"6.1.4.15 WRP: Wrappers"

6.1.4.1 ADC: Admin configuration

(ADC, 16) Because some of the global environment variables were not set in hvenv file, RMS cannot start up. Shutting down.

Content:

All of the global environment variables RELIANT_LOG_LIFE, RELIANT_SHUT_MIN_WAIT, HV_CHECKSUM_INTERVAL, HV_LOG_ACTION_THRESHOLD, HV_LOG_WARNING_THRESHOLD, HV_WAIT_CONFIG, and HV_RCSTART must be set in hvenv for RMS to function properly. If any of them is not set, RMS exits with exit code 1.

Corrective action:

Set the values of all the environment variables in hvenv.
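A quick way to find which of these variables is missing is to scan hvenv for assignments. The following Python sketch assumes that each variable is assigned on its own line as NAME=value or export NAME=value; the function name missing_globals is hypothetical and not part of RMS, and RMS itself may read hvenv differently.

```python
import re

# Global variables that (ADC, 16) says must all be set in hvenv.
REQUIRED_GLOBALS = (
    "RELIANT_LOG_LIFE",
    "RELIANT_SHUT_MIN_WAIT",
    "HV_CHECKSUM_INTERVAL",
    "HV_LOG_ACTION_THRESHOLD",
    "HV_LOG_WARNING_THRESHOLD",
    "HV_WAIT_CONFIG",
    "HV_RCSTART",
)

# Assumption: assignments look like "NAME=value" or "export NAME=value".
_ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(\S*)")


def missing_globals(hvenv_text: str) -> list:
    """Return the required variables that are never assigned a non-empty value."""
    assigned = set()
    for line in hvenv_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = _ASSIGN.match(line)
        if match and match.group(2):
            assigned.add(match.group(1))
    return [name for name in REQUIRED_GLOBALS if name not in assigned]
```

For example, passing the text of hvenv to missing_globals returns the names still to be set; an empty list means every required global variable has a non-empty assignment.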

(ADC, 21) Because some of the local environment variables were not set in hvenv file, RMS cannot start up. Shutting down.

Content:

If some of the local environment variables have not been set in the hvenv file, RMS prints this message and exits with exit code 1.

Corrective action:

Make sure that all the local environment variables have been set to an appropriate value in the /opt/SMAW/SMAWRrms/bin/hvenv.local file.

(ADC, 69) RMS will not start up - previous errors opening file.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADC, 73) The environment variable <hvenv> has value <value> which is out of range.

Content:

The value of environment variable <hvenv> is out of range. The base monitor will exit.

Corrective action:

Use a valid value for the environment variable and restart RMS.

6.1.4.2 ADM: Admin, command, and detector queues

(ADM, 1) cannot open admin queue

Content:

RMS uses UNIX message queues for interprocess communication. The admin queue is one such queue, used for communication between the base monitor and utilities such as hvutil and hvswitch. If there is a problem opening this queue, this message is printed and RMS exits with exit code 3.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(ADM, 2) RMS will not start up - errors in configuration file

Content:

When RMS starts up, it internally performs a dynamic modification. If it encounters errors in its configuration file during this phase, RMS exits with exit code 23.

Corrective action:

Correct the errors in the configuration file reported by the error messages that precede this message in the switchlog.

6.1.4.3 BM: Base monitor

(BM, 3) Usage: progname [-c config_file] [-m] [-h time] [-l level] [-n]

Content:

An attempt has been made to start RMS in a way that does not conform to its expected usage. This message, which shows the expected arguments, is printed to the switchlog, and RMS exits with exit code 3.

Corrective action:

Start RMS with the right arguments.

(BM, 49) Failure calculating configuration checksum

Content:

During dynamic reconfiguration, RMS calculates the configuration checksum by using /usr/bin/sum. If this fails, this message is printed and RMS exits with exit code 52.

Corrective action:

Check if /usr/bin/sum is available.

(BM, 51) The RMS-CF interface is inconsistent and will require operator intervention. The routine "routine" failed with errno errno - "error_reason"

Content:

If RMS encounters a problem in the routine routine while setting up CF, it exits with exit code 95 if routine is "dlopen", or with exit code 94 if routine is "dlsym". error_reason gives the reason for the error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 58) Not enough memory -- RMS cannot continue its operations and is shutting down

Content:

This is a generic message that is printed to the switchlog before RMS stops operating because it does not have enough memory to continue.

Corrective action:

Possible causes are:

Insufficient memory resources
Incorrect kernel parameter setting

Re-examine the estimate of the memory resources required by the entire system. For information on the amount of memory required for operation of PRIMECLUSTER, see the PRIMECLUSTER Installation Guide, which is provided with each product.

If you still have the problem, confirm that the kernel parameter settings are correct by referring to "Setup (initial configuration)" of PRIMECLUSTER Designsheets for PRIMECLUSTER 4.4 or later, or "Kernel Parameter Worksheet" of "PRIMECLUSTER Installation and Administration Guide" for PRIMECLUSTER 4.3 or earlier. Change the settings, as needed, and then reboot the system.

If the error still cannot be solved after the above actions, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 67) An error occurred while writing out the RMS configuration after dynamic modification. RMS is shutting down.

Content:

When dynamic modification completes, RMS writes its current configuration to the file /var/tmp/config.us. If this fails, RMS cannot recalculate the configuration checksum, so it shuts down.

Corrective action:

The preceding message in the switchlog explains why RMS could not write the configuration file. Correct the host environment according to that message, or contact field engineers.

(BM, 69) Some of the OS message queue parameters msgmax= <msgmax>, msgmnb= <msgmnb>, msgmni=<msgmni>, msgtql=<msgtql> are below lower bounds <hvmsgmax>, <hvmsgmnb>, <hvmsgmni>, <hvmsgtql>. RMS is shutting down.

Content:

One or more of the system-defined message queue parameters is too low for RMS to operate correctly. RMS exits with exit code 28.

Corrective action:

Increase the OS message queue parameters to at least the lower bounds shown in the message, reboot the OS, and then restart RMS.
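The check reported here amounts to comparing each parameter with its lower bound. A minimal Python sketch, assuming you copy the bounds <hvmsgmax>, <hvmsgmnb>, <hvmsgmni>, and <hvmsgtql> from the message itself; the helper names are hypothetical. On Linux, msgmax, msgmnb, and msgmni can be read from /proc/sys/kernel, while msgtql is not exposed there, so it is simply skipped when absent.

```python
from pathlib import Path

PARAMS = ("msgmax", "msgmnb", "msgmni", "msgtql")


def params_below_bounds(current: dict, bounds: dict) -> dict:
    """Return {name: (current, bound)} for every parameter below its lower bound."""
    return {
        name: (current[name], bounds[name])
        for name in PARAMS
        if name in current and name in bounds and current[name] < bounds[name]
    }


def read_linux_msg_params(proc_dir: str = "/proc/sys/kernel") -> dict:
    """Read msgmax, msgmnb and msgmni on Linux; msgtql has no entry there."""
    values = {}
    for name in ("msgmax", "msgmnb", "msgmni"):
        path = Path(proc_dir) / name
        if path.exists():
            values[name] = int(path.read_text().split()[0])
    return values
```

Every entry returned by params_below_bounds is a parameter to raise before rebooting the OS.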

(BM, 89) The SysNode length is length. This is greater than the maximum allowable length of maxlength. RMS will now shut down.

Content:

The SysNode name length is greater than the maximum allowable length.

Corrective action:

Ensure that the length of the SysNode name does not exceed maxlength.

(BM, 116) The RMS-CF interface is inconsistent and will require operator intervention. The CF layer is not yet initialized.

Content:

RMS was started before CF was started.

Corrective action:

Start RMS again after CF has been started.

(BM, 117) The RMS-CIP interface state on the local node cannot be determined due to error in popen() -- errno = errornumber: errortext.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(BM, 118) The RMS-CIP interface state on the local node is required to be "UP", the current state is state.

Content:

RMS was started before CF was started.

Corrective action:

Start RMS again after CF has been started.

6.1.4.4 CML: Command line

(CML, 14) ###ERROR: Unable to find or Invalid configuration file.###
#####CONFIGURATION MONITOR exits !!!!!######

Content:

The configuration file specified for RMS does not exist. In this case, RMS exits with exit code 1.

Alternatively, a resource that is not registered to any userApplication may remain in the configuration.

Another possibility is that no cluster applications have been created.

Corrective action:

Specify a valid configuration file for RMS.

Or, register the corresponding resource to a userApplication, or delete the resource, as needed.

Alternatively, create cluster applications. For an overview of cluster applications and how to create them, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.5 CMM: Communication

(CMM, 1) Error establishing outbound network communication

Content:

If there is an error in creating outbound network communication, this message is printed, and RMS exits with exit code 12.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(CMM, 2) Error establishing inbound network communication

Content:

If there is an error in creating inbound network communication, this message is printed, and RMS exits with exit code 12.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.6 CRT: Contracts and contract jobs

(CRT, 6) Fatal system error in RMS. RMS will shut down now. Please check the bmlog for SysNode information.

Content:

A system error has occurred within RMS.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.7 DET: Detectors

(DET, 8) Failed to create DET_REP_Q

Content:

If RMS is unable to create the UNIX message queue DET_REP_Q, which the detectors use to communicate with RMS, this message is printed, and RMS exits with exit code 12.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 9) Message send failed in detector request Q: queue

Content:

During hvlogclean, the base monitor uses the detector request queue queue to send information to the detector. If a communication problem occurs, this message is printed, and RMS exits with exit code 12.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 16) Cannot create gdet queue of kind gkind

Content:

Each generic detector has a message queue that it uses to communicate with the base monitor. If there is a problem creating a queue for a detector of kind gkind, this message is printed, and RMS exits with exit code 12.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(DET, 18) Error reading hvgdstartup file. Error message: errorreason.

Content:

If RMS encounters an error while reading the hvgdstartup file, it prints this message along with the reason errorreason for the failure, and then exits with exit code 26.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.8 INI: init script

(INI, 4) InitScript does not have execute permission.

Content:

InitScript exists, but cannot be executed.

Corrective action:

Make InitScript executable.
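On most systems, running chmod +x on the script is the usual fix. The Python sketch below shows the same check and fix, assuming the path is the InitScript named by RELIANT_INITSCRIPT (see (SCR, 10)); ensure_executable is a hypothetical helper that adds execute permission only for the user classes that can already read the file.

```python
import os
import stat


def ensure_executable(path: str) -> bool:
    """Add execute permission where read permission exists; return True if changed."""
    if os.access(path, os.X_OK):
        return False  # already executable for the current user
    mode = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    # Grant execute to each class (user/group/other) that already has read access.
    if mode & stat.S_IRUSR:
        mode |= stat.S_IXUSR
    if mode & stat.S_IRGRP:
        mode |= stat.S_IXGRP
    if mode & stat.S_IROTH:
        mode |= stat.S_IXOTH
    os.chmod(path, mode)
    return True
```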

(INI, 7) sysnode must be in your configuration file

Content:

If the local SysNode sysnode is not part of the configuration file, this message is printed and RMS exits with exit code 23.

Corrective action:

Make sure that the local SysNode sysnode is part of the configuration file.

(INI, 10) InitScript has not completed within the allocated time period of timeout seconds.

Content:

InitScript was still running when the timeout allocated for its execution expired. The timeout is the lesser of the value of the environment variable SCRIPTS_TIME_OUT and 300 seconds.

Corrective action:

Increase the timeout value, or correct the conditions that lead to timeout during script execution.
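The timeout rule above can be expressed as min(SCRIPTS_TIME_OUT, 300). This Python sketch applies it when running a script; treating an unset or non-numeric SCRIPTS_TIME_OUT as 300 is an assumption, and both function names are hypothetical.

```python
import os
import subprocess

MAX_INIT_TIMEOUT = 300  # upper bound stated for (INI, 10), in seconds


def init_script_timeout(settings=None) -> int:
    """Lesser of SCRIPTS_TIME_OUT and 300; 300 if unset or invalid (assumption)."""
    settings = os.environ if settings is None else settings
    try:
        configured = int(settings["SCRIPTS_TIME_OUT"])
    except (KeyError, ValueError):
        return MAX_INIT_TIMEOUT
    return min(configured, MAX_INIT_TIMEOUT)


def run_init_script(argv, settings=None) -> int:
    """Run the script; subprocess.run kills it and raises TimeoutExpired on timeout."""
    return subprocess.run(argv, timeout=init_script_timeout(settings)).returncode
```

Under this rule as stated, raising SCRIPTS_TIME_OUT helps only up to 300 seconds; beyond that, the conditions that make InitScript slow have to be corrected.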

(INI, 11) InitScript failed to start up, errno errno, reason: reason.

Content:

An error occurred during startup of InitScript. The errno code errno and reason reason are presented in the message.

Corrective action:

Correct the host condition that prevents InitScript from starting up.

(INI, 12) InitScript returned non-zero exit code exitcode.

Content:

InitScript completed with a non-zero exit code exitcode.

Corrective action:

Correct the host condition that causes InitScript to return a non-zero exit code, or fix InitScript itself.

(INI, 13) InitScript has been stopped.

Content:

InitScript has been stopped.

Corrective action:

Correct the host condition that causes InitScript to be stopped, or fix InitScript itself.

(INI, 14) InitScript has been abnormally terminated.

Content:

InitScript has been abnormally terminated.

Corrective action:

Correct the host condition that causes InitScript to terminate abnormally, or fix InitScript itself.
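(INI, 12), (INI, 13), and (INI, 14) correspond to the three outcomes a parent process can observe through a POSIX wait status: a normal exit with an exit code, a stop by a signal, and termination by a signal. It is an assumption that RMS distinguishes the cases this way; the Python sketch below only shows how the standard os.WIF* helpers separate them, and classify_wait_status is a hypothetical name.

```python
import os


def classify_wait_status(status: int) -> str:
    """Map a raw wait status (as from os.waitpid) to the INI 12/13/14 cases."""
    if os.WIFSTOPPED(status):
        return "stopped by signal %d" % os.WSTOPSIG(status)  # like (INI, 13)
    if os.WIFSIGNALED(status):
        return "terminated by signal %d" % os.WTERMSIG(status)  # like (INI, 14)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
        # Zero means success; anything else resembles (INI, 12).
        return "ok" if code == 0 else "non-zero exit code %d" % code
    return "unknown"
```

The status would typically come from os.waitpid(pid, os.WUNTRACED), since stopped children are reported only when WUNTRACED is passed.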

(INI, 17) Controller controller refers to an unknown userApplication <userapplication>

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(INI, 18) Configuration uses objects of type "controller" and of type "gcontroller". These object types are mutually exclusive!

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(INI, 19) userApplication <childapp> is simultaneously controlled by 2 gcontroller objects <controller1> and <controller2>. This will result in unresolveable conflicts!

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(INI, 20) Incorrect configuration of the gcontroller object <controller>! The attributes "Resource" and "ControllerType" are mandatory.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(INI, 21) Incorrect configuration of the gcontroller object <controller>! It has the attribute Local set, but the host list for the controlled application <childapp> does not match the host list for the controlling application <parentapp>.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.9 MIS: Miscellaneous

(MIS, 4) The locks directory directory cannot be cleaned of all old locks files: error at call of file: filename, errno = errnonumber, error -- errortext.

Content:

RMS commands such as hvdisp, hvswitch, and hvdump use lock files in the directory directory for signal handling. These files are deleted when the commands complete, and the locks directory is also cleaned when RMS starts up. If the old lock files cannot be cleaned, this message is printed, and RMS exits with exit code 99. The call indicates the stage at which the cleanup failed, errnonumber is the OS errno value, and errortext is the OS-supplied explanation of that errno.

Corrective action:

Make sure that the locks directory directory exists.
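The startup cleanup described above can be sketched as follows. This Python version is illustrative only: clean_locks_dir and the stage labels "listdir" and "remove" are hypothetical, and the function merely mirrors the call, errno, and error text that the message reports.

```python
import os


def clean_locks_dir(directory: str) -> list:
    """Remove every regular file in directory; return (stage, path, errno, text) failures."""
    failures = []
    try:
        names = os.listdir(directory)
    except OSError as exc:
        # The directory itself is missing or unreadable.
        return [("listdir", directory, exc.errno, os.strerror(exc.errno))]
    for name in names:
        path = os.path.join(directory, name)
        try:
            if os.path.isfile(path):
                os.remove(path)
        except OSError as exc:
            failures.append(("remove", path, exc.errno, os.strerror(exc.errno)))
    return failures
```

A "listdir" failure with ENOENT corresponds to the corrective action above: the locks directory does not exist.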

(MIS, 9) The locks directory directory does not exist. An installation error occured or the directory was removed after installation.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.10 QUE: Message queues

(QUE, 1) Error status in ADMIN_Q.

Content:

Different utilities use the ADMIN_Q to communicate with the base monitor. If there is an error with this queue, this message is printed and RMS exits with exit code 3.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(QUE, 2) Read message failed in ADMIN_Q.

Content:

The RMS base monitor was unable to read a message from the ADMIN_Q, which is used for communication between the utilities and RMS. This message is printed, and RMS then exits with exit code 3.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(QUE, 5) Network message read failed.

Content:

If there is a problem reading a message over the network, this message is printed, and RMS exits with exit code 3.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(QUE, 6) Network problem occurred.

Content:

When a network problem occurs during message transmission, this message is printed.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(QUE, 11) Read message failed in DET_REP_Q.

Content:

All the detectors use the queue DET_REP_Q to communicate with the RMS base monitor. If there is a problem reading a message from this queue, RMS prints this message and exits with exit code 15.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(QUE, 12) Error status in DET_REP_Q: status.

Content:

The RMS base monitor encountered a problem with the queue DET_REP_Q that is used by the different detectors to report their state. This message is printed, and RMS then exits with exit code 15.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(QUE, 15) Error No errornumber : <errortext> in accessing the message queue.

Content:

There was a problem accessing a message queue. Message queues are used to communicate with the base monitor. The error number <errornumber> and the text <errortext> indicate the type of error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.11 SCR: Scripts

(SCR, 4) Failed to create a detector request queue for detector detector_name.

Content:

If a detector request queue could not be created for detector detector_name, this message is printed, and RMS exits with exit code 12. This is a critical internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(SCR, 5) REQUIRED PROCESS RESTART FAILED: Unable to restart detector. Shutting down RMS.

Content:

If the detector detector could not be restarted, this message is printed, and RMS exits with exit code 14. The restart could have failed for any of the following reasons:

The detector needed to be restarted more than 3 times within one minute.
A memory allocation problem occurred within RMS.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
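The first failure condition is a rate limit: more than 3 restarts within one minute. A sliding-window counter is one way to implement such a rule; the Python sketch below assumes that window semantics, and RestartLimiter is a hypothetical class, not the RMS implementation.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window-second period."""

    def __init__(self, max_restarts: int = 3, window: float = 60.0):
        self.max_restarts = max_restarts
        self.window = window
        self._times = deque()  # timestamps of restarts still inside the window

    def allow(self, now: float) -> bool:
        """Record a restart at time now; return False if the limit would be exceeded."""
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()  # forget restarts older than the window
        if len(self._times) >= self.max_restarts:
            return False
        self._times.append(now)
        return True
```

In real use, pass time.monotonic() as now so that wall-clock adjustments do not affect the window.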

(SCR, 10) InitScript did not run ok. RMS is being shut down

Content:

RMS runs InitScript at startup. InitScript is specified by the environment variable RELIANT_INITSCRIPT. If InitScript fails (for example, if it exits with a non-zero code or receives a signal), this message is printed, and RMS shuts down with exit code 56.

Corrective action:

If InitScript is configured, check the configured InitScript for problems. If it is not configured, record this message and collect the investigation information. Then, contact field engineers. For details on how to collect the information, refer to "PRIMECLUSTER Installation and Administration Guide."

(SCR, 12) incorrect initialization of RealDetReport; Shutting down RMS.

Content:

Scripts are executed based on the states reported by the detectors. If a detector reports a state other than Online, Offline, Faulted, Standby, or NoReport, this message is printed, and RMS exits with exit code 8.

Corrective action:

Make sure that the detector reports only the states Online, Offline, Faulted, Standby, or NoReport.

(SCR, 13) ExecScript: Failed to exec script <script> for object <objectname>: errno errno

Content:

RMS was unable to execute the script <script> for the object <objectname>. The error number errno returned by the operating system indicates the cause of the failure. RMS exits with exit code 8.

Corrective action:

Look up error number errno in "Appendix B Solaris/Linux ERRNO table" of this manual and check whether the cause is evident. If it is not, contact field engineers.
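If the appendix is not at hand, the symbolic name and text for an errno value can also be obtained on the node itself. A small Python sketch using the standard errno and os modules; describe_errno is a hypothetical helper. Run it on the node that logged the message, because errno numbers are not identical between Solaris and Linux.

```python
import errno
import os


def describe_errno(number: int) -> str:
    """Return the symbolic name, number, and OS text for an errno on this host."""
    name = errno.errorcode.get(number, "UNKNOWN")
    return "%s (%d): %s" % (name, number, os.strerror(number))
```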

6.1.4.12 SYS: SysNode objects

(SYS, 33) The RMS cluster host <hostname> does not have a valid entry in the /etc/hosts file. The lookup function gethostbyname failed. Please change the name of the host to a valid /etc/hosts entry and then restart RMS.

Content:

If the lookup function gethostbyname cannot find a valid entry for the host hostname in /etc/hosts, this message is printed and RMS exits with exit code 114.

Corrective action:

Make sure that the host name hostname has a valid entry in /etc/hosts and restart RMS.
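A Python sketch of the two checks involved, assuming hosts-file syntax of an address followed by a name and aliases: hosts_file_has_entry looks for the name in the text of /etc/hosts, and resolves asks the system resolver through socket.gethostbyname, the lookup function named in the message. Both helper names are hypothetical, and depending on the host configuration the resolver may also consult sources other than /etc/hosts.

```python
import socket


def hosts_file_has_entry(hosts_text: str, hostname: str) -> bool:
    """True if a non-comment line lists hostname as a name or alias."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return True
    return False


def resolves(hostname: str) -> bool:
    """True if socket.gethostbyname can resolve hostname on this host."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
```

Run both checks for every cluster host on every node before restarting RMS.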

(SYS, 52) SysNode sysnode: error creating necessary message queue NODE_REQ_Q...exiting.

Content:

When RMS encounters a problem in creating the NODE_REQ_Q, this message is printed, and RMS exits with exit code 12.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.13 UAP: userApplication objects

(UAP, 36) object: double fault occurred, but Halt attribute is set. RMS will exit immediately in order to allow a failover!

Content:

When the HaltFlag attribute is set for an object and a double fault occurs, RMS exits with exit code 96 on that node so that a failover can take place.

Corrective action:

Determine the cause of the resource failure and take the necessary action.

6.1.4.14 US: us files

(US, 1) RMS will not start up - fatal errors in configuration file.

Content:

Errors were found in the RMS configuration file that prevented RMS startup. This is usually caused by manual editing and distribution of the RMS configuration file.

Corrective action:

Correct the errors in the RMS configuration file.

(US, 42) A State transition error occured. See the next message for details.

Content:

A state transition error occurred during the course of RMS state transitions. Details of the error are printed in the subsequent lines.

Corrective action:

Take the necessary action according to the error details. If the action to take is unclear, collect the investigation information. Then, contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

6.1.4.15 WRP: Wrappers

(WRP, 40) The length of the type name specified for the host host is <length> which is greater than the maximum allowable length <maxlength>. RMS will exit now.

Content:

The length of the interconnect name is greater than the maximum value.

Corrective action:

Make sure that the length of the interconnect name does not exceed maxlength.

Change the interconnect name to a shorter one, and then reconfigure the cluster system.

(WRP, 44) Not enough slots left in the wrapper data structure to create new entries.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 45) The SysNode to the CIP name mapping for <sysnode> has failed.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 46) The RMS-CF interface is inconsistent and will require operator intervention. The routine "routine" failed with error code errorcode -"errorreason".

Content:

This is a generic message indicating that the execution of the routine <routine> failed due to the reason <errorreason> and hence the RMS-CF interface is inconsistent.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 47) The RMS-CF-CIP mapping cannot be determined for any host as the CIP configuration file <configfilename> cannot be opened. Please verify that all the entries in <configfilename> are correct and that CF and CIP are fully configured.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 48) The RMS-CF-CIP mapping cannot be determined for any host as the CIP configuration file <configfilename> has missing entries. Please verify that all the entries in <configfilename> are correct and that CF and CIP are fully configured.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 54) The heartbeat mode setting of <hbmode> is wrong. Cannot use ELM heartbeat method on non-CF cluster.

Content:

The ELM heartbeat method is not available on a cluster that does not use CF.

Corrective action:

Install CF or disable ELM mode by setting HV_USE_ELM=0.

(WRP, 55) The heartbeat mode setting of <hbmode> is wrong. The valid settings are '1' for ELM+UDP and '0' for UDP.

Content:

The HV_USE_ELM setting is invalid.

Corrective action:

Set HV_USE_ELM to 0 or 1.
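(WRP, 54) and (WRP, 55) together define a simple validity rule for HV_USE_ELM: the value must be 0 or 1, and 1 is allowed only when CF is present. A Python sketch of that rule, where check_heartbeat_mode is hypothetical and cf_installed is assumed to be determined by the caller.

```python
def check_heartbeat_mode(hv_use_elm: str, cf_installed: bool) -> str:
    """Return "" if the HV_USE_ELM setting is acceptable, else the matching problem."""
    if hv_use_elm not in ("0", "1"):
        return "WRP 55: HV_USE_ELM must be 0 (UDP) or 1 (ELM+UDP)"
    if hv_use_elm == "1" and not cf_installed:
        return "WRP 54: ELM heartbeat requires CF; install CF or set HV_USE_ELM=0"
    return ""
```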

(WRP, 58) The ELM lock resource <resource> for the local host is being held by another node or application.

Content:

A critical internal error has occurred.

Corrective action:

A fatal internal error has occurred. Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 64) The ELM heartbeat startup failure for the cluster host <hostname>.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."

(WRP, 67) The RMS-CF-CIP mapping cannot be determined for any host as the CIP configuration file <configfilename> has missing entries. Please verify that all the entries in <configfilename> are correct and that CF and CIP are fully configured.

Content:

RMS internal error.

Corrective action:

Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."