This chapter contains a detailed list of fatal RMS error messages output in the switchlog file.
Check the component name displayed in the message, and then see the table below to find the corresponding reference. The components are explained in numerical order of their messages.
Component | Reference |
---|---|
ADC | |
ADM | |
BM | |
CML | |
CMM | |
CRT | |
DET | |
INI | |
MIS | |
QUE | |
SCR | |
SYS | |
UAP | |
US | |
WRP | |
Content:
All of the global environment variables RELIANT_LOG_LIFE, RELIANT_SHUT_MIN_WAIT, HV_CHECKSUM_INTERVAL, HV_LOG_ACTION_THRESHOLD, HV_LOG_WARNING_THRESHOLD, HV_WAIT_CONFIG and HV_RCSTART have to be set in hvenv in order for RMS to function properly. If some of them have not been set, RMS exits with exit code 1.
Corrective action:
Set the values of all the environment variables in hvenv.
Content:
If some of the local environment variables have not been set in the hvenv file, RMS prints this message and exits with exit code 1.
Corrective action:
Make sure that all the local environment variables have been set to an appropriate value in the /opt/SMAW/SMAWRrms/bin/hvenv.local file.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The value of environment variable <hvenv> is out of range. The base monitor will exit.
Corrective action:
Use a valid value for the environment variable and restart RMS.
Content:
RMS uses UNIX message queues for interprocess communication. The admin queue is one such queue, used for communication between RMS and utilities such as hvutil and hvswitch. If there is a problem opening this queue, this message is printed and RMS exits with exit code 3.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When RMS starts up, it performs dynamic modification internally. If it encounters errors in its configuration file during this phase, RMS exits with exit code 23.
Corrective action:
Make sure there are no errors in the configuration file based on the error messages printed prior to the above message in the switchlog.
Content:
An attempt has been made to start RMS in a way that does not conform to its expected usage. This message is printed to the switchlog indicating the arguments, and RMS exits with exit code 3.
Corrective action:
Start RMS with the right arguments.
Content:
During dynamic reconfiguration, RMS calculates the configuration checksum by using /usr/bin/sum. If this fails, then this message is printed and RMS exits with the exit code 52.
Corrective action:
Check if /usr/bin/sum is available.
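One way to perform this check is sketched below in Python. The helper name `is_executable_file` is hypothetical; it only confirms that the path is a regular file that the current user may execute, which is an assumption about what "available" means here.

```python
import os


def is_executable_file(path):
    """Return True if path is a regular file the current user can execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    path = "/usr/bin/sum"  # path given in the message description
    status = "is available" if is_executable_file(path) else "is NOT available"
    print(path, status)
```

Run the check as the same user that starts RMS, because `os.access` evaluates permissions for the calling user.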
Content:
While setting up CF, if RMS encounters a problem in the routine <routine>, which can be either "dlopen" or "dlsym", it exits with exit code 95 or 94, respectively. <error_reason> gives the reason for the error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This is a generic message that is printed to the switchlog before RMS discontinues its functioning because it does not have enough memory for it to operate.
Corrective action:
Possible causes are:
Insufficient memory resources
Incorrect kernel parameter setting
Reexamine the estimation of the memory resources that are required for the entire system. For information on the amount of memory required for operation of PRIMECLUSTER, see the PRIMECLUSTER Installation Guide, which is provided with each product.
If you still have the problem, confirm that the kernel parameter settings are correct by referring to "Setup (initial configuration)" of PRIMECLUSTER Designsheets for PRIMECLUSTER 4.4 or later, or "Kernel Parameter Worksheet" of "PRIMECLUSTER Installation and Administration Guide" for PRIMECLUSTER 4.3 or earlier. Change the settings, as needed, and then reboot the system.
If the error still cannot be solved after the above actions, record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
Upon concluding dynamic modification, RMS dumps its current configuration into the file /var/tmp/config.us. If this cannot be done, RMS cannot recalculate the checksum of the configuration and therefore shuts down.
Corrective action:
The previous message in the switchlog explains why RMS was unable to write the configuration file. Correct the host environment according to the description, or contact field engineers.
Content:
One or more of the system defined message queue parameters is not sufficient for correct operation of RMS. RMS exits with exit code 28.
Corrective action:
Change the OS message queue parameters and reboot the OS before restarting RMS.
Content:
The SysNode name length is greater than the maximum allowable length.
Corrective action:
Ensure that the length of the SysNode name is less than <maxlength>.
Content:
RMS is started before CF has been started.
Corrective action:
Start RMS again after CF has been started.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS is started before CF has been started.
Corrective action:
Start RMS again after CF has been started.
Content:
If the configuration file specified for RMS is non-existent, RMS exits with exit code 1.
Or, resource that is not registered to userApplication may remain.
Another possibility is that cluster applications are not created.
Corrective action:
Specify a valid configuration file for RMS to function.
Or, register the corresponding resource to userApplication or delete the resource, as needed.
Alternatively, create cluster applications. For the overview of cluster applications and how to create them, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If there is an error in creating outbound network communication, this message is printed, and RMS exits with exit code 12.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If there is an error in creating inbound network communication, this message is printed, and RMS exits with exit code 12.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
A system error has occurred within RMS.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If RMS is unable to create the Unix Message queue DET_REP_Q for communication between a detector and itself, this message is printed, and RMS exits with exit code 12.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
During hvlogclean, the detector request queue <queue> is used for sending information from the base monitor to the detector. If there is a problem in communication, this message is printed, and RMS exits with exit code 12.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
Each of the generic detectors has a message queue that it uses to communicate with the base monitor. If there is a problem creating a queue for a detector of kind <kind>, this message is printed, and RMS exits with exit code 12.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If RMS is unable to create the Unix Message queue DET_REP_Q for communication between a detector and itself, this message is printed, and RMS exits with exit code 12. When RMS encounters an error while reading this file, it prints this message along with the reason <errorreason> for the failure. RMS then exits with exit code 26.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
InitScript exists, but cannot be executed.
Corrective action:
Make InitScript executable.
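The following Python sketch shows one possible way to add execute permission to a script file. The helper name `make_executable` is hypothetical, and the rule it applies (grant execute wherever read is already granted) is an assumption chosen for illustration, not a documented RMS requirement.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for every class that already has read permission."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Copy each read bit (r) to the corresponding execute bit (x).
    mode |= (mode & 0o444) >> 2
    os.chmod(path, mode)
    return mode
```

For example, a file with mode 0o644 becomes 0o755. The path passed in would be the InitScript file configured on the node.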
Content:
If the local SysNode <sysnode> is not part of the configuration file, this message is printed and RMS exits with exit code 23.
Corrective action:
Make sure that the local SysNode <sysnode> is part of the configuration file.
Content:
InitScript was still running when the timeout limit allocated for its execution expired. The timeout limit is the lesser of the value defined in the environment variable SCRIPTS_TIME_OUT and 300.
Corrective action:
Increase the timeout value, or correct the conditions that lead to timeout during script execution.
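The timeout rule described above amounts to a simple minimum. The Python sketch below illustrates it; the function and constant names are hypothetical, and the unit of both values is taken to be whatever unit SCRIPTS_TIME_OUT uses.

```python
INITSCRIPT_TIMEOUT_CAP = 300  # upper bound stated in the message description


def effective_initscript_timeout(scripts_time_out):
    """Return the lesser of SCRIPTS_TIME_OUT and the fixed cap of 300."""
    return min(int(scripts_time_out), INITSCRIPT_TIMEOUT_CAP)
```

Under this rule, raising SCRIPTS_TIME_OUT above 300 does not lengthen the limit applied to InitScript, so in that case the conditions that cause the timeout have to be corrected instead.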
Content:
An error occurred during startup of InitScript. The errno code <errno> and the reason <reason> are presented in the message.
Corrective action:
Correct the erroneous host condition for InitScript to be able to start up.
Content:
InitScript completed with a non-zero exit code <exitcode>.
Corrective action:
Correct the erroneous host condition for InitScript to be able to return a zero exit code, or fix the InitScript itself.
Content:
InitScript has been stopped.
Corrective action:
Correct the erroneous host condition for InitScript to run without stopping, or fix the InitScript itself.
Content:
InitScript has been abnormally terminated.
Corrective action:
Correct the erroneous host condition for InitScript to run without stopping, or fix the InitScript itself.
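When testing an InitScript by hand, it can help to tell a non-zero exit code apart from termination by a signal. The Python sketch below classifies a return code as reported by the standard `subprocess` module on POSIX systems; the function name is hypothetical, and the stopped case is not covered because `subprocess` does not report it through the return code.

```python
def classify_returncode(returncode):
    """Describe a subprocess return code in the terms used by the messages above."""
    if returncode == 0:
        return "completed successfully"
    if returncode > 0:
        return f"completed with non-zero exit code {returncode}"
    # On POSIX, subprocess reports termination by signal N as the value -N.
    return f"terminated by signal {-returncode}"
```

The value to classify would typically come from `subprocess.run([...]).returncode` after running the script under test.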
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS commands such as hvdisp, hvswitch, and hvdump use lock files in the directory <directory> for signal handling purposes. These files are deleted after the commands are completed. The locks directory is also cleaned when RMS starts up. If the cleanup fails for some reason, this message is printed, and RMS exits with exit code 99. <call> indicates the stage at which the cleanup failed, <errornumber> is the OS errno value, and <errortext> is the OS-supplied explanation for the errno.
Corrective action:
Make sure that the locks directory <directory> exists.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
Different utilities use the ADMIN_Q to communicate with the base monitor. If there is an error with this queue, this message is printed and RMS exits with exit code 3.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The RMS base monitor was unable to extract a message from the ADMIN_Q that is used for communication between the utilities and RMS. This message is printed, and RMS then exits with exit code 3.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If there is a problem reading a message over the network, this message is printed, and RMS exits with exit code 3.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When a network problem occurs during message transmission, this message is printed.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
All the detectors use the queue DET_REP_Q to communicate with the RMS base monitor. If there is a problem reading a message from the queue, RMS prints this message and exits with exit code 15.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The RMS base monitor encountered a problem with the queue DET_REP_Q that is used by the different detectors to report their state. This message is printed, and RMS then exits with exit code 15.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
There was a problem using the message queue. The error number <errornumber> and the text in <errortext> indicate the type of error. Message queues are used to communicate with the base monitor.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If a detector request queue could not be created for the detector <detector_name>, this message is printed, and RMS exits with exit code 12. This is a critical internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
If the detector <detector> could not be restarted, this message is printed, and RMS exits with exit code 14. The restart could have failed for any of the following reasons:
The detector needed to be restarted more than 3 times in one minute.
There was a problem with memory allocation within RMS.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
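To illustrate the first failure reason (more than 3 restarts in one minute), the following Python sketch implements a sliding-window restart limit. The class name and the sliding-window interpretation are assumptions made for illustration; how RMS counts restarts internally is not described here.

```python
from collections import deque


class RestartLimiter:
    """Track restarts and reject one that exceeds the limit within the window."""

    def __init__(self, max_restarts=3, window_seconds=60.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._times = deque()

    def allow(self, now):
        """Record a restart at time `now` (seconds); return False if over the limit."""
        # Drop restarts that have fallen out of the sliding window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_restarts:
            return False
        self._times.append(now)
        return True
```

With the default settings, three restarts within a minute are accepted and a fourth within the same minute is rejected, which corresponds to the condition under which the restart is treated as failed.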
Content:
RMS runs the InitScript initially. The value of InitScript is the value of the environment variable RELIANT_INITSCRIPT. If InitScript fails (e.g., exits with a non-zero code, gets a signal), then this message is printed and RMS shuts down with exit code 56.
Corrective action:
If the InitScript is configured, check that there are no problems with the set InitScript. If it is not configured, record this message and collect the investigation information. Then, contact field engineers. For details on how to collect the information, refer to "PRIMECLUSTER Installation and Administration Guide."
Content:
Since the scripts are executed based on the reports of the detectors, if the detector reports a state other than Online, Offline, Faulted, Standby or NoReport, this message is printed, and RMS exits with exit code 8.
Corrective action:
Make sure that the detector only reports the states Online, Offline, Faulted, Standby, or NoReport.
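For a custom detector, the set of permitted states can be checked before a report is sent. The Python sketch below is a minimal illustration; the names are hypothetical, and the state strings are taken from the message description above.

```python
VALID_DETECTOR_STATES = frozenset(
    {"Online", "Offline", "Faulted", "Standby", "NoReport"}
)


def is_valid_detector_state(state):
    """Return True if state is one of the states listed in the description."""
    return state in VALID_DETECTOR_STATES
```

The comparison is case-sensitive, so a value such as "online" is treated as invalid in this sketch.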
Content:
RMS has been unable to execute a script <script> for the object <objectname>. The error number <errornumber> returned by the operating system provides a diagnosis of the failure. RMS exits with exit code 8.
Corrective action:
Consult the "Appendix B Solaris/Linux ERRNO table" of this manual for the explanation for error number <errornumber> and see if the cause is evident. If not, contact field engineers.
Content:
If the lookup function gethostbyname searches the file /etc/hosts to get information about the host <hostname>, but is unable to find a valid entry for it, this message is printed and RMS exits with exit code 114.
Corrective action:
Make sure that the host name <hostname> has a valid entry in /etc/hosts and restart RMS.
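A quick way to look for a host name in hosts-format text is sketched below in Python. The function name is hypothetical, and the sketch only inspects the text it is given: it does not reproduce the gethostbyname lookup itself, and it matches names exactly, which is a simplification.

```python
def hosts_has_entry(hosts_text, hostname):
    """Return True if hostname appears as a name or alias in hosts-format text."""
    for line in hosts_text.splitlines():
        # Everything after '#' is a comment in the hosts file format.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names.
        if len(fields) >= 2 and hostname in fields[1:]:
            return True
    return False
```

To use it, read the contents of /etc/hosts into a string and pass it together with the host name reported in the message.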
Content:
When RMS encounters a problem in creating the NODE_REQ_Q, this message is printed, and RMS exits with exit code 12.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
When the HaltFlag attribute is set for an object and a double fault occurs, then RMS will exit with exit code 96 on that node.
Corrective action:
Determine the cause of the resource failure and take the necessary action.
Content:
Errors were found in the RMS configuration file that prevented RMS startup. This is usually caused by manual editing and distribution of the RMS configuration file.
Corrective action:
Correct the errors in the RMS configuration file.
Content:
A state transition error occurred during the course of RMS state transitions. Details of the error are printed in the subsequent lines.
Corrective action:
Take the necessary action according to the error content. If action to be taken is unclear, collect the investigation information. Then, contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The length of the interconnect name is greater than the maximum value.
Corrective action:
Make sure that the length of the interconnect name is less than the maximum value <maxlength>.
Change the interconnect name to a shorter one and then reconfigure the cluster system.
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
This is a generic message indicating that the execution of the routine <routine> failed due to the reason <errorreason> and hence the RMS-CF interface is inconsistent.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
The ELM heartbeat method is not available on a cluster running in non-CF mode.
Corrective action:
Install CF or disable ELM mode by setting HV_USE_ELM=0.
Content:
The HV_USE_ELM setting is invalid.
Corrective action:
Set HV_USE_ELM to 0 or 1.
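The constraint on HV_USE_ELM can be illustrated with a small validation helper in Python. The function name is hypothetical, and the mapping of 1 to "enabled" follows from the earlier statement that setting HV_USE_ELM=0 disables ELM mode.

```python
def parse_hv_use_elm(value):
    """Return True for "1", False for "0"; raise ValueError for anything else."""
    text = str(value).strip()
    if text not in ("0", "1"):
        raise ValueError(f"HV_USE_ELM must be 0 or 1, got {value!r}")
    return text == "1"
```

Rejecting every other value, including an empty string, mirrors the message above that reports an invalid setting.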
Content:
A critical internal error has occurred.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."
Content:
RMS internal error.
Corrective action:
Record this message, collect investigation information, and contact field engineers. For details on collecting the investigation information, see "PRIMECLUSTER Installation and Administration Guide."