Note
Global variable settings (ENV) are included in the configuration checksum that is common to the cluster. The checksum is verified on each node during startup of the base monitor. RMS will fail to start if it detects a checksum difference between the values on any two nodes.
The default values of the environment variables are found in <RELIANT_PATH>/bin/hvenv. They can be redefined in the hvenv.local configuration file.
The following list describes the global environment variables for RMS:
Possible values: List of RMS cluster nodes. The list of RMS cluster nodes must be the names of the SysNodes as found in the RMS configuration file. The list of nodes cannot include the CF name.
Default: "" (empty)
List of cluster nodes that RMS ignores when it starts. This environment variable is not set by default. A user application will begin its automatic startup processing if the AutoStartUp attribute is set and when all cluster nodes defined in the user application have reported Online. If a cluster node appears in this list, automatic startup processing will begin even if this node has not yet reported the Online state.
Use this environment variable if one or more cluster nodes need to be taken out of the cluster for an extended period and RMS will continue to use the configuration file that specifies the removed cluster nodes. In this case, specifying the unavailable cluster nodes in this environment variable ensures that all user applications are automatically brought online even if the unavailable cluster nodes do not report Online.
Note
If this environment variable is used, ensure that it is correctly defined on all cluster nodes and that it is always kept up-to-date. When a node is brought back into the cluster, remove it from this environment variable. If the node is not removed, data loss could occur because RMS will ignore this node during the startup procedure and will not check whether the application is already running on the nodes specified in this list. It is the system administrator's responsibility to keep this list up-to-date if it is used.
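The startup condition described above can be sketched as a small predicate. This is only an illustration of the documented rule, not RMS code: the function name, its parameters, and the node names are hypothetical, and the sketch assumes the AutoStartUp attribute is already set.

```python
def autostart_may_begin(app_nodes, online_nodes, ignore_list):
    """Return True when every node of the application that is not in the
    ignore list has reported Online (illustrative model only)."""
    required = set(app_nodes) - set(ignore_list)
    return required <= set(online_nodes)


# Hypothetical SysNode names.
app_nodes = ["node1RMS", "node2RMS", "node3RMS"]
online = ["node1RMS", "node2RMS"]

print(autostart_may_begin(app_nodes, online, []))            # False
print(autostart_may_begin(app_nodes, online, ["node3RMS"]))  # True
```

The second call shows why a stale list is dangerous: the ignored node is never consulted, so an application still running there would go unnoticed.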
Possible values: 0 - MAXINT
Default: 60 (seconds)
Defines the period (in seconds) that RMS waits for cluster nodes to report Online when RMS is started. If this period expires and not all cluster nodes are online, a switchlog message indicates the cluster nodes that have not reported Online and why the user application(s) cannot be started automatically.
Note
This variable generates a warning message only. AutoStartUp will proceed even if the specified period has expired.
Possible values: 0 - MAXINT
Default: 120 (seconds)
Interval in seconds for which the RMS base monitor waits for each Online node to verify that its checksum is the same as the local checksum.
If checksums are confirmed within this interval, then RMS on the local node continues its operations as usual. However, if a checksum from a remote node is not confirmed, or if it is confirmed to be different, then the local monitor shuts down if it has been started less than HV_CHECKSUM_INTERVAL seconds before.
Also, if a checksum from a remote node is not confirmed, or if the checksum is confirmed to be different, then the local monitor considers the remote node as Offline if that local monitor has been started more than HV_CHECKSUM_INTERVAL seconds before.
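The two cases above amount to a three-way decision, sketched below as a model. The function and the returned labels are hypothetical. The text does not say what happens when the monitor has been running for exactly the interval, so treating that boundary as the "more than" case is an assumption.

```python
def checksum_decision(confirmed_equal, seconds_since_start, interval=120):
    """Model of the local monitor's reaction to a remote checksum.

    confirmed_equal: True only if the remote checksum was confirmed AND
    matches the local one. interval plays the role of HV_CHECKSUM_INTERVAL.
    """
    if confirmed_equal:
        return "continue"
    if seconds_since_start < interval:
        return "shutdown_local_monitor"
    # Assumption: elapsed time equal to the interval is handled like "more than".
    return "treat_remote_as_offline"


print(checksum_decision(True, 10))    # continue
print(checksum_decision(False, 10))   # shutdown_local_monitor
print(checksum_decision(False, 500))  # treat_remote_as_offline
```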
Possible values: 0 - 65535
Default: 8000
The communication port used by the RMS base monitor on all the nodes in the cluster.
Possible values: 0 - 100
Default: 98
Determines conditions under which hvlogcontrol cleans up RMS log files. If the percentage of used space on the file system containing RELIANT_LOG_PATH is greater than or equal to this threshold, all subdirectories below RELIANT_LOG_PATH will be removed. Furthermore, if HV_LOG_ACTION is set to on and all subdirectories have already been removed, the current log files will be removed too. See HV_LOG_ACTION for more information.
Possible values: 0 - 100
Default: 95
Defines when hvlogcontrol warns the user about the volume of RMS log files. If the percentage of used space on the file system containing RELIANT_LOG_PATH is greater than or equal to this threshold value, hvlogcontrol issues a warning to the user. See also HV_LOG_ACTION_THRESHOLD above.
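The interaction of the two thresholds can be modelled as follows. This is a sketch of the rules as stated, not the hvlogcontrol implementation: the function, its parameters, and the action labels are hypothetical, and the order in which the warning and the cleanup are evaluated is an assumption.

```python
def log_actions(used_pct, subdirs_exist, log_action_on,
                action_threshold=98, warn_threshold=95):
    """Return the actions suggested by the documented thresholds for a
    given percentage of used space on the log file system."""
    actions = []
    if used_pct >= warn_threshold:
        actions.append("warn")
    if used_pct >= action_threshold:
        if subdirs_exist:
            actions.append("remove_subdirectories")
        elif log_action_on:
            # Only reached when no subdirectories are left to remove.
            actions.append("remove_current_logs")
    return actions


print(log_actions(90, True, False))   # []
print(log_actions(96, True, False))   # ['warn']
print(log_actions(99, True, False))   # ['warn', 'remove_subdirectories']
print(log_actions(99, False, True))   # ['warn', 'remove_current_logs']
```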
Possible values: 0 - MAXINT
Default: 30
Specifies the minimum difference in seconds when comparing timestamps to determine the last online host (LOH) for userApplication. The LOH is determined only if the OnlinePriority attribute is set to 1.
If the difference between the LOH timestamp entries logged in the userApplication on two cluster nodes is less than the time specified by this attribute, RMS does not perform AutoStartUp and does not allow priority switches. Instead, it sends a message to the console and waits for operator intervention.
When adjusting this variable, the quality of the time synchronization in the cluster must be taken into account. The value must be larger than any possible random time difference between the cluster hosts.
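A minimal sketch of the comparison follows. The function name and the data layout are hypothetical, and comparing only the two most recent timestamps is an assumption about how the rule applies when more than two nodes are involved.

```python
def last_online_host(loh_timestamps, min_diff=30):
    """Return the node with the most recent LOH timestamp, or None when
    the two most recent entries are closer than min_diff seconds, in which
    case operator intervention would be needed.

    loh_timestamps: dict mapping node name -> timestamp in seconds.
    """
    if not loh_timestamps:
        return None
    ordered = sorted(loh_timestamps.items(), key=lambda kv: kv[1], reverse=True)
    if len(ordered) >= 2 and ordered[0][1] - ordered[1][1] < min_diff:
        return None
    return ordered[0][0]


# Hypothetical node names and timestamps.
print(last_online_host({"node1RMS": 1000, "node2RMS": 1100}))  # node2RMS
print(last_online_host({"node1RMS": 1000, "node2RMS": 1010}))  # None
```

A poorly synchronized cluster can produce the second result even when one node was clearly the last to run the application, which is why the value must exceed the worst-case clock skew.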
Possible values: 0, 1
Default: 1
Specifies the heartbeat monitoring mode used by the RMS base monitor:
0 - remote node and base monitor states are detected by periodically sending UDP heartbeat packets across the network. If no heartbeats are received from a remote node during an interval defined by HV_CONNECT_TIMEOUT, RMS marks the node as down and waits for a recovery period before taking further action.
1 - combines the Enhanced Lock Manager (ELM) method and the UDP heartbeat method. This setting is valid only when CF is installed and configured. The ELM lock is taken and held by the local node until ELM reports a remote node down or remote base monitor down. In either of these cases, the remote node is immediately killed. Until ELM reports a change in a remote node's state, RMS also monitors the UDP heartbeat of each remote node as described above, but with a much longer recovery timeout.
Whether or not ELM is enabled, a remote node is killed if its UDP heartbeat is not received before its heartbeat recovery timeout expires. When CF is not present, ELM is disabled automatically, and the heartbeat recovery timeout defaults to 45 seconds. When CF is present, ELM is enabled by default, and the heartbeat recovery timeout defaults to 600 seconds; this avoids premature node kills when the remote node is slow to respond.
Only experts should disable ELM manually. When CF is present but ELM is disabled, the default 600 second heartbeat recovery timeout is too long for efficient detection of remote RMS or node outages. In this case, the recovery timeout on the local node must also be adjusted manually by starting RMS with the 'hvcm -h <timeout> -c <config_file>' command. Note that the recovery timeout should be set to the same value on every node in the cluster. When ELM is disabled, the recommended global value is 45 seconds.
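The defaults described above can be summarized in a small lookup. The function and parameter names are hypothetical; the numbers are the ones given in the text.

```python
def heartbeat_settings(cf_present, elm_requested=True):
    """Return (elm_enabled, default_timeout, recommended_timeout) in seconds.

    ELM is only effective when CF is present. The default recovery timeout
    depends on CF alone, so with CF present and ELM disabled the default
    (600) differs from the recommended value (45).
    """
    elm_enabled = cf_present and elm_requested
    default_timeout = 600 if cf_present else 45
    recommended_timeout = default_timeout if elm_enabled else 45
    return elm_enabled, default_timeout, recommended_timeout


print(heartbeat_settings(True, True))    # (True, 600, 600)
print(heartbeat_settings(True, False))   # (False, 600, 45)
print(heartbeat_settings(False, True))   # (False, 45, 45)
```

The middle case is the one that requires manual adjustment with the hvcm -h option on every node.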
Possible values: Any number of days
Default: 7 (days)
Specifies the number of days that RMS logging information is retained. Every time RMS starts, the system creates a directory that is named on the basis of when RMS was last started, and which contains all constituent log files. All RMS log files are preserved in this manner. All subdirectories under RELIANT_LOG_PATH whose update time (ctime) is older than the number of days specified in this variable are deleted by a cron job.
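The retention rule can be illustrated with a pure function that picks the subdirectories a cleanup job would delete. This is a sketch, not the actual cron job: the function name, the input layout, and the directory names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def expired_log_dirs(dirs, now, log_life_days=7):
    """Return the names of subdirectories whose ctime is older than the
    retention period.

    dirs: iterable of (name, ctime_in_seconds) pairs.
    now:  current time in seconds, on the same clock as the ctimes.
    """
    cutoff = now - log_life_days * SECONDS_PER_DAY
    return [name for name, ctime in dirs if ctime < cutoff]


now = 100 * SECONDS_PER_DAY
dirs = [("old_run", 90 * SECONDS_PER_DAY), ("recent_run", 98 * SECONDS_PER_DAY)]
print(expired_log_dirs(dirs, now))  # ['old_run']
```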
Possible values: Any valid path
Default: /var/opt/SMAWRrms/log
Specifies the directory where all RMS and Wizard Tools log files are stored. The storage location of the internal log does not change even if the value of this variable is changed.
Possible values: Any valid path
Default: /opt/SMAW/SMAWRrms
Specifies the root directory of the RMS directory hierarchy. Users do not normally need to change the default setting.
Possible values: 0 - MAXINT
Default: MAXINT (seconds)
Defines the period (in seconds) until the hvshut command times out.
If the hvshut command is executed with the -l, -s, or -a option, RMS performs shutdown processing after it performs offline processing of the active applications.
Set RELIANT_SHUT_MIN_WAIT to the sum of the following items:
1. Maximum time required to complete offline processing of applications
2. Maximum time required to shut down the base monitor (30 seconds)
For item 1, use the sum of the script timeouts of all resources included in an application.
To check the script timeout of each resource, execute the hvdisp command with the resource name as the argument and read the value (in seconds) of the ScriptTimeout attribute of each resource whose OfflineScript attribute is not blank.
When the ScriptTimeout attribute is in "timeout_value[:[offline_value][:online_value]]" format, use offline_value if it exists. If offline_value does not exist, use timeout_value.
If there are two or more applications, use the largest of the per-application script timeout sums.
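The calculation described above can be sketched as follows. The function names, the application names, and the sample ScriptTimeout strings are hypothetical. The sketch assumes the ScriptTimeout values have already been collected, for example from hvdisp output, for the resources whose OfflineScript attribute is not blank.

```python
def offline_timeout(script_timeout):
    """Extract the offline timeout (seconds) from a ScriptTimeout value in
    "timeout_value[:[offline_value][:online_value]]" format."""
    parts = str(script_timeout).split(":")
    if len(parts) > 1 and parts[1] != "":
        return int(parts[1])   # offline_value is present
    return int(parts[0])       # fall back to timeout_value


def shut_min_wait(apps, base_monitor_shutdown=30):
    """Largest per-application sum of offline timeouts, plus the base
    monitor shutdown time.

    apps: dict mapping application name -> list of ScriptTimeout strings.
    """
    per_app = [sum(offline_timeout(v) for v in values) for values in apps.values()]
    return max(per_app, default=0) + base_monitor_shutdown


apps = {
    "app1": ["300", "120:60:90"],  # 300 + 60  = 360
    "app2": ["200::150", "100"],   # 200 + 100 = 300
}
print(shut_min_wait(apps))  # 390
```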
Note
Increasing the value of RELIANT_SHUT_MIN_WAIT has the following effects if offline processing of applications is delayed or hangs:
Shutting down RMS or the operating system may take longer than the value set for RELIANT_SHUT_MIN_WAIT.
Automatically switching applications by shutting down RMS or the operating system may take longer than the value set for RELIANT_SHUT_MIN_WAIT.
If the calculated value of RELIANT_SHUT_MIN_WAIT is too large, then for item 1 above use instead the longest time that offline processing of an application is expected to take before it times out, among the timeout cases you anticipate.
Note that if the value of RELIANT_SHUT_MIN_WAIT is too small, the hvshut command may frequently time out before offline processing of applications completes. Tune RELIANT_SHUT_MIN_WAIT carefully.
Note
When the hvshut command times out, RMS terminates abnormally, and resources may remain active instead of stopping. In this situation, if you start RMS on a different node and forcibly activate the application, the resources may become active on several nodes at the same time, and data may be corrupted when a shared disk is controlled by the resources. For this reason, if the hvshut command times out, shut down the operating system of the node where RMS has terminated abnormally, or forcibly shut down the node to ensure that resources stop. Then, start RMS and applications.