2.1.3 Online processing

Generating an online request is referred to as switching the userApplication; that is, either switching the userApplication online or switching it to another cluster node (see also "2.1.6 Switch processing").

The following actions can generate an online request:

Manual request using the GUI or CLI (hvswitch)
Automatic request when RMS is started using the GUI or CLI (hvcm)
Automatic requests controlled by the application's AutoSwitchOver attribute:
- AutoSwitchOver includes ResourceFailure and a fault occurs
- AutoSwitchOver includes ShutDown and a node is shut down
- AutoSwitchOver includes HostFailure and a node is killed

Manual methods

Manual methods have two modes for switching the userApplication. These modes are as follows:

Priority switch - RMS selects the SysNode. The userApplication is switched to the highest priority SysNode. The SysNode objects' priority is determined by their order in the PriorityList attribute of the userApplication object.
Directed switch - The user selects the SysNode. The userApplication is switched to a specific SysNode.

In both priority and directed switches, only SysNode objects that are in the Online state may be selected.
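The difference between the two modes can be sketched in a few lines of Python. This is an illustration only, not RMS code: the function name select_target and the node name fuji3RMS are hypothetical, and the sketch ignores where the userApplication is currently running.

```python
from __future__ import annotations
from typing import Optional

def select_target(priority_list: list[str], online_nodes: set[str],
                  directed_to: Optional[str] = None) -> Optional[str]:
    """Return the SysNode to switch to, or None if no eligible node exists."""
    if directed_to is not None:
        # Directed switch: the user names the node, but it must be Online.
        return directed_to if directed_to in online_nodes else None
    # Priority switch: the first Online node in PriorityList order wins.
    for node in priority_list:
        if node in online_nodes:
            return node
    return None
```

For example, with a PriorityList of fuji2RMS followed by fuji3RMS and only fuji3RMS Online, a priority switch in this sketch selects fuji3RMS, while a directed switch to fuji2RMS selects nothing.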

Manual request using the GUI

To manually generate an online request, perform the following steps:

Using the graph, right-click on an application to display its context menu.
Click on a switch or online item in the context menu.

Manual request using the CLI

To generate an online request for each userApplication, use the hvswitch command. Refer to the hvswitch manual page for details on usage and options.

Automatic methods

All automatic methods can only invoke a priority switch.

Automatic request at RMS startup

When RMS first starts on a cluster, it switches the userApplication online on the highest priority node if all of the following conditions are true:

All SysNode objects associated with a specific application are online.
The userApplication is neither online nor inconsistent on any other cluster node.
The AutoStartUp attribute of the userApplication is enabled.
No object in the graph of the userApplication is in the faulted state.

These limitations ensure that the userApplication is not started on more than one cluster node at a time.
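The four startup conditions combine as a simple conjunction. The following Python sketch shows that logic under stated assumptions: the StartupView structure and its field names are hypothetical and only model the information the conditions refer to.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class StartupView:
    # Hypothetical snapshot of what the conditions need to know.
    sysnodes_online: dict[str, bool]     # SysNode name -> is it Online?
    app_state_elsewhere: dict[str, str]  # other node -> app state on that node
    auto_startup: bool                   # AutoStartUp attribute enabled?
    graph_states: list[str]              # states of the objects in the app's graph

def may_auto_start(v: StartupView) -> bool:
    """True only if all four startup conditions hold."""
    return (
        all(v.sysnodes_online.values())
        and not any(s in ("Online", "Inconsistent")
                    for s in v.app_state_elsewhere.values())
        and v.auto_startup
        and "Faulted" not in v.graph_states
    )
```

Failing any single condition, such as one SysNode that is not yet Online, is enough to suppress the automatic startup in this sketch.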

If the userApplication is already online after startup, an automatic startup request for the userApplication is created immediately, even if AutoStartUp is not set or not all SysNodes are online. This ensures a consistent graph for an online userApplication. Otherwise, some objects in the graph of an online application could still be offline.

Automatic request when a fault occurs

RMS initiates a priority switchover when it detects either a fault of a userApplication, or a fault of a SysNode where a userApplication was online. This automatic switchover is controlled by the application's AutoSwitchOver attribute as follows:

AutoSwitchOver includes ResourceFailure and a fault occurs
AutoSwitchOver includes ShutDown and a node is shut down
AutoSwitchOver includes HostFailure and a node is killed

No automatic switchover occurs if AutoSwitchOver is set to No.
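The mapping between events and AutoSwitchOver settings can be expressed as a lookup. In this Python sketch the event names and the function triggers_switchover are hypothetical, and the attribute is modeled as a set of setting names rather than in its actual configuration syntax.

```python
from __future__ import annotations

# Hypothetical event names mapped to the AutoSwitchOver setting that covers them.
EVENT_TO_SETTING = {
    "resource_fault": "ResourceFailure",
    "node_shutdown": "ShutDown",
    "node_killed": "HostFailure",
}

def triggers_switchover(auto_switch_over: frozenset[str], event: str) -> bool:
    """Return True if the event should start an automatic priority switchover."""
    if "No" in auto_switch_over:
        # AutoSwitchOver set to No: never switch automatically.
        return False
    return EVENT_TO_SETTING[event] in auto_switch_over
```

With AutoSwitchOver modeled as {"HostFailure", "ShutDown"}, a killed node triggers a switchover in this sketch but a resource fault does not.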

2.1.3.2 PreCheckScript

The PreCheckScript is intended to verify in advance that certain prerequisites for successful online processing are fulfilled. It avoids useless attempts when those prerequisites are not (yet) met. The PreCheckScript is also invoked during policy-based switching.

The PreCheckScript will be forked before the original online processing begins. If the script is successful and returns with an exit code of 0, online processing proceeds as usual. If the script fails and returns with an exit code other than 0, online processing is discarded and a warning is written into the switchlog.

Resulting state

While the PreCheckScript is running, the userApplication object is in the Wait state. If the PreCheckScript fails, the userApplication object returns to its previous state, usually Offline or Faulted.

AutoSwitchOver

If the PreCheckScript fails and the AutoSwitchOver attribute includes ResourceFailure, then RMS automatically forwards the online request to the next priority node (except in cases of directed-switch requests).
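The PreCheckScript rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not the base monitor's implementation: PreCheckOutcome and handle_precheck are hypothetical names, and the script's exit code is passed in rather than obtained by actually running a script.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class PreCheckOutcome:
    proceed_online: bool        # continue with online processing?
    resulting_state: str        # state of the userApplication object
    log_warning: bool           # write a warning to the switchlog?
    forward_to_next_node: bool  # pass the request to the next priority node?

def handle_precheck(exit_code: int, previous_state: str,
                    auto_switch_over: frozenset[str],
                    directed: bool) -> PreCheckOutcome:
    if exit_code == 0:
        # Success: online processing proceeds as usual. The sketch assumes
        # the object stays in Wait while that processing runs.
        return PreCheckOutcome(True, "Wait", False, False)
    # Failure: discard the request, warn, and fall back to the previous state.
    # Forward only for non-directed requests with ResourceFailure enabled.
    forward = "ResourceFailure" in auto_switch_over and not directed
    return PreCheckOutcome(False, previous_state, True, forward)
```

A failed check on a directed switch therefore never forwards the request in this sketch, regardless of the AutoSwitchOver setting.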

2.1.3.3 Online processing in a logical graph of a userApplication

If the PreCheckScript is successful, the base monitor generates a pre-online request. Relative to the resource graph, the pre-online request process is as follows:

Request is sent from the parent to the child.
Parent object changes to the Wait state, but no script is initiated.
Child receives the request. The pre-online script is initiated in the leaf objects.
When the script terminates, confirmation is sent to the parent.
As soon as all children of the parent have sent their confirmation, the pre-online script is executed on the parent.

In relation to the resource graph, the above steps illustrate the bottom-up procedure for executing the scripts in online processing.
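The bottom-up order corresponds to a post-order walk of the graph: a parent's script runs only after every child has confirmed. The Python sketch below computes that order for a hypothetical graph. The function name and object names are illustrative, and the sketch visits children one after another, which is a simplification.

```python
from __future__ import annotations

def pre_online_script_order(graph: dict[str, list[str]], root: str) -> list[str]:
    """Return the order in which pre-online scripts would run, leaves first."""
    order: list[str] = []

    def visit(obj: str) -> None:
        # The request travels down to the children first ...
        for child in graph.get(obj, []):
            visit(child)
        # ... and the object's own script runs once all children confirmed.
        order.append(obj)

    visit(root)
    return order

# Hypothetical graph: app has two children, one of which has a leaf below it.
graph = {"app": ["res1", "res2"], "res1": ["leaf1"]}
print(pre_online_script_order(graph, "app"))  # ['leaf1', 'res1', 'res2', 'app']
```

The root object always appears last, matching the statement that the userApplication object is the final object to execute its pre-online script.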

The userApplication object is the final object to execute its pre-online script; it then generates an online request that is passed to the leaf objects. However, there is a difference between online processing and pre-online processing.

Relative to the resource graph, the online script process is as follows:

RMS executes the online script.
The system waits until the object's detector reports the Online state. If an object does not have a detector, the post-online script executes after the OnlineScript has completed successfully.
The post-online script executes immediately.
Confirmation of the success of online processing is forwarded to the parent.
The object exits the Wait state and changes to the Online state.
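For a single object, the sequence above depends only on whether a detector and a post-online script exist. The following Python sketch lists the resulting steps. The function name and step wording are illustrative assumptions.

```python
from __future__ import annotations

def online_steps(has_detector: bool, has_post_online: bool) -> list[str]:
    """List the steps one object goes through during online processing."""
    steps = ["run online script"]
    if has_detector:
        # Without a detector there is nothing to wait for after the script.
        steps.append("wait for detector to report Online")
    if has_post_online:
        steps.append("run post-online script")
    steps.append("confirm success to parent")
    steps.append("change from Wait to Online")
    return steps
```

An object with neither a detector nor a post-online script goes straight from its online script to confirming success.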

In the context of RMS, "the userApplication is online" means that all the objects in the graph of the userApplication are Online. Here, the term online does not describe the state of the actual application. The actual application is either not controlled by RMS at all, or it is started by the OnlineScript (or PostOnlineScript) configured for the userApplication object (or, more generally, for a Cmdline child object). An Online userApplication object therefore means only that script execution ended normally.

Note

How a script influences the state of the actual application depends on the application itself. RMS has no direct control over any user application. For a more complete discussion, see the section "1.2.2 Relationship of RMS configurations to the real world."

Example 2

The scenario for this example is as follows:

AutoStartUp attribute is set to "yes."
None of the resource objects have PreOnlineScript definitions.
All objects are in the Offline state at startup time.

Online processing is as follows:

RMS starts.
userApplication object app on node fuji2RMS generates a pre-online request because the AutoStartUp attribute is set to "yes."
This request is passed through to the lfs leaf object. As no PreOnlineScript has been configured for any of the objects in this example, lfs forwards a message to app indicating that pre-online processing has completed successfully.
When the pre-online success message arrives, app generates the online request, which is also passed through to the lfs leaf object.
The lfs object executes the online script and brings the disk online.
As soon as the detector of lfs reports Online, successful completion of online processing is reported upward to the cmd object. (If the object had a post-online script, it would have been executed before the success message was forwarded.)
The cmd object starts its online script.
As soon as the cmd detector reports successful completion, the success message is forwarded to andOp1.
The andOp1 object is an object without a detector, and it does not have an online script in this example. As soon as its local child reports the Online state, it forwards the success message to its parent object app.
Upon receipt of the success message at app, RMS executes the online script and the application starts. Because app does not have a detector and also because no post-online script is configured, app changes immediately to the Online state after the online script has completed successfully.
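The online phase of this example can be replayed with a short Python sketch. The graph shape (app above andOp1, andOp1 above cmd, cmd above lfs) and the detector and script settings are taken from the steps above. The data structures and the function bring_online are illustrative assumptions, not RMS internals.

```python
from __future__ import annotations

# Parent -> children, as implied by the message flow in the example.
GRAPH = {"app": ["andOp1"], "andOp1": ["cmd"], "cmd": ["lfs"], "lfs": []}
HAS_ONLINE_SCRIPT = {"app": True, "andOp1": False, "cmd": True, "lfs": True}
HAS_DETECTOR = {"app": False, "andOp1": False, "cmd": True, "lfs": True}

def bring_online(obj: str, events: list[str]) -> None:
    """Record the online-phase events for obj and everything below it."""
    for child in GRAPH[obj]:
        bring_online(child, events)  # children must be Online first
    if HAS_ONLINE_SCRIPT[obj]:
        events.append(f"{obj}: online script")
    if HAS_DETECTOR[obj]:
        events.append(f"{obj}: detector reports Online")
    events.append(f"{obj}: Online")

events: list[str] = []
bring_online("app", events)
for e in events:
    print(e)
```

The trace starts with the lfs online script and ends with app changing to Online, with andOp1 contributing only a state change because it has neither a script nor a detector.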

2.1.3.4 Unexpected reports during online processing

Unexpected reports during online processing are reports of a state other than Online that arrive while online processing is in progress. The base monitor ignores these reports.

Such reports may arrive at any time from the point where online processing starts for the userApplication to which the object belongs until the object's own online processing succeeds or fails.

For cases where the object's online processing fails, see "2.1.3.5 Fault situations during online processing".

2.1.3.5 Fault situations during online processing

If an error occurs during online processing, the affected object commences fault processing and notifies its parent of the error (see also "2.1.5 Fault processing"). The following can cause faults during online processing:

a. The last state reported by the detector of a resource object is Offline or Faulted at the point where online processing finishes.
b. A script fails with an exit status other than 0.
c. A script fails with a timeout.
d. An object's OnlineScript finishes and the detector does not report the Online state within a specific period.

For case a, fault processing is initiated only after online processing of the userApplication finishes. In cases b, c, and d, by contrast, fault processing is initiated immediately once the condition is satisfied.
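The timing difference between the four cases can be captured in a small lookup. This Python sketch uses hypothetical enum member names for cases a through d.

```python
from __future__ import annotations
from enum import Enum

class FaultCause(Enum):
    # Hypothetical names for cases a to d described above.
    DETECTOR_NOT_ONLINE_AT_END = "a"
    SCRIPT_EXIT_NONZERO = "b"
    SCRIPT_TIMEOUT = "c"
    DETECTOR_REPORT_TIMEOUT = "d"

def fault_processing_is_deferred(cause: FaultCause) -> bool:
    """True if fault processing waits until the userApplication's
    online processing has finished (case a); False if it starts
    immediately (cases b, c, and d)."""
    return cause is FaultCause.DETECTOR_NOT_ONLINE_AT_END
```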

2.1.3.6 Initialization when an application is already online

A situation can occur in which the entire logical graph of a userApplication is already online when RMS is initialized. In this case, the PreCheckScript does not execute and the affected objects switch directly from the Unknown state to the Online state without executing any scripts.

Request while online

If a userApplication receives an online request when it is already online, the request is forwarded to the other objects as usual. The only difference from the description in "2.1.3 Online processing" is that any objects that are already online forward the request or the responses without executing their scripts and without changing to the Wait state. In particular, the PreCheckScript is not run.

A typical example of an object which is always online when RMS is initialized is a gResource object for a physical disk, since physical disks cannot in general be disabled through a software interface.

No request while online

If RMS is initialized while a userApplication is already online and the userApplication does not receive an online request, it carries out online processing of its graph as if it had received an explicit online request. The resulting state of the local graph is exactly the same as in the previous case.

Guarding against data loss when the application is already online

A primary objective of RMS is to ensure that no data loss occurs as a result of simultaneous activity of the same application on more than one node in the cluster. Therefore, after the online processing of the application's graph in either of the two cases described above, the base monitor on the local node reports the userApplication object's Online state to the base monitors on the other nodes to ensure that no corresponding application goes online elsewhere in the cluster.

Note

It can be extremely damaging if a userApplication is online on more than one node immediately after RMS has initialized. In this case, RMS generates a FATAL ERROR message and blocks any further requests for the userApplication. This minimizes the possibility of damage caused by inconsistency in the cluster.
The situations described in this section are a result of manual intervention. If the manual intervention allowed competing instances of an application or a disk resource to run on multiple nodes, data corruption may have already occurred before RMS was initialized.
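The check behind the FATAL ERROR condition amounts to counting the nodes on which the userApplication reports Online. The following Python sketch illustrates the idea. The function name and the shape of its input are assumptions, and the sketch does not reproduce how base monitors actually exchange state.

```python
from __future__ import annotations

def check_single_instance(app_states: dict[str, str]) -> tuple[bool, list[str]]:
    """Given node -> reported userApplication state, return whether further
    requests should be blocked and the nodes reporting Online."""
    online_on = sorted(n for n, s in app_states.items() if s == "Online")
    # More than one Online node is the dangerous situation described above.
    blocked = len(online_on) > 1
    return blocked, online_on
```

With the application Online on a single node, nothing is blocked. With two Online nodes, the sketch flags the condition and names both nodes.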