PRIMECLUSTER Installation and Administration Guide 4.3

1.4 Test

Purpose

When you build a cluster system using PRIMECLUSTER, confirm before starting production operations that the entire system operates normally and that cluster applications continue to run in the event of a failure.

In 1:1 standby operation, the PRIMECLUSTER system runs in one of the operation modes shown in the figure below.

The PRIMECLUSTER system switches between operation modes according to the state transitions shown in the figure below. To confirm that the system operates normally, test every operation mode and every state transition that leads to an operation mode.

Figure 1.3 State transitions of the PRIMECLUSTER system

PRIMECLUSTER System State

Description

Dual instance operation

A cluster application is running and can switch to the other instance in the event of a failure (failover). Dual instance operation comprises two states: OPERATING and STANDBY.

If an error occurs while the system is operating, the standby system takes over the ongoing operations and switches to the OPERATING state. This keeps the cluster application available after a failover.

Single instance operation

A cluster application is running, but failover is disabled.

Single instance operation comprises two states: OPERATING and STOP. Because no standby system is available in this mode, a cluster application cannot switch to the other instance when a failure occurs, and ongoing operations are disrupted.

Stopped state

A cluster application is stopped.

The states "OPERATING", "STANDBY", and "STOP" mentioned above are defined by the RMS state and the cluster application state as follows:

State       RMS state   Cluster application state   Remark
OPERATING   Operating   Online                      -
STANDBY     Operating   Offline or Standby          -
STOP        Stopped     Unknown *                   SysNode is Offline

* RMS determines the cluster application state. When RMS is stopped, the cluster application state is unknown.
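The definitions above can be read as a lookup from the RMS state and the cluster application state to a node state, and from the node states of a 1:1 standby pair to an operation mode. The following Python sketch encodes that reading so test observations can be classified consistently. The function names `classify_node` and `classify_mode` and the state strings are hypothetical, chosen for illustration; they are not part of any PRIMECLUSTER API or command output.

```python
from typing import Optional

# Hypothetical encoding of the state definition table above:
# (RMS state, cluster application state) -> node state.
NODE_STATES = {
    ("Operating", "Online"): "OPERATING",
    ("Operating", "Offline"): "STANDBY",
    ("Operating", "Standby"): "STANDBY",
}


def classify_node(rms_state: str, app_state: Optional[str] = None) -> Optional[str]:
    """Return "OPERATING", "STANDBY", or "STOP" for one node.

    Returns None for combinations the table does not define (e.g. Faulted).
    """
    if rms_state == "Stopped":
        # With RMS stopped, the application state is unknown, so it is ignored.
        return "STOP"
    if app_state is None:
        return None
    return NODE_STATES.get((rms_state, app_state))


def classify_mode(node_a: Optional[str], node_b: Optional[str]) -> Optional[str]:
    """Map the node states of a 1:1 standby pair to an operation mode.

    Only the two modes defined in terms of these node states are recognized;
    any other combination returns None.
    """
    pair = {node_a, node_b}
    if pair == {"OPERATING", "STANDBY"}:
        return "Dual instance operation"
    if pair == {"OPERATING", "STOP"}:
        return "Single instance operation"
    return None
```

For example, `classify_mode(classify_node("Operating", "Online"), classify_node("Stopped"))` returns "Single instance operation". The stopped state is deliberately not classified, because the text defines it only as "a cluster application is stopped" rather than in terms of these node states.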

Main tests for PRIMECLUSTER system operation

Startup test

Conduct a startup test and confirm the following:

Clear fault

If a failure occurs in a cluster application, the state of that application changes to Faulted.

Before this application can run in the cluster system again, you must execute "Clear Fault" to clear the Faulted state.

Conduct a clear-fault test and confirm the following:

Switchover

Conduct a failover or switchover test and confirm the following:

To determine how long operations are down when a failure occurs, measure the switching time for each failure detection cause and check the recovery time.
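Measuring switching time amounts to subtracting, for each test run, the time the failure was injected from the time the cluster application came Online on the other node, then comparing the results per failure detection cause. The Python sketch below assumes you record those two timestamps yourself for every run; the `SwitchoverRun` record layout, the function name, and any cause labels are hypothetical and do not correspond to a PRIMECLUSTER log format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SwitchoverRun:
    """One failover or switchover test run (hypothetical record layout)."""
    cause: str            # failure detection cause, labeled by the tester
    failure_at: datetime  # when the failure was injected
    online_at: datetime   # when the application became Online on the other node


def worst_switch_time_by_cause(runs: list[SwitchoverRun]) -> dict[str, float]:
    """Return the longest observed switching time, in seconds, per cause."""
    worst: dict[str, float] = defaultdict(float)
    for run in runs:
        seconds = (run.online_at - run.failure_at).total_seconds()
        if seconds < 0:
            # A negative duration means the timestamps were recorded wrongly.
            raise ValueError(f"online_at precedes failure_at for cause {run.cause!r}")
        worst[run.cause] = max(worst[run.cause], seconds)
    return dict(worst)
```

Reporting the maximum rather than the mean per cause gives a conservative estimate of downtime, since an average can hide a single slow switchover.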

Replacement test

Conduct a replacement test and confirm the following:

Stop

Conduct a stop test and confirm the following:

Work process continuity

Conduct a work process continuity test and confirm the following:

Cluster node forced stop test

Check that the shutdown facility settings function properly.

With the following points in mind, conduct a test in which each cluster node that makes up the cluster is forcibly stopped once:

See