8.3 Determining Redundancy

One way of addressing faults is to make the system, the definition information, and other resources redundant.

As with backups, making these resources redundant minimizes the impact of unexpected problems.

The redundancy methods supported by Systemwalker Operation Manager are explained below.

Cluster system

Systemwalker Operation Manager supports cluster systems.

A cluster system operates multiple linked servers as though they were a single server. If a fault occurs on one server, another server can take over processing, which makes it possible to achieve higher availability than with stand-alone servers.

Refer to the Systemwalker Operation Manager Cluster Setup Guide for Windows and the Systemwalker Operation Manager Cluster Setup Guide for UNIX for details on cluster systems.

Execution server redundancy

Redundancy can be applied to execution servers in case an execution server for a network job fails.

By specifying two candidate execution servers (Execution server A as the first candidate and Execution server B as the second candidate), job execution requests will be automatically sent to the second candidate (Execution server B) if the first candidate (Execution server A) fails. Refer to "8.4.2 How to Continue Business Operations when Problems Occur with Execution Servers (Execution Server Redundancy for Network Jobs)" for details.
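The candidate order described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names, the SubmissionError exception, and the server names are all hypothetical and exist only to show the order in which candidates are tried.

```python
from typing import Callable, Sequence


class SubmissionError(Exception):
    """Hypothetical error raised when a job cannot be submitted to a server."""


def submit_with_failover(
    job: str,
    candidates: Sequence[str],
    submit: Callable[[str, str], None],
) -> str:
    """Try each candidate in order and return the server that accepted the job."""
    failures = []
    for server in candidates:
        try:
            # The first candidate is always attempted before the second one.
            submit(server, job)
            return server
        except SubmissionError as exc:
            failures.append(f"{server}: {exc}")
    raise SubmissionError("all candidates failed: " + "; ".join(failures))


def demo_submit(server: str, job: str) -> None:
    """Hypothetical transport that pretends Execution server A is down."""
    if server == "execution-server-a":
        raise SubmissionError("connection refused")


used = submit_with_failover(
    "nightly-batch", ["execution-server-a", "execution-server-b"], demo_submit
)
print(used)
```

In this sketch the second candidate is contacted only after the attempt on the first candidate has failed, so a failed first candidate delays submission.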

Another method for automatically executing jobs on another server when a server fails is to use the Distributed Execution function. This function is primarily intended to group multiple execution servers and automatically allocate each job to the execution server with the lowest load. However, it can also be used for execution server redundancy: if one of the execution servers in a group fails, jobs are automatically allocated to the execution server with the lowest load among the remaining servers in the group.

Each method has its advantages and disadvantages, so refer to the following table when considering these methods.

Method	Advantage	Disadvantage
Execution server redundancy	Jobs are normally executed on the first candidate server. Jobs are executed on the second candidate server only if the first candidate server fails. In other words, this method enables jobs to be executed on substitute servers only when a fault occurs.	If the first candidate server fails, it can take some time for jobs to be submitted, because execution does not switch to the second candidate server until job submission to the first candidate server has been attempted.
Distributed Execution	Up to 100 servers can be set up as distribution destinations. If a particular server fails, the failure is recognized for up to 10 minutes, and during that time jobs are not submitted to that server (no attempt is made to submit jobs to the failed server).	It is impossible to predict on which server a job will be executed.
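
The Distributed Execution behavior in the table can be sketched in Python as follows. This is an illustration only, not the product's algorithm: the Distributor class, the integer load values, and the fixed 600-second exclusion window are assumptions (the table says only that a failure is recognized for "up to 10 minutes").

```python
from typing import Dict, Optional

# Assumption: a fixed window standing in for the "up to 10 minutes" in the table.
EXCLUSION_SECONDS = 600


class Distributor:
    """Hypothetical model of a group of execution servers."""

    def __init__(self, loads: Dict[str, int]) -> None:
        self.loads = dict(loads)               # server name -> current load
        self.failed_at: Dict[str, float] = {}  # server name -> time of failure

    def mark_failed(self, server: str, now: float) -> None:
        """Record that a server was found to have failed at time `now`."""
        self.failed_at[server] = now

    def pick(self, now: float) -> Optional[str]:
        """Return the least-loaded server that is outside its exclusion window."""
        usable = [
            s for s in self.loads
            if s not in self.failed_at
            or now - self.failed_at[s] >= EXCLUSION_SECONDS
        ]
        if not usable:
            return None
        return min(usable, key=lambda s: self.loads[s])


group = Distributor({"server-1": 1, "server-2": 5, "server-3": 3})
group.mark_failed("server-1", now=0.0)
print(group.pick(now=10.0))   # server-1 is skipped without a submission attempt
print(group.pick(now=600.0))  # server-1 is considered again after the window
```

Unlike the candidate-order approach, no submission attempt is made to a server known to have failed, but the chosen server depends on the load at the time of the request.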

Schedule information file redundancy

By using the jmmode command to enable data to be written to backup files, it is possible to duplicate the files containing schedule information (known as "schedule information files").

If redundancy has been specified, synchronous writes are performed on the schedule information files and the backup files. Synchronous writes make it less likely that inconsistencies will occur in the schedule information files. However, scheduling performance and the startup performance of groups, job nets, and jobs will decline. Before adopting file redundancy, conduct thorough performance testing to ensure that job nets start on schedule without any problems. Even if redundancy has been specified, inconsistencies may still occur in the schedule information files, for example, if the power is shut off while the operating system is writing data to the disk. Consider taking regular backups in anticipation of such circumstances.
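
The trade-off of synchronous writes can be illustrated with a minimal Python sketch. It does not reproduce the mechanism enabled by the jmmode command or the format of the schedule information files. The function names and file names are hypothetical, and the sketch assumes that "synchronous" means each write waits until the operating system reports the data as written to disk.

```python
import os
import tempfile


def write_synced(path: str, data: bytes) -> None:
    """Write data and block until the operating system reports it is on disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the operating system to flush to the disk


def write_redundant(primary: str, backup: str, data: bytes) -> None:
    """Write the same data to the primary file and then to the backup file."""
    write_synced(primary, data)
    write_synced(backup, data)


with tempfile.TemporaryDirectory() as workdir:
    primary_path = os.path.join(workdir, "schedule.dat")  # hypothetical name
    backup_path = os.path.join(workdir, "schedule.bak")   # hypothetical name
    write_redundant(primary_path, backup_path, b"job net definitions")
    with open(primary_path, "rb") as p, open(backup_path, "rb") as b:
        print(p.read() == b.read())
```

In this sketch every update waits for two disk flushes instead of none, which is one way to picture why performance declines when redundancy is enabled. A power loss between the two writes would still leave the files out of step.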

If redundancy is not specified, scheduling performance and startup performance will not decline. However, there is a high possibility that inconsistencies will occur in the schedule information files because of an unexpected power shutdown or some other cause, so consider backing up the schedule information files regularly and restoring them when faults occur.

Refer to "jmmode Continuous Execution Mode Switching Command" in the Systemwalker Operation Manager Reference Guide for details on the jmmode command.