Systemwalker Operation Manager Cluster Setup Guide for UNIX
FUJITSU Software

2.10 Failover Testing

Before testing a failover

Before testing a failover, terminate the tskwnsrv process by sending it the SIGTERM signal.

How to identify the tskwnsrv process of a subsystem

When multiple subsystems are running, use the following procedure to identify the process ID of the applicable subsystem's tskwnsrv process so that you can send SIGTERM to it.

  1. Execute the ps command.

    # ps aux | grep tskwnsrv

    Note:
     The above example is for Linux. The command differs by operating system:
       Solaris 10: /usr/ucb/ps -auxww | grep tskwnsrv
       Solaris 11: /usr/bin/ps auxww | grep tskwnsrv
       HP-UX: ps -aex | grep tskwnsrv
       AIX: ps auxww | grep tskwnsrv
     The command may also differ depending on the machine environment. Refer to the relevant operating system manual for details.
  2. From the ps command output, identify the process ID of the tskwnsrv process for the applicable subsystem.

    Example of ps command output for subsystem 0

    root 6944 0.0 0.0 155408 2484 ? S Aug 31 0:00 /opt/FJSVJOBSC/bin/tskwnsrv 229380 /var/opt/FJSVJOBSC 0 -1

    In the example above, the process ID is "6944".

    Example of ps command output for subsystem n (where n is 1 to 9; the example below is for subsystem 1)

    root 7250 0.0 0.0 153324 2452 ? S Aug 31 0:00 /opt/FJSVJOBSC/bin/tskwnsrv 327687 /var/opt/FJSVJOBSC/JOBDB1 1 -1

    In the example above, the process ID is "7250".
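The identification step can also be scripted. The following is a minimal POSIX shell sketch, not a tool supplied with Systemwalker Operation Manager. The function name find_tskwnsrv_pid is hypothetical, and the sketch assumes, based only on the sample output above, that the PID is the second field of ps aux-style output and that the subsystem number is the third argument after the tskwnsrv path. Check both assumptions against the actual ps output on your system before relying on it.

```shell
# find_tskwnsrv_pid SUBSYSTEM   (hypothetical helper)
#   Reads ps output on standard input and prints the PID of each
#   tskwnsrv process whose subsystem argument equals SUBSYSTEM.
# Assumptions (inferred from the sample output, not stated by the manual):
#   - the PID is the second field, as in "ps aux"-style output;
#   - the subsystem number is the third argument after the tskwnsrv path.
find_tskwnsrv_pid() {
    awk -v subsys="$1" '
    {
        for (i = 1; i + 3 <= NF; i++) {
            # Match the executable path only, so the "grep tskwnsrv"
            # line (which has no leading slash) is skipped.
            if ($i ~ /\/tskwnsrv$/ && $(i + 3) == subsys) {
                print $2
                break
            }
        }
    }'
}

# Example usage on Linux (adjust the ps command for your OS as noted above):
#   pid=$(ps aux | find_tskwnsrv_pid 1)
#   [ -n "$pid" ] && kill -TERM "$pid"
```

With the two sample lines above as input, find_tskwnsrv_pid 0 prints 6944 and find_tskwnsrv_pid 1 prints 7250.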


A failover does not occur if the daemons have been stopped with the poperationmgr command.

This is because, when a cluster system is created, the daemons registered in the cluster (Jobscheduler and Job Execution Control) are excluded from the daemons that the soperationmgr and poperationmgr commands start and stop automatically.

Triggering a failover

Many types of events can trigger a failover. However, do not trigger a failover by forcibly unmounting the shared disk.

Systemwalker Operation Manager references information on the shared disk while stopping the Jobscheduler daemon. If the shared disk has been forcibly unmounted, the Jobscheduler daemon cannot be stopped correctly, even though the failover itself completes.

If a failover occurs because the shared disk is unmounted, send the SIGTERM signal to the tskwnsrv process on the node from which the shared disk has been unmounted.

Do not send the SIGKILL signal.
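Because only SIGTERM may be used, a script that stops tskwnsrv should wait for the process to exit rather than escalate to SIGKILL. The following POSIX shell sketch shows one way to do this. It assumes it is run as root (so that "kill -0" fails only when the process no longer exists); the function name stop_tskwnsrv and the 60-second default wait are hypothetical choices, not values from this guide.

```shell
# stop_tskwnsrv PID [TIMEOUT]   (hypothetical helper)
#   Sends SIGTERM to PID and waits up to TIMEOUT seconds (default 60,
#   a hypothetical value) for the process to exit. It never escalates
#   to SIGKILL, because SIGKILL must not be sent to tskwnsrv.
#   Returns 0 if the process exited, 1 otherwise.
# Assumption: run as root, so "kill -0" fails only when the process is gone.
stop_tskwnsrv() {
    pid=$1
    timeout=${2:-60}

    # Request an orderly shutdown.
    kill -TERM "$pid" || return 1

    waited=0
    # "kill -0" sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "tskwnsrv ($pid) still running after ${timeout}s; do not send SIGKILL" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, stop_tskwnsrv 7250 would send SIGTERM to the subsystem 1 process from the sample output and wait up to 60 seconds. If it returns 1, do not fall back to SIGKILL.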