If the system stops during job execution
This section explains how jobs are processed if the system stops while a job is executing, because of a system failure or an interruption to the power supply (including interruptions during actual operations).
Submitted jobs are preserved, not cleared, even if the system stops.
Jobs are processed as explained below depending on their status when the system stopped.
Jobs are saved and then executed when the Job Execution Control service or daemon is restarted.
Jobs set to be re-executed (*1) are saved and then re-executed from the beginning when the Job Execution Control service or daemon is restarted.
Jobs that were not set to be re-executed are not re-executed, because their queuing information is cleared.
If the system stops on the execution server while network jobs or distributed execution jobs are executing, these jobs are not re-executed, because their queuing information is cleared. This applies regardless of the status of the jobs when the system stopped and of whether re-execution was specified. It prevents the jobs from being executed twice, which would happen if they were set for re-execution and the submitting server also resubmitted them as a recovery action. After the system has restarted, take recovery measures such as resubmitting the jobs from the submitting server.
To have demand jobs re-executed after a system stop, click the Additional Information tab in the Edit Job Information/Submit dialog box and specify Re-execute Job.
When submitting demand jobs with the qsub command, you control re-execution by specifying or omitting the -nr option. Specifying -nr prohibits re-execution. Take care when omitting -nr, because jobs are then permitted to be re-executed.
Refer to the Systemwalker Operation Manager Reference Guide for information on the qsub command.
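As a minimal sketch of the -nr rule, the following Python function builds a qsub argument list for a demand job. The function name is hypothetical, and it assumes that the job file path is passed as the last argument; confirm the actual qsub syntax in the Reference Guide. What it illustrates is that re-execution is permitted by default and only -nr turns it off.

```python
import shlex


def build_qsub_command(job_file: str, allow_reexecution: bool) -> list[str]:
    """Build a qsub argument list for a demand job (hypothetical helper).

    Assumption: the job file path is the last argument. Check the qsub
    syntax in the Reference Guide before relying on this.
    """
    args = ["qsub"]
    if not allow_reexecution:
        # -nr prohibits re-execution of the job after a system stop.
        args.append("-nr")
    # Omitting -nr leaves re-execution permitted.
    args.append(job_file)
    return args


print(shlex.join(build_qsub_command("nightly_job.sh", allow_reexecution=False)))
```

Passing allow_reexecution explicitly forces the caller to decide, which guards against silently relying on the permissive default.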
If the system stops, queuing information is cleared and the jobs are not re-executed. This is to prevent subsequent jobs from being re-executed while the job net has not completed processing. After the system has restarted, perform only recovery operations for the job net, such as restarting it.
If the system stops on the execution server for network jobs or distributed execution jobs, their queuing information is cleared by the execution server in the same way as for demand jobs, so they are not re-executed.
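The recovery rules above can be summarized as a small decision function. This is a hypothetical Python model, not part of the product: the names Status, Kind, and outcome_after_restart are invented for illustration. It assumes that the jobs "saved and then executed" on restart are those that had not yet started when the system stopped, and it reads the job net rule as applying regardless of job status.

```python
from enum import Enum, auto


class Status(Enum):
    NOT_STARTED = auto()  # assumed: waiting to run when the system stopped
    EXECUTING = auto()


class Kind(Enum):
    DEMAND = auto()                  # demand job on the local server
    JOB_NET_JOB = auto()             # job started as part of a job net
    NETWORK_ON_EXEC_SERVER = auto()  # network/distributed job, stop on execution server


def outcome_after_restart(kind: Kind, status: Status, reexecute: bool) -> str:
    """Return what happens to a job when Job Execution Control restarts."""
    if kind is Kind.NETWORK_ON_EXEC_SERVER:
        # Cleared regardless of status or re-execution setting, so a
        # resubmission by the submitting server cannot cause a double run.
        return "cleared"
    if kind is Kind.JOB_NET_JOB:
        # Cleared; recover the job net itself (for example, restart it).
        return "cleared"
    if status is Status.NOT_STARTED:
        return "executed"
    return "re-executed from the beginning" if reexecute else "cleared"


for kind in Kind:
    print(kind.name, outcome_after_restart(kind, Status.EXECUTING, reexecute=True))
```

Keeping the network-job check first mirrors the text: for those jobs neither the status nor the re-execution setting matters.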
Information
If the system stops on the schedule server (submitting server) for network jobs and distributed execution jobs, these jobs can continue to be executed. Refer to "Continuing Job Operations at Schedule Server System Down" in the Systemwalker Operation Manager User's Guide for information on how to define continuous execution.
On a cluster system, jobs can be taken over by the standby node if failover occurs. Refer to "Jobs Taken Over in the Cluster System" in the Systemwalker Operation Manager Cluster Setup Guide for information on the settings required for takeover of jobs.
Environment variables
Before starting jobs, the Job Execution Control function assigns values to the environment variables listed below. Use these environment variables in situations such as when the processing for subsequent jobs needs to change depending on the type or completion code of the preceding job. The environment variables present when Operation Manager is started are also inherited. However, for network jobs, environment variables on the submitting server are not inherited.
This variable stores the subsystem number.
This variable stores the host name of the client. This variable is set when jobs are submitted from the Edit Job Information/Submit dialog box or window, or from the Select/Submit Jobs window.
This variable stores the name of the job owner.
This variable stores the job comment.
This variable stores the name of the execution host.
This variable stores the job number.
This variable stores the name of the job.
This variable stores the name of the queue.
This variable stores the name of the job submission directory.
If I/O file transfer has been performed for network jobs or distributed execution jobs, the directory where the files sent or received on the execution server are stored is set in the environment variable of the job process on the execution server.
In network jobs and distributed execution jobs, the schedule server host name is set in the environment variable of the job process on the execution server.
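As a minimal sketch of how a job script might branch on these variables, the following Python function picks a processing path from the job name and queue name. The variable names EXAMPLE_JOB_NAME and EXAMPLE_QUEUE_NAME are hypothetical placeholders, not the real names set by Job Execution Control; substitute the names given in the product documentation. The "batch" queue prefix is likewise an invented example.

```python
import os
from collections.abc import Mapping

# Hypothetical placeholders: replace with the actual variable names
# documented for Job Execution Control.
JOB_NAME_VAR = "EXAMPLE_JOB_NAME"
QUEUE_VAR = "EXAMPLE_QUEUE_NAME"


def choose_processing(env: Mapping[str, str] = os.environ) -> str:
    """Pick a processing branch from variables set before the job starts."""
    job_name = env.get(JOB_NAME_VAR)
    queue = env.get(QUEUE_VAR, "")
    if job_name is None:
        # Variable absent: the script was probably not started by
        # Job Execution Control (for example, it was run by hand).
        return "manual-run"
    if queue.startswith("batch"):  # invented naming convention
        return "batch-processing"
    return "default-processing"


print(choose_processing())
```

For network jobs, the script runs on the execution server, so only that server's environment (plus the variables set by Job Execution Control) is visible to it.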
Job numbers
Job numbers 1 to 99999 are used cyclically.
Job Execution Control can manage up to 99999 jobs.
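Cyclic numbering can be pictured with the following Python sketch. It is a hypothetical model, not the product's implementation: the class name is invented, and the behavior of skipping numbers still held by existing jobs (and failing when all 99999 are in use) is an assumption drawn from the stated limit, not documented behavior.

```python
class JobNumberAllocator:
    """Hypothetical sketch of cyclic job-number assignment (1 to 99999)."""

    MAX_JOB_NUMBER = 99999

    def __init__(self) -> None:
        self._last = 0               # last number handed out (0 = none yet)
        self._in_use: set[int] = set()

    def allocate(self) -> int:
        if len(self._in_use) >= self.MAX_JOB_NUMBER:
            raise RuntimeError("all 99999 job numbers are in use")
        candidate = self._last
        while True:
            # Advance cyclically: after 99999 the next number is 1.
            candidate = candidate % self.MAX_JOB_NUMBER + 1
            if candidate not in self._in_use:  # assumed: skip busy numbers
                break
        self._last = candidate
        self._in_use.add(candidate)
        return candidate

    def release(self, number: int) -> None:
        # Called when a job is deleted, making its number reusable.
        self._in_use.discard(number)


alloc = JobNumberAllocator()
print(alloc.allocate(), alloc.allocate())  # 1 2
```

The loop always terminates because allocation is refused once every number is taken, so at least one free number exists whenever the search starts.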