Systemwalker Operation Manager Troubleshooting Guide
FUJITSU Software

5.3.2 Failed to Execute the Network Job (Error Message:MJS881S is Output)

Applicable versions and levels

Check all applicable actions below to resolve the issue.

Action 1

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1>:2
<code2> depends on the schedule server operating system type, as follows:
  Windows versions: 274D
  Solaris versions: 0092
  HP-UX versions: 00ef
  AIX versions: 004f
  Linux versions: 006f
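When triaging a log, it can help to pull the job name, host name, and the two codes out of the MJS881S line before comparing them with the tables in this section. The following is a minimal sketch, assuming the message appears on one line in the layout shown above; the pattern, the function name `parse_mjs881s`, and the sample values in the usage note are illustrative and not part of the product.

```python
import re

# Pattern for the MJS881S line as printed in this guide. Job name and host
# name are matched loosely because their allowed characters are not
# specified here (assumption).
MJS881S_PATTERN = re.compile(
    r"MJS881S\s+(?P<jobname>.+?)\((?P<jobno>[^)]*)\)\s+"
    r"COMMUNICATION ERROR OCCURRED FOR\s+(?P<host>.+?),"
    r"CODE\((?P<code1>[^,]+),(?P<code2>[^)]+)\)"
)


def parse_mjs881s(line):
    """Return the fields of an MJS881S message as a dict, or None if absent."""
    match = MJS881S_PATTERN.search(line)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}
```

For example, a hypothetical line `MJS881S JOB1(12) COMMUNICATION ERROR OCCURRED FOR hostA,CODE(2,006f)` yields `code1` of `2` and `code2` of `006f`, which match the Linux row above.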

Points to check

Does the mjsnet port number match on the schedule server and the execution server?

Cause

If the mjsnet port number does not match on the schedule server and the execution server, the network job ends abnormally.

Action method

Ensure that the port number of mjsnet or mjsnetn (n is the subsystem number from 1 to 9) defined in the services file matches on the schedule server and the execution server.

To apply a port number change made on the execution server, restart the Job Execution Control service/daemon on the execution server.
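The comparison described above can be sketched in a few lines, assuming you have copied the contents of the services file from both servers into strings. The function names and the sample port for mjsnet1 in the usage note are hypothetical. An entry that is missing on one side is reported as a mismatch, even though the product would fall back to a default port in that case, so treat the result as a list of entries to review rather than a verdict.

```python
import re

# Matches "mjsnet" or "mjsnet1".."mjsnet9" followed by "<port>/tcp".
ENTRY = re.compile(r"^\s*(mjsnet[1-9]?)\s+(\d+)/tcp\b")


def mjsnet_ports(services_text):
    """Collect mjsnet/mjsnetN TCP ports from the text of a services file."""
    ports = {}
    for line in services_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = ENTRY.match(line)
        if match:
            ports[match.group(1)] = int(match.group(2))
    return ports


def mismatched_entries(schedule_text, execution_text):
    """Return service names whose port differs (or is missing) on one side."""
    sched = mjsnet_ports(schedule_text)
    execu = mjsnet_ports(execution_text)
    names = sched.keys() | execu.keys()
    return sorted(n for n in names if sched.get(n) != execu.get(n))
```

For instance, if the schedule server defines `mjsnet1 9328/tcp` and the execution server defines `mjsnet1 9400/tcp` (both values invented for illustration), `mismatched_entries` returns `["mjsnet1"]`.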

Information

The port number default values when mjsnet is not defined in the services file are as follows:

  • Windows version V5.0L30 or earlier version levels: 28452

  • Windows version V10.0L10 or later version levels, UNIX versions: 9327

In Windows version V5.0L30 or earlier version levels, if the mjsnet definition is written on the final line of the services file, a line feed must be entered at the end of that line.

Action 2

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1>:2
<code2> depends on the schedule server operating system type, as follows:
  Windows versions: 274D
  Solaris versions: 0092
  HP-UX versions: 00ef
  AIX versions: 004f
  Linux versions: 006f

Points to check

Has a network job execution request been made to a subsystem number that is not in operation on the execution server?

Cause

A network job registered under subsystem number n (0 to 9) on the schedule server is requested to the same subsystem number n on the execution server. Because the port number used for communication differs for each subsystem number, a communication error occurs and the network job cannot be executed if subsystem number n is not in operation on the execution server.

Action method

Use the same subsystem numbers for network job operation on the schedule server and the execution server.

Note that, when the schedule server is V10.0L10/5.2 or later and the execution server is V5.0L30/5.2 or later, performing the procedure shown below causes the mjsnet port number (default: 9327/tcp) to be used for all subsystem numbers, so that network jobs from every subsystem number on the schedule server are requested to subsystem number 0 on the execution server.

However, after this procedure is performed, network jobs can no longer be requested to the same subsystem number on the execution server.

  1. If the mjsnet1 to mjsnet9 port numbers are defined in the services file of the schedule server, delete the mjsnet1 to mjsnet9 port number settings.

  2. Restart the Job Execution Control service/daemon of the schedule server.
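Step 1 above can be previewed before editing the real file. The following is a rough sketch, assuming the services file content has been read into a string; it returns the text with active mjsnet1 to mjsnet9 entries removed and leaves everything else untouched. The function name is hypothetical, and the result should be reviewed against a backup before it replaces anything.

```python
import re

# An active (uncommented) entry whose service name is mjsnet1..mjsnet9.
SUBSYSTEM_ENTRY = re.compile(r"^\s*mjsnet[1-9]\s")


def without_subsystem_entries(services_text):
    """Return services-file text with mjsnet1..mjsnet9 entries removed."""
    kept = [
        line
        for line in services_text.splitlines(keepends=True)
        if not SUBSYSTEM_ENTRY.match(line)
    ]
    return "".join(kept)
```

The plain `mjsnet` entry is kept on purpose, because the procedure only removes the per-subsystem definitions.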

Action 3

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1>: 2 or 3
<code2>: Not fixed

Cause

This error occurs if software such as NetWorker uses a port number (default: 9327/tcp) also used by Job Execution Control.

Action method

Perform the "Action method" in Error message 5 "MpMjes: ERROR: 10110: mjsnetsv : Failed in the bind of port number <port number>/tcp." of "2.1.3 Systemwalker Operation Manager Error Messages are Displayed in an Environment in Which Networker, etc., Coexist".
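Before following the referenced procedure, it may be useful to confirm that the port really is held by another program. The sketch below tries to bind the port and reports whether the bind succeeds; `port_is_free` is a hypothetical helper, and the check is only meaningful while Job Execution Control itself is stopped, since otherwise its own listener would occupy the port.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": some process holds the port.
            return False
    return True
```

Calling `port_is_free(9327)` on the execution server checks the default port named above. A `False` result while Job Execution Control is stopped suggests that other software is using it.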

Action 4

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1> depends on the execution server operating system type, as follows:
  Windows: 3 or 4
  UNIX: 2
<code2> depends on the schedule server operating system type if the execution server is Windows, as follows:
  Windows versions: 2746
  Solaris versions: 0083
  HP-UX versions: 00e8
  AIX versions: 0049
  Linux versions: 0068
<code2> depends on the schedule server operating system type if the execution server is UNIX, as follows:
  Windows versions: 274C
  Solaris versions: 0091
  HP-UX versions: 00ee
  AIX versions: 004e
  Linux versions: 006e
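Because several actions in this section share the same message, a small lookup can narrow down which ones to read first. The following sketch simply encodes the code tables listed under Actions 1 to 4; Actions 5, 6, and 7 are always returned because their codes are not fixed. The function name is hypothetical, the comparison ignores letter case as an assumption, and Action 7 still applies only to UNIX versions V17.0.0 or later.

```python
# code2 values copied from the tables in this section (lowercased).
ACTION_1_2_CODE2 = {"274d", "0092", "00ef", "004f", "006f"}      # code1 = 2
ACTION_4_WIN_EXEC = {"2746", "0083", "00e8", "0049", "0068"}     # code1 = 3 or 4
ACTION_4_UNIX_EXEC = {"274c", "0091", "00ee", "004e", "006e"}    # code1 = 2


def candidate_actions(code1, code2):
    """Return the action numbers whose documented codes fit (code1, code2)."""
    c1 = code1.strip()
    c2 = code2.strip().lower()
    actions = {5, 6, 7}  # codes "Not fixed": always worth checking
    if c1 in ("2", "3"):
        actions.add(3)   # Action 3: code1 is 2 or 3, code2 not fixed
    if c1 == "2" and c2 in ACTION_1_2_CODE2:
        actions.update((1, 2))
    if (c1 in ("3", "4") and c2 in ACTION_4_WIN_EXEC) or (
        c1 == "2" and c2 in ACTION_4_UNIX_EXEC
    ):
        actions.add(4)
    return sorted(actions)
```

For example, `candidate_actions("2", "006f")` includes Actions 1 and 2, while `candidate_actions("3", "2746")` includes Action 4 instead.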

Points to check

Have many network jobs been submitted at the same time?

Cause

If many network jobs are submitted at the same time, the execution server may not be able to accept them all in sequence. The network jobs that could not be processed on the execution server end abnormally.

Note that the number of network jobs that can be processed on the execution server depends on the server performance.

Action method

Stagger the times at which the network jobs are submitted.
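One simple way to plan such a schedule is to submit jobs in small batches separated by a fixed gap. The sketch below only computes start offsets; the batch size and gap are assumptions to be tuned by measurement, since this guide gives no fixed limit for the number of jobs an execution server can accept.

```python
def staggered_offsets(job_count, batch_size, gap_seconds):
    """Return a start offset (seconds) per job, batch_size jobs per time slot."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [(index // batch_size) * gap_seconds for index in range(job_count)]
```

With five jobs, batches of two, and a 30-second gap, the offsets are 0, 0, 30, 30, and 60 seconds.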

Action 5

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1>: Not fixed
<code2>: Not fixed

Points to check

Has a line error occurred between the schedule server and the execution server?

Cause

This problem occurs if a line error has occurred between the schedule server and the execution server.

Action method

Refer to the following manuals, and establish the cause of the line error from the meaning of <code1> and <code2> that are output.

Note that, in V13.0.0 or later, the retry behavior can be changed if a line error has occurred in the network job. Refer to "Defining the System Operating Information" in the Installation Guide for information on how to change the details.

Action 6

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1>: Not fixed
<code2>: Not fixed

Points to check

Is there a firewall between the schedule server and the execution server?

Cause

This problem occurs if communication in the port used by Systemwalker Operation Manager is not possible because of a firewall between the schedule server and the execution server.

Action method

To operate in an environment in which there is a firewall between the schedule server and the execution server, configure the firewall so that communication using the ports shown below is possible. Consider and verify these settings thoroughly before configuring them. Refer to "Listing of Port Numbers" in the Installation Guide for information on port numbers.
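A quick reachability test from the schedule server can show whether a given port is blocked along the path. The following is a minimal sketch using a plain TCP connection attempt; `can_connect` is a hypothetical helper, and a successful connection only shows that the port is reachable, not that Job Execution Control is the program answering.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refusal, unreachable hosts, and timeouts (often a firewall drop).
        return False
```

For example, `can_connect("exec01", 9327)` would test the default mjsnet port on an execution server named `exec01` (a made-up host name).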

Action 7 (UNIX versions V17.0.0 or later)

Error message

MJS881S jobname(<job number>) COMMUNICATION ERROR OCCURRED FOR <host name>,CODE(<code1>,<code2>)

<code1>: Not fixed
<code2>: Not fixed

Points to check

Has a timeout occurred in the network job?

Cause

This issue occurs when the network job times out.

The default timeout values are the following:

  • Communication path connection process: 30 seconds

  • Data send/receive process: 60 seconds

Action method

Customize the default timeout values for the communication path connection process and the data send/receive process. This adjusts how long it takes for a processing error to be detected when an execution server is stopped or a network error occurs.

The setting can be made in a definition file. The definition takes effect for the communication path connection process and data send/receive process performed on the server where the setting is defined. The following shows the setting method.

Setting method
  • In subsystem operations, the setting must be defined for each subsystem.

  • Create a definition file using an editor such as vi, and set the value.

  • Set read permissions to the created file for all users.

  • In a system with a cluster configuration, the following definition file is located on the shared disk. There is no need to create a definition file on each node. Create the definition file on the active node, which can access the shared disk.

Definition file name

If subsystems are not in operation, or the subsystem number is 0

/etc/mjes/mjconf.ini

If the subsystem number is 1 to 9

/etc/mjes/mjesN/mjconf.ini

N: 1 to 9
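The path rule above is easy to express as a function, which can be handy in a script that prepares files for several subsystems. This is a small sketch; `definition_path` is a hypothetical name, and the paths are the ones listed above.

```python
def definition_path(subsystem=0):
    """Return the mjconf.ini path for a subsystem number from 0 to 9."""
    if not 0 <= subsystem <= 9:
        raise ValueError("subsystem number must be from 0 to 9")
    if subsystem == 0:
        # Also used when subsystems are not in operation.
        return "/etc/mjes/mjconf.ini"
    return f"/etc/mjes/mjes{subsystem}/mjconf.ini"
```

For subsystem 3 this returns `/etc/mjes/mjes3/mjconf.ini`.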

File format

[Socket]
ConnectTimeout=NNN
SendRecvTimeout=nnn

[Socket]

Specify this section when customizing the communication path connection process and data send/receive process for the network job.

ConnectTimeout=NNN

For NNN, specify the time in seconds after which the communication path connection process for the network job times out when the process stops responding. Specify a value from 5 to 300 (seconds).

This key is optional. The default value of 30 (seconds) will be used if this is omitted, the file format is incorrect, or a definition file is not created.

SendRecvTimeout=nnn

For nnn, specify the time in seconds after which the data send/receive process for the network job times out when the process stops responding. Specify a value from 10 to 300 (seconds).

This key is optional. The default value of 60 (seconds) will be used if this is omitted, the file format is incorrect, or a definition file is not created.

Cautions
  • Do not include characters that do not belong to the format, such as tabs and spaces, between the section name and key name.

  • Do not create a file that deviates from the above format, for example one that omits the value for a key.

  • If a definition file that configures definitions for another function already exists with the same name, the above definitions can be added to the file.
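The format rules and cautions above can be checked before a file is deployed. The following sketch is not the product's parser; it is a strict reader written from the description in this section. It accepts only exact `Key=digits` lines inside `[Socket]`, and it falls back to the documented defaults for anything else. Treating an out-of-range number the same as an invalid format is an assumption.

```python
import re

DEFAULTS = {"ConnectTimeout": 30, "SendRecvTimeout": 60}
RANGES = {"ConnectTimeout": (5, 300), "SendRecvTimeout": (10, 300)}
# No spaces or tabs are tolerated anywhere on a key line.
KEY_LINE = re.compile(r"^(ConnectTimeout|SendRecvTimeout)=(\d+)$")


def socket_timeouts(text):
    """Return the effective timeouts for definition-file text."""
    values = dict(DEFAULTS)
    section = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw.startswith("[") and raw.endswith("]"):
            section = raw[1:-1]
            continue
        if section != "Socket":
            continue  # keys of other functions sharing the file are ignored
        match = KEY_LINE.match(raw)
        if match:
            key, number = match.group(1), int(match.group(2))
            low, high = RANGES[key]
            if low <= number <= high:
                values[key] = number
    return values
```

A key line with a leading space or a missing value is ignored by this sketch, so the corresponding default is reported, which mirrors the fallback described above.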

Setting examples

Definition file setting example

[Socket]
ConnectTimeout=30
SendRecvTimeout=30

Read permission setting example

# chmod 444 /etc/mjes/mjconf.ini
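The two examples above (file contents and read permission) can also be produced from a script. This is a minimal sketch, assuming the file does not yet exist and that no other function's definitions need to be preserved in it; `write_definition` is a hypothetical helper that checks the documented value ranges before writing.

```python
import os


def write_definition(path, connect_timeout, sendrecv_timeout):
    """Write a [Socket] definition file and make it readable by all users."""
    if not 5 <= connect_timeout <= 300:
        raise ValueError("ConnectTimeout must be from 5 to 300")
    if not 10 <= sendrecv_timeout <= 300:
        raise ValueError("SendRecvTimeout must be from 10 to 300")
    body = (
        "[Socket]\n"
        f"ConnectTimeout={connect_timeout}\n"
        f"SendRecvTimeout={sendrecv_timeout}\n"
    )
    with open(path, "w", newline="\n") as handle:
        handle.write(body)
    os.chmod(path, 0o444)  # same effect as: chmod 444 <path>
```

Calling `write_definition("/etc/mjes/mjconf.ini", 30, 30)` as a sufficiently privileged user would reproduce the setting example above. If a file of that name already exists, edit it by hand instead, because this sketch overwrites it.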

When the setting takes effect

The settings take effect for jobs that are executed after creating the mjconf.ini definition file.

There is no need to stop the Systemwalker Operation Manager daemon.

Cancel the settings

Delete the relevant key from the definition file.

Output messages

If the specified file format is invalid, the default value is used and the following message is output to the SYSLOG. The message may be output multiple times for a single job execution.

  • If the key is ConnectTimeout

    MpMjes: ERROR: 10133: Internal configuration file is invalid.(Section=Socket,Key=ConnectTimeout)

    [Description]

    The specified file format is invalid. (Section name=Socket, Key name=ConnectTimeout)

  • If the key is SendRecvTimeout

    MpMjes: ERROR: 10133: Internal configuration file is invalid.(Section=Socket,Key=SendRecvTimeout)

    [Description]

    The specified file format is invalid. (Section name=Socket, Key name=SendRecvTimeout)
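Since the message can repeat for a single job, a log scan is easier to read when duplicates are collapsed. The sketch below extracts the section and key names from any lines containing error 10133; the function name is hypothetical, and it assumes the message text appears on one line exactly as shown above, after whatever prefix the system logger adds.

```python
import re

INVALID_DEFINITION = re.compile(
    r"MpMjes: ERROR: 10133: .*\(Section=(?P<section>[^,]+),Key=(?P<key>[^)]+)\)"
)


def invalid_keys(log_lines):
    """Return the distinct (section, key) pairs reported by error 10133."""
    found = set()
    for line in log_lines:
        match = INVALID_DEFINITION.search(line)
        if match:
            found.add((match.group("section"), match.group("key")))
    return sorted(found)
```

Each pair returned points at a key in the definition file that should be corrected.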

Notes on operation
  • This setting is for operations of network jobs or distributed execution jobs. There is no effect on operations of local jobs or jobs that use a JCL file.

  • When setting the timeout value using this function, consider the temporary network load, and perform sufficient verification before operation. If the setting value is too small, a timeout may occur due to the temporary system load or network load. If the setting value is greater than the system function timeout provided by the operating system, a timeout may occur before the set value is reached.

  • If the continuous execution mode is enabled and a communication error occurs while processing a network job, a retry process to connect to the execution server is performed. (Normally, 6 retries will be performed at intervals of 10 seconds.) Therefore, if the process stops responding after a job is submitted, the job ends abnormally only after the retries are complete.

  • The definition file that was created is a target of the backup and restore of Systemwalker Operation Manager.

  • The definition file that was created is a target of the Systemwalker Operation Manager backup command for migration and conversion/registration command for migration.

  • The definition file that was created is not a target of policy extraction and distribution of Systemwalker Operation Manager. If distributing policies to another server, the settings need to be reconfigured.
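To get a feel for how the retry note interacts with the connection timeout, a rough upper bound can be computed. The sketch below assumes that the initial attempt and each of the 6 retries can each wait up to ConnectTimeout, with a 10-second interval before every retry. This guide does not state how the two values combine, so treat the formula as an illustration rather than a specification.

```python
def worst_case_seconds(connect_timeout=30, retries=6, retry_interval=10):
    """Estimate the time until a job ends abnormally (assumed model)."""
    attempts = retries + 1  # initial attempt plus retries (assumption)
    return attempts * connect_timeout + retries * retry_interval
```

Under this model the default ConnectTimeout of 30 seconds gives 270 seconds, and the minimum of 5 seconds gives 95 seconds.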