Interstage Big Data Parallel Processing Server V1.0.1 User's Guide
FUJITSU Software

15.1.2 If the Master Server Does Not Use Replicated Configuration

If an error occurs on the master server while jobs are being executed, those jobs are interrupted.

Tasks may be stopped for a long time until the master server has recovered.

The following section explains the corrective action to take after an error occurs on the master server.

Figure 15.2 Procedure to resume tasks after an error occurs in a non-replicated configuration


(1) Recover server

Refer to the system log of the master server, and remove the cause of the error.

If a serious error on the master server requires the server to be rebuilt, recover the master server.

The restore feature of this product can be used to return the system configuration and the master server definition information to their normal running state.

Refer to "14.2.1.1 Restoring a Master Server, Development Server, or Collaboration Server" for information on the procedure to restore the master server.

Point

A restore requires a backup of the master server that was created while the server was running normally.

Refer to "14.1.2.1 Backing Up a Master Server, Development Server, or Collaboration Server" for information on the procedure to back up the master server.

(2) Resume server operations

Restart the master server that was recovered. When restarting the master server, the DFS must be unmounted and then remounted on the DFS clients (slave servers, development servers, and collaboration servers).

Use the following procedure to restart the master server:

  1. Unmount the DFS on all slave servers, development servers, and collaboration servers.

    Example

    If the logical file system name of the DFS is pdfs1:

    # umount pdfs1 <Enter>

  2. Restart the master server.

    If the master server is configured so that the DFS is not mounted automatically when the master server is started, restart it and then mount the DFS manually.

  3. Mount the DFS on all slave servers, development servers, and collaboration servers.

    Example

    If the logical file system name of the DFS is pdfs1:

    # mount pdfs1 <Enter>

(3) Start JobTracker

Start Hadoop on the master server that was recovered.

Always use the bdpp_start command to start Hadoop.

(4) Restart analysis tasks (re-execute jobs)

After the master server has fully recovered, execute jobs as required and resume tasks.
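
Example

Steps 1 and 3 of "(2) Resume server operations" repeat the same command on every DFS client, so they can be sketched as a small helper script. This is only an illustration under stated assumptions, not a procedure provided by this product: the host names in DFS_CLIENTS are hypothetical, the logical file system name pdfs1 is the one used in the examples above, and the script assumes that root can reach each DFS client over ssh. The DRY_RUN variable is also an assumption of this sketch; while it is 1, the commands are only displayed and nothing is executed.

```shell
#!/bin/sh
# Sketch only: repeat the DFS umount/mount command on every DFS client.
# DFS_NAME and DFS_CLIENTS are placeholder values (assumptions);
# replace them with the values of the actual environment.
DFS_NAME="pdfs1"
DFS_CLIENTS="slave1 slave2 develop1 collabo1"

# DRY_RUN=1 (the default) only prints each command.
# DRY_RUN=0 runs the command on each client over ssh as root (assumed access).
DRY_RUN="${DRY_RUN:-1}"

# Usage: dfs_on_clients umount|mount
dfs_on_clients() {
    action="$1"
    # Accept only the two commands used in the procedure.
    case "$action" in
        umount|mount) ;;
        *) echo "usage: dfs_on_clients umount|mount" >&2; return 2 ;;
    esac
    for host in $DFS_CLIENTS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$host $action $DFS_NAME"
        else
            # Stop at the first client on which the command fails.
            ssh "root@$host" "$action $DFS_NAME" || return 1
        fi
    done
}
```

With this sketch, dfs_on_clients umount would be called before the master server is restarted, and dfs_on_clients mount after the master server is running again and its DFS is mounted. Starting Hadoop with the bdpp_start command and re-executing jobs remain separate steps on the master server. Check the displayed commands with DRY_RUN=1 before setting DRY_RUN=0.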