This section explains how the system operates when an error occurs on a slave server.
If any of the events shown below occurs on a slave server, the other slave servers take over the job currently being executed so that processing can continue.
If the slave server stops
  (a slave server shutdown, system panic, forced power cut, or similar event)
If the slave server does not respond, for example because it hangs
If an error occurs in the slave server network
  (if the public LAN is duplicated, an error in both networks)
If a slave server error occurs, check the slave server status using information such as the system log file. Take corrective action, and then execute the bdpp_start command on the master server to restart Hadoop on the slave server where the error occurred.
Refer to "A.1.11 bdpp_start" for information on starting Hadoop.
Note
If the bdpp_start command is executed to restart Hadoop on only some of the slave servers, the "bdpp:WARN:001" message is output. This message does not indicate a problem with restarting Hadoop on those slave servers.
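
The restart step and the warning described in the note can be sketched as a small shell helper for use on the master server. This is only an illustrative sketch: the function name restart_hadoop is hypothetical, and it assumes that bdpp_start is on the command search path and is run without arguments. Refer to "A.1.11 bdpp_start" for the actual invocation.

```shell
# Hypothetical helper (not part of the product): run on the master server
# after the cause of the slave server error has been removed.
# Assumption: bdpp_start is on PATH and needs no arguments here.
restart_hadoop() {
    status=0
    # Capture the command output so the expected warning can be recognized.
    output=$(bdpp_start 2>&1) || status=$?
    printf '%s\n' "$output"
    # "bdpp:WARN:001" is expected when only some slave servers are restarted.
    if printf '%s\n' "$output" | grep -q 'bdpp:WARN:001'; then
        echo "Note: bdpp:WARN:001 is expected when restarting only some slave servers."
    fi
    # Pass the exit status of bdpp_start back to the caller unchanged.
    return "$status"
}
```

Calling restart_hadoop prints whatever bdpp_start outputs and adds a reminder line only if the "bdpp:WARN:001" message appears. Any other message should still be investigated as usual.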