ServerView Resource Coordinator VE Operation Guide

5.3 Addressing Resource Failures

This section explains how to address problems, such as hardware failures, that occur in a system.


Basic Procedure

The following procedure is used to confirm and resolve problems using the RC console.

  1. Confirm the existence of a problem

    Use the RC console to confirm that a problem has occurred on a resource.
    Refer to the following for details on checking resource statuses.

    • For PRIMERGY servers

      Refer to "5.2 Resource Status" in this chapter and "2.3 Status Panel" of the "ServerView Resource Coordinator VE Setup Guide".

    • For servers other than PRIMERGY servers

      Refer to "2.3 Status Panel" of the "ServerView Resource Coordinator VE Setup Guide".

  2. Check the event log

    Use the event log to identify the device where the error occurred and the content of the event.
    A single problem can trigger a series of events, so search back through past events for others with timestamps close to that of the error.

  3. Check the status of resources

    From the resource tree, open the resource where the problem occurred and look for any affected chassis, physical servers, LAN switches, physical OSs, VM hosts, and VM guests.
    If Auto-Recovery has been enabled for a physical OS or VM host, it will be automatically switched over to a spare server. If Auto-Recovery has not been enabled, server switchover can still be performed manually as long as a spare server has been designated.
    For more information regarding server switchover, refer to "10.2 Switchover".

  4. Perform detailed investigation and recovery

    From the [Resource Details] tab of the failed resource, launch the external management software to investigate the precise cause of the problem.
    When no management software is available, consult the maintenance staff responsible for the failed resource to investigate the problem.
    Once the cause has been identified, perform the necessary maintenance work on the faulty hardware.
    If a server hardware failure requires replacing a managed server, carry out the replacement operation as described in "9.4 Replacing Servers".

  5. Perform post-recovery verification

    Following recovery, confirm that the RC console no longer displays any icons indicating problems.
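
The search for related events in step 2 can be illustrated outside the RC console. The following is a minimal Python sketch, not a feature or API of Resource Coordinator VE. It assumes event records have been copied by hand into a list of (timestamp, resource, message) tuples. The function name, the five-minute window, and the sample events are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def group_events_by_time(events, window=timedelta(minutes=5)):
    """Group (timestamp, resource, message) tuples into bursts.

    Events are sorted by timestamp. A new group starts whenever the gap
    to the previous event is larger than `window` (an assumed threshold).
    """
    groups = []
    for event in sorted(events, key=lambda e: e[0]):
        if groups and event[0] - groups[-1][-1][0] <= window:
            # Close in time to the previous event: same burst.
            groups[-1].append(event)
        else:
            # Large gap (or first event): start a new burst.
            groups.append([event])
    return groups


# Hypothetical sample data, not taken from a real event log.
events = [
    (datetime(2010, 1, 15, 10, 0, 5), "chassis1", "Fan unit failure"),
    (datetime(2010, 1, 15, 10, 0, 40), "blade3", "Temperature warning"),
    (datetime(2010, 1, 15, 10, 2, 10), "blade3", "Server status changed to error"),
    (datetime(2010, 1, 15, 14, 30, 0), "switch1", "Port link down"),
]

groups = group_events_by_time(events)
for group in groups:
    # Show the start time of each burst and the resources involved.
    print(group[0][0], "->", [resource for _, resource, _ in group])
```

With this sample data, the first three events fall into one group and the LAN switch event into another, which suggests that the fan failure and the server error are related while the link-down event is a separate problem. The appropriate window depends on the environment and should be adjusted accordingly.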