PRIMECLUSTER  Installation and Administration Guide 4.5
FUJITSU Software

I.3.2 Corrective Actions When an Error Occurs in the Compute Node

I.3.2.1 If Not Using the High Availability Configuration for Compute Instances

If an error occurs in a compute node in an environment that does not use the high availability configuration for compute instances, the cluster node running on that compute node enters the LEFTCLUSTER state. This section describes how to recover from the LEFTCLUSTER state.

  1. Make sure that the cluster node has actually stopped. If it is still running, stop it.

  2. If the cluster node where an error occurred becomes LEFTCLUSTER, perform the procedure described in "5.2 Recovering from LEFTCLUSTER" in "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide."

  3. Check the compute node status and recover the compute node.

    You can skip this step if the compute node is recovered automatically.

  4. Recover the cluster node.

  5. Execute the following command on any one node in the cluster system and make sure that all the cluster nodes have joined the cluster.

    # cftool -n

    Make sure that all the CF node names are displayed in the "Node" field and that UP is displayed in the "State" field for each of them.

    Example

    # cftool -n
    Node   Number   State   Os     Cpu
    node1      1    UP      Linux  EM64T
    node2      2    UP      Linux  EM64T

    For subsequent operations, refer to "7.2 Operating the PRIMECLUSTER System."
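
The check in step 5 can also be scripted. The following is a minimal sketch, not part of the product: the function name check_cf_nodes is hypothetical, and it assumes that "cftool -n" prints the column layout shown in the example above (Node, Number, State, Os, Cpu). It lists every node whose State is not UP and fails if such a node exists or if no node rows were read.

```shell
#!/bin/sh
# Sketch only: verify that every node reported by "cftool -n" is UP.
# Assumption: the output uses the column layout shown in the example
# (Node, Number, State, Os, Cpu). Adjust the field numbers otherwise.

# check_cf_nodes (hypothetical helper) reads "cftool -n" output on stdin.
# It prints one line per node that is not UP and returns 1 if any such
# node is found, or if no node rows were read at all.
check_cf_nodes() {
    awk '
        # Skip the header row.
        $1 == "Node" && $2 == "Number" { next }
        # Every remaining non-empty row describes one CF node.
        NF >= 3 {
            rows++
            if ($3 != "UP") {
                printf "%s is %s\n", $1, $3
                bad++
            }
        }
        END { exit ((rows == 0 || bad > 0) ? 1 : 0) }
    '
}

# Usage on any one cluster node:
#   cftool -n | check_cf_nodes && echo "all CF nodes are UP"
```

The function only reads standard input, so it can be tried against saved output before it is used on a live cluster.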

I.3.2.2 If Using the High Availability Configuration for Compute Instances

In an environment that uses the high availability configuration for compute instances, if an error occurs in the compute node that hosts the virtual machine of the cluster node with low survival priority, that virtual machine is not moved to another compute node. This section describes how to recover from this state.

  1. Perform the following procedures on the director or the controller node to move the cluster node to another compute node.

    1. Execute the following command to reset the cluster node status on the compute node where an error occurred.

      Example: If the instance name of the cluster node is instance1

      $ nova reset-state instance1

    2. If the cluster node on the compute node where the error occurred is not moved automatically to another compute node after its status is reset, execute the following command to move it to another compute node.

      Example: If the instance name of the cluster node is instance1

      $ nova evacuate instance1

      For more information on the nova command, refer to the Red Hat OpenStack Platform (RHOSP) documentation from Red Hat, Inc.

  2. Execute the following command on any one node in the cluster system and make sure that all the cluster nodes have joined the cluster.

    # cftool -n

    Make sure that all the CF node names are displayed in the "Node" field and that UP is displayed in the "State" field for each of them.

    Example

    # cftool -n
    Node   Number   State   Os     Cpu
    node1      1    UP      Linux  EM64T
    node2      2    UP      Linux  EM64T

    For subsequent operations, refer to "7.2 Operating the PRIMECLUSTER System."

  3. Check the compute node status and recover the compute node.

    You can skip this step if the compute node is recovered automatically.
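
The two substeps of step 1 (reset the status, then evacuate only if the cluster node has not moved) can be sketched as a small script. This is an illustration under stated assumptions, not a procedure from this manual: the function names instance_host and recover_instance are hypothetical, the polling count and interval are arbitrary, and the script assumes that "nova show" is run with administrator credentials and prints a table row of the form "| OS-EXT-SRV-ATTR:host | <host> |". Verify the output format in your RHOSP environment before relying on it.

```shell
#!/bin/sh
# Sketch only: reset the instance status, wait briefly for automatic
# relocation, and evacuate manually if the instance has not moved.
# Assumptions (not from the manual): "nova show" prints a table row
# "| OS-EXT-SRV-ATTR:host | <host> |"; the default wait values are arbitrary.

# instance_host INSTANCE (hypothetical helper)
# Prints the compute host that currently runs INSTANCE.
instance_host() {
    nova show "$1" | awk -F'|' '
        { key = $2; gsub(/^ +| +$/, "", key) }
        key == "OS-EXT-SRV-ATTR:host" {
            val = $3
            gsub(/^ +| +$/, "", val)
            print val
        }
    '
}

# recover_instance INSTANCE FAILED_HOST [TRIES] [INTERVAL_SECONDS]
recover_instance() {
    instance=$1
    failed_host=$2
    tries=${3:-6}
    interval=${4:-10}

    # Substep 1: reset the status of the cluster node instance.
    nova reset-state "$instance" || return 1

    # Poll to see whether the instance leaves the failed host on its own.
    i=0
    while [ "$i" -lt "$tries" ]; do
        host=$(instance_host "$instance")
        if [ -n "$host" ] && [ "$host" != "$failed_host" ]; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done

    # Substep 2: the instance was not moved automatically, so evacuate it.
    nova evacuate "$instance"
}

# Usage on the director or the controller node (host name is a placeholder):
#   recover_instance instance1 compute-0.localdomain
```

After the function returns, continue with step 2 and confirm with cftool -n that all the cluster nodes have joined the cluster.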