PRIMECLUSTER Cluster Foundation Configuration and Administration Guide 4.3
FUJITSU Software

5.2.3 Caused by a cluster partition

A cluster partition is a communications failure in which all CF communications between sets of nodes in the cluster are lost. In this case, the cluster itself is effectively partitioned into sub-clusters.

To manually recover from a cluster partition, you must do the following:

  1. Decide which of the sub-clusters you want to survive. Typically, you will choose the sub-cluster that has the largest number of nodes, the one to which the most important hardware is connected, or the one on which the most important application is running.

  2. Shut down all of the nodes in each sub-cluster that you do not want to survive.

  3. Fix the network break so that connectivity is restored between all the nodes in the cluster.

  4. Bring the nodes back up.

  5. If the nodes fail to join the cluster and remain in the LEFTCLUSTER state after they have been shut down and restarted, use the Cluster Admin GUI to log on to one of the surviving nodes and run the CF portion of the GUI. Select Mark Node Down from the Tools menu and mark each of the shut-down nodes as DOWN.

  6. The nodes should then join the cluster successfully.
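
The decision in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of PRIMECLUSTER: the SubCluster type, the priority weight (standing in for "most important hardware or application"), and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCluster:
    nodes: frozenset
    # Hypothetical operator-assigned weight; higher means more important
    # hardware or applications are attached to this sub-cluster.
    priority: int = 0


def choose_survivor(sub_clusters):
    """Pick the sub-cluster to keep: highest priority first, then most nodes."""
    return max(sub_clusters, key=lambda s: (s.priority, len(s.nodes)))


def nodes_to_shut_down(sub_clusters, survivor):
    """Return every node that is outside the surviving sub-cluster."""
    return sorted(n for s in sub_clusters if s != survivor for n in s.nodes)


if __name__ == "__main__":
    big = SubCluster(frozenset({"n1", "n2", "n3"}))
    small = SubCluster(frozenset({"n4"}))
    keep = choose_survivor([big, small])
    print(sorted(keep.nodes))                      # ['n1', 'n2', 'n3']
    print(nodes_to_shut_down([big, small], keep))  # ['n4']
```

With all priorities left at 0 the larger sub-cluster wins; raising the priority of a smaller sub-cluster models the case where it hosts the most important hardware or application.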

For example, consider the following figure:

Figure 5.4 Four-node cluster with cluster partition

In this figure, a four-node cluster has suffered a cluster partition. Both of its CF interconnects (Interconnect 1 and Interconnect 2) have been severed. The cluster is now split into two sub-clusters. Nodes A and B are in one sub-cluster while Nodes C and D are in the other.
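
One way to picture the split is as the connected groups of a graph whose edges are the node pairs that can still exchange CF traffic. The Python sketch below is purely illustrative: the function name and the link list are assumptions made for this example, and the links are entered by hand rather than read from CF.

```python
def sub_clusters(nodes, working_links):
    """Group nodes into sub-clusters, i.e. sets of nodes that can still reach each other."""
    neighbours = {n: set() for n in nodes}
    for a, b in working_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk over the links that still work.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # After both interconnects are severed, only A-B and C-D can still communicate.
    print(sub_clusters(["A", "B", "C", "D"], [("A", "B"), ("C", "D")]))
    # [['A', 'B'], ['C', 'D']]
```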

If SF (the Shutdown Facility) fails to resolve the problem, recover from this situation manually as follows:

  1. Decide which sub-cluster you want to survive. In this example, let us arbitrarily decide that Nodes A and B will survive.

  2. Shut down all of the nodes in the other sub-cluster, here Nodes C and D.

  3. Fix the interconnect break on Interconnect 1 and Interconnect 2 so that both sub-clusters will be able to communicate with each other again.

  4. Bring Nodes C and D back up.

  5. If Node C or Node D remains in the LEFTCLUSTER state, run the Cluster Admin GUI on either Node A or Node B. Start the CF portion of the GUI and select Mark Node Down from the Tools pull-down menu. Mark every node that is still in the LEFTCLUSTER state as DOWN.
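
The check in step 5 amounts to filtering the restarted nodes by the state that a surviving node reports for them. The following Python sketch only models that logic. The function name and the state dictionary are hypothetical, and the states are typed in by hand rather than queried from CF.

```python
def nodes_to_mark_down(states_seen_by_survivor, restarted_nodes):
    """Return the restarted nodes that a surviving node still shows as LEFTCLUSTER."""
    return sorted(
        node
        for node in restarted_nodes
        if states_seen_by_survivor.get(node) == "LEFTCLUSTER"
    )


if __name__ == "__main__":
    # Hypothetical view from Node A after Nodes C and D have been restarted:
    # D rejoined on its own, C is still stuck in LEFTCLUSTER.
    view_from_a = {"A": "UP", "B": "UP", "C": "LEFTCLUSTER", "D": "UP"}
    print(nodes_to_mark_down(view_from_a, ["C", "D"]))  # ['C']
```

Only the nodes this returns need Mark Node Down; a node that has already rejoined is left alone.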