PRIMECLUSTER Concepts Guide 4.6
FUJITSU Software

1.8.1 Linux

This section describes the availability of cluster systems in the following Linux environments.

1.8.1.1 Physical environment and virtual environment

This section describes the availability of cluster systems in physical and virtual environments on Linux.

The table below summarizes, for each cluster system configuration, whether the service can continue when an error occurs in each monitoring target.

Table 1.1 Availability according to each cluster system configuration (in a physical environment and virtual environment)

| Monitoring target | Physical server | KVM: Cluster system between guest OSes with the Host OS failover function | KVM: Cluster system between guest OSes on multiple host OSes | KVM: Cluster system between guest OSes on one host OS | RHOSP: Cluster system between guest OSes on multiple compute nodes | RHOSP: Cluster system between guest OSes on one compute node | VMware: Cluster system between guest OSes on multiple ESXi hosts | VMware: Cluster system between guest OSes on one ESXi host |
|---|---|---|---|---|---|---|---|---|
| 1. Unit | Y | Y | N | N | Y*1 | N | Y*2 | N |
| 2. Shared disk and path of disk access | Y | Y | Y | N | Y | N | Y | N |
| 3. Public LAN | Y | Y | Y | N | Y | N | Y | N |
| 4. OS (physical, host OS/ESXi host) | Y | Y | N | N | Y*1 | N | Y*2 | N |
| 5. OS (guest OS) | - | Y | Y | Y | Y | Y | Y*3 | Y*4 |
| 6. Service (cluster application) | Y | Y | Y | Y | Y | Y | Y | Y |

Service continuity when an error occurs: Y = Available, N = Unavailable, - = Excluded

*1 The service can be continued by configuring high availability for compute instances.
For more information on configuring high availability for compute instances, refer to "High Availability for Compute Instances" in "Red Hat OpenStack Platform."

*2 Available only when the I/O fencing function is used, or when VMware vCenter Server functional cooperation and VMware vSphere HA are used. If a hang-up is detected in a guest OS and the guest OS cannot be switched to the standby system automatically, the guest OS enters the LEFTCLUSTER state.

*3 When the guest OS cannot be switched to the standby system automatically, the guest OS enters the LEFTCLUSTER state.

*4 The guest OS can be switched automatically only when VMware vCenter Server functional cooperation is used.
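
As a reading aid, Table 1.1 can be encoded as a small lookup structure. The following Python sketch is illustrative only and is not part of PRIMECLUSTER. The names `CONFIGS`, `TABLE_1_1`, `MEANING`, and `service_continuity` are hypothetical, and the shortened configuration labels are an assumption made for readability. The cell values are transcribed from the table.

```python
# Illustrative encoding of Table 1.1. All names here are hypothetical.
CONFIGS = [
    "Physical server",
    "KVM: Host OS failover function",
    "KVM: multiple host OSes",
    "KVM: one host OS",
    "RHOSP: multiple compute nodes",
    "RHOSP: one compute node",
    "VMware: multiple ESXi hosts",
    "VMware: one ESXi host",
]

# One string per monitoring target, one space-separated cell per configuration.
TABLE_1_1 = {
    "Unit": "Y Y N N Y*1 N Y*2 N",
    "Shared disk and path of disk access": "Y Y Y N Y N Y N",
    "Public LAN": "Y Y Y N Y N Y N",
    "OS (physical, host OS/ESXi host)": "Y Y N N Y*1 N Y*2 N",
    "OS (guest OS)": "- Y Y Y Y Y Y*3 Y*4",
    "Service (cluster application)": "Y Y Y Y Y Y Y Y",
}

MEANING = {"Y": "Available", "N": "Unavailable", "-": "Excluded"}


def service_continuity(target, config):
    """Return (meaning, footnote) for one cell; footnote is None if absent."""
    cell = TABLE_1_1[target].split()[CONFIGS.index(config)]
    mark, _, note = cell.partition("*")
    return MEANING[mark], ("*" + note if note else None)
```

For example, `service_continuity("Unit", "VMware: multiple ESXi hosts")` returns the availability mark together with the footnote that qualifies it, so the footnote conditions are not lost when the table is consulted programmatically.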

Figure 1.17 Physical environment

Figure 1.18 Virtual environment

For the RHOSP environment, read "host OS" as "compute node". For the VMware environment, read "host OS" as "ESXi host."

How errors are detected in each monitoring target

  1. Unit

    On PRIMEQUEST 2000, asynchronous monitoring linked with the Management Board (MMB), and on PRIMEQUEST 3000, asynchronous monitoring linked with iRMC/MMB, immediately detects a panic or reset triggered by an error in the CPU, memory, or another component, and the service is switched to the standby system. On PRIMERGY and in virtual environments, the error is detected by heartbeat monitoring, and the service is switched to the standby system. *1

  2. Shared disk and path of disk access

    In combination with the volume management function (GDS), the system detects a failure of disk access or of a disk access path (monitored by the Gds resource). The service is switched to the standby system when the disk cannot be accessed or when all disk access paths fail.

  3. Public LAN

    In combination with the network multiplexing function (Global Link Services, hereinafter referred to as GLS), the system detects a failure of a network adapter or a route in the public LAN (monitored by the Gls resource). The service is switched to the standby system when the network as a whole fails.

  4. OS (physical and host OS/ESXi host)

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system. *1

  5. OS (guest OS)

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system.

  6. Service (cluster application)

    When a resource error of the cluster application occurs, the service is switched to the standby system.

*1 In a cluster system between guest OSes (RHOSP, VMware) on different host OSes, the guest OS enters the LEFTCLUSTER state. After the guest OS is restarted by the high availability configuration for compute instances (RHOSP) or by the vSphere HA function (VMware), the LEFTCLUSTER state of the guest OS is cleared automatically and the service is switched to the standby system.
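
The sequence described in *1 (heartbeat loss, LEFTCLUSTER, then switchover once the failed guest OS has been restarted) can be pictured as a small state model. The following Python sketch is a simplified illustration under stated assumptions, not PRIMECLUSTER code. The state name `DOWN`, the functions `heartbeat_alive` and `next_state`, and the idea of a single timeout value are hypothetical; only the `LEFTCLUSTER` state name comes from this guide.

```python
def heartbeat_alive(last_seen, now, timeout_s):
    """True if the last heartbeat arrived within the (hypothetical) timeout."""
    return (now - last_seen) <= timeout_s


def next_state(state, heartbeat_ok, peer_stop_confirmed):
    """Return (new_state, switch_to_standby) for a monitored peer node.

    state: "UP", "LEFTCLUSTER", or "DOWN" ("DOWN" is an assumed name)
    heartbeat_ok: a heartbeat was received within the timeout
    peer_stop_confirmed: the peer is known to have been stopped or restarted,
        for example by an HA function of the virtualization platform
    """
    if state == "UP":
        if heartbeat_ok:
            return "UP", False
        if peer_stop_confirmed:
            # Failure detected and the peer is safely out of the cluster.
            return "DOWN", True
        # Failure detected, but the peer's real status is still unknown.
        return "LEFTCLUSTER", False
    if state == "LEFTCLUSTER":
        if peer_stop_confirmed:
            # LEFTCLUSTER is cleared and the service can be switched.
            return "DOWN", True
        return "LEFTCLUSTER", False
    # Already handled; nothing more to switch.
    return "DOWN", False
```

The model keeps the service on hold while the peer's status is unknown, which mirrors the point of *1: switchover happens only after the LEFTCLUSTER state has been cleared.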

1.8.1.2 Cloud environment

This section describes the availability of cluster systems in cloud environments on Linux.

The table below summarizes, for each cluster system configuration, whether the service can continue when an error occurs in each monitoring target.

Table 1.2 Availability according to each cluster system configuration (in a cloud environment)

| Monitoring target | FJcloud-O: Cluster system between guest OSes | NIFCLOUD: Cluster system in multiple zones | NIFCLOUD: Cluster system in a single zone | FJcloud-Baremetal: Cluster system between Bare Metal servers | AWS: Cluster system in multiple Availability Zones (Multi-AZ) | AWS: Cluster system in a single Availability Zone (Single-AZ) | Azure: Cluster system in multiple Availability Zones | Azure: Cluster system in a single Availability Zone |
|---|---|---|---|---|---|---|---|---|
| 1. AZ/Zone | N | Y*1 | N | -*2 | Y | N | Y*1 | N |
| 2. Disk | Y | Y | Y | Y | Y | Y | Y | Y |
| 3. Public LAN | Y | Y | Y | Y | Y | Y | Y | Y |
| 4. OS (guest OS) | Y | Y | Y | Y | Y | Y | Y | Y |
| 5. Service (cluster application) | Y | Y | Y | Y | Y | Y | Y | Y |
| 6. Bare Metal server | - | - | - | Y | - | - | - | - |

Service continuity when an error occurs: Y = Available, N = Unavailable, - = Excluded

*1 When an error is detected in an AZ (Azure) or a zone (NIFCLOUD), the node enters the LEFTCLUSTER state. To continue operation, recover the node from the LEFTCLUSTER state. For the recovery procedure, refer to "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide."

*2 There is no AZ in East Japan region 3 and West Japan region 3, where an FJcloud-Baremetal environment is provided.

Figure 1.19 FJcloud-O environment

How errors are detected in each monitoring target

  1. AZ

    The AZ is not a monitoring target.

  2. Disk

    In combination with the volume management function (GDS), the system detects a disk access error (monitored by the Gds resource), and the service is switched to the standby system when the disk cannot be accessed.

  3. Public LAN

    In combination with the network multiplexing function (GLS), the system detects a failure of a network adapter or a route in the public LAN (monitored by the Gls resource), and the service is switched to the standby system when the network as a whole fails.

  4. OS (guest OS)

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system.

  5. Service (cluster application)

    When a resource error of the cluster application occurs, the service is switched to the standby system.

Figure 1.20 NIFCLOUD environment

How errors are detected in each monitoring target

  1. Zone

    Cyclic monitoring of the cluster interconnect detects a zone error, and the node enters the LEFTCLUSTER state.

  2. Disk

    GDS monitors disk I/O. When a disk access error occurs, the affected disk is detached and the service continues.

    If an I/O error occurs in all slices in a mirror, the service is automatically switched to the standby system.

  3. Public LAN

    The network monitoring using ICMP detects a route failure, and the service is automatically switched to the standby system.

  4. OS (guest OS)

    The cyclic monitoring of the cluster interconnect detects an error of the guest OS, and the service is automatically switched to the standby system.

  5. Service (cluster application)

    When a resource error of the cluster application occurs, the service is automatically switched to the standby system.
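
The disk behavior described for NIFCLOUD above (detach on a partial failure, switch over when every slice of a mirror fails) can be written as a small decision function. This is a minimal sketch under stated assumptions, not the GDS implementation or its API: the function name `mirror_action`, the action labels, and the slice names are all hypothetical.

```python
def mirror_action(slice_io_ok):
    """Decide the reaction to I/O results on the slices of one mirror.

    slice_io_ok maps a (hypothetical) slice name to True if its I/O succeeded.
    Returns (action, failed_slices).
    """
    failed = sorted(name for name, ok in slice_io_ok.items() if not ok)
    if not failed:
        # No I/O error: nothing to do.
        return "continue", failed
    if len(failed) < len(slice_io_ok):
        # At least one healthy copy remains: detach the failed disk
        # and keep the service running on this node.
        return "detach_and_continue", failed
    # I/O errors on all slices: the service must move to the standby system.
    return "switch_to_standby", failed
```

For a two-way mirror, `mirror_action({"slice1": False, "slice2": True})` reports a detach with the service continuing, while a failure on both slices reports a switchover.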

Figure 1.21 FJcloud-Baremetal environment

How errors are detected in each monitoring target

  2. Disk

    In combination with the volume management function (GDS), the system detects a disk access error (monitored by the Gds resource), and the service is switched to the standby system when the disk cannot be accessed.

  3. Public LAN

    In combination with the network multiplexing function (GLS), the system detects a failure of a network adapter or a route in the public LAN (monitored by the Gls resource), and the service is switched to the standby system when the network as a whole fails.

  4. OS (guest OS)

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system.

  5. Service (cluster application)

    When a resource error of the cluster application occurs, the service is switched to the standby system.

  6. Bare Metal server

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system.

Figure 1.22 AWS environment

How errors are detected in each monitoring target

  1. AZ

    An error is detected by the heartbeat monitoring, and the service is automatically switched to the standby system.

  2. Disk

    In combination with the volume management function (GDS), the system detects a disk access error (monitored by the Gds resource), and the service is switched to the standby system when the disk cannot be accessed.

  3. Public LAN

    With control scripts registered in a Cmdline resource, the system detects a route failure, and the service is switched to the standby system when a network failure occurs.

  4. OS (guest OS)

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system.

  5. Service (cluster application)

    When a resource error of the cluster application occurs, the service is switched to the standby system.
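
To make the Public LAN item above more concrete, the following Python sketch shows the kind of route check that a script registered in a Cmdline resource might perform. It is an assumption-laden illustration, not a script shipped with PRIMECLUSTER: the function names `ping_once`, `route_is_healthy`, and `check_exit_code` are hypothetical, the choice of ICMP probing via the Linux `ping` command is an assumption, and the exit-code convention (0 for healthy, non-zero for failed) should be verified against the PRIMECLUSTER manuals before use.

```python
import subprocess


def ping_once(host, timeout_s=1):
    """Send one ICMP echo request using the Linux ping command."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ping is not installed or cannot be executed.
        return False
    return result.returncode == 0


def route_is_healthy(targets, probe=ping_once):
    """Treat the route as healthy if at least one target answers."""
    return any(probe(host) for host in targets)


def check_exit_code(targets, probe=ping_once):
    """Map the check result to an exit code (assumed: 0 healthy, 1 failed)."""
    return 0 if route_is_healthy(targets, probe) else 1
```

A wrapper script could pass the monitored addresses on the command line and end with `sys.exit(check_exit_code(sys.argv[1:]))`. The `probe` parameter exists so that the decision logic can be exercised without sending real network traffic.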

Figure 1.23 Azure environment

How errors are detected in each monitoring target

  1. AZ

    An error is detected by the heartbeat monitoring, and the node enters the LEFTCLUSTER state.

  2. Disk

    In combination with the volume management function (GDS), the system detects a disk access error (monitored by the Gds resource), and the service is switched to the standby system when the disk cannot be accessed.

  3. Public LAN

    With control scripts registered in a Cmdline resource, the system detects a route failure, and the service is switched to the standby system when a network failure occurs.

  4. OS (guest OS)

    An error is detected by the heartbeat monitoring, and the service is switched to the standby system.

  5. Service (cluster application)

    When a resource error of the cluster application occurs, the service is switched to the standby system.