When using PRIMECLUSTER in a VMware environment, clustering between guest OSes on multiple ESXi hosts (virtual machine function) is available.
When an error occurs on a guest OS within a VMware environment, applications on that guest OS stop working. With PRIMECLUSTER applied to the guest OSes, applications fail over from the active guest OS to a standby guest OS in the event of a failure, creating a highly reliable guest OS environment.
For a cluster system in a VMware environment, make sure to select exactly one of the following two functions to forcibly stop a virtual machine: "VMware vCenter Server functional cooperation" or "I/O fencing function".
To reliably stop the operating node and then fail over the operation when an error occurs in the guest OS or in the virtual machine, it is generally recommended to configure the forcible stop with VMware vCenter Server functional cooperation.
However, configure the forcible stop with the I/O fencing function in the following cases:
- VMware vCenter Server is not available, or the guest OS cannot communicate with VMware vCenter Server or cannot operate it.
- You are upgrading from a VMware environment of PRIMECLUSTER 4.3A40 or earlier in which the I/O fencing function is used.
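The selection rule above can be summarized as a small decision function. This is a hypothetical Python sketch, not part of PRIMECLUSTER: the names `ForcibleStop`, `choose_forcible_stop`, and its parameters are assumptions chosen only to restate the two cases in which the I/O fencing function is preferred over the recommended default.

```python
from enum import Enum


class ForcibleStop(Enum):
    """The two mutually exclusive ways to forcibly stop a virtual machine."""
    VCENTER_COOPERATION = "VMware vCenter Server functional cooperation"
    IO_FENCING = "I/O fencing function"


def choose_forcible_stop(vcenter_usable: bool,
                         upgrading_legacy_io_fencing: bool) -> ForcibleStop:
    """Hypothetical helper that applies the selection rules above.

    vcenter_usable: False if VMware vCenter Server is not available, or if the
        guest OS cannot communicate with it or cannot operate it.
    upgrading_legacy_io_fencing: True when upgrading a VMware environment of
        PRIMECLUSTER 4.3A40 or earlier that already uses I/O fencing.
    """
    if not vcenter_usable or upgrading_legacy_io_fencing:
        return ForcibleStop.IO_FENCING
    # Otherwise use the generally recommended function.
    return ForcibleStop.VCENTER_COOPERATION


if __name__ == "__main__":
    print(choose_forcible_stop(vcenter_usable=True,
                               upgrading_legacy_io_fencing=False).value)
```

Either condition alone is enough to select I/O fencing; only when both are false does the sketch fall back to vCenter Server cooperation.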
Note
Note the following points when using the forcible stop with the I/O fencing function:
If a cluster partition occurs due to a failure of the cluster interconnect, the guest OS on which the cluster application is started panics regardless of the survival priority.
If the operating node panics when the operation is failed over, the cluster application may temporarily be Online on both the operating and standby guest OSes. However, because simultaneous access to the shared disk from both guest OSes is prevented, operations are not affected.
With the forcible stop through VMware vCenter Server functional cooperation, the cluster application cannot be switched when an error occurs in ESXi or in the server; in this case, the cluster node enters the LEFTCLUSTER state. By using VMware vSphere HA, the cluster application can be switched when an error occurs in ESXi or in the server.
With VMware vCenter Server functional cooperation, when a failure occurs in a guest OS, the virtual machine of that guest OS is forcibly powered off through VMware vCenter Server, so that the operation can be failed over.
This method can stop a virtual machine in a cluster environment without a shared disk, or in a cluster environment between guest OSes on a single ESXi host. Instead of a shared disk, a configuration that shares data by using VMware vSAN is also available.
Figure H.1 Cluster Systems in a VMware Environment (VMware vCenter Server functional cooperation)
If VMware vCenter Server functional cooperation is used together with VMware vSphere HA, the operation can be failed over even in the case of an ESXi failure or a server failure.
Figure H.2 Cluster Systems in a VMware Environment (VMware vCenter Server functional cooperation + VMware vSphere HA + vSAN)
With the I/O fencing function, SCSI-3 Persistent Reservation is used as the exclusive control function to panic and stop the failed guest OS, so that the operation can be switched. This method does not require VMware vCenter Server, which means a guest OS can be panicked without any servers other than the virtual machines that configure the cluster. However, a shared disk that is connected via RDM (Raw Device Mapping) and supports SCSI-3 Persistent Reservation is required.
Note
The forcible stop with the I/O fencing function cannot be used in the following environments:
- The cluster is configured between guest OSes on a single ESXi host.
- The cluster application is configured with 3 or more nodes.
- Multiple cluster applications that use a shared disk exist.
- A disk configured with GDS mirroring among servers is used.
- A VMware vSAN disk is used as the shared disk.
- VMware vSphere HA is used.
- PRIMECLUSTER Wizard for SAP HANA is used.
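The restrictions above can be checked mechanically before choosing I/O fencing. The following is a hypothetical Python sketch; `ClusterEnv`, its fields, and `io_fencing_blockers` are illustrative names, not PRIMECLUSTER settings or commands. It returns every listed restriction that the described environment violates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterEnv:
    """Hypothetical description of a guest-OS cluster (field names are assumptions)."""
    esxi_hosts: int                      # distinct ESXi hosts running the cluster nodes
    nodes_in_cluster_app: int            # nodes that configure the cluster application
    cluster_apps_using_shared_disk: int  # cluster applications that use a shared disk
    gds_mirroring_among_servers: bool
    shared_disk_is_vsan: bool
    uses_vsphere_ha: bool
    uses_wizard_for_sap_hana: bool


def io_fencing_blockers(env: ClusterEnv) -> List[str]:
    """Return the restrictions that prevent I/O fencing in env (empty if none)."""
    reasons: List[str] = []
    if env.esxi_hosts == 1:
        reasons.append("guest OSes on a single ESXi host")
    if env.nodes_in_cluster_app >= 3:
        reasons.append("cluster application with 3 or more nodes")
    if env.cluster_apps_using_shared_disk > 1:
        reasons.append("multiple cluster applications using a shared disk")
    if env.gds_mirroring_among_servers:
        reasons.append("GDS mirroring among servers")
    if env.shared_disk_is_vsan:
        reasons.append("VMware vSAN disk as the shared disk")
    if env.uses_vsphere_ha:
        reasons.append("VMware vSphere HA")
    if env.uses_wizard_for_sap_hana:
        reasons.append("PRIMECLUSTER Wizard for SAP HANA")
    return reasons
```

Collecting all violations, rather than stopping at the first one, makes it easier to see everything that would have to change for I/O fencing to become usable.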
Information
In a cluster configuration that uses the I/O fencing function, the SA_icmp shutdown agent checks for responses from the guest OSes over the network paths (administrative LAN/cluster interconnect), and the application is switched when no response is received from a guest OS. In this case, if the failed guest OS has not stopped completely (for example, when the OS hangs), both guest OSes may access the shared disk at the same time. The I/O fencing function uses SCSI-3 Persistent Reservation to prevent this simultaneous access. (In a configuration that uses VMware vCenter Server functional cooperation, simultaneous access is prevented by stopping the failed guest OS completely before the switchover.)
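The response check described above can be modeled as "the peer is considered down only if it answers on none of the checked paths." This is a hypothetical Python sketch, not the SA_icmp implementation: the function name, the path names, and the injectable `responds` probe are assumptions (a real probe would send ICMP echo requests).

```python
from typing import Callable, Iterable


def peer_unresponsive(paths: Iterable[str],
                      responds: Callable[[str], bool]) -> bool:
    """Hypothetical model of an SA_icmp-style liveness check.

    paths: network paths to the peer guest OS, e.g. "admin-lan" and
        "interconnect" (placeholder names, not PRIMECLUSTER settings).
    responds: probe returning True if the peer answered on that path.
    Returns True only when no path gets a response, which is the
    condition under which the application would be switched.
    """
    path_list = list(paths)
    if not path_list:
        raise ValueError("at least one network path must be checked")
    # A single answer on any path means the peer guest OS is still alive.
    return not any(responds(path) for path in path_list)


if __name__ == "__main__":
    # Simulated probe: the peer is hung and answers on no path.
    hung = lambda path: False
    print(peer_unresponsive(["admin-lan", "interconnect"], hung))  # True
```

An unresponsive peer is not necessarily a stopped peer: a hung OS answers on no path yet has not released the disk, which is why the shared disk is additionally protected by SCSI-3 Persistent Reservation.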
In a cluster system in which a takeover IP address is registered, the route information of the communication devices is updated when the application is switched. Therefore, the switching destination node remains accessible even when the takeover IP address is active on multiple guest OSes. However, if the failed guest OS keeps running without shutting down completely, the route information of the communication devices may revert to the switching source node. Because the switching destination node advertises its route information to the communication devices every 60 seconds, the time during which the switching source node is accessed by mistake is reduced.
Figure H.3 Cluster Systems in a VMware Environment (I/O fencing function)
The table below compares the forcible stop with VMware vCenter Server functional cooperation and the forcible stop with the I/O fencing function.
| Item | | VMware vCenter Server functional cooperation (recommended) | I/O fencing function |
|---|---|---|---|
| Configuration | VMware vCenter Server | Required | Optional |
| | Cluster configuration between guest OSes on a single ESXi host | Allowed | Not allowed |
| | Number of nodes that configure the cluster application | 2 to 16 nodes | 2 nodes |
| | Cluster application configuration | Unlimited | Only one of the following configurations is allowed: |
| | Settings of survival priority | Allowed | Not allowed (regardless of the survival priority, a guest OS on which cluster applications are started panics) |
| | Shared disk | Optional. Note: When a disk is shared between the cluster nodes, the number of ESXi hosts sharing it must be 8 or fewer for each of virtual disks, RDM disks, and VMware vSAN. Within this limit, up to 16 cluster nodes can share the disk. | Required. The following disks are not allowed: |
| | Path policy for Native Multipathing (NMP) | All supported | Only "Most Recently Used" or "Round Robin" is supported. |
| | VMware vSphere HA | Allowed | Not allowed |
| | PRIMECLUSTER Wizard for SAP HANA | Allowed | Not allowed |
| | Other unsupported configurations and functions | - VMware vSphere FT | - VMware vSphere FT |
| Operation when an error occurs | Error in cluster interconnect | An operating node or a standby node is forcibly stopped, and the operation is failed over or the standby node is cut off. | - Only the cluster interconnect is specified for SA_icmp: - The cluster interconnect and other networks are specified for SA_icmp: |
| | Error in operating guest OS or virtual machine | The operating node is forcibly stopped, and the operation is failed over. | The operating node panics, and the operation is failed over. |
| | Error in standby guest OS or virtual machine | The standby node is forcibly stopped and then cut off. | The standby node is cut off (the standby node does not panic). * |
| | Failure in ESXi or server | - If VMware vSphere HA is allowed: | The cluster application is switched (the operating node panics) or the standby node is cut off (the standby node does not panic). * |
| | Failure in VMware vCenter Server | A virtual machine cannot be forcibly stopped. | - |
| | Failure in network between a virtual machine and VMware vCenter Server | A virtual machine cannot be forcibly stopped. | - |
| Dump collection when an error occurs | | Not allowed | Allowed |
| Restrictions in maintenance | When using Cold Migration | None | If migration results in the two nodes that configure the cluster running on a single ESXi host, the operation cannot be failed over when an error occurs in a guest OS, a virtual machine, or the cluster interconnect. |
* When the I/O fencing function is used, a standby node that temporarily stops working is cut off. After it resumes working, the standby node behaves as follows:
When only the cluster interconnect is specified for SA_icmp:
The cluster application is switched to the standby node that has resumed working. The former operating node may panic due to the I/O fencing function.
When the cluster interconnect and other networks are specified for SA_icmp:
The cluster application cannot be switched, and the cluster node enters the LEFTCLUSTER state. Restart the OS of the standby node.
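The footnote's two outcomes depend only on which networks are specified for SA_icmp. The following hypothetical Python sketch restates them; the function name and the `"interconnect"` label are assumptions, and because the footnote only covers settings that include the cluster interconnect, any other input is rejected.

```python
from typing import AbstractSet

INTERCONNECT = "interconnect"  # placeholder label for the cluster interconnect


def standby_after_recovery(sa_icmp_networks: AbstractSet[str]) -> str:
    """Hypothetical summary of what happens when a cut-off standby node
    (I/O fencing configuration) starts working again.

    sa_icmp_networks: labels of the networks specified for SA_icmp.
    """
    if INTERCONNECT not in sa_icmp_networks:
        raise ValueError("only SA_icmp settings that include the interconnect are covered")
    if set(sa_icmp_networks) == {INTERCONNECT}:
        # Only the interconnect is checked.
        return ("cluster application switches to the recovered standby node; "
                "the former operating node may panic")
    # The interconnect plus at least one other network is checked.
    return ("cluster application is not switched; node is LEFTCLUSTER; "
            "restart the OS of the standby node")
```

Keeping the interconnect-only case as an exact set comparison makes the "and other networks" case the fallback for any larger set.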
Note
Make sure to configure exactly one of VMware vCenter Server functional cooperation or the I/O fencing function. A configuration with both functions or with neither of them is not allowed.
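The "exactly one function" rule, together with the node and ESXi host limits from the comparison table, can be expressed as a configuration check. This is a hypothetical Python sketch: `validate_forcible_stop` and its parameters are illustrative names, not a PRIMECLUSTER tool, and it covers only the numeric limits stated in this section.

```python
from typing import List


def validate_forcible_stop(vcenter_cooperation: bool, io_fencing: bool,
                           nodes: int, shared_disk_esxi_hosts: int = 0) -> List[str]:
    """Hypothetical check of the forcible stop settings.

    nodes: number of nodes that configure the cluster application.
    shared_disk_esxi_hosts: ESXi hosts sharing a disk (0 if no shared disk).
    Returns a list of problems; an empty list means no rule is violated.
    """
    errors: List[str] = []
    if vcenter_cooperation == io_fencing:
        # Both enabled or both disabled: neither configuration is allowed.
        errors.append("configure exactly one forcible stop function")
        return errors
    if vcenter_cooperation:
        if not 2 <= nodes <= 16:
            errors.append("vCenter cooperation supports 2 to 16 nodes")
        if shared_disk_esxi_hosts > 8:
            errors.append("a shared disk may be shared by at most 8 ESXi hosts")
    elif nodes != 2:
        errors.append("I/O fencing supports exactly 2 nodes")
    return errors
```

The mutual-exclusion test uses `==` on the two flags, which catches both the "both" and the "neither" case in one comparison.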