H.1 Cluster Systems in a VMware Environment

When using PRIMECLUSTER in a VMware environment, clustering (virtual machine function) between guest OSes on multiple ESXi hosts is available.

When an error occurs on a guest OS within a VMware environment, applications on that guest OS stop working. With PRIMECLUSTER applied to guest OSes, applications fail over from the active guest OS to a standby guest OS in the event of a failure, which creates a highly reliable guest OS environment.

Stopping virtual machine forcibly

For a cluster system in a VMware environment, select exactly one of the following two functions to forcibly stop the virtual machine: "VMware vCenter Server functional cooperation" or "I/O fencing function".

To reliably stop the operating node and then fail over the operation when an error occurs in the guest OS or in the virtual machine, it is generally recommended to set up the forcible stop with the VMware vCenter Server functional cooperation.

However, set up the forcible stop with the I/O fencing function in the following cases:

- VMware vCenter Server is disabled, or the guest OS cannot communicate with or operate VMware vCenter Server.
- The system is upgraded from a VMware environment of PRIMECLUSTER 4.3A40 or earlier in which the I/O fencing function is used.
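
The selection rule above can be sketched as a small Python function. This is only an illustration of the decision described in this section. The function name and both parameters are hypothetical and are not part of PRIMECLUSTER or of any VMware interface.

```python
def choose_forcible_stop(vcenter_usable: bool, upgrading_from_io_fencing: bool) -> str:
    """Return the forcible stop function suggested by the rule above.

    Both parameters are hypothetical flags used only for illustration:
    - vcenter_usable: the guest OS can communicate with and operate
      VMware vCenter Server.
    - upgrading_from_io_fencing: the system is upgraded from a VMware
      environment of PRIMECLUSTER 4.3A40 or earlier that already uses
      the I/O fencing function.
    """
    if not vcenter_usable or upgrading_from_io_fencing:
        return "I/O fencing function"
    # Generally recommended choice according to this section.
    return "VMware vCenter Server functional cooperation"


print(choose_forcible_stop(vcenter_usable=True, upgrading_from_io_fencing=False))
```

The sketch covers only the choice between the two functions. The I/O fencing function has its own environment restrictions, which are described later in this section and must be checked separately.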

Note

Note the following points when using the forcible stop with the I/O fencing function:
- If a cluster partition occurs due to a failure of the cluster interconnect, the guest OS on which the cluster application is running panics regardless of the survival priority.
- If the operating node panics when the operation is failed over, the status of the cluster application may temporarily become Online on both the operating and standby guest OSes. However, because simultaneous access to the shared disk from both guest OSes is prevented, there is no impact on the operation.
- With the forcible stop using the VMware vCenter Server functional cooperation, the cluster application cannot be switched when an error occurs in ESXi or in the server, and the cluster node enters the LEFTCLUSTER state. By using VMware vSphere HA, the cluster application can be switched even when an error occurs in ESXi or in the server.

Forcible stop with VMware vCenter Server functional cooperation (recommended)

When a failure occurs in a guest OS, the virtual machine of that guest OS is forcibly powered off through cooperation with VMware vCenter Server. This allows the operation to be failed over.

This method can stop a virtual machine even in a cluster environment without a shared disk, or in a cluster environment between guest OSes on a single ESXi host. Instead of a shared disk, a configuration that shares data by using VMware vSAN is also available.

Figure H.1 Cluster Systems in a VMware Environment (VMware vCenter Server functional cooperation)

If the VMware vCenter Server functional cooperation is used with VMware vSphere HA, an operation can be failed over even in the case of ESXi failure or server failure.

Figure H.2 Cluster Systems in a VMware Environment (VMware vCenter Server functional cooperation + VMware vSphere HA + vSAN)

Forcible stop with I/O fencing function

SCSI-3 Persistent Reservation is used as the exclusive control function to panic and stop the failed guest OS, which allows the operation to be switched. This method does not require VMware vCenter Server, so a guest OS can be panicked without any servers other than the virtual machines that make up the cluster. However, a shared disk that is connected via RDM (Raw Device Mapping) and supports SCSI-3 Persistent Reservation is required.

Note

A forcible stop with the I/O fencing function is not available in the following environments:

- An environment between guest OSes on a single ESXi host
- An environment in which the cluster application is configured with 3 or more nodes
- An environment in which multiple cluster applications that use a shared disk exist
- An environment that uses a disk configured with GDS mirroring among servers
- An environment in which a VMware vSAN disk is used as the shared disk
- An environment that uses VMware vSphere HA
- An environment that uses PRIMECLUSTER Wizard for SAP HANA
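
As a planning aid, the restrictions listed above can be expressed as a simple check. The following Python sketch is an assumption-laden illustration: the `ClusterPlan` class, its field names, and the `io_fencing_blockers` function are all hypothetical and do not correspond to any PRIMECLUSTER setting or command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (illustration only)."""
    single_esxi_host: bool = False          # all guest OSes on one ESXi host
    nodes_per_cluster_app: int = 2          # nodes configuring the cluster application
    apps_using_shared_disk: int = 1         # cluster applications that use a shared disk
    gds_mirroring_among_servers: bool = False
    vsan_shared_disk: bool = False
    vsphere_ha: bool = False
    wizard_for_sap_hana: bool = False


def io_fencing_blockers(plan: ClusterPlan) -> List[str]:
    """Return the listed restrictions that rule out the I/O fencing function."""
    blockers = []
    if plan.single_esxi_host:
        blockers.append("guest OSes on a single ESXi host")
    if plan.nodes_per_cluster_app >= 3:
        blockers.append("cluster application with 3 or more nodes")
    if plan.apps_using_shared_disk > 1:
        blockers.append("multiple cluster applications using a shared disk")
    if plan.gds_mirroring_among_servers:
        blockers.append("GDS mirroring among servers")
    if plan.vsan_shared_disk:
        blockers.append("VMware vSAN disk as the shared disk")
    if plan.vsphere_ha:
        blockers.append("VMware vSphere HA")
    if plan.wizard_for_sap_hana:
        blockers.append("PRIMECLUSTER Wizard for SAP HANA")
    return blockers


plan = ClusterPlan(nodes_per_cluster_app=3, vsphere_ha=True)
for reason in io_fencing_blockers(plan):
    print("I/O fencing not available:", reason)
```

An empty result only means that none of the restrictions in this note apply. It does not confirm that the shared disk itself meets the RDM and SCSI-3 Persistent Reservation requirements.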

Information

In a cluster configuration that uses the I/O fencing function, setting the SA_icmp shutdown agent causes responses from the guest OSes to be checked over the network paths (administrative LAN/interconnect). The application is switched when no response is received from a guest OS. In this case, if the failed guest OS has not stopped completely (for example, when the OS is hanging), both guest OSes may access the shared disk at the same time. By using SCSI-3 Persistent Reservation, the I/O fencing function prevents both guest OSes from accessing the shared disk at the same time. (To prevent access from both guest OSes in a configuration that uses the VMware vCenter Server functional cooperation, stop the failed guest OS completely before switching the guest OS.)

In a cluster system in which a takeover IP address is registered, the route information of the communication device is updated when the application is switched. Therefore, the switching destination node remains accessible even when the takeover IP address is activated on multiple guest OSes. However, if the failed guest OS keeps running without being completely shut down, the route information of the communication device may revert to the switching source node. Because the switching destination node advertises the route information to the communication device in a 60-second cycle, the time during which the switching source node may be accessed accidentally is reduced.

Figure H.3 Cluster Systems in a VMware Environment (I/O fencing function)

The table below compares the forcible stop with VMware vCenter Server functional cooperation and the forcible stop with the I/O fencing function.

Item		Function to stop a virtual machine forcibly
Item		VMware vCenter Server functional cooperation (recommended)	I/O fencing function
Configuration	VMware vCenter Server	Required (The guest OSes can communicate with VMware vCenter Server or operate VMware vCenter Server. Also in VMware vCenter Server, the user who is authorized to stop an operating virtual machine in the cluster must be created)	Optional
	Cluster configuration between guest OSes on a single ESXi host	Allowed	Not allowed
	Number of nodes that configure the cluster application	2 to 16 nodes	2 nodes
	Cluster application configuration	Unlimited	Only one of the following configurations is allowed: - Only one cluster application - Among multiple cluster applications, only one of them contains a shared disk.
	Settings of survival priority	Allowed	Not allowed (regardless of the survival priority, a guest OS on which cluster applications are started panics)
	Shared disk	Optional. The following disks are available: - Virtual disk created on a datastore that can be accessed from each ESXi host - RDM (Raw Device Mapping) disk - VMware vSAN disk Note: When the disk is shared between the cluster nodes, for all of the virtual disk, RDM disk, and VMware vSAN disk, the number of ESXi hosts sharing the disk must be 8 or fewer. As long as the number of ESXi hosts sharing the disk is 8 or fewer, up to 16 cluster nodes can share the disk.	Required (shared RDM (Raw Device Mapping) disk supporting SCSI-3 Persistent Reservation). The following disks are not allowed: - A virtual disk created on a datastore accessible from each ESXi host - VMware vSAN disk
	Path policy for the Native Multipathing (NMP)	All supported	Only either of "Most Recently Used" or "Round Robin" is supported.
	VMware vSphere HA	Allowed	Not allowed
	PRIMECLUSTER Wizard for SAP HANA	Allowed	Not allowed
	Other unsupported configurations and functions	- VMware vSphere FT - VMware vSphere DRS - VMware vSphere DPM - Snapshot function - Backup by Data Protection - Suspending the virtual machine	- VMware vSphere FT - VMware vSphere DRS - VMware vSphere DPM - Snapshot function - Backup by Data Protection - Suspending the virtual machine - FCoE connection for storages - VMware vSphere vMotion - VMware vSphere Storage vMotion
Operation when an error occurs	Error in cluster interconnect	An operating node or a standby node is forcibly stopped, and an operation is failed over or the standby node is cut off.	- Only the cluster interconnect is specified for SA_icmp: The old operating node may panic due to the I/O fencing function even when the cluster application is switched. - The cluster interconnect and any other networks are specified for SA_icmp: The cluster application is not switched and the cluster node enters the LEFTCLUSTER state.
	Error in operating guest OS or in virtual machine	An operating node is forcibly stopped, and an operation is failed over.	An operating node panics, and an operation is failed over.
	Error in standby guest OS or in virtual machine	A standby node is forcibly stopped and then cut off.	A standby node is cut off (the standby node does not panic). *
	Failure in ESXi or in server	- If VMware vSphere HA is used: An operation is failed over or the standby node is cut off. - If VMware vSphere HA is not used: An operation is not failed over by PRIMECLUSTER alone. A node on the failed ESXi host becomes LEFTCLUSTER.	The cluster application is switched (the operating node panics) or the standby node is cut off (the standby node does not panic). *
	Failure in VMware vCenter Server	A virtual machine cannot be forcibly stopped	-
	Failure in network between a virtual machine and VMware vCenter Server	A virtual machine cannot be forcibly stopped	-
	Dump collection when an error occurs	Not allowed (Forcible stop by power-off is only allowed. In this case, a cause of the error of the cluster node may not be determined.)	Allowed
Restrictions in maintenance	When using Cold Migration	None	If the migration places the two nodes that configure the cluster on a single ESXi host, an operation cannot be failed over when an error occurs in a guest OS, a virtual machine, or the cluster interconnect.

* If the I/O fencing function is used, the standby node is cut off when it temporarily stops working. After the standby node is able to work again, it behaves as follows.

When only the cluster interconnect is specified for SA_icmp:
The cluster application is switched to the standby node that has resumed operation. The old operating node may panic due to the I/O fencing function.

When the cluster interconnect and other networks are specified for SA_icmp:
The cluster application cannot be switched, and the cluster node enters the LEFTCLUSTER state. Restart the OS of the standby node.
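
The two cases in this footnote can be summarized as a lookup on the networks specified for SA_icmp. The Python sketch below is purely illustrative: the function name and the network labels (`"interconnect"`, `"admin_lan"`) are hypothetical, and a configuration without the cluster interconnect is treated as an error only because this section does not describe that case.

```python
from typing import Set


def standby_recovery_outcome(sa_icmp_networks: Set[str]) -> str:
    """Describe what happens after a temporarily stopped standby node recovers.

    sa_icmp_networks holds hypothetical labels for the networks that are
    specified for SA_icmp, for example {"interconnect", "admin_lan"}.
    """
    if "interconnect" not in sa_icmp_networks:
        # Not covered by this section; flagged instead of guessed.
        raise ValueError("case without the cluster interconnect is not described")
    if sa_icmp_networks == {"interconnect"}:
        return "switched to standby node (old operating node may panic)"
    return "LEFTCLUSTER (restart the OS of the standby node)"


print(standby_recovery_outcome({"interconnect"}))
print(standby_recovery_outcome({"interconnect", "admin_lan"}))
```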

Note

Make sure to set up exactly one of VMware vCenter Server functional cooperation or the I/O fencing function. A configuration with both functions, or with neither of them, is not allowed.
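
The rule in this note is an exclusive choice, which can be sketched as a one-line check. The function and parameter names below are hypothetical and serve only to illustrate the rule.

```python
def is_valid_forcible_stop_setting(vcenter_cooperation: bool, io_fencing: bool) -> bool:
    """Return True only when exactly one of the two functions is set up."""
    # Both set, or neither set, is not an allowed configuration.
    return vcenter_cooperation != io_fencing


print(is_valid_forcible_stop_setting(vcenter_cooperation=True, io_fencing=False))
```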