In a cluster system between guest domains on different physical partitions in an Oracle VM Server for SPARC environment, the I/O fencing function can be used in the guest domains.
The I/O fencing function prevents both nodes from accessing the shared disk device at the same time by using SCSI-3 Persistent Reservation as the exclusive control function.
By combining the I/O fencing function with the shutdown agents, a guest domain error caused by an error in the physical partition can be detected and the guest domain can be forcibly stopped to switch the application, even in an environment where PRIMECLUSTER is not configured in the control domain.
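The exclusive control idea can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class `FencedDisk`, its methods, and the key names are hypothetical, and the code does not show how GDS or the storage actually issues SCSI-3 Persistent Reservation commands. It only shows the principle that a node whose registration has been removed can no longer write to the shared disk.

```python
class FencedDisk:
    """Toy model of a shared disk guarded by registration keys (hypothetical)."""

    def __init__(self):
        # Keys of the nodes that are currently allowed to perform I/O.
        self.registered = set()

    def register(self, key):
        self.registered.add(key)

    def preempt(self, own_key, victim_key):
        """A registered node removes another node's key, fencing it off."""
        if own_key not in self.registered:
            raise PermissionError(f"{own_key} is not registered")
        self.registered.discard(victim_key)

    def write(self, key, data):
        """Accept the write only from a node that is still registered."""
        if key not in self.registered:
            raise PermissionError(f"{key} is fenced off")
        return len(data)


disk = FencedDisk()
disk.register("node1")
disk.register("node2")

# The surviving node takes over and fences off the failed node.
disk.preempt("node2", "node1")

try:
    disk.write("node1", b"late write")
except PermissionError as err:
    print("rejected:", err)
```

In this model, a late write from the fenced node is rejected instead of reaching the disk, which is the situation the I/O fencing function is meant to prevent.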
Note
The I/O fencing function cannot be used in a cluster system between guest domains within the same physical partition.
Do not set the ICMP shutdown agent in an environment where the I/O fencing function is not used.
If only the ICMP shutdown agent is set and the I/O fencing function is not used, the cluster applications may start on both guest domains and access the shared disk at the same time when the failed guest domain does not stop completely (for example, when the OS hangs).
When an ETERNUS storage system used as the shared disk is connected to the server through multipath in the control domain, install the ETERNUS multipath driver in the control domain.
The I/O fencing function cannot be used in a configuration where the shared disk device is mirrored by ZFS.
Include the nodes on which the userApplication operates in the scope of the GDS shared class used by the userApplication.
For details on the class scope, see "PRIMECLUSTER Global Disk Services Configuration and Administration Guide."
When using the I/O fencing function, be sure to use shared disks on storage that is compatible with SCSI-3 Persistent Reservation. Depending on the type of storage, you may need to change its settings so that it responds according to SCSI-3 Persistent Reservation. For details, see the manual of the storage to be used.
See
To use the I/O fencing function, you must configure the I/O fencing function in GDS, configure the shutdown agents, and make the required settings when configuring the cluster application. For the settings of the I/O fencing function, refer to "3.2.4 Setting I/O Fencing Function of GDS" and "Checking the registration information of a cluster application" in "6.7.2.1 Creating Standby Cluster Applications."
For the settings of shutdown agent, refer to "5.1.2 Configuring the Shutdown Facility."
When setting the I/O fencing function in PRIMECLUSTER on the guest domain, either ICMP or XSCF must be set as the shutdown agent of the guest domain.
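The shutdown agent constraints stated in this section can be summarized as a small check. The following is a minimal sketch only: the function name and the labels "ICMP" and "XSCF" are hypothetical and are not PRIMECLUSTER commands or settings. It also assumes that "either ICMP or XSCF" means at least one of the two is set.

```python
def check_shutdown_agents(io_fencing_enabled, agents):
    """Return a list of problems for a guest domain's shutdown agent selection.

    `agents` is a set of hypothetical labels such as {"ICMP"} or {"XSCF"}.
    """
    problems = []
    # With I/O fencing, ICMP or XSCF must be set (assumed: at least one of them).
    if io_fencing_enabled and not ({"ICMP", "XSCF"} & set(agents)):
        problems.append("I/O fencing requires ICMP or XSCF as the shutdown agent")
    # The ICMP shutdown agent must not be set where I/O fencing is not used.
    if not io_fencing_enabled and "ICMP" in agents:
        problems.append("do not set the ICMP shutdown agent without I/O fencing")
    return problems


print(check_shutdown_agents(True, {"ICMP"}))
print(check_shutdown_agents(False, {"ICMP"}))
```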
The functions and notes for each shutdown agent used in combination with the I/O fencing function are described below.
Even in an environment where PRIMECLUSTER is not configured in the control domain, the guest domain can detect an I/O error and switch the application if an error occurs in the physical partition.
Combining the ICMP shutdown agent, which checks for a response from the failed guest domain, with the I/O fencing function, which panics the guest domain to stop it, allows the application to be switched without using XSCF.
Figure 2.26 I/O fencing function + ICMP shutdown agent
Note
This configuration cannot be selected in the environment where PRIMECLUSTER is configured in the control domain.
In this configuration, only a single cluster application can be created. If multiple cluster applications are necessary, configure multiple cluster systems so that each cluster system controls one cluster application.
The cluster application created in this configuration must satisfy all the conditions below:
It is configured with 2 nodes.
It controls the Gds resource.
The I/O fencing resource is registered.
The shared disk controlled by the Gds resource is an EFI labeled disk or a VTOC labeled disk, both of which are compatible with SCSI-3 Persistent Reservation.
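The conditions above for the I/O fencing function + ICMP shutdown agent configuration can be expressed as a checklist. This is a sketch for illustration only: `ClusterApp`, its field names, and `check_icmp_configuration` are hypothetical and do not correspond to any PRIMECLUSTER data structure or API.

```python
from dataclasses import dataclass


@dataclass
class ClusterApp:
    # All field names are hypothetical; they mirror the conditions in the note.
    nodes: list
    controls_gds: bool
    io_fencing_registered: bool
    disk_label: str               # "EFI" or "VTOC"
    disk_supports_scsi3_pr: bool


def check_icmp_configuration(apps, primecluster_in_control_domain=False):
    """Return the violated conditions for the I/O fencing + ICMP configuration."""
    problems = []
    if primecluster_in_control_domain:
        problems.append("PRIMECLUSTER must not be configured in the control domain")
    if len(apps) > 1:
        problems.append("only a single cluster application can be created")
    for app in apps:
        if len(app.nodes) != 2:
            problems.append("the cluster application must be configured with 2 nodes")
        if not app.controls_gds:
            problems.append("the cluster application must control the Gds resource")
        if not app.io_fencing_registered:
            problems.append("the I/O fencing resource must be registered")
        if app.disk_label not in ("EFI", "VTOC") or not app.disk_supports_scsi3_pr:
            problems.append(
                "the shared disk must be EFI or VTOC labeled and support "
                "SCSI-3 Persistent Reservation"
            )
    return problems


app = ClusterApp(["guest1", "guest2"], True, True, "EFI", True)
print(check_icmp_configuration([app]))  # prints []
```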
If the following errors occur on the operational guest domain in this configuration, the operational node panics and then the application is switched.
Communication disconnection due to a system hang or other causes
Double fault of resources (Double Fault)
If an error other than those listed above occurs on the guest domain and a subsystem hang occurs while the operational node still responds to ping on the network route specified for the ICMP shutdown agent, the operational node neither panics nor is forcibly stopped. In this case, the application cannot be switched to the standby node automatically and must be switched manually.
For the corrective actions when the application must be switched manually, refer to "7.6 CF and RMS Heartbeats."
In addition to using the function of XSCF that forcibly stops the guest domain diagnosed as failed during heartbeat check, using the I/O fencing function prevents simultaneous access from both nodes to the shared disk device.
However, if an error occurs in the physical partition, the guest domain becomes LEFTCLUSTER.
Figure 2.27 I/O fencing function + XSCF SNMP shutdown agent
See
For more information on XSCF, refer to "2.2.2 XSCF Configuration in SPARC M10 and M12."
Note
In this configuration, multiple cluster applications can be created. However, at least one cluster application that controls the Gds resource is required.
A cluster application that is created in this configuration and controls the Gds resource must satisfy all the conditions below:
It is configured with 2 nodes.
The I/O fencing resource is registered.
The shared disk controlled by the Gds resource is an EFI labeled disk or a VTOC labeled disk, both of which are compatible with SCSI-3 Persistent Reservation.
In this configuration, a cluster application that does not control the Gds resource can be configured with 2 to (number of supported nodes) nodes.
In this configuration, the I/O fencing resource cannot be registered to the cluster application that does not control the Gds resource.
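The rules in this note for the I/O fencing function + XSCF SNMP shutdown agent configuration can be sketched in the same way. The dictionary keys and the function name are hypothetical, and `max_supported_nodes` must be supplied by the caller because this section does not state the number of supported nodes.

```python
def check_xscf_configuration(apps, max_supported_nodes):
    """Return the violated conditions for the I/O fencing + XSCF configuration.

    Each app is a dict with hypothetical keys: "nodes" (int), "controls_gds",
    "io_fencing_registered", and "disk_ok" (True if the shared disk is EFI or
    VTOC labeled and compatible with SCSI-3 Persistent Reservation).
    """
    problems = []
    if not any(app["controls_gds"] for app in apps):
        problems.append("at least one cluster application must control the Gds resource")
    for i, app in enumerate(apps):
        if app["controls_gds"]:
            if app["nodes"] != 2:
                problems.append(f"app {i}: must be configured with 2 nodes")
            if not app["io_fencing_registered"]:
                problems.append(f"app {i}: the I/O fencing resource must be registered")
            if not app.get("disk_ok", False):
                problems.append(f"app {i}: the shared disk does not meet the conditions")
        else:
            if not 2 <= app["nodes"] <= max_supported_nodes:
                problems.append(f"app {i}: node count is outside the supported range")
            if app["io_fencing_registered"]:
                problems.append(f"app {i}: the I/O fencing resource cannot be registered")
    return problems


apps = [
    {"nodes": 2, "controls_gds": True, "io_fencing_registered": True, "disk_ok": True},
    {"nodes": 4, "controls_gds": False, "io_fencing_registered": False},
]
print(check_xscf_configuration(apps, max_supported_nodes=4))  # prints []
```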
The table below compares the functions and notes for each shutdown agent, with and without the I/O fencing function.
| Item | | With I/O fencing function (ICMP) | With I/O fencing function (XSCF) | Without I/O fencing function (XSCF) |
|---|---|---|---|---|
| Configuration | Access from the guest domain to XSCF | Not required | Required | Required |
| | Number of nodes that configure the cluster application | 2 nodes | - Cluster applications that control Gds resource: 2 nodes - Cluster applications that do not control Gds resource: 2 to (number of supported nodes) | 2 to (number of supported nodes) |
| | Cluster application configuration | Only one cluster application | Unrestricted | Unrestricted |
| | Resource configuration | Gds resources are required | Gds resources are required | Unrestricted |
| | Survival priority settings | Impossible | Possible | Possible |
| | Shared disk | Required | Required | Optional |
| | GFS shared file system is used | Impossible | Possible | Possible |
| Behavior when an error occurs | Error in the cluster interconnect | - If only the cluster interconnect is specified to SA_icmp: - If the cluster interconnect and other networks are specified to SA_icmp: | The operation node or the standby node is forcibly stopped. The application is switched or the standby node is cut off. | The operation node or the standby node is forcibly stopped. The application is switched or the standby node is cut off. |
| | Error in the operational guest domain or the virtual machine | The operation node panics and the application is switched. | The operation node is forcibly stopped and the application is switched. | The operation node is forcibly stopped and the application is switched. |
| | Error in the standby guest domain or the virtual machine | The standby node is cut off (the standby node does not panic). | The standby node is forcibly stopped and cut off. | The standby node is forcibly stopped and cut off. |
| | Error in the physical partition | The application is switched, or the standby node is cut off (the operation node panics while the standby node does not panic). | The application is not switched (the guest domain becomes LEFTCLUSTER as it cannot be forcibly stopped). | The application is not switched (the guest domain becomes LEFTCLUSTER as it cannot be forcibly stopped). |
| | Shared disk data protection if the application is started in both systems due to a user operation error | Possible | Possible | Impossible |
| Restrictions in maintenance | Live Migration | Impossible | Impossible | Possible |
| | Cold Migration | Migration cannot be performed to operate 2 nodes that configure the cluster on the same physical partition. | Migration cannot be performed to operate 2 nodes that configure the cluster on the same physical partition. | Unrestricted |
GFS: Global File Services
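Part of the "Behavior when an error occurs" rows can be read as a lookup from configuration and error location to outcome. The sketch below encodes only two of those rows, for the case where the error affects the operational node. The configuration labels, the dictionary, and the function name are hypothetical and are used for illustration only.

```python
# Keys: (configuration, where the error occurred).
# Values: (what happens to the node, whether the application is switched).
# Assumption: the error affects the operational node, not the standby node.
BEHAVIOR = {
    ("fencing+ICMP", "operational guest domain"): ("operation node panics", True),
    ("fencing+XSCF", "operational guest domain"): ("operation node is forcibly stopped", True),
    ("XSCF only", "operational guest domain"): ("operation node is forcibly stopped", True),
    ("fencing+ICMP", "physical partition"): ("operation node panics", True),
    ("fencing+XSCF", "physical partition"): ("guest domain becomes LEFTCLUSTER", False),
    ("XSCF only", "physical partition"): ("guest domain becomes LEFTCLUSTER", False),
}


def switches_automatically(configuration, error_location):
    """Return True if the table says the application is switched automatically."""
    _outcome, switched = BEHAVIOR[(configuration, error_location)]
    return switched


for config in ("fencing+ICMP", "fencing+XSCF", "XSCF only"):
    print(config, switches_automatically(config, "physical partition"))
```

Read this way, the I/O fencing function + ICMP shutdown agent configuration is the only one of the three in which the table reports a switchover after a physical partition error.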