
2.2.1 Virtual Machine Function

The virtual machine function allows PRIMECLUSTER systems to operate in virtualized environments for Oracle Solaris.

The following virtualized environments are supported:

  • Oracle VM Server for SPARC environment

  • Oracle Solaris Kernel Zones environment

  • Oracle Solaris Zones environment

Note

When installing PRIMECLUSTER in a virtual machine environment, do not perform the following procedures:

  • Stopping the guest domain and the I/O root domain temporarily (suspend)

  • Restarting the guest domain and the I/O root domain from the temporarily stopped state (resume)

2.2.1.1 Cluster Systems in Oracle VM Server for SPARC Environment

2.2.1.1.1 Cluster System Configuration in Oracle VM Server for SPARC Environment

The following cluster system configurations are supported in an Oracle VM Server for SPARC environment:

  • Cluster system between guest domains (within the same physical partition)

  • Cluster system between guest domains (among the different physical partitions)

  • Cluster system between control domains

See the flowchart below to select a suitable cluster system configuration for your system requirements.

Figure 2.1 Criteria for cluster system configuration in Oracle VM Server for SPARC environment

(*) The I/O fencing function can be used in an environment where the cluster application is configured with two nodes and GDS resources are controlled. For more information on the I/O fencing function and the ICMP shutdown agent, refer to "2.2.3 I/O Fencing Function."

Monitoring and notes for each cluster system configuration are as follows.

Table 2.1 Comparison of configuration, monitoring, and notes

(1) Cluster system between guest domains, within the same physical partition

  Availability of building a cluster: Guest domain: Y / Control domain: N

  Monitoring:

  - The cluster application error on the guest domain or the I/O root domain

  - The OS error on the guest domain or the I/O root domain

  Notes:

  - Since this environment only comprises one physical partition, all of the cluster nodes are stopped when a physical partition failure occurs. Therefore, this mode is not suitable for production operations.

  - Set XSCF for the shutdown agent, and use a configuration without the I/O fencing function.

(2) Cluster system between guest domains, among the different physical partitions (only between guest domains)

  Availability of building a cluster: Guest domain: Y / Control domain: -

  Monitoring:

  - The cluster application error

  - The OS error on the guest domain or the I/O root domain

  - The hardware (network, shared disk, and the route) faults

  Notes:

  - You must build the cluster system between cabinets.

  - When a physical partition error occurs in an environment where the I/O fencing function and the ICMP shutdown agent are not set, the node (guest domain) becomes LEFTCLUSTER because the guest domain cannot be stopped forcibly.

(3) Cluster system between guest domains, among the different physical partitions (between control domains, also between guest domains)

  Availability of building a cluster: Guest domain: Y / Control domain: Y

  Monitoring:

  - The cluster application error

  - The OS error on the control domain, the guest domain, or the I/O root domain

  - The hardware (network, shared disk, and the route) faults

  - The physical partition error

  Notes:

  - You must build the cluster system between cabinets.

  - Make sure to set XSCF for both the shutdown agent of the control domain and the shutdown agent of the guest domain, and use a configuration without the I/O fencing function on the guest domain.

(4) Cluster system between control domains

  Availability of building a cluster: Guest domain: N / Control domain: Y

  Monitoring:

  - The cluster application error on the control domain

  - The control domain OS error

  - The control domain hardware (network, shared disk, and the route) faults

  - The error of the guest domain status (the state displayed by the ldm list-domain command)

  Notes:

  - PRIMECLUSTER does not monitor the status of guest domains and applications.

Availability of building a cluster: Y: Required, N: Not available, -: Not required

Note

  • A tagged VLAN interface cannot be used for the cluster interconnect.

  • In an environment where a cluster application is built in the cluster system on the control domain, if an error occurs in the cluster application built on the control domain, the error may affect the guest domain to which the control domain provides the virtual device.

Cluster system between guest domains within the same physical partition

This configuration enables the cluster system to operate on guest domains or I/O root domains within a single physical partition. This is effective for verifying the operation of cluster applications running on PRIMECLUSTER. The following types of error monitoring are performed in this configuration. This configuration is supported only on SPARC M10 and M12.

  • The cluster application error on the guest domain or the I/O root domain

  • The OS error on the guest domain or the I/O root domain

    Figure 2.2 Cluster system between guest domains within the same physical partition

Note

  • Since this environment comprises a single physical partition, all cluster nodes are stopped when a physical partition failure occurs, and transactions come to a stop. Therefore, this mode is not suitable for production operations.

  • The I/O fencing function cannot be used.

  • Specify the same type for domains in the cluster. A cluster cannot be configured with different types of domains, for example, between the guest domain and I/O root domain, or between the control domain and I/O root domain.

  • When using a virtual disk as a shared disk of a cluster between guest domains in PRIMECLUSTER, you need to specify a timeout option for the virtual disk.

    [Specifying a timeout option]

    When the timeout option is omitted, or 0 is specified, an I/O error does not occur even if the control domain or I/O root domain stops; the virtual disk I/O waits for the recovery of the control domain or I/O root domain.
    When a value greater than 0 is specified for the timeout option, an I/O error occurs after the specified number of seconds has passed.

    The following explains how to specify a timeout option:

    Example 1: Specifying 15 (seconds) to the timeout when assigning a virtual disk.

    # ldm add-vdisk timeout=15 vdisk0 disk0@primary-vds0 guest0

    Example 2: Specifying 15 (seconds) to the timeout for the assigned virtual disk.

    # ldm set-vdisk timeout=15 vdisk0 guest0

    For details on the timeout option, see the Oracle VM Server for SPARC documentation.
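
    You can also check the timeout value currently set for an assigned virtual disk from the control domain. The following is a minimal sketch, assuming the guest domain name guest0 used in the examples above; the timeout value typically appears in the virtual disk details (the TOUT field) of the output:

    # ldm list -o disk guest0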

Cluster system between guest domains among the different physical partitions

This configuration enables the cluster system to operate between guest domains or I/O root domains (including an I/O domain) among different physical partitions. In a cluster system that consists only of guest domains and I/O root domains, the nodes that constitute the cluster may go into the LEFTCLUSTER state when a physical partition failure occurs. To deal with this, set the I/O fencing function and the ICMP shutdown agent on the guest domain, or install PRIMECLUSTER also on the control domain; cluster applications on the guest domain or I/O root domain are then switched over automatically even when a physical partition failure occurs. This configuration is supported only on SPARC M10 and M12.

See

For more information on the I/O fencing function and the ICMP shutdown agent, refer to "2.2.3 I/O Fencing Function."

The following errors are monitored in this configuration:

  • Cluster application errors on a control domain(*1), a guest domain, or an I/O root domain

  • OS errors on a control domain(*1), a guest domain, or an I/O root domain

  • Hardware (network, shared disk and the route) faults

  • Physical partition errors(*1, *2)

    *1) Only when PRIMECLUSTER is built on the control domain

    *2) Only when the I/O fencing function and the ICMP shutdown agent are set.

However, use this configuration with careful consideration of system design because this function limits other functions.

Note

  • When building the cluster system on multiple physical partitions within a single cabinet, transactions come to a stop if the cabinet fails. Therefore, you must build the cluster system between cabinets.

  • Specify the same type for domains in the cluster. A cluster cannot be configured with different types of domains, for example, between the guest domain and I/O root domain, or between the control domain and I/O root domain.

  • When using a virtual disk as a shared disk of a cluster between guest domains in PRIMECLUSTER, you need to specify a timeout option for the virtual disk.

    [Specifying a timeout option]

    When the timeout option is omitted, or 0 is specified, an I/O error does not occur even if the control domain or I/O root domain stops; the virtual disk I/O waits for the recovery of the control domain or I/O root domain.
    When a value greater than 0 is specified for the timeout option, an I/O error occurs after the specified number of seconds has passed.

    The following explains how to specify a timeout option:

    Example 1: Specifying 15 (seconds) to the timeout when assigning a virtual disk.

    # ldm add-vdisk timeout=15 vdisk0 disk0@primary-vds0 guest0

    Example 2: Specifying 15 (seconds) to the timeout for the assigned virtual disk.

    # ldm set-vdisk timeout=15 vdisk0 guest0

    For details on the timeout option, see the Oracle VM Server for SPARC documentation.

In addition, when PRIMECLUSTER is built on the control domain, note the following points as well:

  • Make sure to set XSCF for both the shutdown agent of control domain and the shutdown agent of guest domain.

  • The I/O fencing function cannot be used.

  • When creating the cluster application on the control domain, the guest domain, or the I/O root domain, do not specify the RMS priority (ShutdownPriority) attribute.

  • Set the survival priority of the guest domains or I/O root domains so that it has the same order relation as that of the control domains.

  • When a failure of the control domain (including the cluster application error) is detected and the control domain cannot be forcibly stopped, all the guest domains or all the I/O domains within the failed physical partition are stopped regardless of whether a cluster system exists. This is because the physical partition is forcibly stopped.

  • When a virtual I/O is set on the control domain, the guest domain within the failed physical partition may be stopped regardless of whether a cluster system exists.

Figure 2.3 Cluster system between guest domains among the different physical partitions (only between guest domains)

Figure 2.4 Cluster system between guest domains among the different physical partitions (between control domains, also between guest domains)

Figure 2.5 Switching image when the physical partition failure occurred

Cluster system between control domains

This configuration applies PRIMECLUSTER to the control domain in an environment where the guest domain is configured, so that the cluster on the control domain can monitor the state of the guest domain.

In this configuration, a failover of the control domain starts the guest domain on the other control domain, so the operation can continue even when the hardware (networks and disks) fails. Applying PRIMECLUSTER to the control domain monitors the following failures that disable applications on guest domains:

  • The cluster application error on the control domain

  • The control domain OS error

  • The control domain hardware (network, shared disk and the route) fault

  • The guest domain status error (the state displayed by the ldm list-domain command)

When a failure occurs, the guest domain is switched to the standby system, which realizes a highly reliable guest domain environment.
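
For reference, the guest domain status that PRIMECLUSTER monitors in this configuration can also be checked manually on the control domain with the ldm list-domain command. A minimal sketch, assuming a guest domain named guest0 (a hypothetical name); the STATE column shows values such as active, bound, or inactive:

# ldm list-domain guest0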

Figure 2.6 Cluster system between control domains

Note

  • PRIMECLUSTER does not monitor the status of guest domains and applications.

When using the cluster system between control domains, the only redundant line control method supported by GLS is the NIC switching mode.

Note

  • GLS must be installed in both control and guest domains.

  • The I/O used in a guest domain must consist only of virtual disks provided by a control domain.

  • Multiple guest domains on the same control domain cannot share a GDS shared class. When configuring multiple guest domains, create shared classes separately for each guest domain.

  • When a failure of the control domain (including the cluster application error) is detected and the control domain cannot be forcibly stopped, all the guest domains or all the I/O domains within the failed physical partition are stopped regardless of whether a cluster system exists. This is because the physical partition is forcibly stopped.

  • When the virtual I/O is set on the control domain, the guest domain within the failed physical partition may be stopped regardless of whether a cluster system exists.

2.2.1.1.2 Migration for a Cluster System in Oracle VM Server for SPARC Environment

The following two types of the Migration function can be used for a cluster system in an Oracle VM Server for SPARC environment:

  • Live Migration: migrating an active guest domain

  • Cold Migration: migrating a bound or inactive guest domain

These functions can be used in combination with ServerView Resource Orchestrator Cloud Edition.

The Migration function of Oracle VM Server for SPARC can be used in the following cluster system configurations:
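
For reference, a live migration of a guest domain is typically started on the source control domain with the ldm migrate-domain command. The following is a minimal sketch, not a procedure from this guide; the guest domain name guest0 and the target machine name target-host are hypothetical:

# ldm migrate-domain guest0 target-host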

2.2.1.1.3 When Migrating a Cluster System in the Physical Environment to a Guest Domain in Oracle VM Server for SPARC Environment (Physical to Virtual)

You can migrate a cluster system that uses PRIMECLUSTER 4.2A00 or later in the physical environment to a guest domain (or I/O root domain) in an Oracle VM Server for SPARC environment (Physical to Virtual: hereafter referred to as P2V). (Only SPARC M10 and M12 are supported.)


Figure 2.12 Cluster system before migration

Figure 2.13 Cluster system after migration

System requirements for migration
  • PRIMECLUSTER version

    PRIMECLUSTER 4.2A00 or later

  • Supported OS

    Solaris 10

  • GLS redundant line switching method

    NIC switching mode and GS/SURE linkage mode

  • File system in a shared disk

    UFS, ZFS, and GFS (only for Solaris 10)

Note

  • The disk size of GDS volumes in a shared disk must be the same in the migration source and migration destination.

  • You must migrate user data beforehand, using ETERNUS storage migration or a LUN-to-LUN copy function such as REC.

  • Use GDS and GLS functions in the cluster system on a guest domain after migration.

Use the same configuration as the migration source after the migration.

See

If a setting has to be changed after completing the cluster system migration, see the following to change the setting:

  • "Part 4 System Configuration Modification"

  • "PRIMECLUSTER Global Link Services Configuration and Administration Guide 4.5: Redundant Line Control Function"

  • "PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.5"

  • "PRIMECLUSTER Global File Services Configuration and Administration Guide 4.5"

Note

There are some important points when using a cluster system in an Oracle VM Server for SPARC Environment. For details, see "14.2 Precautions on Using Cluster Systems in Oracle VM Server for SPARC Environments."

2.2.1.2 Cluster System Operating in Oracle Solaris Kernel Zones Environment

The following cluster system configurations are supported in an Oracle Solaris Kernel Zones environment. You can build an Oracle Solaris Kernel Zones environment in the physical environment, or on the control domain or guest domain in an Oracle VM Server for SPARC environment. In this section, the guest domain includes the I/O root domain.

Monitoring and notes for each cluster system configuration are as follows.

Table 2.2 Comparison of configuration, monitoring, and notes

(1) Cluster system between Kernel Zones within the same physical partition (physical environment/control domain)

  Availability of building a cluster: Kernel Zones: Y / Guest domain: - / Control domain: N

  Monitoring:

  - The cluster application error on the Kernel Zone

  - The OS error on the Kernel Zone

  Notes:

  - Since this environment only comprises one physical partition, all of the cluster nodes are stopped when a physical partition failure occurs. Therefore, this mode is not suitable for production operations.

(2) Cluster system between Kernel Zones within the same physical partition (guest domain)

  Availability of building a cluster: Kernel Zones: Y / Guest domain: Y / Control domain: N

  Monitoring:

  - The cluster application error on the Kernel Zone or the guest domain

  - The OS error on the Kernel Zone or the guest domain

  Notes:

  - Since this environment only comprises one physical partition, all of the cluster nodes are stopped when a physical partition failure occurs. Therefore, this mode is not suitable for production operations.

(3) Cluster system between Kernel Zones among different physical partitions (physical environment/control domain)

  Availability of building a cluster: Kernel Zones: Y / Guest domain: - / Control domain: Y

  Monitoring:

  - The cluster application error

  - The OS error in the physical environment, on the control domain, or on the Kernel Zone

  - The hardware (network, shared disk, and the route) failures

  - The physical partition error

  Notes:

  - You must build the cluster system between cabinets.

(4) Cluster system between Kernel Zones among different physical partitions (guest domain)

  Availability of building a cluster: Kernel Zones: Y / Guest domain: Y / Control domain: -

  Monitoring:

  - The cluster application error

  - The OS error in the physical environment, on the control domain (*), the guest domain, or the Kernel Zone

  - The hardware (network, shared disk, and the route) failures

  - The physical partition error (*)

  Notes:

  - You must build the cluster system between cabinets.

  - If you do not build PRIMECLUSTER in the physical environment or on the control domain, the node (guest domain) becomes LEFTCLUSTER when a physical partition error occurs because the guest domain cannot be stopped forcibly. By building PRIMECLUSTER in the physical environment or on the control domains, the cluster application is switched over automatically.

Availability of building a cluster: Y: Required, N: Not available, -: Not required

*) Only when PRIMECLUSTER is built in the physical environment or on the control domain

Note

  • A tagged VLAN interface cannot be used for the cluster interconnect.

  • In the environment where a cluster application is built in the cluster system on the control domain or guest domain, if a cluster application error occurs on the control domain or guest domain, the error may affect Kernel Zones.

Cluster system between Kernel Zones within the same physical partition (physical environment/control domain)

This configuration enables the cluster system to operate between Kernel Zones that are built in the physical environment or on the control domain within a single physical partition. This is effective for verifying the operation of cluster applications running on PRIMECLUSTER. The following types of error monitoring are performed in this configuration:

  • The cluster application error on the Kernel Zone

  • The OS error on the Kernel Zone

Figure 2.14 Cluster system between Kernel Zones within the same physical partition (physical environment/control domain)

Note

Since this environment only comprises one physical partition, all of the cluster nodes are stopped when a physical partition failure occurs. Therefore, this mode is not suitable for production operations.

Cluster system between Kernel Zones within the same physical partition (guest domain)

This configuration enables the cluster system to operate between Kernel Zones that are built on the guest domain within a single physical partition. This is effective for verifying the operation of cluster applications running on PRIMECLUSTER.

In a cluster system installed only on the Kernel Zones, the nodes that constitute the cluster may go into the LEFTCLUSTER state when a guest domain error occurs. To deal with this, install PRIMECLUSTER also on the guest domain; the cluster applications on the Kernel Zones can then be switched over automatically even when a guest domain error occurs.

The following types of error monitoring are performed in this configuration:

  • The cluster application error on the Kernel Zone or the guest domain

  • The OS error on the Kernel Zone or the guest domain

However, use this configuration with careful consideration of system design because this function limits other functions, such as disabling the RMS priority (ShutdownPriority) setting.

Figure 2.15 Cluster system between Kernel Zones within the same physical partition (guest domain)

Note

  • Since this environment only comprises one physical partition, all of the cluster nodes are stopped when a physical partition failure occurs. Therefore, this mode is not suitable for production operations.

  • When creating the cluster application on the guest domain, I/O root domain, or Kernel Zones, do not specify the RMS priority (ShutdownPriority) attribute.

  • Set the survival priority of the Kernel Zones so that it has the same order relation as that of the guest domains.

  • Specify the same type for domains in the cluster. A cluster cannot be configured with different types of domains, for example, between the guest domain and I/O root domain, or between the control domain and I/O root domain.

Cluster system between Kernel Zones among different physical partitions (control domain)

This configuration enables the cluster system to operate between Kernel Zones on the control domain among different physical partitions. In a cluster system installed only on the Kernel Zones on the control domain, the nodes that constitute the cluster may go into the LEFTCLUSTER state when a physical partition error occurs. To deal with this, install PRIMECLUSTER also on the control domain; the cluster applications on the Kernel Zones can then be switched over automatically even when a physical partition error occurs.

However, use this configuration with careful consideration of system design because this function limits other functions, such as disabling the RMS priority (ShutdownPriority) setting.

Figure 2.16 Cluster system between Kernel Zones among different physical partitions (control domain)

Note

  • When creating the cluster application on the control domain, guest domain, or Kernel Zones, do not specify the RMS priority (ShutdownPriority) attribute.

  • Set the survival priority of the Kernel Zones so that it has the same order relation as that of the physical environment/control domain.

  • When building a cluster application on the control domain, if the control domain to be switched cannot be forcibly stopped due to a cluster application error on the control domain, the physical partition is forcibly stopped. Therefore, all the guest domains, I/O root domains, or Kernel Zones within the failed physical partition are stopped regardless of whether a cluster system exists.

  • When a virtual I/O is set on the control domain, the guest domain within the failed physical partition may be stopped regardless of whether a cluster system exists.

  • Specify the same type for domains in the cluster. A cluster cannot be configured with different types of domains, for example, between the guest domain and I/O root domain, or between the control domain and I/O root domain.

Cluster system between Kernel Zones among different physical partitions (guest domain)

This configuration enables the cluster system to operate between Kernel Zones built on the guest domain among different physical partitions. In a cluster system installed only on the Kernel Zones on the guest domain, the nodes that constitute the cluster may go into the LEFTCLUSTER state when a physical partition error or an OS error on the guest domain occurs. To deal with this, install PRIMECLUSTER also on the control domain; the cluster applications on the Kernel Zones can then be switched over automatically even when a physical partition error or an OS error on the guest domain occurs. The following types of error monitoring are performed in this configuration:

  • The cluster application error

  • The OS error on the control domain (*), the guest domain, or the Kernel Zone

  • The hardware (network, shared disk, and the route) failures

  • The physical partition error (*)

    *) Only when PRIMECLUSTER is built on the control domain

However, use this configuration with careful consideration of system design because this function limits other functions, such as disabling the RMS priority (ShutdownPriority) setting.

Figure 2.17 Cluster system between Kernel Zones among different physical partitions (guest domain)

Note

  • When a cluster application is built on the control domain or the guest domain, a switchover caused by a cluster application error on the control domain or the guest domain may forcibly stop that domain. In that case, all the guest domains, I/O root domains, or Kernel Zones within the failed physical partition are stopped regardless of whether a cluster system exists. In addition, if the control domain cannot be forcibly stopped, all the guest domains or all the I/O domains within the failed physical partition are stopped regardless of whether a cluster system exists. This is because the physical partition is forcibly stopped.

  • A cluster cannot be configured with different types of domains, for example, between the guest domain and I/O root domain, or between the control domain and I/O root domain.

In addition, when PRIMECLUSTER is built on the control domain or the guest domain, note the following points as well:

  • When creating the cluster application on the control domain, guest domain, I/O root domain, or Kernel Zones, do not specify the RMS priority (ShutdownPriority) attribute.

  • Set the survival priority of the Kernel Zones, guest domains, or I/O root domains so that it has the same order relation as that of the control domain.

  • When a virtual I/O is set on the control domain, the guest domain or Kernel Zone within the failed physical partition may be stopped regardless of whether a cluster system exists.

2.2.1.3 Cluster System Operating in Oracle Solaris Zones Environment

In an Oracle Solaris Zones environment, the applications on a non-global zone enter an inoperable state when an error occurs in the global zone or the non-global zone.
Applying PRIMECLUSTER to the global zone and non-global zone provides status monitoring and a switchover function. Through these means, it becomes possible to switch over to a standby system when an error occurs, and to achieve high reliability for the non-global zone.
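
As background, the state of each zone can be checked on the global zone with the standard Solaris zoneadm command; this is a minimal sketch and involves no PRIMECLUSTER-specific options. The -c option also lists configured but not running zones, and -v produces a verbose listing that includes each zone's status:

# zoneadm list -cv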

You can build Oracle Solaris Zones environments on guest domains in Oracle VM Server for SPARC environments (only for SPARC M10 and M12) as well as in physical server environments.

In addition, when the global zone is Solaris 10, the existing systems running on Solaris 8 or Solaris 9 can also be run on Solaris 10 by migrating them to the non-global zone with Oracle Solaris Legacy Containers (OSLC). (*1)

PRIMECLUSTER provides a status monitoring and switchover function for the non-global zone running on Solaris 8 or Solaris 9. Through these means, it becomes possible to switch over to a standby system in the event of an error occurring, and to achieve high reliability for the non-global zone running on Solaris 8 or Solaris 9.

(*1) To check whether a middleware product can be used in a non-global zone created with Oracle Solaris Legacy Containers, see the manual of the respective middleware product.

Figure 2.18 Switchover for When a Global Zone OS Error Occurs

Figure 2.19 Switchover for When some Application Error Occurs in a Non-Global Zone

If using a cluster system comprised of three or more nodes, you can consolidate standby servers by preparing one standby server for multiple operational servers. An example is shown below.

Figure 2.20 Switchover for When an OS Error for a Global Zone on a Three-Node Configuration Zones Environments Occurs

If using a single-node cluster comprised of one node, the statuses of the OS and applications on the non-global zone are monitored. Availability is increased by automatically restarting the non-global zone or an application on the non-global zone to perform recovery when an error is detected. An example is shown in the following figure.

Figure 2.21 The Operations When an OS Error for a Non-Global Zone on a Single-Node Cluster Operation Zones Environments Occurs

Note

  • It is not possible to change the "cluster name" or "CF node name" in the non-global zone.

  • The following functions or commands cannot be used in the non-global zone:

    • Automatic configure

    • Shared disk device connection confirmation

    • Operator intervention

    • Fault resource identification

    • Patrol diagnosis

    • clsyncfile (distributes a file between cluster nodes)

  • Operations are not taken over between non-global zones operating on the same global zone.

  • In the environment where a cluster application is built in the cluster system on the global zone, if an error occurs in the cluster application that is built on the global zone, the error may affect the non-global zone that is related to the global zone.