PRIMECLUSTER Installation and Administration Guide 4.6 Cloud Services
FUJITSU Software

A.1.2 System Configuration

This subsection describes the system configuration of the smart workload recovery feature. Design and deploy your AWS environment based on your system configuration. For deployment instructions, see "A.2 Installation".

Point

For an overview of the smart workload recovery feature, see "1.9 Smart workload recovery" in the "PRIMECLUSTER Concept Guide".

For more information about AWS resources and services, see the official AWS documentation.

System configuration

Smart workload recovery is a single-node cluster system that consists of an operational system only and has no standby system. If a failure occurs, the instance is removed from the AZ in which the cluster node instance is running, and a new instance is launched in another AZ.

Before you can switch to an AZ, you must create a network environment, such as a subnet, in that AZ.

Resource monitor and switcher work with services such as Amazon CloudWatch and AWS Lambda to provide instance switching. Tags identify the cluster node instance and the subnet to which it switches, so you must set the specified tags on your instances and subnets.
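The role of tags in choosing a switch destination can be illustrated with a minimal sketch. It works on plain in-memory objects rather than on AWS itself. The subnet tag key `example.switch-target`, the `Subnet` class, and the `candidate_subnets` function are all hypothetical names introduced for illustration; the actual tag keys and values are the ones specified in the installation procedure.

```python
from dataclasses import dataclass, field

# Hypothetical tag key that marks a subnet as a switch destination.
# The real key is defined by the product, not by this sketch.
SUBNET_TAG_KEY = "example.switch-target"


@dataclass
class Subnet:
    subnet_id: str
    az: str
    tags: dict = field(default_factory=dict)


def candidate_subnets(subnets, node_id, source_az):
    """Return subnets tagged for node_id that are outside source_az."""
    return [
        s for s in subnets
        if s.tags.get(SUBNET_TAG_KEY) == node_id and s.az != source_az
    ]


subnets = [
    Subnet("subnet-a", "az-1", {SUBNET_TAG_KEY: "1"}),  # source AZ, excluded
    Subnet("subnet-b", "az-2", {SUBNET_TAG_KEY: "1"}),  # tagged, other AZ
    Subnet("subnet-c", "az-3", {}),                     # untagged, excluded
]
targets = candidate_subnets(subnets, node_id="1", source_az="az-1")
print([s.subnet_id for s in targets])
```

In this sketch an untagged subnet is never a candidate, which mirrors why the tags must be set before a switch can take place.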

Figure A.1 System configuration

Component Description

Each component is described below, together with an overview of the tasks required to design and deploy an AWS environment. The tasks that are required at implementation time are performed in "A.2 Installation" and later.

Point

About Creating Multiple Resources

When you design a system that uses the smart workload recovery feature, you might need multiple resources of the same type. The following describes each such case and how to design for it.

  • Region

    If you want to use multiple regions, design them as follows.

    • Amazon DynamoDB for switcher

      You must have one Amazon DynamoDB table for each region.

    • Tags to set for instances on cluster nodes ([fujitsu.pclswr.id] key)

      The value of the [fujitsu.pclswr.id] tag key must be an integer that is unique for each instance. The value must be unique within each region.

  • VPC

    If you use multiple VPCs, design them as follows.

    • Blackhole Security Group

      You must have one Blackhole security group for each VPC.

    • AWS Lambda for switcher

      You must have one AWS Lambda function for each VPC.

    • Amazon EventBridge for switcher

      You must have one set of Amazon EventBridge events (two events) for each VPC.

  • Cluster node instances

    If you use multiple cluster node instances, design them as follows.

    • ELB

      Prepare an ELB (NLB or ALB) for each instance.

    • EFS

      Prepare an EFS for each instance to store the data that you want to share.

    • Amazon CloudWatch for Resource monitor

      You must have one set of Amazon CloudWatch alarms (two alarms) for each instance.
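The sizing rules above can be checked against a planned inventory with a short sketch like the following. It is only an illustration under stated assumptions: the `Node` class, the function names, and the result keys are hypothetical, and the sketch reads the ELB and EFS rules as one per cluster node instance.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    region: str
    vpc: str
    pclswr_id: int  # value of the [fujitsu.pclswr.id] tag key


def required_resources(nodes):
    """Count resources per region, per VPC, and per instance."""
    regions = {n.region for n in nodes}
    vpcs = {(n.region, n.vpc) for n in nodes}
    return {
        "dynamodb_tables": len(regions),         # one per region
        "blackhole_security_groups": len(vpcs),  # one per VPC
        "lambda_functions": len(vpcs),           # one per VPC
        "eventbridge_events": 2 * len(vpcs),     # one set of two per VPC
        "load_balancers": len(nodes),            # assumed one per instance
        "efs_file_systems": len(nodes),          # assumed one per instance
        "cloudwatch_alarms": 2 * len(nodes),     # one set of two per instance
    }


def duplicate_ids(nodes):
    """Return (region, id) pairs that more than one instance uses."""
    counts = Counter((n.region, n.pclswr_id) for n in nodes)
    return sorted(key for key, count in counts.items() if count > 1)


nodes = [
    Node("region-1", "vpc-a", 1),
    Node("region-1", "vpc-a", 2),
    Node("region-1", "vpc-b", 2),  # reuses id 2 within region-1
    Node("region-2", "vpc-c", 1),  # id 1 again, but in another region
]
print(required_resources(nodes))
print(duplicate_ids(nodes))
```

With this inventory the second call reports the pair `("region-1", 2)`, because two instances in the same region share an ID, while the reuse of ID 1 across regions is not flagged.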

Operation

The following describes the behavior of each component when a cluster application error or an RMS error occurs in a system that uses the smart workload recovery feature in an AWS environment.

Figure A.2 How smart workload recovery works in AWS environment

When an error occurs in a cluster application
  1. RMS detects the cluster application error.

  2. RMS notifies resource monitor by running FaultScript and shutting down the instance.

  3. When resource monitor receives the error notification, it requests switcher to switch.

  4. Switcher updates the tables in Amazon DynamoDB.

  5. Switcher retrieves the AMI of the source instance.

  6. Switcher destroys the source instance.

  7. Switcher launches an instance in an AZ that is different from the source AZ.

  8. Switcher switches the instance in the load balancer target groups.

When an RMS error occurs
  1. CloudWatch Agent detects the RMS error.

  2. CloudWatch Agent notifies resource monitor.

  3. When resource monitor receives the error notification, it requests switcher to switch.

  4. Switcher updates the tables in Amazon DynamoDB.

  5. Switcher retrieves the AMI of the source instance.

  6. Switcher destroys the source instance.

  7. Switcher launches an instance in an AZ that is different from the source AZ.

  8. Switcher switches the instance in the load balancer target groups.
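The two sequences differ only in steps 1 and 2, that is, in which component detects the error and how resource monitor is notified. The following sketch replays both sequences on an in-memory dictionary to make the ordering explicit. It does not call AWS; the `run_switch` function, the state keys, and the `last_source_az` table field are hypothetical names chosen for illustration.

```python
def run_switch(state, trigger):
    """Apply steps 1 to 8 to an in-memory state and return the step log."""
    if trigger == "cluster_application":
        log = ["1 RMS detects the error",
               "2 RMS runs FaultScript and shuts down the instance"]
    elif trigger == "rms":
        log = ["1 CloudWatch Agent detects the error",
               "2 CloudWatch Agent notifies resource monitor"]
    else:
        raise ValueError(f"unknown trigger: {trigger}")

    log.append("3 resource monitor requests a switch")

    source_az = state["az"]
    state["table"]["last_source_az"] = source_az  # hypothetical table field
    log.append("4 switcher updates the table")

    ami = state["ami"]
    log.append(f"5 switcher retrieves AMI {ami}")

    state["az"] = None  # no instance exists between steps 6 and 7
    log.append(f"6 switcher removes the instance in {source_az}")

    # Only AZs with a network environment prepared in advance are eligible.
    target_az = next(
        (az for az in state["prepared_azs"] if az != source_az), None)
    if target_az is None:
        raise RuntimeError("no prepared AZ other than the source AZ")
    state["az"] = target_az
    log.append(f"7 switcher launches an instance from {ami} in {target_az}")

    state["target_group"] = [target_az]
    log.append("8 switcher updates the load balancer target group")
    return log


state = {"az": "az-1", "ami": "ami-example",
         "prepared_azs": ["az-1", "az-2"],
         "table": {}, "target_group": ["az-1"]}
for line in run_switch(state, "rms"):
    print(line)
```

The sketch raises an error at step 7 when no other AZ has been prepared, which reflects the requirement that the network environment of the destination AZ must exist before a switch.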

The instance inherits the following when you switch.

Note

When you switch instances, only the CloudWatch alarms required by resource monitor are guaranteed to be carried over to the new instance.