PRIMECLUSTER Installation and Administration Guide 4.6 Cloud Services
FUJITSU Software

A.2.4 Construction of Switcher and Resource monitor

This section provides step-by-step instructions for building the switcher and the resource monitor.

A.2.4.1 Configuring AWS Lambda

This subsection describes how to configure AWS Lambda for the switcher.

Creating Blackhole Security Groups

Create a blackhole security group as described in "A.1.3.5.4 Rules for Blackhole Security Group".

Creating Execution Role

Create the IAM role to be specified as the AWS Lambda execution role, following the design in "A.1.5 Policy Design".

Creating an AWS Lambda Function

Select the "Create From Scratch" option and create a Lambda function with the following settings.

| Setting item | Value |
| --- | --- |
| Function name | Fujitsu-Pclswr-Function-Switcher-"Any string". "Any string" must be unique for the VPC. Please specify "Any string" referring to "About Creating Multiple Resources" in "A.1.2 System Configuration". Function names must conform to AWS rules. |
| Runtime | Python 3.7 |
| Default Execution Role | Specify the IAM role that you created in "Creating Execution Role". |
| Enable Network | Uncheck |

After you create the Lambda function, continue to modify settings on the following Lambda function tabs.

Code

Upload the Lambda function zip file from the software DVD.

The Lambda function zip file is located at the following path:

Tool/pclswr/FJSVpcl-swr-switcher.zip

Set the following values for "Handler" in "Runtime Settings".

| Setting item | Value |
| --- | --- |
| Handler | pclswr.lambda_handler |

Configuration

Under General configuration, set the timeout as follows.

| Setting item | Value |
| --- | --- |
| Timeout | 15 min 0 sec |

Open Environment variables and add the following environment variables.

| Key | Value |
| --- | --- |
| PCLSWR_SYSTEM_LIST | Specify the list of cluster nodes to be handled by the switcher. Separate multiple cluster nodes with spaces. |
| PCLSWR_BLACKHOLE | Specify the blackhole security group identifier. |
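The two environment variables above can be assembled and checked before they are entered in the console. The following is a minimal sketch, not part of the product: the helper name `build_switcher_env` is hypothetical, `node1` and `node2` in the usage line are placeholder node entries (the form of each entry is not specified in this subsection), and the check assumes that the blackhole security group identifier is an AWS security group ID beginning with `sg-`.

```python
import re

# Assumption: the identifier is an AWS security group ID ("sg-" plus hex digits).
SG_ID_PATTERN = re.compile(r"^sg-[0-9a-f]+$")


def build_switcher_env(cluster_nodes, blackhole_sg_id):
    """Return the switcher environment variables as a dict of strings."""
    nodes = [node.strip() for node in cluster_nodes if node.strip()]
    if not nodes:
        raise ValueError("at least one cluster node is required")
    # The list is space-separated, so a single entry must not contain whitespace.
    if any(len(node.split()) != 1 for node in nodes):
        raise ValueError("a cluster node entry must not contain whitespace")
    if not SG_ID_PATTERN.match(blackhole_sg_id):
        raise ValueError("expected a security group ID such as sg-0123abcd")
    return {
        "PCLSWR_SYSTEM_LIST": " ".join(nodes),
        "PCLSWR_BLACKHOLE": blackhole_sg_id,
    }


# Placeholder values for illustration only.
env = build_switcher_env(["node1", "node2"], "sg-0123456789abcdef0")
```

The returned dictionary holds exactly the key and value strings to enter for the Lambda function.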

A.2.4.2 Configuring Amazon EventBridge

This subsection describes how to configure Amazon EventBridge for the switcher and the resource monitor.

Create the following two Amazon EventBridge rules, one for each event type, with the settings shown.

Instance State Change Event

| Classification | Setting item | Value |
| --- | --- | --- |
| Name and description | Name | Fujitsu-Pclswr-Rule-InstanceStopped-"Any string". "Any string" must be unique for the VPC. Please specify "Any string" referring to "About Creating Multiple Resources" in "A.1.2 System Configuration". Rule names must conform to AWS rules. |
| Define pattern | Pattern | Event pattern |
| Define pattern | Event matching pattern | Pre-defined pattern by service |
| Define pattern | Service provider | AWS |
| Define pattern | Service name | EC2 |
| Define pattern | Event type | EC2 Instance State-change Notification |
| Define pattern | State | Specific state(s): stopped |
| Define pattern | Instance | Any instance |
| Select targets | Target | Lambda function |
| Select targets | Function | Specify the Lambda function that you created in "A.2.4.1 Configuring AWS Lambda". |
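For reference, the pre-defined pattern selected above corresponds to an EventBridge event pattern in JSON. The sketch below builds that pattern with the Python standard library. The function name `instance_stopped_pattern` is hypothetical, and the output should be compared with the pattern that the console actually generates.

```python
import json


def instance_stopped_pattern():
    """Event pattern for EC2 state-change notifications with state 'stopped'."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        # "Any instance": no instance-id filter is added under "detail".
        "detail": {"state": ["stopped"]},
    }


print(json.dumps(instance_stopped_pattern(), indent=2))
```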

CloudWatch Alarm State Change Events

| Classification | Setting item | Value |
| --- | --- | --- |
| Name and description | Name | Fujitsu-Pclswr-Rule-CloudWatchStatusCheck-"Any string". "Any string" must be unique for the VPC. Please specify "Any string" referring to "About Creating Multiple Resources" in "A.1.2 System Configuration". Rule names must conform to AWS rules. |
| Define pattern | Pattern | Event pattern |
| Define pattern | Event matching pattern | Pre-defined pattern by service |
| Define pattern | Service provider | AWS |
| Define pattern | Service name | CloudWatch |
| Define pattern | Event type | CloudWatch Alarm State Change |
| Select targets | Target | Lambda function |
| Select targets | Function | Specify the Lambda function that you created in "A.2.4.1 Configuring AWS Lambda". |
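To illustrate what this rule forwards to the Lambda function, the sketch below pairs the corresponding event pattern with a deliberately simplified matcher. The `matches` function is hypothetical and covers only exact-value matching on nested fields, a small subset of what EventBridge supports. The sample event is abbreviated, and its alarm name is a placeholder.

```python
ALARM_STATE_CHANGE_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
}


def matches(pattern, event):
    """Simplified matching: every pattern key must be present in the event,
    and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Abbreviated sample event; real events carry more fields.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {
        "alarmName": "Fujitsu-Pclswr-Alarm-Instance-RMS-1",
        "state": {"value": "ALARM"},
    },
}

is_forwarded = matches(ALARM_STATE_CHANGE_PATTERN, sample_event)
```

Because the pattern contains no filter on the alarm name, every alarm state change that reaches the rule is passed to the function.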

A.2.4.3 Configuring CloudWatch Alarm

This subsection describes how to set up the CloudWatch alarms used by the resource monitor.

If you created instances in "A.2.3.1 Creating Cluster Node Instance", create a CloudWatch Alarm for each instance.

Create two CloudWatch alarms for each instance: one for the switcher and one for CloudWatch Agent. Specify the following settings for each alarm.

Switcher alarms

| Category | Setting item | Value |
| --- | --- | --- |
| Select Metrics | AWS Namespace | EC2 |
| Select Metrics | Metric | Per-Instance Metrics |
| Select Metrics | Instance name (InstanceId) | Select an instance of a cluster node. |
| Select Metrics | Metric name | StatusCheckFailed. Select StatusCheckFailed for the instance on the screen. |
| Metric | Statistic | Maximum |
| Metric | Period | 1 minute |
| Conditions | Threshold types | Static |
| Conditions | Alarm condition | Greater/Equal |
| Conditions | Threshold | 0.99 |
| Conditions | Datapoints to alarm | 2/2 |
| Conditions | Missing data treatment | Treat missing data as missing |
| Action settings | Notification | None. Click the [Delete] button to delete the notification, which turns notification off. |
| Action settings | Auto Scaling action | None |
| Action settings | EC2 action | None |
| Action settings | Systems Manager action | None |
| Name and description | Alarm name | Fujitsu-Pclswr-Alarm-Instance-StatusCheckFailed-"Integer value identifying cluster node". The integer value that identifies the cluster node must be unique for each region. For the integer value, refer to "About Creating Multiple Resources" in "A.1.2 System Configuration". |
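The same settings can be expressed as parameters of the CloudWatch PutMetricAlarm API, which may help when the alarm has to be created for many instances. The sketch below only builds the parameter dictionary and does not call AWS. The function name `status_check_alarm_params` and the instance ID in the usage line are placeholders, and the mapping from console labels to API parameters is an interpretation of the table above, so check it against the alarm that the console creates.

```python
def status_check_alarm_params(instance_id, node_number):
    """Build PutMetricAlarm parameters for the switcher alarm of one node."""
    return {
        "AlarmName": f"Fujitsu-Pclswr-Alarm-Instance-StatusCheckFailed-{node_number}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # 1 minute, in seconds
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 0.99,
        "EvaluationPeriods": 2,  # "2/2": 2 datapoints out of 2 periods
        "DatapointsToAlarm": 2,
        "TreatMissingData": "missing",
        # No notification, Auto Scaling, EC2, or Systems Manager actions are set.
    }


# Placeholder instance ID and node number for illustration only.
params = status_check_alarm_params("i-0123456789abcdef0", 1)
```

With the AWS SDK for Python, such a dictionary could be passed as keyword arguments to the CloudWatch client's `put_metric_alarm` method.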

CloudWatch Agent Alarms

| Category | Setting item | Value |
| --- | --- | --- |
| Select Metrics | Custom Namespace | RMS. The value that you set for namespace in the CloudWatch Agent configuration file is displayed. |
| Select Metrics | Dimension | InstanceId, InstanceType, pattern, pid_finder |
| Select Metrics | Instance name (InstanceId) | Select an instance of a cluster node. |
| Select Metrics | Metric name | procstat_lookup_pid_count. Select procstat_lookup_pid_count for the instance on the screen. |
| Metric | Statistic | Average |
| Metric | Period | 1 minute |
| Conditions | Threshold types | Static |
| Conditions | Alarm condition | Lower/Equal |
| Conditions | Threshold | 0 |
| Conditions | Datapoints to alarm | 1/1 |
| Conditions | Missing data treatment | Treat missing data as missing |
| Action settings | Notification | None. Click the [Delete] button to delete the notification, which turns notification off. |
| Action settings | Auto Scaling action | None |
| Action settings | EC2 action | None |
| Action settings | Systems Manager action | None |
| Name and description | Alarm name | Fujitsu-Pclswr-Alarm-Instance-RMS-"Integer value identifying cluster node". The integer value that identifies the cluster node must be unique for each region. For the integer value, refer to "About Creating Multiple Resources" in "A.1.2 System Configuration". |

Point

  • Create an alarm for CloudWatch Agent after you start collecting metrics on CloudWatch Agent.

  • CloudWatch Agent alarms are configured for alive monitoring, in order to detect RMS outages.

  • Whether RMS has stopped is determined by whether the "/opt/SMAW/SMAWRrms/bin/bm" process has stopped. The condition is satisfied when the number of "/opt/SMAW/SMAWRrms/bin/bm" processes (pid_count) is 0 or less. The condition is evaluated once a minute. If it is satisfied at an evaluation, the alarm transitions from OK to ALARM.

  • Alarms can be in the following states. When the state changes from OK to ALARM, instance switching is performed.

    • OK (the metric is within the defined threshold)

    • ALARM (the metric is outside the defined threshold)

    • INSUFFICIENT_DATA (CloudWatch Agent stopped, metrics unavailable, and so on)

  • To create an alarm, see "Create a CloudWatch Alarm based on a static threshold" in the "Amazon CloudWatch User Guide".
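The judgment condition and the state transition described in the points above can be sketched as follows. This is a simplified model that assumes one datapoint per evaluation (the 1/1 setting) and represents a missing datapoint as `None`. The function names are hypothetical, and the code is neither the CloudWatch evaluation logic nor the switcher implementation.

```python
def evaluate_rms_alarm(pid_count):
    """Return the alarm state for one evaluation of the RMS process count."""
    if pid_count is None:
        # No datapoint, for example because CloudWatch Agent has stopped.
        return "INSUFFICIENT_DATA"
    # Alarm condition: Lower/Equal with threshold 0.
    return "ALARM" if pid_count <= 0 else "OK"


def triggers_switch(previous_state, new_state):
    """Switching is performed only on a transition from OK to ALARM."""
    return previous_state == "OK" and new_state == "ALARM"


# Example: the bm process was running, then disappeared one minute later.
before = evaluate_rms_alarm(1)
after = evaluate_rms_alarm(0)
switch_needed = triggers_switch(before, after)
```

In this model, a transition from INSUFFICIENT_DATA to ALARM does not count as a trigger, which follows the wording of the points above.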

A.2.4.4 Configuring Amazon DynamoDB

This subsection describes how to configure Amazon DynamoDB, which is required for the switcher.

The switcher relies on Amazon DynamoDB to manage information about the cluster nodes. Create a table in Amazon DynamoDB, and create one item in it for each cluster node. If you add a cluster node later, add an item for it.

Creating Tables

Create an Amazon DynamoDB table with the following settings.

| Setting item | Value |
| --- | --- |
| Table name | Fujitsu-Pclswr-DB-Switcher |
| Partition key | SystemID / Number |
| Sort Key | InstanceID / String |
| Settings | Default settings |
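As a cross-check of the key design, the table settings can be written as parameters of the DynamoDB CreateTable API. The sketch below only builds the dictionary. The function name `switcher_table_params` is hypothetical, and the capacity setting that CreateTable also requires (BillingMode or ProvisionedThroughput) is deliberately left out, because this subsection says only "Default settings".

```python
def switcher_table_params():
    """Key-related CreateTable parameters for the switcher table."""
    return {
        "TableName": "Fujitsu-Pclswr-DB-Switcher",
        "AttributeDefinitions": [
            {"AttributeName": "SystemID", "AttributeType": "N"},    # Number
            {"AttributeName": "InstanceID", "AttributeType": "S"},  # String
        ],
        "KeySchema": [
            {"AttributeName": "SystemID", "KeyType": "HASH"},     # partition key
            {"AttributeName": "InstanceID", "KeyType": "RANGE"},  # sort key
        ],
    }


table_params = switcher_table_params()
```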

Creating Item

Items store information about cluster nodes. For each instance of the cluster node, create an item as follows.

| Add new attribute | Attribute name | Value |
| --- | --- | --- |
| - (Added) | SystemID (Partition key) | Specifies the same integer value as the [fujitsu.pclswr.id] key specified in "A.2.3.1 Creating Cluster Node Instance". |
| - (Added) | InstanceID (Sort key) | Instance ID of the cluster node |
| String | State | NOT_SWITCHED |
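One item per cluster node instance can be described in DynamoDB's attribute-value format, where numbers are carried as strings under the "N" type. The sketch below is illustrative only: the function name `switcher_item` and the values in the usage line are placeholders.

```python
def switcher_item(system_id, instance_id):
    """Build the initial item for one cluster node instance."""
    if isinstance(system_id, bool) or not isinstance(system_id, int):
        raise TypeError("system_id must be an integer")
    return {
        "SystemID": {"N": str(system_id)},  # same value as the fujitsu.pclswr.id key
        "InstanceID": {"S": instance_id},
        "State": {"S": "NOT_SWITCHED"},     # initial value
    }


# Placeholder values for illustration only.
item = switcher_item(1, "i-0123456789abcdef0")
```

A dictionary of this shape matches what the DynamoDB PutItem API expects in its Item parameter.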

Point

The State attribute contains a value that indicates whether the cluster node instance has been switched. Its values are described below.

  • NOT_SWITCHED

    Indicates that the cluster node instance is not switched. Specify NOT_SWITCHED as the initial value, or when no switch has been performed.

  • SWITCHING

    Indicates that the cluster node instance is switched.