This section provides step-by-step instructions for building the switcher and the resource monitor.
This subsection describes how to configure AWS Lambda for the switcher.
Create a blackhole security group as described in "A.1.3.5.4 Rules for Blackhole Security Group".
Create the IAM role to be specified as the AWS Lambda execution role. Use the IAM role that you designed in "A.1.5 Policy Design".
Select the "Create From Scratch" option and create a Lambda function with the following settings.
Setting item | Value |
---|---|
Function name | Fujitsu-Pclswr-Function-Switcher-"Any string". "Any string" must be unique for the VPC. Specify "Any string" by referring to "About Creating Multiple Resources" in "A.1.2 System Configuration". Function names must conform to AWS rules. |
Runtime | Python 3.7 |
Default Execution Role | Specify the IAM role that you created in "Creating Execution Role". |
Enable Network | Unchecked |
After you create the Lambda function, continue to modify settings on the following Lambda function tabs.
Code
Upload the Lambda function zip file provided on the software DVD.
The Lambda function zip file is located at the following path:
Tool/pclswr/FJSVpcl-swr-switcher.zip
Set the following value for "Handler" in "Runtime Settings".
Setting item | Value |
---|---|
Handler | pclswr.lambda_handler |
Configuration
Under general configuration, set the timeout value as follows.
Setting item | Value |
---|---|
Timeout | 15 min 0 sec |
Open Environment Variables and add the following environment variables.
Key | Value |
---|---|
PCLSWR_SYSTEM_LIST | Specify the list of cluster nodes to be handled by switcher. Separate multiple cluster nodes with spaces. |
PCLSWR_BLACKHOLE | Specify the blackhole security group identifier. |
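The console settings above can also be written down as parameters for the Lambda `create_function` API. The following is a minimal sketch, not the procedure documented in this manual. The helper name `build_function_params`, the role ARN, the name suffix, the node identifiers, and the security group ID are hypothetical placeholders, and the format of each cluster node entry in PCLSWR_SYSTEM_LIST is an assumption. Leaving "Enable Network" unchecked is represented here by not setting any VPC configuration.

```python
def build_function_params(suffix, role_arn, zip_bytes, system_list, blackhole_sg_id):
    """Collect the settings from the tables above as create_function keyword arguments."""
    return {
        "FunctionName": f"Fujitsu-Pclswr-Function-Switcher-{suffix}",
        "Runtime": "python3.7",
        "Role": role_arn,
        "Handler": "pclswr.lambda_handler",
        "Code": {"ZipFile": zip_bytes},
        "Timeout": 15 * 60,  # 15 min 0 sec, expressed in seconds
        "Environment": {
            "Variables": {
                # Multiple cluster nodes are separated with spaces.
                "PCLSWR_SYSTEM_LIST": " ".join(system_list),
                "PCLSWR_BLACKHOLE": blackhole_sg_id,
            }
        },
    }


# Placeholder values for illustration only (all hypothetical).
params = build_function_params(
    suffix="example",
    role_arn="arn:aws:iam::123456789012:role/example-switcher-role",
    zip_bytes=b"",  # in practice, the bytes of FJSVpcl-swr-switcher.zip
    system_list=["1", "2"],
    blackhole_sg_id="sg-0123456789abcdef0",
)
print(params["Environment"]["Variables"])
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("lambda").create_function(**params)`, provided the runtime is available in the target account and region.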
This subsection describes how to set up the event rules used by the switcher and the resource monitor.
Create an Amazon EventBridge rule with the following settings.
Classification | Setting item | Value |
---|---|---|
Name and description | Name | Fujitsu-Pclswr-Rule-InstanceStopped-"Any string". "Any string" must be unique for the VPC. Specify "Any string" by referring to "About Creating Multiple Resources" in "A.1.2 System Configuration". Rule names must conform to AWS rules. |
Define pattern | Pattern | Event pattern |
Define pattern | Event matching pattern | Pre-defined pattern by service |
Define pattern | Service provider | AWS |
Define pattern | Service name | EC2 |
Define pattern | Event type | EC2 Instance State-change Notification |
Define pattern | State | Specific state(s): stopped |
Define pattern | Instance | Any instance |
Select targets | Target | Lambda function |
Select targets | Function | Specify the Lambda function that you created in "A.2.4.1 Configuring AWS Lambda". |
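The pre-defined pattern selected in the table above corresponds to an EventBridge event pattern in JSON. The sketch below shows what that pattern is expected to look like and how it could be wrapped as `put_rule` parameters. The helper name `instance_stopped_rule_params` and the suffix value are hypothetical. Because "Any instance" is selected, no instance ID filter is included.

```python
import json


def instance_stopped_rule_params(suffix):
    """Build put_rule keyword arguments for the instance-stopped rule."""
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        # Specific state(s): stopped. No "instance-id" key means any instance.
        "detail": {"state": ["stopped"]},
    }
    return {
        "Name": f"Fujitsu-Pclswr-Rule-InstanceStopped-{suffix}",
        "EventPattern": json.dumps(pattern),
    }


rule = instance_stopped_rule_params("example")  # placeholder suffix
print(rule["EventPattern"])
```

With boto3, the rule could then be created with `boto3.client("events").put_rule(**rule)` and the Lambda function attached with `put_targets`. When the rule is created outside the console, the function may also need a resource-based permission that allows EventBridge to invoke it.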
Classification | Setting item | Value |
---|---|---|
Name and description | Name | Fujitsu-Pclswr-Rule-CloudWatchStatusCheck-"Any string". "Any string" must be unique for the VPC. Specify "Any string" by referring to "About Creating Multiple Resources" in "A.1.2 System Configuration". Rule names must conform to AWS rules. |
Define pattern | Pattern | Event pattern |
Define pattern | Event matching pattern | Pre-defined pattern by service |
Define pattern | Service provider | AWS |
Define pattern | Service name | CloudWatch |
Define pattern | Event type | CloudWatch Alarm State Change |
Select targets | Target | Lambda function |
Select targets | Function | Specify the Lambda function that you created in "A.2.4.1 Configuring AWS Lambda". |
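For the CloudWatch rule, the selected event type is expected to translate to the event pattern sketched below. The constant and helper names are hypothetical. The pattern contains no alarm name filter, so it matches state changes of every alarm in the region; this mirrors the table, which does not restrict the alarms.

```python
import json

# Event pattern equivalent to the "CloudWatch Alarm State Change" selection.
ALARM_STATE_CHANGE_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
}


def alarm_rule_params(suffix):
    """Build put_rule keyword arguments for the alarm state change rule."""
    return {
        "Name": f"Fujitsu-Pclswr-Rule-CloudWatchStatusCheck-{suffix}",
        "EventPattern": json.dumps(ALARM_STATE_CHANGE_PATTERN),
    }


alarm_rule = alarm_rule_params("example")  # placeholder suffix
print(alarm_rule["Name"])
```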
This subsection describes how to set up the resource monitor.
Create CloudWatch alarms for each instance that you created in "A.2.3.1 Creating Cluster Node Instance".
Create two CloudWatch alarms per instance: an alarm for the switcher and an alarm for CloudWatch Agent. To create the CloudWatch alarms, specify the following settings.
Category | Setting Item | Value |
---|---|---|
Select Metrics | AWS Namespace | EC2 |
Select Metrics | Metric | Per-Instance Metrics |
Select Metrics | Instance name (InstanceId) | Select an instance of a cluster node. |
Select Metrics | Metric name | StatusCheckFailed. Select StatusCheckFailed for the instance on the screen. |
Metric | Statistic | Maximum |
Metric | Period | 1 minute |
Conditions | Threshold types | Static |
Conditions | Alarm condition | Greater/Equal |
Conditions | Threshold | 0.99 |
Conditions | Datapoints to alarm | 2/2 |
Conditions | Missing data treatment | Treat missing data as missing |
Action settings | Notification | None. Click the [Delete] button to remove the notification so that notifications are turned off. |
Action settings | Auto Scaling action | None |
Action settings | EC2 action | None |
Action settings | Systems Manager action | None |
Name and description | Alarm name | Fujitsu-Pclswr-Alarm-Instance-StatusCheckFailed-"Integer value identifying cluster node". The integer value that identifies the cluster node must be unique for each region. For the integer value, refer to "About Creating Multiple Resources" in "A.1.2 System Configuration". |
Category | Setting Item | Value |
---|---|---|
Select Metrics | Custom Namespace | RMS. The value that you set for namespace in the CloudWatch Agent configuration file is displayed. |
Select Metrics | Dimension | InstanceId, InstanceType, pattern, pid_finder |
Select Metrics | Instance name (InstanceId) | Select an instance of a cluster node. |
Select Metrics | Metric name | procstat_lookup_pid_count. Select procstat_lookup_pid_count for the instance on the screen. |
Metric | Statistic | Average |
Metric | Period | 1 minute |
Conditions | Threshold types | Static |
Conditions | Alarm condition | Lower/Equal |
Conditions | Threshold | 0 |
Conditions | Datapoints to alarm | 1/1 |
Conditions | Missing data treatment | Treat missing data as missing |
Action settings | Notification | None. Click the [Delete] button to remove the notification so that notifications are turned off. |
Action settings | Auto Scaling action | None |
Action settings | EC2 action | None |
Action settings | Systems Manager action | None |
Name and description | Alarm name | Fujitsu-Pclswr-Alarm-Instance-RMS-"Integer value identifying cluster node". The integer value that identifies the cluster node must be unique for each region. For the integer value, refer to "About Creating Multiple Resources" in "A.1.2 System Configuration". |
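The two alarm definitions can be compared side by side when they are written as parameters for the CloudWatch `put_metric_alarm` API. This is a sketch under assumptions: the helper names, instance ID, instance type, and the `pattern` and `pid_finder` dimension values are hypothetical placeholders that must match what CloudWatch Agent actually reports. No alarm actions are set, which corresponds to "None" for every action in the tables.

```python
def status_check_alarm_params(node_id, instance_id):
    """Alarm for the switcher: EC2 StatusCheckFailed, 2 out of 2 datapoints."""
    return {
        "AlarmName": f"Fujitsu-Pclswr-Alarm-Instance-StatusCheckFailed-{node_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # 1 minute
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 0.99,
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "TreatMissingData": "missing",
    }


def rms_alarm_params(node_id, instance_id, instance_type, pattern, pid_finder):
    """Alarm for CloudWatch Agent: procstat_lookup_pid_count, 1 out of 1 datapoints."""
    return {
        "AlarmName": f"Fujitsu-Pclswr-Alarm-Instance-RMS-{node_id}",
        "Namespace": "RMS",  # namespace set in the CloudWatch Agent configuration file
        "MetricName": "procstat_lookup_pid_count",
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "InstanceType", "Value": instance_type},
            {"Name": "pattern", "Value": pattern},
            {"Name": "pid_finder", "Value": pid_finder},
        ],
        "Statistic": "Average",
        "Period": 60,  # 1 minute
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "Threshold": 0.0,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "TreatMissingData": "missing",
    }


# Placeholder values for illustration only (all hypothetical).
status_alarm = status_check_alarm_params(1, "i-0123456789abcdef0")
rms_alarm = rms_alarm_params(
    1, "i-0123456789abcdef0", "t3.large", "/opt/SMAW/SMAWRrms/bin/bm", "native"
)
print(status_alarm["AlarmName"])
print(rms_alarm["AlarmName"])
```

Each dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**status_alarm)` once the placeholder values are replaced.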
Point
Create the alarm for CloudWatch Agent after CloudWatch Agent has started collecting metrics.
The CloudWatch Agent alarm performs alive monitoring to detect RMS outages.
RMS is judged to have stopped when the "/opt/SMAW/SMAWRrms/bin/bm" process is no longer running, that is, when the number of "/opt/SMAW/SMAWRrms/bin/bm" processes (pid_count) is 0 or lower. The condition is evaluated once a minute. If the condition is satisfied at an evaluation, the alarm transitions from OK to ALARM.
An alarm can be in one of the following states. When the state changes from OK to ALARM, instance switching is performed.
OK (the metric is within the defined threshold)
ALARM (the metric is outside the defined threshold)
INSUFFICIENT_DATA (CloudWatch Agent stopped, metrics unavailable, and so on)
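The OK to ALARM transition described above can be recognized from the `previousState` and `state` fields of a CloudWatch Alarm State Change event. The function below is an illustrative sketch only; it is not the logic of the supplied switcher, and the function name and the sample event contents are hypothetical.

```python
def is_ok_to_alarm(event):
    """Return True when an alarm state change event reports OK -> ALARM."""
    detail = event.get("detail", {})
    previous = detail.get("previousState", {}).get("value")
    current = detail.get("state", {}).get("value")
    return previous == "OK" and current == "ALARM"


# Trimmed sample event; real events carry more fields.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {
        "alarmName": "Fujitsu-Pclswr-Alarm-Instance-RMS-1",
        "previousState": {"value": "OK"},
        "state": {"value": "ALARM"},
    },
}

print(is_ok_to_alarm(sample_event))  # True
```

A transition from INSUFFICIENT_DATA to ALARM returns False in this sketch, because only the OK to ALARM change is named as the trigger.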
To create an alarm, see "Create a CloudWatch Alarm based on a static threshold" in the "Amazon CloudWatch User Guide".
This subsection describes how to configure Amazon DynamoDB, which is required for the switcher.
The switcher uses Amazon DynamoDB to manage information about the cluster nodes. Create a table in Amazon DynamoDB, and create one item for each cluster node. If you add a cluster node later, add an item for it.
Create an Amazon DynamoDB table with the following settings.
Setting item | Value |
---|---|
Table name | Fujitsu-Pclswr-DB-Switcher |
Partition key | SystemID / Number |
Sort Key | InstanceID / String |
Settings | Default settings |
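Expressed as parameters for the DynamoDB `create_table` API, the key definitions in the table above would look roughly as follows. This is a sketch: the helper name is hypothetical, and the billing mode is an assumption made to keep the example short, since the API has no direct equivalent of the console's "Default settings".

```python
def switcher_table_params():
    """Build create_table keyword arguments for the switcher table."""
    return {
        "TableName": "Fujitsu-Pclswr-DB-Switcher",
        "AttributeDefinitions": [
            {"AttributeName": "SystemID", "AttributeType": "N"},    # Number
            {"AttributeName": "InstanceID", "AttributeType": "S"},  # String
        ],
        "KeySchema": [
            {"AttributeName": "SystemID", "KeyType": "HASH"},     # partition key
            {"AttributeName": "InstanceID", "KeyType": "RANGE"},  # sort key
        ],
        # Assumption: on-demand capacity; the console defaults may differ.
        "BillingMode": "PAY_PER_REQUEST",
    }


table_params = switcher_table_params()
print(table_params["KeySchema"])
```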
Items store information about cluster nodes. For each instance of the cluster node, create an item as follows.
Add new attribute | Attribute name | Value |
---|---|---|
- (Added) | SystemID (Partition key) | Specify the same integer value as the [fujitsu.pclswr.id] key specified in "A.2.3.1 Creating Cluster Node Instance". |
- (Added) | InstanceID (Sort key) | Instance ID of the cluster node |
String | State | NOT_SWITCHED |
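One item per cluster node instance, as listed above, can be written in the attribute-value format used by the low-level DynamoDB `put_item` API. In this sketch the helper name, the system IDs, and the instance IDs are hypothetical placeholders; use the [fujitsu.pclswr.id] values and instance IDs of the actual cluster nodes.

```python
def node_item(system_id, instance_id):
    """Build one DynamoDB item for a cluster node instance."""
    return {
        "SystemID": {"N": str(system_id)},   # numbers are sent as strings
        "InstanceID": {"S": instance_id},
        "State": {"S": "NOT_SWITCHED"},      # initial value
    }


# Placeholder nodes for illustration only.
nodes = [(1, "i-0123456789abcdef0"), (2, "i-0fedcba9876543210")]
items = [node_item(system_id, instance_id) for system_id, instance_id in nodes]
print(len(items))  # 2
```

Each item could then be stored with `boto3.client("dynamodb").put_item(TableName="Fujitsu-Pclswr-DB-Switcher", Item=item)`.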
Point
The State attribute contains a value indicating whether the cluster node instance has been switched. Its values are described below.
NOT_SWITCHED
Indicates that the cluster node instance is not switched. Specify NOT_SWITCHED as the initial value, or when no switching has been performed.
SWITCHING
Indicates that the cluster node instance is switched.