PRIMECLUSTER Installation and Administration Guide 4.6 Cloud Services
FUJITSU Software

A.3.2 Actions to take when switching is interrupted

When a system uses the smart workload recovery function, switching may be interrupted by an API or service error. This subsection describes what to do if switching is interrupted.

Note

If the problem persists, collect troubleshooting information and contact your Fujitsu service representative (SE).

If the switch is interrupted, check the error message as described in "A.3.2.1 Reviewing Error Messages".

Take the appropriate action as described in the message, then perform "A.3.2.2 Recovering According to Instance State" or "A.3.2.3 Changing the settings of the switch destination AZ".

The procedures in this subsection use the AWS Management Console to handle the interrupted switch. For details on using the AWS Management Console, see the official AWS documentation.

A.3.2.1 Reviewing Error Messages

If the switch is interrupted while the system is operating with the smart workload recovery function, check the Lambda function log for error messages about the interruption. To check for error messages, use the AWS Management Console and do the following:

  1. From the Region Name drop-down list, select the region in which your system is running.

  2. Open the Amazon CloudWatch service screen. In the sidebar, expand "Logs" and choose "Log groups".

  3. In the Log groups screen, choose the log group whose name contains the name of the Lambda function used by your production system. The log group name has the following format:

    /aws/lambda/<Lambda function name>

  4. In the "Log streams" section, select the log stream that contains logs from around the time the switch was interrupted on the active system. Use the time in the "Last event time" column to identify this log stream.

  5. Check the list of messages displayed on the screen for error messages.

  6. Refer to "8.1.4 Error (ERROR) Message" in "PRIMECLUSTER Messages" and take action according to the error message identified in step 5.
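The lookup in steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch, not part of PRIMECLUSTER: it assumes the log streams have already been fetched in the shape of the `logStreams` entries returned by the CloudWatch Logs `DescribeLogStreams` API (for example, with boto3 `logs.describe_log_streams`). The function names `lambda_log_group_name` and `closest_log_stream` are hypothetical, and picking the stream with the smallest time difference is an assumed reading of "close to the time when the switch was interrupted".

```python
from datetime import datetime, timezone


def lambda_log_group_name(function_name: str) -> str:
    """Build the log group name in the format /aws/lambda/<Lambda function name>."""
    return f"/aws/lambda/{function_name}"


def closest_log_stream(streams, interrupted_at: datetime):
    """Return the name of the stream whose last event is closest to interrupted_at.

    Each element of `streams` is assumed to look like a "logStreams" entry of a
    DescribeLogStreams response, with "logStreamName" and "lastEventTimestamp"
    (milliseconds since the Unix epoch). Returns None if no stream qualifies.
    """
    target_ms = interrupted_at.timestamp() * 1000
    candidates = [s for s in streams if "lastEventTimestamp" in s]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: abs(s["lastEventTimestamp"] - target_ms))
    return best["logStreamName"]
```

Pass a timezone-aware `datetime` for the interruption time; `timestamp()` interprets a naive `datetime` as local time.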

A.3.2.2 Recovering According to Instance State

If the error message identified in "A.3.2.1 Reviewing Error Messages" instructs you to recover the instance, recover the instance whose switch was interrupted. To do so, use the AWS Management Console and perform the procedures in the following subsections.

A.3.2.2.1 Deleting a Source Instance

To delete the source instance, use the AWS Management Console and do the following.

This procedure is not necessary if the source instance does not appear in the list of instances on the Amazon EC2 service screen, or if the state of the source instance is "terminated".

  1. Set the value of the tag "fujitsu.pclswr.is_recovery_target" of the source instance to false.

  2. Stop the source instance.
    This step is not required if the instance state is already "stopped".

  3. Record the value of the tag "fujitsu.pclswr.id" of the source instance that you stopped in step 2. This value is an integer that identifies the cluster node. If the error message does not identify the cluster node of the source instance, the recorded value is used in "A.3.2.2.5 Creating the target instance"; if the error message identifies it, the recorded value is not needed. Then terminate the instance that you stopped in step 2.
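The decisions in this procedure can be sketched as a small Python helper that works on one instance entry from an EC2 `DescribeInstances` response (for example, from boto3 `ec2.describe_instances`). The helper names are hypothetical, and the helper only plans the steps; it does not change any AWS resource. EC2 tag values are strings, so the "fujitsu.pclswr.id" value is converted to an integer.

```python
def tag_value(instance, key):
    """Return the value of tag `key` on an EC2 instance description, or None."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == key:
            return tag.get("Value")
    return None


def plan_source_cleanup(instance):
    """List the remaining steps of A.3.2.2.1 for the source instance.

    `instance` is assumed to be one element of Reservations[].Instances[] in an
    EC2 DescribeInstances response, or None if the instance is not listed.
    """
    if instance is None or instance["State"]["Name"] == "terminated":
        # The procedure is not necessary.
        return {"steps": [], "cluster_node_id": None}
    steps = ["set tag fujitsu.pclswr.is_recovery_target=false"]
    if instance["State"]["Name"] != "stopped":
        steps.append("stop instance")
    steps += ["record tag fujitsu.pclswr.id", "terminate instance"]
    raw_id = tag_value(instance, "fujitsu.pclswr.id")
    node_id = int(raw_id) if raw_id is not None else None
    return {"steps": steps, "cluster_node_id": node_id}
```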

A.3.2.2.2 Checking the Existence of the target instance

To verify whether the switched instance exists, check whether the error message identified in "A.3.2.1 Reviewing Error Messages" contains "new instance_id = <new_instance_id>", and then do one of the following:

If the Error Message Contains "new instance_id = <new_instance_id>"

Check the state of the switched instance whose instance ID is shown in <new_instance_id>, and take action as described in "A.3.2.2.3 Actions to take depending on the status of the target instance".

If the Error Message Does Not Contain "new_instance_id = <new_instance_id>"

The destination instance does not exist.

Proceed to "A.3.2.2.5 Creating the target instance".
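The check for "new instance_id = <new_instance_id>" can be sketched with a regular expression. The exact layout of the PRIMECLUSTER error message is not shown in this guide, so the sample messages in the test are hypothetical. Because the marker is written both as "new instance_id" and "new_instance_id" in this subsection, the pattern accepts both spellings. The section returned when an ID is found assumes you then check the instance state as in "A.3.2.2.3 Actions to take depending on the status of the target instance".

```python
import re

# Accept both spellings used in this guide ("new instance_id" and
# "new_instance_id"). EC2 instance IDs are "i-" followed by hex characters.
NEW_INSTANCE_ID = re.compile(r"new[ _]instance_id\s*=\s*(i-[0-9a-f]+)")


def find_new_instance_id(message):
    """Return the switched instance ID from an error message, or None."""
    match = NEW_INSTANCE_ID.search(message)
    return match.group(1) if match else None


def next_section(message):
    """Choose the next subsection of A.3.2.2 based on the error message."""
    if find_new_instance_id(message) is None:
        return "A.3.2.2.5"  # the destination instance does not exist
    return "A.3.2.2.3"  # check the state of the destination instance
```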

A.3.2.2.3 Actions to take depending on the status of the target instance

Check the state of the instance that you identified in "A.3.2.2.2 Checking the Existence of the target instance", and take the appropriate action as follows:

  1. Check the status of the destination instance.

    In the AWS Management Console, open the Amazon EC2 service screen and check the "Instance state" column for the instance in the list of instances.

  2. Take action according to the status of the switched instance.

    Follow the table below to take action depending on the state of the switched instance.

    Instance State

    Action

    terminated

    The destination instance does not exist. Proceed to step 1 of "A.3.2.2.5 Creating the target instance".

    running

    The destination instance was created successfully. Record the instance ID of the switched instance shown as <new_instance_id> in the error message, then proceed to step 2 of "A.3.2.2.5 Creating the target instance".

    pending

    The destination instance was not created successfully. Proceed to "A.3.2.2.4 Deleting a target instance".

    Any other state

    An unexpected error might have occurred while the instance was being created. Collect troubleshooting information and contact your Fujitsu service representative (SE).

    If you need to switch the system urgently, collect the troubleshooting information, then proceed to "A.3.2.2.4 Deleting a target instance".
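The table above can be expressed as a simple lookup, sketched in Python below. The function name and action strings are hypothetical. The EC2 API reports states in lowercase (such as "running"), while the console may display them capitalized, so the input is normalized first.

```python
# Actions from the table in A.3.2.2.3, keyed by EC2 instance state.
ACTIONS = {
    "terminated": "Proceed to step 1 of A.3.2.2.5",
    "running": "Record the instance ID, then proceed to step 2 of A.3.2.2.5",
    "pending": "Proceed to A.3.2.2.4",
}
OTHER_STATE_ACTION = "Collect troubleshooting information and contact your SE"


def action_for_state(state):
    """Return the action for a state such as "running" or "Running"."""
    return ACTIONS.get(state.strip().lower(), OTHER_STATE_ACTION)
```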

A.3.2.2.4 Deleting a target instance

To delete the switched instance, use the AWS Management Console and do the following:

  1. Open the Amazon EC2 service screen and choose "Instances" in the sidebar.

  2. In the list of instances, select the check box for the switched instance.

  3. From the "Instance state" drop-down list, choose "Terminate instance".

A.3.2.2.5 Creating the target instance

To create a switched instance, use the AWS Management Console and do the following:

  1. Create a destination instance from the AMI.

    Create the switched instance as described in "A.2.3.1 Creating Cluster Node Instance", and record its instance ID. This instance ID is used in step 2 of this procedure, in "A.3.2.2.6 Configuring CloudWatch Alarm", and in step 2 of "A.3.2.2.7 Configuring Amazon DynamoDB".

  2. Configure the target group for the ELB.

    This step is not required if the target instance has already been added to the ELB target group.

    Add the instance ID of the switched instance to the ELB target group.

    For information about configuring target groups, see "A.2.3.2 Configuring Network Takeover". However, skip step 1 of "A.2.3.2 Configuring Network Takeover" and add the instance to the existing target group.

    If you recorded the instance ID of the switched instance in step 2 of "A.3.2.2.3 Actions to take depending on the status of the target instance", add that instance ID to the target group.
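If you script step 2, a minimal Python sketch of the check and the request parameters might look like the following. It takes a response of the Elastic Load Balancing v2 `DescribeTargetHealth` API as input; the helper names and the sample ARN in the test are hypothetical. The resulting dictionary can be passed to boto3 as `elbv2.register_targets(**params)`.

```python
def needs_registration(target_health, instance_id):
    """Return True if instance_id is not yet registered in the target group.

    `target_health` is assumed to have the shape of an ELBv2
    DescribeTargetHealth response.
    """
    registered = {
        desc["Target"]["Id"]
        for desc in target_health.get("TargetHealthDescriptions", [])
    }
    return instance_id not in registered


def register_targets_params(target_group_arn, instance_id):
    """Build keyword arguments for an ELBv2 RegisterTargets call.

    No port is given, so the target group's default port applies.
    """
    return {"TargetGroupArn": target_group_arn, "Targets": [{"Id": instance_id}]}
```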

A.3.2.2.6 Configuring CloudWatch Alarm

Configure CloudWatch alarms.

If the target instance is already configured for CloudWatch alarms, this step is not required.

Set the instance ID of the switched instance as the InstanceId value (the instance ID dimension) of the CloudWatch alarm.

For information about configuring CloudWatch alarms, see "A.2.4.3 Configuring CloudWatch Alarm".

If you recorded the instance ID of the switched instance in step 2 of "A.3.2.2.3 Actions to take depending on the status of the target instance", set that recorded instance ID in the CloudWatch alarm.
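If you update the alarm through the API rather than the console, note that the CloudWatch `PutMetricAlarm` API overwrites the entire alarm configuration. The Python sketch below therefore copies the settings of an existing alarm, as returned in the `MetricAlarms` list of `DescribeAlarms`, and replaces only the `InstanceId` dimension. It assumes a single-metric alarm (not a metric math alarm); the key list, helper name, and sample alarm in the test are assumptions, not PRIMECLUSTER settings. The result can be passed to boto3 as `cloudwatch.put_metric_alarm(**params)`.

```python
# Keys that a DescribeAlarms "MetricAlarms" entry shares with the
# PutMetricAlarm request for a single-metric alarm (assumption).
COPIED_KEYS = (
    "AlarmName", "AlarmDescription", "ActionsEnabled", "OKActions",
    "AlarmActions", "InsufficientDataActions", "MetricName", "Namespace",
    "Statistic", "ExtendedStatistic", "Period", "Unit", "EvaluationPeriods",
    "DatapointsToAlarm", "Threshold", "ComparisonOperator", "TreatMissingData",
)


def put_metric_alarm_params(alarm, instance_id):
    """Build PutMetricAlarm arguments that point the alarm at instance_id.

    Every existing setting in COPIED_KEYS is kept; only the InstanceId
    dimension is replaced. The input alarm is not modified.
    """
    params = {key: alarm[key] for key in COPIED_KEYS if key in alarm}
    dimensions = [d for d in alarm.get("Dimensions", []) if d["Name"] != "InstanceId"]
    dimensions.append({"Name": "InstanceId", "Value": instance_id})
    params["Dimensions"] = dimensions
    return params
```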

A.3.2.2.7 Configuring Amazon DynamoDB

  1. Review the tables in Amazon DynamoDB.

    Verify that the table "Fujitsu-Pclswr-DB-Switcher" contains an item with the following attribute values:

    Attribute name

    Value

    SystemID

    Same value as the tag "fujitsu.pclswr.id" of the cluster node instance

    InstanceID

    Same value as <instance id> of the source instance in the error message

  2. Update the table in Amazon DynamoDB according to the results of step 1.

    • If the item checked in step 1 exists:

      Update the item in the table "Fujitsu-Pclswr-DB-Switcher" that you checked in step 1.

      Update the value of the attribute "InstanceID" to the instance ID of the switched instance.

      If you recorded the instance ID of the switched instance in step 2 of "A.3.2.2.3 Actions to take depending on the status of the target instance", set that instance ID as the value of the attribute "InstanceID".

      If the value of attribute "State" is SWITCHING, update the value of attribute "State" to NOT_SWITCHED.

      Attribute name

      Value

      SystemID

      Same value as the tag "fujitsu.pclswr.id" of the cluster node instance

      InstanceID

      Instance ID of the switched instance

      State

      NOT_SWITCHED

    • If the item checked in step 1 does not exist:

      Add the following item to the table "Fujitsu-Pclswr-DB-Switcher". If you recorded the instance ID of the switched instance in step 2 of "A.3.2.2.3 Actions to take depending on the status of the target instance", set that instance ID as the value of the attribute "InstanceID".

      Attribute name

      Value

      SystemID

      Same value as the tag "fujitsu.pclswr.id" of the cluster node instance

      InstanceID

      Instance ID of the switched instance

      State

      NOT_SWITCHED
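Steps 1 and 2 can be sketched in Python as follows, operating on items as plain dictionaries (for example, the `Items` of a boto3 DynamoDB `Table.scan()` result; large tables return results in pages). The helper names are hypothetical. The key schema of "Fujitsu-Pclswr-DB-Switcher" is not described in this guide, so the sketch only computes the item and leaves the write to you. Values must use the same types as stored in the table.

```python
def find_switcher_item(items, system_id, source_instance_id):
    """Step 1: return the item whose SystemID and InstanceID match, or None."""
    for item in items:
        if item.get("SystemID") == system_id and item.get("InstanceID") == source_instance_id:
            return item
    return None


def switcher_item_after_recovery(existing, system_id, new_instance_id):
    """Step 2: return the item to write back, without modifying `existing`."""
    if existing is None:
        # The item does not exist: add a new one.
        return {"SystemID": system_id, "InstanceID": new_instance_id, "State": "NOT_SWITCHED"}
    item = dict(existing)
    item["InstanceID"] = new_instance_id
    if item.get("State") == "SWITCHING":
        item["State"] = "NOT_SWITCHED"
    return item
```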

A.3.2.3 Changing the settings of the switch destination AZ

If the error message identified in "A.3.2.1 Reviewing Error Messages" indicates that the switch was interrupted because the resources of the switch-destination AZ were exhausted, take the action for that error message and then change the switch-destination settings. Remove the subnet of the AZ whose resources are exhausted from the switch destinations, and add a subnet of an AZ whose resources are not exhausted as a switch destination. Use the AWS Management Console to do the following:

  1. Create an instance in the switch-source AZ, as described in "A.3.2.2.5 Creating the target instance".

  2. For the switch-destination subnet in the AZ that is experiencing resource exhaustion, change the value of the tag "fujitsu.pclswr.is_recovery_target" to false.

  3. Record the integer values that identify cluster nodes in the tag "fujitsu.pclswr.idlist" of the subnet in the AZ that is experiencing resource exhaustion. The tag may contain multiple values. These values are used in step 4.

  4. Configure a subnet in another AZ as the switch destination.

    If you add a new switch-destination subnet

    1. Create a virtual system that contains the switch-destination subnet, as described in "A.2.1 Creating Virtual System". Make sure that the API endpoints and the EFS mount targets can be used from the subnet you create.

    2. Configure network takeover, as described in "A.2.3.2 Configuring Network Takeover". Make sure that NLB and ALB are available on the subnet you created.

    3. Add the integer values recorded in step 3 to the subnet tag "fujitsu.pclswr.idlist". This step is not required if they have already been added.

    4. Change the subnet tag "fujitsu.pclswr.is_recovery_target" to true. This step is not required if it has already been changed.

    If a switch-destination subnet already exists

    1. Based on the current values of the subnet tag "fujitsu.pclswr.idlist" and the integer values recorded in step 3, decide which cluster node IDs to set in the tag "fujitsu.pclswr.idlist" of the source and destination subnets so that multiple AZs are configured as switch destinations, and set them.

    2. Change the subnet tag "fujitsu.pclswr.is_recovery_target" to true.

    Example

    The following examples show how to set the subnet tag "fujitsu.pclswr.idlist".

    In these examples, resource exhaustion occurs in the AZ of SubnetB.

    SubnetA, SubnetB, and SubnetC are subnets in different AZs.

    • Example 1) Only one configured subnet

      Change the value of the SubnetB tag "fujitsu.pclswr.is_recovery_target" to false and the SubnetC tag "fujitsu.pclswr.is_recovery_target" to true.

      The following shows the settings before and after the change.

      Table A.1 Previous setting (only one subnet has been set)

      Subnet Name

      Tag "fujitsu.pclswr.idlist" Value

      Tag "fujitsu.pclswr.is_recovery_target" Value

      SubnetA

      1,2,3,4

      true

      SubnetB

      1,2,3,4

      true

      SubnetC

      -

      false

      Table A.2 New setting (only one subnet has been set)

      Subnet Name

      Tag "fujitsu.pclswr.idlist" Value

      Tag "fujitsu.pclswr.is_recovery_target" Value

      SubnetA

      1,2,3,4

      true

      SubnetB

      -

      false

      SubnetC

      1,2,3,4

      true

    • Example 2) Two or more configured subnets

      Change the value of the SubnetB tag "fujitsu.pclswr.is_recovery_target" to false.

      The following shows the settings before and after the change.

      Table A.3 Previous setting (if there are two or more configured subnets)

      Subnet Name

      Tag "fujitsu.pclswr.idlist" Value

      Tag "fujitsu.pclswr.is_recovery_target" Value

      SubnetA

      1,3,4

      true

      SubnetB

      1,2,4

      true

      SubnetC

      2,3,4

      true

      Table A.4 New setting (if there are two or more configured subnets)

      Subnet Name

      Tag "fujitsu.pclswr.idlist" Value

      Tag "fujitsu.pclswr.is_recovery_target" Value

      SubnetA

      1,2,3,4

      true

      SubnetB

      -

      false

      SubnetC

      1,2,3,4

      true
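The tag changes in Examples 1 and 2 can be reproduced with the Python sketch below. The rule it applies is an assumption inferred from Tables A.1 to A.4, not a documented PRIMECLUSTER rule: after the exhausted subnet is removed, every cluster node ID should appear in the "fujitsu.pclswr.idlist" tag of at least two recovery-target subnets, so that each node keeps a switch destination in another AZ. Missing IDs are added to the target subnets in the order given. The value "-" is assumed to mean that no ID is set, and the function names are hypothetical.

```python
def parse_idlist(value):
    """Parse a "fujitsu.pclswr.idlist" value such as "1,2,3,4" into a set of ints."""
    if value is None or value.strip() in ("", "-"):
        return set()
    return {int(part) for part in value.split(",") if part.strip()}


def format_idlist(ids):
    """Format a set of IDs as a sorted, comma-separated tag value ("-" if empty)."""
    return ",".join(str(i) for i in sorted(ids)) if ids else "-"


def rebalance(idlists, exhausted, targets, copies=2):
    """Return new idlists after removing the exhausted subnet.

    idlists:   subnet name -> set of cluster node IDs (all subnets involved)
    exhausted: name of the subnet whose AZ ran out of resources
    targets:   subnets whose fujitsu.pclswr.is_recovery_target will be true
    copies:    minimum number of target subnets that should list each ID
    """
    result = {name: set(ids) for name, ids in idlists.items()}
    all_ids = set().union(*result.values())
    result[exhausted] = set()
    for node_id in sorted(all_ids):
        holders = [t for t in targets if node_id in result[t]]
        for subnet in targets:
            if len(holders) >= copies:
                break
            if subnet not in holders:
                result[subnet].add(node_id)
                holders.append(subnet)
    return result
```

With the values from Table A.3 and targets SubnetA and SubnetC, `rebalance` yields "1,2,3,4" for both SubnetA and SubnetC and "-" for SubnetB, matching Table A.4.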