Posted On: Nov 8, 2021

You can now create and run AWS Fault Injection Simulator (FIS) experiments that check the state of Amazon CloudWatch alarms and run AWS Systems Manager (SSM) Automations. You can also now run new FIS experiment actions that inject I/O, network black hole, and packet loss faults into your Amazon EC2 instances using pre-configured SSM Agent documents. Because it can be difficult to predict how applications will respond to stress under real-world conditions, whether in testing or production environments, integrating alarm checks and automated runbooks into your FIS experiments can help you gain more confidence when injecting disruptive events such as network problems, instance termination, API throttling, or other failure conditions.

First, the new CloudWatch action allows you to assert the state of a CloudWatch alarm as part of your FIS experiment workflow. When the experiment runs, it verifies that the alarm is in the expected state: OK, ALARM, or INSUFFICIENT_DATA. You can use this, for example, to check whether the impact of a previous action (such as network latency injection) has taken effect before moving on to the next action in the experiment (such as an EC2 instance reboot).
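As a rough illustration of the alarm check described above, the following Python sketch builds one entry for the actions section of an FIS experiment template. The action ID aws:cloudwatch:assert-alarm-state and the parameter names alarmArns and alarmStates are assumptions to be verified against the FIS actions reference; the helper name, alarm ARN, and the "inject-latency" action name are hypothetical.

```python
import json

# Alarm states named in the announcement.
VALID_ALARM_STATES = ("OK", "ALARM", "INSUFFICIENT_DATA")


def alarm_assertion_action(alarm_arns, expected_states, start_after=()):
    """Build one entry for the "actions" map of an FIS experiment template."""
    unknown = [s for s in expected_states if s not in VALID_ALARM_STATES]
    if unknown:
        raise ValueError(f"unsupported alarm state(s): {unknown}")
    action = {
        # Assumed action ID and parameter names; verify against the FIS docs.
        "actionId": "aws:cloudwatch:assert-alarm-state",
        "parameters": {
            "alarmArns": ",".join(alarm_arns),
            "alarmStates": ",".join(expected_states),
        },
    }
    if start_after:
        # Run this check only after the named actions have completed.
        action["startAfter"] = list(start_after)
    return action


# Hypothetical alarm ARN and preceding action name, for illustration only.
check = alarm_assertion_action(
    ["arn:aws:cloudwatch:us-east-1:111122223333:alarm:HighLatency"],
    ["ALARM"],
    start_after=["inject-latency"],
)
print(json.dumps(check, indent=2))
```

Placing the check after the latency action and before a reboot action would mirror the ordering in the example above.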

Next, you can now execute AWS Systems Manager Automation runbooks from within an FIS experiment. AWS Systems Manager Automation allows you to build and run automations to perform a variety of common tasks, such as creating and deleting EC2 AMIs or CloudFormation templates, deleting S3 buckets, running AWS Step Functions state machines, invoking AWS Lambda functions, creating tags, launching EC2 instances, or making AWS API requests. By configuring Automation runbooks to be triggered from within FIS experiments, you can more easily, safely, and repeatably recreate complex failure conditions that more closely resemble real-world conditions.
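To make the runbook integration more concrete, here is a minimal Python sketch of what such an action entry might look like in an experiment template. It assumes the action ID aws:ssm:start-automation-execution with documentArn, documentParameters, and maxDuration parameters; confirm these against the FIS actions reference. The runbook ARN, role ARN, and runbook parameter names are hypothetical.

```python
import json


def automation_action(document_arn, document_parameters, max_minutes):
    """Build an FIS action entry that starts an SSM Automation runbook."""
    if max_minutes < 1:
        raise ValueError("max_minutes must be at least 1")
    return {
        # Assumed action ID and parameter names; verify against the FIS docs.
        "actionId": "aws:ssm:start-automation-execution",
        "parameters": {
            "documentArn": document_arn,
            # Runbook inputs are passed as a JSON-encoded string.
            "documentParameters": json.dumps(document_parameters),
            # ISO 8601 duration, for example PT10M for ten minutes.
            "maxDuration": f"PT{max_minutes}M",
        },
    }


# Hypothetical runbook, role, and inputs, for illustration only.
runbook = automation_action(
    "arn:aws:ssm:us-east-1:111122223333:document/MyFailoverRunbook",
    {
        "AutomationAssumeRole": "arn:aws:iam::111122223333:role/MyRunbookRole",
        "ClusterName": "orders-db",
    },
    max_minutes=10,
)
print(json.dumps(runbook, indent=2))
```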

Finally, several new and updated SSM Agent documents are now available to run as fault injection actions, including: an I/O stress action; a network black hole action that drops inbound or outbound traffic for a given protocol and port; a network latency action that adds latency and/or jitter through a given network interface to or from sources you specify, such as IP addresses/blocks, domains, or AWS services including S3 and DynamoDB; and two network packet loss actions that can inject packet loss failures into a given interface and (optionally) source. These SSM documents are pre-configured for EC2 instances running Amazon Linux and Ubuntu.
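The network black hole action above could be wired into an experiment template roughly as follows. This Python sketch assumes the documents are run through the aws:ssm:send-command action, that the black hole document is named AWSFIS-Run-Network-Blackhole-Port, and that it accepts Protocol, Port, TrafficType, and DurationSeconds inputs; treat all of these as assumptions to check against the FIS documentation. The helper and target names are hypothetical.

```python
import json


def blackhole_port_action(region, target_name, protocol, port, traffic_type, seconds):
    """Build an FIS action entry that drops traffic for one protocol and port."""
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    if traffic_type not in ("ingress", "egress"):
        raise ValueError("traffic_type must be 'ingress' or 'egress'")
    # Assumed document inputs; SSM document parameters are passed as strings.
    document_parameters = {
        "Protocol": protocol,
        "Port": str(port),
        "TrafficType": traffic_type,
        "DurationSeconds": str(seconds),
    }
    # Round the action duration up to whole minutes so it covers the fault.
    minutes = max(1, -(-seconds // 60))
    return {
        "actionId": "aws:ssm:send-command",  # assumed action ID
        "parameters": {
            # Assumed name of the AWS-owned document in the given Region.
            "documentArn": f"arn:aws:ssm:{region}::document/AWSFIS-Run-Network-Blackhole-Port",
            "documentParameters": json.dumps(document_parameters),
            "duration": f"PT{minutes}M",
        },
        # Refers to a target defined elsewhere in the same template.
        "targets": {"Instances": target_name},
    }


# Hypothetical target name and values, for illustration only.
fault = blackhole_port_action("us-east-1", "db-clients", "tcp", 3306, "egress", 120)
print(json.dumps(fault, indent=2))
```

The other documents (I/O stress, latency, packet loss) would presumably follow the same pattern with their own document names and inputs.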

You can get started creating and running fault injection experiments in the AWS Management Console or using the AWS SDKs, and each of these new features is available today. AWS FIS is available in all commercial AWS Regions.