AWS Fault Injection Simulator
AWS Fault Injection Simulator is a fully managed service for running fault injection experiments on AWS that makes it easier to improve an application’s performance, observability, and resiliency. Fault injection experiments are used in chaos engineering, which is the practice of stressing an application in testing or production environments by creating disruptive events, such as a sudden increase in CPU or memory consumption, observing how the system responds, and implementing improvements. Fault injection experiments help teams create the real-world conditions needed to uncover the hidden bugs, monitoring blind spots, and performance bottlenecks that are difficult to find in distributed systems.
Fault Injection Simulator simplifies the process of setting up and running controlled fault injection experiments across a range of AWS services so teams can build confidence in their application behavior. With Fault Injection Simulator, teams can quickly set up experiments using pre-built templates that generate the desired disruptions. Fault Injection Simulator provides the controls and guardrails that teams need to run experiments in production, such as automatically rolling back or stopping the experiment if specific conditions are met. With a few clicks in the console, teams can run complex scenarios with common distributed system failures happening in parallel or building sequentially over time, enabling them to create the real-world conditions necessary to find hidden weaknesses.
Improve application performance, resiliency, and observability
AWS Fault Injection Simulator makes it easy for teams to run and observe their experiments end to end, helping them find the monitoring blind spots, performance bottlenecks, and other “unknown” weaknesses that traditional software tests miss.
Validate how your application performs on AWS
AWS Fault Injection Simulator supports creating disruptive events across a range of AWS services, such as Amazon EC2, Amazon EKS, Amazon ECS, and Amazon RDS. Teams can run GameDay scenarios or stress test their most critical applications on AWS at scale, helping them ensure their application will behave as expected.
Safeguard fault injection experiments
AWS Fault Injection Simulator provides the fine-grained controls that teams need to define the specific conditions under which they want to stop an experiment or roll back to the pre-experiment state.
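As a rough illustration of such a guardrail, the sketch below builds the request for an experiment template whose stop condition is a CloudWatch alarm, in the shape accepted by the FIS `CreateExperimentTemplate` API. The function name, the `ChaosReady` tag, the target and action names, and the five-minute restart delay are assumptions made for this example and do not come from the page above.

```python
import uuid
from typing import Any, Dict


def build_stop_instances_template(
    role_arn: str,
    alarm_arn: str,
    tag_key: str = "ChaosReady",  # hypothetical tag marking instances that may be targeted
    tag_value: str = "true",
) -> Dict[str, Any]:
    """Return keyword arguments for a CreateExperimentTemplate request.

    The experiment stops one tagged EC2 instance and is halted by the service
    if the given CloudWatch alarm goes into the ALARM state.
    """
    return {
        "clientToken": str(uuid.uuid4()),
        "description": "Stop one tagged EC2 instance, guarded by a CloudWatch alarm",
        "roleArn": role_arn,
        # Guardrail: the experiment is stopped when this alarm fires.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn},
        ],
        "targets": {
            "taggedInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",  # limit the blast radius to one instance
            }
        },
        "actions": {
            "stopInstances": {
                "actionId": "aws:ec2:stop-instances",
                # Restart the instance after five minutes (ISO 8601 duration).
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "taggedInstances"},
            }
        },
    }
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("fis").create_experiment_template(**request)`. The role and alarm ARNs must refer to resources that already exist in the account.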
A fast and easy way to get started with fault injection experiments
AWS Fault Injection Simulator provides prebuilt templates that enable teams to set up and run high-quality experiments in minutes. Fault Injection Simulator structures the experiment process so that teams can quickly run fault injection experiments by following the step-by-step process in the console and selecting from a predefined list of actions.
Get superior insights by generating real-world failure conditions
AWS Fault Injection Simulator is designed to run disruptive real-world scenarios on AWS that are very difficult for teams to accomplish on their own. With Fault Injection Simulator, teams can take actions such as gradually or simultaneously impairing the performance of different resources in a production environment at scale, enabling them to better validate their application behavior.
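One way to picture the difference between simultaneous and gradual impairment is the `startAfter` field of a template action: actions without it start together, while an action that lists another action's name waits for that action to finish. The helper below is a minimal sketch of that idea. The helper name and the action and target names (`stopGroupA`, `groupA`, and so on) are hypothetical.

```python
from typing import Any, Dict, List


def run_in_sequence(
    actions: Dict[str, Dict[str, Any]], order: List[str]
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of `actions` in which each action named in `order`
    starts only after the previous one in `order` has completed."""
    result = {name: dict(spec) for name, spec in actions.items()}
    for previous, current in zip(order, order[1:]):
        result[current]["startAfter"] = [previous]
    return result


# No startAfter: both groups are impaired at the same time.
parallel = {
    "stopGroupA": {
        "actionId": "aws:ec2:stop-instances",
        "targets": {"Instances": "groupA"},
    },
    "stopGroupB": {
        "actionId": "aws:ec2:stop-instances",
        "targets": {"Instances": "groupB"},
    },
}

# Group B is stopped only after the action on group A has finished.
sequential = run_in_sequence(parallel, ["stopGroupA", "stopGroupB"])
```

Either dictionary could serve as the `actions` section of an experiment template, alongside matching `groupA` and `groupB` target definitions.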
How it works
Periodic Game Days
A game day is the process of rehearsing ahead of an event by creating the anticipated conditions and observing how effectively the team and system respond. An event could be an unusually high traffic day, a new launch, a failure, or something else. You can use AWS Fault Injection Simulator to run a game day by creating the event conditions and monitoring the performance of your system.
Continuous Delivery Pipeline Integration
You can integrate AWS Fault Injection Simulator into your continuous delivery pipeline. This will enable you to repeatedly test the impact of fault actions as part of your software delivery process.
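A pipeline stage for this could look roughly like the sketch below: it starts an experiment from an existing template, polls its state, and reports success only if the experiment completes. The function name, polling interval, and timeout are assumptions for illustration. The client is passed in so that the stage does not depend on how credentials are configured, and in practice it would be `boto3.client("fis")`.

```python
import time
import uuid
from typing import Any, Callable

# Experiment states after which no further change is expected.
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def run_experiment_gate(
    fis_client: Any,
    template_id: str,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Start an experiment and return True only if it reaches 'completed'.

    A stop condition that fires ('stopped'), a failure, or a timeout all
    return False so the pipeline stage can fail the build.
    """
    started = fis_client.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template_id,
    )
    experiment_id = started["experiment"]["id"]

    waited = 0.0
    while waited <= timeout_seconds:
        response = fis_client.get_experiment(id=experiment_id)
        status = response["experiment"]["state"]["status"]
        if status in TERMINAL_STATES:
            return status == "completed"
        sleep(poll_seconds)
        waited += poll_seconds

    # Timed out: ask the service to stop the experiment before giving up.
    fis_client.stop_experiment(id=experiment_id)
    return False
```

A build script might call `run_experiment_gate(boto3.client("fis"), template_id)` and exit with a non-zero status when it returns False, so that a triggered stop condition blocks the release.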
Customer success stories
Accenture's Chaos and Resiliency Engineering team has developed an AWS-based chaos engineering framework. It helps identify weaknesses, such as inadvertent dependencies in large-scale distributed applications, before they turn into slowness or unavailability.
“With AWS Fault Injection Simulator, we can add capabilities to this framework to make it easier to create standard chaos experiments centrally using templates. We can then roll those out to product teams across Accenture to ensure consistently high availability and performance across our product suite.”
- Daniel Gunawan, Managing Director, Cloud Infrastructure & Engineering, Accenture SEA.
Classmethod, Inc. has been interested in chaos engineering since 2019 and often holds seminars and events about chaos engineering in Japan.
“We are very excited to use a fully managed fault injection service on AWS. We hope that all AWS users will be able to perform fault injection experiments more easily and safely, and see the supported services continue to expand.”
- Satoshi Yokota, CEO, Classmethod, Inc.
nClouds is an award-winning provider of AWS and DevOps consulting and implementation services and an AWS Premier Consulting Partner.
“nClouds is adding advanced chaos engineering capabilities and service offerings to our DevOps practice that will improve the resiliency of distributed service architectures we build for our customers and prove regulatory compliance. AWS Fault Injection Simulator has a deep level of fault injection that will enable us to create failure scenarios that more accurately reflect real-world events. With this capability, we expect to have an even better perspective on the expected time to recovery during real events.”
- Marius Ducea, VP DevOps Practice, nClouds.
In this video, Adrian Hornsby talks about the challenges of distributed systems, what chaos engineering is and why it is difficult, and introduces AWS Fault Injection Simulator with demo walk-throughs.
In this video, Laura Thomson, PM of AWS Fault Injection Simulator, discusses the intent of the product and gives demo walk-throughs with AWS Developer Advocates Sebastien Stormacq and Alex Casalboni.