AWS Cloud Operations Blog

Tag: chaos engineering

Learn from AWS Fault Injection Service team’s approach to Game Days

In today’s digital world, availability and reliability are crucial competitive advantages. For DevOps and SRE teams, the ability to respond quickly and effectively to incidents can mean the difference between a minor issue and a major disruption of service that impacts millions of customers. Teams must have clear-cut runbooks and appropriate observability to be ready […]

Simulating partial failures with AWS Fault Injection Service

Modern distributed systems must be resilient to unexpected disruptions to maintain availability, performance, and stability. Chaos engineering helps teams uncover hidden weaknesses by deliberately injecting faults into a system and observing how it recovers. While traditional testing validates expected behavior, chaos engineering tests system resilience during failures. AWS Fault Injection Service (AWS FIS) is a […]

Chaos engineering leveraging AWS Fault Injection Simulator in a multi-account AWS environment

Large-scale distributed software systems in the cloud are composed of several individual sub-systems, such as CDNs, load balancers, web servers, application servers, and databases, along with their interactions. These interactions sometimes have unpredictable outcomes caused by unforeseen events (for example, a network failure or an instance failure). These events can lead to system-wide failures of your critical […]