Amazon Web Services
In this AWS re:Invent 2023 session, David Yanacek explores how to use observability to make systems more resilient. He walks through several failure modes and demonstrates practical techniques using AWS services such as Amazon CloudWatch and AWS X-Ray. Key topics include using dimensionality to diagnose issues, uncovering hidden problems with synthetic workloads and real user monitoring, and preventing future issues with auto scaling and controlled experiments. Yanacek emphasizes measuring separately the things that can fail separately, and using composite alarms to reduce alert fatigue. The talk is aimed at IT professionals who want to improve their observability practices and operate resilient systems.
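As a rough illustration of two ideas from the summary (measuring separately the things that can fail separately, and combining the resulting alarms into one composite alarm), here is a minimal Python sketch. It is not taken from the talk: the request records, the 50% threshold, the `errors-<zone>` alarm names, and both helper functions are hypothetical and chosen only for illustration. The rule string follows the CloudWatch composite alarm rule syntax (`ALARM("name")` terms joined by `OR`).

```python
from collections import defaultdict

# Hypothetical request records: (availability_zone, succeeded).
REQUESTS = [
    ("us-east-1a", True), ("us-east-1a", True), ("us-east-1a", True), ("us-east-1a", True),
    ("us-east-1b", True), ("us-east-1b", False), ("us-east-1b", False), ("us-east-1b", False),
    ("us-east-1c", True), ("us-east-1c", True), ("us-east-1c", True), ("us-east-1c", True),
]

# Assumed alarm threshold: more than 50% of requests failing.
THRESHOLD = 0.5


def error_rate_by_dimension(requests):
    """Return {dimension_value: error_rate}, one entry per failure domain."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for dimension, succeeded in requests:
        totals[dimension] += 1
        if not succeeded:
            errors[dimension] += 1
    return {dimension: errors[dimension] / totals[dimension] for dimension in totals}


def composite_alarm_rule(alarm_names):
    """Build a rule expression that is true if any child alarm is in ALARM state."""
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


# Aggregate view: one number for the whole fleet.
overall = sum(1 for _, succeeded in REQUESTS if not succeeded) / len(REQUESTS)

# Dimensional view: one number per Availability Zone.
rates = error_rate_by_dimension(REQUESTS)
breaching = sorted(zone for zone, rate in rates.items() if rate > THRESHOLD)

# One composite rule over the per-zone alarms (alarm names are hypothetical).
rule = composite_alarm_rule(f"errors-{zone}" for zone in sorted(rates))

print(f"overall error rate: {overall:.2f}")
print(f"zones over threshold: {breaching}")
print(f"composite rule: {rule}")
```

In this made-up data set the fleet-wide error rate stays below the threshold while one zone is mostly failing, which is the kind of problem a per-dimension metric exposes and an aggregate hides. A string like `rule` could then be supplied as the `AlarmRule` argument of boto3's `put_composite_alarm`, so that operators are paged once rather than once per zone; that call is not made here because it requires AWS credentials and existing child alarms.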