AWS re:Invent 2019: Amazon’s approach to failing successfully (52:29)

Welcome to the real world, where things don’t always go your way. Systems can fail even when they are designed to be highly available, scalable, and resilient. Used well, these failures are a powerful lever for understanding how a system actually works and for learning how to avoid future failures.

In this session, we cover Amazon’s favorite techniques for defining and reviewing metrics (watching systems before they fail), as well as how to run an effective postmortem that drives both learning and meaningful improvement.