Amazon Web Services
In this AWS re:Invent 2022 session, David and Ian from Amazon discuss the company's approach to observability and monitoring. They describe Amazon's cultural flywheel of operational improvement, which is driven by observability systems that emit telemetry as metrics, logs, and traces. The speakers explain how Amazon uses tools such as dashboards, service maps, and log analysis to troubleshoot issues and find root causes in complex distributed systems. They emphasize continuously refining observability practices, including the use of high-cardinality metrics and production profiling. The session also highlights Amazon's focus on measuring customer experience through synthetic testing and real user monitoring. Overall, the talk offers insight into what Amazon treats as a never-ending journey toward perfect granularity in observability, with the goal of improving both operational excellence and customer experience.