David Yanacek, Senior Principal Engineer
David Yanacek is a Senior Principal Engineer working on services like CloudWatch in the Amazon Monitoring & Observability organization in AWS. David has been a software developer at Amazon since 2006, previously working on Amazon DynamoDB, AWS Lambda, and AWS IoT. He has also worked on internal web service frameworks and fleet operations automation systems. One of David’s favorite activities at work is performing log analysis and sifting through operational metrics to find ways to make systems run more and more smoothly over time.

Authored by David
Amazon's approach to production services monitoring
This session covers the full spectrum of monitoring at Amazon, from how teams assess system health at a high level to how they zoom in to understand the details of a single request. Also, learn how Amazon thinks about percentiles, dimensionality of metrics, dashboards, log analysis, and distributed tracing.
Operational Excellence at Amazon
In this session, learn about Amazon’s operational practices. See how the habits that teams have adopted, such as handling retrospectives, sharing knowledge, and regularly reviewing operational metrics, led them to innovate, build better tools, and make architectural shifts.
Architecting and operating resilient serverless systems at scale
In this video, we cover what AWS does to build reliable and resilient services, including avoiding modes and overload, performing bounded work, throttling at multiple layers, guarding concurrency, sending idempotent requests, applying backpressure and fairness in queueing, and performing shuffle sharding.
Implementing health checks
Automatically detecting and mitigating server failures without unintended consequences from fleet-wide false positives.
Instrumenting distributed systems for operational visibility
Gaining operational visibility into production systems and troubleshooting failures with software instrumentation.
Using load shedding to avoid overload
Strategies for maintaining predictable, consistent performance in the face of overload.
Using dependency isolation to contain concurrency overload
Containing the impact caused by a failing dependency to affect only the relevant functionality in an application.
Fairness in multi-tenant systems
Building fairness into multi-tenant systems to provide predictable performance and availability.
Avoiding insurmountable queue backlogs
Prioritizing important workloads so they drain from queue backlogs quickly, and avoiding backlogs in the first place.