AWS Architecture Blog

Exponential Backoff And Jitter

Introducing OCC: Optimistic concurrency control (OCC) is a time-honored way for multiple writers to safely modify a single object without losing writes. OCC has three nice properties: it will always make progress as long as the underlying store is available, it’s easy to understand, and it’s easy to implement. DynamoDB’s conditional writes make OCC a […]
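As a rough illustration of the idea in this excerpt, here is a minimal sketch of an OCC update loop that backs off with jitter when a write conflicts. Everything named here is an assumption made for illustration: `VersionedStore` is a hypothetical in-memory stand-in for a store with conditional writes (it is not DynamoDB's API), and `occ_update`, `base`, `cap`, and `max_attempts` are invented names. The randomized sleep is one common jitter variant, not necessarily the one the post recommends.

```python
import random
import threading
import time


class VersionedStore:
    """Hypothetical in-memory stand-in for a store with conditional writes."""

    def __init__(self, value=0):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        """Return the current (value, version) pair."""
        with self._lock:
            return self._value, self._version

    def conditional_write(self, new_value, expected_version):
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            if self._version != expected_version:
                return False  # conflict: another writer got there first
            self._value = new_value
            self._version += 1
            return True


def occ_update(store, update, max_attempts=10, base=0.001, cap=0.05):
    """Apply update(value) using OCC; back off with jitter on conflict."""
    for attempt in range(max_attempts):
        value, version = store.read()
        if store.conditional_write(update(value), version):
            return attempt + 1  # number of attempts used
        # Sleep a random time in [0, min(cap, base * 2**attempt)] so that
        # conflicting writers do not all retry at the same moment.
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise RuntimeError("gave up after repeated write conflicts")
```

A caller would pass a pure function of the current value, for example `occ_update(store, lambda v: v + 1)`. No write is lost because a stale writer's conditional write is rejected, so it must re-read and try again.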

Internet Routing and Traffic Engineering

Internet Routing: Internet routing today is handled by a routing protocol known as BGP (Border Gateway Protocol). Each individual network on the Internet is represented as an autonomous system (AS). An autonomous system has a globally unique autonomous system number (ASN), which is allocated by a Regional Internet Registry (RIR), who also handle […]

Selecting Service Endpoints for Reliability and Performance

Choose Your Route Wisely: Much like a roadway, the Internet is subject to congestion and blockage that cause slowdowns and, at worst, prevent packets from arriving at their destination. Like too many cars jamming themselves onto a highway, too much data over a route on the Internet results in slowdowns. Transatlantic cable breaks have much […]

Running Multiple HTTP Endpoints as a Highly Available Health Proxy

Route 53 Health Checks provide the ability to verify that endpoints are reachable and that HTTP and HTTPS endpoints successfully respond. However, there are many situations where DNS failover would be useful, but TCP, HTTP, and HTTPS health checks alone can’t sufficiently determine the health of the endpoint. In these cases, it’s possible for an […]

Doing Constant Work to Avoid Failures

Amazon Route 53’s DNS Failover feature allows fast, automatic rerouting at the DNS layer based on the health of some endpoints. Endpoints are actively monitored from multiple locations, and both application and connectivity issues can trigger failover. Trust No One: One of the goals in designing the DNS Failover feature was making it resilient to […]

A Case Study in Global Fault Isolation

In a previous blog post, we talked about using shuffle sharding to get magical fault isolation. Today, we’ll examine a specific use case that Route 53 employs and one of the interesting tradeoffs we decided to make as part of our sharding. Then, we’ll discuss how you can employ some of these concepts in your […]

Organizing Software Deployments to Match Failure Conditions

Deploying new software into production will always carry some amount of risk, and failed deployments (e.g., software bugs or misconfigurations) will occasionally occur. As a service owner, the goal is to try to reduce the number of these incidents and to limit customer impact when they do occur. One method to reduce potential impact is […]

AWS and Compartmentalization

Practically every experienced driver has suffered a flat tire. It’s a real nuisance: you pull over, empty the trunk to get out your spare wheel, jack up the car, and replace the punctured wheel before driving yourself to a nearby repair shop. For a car that’s OK; we can tolerate the occasional nuisance, and as drivers […]

Welcome to the AWS Architecture Blog

At Amazon Web Services, we have the great fortune to work on many interesting large-scale distributed systems, as well as the privilege to observe our customers achieve audacious goals. Many highly available services, web sites, and business systems have been built on top of Amazon Web Services. The AWS Architecture Blog will dive a little […]