AWS AppConfig: The Amazon service that helps you scale for large events like Prime Day

Amazon uses a number of AWS services to help meet increased traffic and demand during Prime Day events.

As Jeff Barr has mentioned in his previous blog posts, some key services used in Prime Day include:

Amazon DynamoDB handles the trillions of Prime Day requests.
Amazon Interactive Video Service (Amazon IVS) enables shoppers to shop from livestreams on Amazon Live.
Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic Compute Cloud (Amazon EC2) help fulfill transaction-intensive workloads at massive scale.

However, another AWS service plays an important role in helping Amazon cope with the steep surges in traffic during the 48-hour shopping event. Many teams who work on Prime Day events use AWS AppConfig to deploy dynamic configuration changes independently of any code deployments. Developers roll out configuration changes to applications in seconds and are able to adapt quickly to heightened spikes in demand.

On Prime Day and other surge events, teams at Amazon update their app configuration as they scale. Some teams turn on and off specific site capabilities during the event; this can be done with a feature flag. Other teams are able to adjust different cart and checkout code-paths to maximize efficiencies during peak loads; this can be done with an operational lever. Also, many infrastructure services adjust maximum throughput and transactions-per-second (TPS) limits to handle the increase in simultaneous users; by making this config something that can be updated in real time, teams have more scaling dimensions. These are all done by carefully updating configuration data that is read at runtime by each app.

In this blog post, I’ll show how your engineering teams can use AWS AppConfig for your version of Prime Day events, those periods when your applications have to cope with sizable surges in traffic.

The importance of dynamic configuration

Applications can experience spikes in usage due to an unforeseen event. For example, in the weeks following the COVID-19 outbreak, many video-conferencing and streaming apps saw as much as a threefold increase in the number of users. Even applications designed for planned events such as Prime Day can experience sharp fluctuations in traffic. The number of application users might peak during event launch, and surge at other times of the day. For both unplanned and planned events, your development teams should be able to respond to the changes in real time. Although there are many scaling techniques to consider, I’ll highlight one for updating your app at runtime.

By separating your configuration from code, and then enabling your code to regularly poll for configuration updates at runtime, you can change the behavior of your application in production in a controlled way. These small configuration updates tend to be a safer way to change your software’s behavior than a full-stack deploy.

Deploying the whole software stack can be a time-consuming process. Even with a solid CI/CD pipeline, there can be many steps needed to get the software updated in production. In addition, updating the code may necessitate restarting an application, which further reduces your ability to respond to changes in a timely way. When you have a surge in traffic, it is not prudent to be simultaneously releasing entirely new versions of your software stack.

This is where AWS AppConfig can help. As traffic to your app surges, AWS AppConfig allows you to configure, validate, and deploy new versions of your configuration. Your app reads these updates at runtime, and they can help tune its performance. Each update is made with validation and deployment safety controls in place. In this model, there are no mid-surge deployments of code, just updates to configuration.

Let’s say you work on a food-ordering app, and that you are planning a large marketing promotion that will result in an increase in users. Your app has been optimized over time to handle some increased traffic, but this promotion is estimated to multiply your users 10x. During the surge, you want to optimize the path of getting users through your checkout flow as quickly and efficiently as possible.

First, since you know you will be increasing simultaneous users, you should make sure your operational throttling values are placed into dynamic configuration. This will give you operational levers to adjust limits on the fly during the traffic peaks. Second, let’s say that you have an upsell engine that prompts users for associated drinks and desserts at checkout; this upsell engine requires additional compute resources, and can slow down the checkout flow, even though it sometimes results in a larger food order. You may consider adding a feature flag to disable the upsell engine during the surge to get users through the checkout flow faster. Third, your app has a post-checkout survey to ask users about their experience. While you don’t want to turn the survey off completely, you instead decide to reduce the survey frequency to display for only 20% of your users. You add a configuration that throttles the display of the post-checkout survey.
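The three levers in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key names (`max_checkout_tps`, `upsell_enabled`, `survey_percent`), the default values, and the helper functions are hypothetical and not part of AWS AppConfig. The survey throttle hashes the user ID into one of 100 buckets, so the same user gets the same answer on every request.

```python
import hashlib

# Hypothetical defaults used when a key is missing from the deployed
# configuration. The key names are assumptions made for this sketch.
DEFAULT_CONFIG = {
    "max_checkout_tps": 500,   # operational lever: throttling limit
    "upsell_enabled": True,    # feature flag: upsell engine on or off
    "survey_percent": 100,     # share of users who see the survey
}


def in_rollout(user_id: str, percent: int) -> bool:
    """Place a user in one of 100 stable buckets and compare to percent."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def checkout_plan(user_id: str, config: dict) -> dict:
    """Decide, for one checkout, what the current configuration allows."""
    merged = {**DEFAULT_CONFIG, **config}
    return {
        "show_upsell": bool(merged["upsell_enabled"]),
        "show_survey": in_rollout(user_id, int(merged["survey_percent"])),
        "tps_limit": int(merged["max_checkout_tps"]),
    }


# Configuration you might deploy for the promotion (values are made up).
SURGE_CONFIG = {
    "max_checkout_tps": 5000,
    "upsell_enabled": False,
    "survey_percent": 20,
}

if __name__ == "__main__":
    print(checkout_plan("user-1234", SURGE_CONFIG))
```

Because the decision depends only on the user ID and the configured percentage, raising `survey_percent` later keeps the survey on for everyone who already saw it and adds new users on top.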

You would then configure your app to poll for new configuration on a regular basis so that you can adjust these settings on the fly. After testing these adjustments ahead of time, you can turn features on and off, adjust throttling limits, and tune your application during the surge, all without a new code deploy. Of course, there are other scaling techniques beyond dynamic configuration that should be considered, but this example illustrates how important configuration can be for scaling. AWS AppConfig can help you manage that dynamic configuration.
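The polling pattern can be sketched as a small cache that refreshes at a fixed interval and keeps the last known good configuration when a refresh fails. This is a minimal sketch, not the AWS AppConfig client: the `ConfigPoller` class and its parameters are hypothetical, and `fetch` stands in for whatever call your application uses to retrieve the latest configuration (for example, a wrapper around the AWS AppConfig data API). The sketch also assumes that an empty response means the configuration has not changed.

```python
import json
import time
from typing import Callable, Optional


class ConfigPoller:
    """Caches configuration and refreshes it at most once per interval."""

    def __init__(
        self,
        fetch: Callable[[], Optional[bytes]],  # hypothetical retrieval hook
        defaults: dict,
        interval_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._current = dict(defaults)
        self._interval = interval_seconds
        self._clock = clock
        self._next_poll = clock()  # poll on the first call to get()

    def get(self) -> dict:
        """Return the current configuration, polling if the interval elapsed."""
        now = self._clock()
        if now >= self._next_poll:
            self._next_poll = now + self._interval
            self._refresh()
        return self._current

    def _refresh(self) -> None:
        try:
            raw = self._fetch()
            if raw:  # None or empty bytes: treated as "no change"
                parsed = json.loads(raw)
                if isinstance(parsed, dict):
                    self._current = parsed
        except Exception:
            # A failed or malformed fetch must not take the app down:
            # keep serving the last known good configuration.
            pass


if __name__ == "__main__":
    poller = ConfigPoller(lambda: b'{"upsell_enabled": false}',
                          defaults={"upsell_enabled": True},
                          interval_seconds=30.0)
    print(poller.get())
```

Calling `poller.get()` on each request is cheap, because the fetch only runs when the interval has elapsed. Falling back to the cached value is a design choice for this sketch: during a surge, a brief failure to reach the configuration source should not change application behavior.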

Getting started with AWS AppConfig

You can access AWS AppConfig through AWS Systems Manager. Creating a configuration is a simple three-step process:

First, create your application. An application can be a microservice that runs on an EC2 instance, a mobile application, or a serverless application using Amazon API Gateway and AWS Lambda. Simply put, your application is any system that you create for your users.

Second, create your environment. An environment is a logical deployment group of AppConfig targets, such as applications in a Beta or Production environment. You can also define environments for application subcomponents such as the web, mobile, and backend components for your application.

And for the crucial last step, set validation rules and create a deployment strategy for your app.

The best laid plans: when things go wrong

You’ve probably heard the adage about the best laid plans of mice and men.

Although the language in the Robert Burns poem is from an earlier era, the sentiment is spot-on for the twenty-first century world of cloud. A configuration error discovered after deployment can be as problematic as the release of faulty code. The results are especially disastrous during events like Prime Day, when there are surges in traffic and app engagement.

That’s why configuration validation and deployment safety are critical. In addition to allowing you to specify environments (Dev, Beta, Production, others) for your application, AWS AppConfig also allows you to set up Amazon CloudWatch alarms for each of these environments.

AWS AppConfig validators provide a syntactic check using a JSON schema or a semantic check using an AWS Lambda function before deployment. Configuration deployments proceed only after everything passes validation.
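A semantic check could look like the following Lambda-style handler, which rejects a configuration by raising an error. This is a sketch under assumptions: it assumes the validator event carries the configuration as a base64-encoded `content` field, and the key names and allowed ranges are hypothetical values for the food-ordering example. Check the AWS AppConfig documentation for the exact event format before relying on it.

```python
import base64
import json


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means valid."""
    errors = []

    tps = config.get("max_checkout_tps")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(tps, int) or isinstance(tps, bool) or not 1 <= tps <= 10000:
        errors.append("max_checkout_tps must be an integer from 1 to 10000")

    if not isinstance(config.get("upsell_enabled"), bool):
        errors.append("upsell_enabled must be true or false")

    pct = config.get("survey_percent")
    if not isinstance(pct, int) or isinstance(pct, bool) or not 0 <= pct <= 100:
        errors.append("survey_percent must be an integer from 0 to 100")

    return errors


def handler(event, context):
    """Lambda-style entry point: raising an exception fails validation."""
    # Assumption: the configuration arrives base64-encoded under "content".
    raw = base64.b64decode(event["content"])
    try:
        config = json.loads(raw)
    except ValueError as exc:
        raise ValueError(f"configuration is not valid JSON: {exc}")
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")

    errors = validate_config(config)
    if errors:
        raise ValueError("; ".join(errors))
```

A JSON schema can express the type and range checks above on its own. A function becomes useful when the rule depends on more than one field or on external state, such as refusing a throttle limit that exceeds what a downstream service can absorb.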

There are other fallback mechanisms to ensure that a mistake doesn’t snowball into a disaster. AWS AppConfig allows you to deploy the configuration changes to only a small percentage of users so you can analyze the effects of the configuration changes before rolling them out to the entire user base. You can also specify a bake time, which is a time window between when the configuration has been fully deployed and when the deployment is considered final.

You can use Amazon CloudWatch alarms to proactively monitor your deployments and detect errors during this time window. If one of those alarms is triggered during the deployment or the bake time, AWS AppConfig automatically rolls back the configuration to the previous version. This way, even if your best laid plans go awry, they won’t do so for very long.
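To make the interaction between gradual rollout, bake time, and rollback concrete, here is a small simulation. It is not how AWS AppConfig is implemented: the function names, the linear growth rule, the state labels, and the `alarm_firing` callback (a stand-in for checking a CloudWatch alarm) are all assumptions made for illustration.

```python
from typing import Callable, List


def linear_steps(growth_factor: float) -> List[float]:
    """Percentages of targets reached at each step, always ending at 100."""
    if not 0 < growth_factor <= 100:
        raise ValueError("growth_factor must be in the range (0, 100]")
    steps = []
    k = 1
    while growth_factor * k < 100:
        steps.append(growth_factor * k)
        k += 1
    steps.append(100.0)
    return steps


def run_deployment(
    growth_factor: float,
    bake_checks: int,
    alarm_firing: Callable[[], bool],  # hypothetical alarm probe
) -> dict:
    """Roll out step by step, then bake; roll back as soon as an alarm fires."""
    for percent in linear_steps(growth_factor):
        # The new configuration now reaches `percent` of targets.
        if alarm_firing():
            return {"state": "ROLLED_BACK", "reached_percent": percent}

    # Fully deployed: keep watching the alarm during the bake time.
    for _ in range(bake_checks):
        if alarm_firing():
            return {"state": "ROLLED_BACK", "reached_percent": 100.0}

    return {"state": "COMPLETE", "reached_percent": 100.0}


if __name__ == "__main__":
    # Healthy rollout: the alarm never fires.
    print(run_deployment(20, bake_checks=3, alarm_firing=lambda: False))
```

With a growth factor of 20, a bad configuration that trips the alarm at the first step has reached only 20% of targets before it is rolled back, which is the point of not deploying to everyone at once.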

Conclusion

We live in a nonlinear world, where applications see planned and unexpected surges in traffic. Your application might experience its own version of surge events like Prime Day. You can use AWS AppConfig to respond to these changes in real time. When there’s a surge on your website or app, you’ll be ready to delight your customers.

For more information, see the following resources:

AWS Cloud Operations & Migrations Blog
