New – Predictive Scaling for EC2, Powered by Machine Learning
Update May 21, 2021 – Predictive Scaling is now available natively in EC2 Auto Scaling for easier configuration. See our documentation for more information and to get started.
When I look back on the history of AWS and think about the launches that truly signify the fundamentally dynamic, on-demand nature of the cloud, two stand out in my memory: the launch of Amazon EC2 in 2006 and the concurrent launch of CloudWatch Metrics, Auto Scaling, and Elastic Load Balancing in 2009. The first launch provided access to compute power; the second made it possible to use that access to rapidly respond to changes in demand. We have added a multitude of features to all of these services since then, but as far as I am concerned they are still central and fundamental!
New Predictive Scaling
Today we are making Auto Scaling even more powerful with the addition of predictive scaling. Using data collected from your actual EC2 usage and further informed by billions of data points drawn from our own observations, we use well-trained Machine Learning models to predict your expected traffic (and EC2 usage) including daily and weekly patterns. The model needs at least one day of historical data to start making predictions; it is re-evaluated every 24 hours to create a forecast for the next 48 hours.
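To make the forecasting cadence concrete, here is a minimal sketch of a daily re-forecast over a 48-hour horizon. It is not the model AWS uses: the hour-of-day average below is a deliberately simple stand-in, and the function name `forecast_next_48h` and the assumption that the history starts at hour 0 of a day are purely illustrative.

```python
from statistics import mean

HOURS_PER_DAY = 24
FORECAST_HORIZON_HOURS = 48  # each forecast covers the next 48 hours


def forecast_next_48h(hourly_history):
    """Toy stand-in: predict each future hour as the mean of past values
    observed at the same hour of day.

    Assumes hourly_history[0] is hour 0 of the first day (illustrative only).
    """
    if len(hourly_history) < HOURS_PER_DAY:
        # Mirrors the "at least one day of historical data" requirement.
        raise ValueError("need at least one full day of hourly data")

    # Group past observations by hour of day.
    by_hour = [[] for _ in range(HOURS_PER_DAY)]
    for i, value in enumerate(hourly_history):
        by_hour[i % HOURS_PER_DAY].append(value)
    hourly_mean = [mean(values) for values in by_hour]

    # The first forecast hour is the one right after the history ends.
    start = len(hourly_history)
    return [
        hourly_mean[(start + h) % HOURS_PER_DAY]
        for h in range(FORECAST_HORIZON_HOURS)
    ]
```

Calling this once every 24 hours with the latest history would give overlapping 48-hour forecasts, so the second day of each forecast is refreshed by the next day's run.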
We’ve done our best to make this really easy to use. You enable it with a single click, and then use a 3-step wizard to choose the resources that you want to observe and scale. You can configure some warm-up time for your EC2 instances, and you also get to see actual and predicted usage in a cool visualization! The prediction process produces a scaling plan that can drive one or more groups of Auto Scaled EC2 instances.
Once your new scaling plan is in action, you will be able to scale proactively, ahead of daily and weekly peaks. This will improve the overall user experience for your site or business, and it can also help you to avoid over-provisioning, which will reduce your EC2 costs.
Let’s take a look…
Predictive Scaling in Action
The first step is to open the Auto Scaling Console and click Get started:
I can select the resources to be observed and predictively scaled in three different ways:
I select an EC2 Auto Scaling group (not shown), then I assign my group a name, pick a scaling strategy, and leave both Enable predictive scaling and Enable dynamic scaling checked:
As you can see from the screen above, I can use predictive scaling, dynamic scaling, or both. Predictive scaling works by forecasting load and scheduling minimum capacity; dynamic scaling uses target tracking to adjust capacity so that a designated CloudWatch metric stays close to a specific target. The two models work well together: predictive scaling has already scheduled the minimum capacity, and dynamic scaling adds capacity on top of it when actual load exceeds the forecast.
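One way to picture how the two models combine is as a floor plus an adjustment. The sketch below is a simplification under stated assumptions, not the service's actual algorithm: target tracking is approximated as scaling capacity in proportion to metric / target, and the function names are hypothetical.

```python
import math


def target_tracking_capacity(current_capacity, current_metric, target_metric):
    """Simplified target tracking: size the group so the per-instance
    metric would land at the target (proportional approximation)."""
    return max(1, math.ceil(current_capacity * current_metric / target_metric))


def effective_capacity(scheduled_minimum, current_capacity,
                       current_metric, target_metric, maximum):
    """Combine the predictive floor with the dynamic adjustment."""
    dynamic = target_tracking_capacity(
        current_capacity, current_metric, target_metric
    )
    # Predictive scaling sets the floor; dynamic scaling can only go above it,
    # and the group's maximum caps the result.
    return min(maximum, max(scheduled_minimum, dynamic))
```

With a scheduled minimum of 6 instances, a quiet period (30% CPU against a 50% target on 4 instances) still leaves 6 instances running, while an unexpected surge (75% CPU on 6 instances) raises the group to 9.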
I can also fine-tune the predictive scaling, but the default values will work well to get started:
I can forecast on one of three pre-chosen metrics (this is in the General settings):
Or on a custom metric:
I have the option to do predictive forecasting without actually scaling:
And I can set up a buffer time so that newly launched instances can warm up and be ready to handle traffic at the predicted time:
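The settings walked through above (load metric, forecast-only mode, and buffer time) can also be expressed as data. The sketch below builds a dictionary shaped like a scaling instruction for the AWS Auto Scaling Plans API; the field names reflect that API's `ScalingInstruction` structure as I understand it and should be checked against the API reference, while the helper names, defaults, and the group name are hypothetical. Nothing here calls AWS.

```python
from datetime import datetime, timedelta


def build_scaling_instruction(asg_name, min_capacity, max_capacity,
                              target_cpu_percent=50.0, buffer_seconds=300,
                              forecast_only=False):
    """Return a dict describing predictive plus dynamic scaling for one group.

    Defaults are illustrative assumptions, not recommended values.
    """
    return {
        "ServiceNamespace": "autoscaling",
        "ResourceId": f"autoScalingGroup/{asg_name}",
        "ScalableDimension": "autoscaling:autoScalingGroup:DesiredCapacity",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
        # Dynamic scaling: target tracking on average CPU utilization.
        "TargetTrackingConfigurations": [{
            "PredefinedScalingMetricSpecification": {
                "PredefinedScalingMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu_percent,
        }],
        # Predictive scaling: the load metric to forecast on.
        "PredefinedLoadMetricSpecification": {
            "PredefinedLoadMetricType": "ASGTotalCPUUtilization",
        },
        # Forecast without scaling, or forecast and scale.
        "PredictiveScalingMode": (
            "ForecastOnly" if forecast_only else "ForecastAndScale"
        ),
        # Seconds of warm-up before the predicted time.
        "ScheduledActionBufferTime": buffer_seconds,
    }


def scheduled_launch_time(forecast_time: datetime, buffer_seconds: int) -> datetime:
    """Instances launch this far ahead so they are warm at forecast_time."""
    return forecast_time - timedelta(seconds=buffer_seconds)
```

For example, with a 600-second buffer, capacity forecast for 09:00 would be launched at 08:50. A dictionary like this could be passed in the `ScalingInstructions` list of boto3's `autoscaling-plans` `create_scaling_plan` call, assuming the field names match the current API.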
After a couple more clicks, the scaling plan is created and the learning/prediction process begins! I return to the console and I can see the forecasts for CPU Utilization (my chosen metric) and for the number of instances:
I can see the scaling actions that will implement the predictions:
I can also see the CloudWatch metrics for the Auto Scaling group:
And that’s all you need to do!
Here are a couple of things to keep in mind about predictive scaling:
Timing – Once the initial set of predictions has been made and the scaling plans are in place, the plans are updated daily and forecasts are made for the following 2 days.
Cost – You can use predictive scaling at no charge, and may even reduce your AWS charges.
Resources – We are launching with support for EC2 instances, and plan to support other AWS resource types over time.
Applicability – Predictive scaling is a great match for web sites and applications that undergo periodic traffic spikes. It is not designed to help in situations where spikes in load are not cyclic or predictable.
Long-Term Baseline – Predictive scaling maintains the minimum capacity based on historical demand; this ensures that any gaps in the metrics won’t cause an inadvertent scale-in.
Predictive scaling is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Singapore) Regions.