AWS Compute Blog

How to prepare your application to scale reliably with Amazon EC2

This blog post is written by, Gabriele Postorino, Senior Technical Account Manager, and Giorgio Bonfiglio, Principal Technical Account Manager

In this post, we’ll discuss how you can prepare for planned and unplanned scaling events with Amazon Elastic Compute Cloud (Amazon EC2), and make sure that your infrastructure is ready to sustain increased compute power requirements. Although we’ll be focusing on Amazon EC2, most of the best practices and learnings apply to other services, such as Amazon Relational Database Service (Amazon RDS), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS).

Most of the challenges related to horizontal scaling can be mitigated by optimizing the architectural implementation and applying improvements in operational processes.

In the following sections, we’ll explore this in depth. Recommendations can be applied partially or fully – they come with different complexities, and each one will help you reduce the risk of facing insufficient capacity errors or scaling delays, as well as deliver enhancements in areas such as fault tolerance, elasticity, and cost optimization.

Architectural best practices

Instance capacity can be regarded as being divided into “pools” defined by AZ (such as us-east-1a), instance type (for example m5.xlarge), and tenancy. Combining the following two guidelines will widen the capacity pools available to scale out your fleets of instances. This will help you reduce costs, transparently recover from failures, and increase your application scalability.

Instance flexibility

Whether you’re migrating a new workload to the cloud, or tuning an existing workload, you’ll likely evaluate which compute configuration options are available and determine the right configuration for your application.

If your workload is already running on EC2 instances, you might already be aware of the instance type that it runs best on. Let’s say that your application is RAM intensive, and you found that r6i.4xlarge instances are best suited for it.

However, relying on a single instance type might result in artificially limiting your ability to scale compute resources for your workload when needed. It’s always a good idea to explore how your workload behaves when running on other instance types: you might find that your application can serve double the number of requests served by one r6i.4xlarge instance when using one r6i.8xlarge instance or four r6i.2xlarge instances.

Furthermore, there’s no reason to limit your options to a single instance family, generation, or processor type. For example, m6a.8xlarge instances offer the same amount of RAM of r6i.4xlarge and might be used to run your application if needed.

Amazon EC2 Auto Scaling helps you make sure that you have the right number of EC2 instances available to handle the load for your application.

Auto Scaling groups can be configured to respond to scaling events by selecting the type of instance to launch among a list of instance types. You can statically populate the list in advance, as in the following screenshot,

The Instance type requirements section of the Auto Scaling Wizard instance launch options step is shown with the option “Manually add instance types” selected.

or dynamically define it by a set of instance attributes as shown in the subsequent screenshot:

The Instance type requirements section of the Auto Scaling Wizard instance launch options step is shown with the option “Specify instance attribute” selected.

For example, by setting the requirements to a minimum of 8 vCPUs, 64GiB of Memory, and a RAM/CPU ratio of 8 (just like r6i.2xlarge instances), up to 73 instance types can be included in the list of suitable instances. They will be selected for launch starting from the lowest priced instance types. If the request can’t be fulfilled in full by the lowest priced instance type, then additional instances will be launched from the second lowest instance type pool, and so on.

Instance distribution

Each AWS Region consists of multiple, isolated Availability Zones (AZ), interconnected with high-bandwidth, low-latency networking. Spreading a workload across AZs is a well-established resiliency best practice. It will make sure that your end users aren’t impacted in the case of a single AZ, data center, or rack failures, as each AZ has its own distinct instance capacity pools that you can leverage to scale your application fleets.

EC2 Auto Scaling can manage the optimal distribution of EC2 instances in a group across all AZs in a Region automatically, as well as deal with temporary failures transparently. To do so, it must be configured to use at least one subnet in each AZ. Then, it will attempt to distribute instances evenly across AZs and automatically cycle through AZs in case of temporary launch failures.

Diagram showing a VPC with subnets in 2 Availability Zones and an Autoscaling group managing groups of instances of different types

Operational best practices

The way that your workload is operated also impacts your ability to scale it when needed. Failure management and appropriate scaling techniques will help you maximize the availability of your environment.

Failure management

On-Demand capacity isn’t guaranteed to always be available. There might be short windows of time when AWS doesn’t have enough available On-Demand capacity to fulfill your specific request: as the availability of On-Demand capacity changes frequently, it’s important that your launch processes implement retry mechanisms.

Retries and fallbacks are managed automatically by EC2 Auto Scaling. But if you have a custom workflow to launch instances, it should be able to work with server error codes, in particular InsufficientInstanceCapacity or InternalError, by retrying the launch request. For a complete list of error codes for the EC2 API, please refer to our documentation.

Another option provided by EC2 is represented by EC2 Fleets. EC2 Fleet is a feature that helps to implement instance flexibility best practices. Instead of calling RunInstances with one instance type and retrying, EC2 Fleet in Instant mode considers all provided instance types, using a list of instances or Attribute Based Instance selection, and provisions capacity from the pools configured by the EC2 Fleet call where capacity was available.

Scaling technique

Launching EC2 instances as soon as you have an initial indication of increased load, in smaller batches and over a longer time span, helps increase your application performance and reliability while reducing costs and minimizing disruptions.

Two different scaling techniques that follow the increase in load are depicted. One scaling approach adds a large number of instances less frequently, while the second approach launches a smaller batch of instances more frequently. In the graph above, two different scaling techniques are depicted. Scaling approach #1 adds a large number of instances less frequently, while approach #2 launches a smaller batch of instances more frequently. Adopting the first approach risks your application not being able to sustain the increase in load in a timely manner. This will potentially cause an impact on end users and leave the operations team with little time to resolve.

Effective capacity planning

On-Demand Instances are best suited for applications with irregular, uninterruptible workloads. Interruptible workloads can avail of Spot Instances that pick from spare EC2 capacity. They cost less than On-Demand Instances but can be interrupted with a two-minute warning.

If your workload has a stable baseline utilization that hardly changes over time, then you can reserve capacity for your baseline usage of EC2 instances using open On-Demand Capacity Reservations and cover them with Savings Plans to get discounted rates with a one-year or three-year commitment, with the latter offering the bigger discounts.

Open On-Demand Capacity Reservations and Savings Plans aren’t tightly related to the EC2 instances that they cover at a certain point in time. Rather they shift to other usage, matching all of the parameters of the respective On-Demand Capacity Reservation or Savings Plan (e.g., Instance Type, Operating System, AZ, tenancy) in your account or across accounts for which you have sharing enabled. This lets you be dynamic even with your stable baseline. For example, during a rolling update or a blue/green deployment, On-Demand Capacity Reservations and Savings Plans will automatically cover any instances that match the respective criteria.

ODCR Fleets

There are times when you can’t apply all of the recommended mitigating actions in anticipation of a planned event. In those cases, you might want to use On-Demand Capacity Reservation Fleets to reserve capacity in advance for additional peace of mind. Capacity reservation fleets let you define capacity requests across multiple instance types, up to a target capacity that you specify. They can be created and managed using the AWS Command Line Interface (AWS CLI) and the AWS APIs.

Key concepts of Capacity Reservation Fleets are the total target capacity and the instance type weight. The instance type weight expresses the number of capacity units that each instance of a specific instance type counts toward the total target capacity.

Let’s say your workload is memory-bound, you expect to need 1,6TiB of RAM, and you want to use r6i instances. You can create a Capacity Reservation Fleet for r6i instances defining weights for each instance type in the family based on the relative amount of memory that they have in an instance type specification json file.

        "InstanceType": "r6i.2xlarge",                       
        "Weight": 1,
        "EbsOptimized": true,           
        "Priority" : 3
        "InstanceType": "r6i.4xlarge",                        
        "Weight": 2,
        "EbsOptimized": true,            
        "Priority" : 2
        "InstanceType": "r6i.8xlarge",                        
        "Weight": 4,
        "EbsOptimized": true,            
        "Priority" : 1

Then, you want to use this specification to create a Capacity Reservation Fleet that takes care of the underlying Capacity Reservations needed to fulfill your request:

$ aws ec2 create-capacity-reservation-fleet \
--total-target-capacity 25 \
--allocation-strategy prioritized \
--instance-match-criteria open \
--tenancy default \
--end-date 2022-05-31T00:00:00.000Z \
--instance-type-specifications file://instanceTypeSpecification.json

In this example, I set the target capacity to 25, which is the number of r6i.2xlarge needed to get 1,6TiB of total memory across the fleet. As you might have noticed, Capacity Reservation Fleets can be created with an end date. They will automatically cancel themselves and the Capacity Reservations that they created when the end date is reached, so that you don’t need to.

AWS Infrastructure Event Management

Last but not least, our teams can offer the AWS Infrastructure Event Management (IEM) program. Part of select AWS Support offerings, the IEM program has been designed to help you with planning and executing events that impact your infrastructure on AWS. By requesting an IEM engagement, you will be supported by AWS experts during all of the phases of your event.Flow chart showing the steps and IEM is usually made of: 1. Event is planned 2. IEM is initiated 6-8 weeks in advance of the event 3. Infrastructure readiness is assessed and mitigations are applied 4. The event 5. Post-event reviewStarting from your business outcomes and success criteria, we’ll assess your infrastructure readiness for the event, evaluate risks, and recommend specific actions to mitigate them. The AWS experts will focus on your application architecture as a whole and dive deep into each of its components with your respective teams. They might also engage with other AWS teams to notify them of the upcoming event, and get specific prescriptive guidance when needed. During the event, AWS experts will have the context needed to help you resolve any issue that might arise as quickly as possible. The program is included in the Enterprise and Enterprise On-Ramp Support plans and is available to Business Support customers for an additional fee.


Whether you’re planning for a big future event, or you want to make sure that your application can withstand unexpected increases in traffic, it’s important that you consider what we discussed in this article:

  • Use as many instance types as you can, don’t limit your workload to use a single instance type when it could also use a lot more types
  • Distribute your EC2 instances across all AZs in the Region
  • Expect failures: manage retries and fallback options
  • Make use of EC2 Autoscaling and EC2 Fleet whenever possible
  • Avoid scaling spikes: start scaling earlier, in smaller chunks, and more frequently
  • Reserve capacity only when you really need to

For further study, we recommend the Well-Architected Framework Reliability and Operational Excellence pillars as starting points. Moreover, if you have an event coming up, talk to your Technical Account Manager, your Account Team, or contact us to find out how we can help!