Guidance for Optimizing Heterogeneous Auto Scaling Group Resource Utilization on AWS

This Guidance shows how to use Elastic Load Balancing (ELB) to distribute traffic efficiently across heterogeneous Auto Scaling groups. When scaling is based on metrics such as CPU or memory utilization, hot spots can form: smaller instances reach full capacity while larger instances remain underutilized. With its existing routing algorithms, ELB cannot route requests proportionately to targets of different weights or sizes. This Guidance provides a balanced distribution method that accounts for the capacity of each target group. As a result, you can minimize hot spots and timeouts, raise scale-out thresholds, and reduce compute costs and CO2 emissions.

Architecture Diagram

Guidance Architecture Diagram for Optimizing Heterogeneous Auto Scaling Group Resource Utilization on AWS

Step 1
Configure an AWS Lambda function with the Amazon Resource Name (ARN) of an Elastic Load Balancing (ELB) load balancer and the ARNs of its listeners. Each load balancer can have multiple listeners.

Step 2
The Lambda function periodically recalculates and updates the target group weights for each listener (default interval: 15 minutes).

Step 3
ELB dynamically routes traffic based on the weighted percentage of the target groups.
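
One way to derive the weighted percentages described in Steps 2 and 3 is to make each target group's weight proportional to its total capacity. The following is a minimal sketch under assumptions, not the Guidance's sample code: it assumes capacity is measured as vCPUs per instance multiplied by the number of healthy instances, and the function name `compute_weights` and the group labels are hypothetical.

```python
def compute_weights(capacity_by_group, total=100):
    """Return integer weights proportional to each group's capacity.

    Uses the largest-remainder method so the weights sum exactly to `total`.
    """
    capacity_sum = sum(capacity_by_group.values())
    if capacity_sum <= 0:
        raise ValueError("at least one target group needs capacity")

    # Exact (fractional) share of `total` for each group.
    exact = {g: c * total / capacity_sum for g, c in capacity_by_group.items()}
    # Start from the integer part of each share.
    weights = {g: int(v) for g, v in exact.items()}

    # Hand the leftover points to the groups with the largest fractional parts.
    leftover = total - sum(weights.values())
    by_fraction = sorted(exact, key=lambda g: exact[g] - weights[g], reverse=True)
    for g in by_fraction[:leftover]:
        weights[g] += 1
    return weights


# Hypothetical fleet: (vCPUs per instance) * (healthy instance count).
capacity = {"tg-2vcpu": 2 * 4, "tg-4vcpu": 4 * 3, "tg-8vcpu": 8 * 2}
weights = compute_weights(capacity)
print(weights)  # {'tg-2vcpu': 22, 'tg-4vcpu': 33, 'tg-8vcpu': 45}
```

Because listener weights are relative, the total of 100 is only a convenience that makes each weight readable as a percentage.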

Step 4
Define multiple homogeneous Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling groups. Configure similarly capable instances in a single Auto Scaling group.

Step 5
Define up to five target groups. Map each Auto Scaling group to a corresponding target group. For example, you could group instances by their number of virtual CPUs (vCPUs).
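
A weighted forward action covering these target groups could be assembled as sketched below. The dictionary follows the shape of the `DefaultActions` parameter of the ELBv2 `ModifyListener` API; the helper name `build_forward_action` and the short ARN strings are placeholders for illustration. The sketch enforces the five-target-group limit from this step and the 0 to 999 weight range that Application Load Balancers accept.

```python
MAX_TARGET_GROUPS = 5  # limit stated in Step 5
MAX_WEIGHT = 999       # upper bound for a target group weight on an ALB


def build_forward_action(weights_by_arn):
    """Build a weighted `forward` action from {target_group_arn: weight}."""
    if not 1 <= len(weights_by_arn) <= MAX_TARGET_GROUPS:
        raise ValueError(f"expected 1 to {MAX_TARGET_GROUPS} target groups")

    target_groups = []
    for arn, weight in weights_by_arn.items():
        if not 0 <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight for {arn} must be in 0..{MAX_WEIGHT}")
        target_groups.append({"TargetGroupArn": arn, "Weight": int(weight)})

    return {"Type": "forward", "ForwardConfig": {"TargetGroups": target_groups}}


# Placeholder ARNs; real target group ARNs are much longer.
action = build_forward_action({"arn:tg-small": 20, "arn:tg-large": 80})
```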

Step 6
The diagram shows example forwarding weights, expressed as percentages, that the ELB listeners assign to each target group.

Step 7
Use attribute-based instance type selection to select similarly capable instances.
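
Attribute-based instance type selection is expressed as an `InstanceRequirements` block in an Auto Scaling group's mixed instances policy. The following is a minimal sketch, assuming a group whose instances all have exactly 4 vCPUs and at least 8 GiB of memory; the helper name and the values are illustrative and are not taken from the Guidance's sample code.

```python
def instance_requirements(vcpus, min_memory_mib):
    """Build an override that pins a group to a single vCPU count.

    Setting Min and Max to the same value keeps the group homogeneous
    in vCPUs while still allowing several instance types to match.
    """
    return {
        "InstanceRequirements": {
            "VCpuCount": {"Min": vcpus, "Max": vcpus},
            "MemoryMiB": {"Min": min_memory_mib},
        }
    }


# Illustrative values: 4 vCPUs, at least 8 GiB (8192 MiB) of memory.
override = instance_requirements(vcpus=4, min_memory_mib=8192)
```

A dictionary like this would go in the `Overrides` list under `MixedInstancesPolicy.LaunchTemplate` when you create or update the Auto Scaling group.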

Step 8
Configure a Lambda function to update the ELB target group weights at a specified interval (for instance, every 5 minutes) or based on Amazon CloudWatch metrics or Amazon EventBridge events.
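
The update function in this step might look like the sketch below. It is not the Guidance's sample code. It assumes that each listener's default action is a single forward action, that every instance in a target group has the same vCPU count, and that two hypothetical environment variables, `LISTENER_ARNS` (comma-separated) and `VCPUS_BY_TARGET_GROUP` (a JSON object), carry the configuration. The client is passed in as an argument so the logic can be exercised without AWS access.

```python
import json
import os


def healthy_count(elbv2_client, target_group_arn):
    """Count targets that the load balancer currently reports as healthy."""
    resp = elbv2_client.describe_target_health(TargetGroupArn=target_group_arn)
    return sum(
        1
        for d in resp["TargetHealthDescriptions"]
        if d["TargetHealth"]["State"] == "healthy"
    )


def refresh_weights(elbv2_client, listener_arns, vcpus_by_target_group):
    """Set listener weights in proportion to healthy vCPU capacity."""
    capacity = {
        arn: vcpus * healthy_count(elbv2_client, arn)
        for arn, vcpus in vcpus_by_target_group.items()
    }
    total = sum(capacity.values())
    if total == 0:
        return None  # nothing healthy: leave the current weights untouched

    # Weights are relative, so small rounding drift from 100 is harmless.
    weights = {arn: round(100 * c / total) for arn, c in capacity.items()}
    action = {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": arn, "Weight": w} for arn, w in weights.items()
            ]
        },
    }
    for listener_arn in listener_arns:
        # Replaces the listener's default actions with the weighted forward.
        elbv2_client.modify_listener(
            ListenerArn=listener_arn, DefaultActions=[action]
        )
    return weights


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above work without boto3

    listener_arns = os.environ["LISTENER_ARNS"].split(",")  # hypothetical name
    vcpus = json.loads(os.environ["VCPUS_BY_TARGET_GROUP"])  # hypothetical name
    return refresh_weights(boto3.client("elbv2"), listener_arns, vcpus)
```

An EventBridge schedule or a CloudWatch alarm can then invoke `lambda_handler`; the handler ignores the event payload and always recomputes from current target health.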

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

EventBridge invokes Lambda functions at specified intervals so that you can automate the use of heterogeneous instances across Auto Scaling groups. CloudWatch monitors and logs events, such as interruptions to Amazon EC2 Spot Instances, which you can further configure in the Spot Instance interruption dashboard.

Read the Operational Excellence whitepaper

Security

AWS Identity and Access Management (IAM) grants Lambda only the permissions that are necessary for it to modify listener weights on the specified ELB ARN and listener ARNs. By using the principle of least privilege, you can better control access to AWS resources.
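
A least-privilege policy of this kind could be generated as sketched below. This is an illustration, not the policy shipped with the Guidance: the helper name and the listener ARN are placeholders, and only the `elasticloadbalancing:ModifyListener` action is included. A function that also reads listener configuration or target health would need the matching Describe actions, which are omitted here.

```python
import json


def listener_weight_policy(listener_arns):
    """Build an IAM policy that allows modifying only the given listeners."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UpdateListenerWeights",
                "Effect": "Allow",
                "Action": "elasticloadbalancing:ModifyListener",
                "Resource": list(listener_arns),
            }
        ],
    }


# Placeholder ARN for illustration only.
policy = listener_weight_policy(
    [
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:"
        "listener/app/example-alb/0123456789abcdef/fedcba9876543210"
    ]
)
print(json.dumps(policy, indent=2))
```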

Read the Security whitepaper

Reliability

This Guidance is an optimization built on a reliable architecture. It uses ELB for traffic distribution so that you do not have a single point of failure. ELB uses synchronous loose coupling to avoid directing traffic to overloaded Amazon EC2 instances, reducing the chance of application failure. As a result, you can minimize downtime. Additionally, this Guidance uses multiple weighted target groups backed by Auto Scaling groups. This adds another layer of reliability: even if one Auto Scaling group cannot obtain enough Amazon EC2 instances, the other Auto Scaling groups can serve inbound traffic, helping you avoid application failures.

Read the Reliability whitepaper

Performance Efficiency

This Guidance helps you avoid hot spots in heterogeneous clusters by balancing the use of larger and smaller Amazon EC2 instances. Additionally, it is an AWS best practice to use a mix of instance sizes and types in a single Auto Scaling group and to use Spot Instances as a diversification tactic. However, this practice can result in unbalanced utilization, so this Guidance uses multiple Auto Scaling groups, each containing similarly capable instances, and adjusts the forwarding weights on the ELB listeners to maximize resource utilization.

Read the Performance Efficiency whitepaper

Cost Optimization

Previously, to avoid timeouts and hot spots, you might have needed to artificially keep utilization below 40 percent. Now, by using this Guidance, you can increase resource utilization by distributing traffic with dynamic weighted routing from ELB. Depending on your application and requirements, you might even be able to achieve utilization of 60 to 70 percent, leading to significant Amazon EC2 cost savings. Additionally, Spot Instances provide a discount of up to 90 percent compared to Amazon EC2 On-Demand Pricing. Finally, Auto Scaling groups scale up and down depending on traffic and usage patterns to optimize resource utilization, and the use of multiple Auto Scaling groups in a dynamic weighted routing solution provides another layer of resource optimization, ultimately resulting in cost savings.
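
To see what the utilization figures above could mean for fleet size, consider a rough calculation. It assumes a constant load and capacity that scales linearly with instance count, so it is an upper-level estimate rather than a prediction; the function name is illustrative.

```python
def capacity_reduction(old_utilization, new_utilization):
    """Fraction of capacity no longer needed to serve the same load.

    Required capacity is load / utilization, so the ratio of new to old
    capacity is old_utilization / new_utilization.
    """
    return 1 - old_utilization / new_utilization


for target in (0.60, 0.70):
    saved = capacity_reduction(0.40, target)
    print(f"40% -> {target:.0%}: {saved:.0%} less capacity")
# 40% -> 60%: 33% less capacity
# 40% -> 70%: 43% less capacity
```

Under these assumptions, raising average utilization from 40 percent to between 60 and 70 percent serves the same load with roughly one-third to two-fifths less compute capacity.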

Read the Cost Optimization whitepaper

Sustainability

This Guidance uses Auto Scaling groups that automatically scale in to terminate idle resources. Additionally, the use of multiple homogeneous Auto Scaling groups with similarly capable Amazon EC2 instances helps eliminate hot spots, increase utilization, and reduce waste, power consumption, and cooling requirements. Spot Instance usage in an Auto Scaling group also contributes directly to sustainability, because these instances consume the same energy resources as On-Demand Instances but would otherwise go unused.

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open sample code on GitHub
