Scaling strategies for Elastic Load Balancing

Elastic Load Balancing (ELB) offers four types of load balancers, all featuring high availability, automatic scaling, and robust security support for your applications: Application Load Balancer (ALB), Network Load Balancer (NLB), Gateway Load Balancer (GWLB), and Classic Load Balancer (CLB).

ELB automatically scales up and down, and scales in and out in response to traffic load, to help manage requests by optimally routing incoming traffic. The load balancer scaling system scales up and out very aggressively in response to the incoming traffic, and keeps enough capacity to withstand an Availability Zone (AZ) impairment. On the other hand, scaling in/down happens very conservatively.

In certain scenarios, as discussed in the subsequent sections, sharding may be needed. Sharding is a form of horizontal scaling that helps deliver a high degree of reliability and enables higher scalability of applications. It distributes the workload across multiple load balancers through the Domain Name System (DNS), using Amazon Route 53 records that point to load balancers serving the same set of backend targets. Similar to sharding a database, where a large database or table is broken up into smaller chunks distributed across multiple servers, you shard the overall capacity of the workload and segment it across multiple load balancers. This way, each load balancer shard is responsible for handling a subset of requests. The process of sharding is driven by customers, and AWS does not manage sharding automatically for you.

In this post, we discuss considerations for sharding your NLB, ALB and CLB along with how to implement sharding for ELB to handle large volumes of traffic. We have created a GitHub repository for code samples that can optionally help accelerate the sharding of your ELB. This repository includes samples for AWS CloudFormation templates to monitor the IP addresses being used by either an ALB or CLB, and automate the process to create Route 53 resource records.

Overview of ELB Sharding

Without sharding, a single ELB handles all of the traffic. With sharding, the overall traffic is distributed across multiple ELBs, with each ELB handling a portion of the overall traffic. Figure 1 shows a high-level view of the sharding technique using Amazon Route 53.

Figure 1: High-level view of ELB without/with sharding concept

Note: The sharding discussed here is a different concept from shuffle sharding, which operates at the target level.

Application Load Balancer and Classic Load Balancer Sharding Considerations

ALB and CLB automatically distribute incoming application traffic across multiple targets in one or more AZs. ALB first scales up and then starts scaling out. When you create an internet-facing or internal ALB or CLB, we create at least one load balancer node in each configured AZ. For an internet-facing ALB or CLB, each node has a public IP address that is available in the DNS record, and a private IP address that is available using the Amazon EC2 describe-network-interfaces API call. For an internal ALB or CLB, each node has a private IP address that is available in the DNS record. Each node consumes a single IP address per subnet. To ensure that your load balancer can scale properly, we recommend that you use a minimum subnet size of /27 for your load balancer subnets and keep at least eight free IP addresses per subnet. ALB and CLB can scale up to a maximum of 100 active nodes across all AZs. Once the load balancer gets close to that maximum of 100 active nodes, sharding becomes necessary.

When to shard your Application Load Balancer and Classic Load Balancer

ELB uses DNS as the point where clients should send traffic. We monitor the health and status of the resources supporting each IP address and update the load balancer DNS record to contain only the IPs of healthy and appropriately scaled resources. There are several operations that can cause the IP addresses for a given ALB or CLB to change. When they do change, new IPs are added to DNS and the old IPs are removed. When using ALBs and CLBs, we recommend that clients resolve the ELB DNS name and follow the best practices of honoring DNS TTLs (1 minute for all ELBs), retrying failed requests with exponential backoff and jitter, and refreshing DNS after a connection failure.

Today, when you create a load balancer, we create a DNS record for it and return a maximum of 8 random IP addresses per query. This is done to fit the response in a single UDP query response. To determine whether your load balancer needs sharding, you need the total number of active nodes on your load balancer, which you can get by prepending all. to the load balancer DNS name as shown below. Please note that this may include nodes that are unhealthy. We recommend using this DNS name only for troubleshooting or capacity alerting, not to determine where traffic can be sent. We also published a sample CloudFormation script that you can optionally use to monitor the IP addresses of your load balancer through a custom Amazon CloudWatch metric.

To determine the number of active nodes your load balancer uses, you can use dig and note the output.

The syntax is: % dig +short all.<FQDN_of_your_ALB or CLB> | wc -l

% dig +short all.www-example-com-1234567890.us-east-1.elb.amazonaws.com | wc -l

Based on the result from the above command, you can use the following table to determine the need for sharding.

Number of active ALB or CLB nodes across all AZs (Result) | Recommendation
Equal to or less than 12 | Sharding is not needed.
Between 12 and 50 | Sharding is not needed. Continue to monitor your load balancer.
More than 50 | Look at your load balancer's historical traffic, determine how long it will be until it doubles, and plan to shard by that time.
Equal to or more than 99 | You are already at the maximum number of ALB or CLB nodes. Sharding is required immediately.
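The table above can be turned into a small capacity check. The following is a minimal sketch in Python, not an official tool: the function names are hypothetical, the thresholds are copied from the table, and count_nodes assumes that the system resolver returns the full answer for the all. name in the same way dig does, which may not hold on every platform.

```python
import socket


def count_nodes(lb_dns_name, resolve=socket.getaddrinfo):
    """Count distinct IPv4 addresses behind all.<load balancer DNS name>.

    Assumption: the local resolver returns every A record for the name.
    The result may include unhealthy nodes, as noted above.
    """
    infos = resolve("all." + lb_dns_name, None,
                    family=socket.AF_INET, type=socket.SOCK_STREAM)
    return len({info[4][0] for info in infos})


def sharding_recommendation(active_nodes):
    """Map an active node count to the guidance in the table above."""
    if active_nodes >= 99:
        return "shard immediately"
    if active_nodes > 50:
        return "plan to shard before traffic doubles"
    if active_nodes > 12:
        return "no sharding needed; keep monitoring"
    return "no sharding needed"
```

For example, sharding_recommendation(count_nodes("www-example-com-1234567890.us-east-1.elb.amazonaws.com")) would classify the example load balancer used in the dig command.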

Network Load Balancer sharding considerations

NLB (and GWLB, which is out of scope for this post) is built on top of AWS Hyperplane, a distributed network function virtualization system deployed in each AZ within a Region. When you create an internet-facing NLB, it creates an Elastic Network Interface (ENI) and assigns an AWS-assigned public IP per subnet for you, which is available in the DNS record. You can also optionally choose to specify one of your own Elastic IP addresses (EIP). For an internal NLB, it creates an ENI and assigns a private IP address per subnet for you, which is also available in the DNS record. These EIPs (for an internet-facing NLB) or private IP addresses (for an internal NLB) provide your load balancer with static IP addresses that will not change during the life of the load balancer. Scaling happens transparently for your NLB and independently on a per-ENI basis. If you have created an NLB with 3 AZs, it creates 3 ENIs on the same NLB, and because these ENIs are in different AZs they do not share scaling information. Therefore, all the ENIs of the same NLB scale independently based on the traffic detected in their AZ.

To shard your NLB, we recommend that you plan in terms of NLB AZs: each AZ enabled on an NLB counts as one NLB AZ, so 3 NLBs with 1 AZ each and 1 NLB with 3 AZs both count as 3 NLB AZs. This does mean that some workloads will require multiple NLBs. To plan for NLB sharding, we recommend using 100 Gbps per NLB/AZ (ENI) as the sharding threshold. Keeping the used capacity in each NLB/AZ (ENI) at no more than 100 Gbps helps reduce the blast radius. Refer to the Observability with Amazon CloudWatch Metrics section later in this post for more information.
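As a rough planning aid, the 100 Gbps per NLB/AZ threshold can be converted into a shard count. This is a minimal sketch with hypothetical function names, and it assumes that traffic is spread evenly across AZs and shards, which real workloads may not achieve.

```python
import math

# Sharding threshold per NLB/AZ (ENI), taken from the guidance above.
SHARD_THRESHOLD_GBPS = 100


def nlb_azs_needed(peak_gbps):
    """Number of NLB AZs (ENIs) needed to keep each one at or below the threshold."""
    return max(1, math.ceil(peak_gbps / SHARD_THRESHOLD_GBPS))


def nlbs_needed(peak_gbps, azs_per_nlb=3):
    """Number of NLBs needed when each NLB is enabled in azs_per_nlb AZs."""
    return math.ceil(nlb_azs_needed(peak_gbps) / azs_per_nlb)
```

With a peak of 450 Gbps and 3 AZs per NLB, this gives 5 NLB AZs and therefore 2 NLBs.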

Sharding architectures

In this section, we focus on single-Region and multi-Region sharding scenarios. Figure 2 below shows an architecture with a single Region and a single ELB across multiple AZs, without sharding. In this example, you create a single Route 53 alias record targeting the ELB, which handles 100% of the incoming requests. This architecture covers the majority of use cases without the need for sharding, as the ELB scales automatically by adding either more nodes (scale out) or larger nodes (scale up) to handle the incoming requests.

Figure 2: Single-Region without sharding

1. Single-Region with sharding

Once you determine that ELB sharding may be necessary based on the aforementioned criteria, you shard the ELB as shown in Figure 3 below. You first determine how many ELBs are needed based on your requirements. In the example below, we use two ELBs to handle the incoming requests. Both ELBs point to the same targets, because you can register the same target with multiple target groups. To split the load (shard) between the two ELBs, you can use the Route 53 weighted routing policy. Weighted routing lets you associate multiple resources with a single domain name (example.com) or subdomain name (acme.example.com) and choose how much traffic is routed to each resource. In Figure 3, we create two Route 53 alias records that point the same name, www.example.com, in a public hosted zone to the respective ELBs (load-balancer-1-<xxxxxxxxxx>.us-east-1.elb.amazonaws.com and load-balancer-2-<yyyyyyyyyy>.us-east-1.elb.amazonaws.com). We then assign each record a relative weight that corresponds to how much traffic we want to send to each ELB. Here we split the traffic evenly, so each ELB receives 50% of the incoming requests. If you want to stop sending traffic to a resource, you can change the weight for that record to zero. Refer to the Route 53 weighted routing documentation for details.

Figure 3: Single-Region with sharding

2. Multi-Region sharding

In multi-Region sharding (Figure 4), we replicate the same setup in multiple AWS Regions. Region 1 and Region 2 both have multiple ELBs sharded to handle the incoming requests within the respective Region. This is achieved using the Route 53 weighted routing policy, as described in the previous section, to distribute the incoming traffic between the ELB shards. To improve performance for your users by serving their requests from the AWS Region that provides the lowest latency, you can set up Route 53 latency-based routing. This way, users are first routed to the Region with the lowest latency and then to an ELB shard based on the corresponding Route 53 weight. You can create records one at a time, but it may be challenging to keep track of the relationships among the records when you review the settings in the Route 53 console. Optionally, Route 53 traffic flow can help simplify the process of creating and maintaining records in large and complex configurations through a visual editor (Figure 4), for example by combining latency-based routing with the weighted routing policy. Refer to the Route 53 traffic flow documentation for more details.

Figure 4: Multi-Region sharding
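The two-level decision in the multi-Region design (latency first, then weight) can be illustrated with a toy model. This is only a sketch of the routing logic described above, not how Route 53 is implemented: the function name, Region names, and latency figures are hypothetical, and Route 53 uses its own latency measurements rather than values supplied by the client.

```python
import random


def pick_shard(latency_ms, shards, rng=random):
    """Pick a Region by lowest latency, then an ELB shard by relative weight.

    latency_ms: Region name -> latency estimate in milliseconds.
    shards: Region name -> {ELB name: weight}.
    """
    # Latency-based routing: choose the Region with the lowest latency.
    region = min(latency_ms, key=latency_ms.get)
    # Weighted routing: choose an ELB within that Region by relative weight.
    names = list(shards[region])
    weights = [shards[region][name] for name in names]
    return region, rng.choices(names, weights=weights, k=1)[0]
```

For instance, calling pick_shard({"us-east-1": 20, "eu-west-1": 95}, {"us-east-1": {"ELB1": 100, "ELB2": 100}, "eu-west-1": {"ELB3": 100, "ELB4": 100}}) always selects us-east-1 and then returns ELB1 or ELB2 with equal probability.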

How to shard your ELB

ELB sharding requires thoughtful planning and requires you to take your future growth into consideration, so make sure you understand the prerequisites before getting started. In this section, we use ALB as an example to show how to shard your load balancer, but the process remains the same for NLB and CLB.

Pre-requisites

Create a new Route 53 public hosted zone or leverage an existing public hosted zone if you are already using Amazon Route 53
Create a separate target group for each load balancer
Make sure you have created the required number of load balancers. In our example, we show two ALBs (ALB1 and ALB2 respectively), both in the us-east-1 Region, across which we would like to distribute traffic. Note the DNS names of both of your ALBs. Refer to the documentation on Getting the DNS name for an Elastic Load Balancing load balancer:
1. load-balancer-1-xxxxxxxxx.us-east-1.elb.amazonaws.com
2. load-balancer-2-yyyyyyyyyy.us-east-1.elb.amazonaws.com

Once you have created the additional ALB(s), make sure to register the same targets behind each load balancer. You can also optionally plan to have different targets per ALB, and can optionally use this CloudFormation script to create resource records in Amazon Route 53.

Steps

Step 1: In your Route 53 hosted zone, create an Alias resource record, which we will call “shard.example.com” pointing to the first ALB (Figure 5).

Record name: shard (so that the full record name is shard.example.com)
Record type: A – Routes traffic to an IPv4 address and some AWS resources
Alias: Select the toggle
Value/Route traffic to: Choose Alias to Application and Classic Load Balancer or Alias to Network Load Balancer for endpoint, select the region the endpoint is in and then select the first load balancer you want to add to the shard.
Routing Policy: Select the Weighted routing policy from the dropdown and specify a number for the Weight that represents the proportion of traffic to send to this record.
Evaluate target health: Click the toggle to enable.

Figure 5: Route 53 Console Create record step for ALB1

When you set Evaluate target health to Yes for an alias record, Route 53 evaluates the health of the resource that the alias target value specifies. Therefore, it is recommended to set it to Yes.

The console prepends dualstack. to the DNS name of the ALB only when the ALB is in the same AWS account. When a client, such as a web browser, requests the IP address for your domain name (example.com) or subdomain name (www.example.com), the client can request an IPv4 address (an A record), an IPv6 address (an AAAA record), or both IPv4 and IPv6 addresses (in separate requests with IPv4 first). The dualstack. designation allows Route 53 to respond with the appropriate IP address for your load balancer based on which IP address format the client requested. For an ALB in a different account, you need to prepend dualstack. yourself.

Step 2: Repeat the above steps for creating a resource record for ALB2 with exactly the same name shard.example.com. Figure 6 shows the resource record for ALB2.

Figure 6: Route 53 Console Create record step for ALB2

Upon completion of the above steps, the Route 53 console output will look as shown in Figure 7.

Figure 7: Route 53 Console output after creating the ALBs along with alias records
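The same two records can also be created programmatically. The sketch below only builds the change batch that the Route 53 ChangeResourceRecordSets API expects. The helper names are hypothetical, and the hosted zone ID values are placeholders: elb_zone_id stands for the load balancer's canonical hosted zone ID, which you would look up for your own load balancers.

```python
def weighted_alias_change(record_name, shard_id, elb_dns_name, elb_zone_id, weight):
    """Build one UPSERT change for a weighted alias A record pointing at an ELB."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "A",
            # SetIdentifier distinguishes weighted records that share a name.
            "SetIdentifier": shard_id,
            "Weight": weight,
            "AliasTarget": {
                "HostedZoneId": elb_zone_id,  # placeholder, not a real zone ID
                "DNSName": "dualstack." + elb_dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


def build_change_batch(record_name, shards, elb_zone_id):
    """shards maps a shard identifier to (ELB DNS name, weight)."""
    return {
        "Changes": [
            weighted_alias_change(record_name, shard_id, dns_name, elb_zone_id, weight)
            for shard_id, (dns_name, weight) in shards.items()
        ]
    }


batch = build_change_batch(
    "shard.example.com",
    {
        "ALB1": ("load-balancer-1-xxxxxxxxx.us-east-1.elb.amazonaws.com", 100),
        "ALB2": ("load-balancer-2-yyyyyyyyyy.us-east-1.elb.amazonaws.com", 100),
    },
    elb_zone_id="ZELBEXAMPLE",
)
```

Assuming boto3 is installed and credentials are configured, the batch could then be submitted with boto3.client("route53").change_resource_record_sets(HostedZoneId=<your hosted zone ID>, ChangeBatch=batch).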

How will this work?

Every time a user performs a DNS lookup on the resource record “shard.example.com”, Route 53 returns up to 8 IP addresses belonging to ALB nodes from one of the two ALBs, with a probability that we calculate as follows:

p = weight of set 1 / ( (weight of set 1)+(weight of set 2) )

Where p represents the probability of the IPs of set 1 being returned, which in this case is:

p = 100 / (100+100) = 0.5

So, the probability is 0.5 or 50%.

You can extend this concept to more than two ALBs, or alter the weights to have very fine granular control over the traffic that you send to each ALB.
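The same calculation generalizes to any number of shards: each record's probability is its weight divided by the sum of all weights. The following is a minimal sketch with a hypothetical function name. It does not model the case where every weight is zero.

```python
def shard_probabilities(weights):
    """Return the probability of each weighted record being selected.

    weights: record identifier -> Route 53 weight (non-negative integer).
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be greater than zero")
    return {name: weight / total for name, weight in weights.items()}
```

With the weights used in this example, shard_probabilities({"ALB1": 100, "ALB2": 100}) gives 0.5 for each ALB, and weights of 3 and 1 would give 0.75 and 0.25.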

Observability with Amazon CloudWatch Metrics

For ALB, you can use CloudWatch metrics, specifically RequestCount, to monitor the result of sharding (before and after). This metric shows the number of requests processed over IPv4 and IPv6. It is only incremented for requests where the load balancer node was able to choose a target. Requests that are rejected before a target is chosen are not reflected in this metric. Figure 8 below shows the ALB request distribution before sharding, with ALB1 (blue line) receiving all the traffic, and after sharding, with ALB2 (green line) taking 50% of the load, thereby evenly distributing the traffic.

To monitor the number of bytes currently processed by your NLB, we recommend monitoring the ProcessedBytes, ProcessedBytes_TCP, ProcessedBytes_TLS, and ProcessedBytes_UDP CloudWatch metrics, depending on the listeners configured on your NLB.

Figure 8: CloudWatch RequestCount metric showing the before/after results of ALB sharding
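To check the split numerically rather than visually, you can query RequestCount for each ALB and compare the sums. The sketch below builds the query structure used by the CloudWatch GetMetricData API and computes each load balancer's share. The function names and the LoadBalancer dimension values are hypothetical placeholders.

```python
def request_count_queries(load_balancers, period_seconds=300):
    """Build one GetMetricData query per ALB for the RequestCount metric.

    load_balancers: list of LoadBalancer dimension values, for example
    "app/load-balancer-1/xxxxxxxxxxxxxxxx" (placeholder).
    """
    queries = []
    for index, lb_dimension in enumerate(load_balancers):
        queries.append({
            "Id": f"alb{index}",  # query IDs must start with a lowercase letter
            "Label": lb_dimension,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": "RequestCount",
                    "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
        })
    return queries


def traffic_share(request_counts):
    """Return each load balancer's fraction of the total request count."""
    total = sum(request_counts.values())
    return {name: (count / total if total else 0.0)
            for name, count in request_counts.items()}
```

Assuming boto3 is available, the queries could be passed as MetricDataQueries to boto3.client("cloudwatch").get_metric_data together with a StartTime and EndTime, and the summed values fed to traffic_share.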

Considerations

Sticky sessions for your ALB do not span two different load balancers. If you use this feature, be aware that a client might end up on a different target whenever it hits a different load balancer.
CloudWatch metrics and logs are individually emitted per load balancer.
You cannot add two target groups to a single Auto Scaling Group (ASG), so you will need to create a new ASG with different targets.
There is an overhead of setting up the ALBs, CLBs or NLBs and adding them behind Route 53 with the alias resource records.
Each ALB and CLB node performs health checks against all registered targets independently of other nodes. Therefore, with more ALBs and CLBs, the registered targets should be prepared to handle health checks from all ALB and CLB nodes. If the cross-zone load balancing attribute is enabled, all ALB and CLB nodes health check all target instances. If the cross-zone attribute is not enabled, each ALB and CLB node health checks only the registered targets in its own AZ.
Each additional load balancer will incur additional hourly cost. Refer to the ELB pricing page for details.
Each additional ALB and CLB also requires at least eight free IP addresses per subnet, to ensure that the load balancer can scale properly. If you determine that you have a high number of ALB nodes or ENIs and you want to shard, you will need a significantly higher number of free IPs. For example, if you have an ALB with 50 nodes across 5 AZs (10 per AZ), you may need 10 more IPs per AZ to scale to the maximum size of 100 nodes.
Make sure your ALB and CLB are configured to have the appropriate level of capacity based on expected traffic. We suggest that you shift traffic slowly using the weighted records, increasing the amount sent to the new load balancer by no more than double every 5 minutes.
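The "no more than double every 5 minutes" guidance in the last consideration can be expressed as a simple schedule. This is an illustrative sketch: the function name and the 1% starting point are assumptions, not values from AWS guidance.

```python
def shift_schedule(start_percent=1.0, target_percent=50.0, step_minutes=5):
    """Return (minute, percent) steps that at most double the share each step."""
    if start_percent <= 0:
        raise ValueError("start_percent must be greater than zero")
    steps = []
    percent, minute = start_percent, 0
    while percent < target_percent:
        steps.append((minute, percent))
        percent *= 2  # never more than double the previous share
        minute += step_minutes
    steps.append((minute, target_percent))
    return steps
```

Starting at 1% and aiming for an even split between two ALBs, this produces steps of 1, 2, 4, 8, 16, and 32 percent, reaching 50% after 30 minutes. Each step would be applied by updating the weights on the two Route 53 records.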

Conclusion

In this blog post, we discussed the foundations of the ELB sharding technique and its process for scaling your load balancer. We also covered various considerations that lead to sharding your ALB, CLB, and NLB using Amazon Route 53. Overall, ELB sharding can be a powerful technique for improving the scalability of the load balancer. However, it requires careful planning and implementation to do effectively and efficiently. For more information about ELB, refer to the Elastic Load Balancing documentation.

Rohit Aswani

Rohit is a Principal Specialist Solutions Architect focused on Networking at AWS, where he helps customers build and design scalable, highly available, secure, resilient, and cost-effective networks. He holds an MS in Telecommunication Systems Management from Northeastern University, specializing in Computer Networking. In his spare time, Rohit enjoys hiking, exploring new coffee places, and traveling to new places.

Lucas Pellucci Barreto Rolim

Lucas Rolim is a Senior Solutions Architect with Amazon Web Services (AWS), working in the Application Networking team and based in Sydney, Australia. He is passionate about assisting customers in making informed decisions while building on AWS. His primary areas of expertise are Networking and Security.

Rizwan Mushtaq

Rizwan is a Principal Solutions Architect at AWS. He helps customers design innovative, resilient, and cost-effective solutions using AWS services. He holds an MS in Electrical Engineering from Wichita State University.

Networking & Content Delivery