Elastic Load Balancer: Maximizing Benefits and Keeping Costs Low

This post provides guidance on optimizing Elastic Load Balancer (ELB) cost and performance for your workloads. You can find recommendations for achieving optimal throughput and low latency, implementing efficient connection management, and ensuring performance and reliability during periods of high demand.

Organizations building technology solutions on AWS should be well acquainted with the six pillars of the Well‑Architected Framework: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. Two of these pillars, namely performance efficiency and cost optimization, are common topics in our conversations with service owners and technologists. In 1-on-1 conversations, our customers are asking us for tools and strategies to help them do more with less. Although AWS offers Cost Management and AWS Compute Optimizer for general guidance, we are writing this post to take some of the specific advice we have given our customers about optimizing ELB infrastructure costs and share it more broadly.

ELB features and pricing

ELB is a portfolio of foundational AWS networking services used by customers to deliver application traffic securely, with high availability and scalability. It is a common building block for customers scaling their services hosted in AWS or on-premises locations. ELB helps organizations achieve application and infrastructure resiliency while simultaneously lowering implementation and operational costs. With ELB, organizations can dynamically adjust their infrastructure (and the costs they incur) based on demand, allowing them to adjust targets according to the incoming load.

Application Load Balancer (ALB) and Network Load Balancer (NLB) are the most commonly used ELB services and are the focal points of this article. It is important to familiarize yourself with the features and pricing of ALB and NLB and how they may apply to your specific use cases. In addition, changing conditions, traffic patterns, or new features may be grounds for revisiting your ELB setup and how it can offer the best experience and performance to your clients.

Maximizing performance and optimizing cost

The following recommendations focus on real-world scenarios that have been successfully used by AWS customers to maximize performance and optimize the cost of ELB for their workloads. To glean the most benefit from these best practices, you need a good understanding of your workloads: the protocols you use for communication, your APIs and their specific traffic patterns, and your Load Balancer Capacity Unit (LCU) and Network Load Balancer Capacity Unit (NLCU) characteristics.

1. Optimize your client connections

ELB brokers connections between clients and endpoints, such as an application instance. Clients may access a given ELB endpoint frequently, and optimizing the clients’ connection mechanism is crucial for optimal performance. Moreover, it can have an important impact on resiliency and service health.

Clients should make efficient use of connections by using connection pooling, which encourages the re-use of connections when clients make multiple calls to a service. This can also have a beneficial impact on client resources and latency. Connection pools should be populated with connections as they are needed, to prevent uncontrolled connection exhaustion on the service fleets. Connection retry strategies such as exponential backoff with jitter are common ways to shield your service against being overwhelmed by sudden demand.
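
As a minimal sketch of the retry strategy described above, the following Python uses exponential backoff with full jitter. The function names, the base delay, and the cap are illustrative assumptions, not values recommended by AWS; tune them for your own service.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random):
    """Return a randomized delay in seconds for a zero-based retry attempt.

    The upper bound doubles with each attempt (exponential backoff) up to
    `cap`, and the actual delay is drawn uniformly from [0, bound] (jitter)
    so that many clients do not retry in lockstep.
    """
    bound = min(cap, base * (2 ** attempt))
    return rng.uniform(0, bound)


def call_with_retries(operation, max_attempts=5, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt))
```

Capping the number of attempts matters as much as the delay itself: unbounded retries from many clients can keep a recovering service overloaded.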

Familiarize yourself with the configurable connection idle timeout feature of ALB and NLB, and make sure the connection lifecycle is optimal by tuning clients, servers, and, if applicable, the ELB settings themselves. The ALB and NLB documentation each describe the connection idle timeout in detail.
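
One way to reason about the connection lifecycle is to have clients retire pooled connections slightly before the load balancer's idle timeout would close them. The sketch below uses a hypothetical helper, and the 60-second timeout and 5-second margin are example values only; check the timeout actually configured on your load balancer.

```python
def should_retire(idle_seconds, lb_idle_timeout=60.0, safety_margin=5.0):
    """Return True if a pooled connection has been idle long enough that the
    load balancer may be about to close it.

    Retiring the connection on the client side first avoids sending a request
    on a connection the load balancer has already dropped.
    """
    if safety_margin >= lb_idle_timeout:
        raise ValueError("safety_margin must be smaller than lb_idle_timeout")
    return idle_seconds >= lb_idle_timeout - safety_margin
```

A connection pool could call this check before handing out a connection and open a fresh one whenever it returns True.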

A popular feature of ELB is TLS protocol support. AWS announced the availability of TLS 1.3 for ELB in March 2023, which not only offers better security but also lowers latency for clients by completing the TLS handshake in a single round trip. Enabling this version may be an easy win for your latency-sensitive clients. For clients that support TLS 1.3, a full handshake takes one round trip instead of the two required by TLS 1.2, which roughly halves the handshake latency.
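
To put the round-trip saving in context, here is a rough back-of-the-envelope sketch. It assumes a new TCP connection (one round trip), a full TLS handshake without resumption (two round trips for TLS 1.2, one for TLS 1.3), and one round trip for the request itself, and it ignores server processing time. The function name and the 80 ms round-trip time are illustrative assumptions.

```python
TLS_HANDSHAKE_ROUND_TRIPS = {"1.2": 2, "1.3": 1}  # full handshakes, no resumption


def time_to_first_response_ms(rtt_ms, tls_version):
    """Estimate the time from connect() to the first response for a new connection."""
    tcp_handshake = 1
    request_response = 1
    round_trips = tcp_handshake + TLS_HANDSHAKE_ROUND_TRIPS[tls_version] + request_response
    return round_trips * rtt_ms


rtt = 80
print(time_to_first_response_ms(rtt, "1.2"))  # 4 round trips -> 320
print(time_to_first_response_ms(rtt, "1.3"))  # 3 round trips -> 240
```

Under these assumptions the saving is one round trip per new connection, so the benefit is largest for clients that connect frequently or sit far from the load balancer.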

ALB and NLB also support TLS session re-use. For the correct use case, you can configure clients that support this feature to use a previously established TLS session. This enables them to re-connect to the same TLS endpoint much more rapidly and without as many bytes transferred. This is especially important for mobile clients that may have more frequent connections and disconnections. Furthermore, this can result in substantial latency and cost savings.

2. Optimize inbound and outbound traffic

We recommend that you critically examine the type of traffic that is traversing through ELB.

A standard recommendation for accelerating data transfer is to compress the payload. This can yield significant savings in total bytes transferred and lower the total time required to transfer the data. For many use cases, combining requests so that a larger amount of data is transferred instead of making multiple calls will lower the overhead of the data transport layer as well as lower latency by only requiring a single call.
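
The following sketch illustrates the effect of batching and compressing a payload using only the Python standard library. The record shape and the batch size of 200 are hypothetical; actual savings depend on how repetitive your data is.

```python
import gzip
import json

# Hypothetical payload: 200 similar records, typical of a list-style API response.
records = [{"cid": 12345 + i, "ctype": "elb", "country": "CA"} for i in range(200)]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

# The receiver reverses the step and gets the identical bytes back.
assert gzip.decompress(compressed) == raw

print(len(raw), len(compressed))
```

Compression pays off most on larger, repetitive payloads. Very small bodies can even grow slightly because of the fixed gzip header, which is one more reason to combine small requests.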

The data transport layer protocol used is also a large factor in achieving high efficiency for data transfers. Serializing the data you are transferring with an efficient protocol is critical, especially for high-volume payloads. The AWS Service Framework team has developed Smithy in part to address efficiency and scalability challenges of traditional wire protocols. Although this may be a longer transition for some of your clients, targeting the clients and APIs that are causing the highest usage may be a good strategy for savings.

Here is an example of potential traffic cost savings by using Protocol Buffers (a protocol you could use with Smithy), as opposed to a traditional REST API data transport layer:

A REST JSON object such as this one would be about 50 bytes in size:
{ "cid": 12345, "ctype": "elb", "country": "CA" }

The same data encoded with Protocol Buffers results in a byte array that looks like the following, and it would be only 12 bytes in size:
08 b9 60 12 03 65 6c 62 1a 02 43 41

In this example, the payload alone shrinks by about 76%. If data transfer is your dominant LCU/NLCU dimension, then changing the wire protocol would have a direct and substantial impact on the cost of the load balancer.
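
The Protocol Buffers bytes shown above can be reproduced by hand, which makes it easier to see where the savings come from. The sketch below is a minimal illustration of the wire format, not a replacement for a real Protocol Buffers library; the helper names are hypothetical, and the field numbers 1, 2, and 3 are inferred from the tag bytes in the dump.

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def varint_field(field_number, value):
    """Encode an integer field (wire type 0)."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def string_field(field_number, text):
    """Encode a string field (wire type 2: length-delimited)."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Field names are not sent on the wire, only small numeric tags.
encoded = varint_field(1, 12345) + string_field(2, "elb") + string_field(3, "CA")
print(encoded.hex(" "))  # 08 b9 60 12 03 65 6c 62 1a 02 43 41

json_text = '{ "cid": 12345, "ctype": "elb", "country": "CA" }'
reduction = 1 - len(encoded) / len(json_text.encode("utf-8"))
print(f"{reduction:.0%}")
```

Most of the saving comes from replacing the quoted field names and punctuation with one-byte tags, and from storing the integer in two bytes instead of five ASCII digits.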

In addition, the preceding example does not consider other advantages that are realized when using a Remote Procedure Call (RPC) mechanism. These include dramatically reducing the accompanying payload metadata, such as HTTP headers, and reduced overhead for data serialization and deserialization. The use of a binary protocol may not be appropriate for all use cases, so providing a dual RPC and REST interface is an option as well.

Service owners should establish clear guidelines for their clients and provide examples for accessing the service, including how clients can cache data locally to support sub-millisecond latency use cases.

If your API being served through ELB is being used to obtain data that could be accessed directly from the authoritative data provider, then you are spending more money than needed. For example, if you are returning data from an Amazon Simple Storage Service (Amazon S3) object through your API, then consider options to return the object reference to the caller so they may request it from the Amazon S3 service directly. This avoids the potential costs associated with Amazon S3 transfer charges as well, but it needs to be weighed against potentially introducing additional complexity.

Block/Filter/Cache requests at the edge to avoid charges associated with illegitimate or incorrectly formatted requests. For Internet accessible destinations, AWS WAF helps you protect against common web exploits and bots that can affect availability, compromise security, or consume excessive resources. AWS WAF is tightly integrated with Amazon CloudFront, which can be used to deliver traffic faster to your customers around the world by caching content, terminating TLS connections at the edge, and delivering it through Edge Locations.

3. Cross-Zone Load Balancing: Need to know

A best practice for reliability is to deploy your services and distribute traffic to multiple AWS Availability Zones (AZs).
NLB by default does not enable Cross-Zone Load-Balancing (CZLB), because the nature of the traffic may often not benefit from enabling CZLB.
With NLB cross-zone enabled, if your connections are long-lived and you observe an issue in one of your AZ deployments, then shutting down the service in that AZ interrupts your clients and shunts the traffic to the other zone(s). However, when the AZ comes back online, load balancing isn’t equally distributed for the established clients and it may take a prolonged period before this is the case. Having a method to gradually and safely drain connections is important for critical user applications.
If you enable CZLB for NLB, then be aware of some of the considerations around client IP preservation.
If you choose to enable CZLB for your NLB, you will incur data transfer charges between zones. This is NOT the case for ALB. ALB enables CZLB by default, but it can be disabled for target groups.
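
To gauge the inter-zone data transfer that enabling CZLB on an NLB could generate, here is a rough sketch. It assumes that traffic and healthy targets are spread evenly across zones, so a connection arriving in one zone is routed to a different zone with probability (N-1)/N. The function name and the per-GB rate are hypothetical placeholders; use the rate that applies to your Region.

```python
def cross_zone_estimate(total_gb, az_count, rate_per_gb):
    """Estimate the GB routed across zones and the resulting charge.

    Assumes traffic and healthy targets are evenly distributed over
    `az_count` zones, so (az_count - 1) / az_count of the traffic
    leaves the zone it arrived in.
    """
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    cross_zone_gb = total_gb * (az_count - 1) / az_count
    return cross_zone_gb, cross_zone_gb * rate_per_gb


# Example with placeholder numbers: 9,000 GB per month over 3 zones.
gb, cost = cross_zone_estimate(9000, 3, rate_per_gb=0.01)
print(gb, cost)
```

Real traffic is rarely perfectly even, so treat the result as an order-of-magnitude estimate to weigh against the resiliency benefits.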

4. Good housekeeping

Consolidate ELBs where it makes sense. Some AWS services that use ELB as part of their feature set have built-in techniques to minimize the number of ELBs used. For example, Amazon Elastic Kubernetes Service (Amazon EKS) can share a single ALB with multiple services using IngressGroups. However, be aware that each service adds a listener rule and you should familiarize yourself with how they are configured.

It is a good practice to set up metrics that track your underlying service behavior so that you can review and automatically be alerted when requests start returning invalid responses. A malfunctioning client can quickly become your #1 traffic source. Use Amazon CloudWatch anomaly detection to discover and alert on issues impacting your service.

You should find any unused ALBs and NLBs and remove them from your account to help lower your monthly AWS bill. For example, an ALB whose target groups have no healthy resources attached, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, is a strong indication that it is no longer used. Hidden infrastructure automation bugs can easily leave old infrastructure behind. We have seen real customer cases where millions were spent on unused resources: don’t be one of them. An ELBv2 load balancer is considered “unused” when the associated target group has no Amazon EC2 target instance registered or when the registered target instances no longer report as healthy.
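
The “unused” definition above can be sketched as a small check. The snippet is a minimal illustration that works on data you have already collected, for example from the ELBv2 DescribeTargetGroups and DescribeTargetHealth API calls. The function name, the input shape, and the sample inventory are assumptions made for this example.

```python
def is_unused(target_states_by_group):
    """Apply the definition above to one load balancer.

    `target_states_by_group` maps each target group name to the list of
    health states of its registered targets (for example "healthy",
    "unhealthy", "draining"). The load balancer counts as unused when no
    target group has at least one healthy target.
    """
    return not any(
        state == "healthy"
        for states in target_states_by_group.values()
        for state in states
    )


# Hypothetical inventory, keyed by load balancer name.
inventory = {
    "orders-alb": {"orders-tg": ["healthy", "healthy"]},
    "legacy-alb": {"legacy-tg": []},  # nothing registered
    "batch-nlb": {"batch-tg": ["unhealthy", "unhealthy"]},
}
candidates = sorted(name for name, groups in inventory.items() if is_unused(groups))
print(candidates)  # ['batch-nlb', 'legacy-alb']
```

Treat the result as a list of candidates to review rather than a list to delete automatically, because a load balancer with temporarily unhealthy targets may still be in use.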

Conclusion

We recommend internalizing this guidance and re-visiting it as your workloads change and scale. It is important to understand the type of traffic that flows through your ELBs and the client types that access these endpoints so you can optimize the traffic flow. In many cases, you will find that optimizing for performance and efficiency from an ELB point of view not only decreases your cost for ELB, but also has downstream benefits for other AWS resources and your clients. We hope the guidance in this blog was helpful!

It is now up to you to look at your ELB setup, your clients, and then identify and prioritize the changes that allow you to optimize your cost, performance, and latency. Your customers and your pocketbook will thank you for it. We encourage you to explore further cost optimization topics with AWS.

About the authors

Abel Cruz serves as a Principal Customer Solutions Manager at AWS, providing strategic technology and business guidance to AWS’ largest and most strategic customers. In addition to his undergraduate EE degree from Walla Walla University, and his MBA from Seattle Pacific University (SPU), he is also a Certified Information Systems Security Professional (CISSP), and a certified Professional Program Manager (PMP).

Avinash Kolluri is a Senior Solutions Architect at AWS. He works across Amazon Alexa and Devices to architect and design modern distributed solutions. His passion is to build cost-effective and highly scalable solutions on AWS. In his spare time, he enjoys cooking fusion recipes and traveling.

Philippe Lantin is a Principal Solutions Architect for Strategic Accounts at AWS. He helps Amazon develop innovative business and architectural solutions using cloud native services. Philippe is passionate about delivering experiences and technology that were previously science fiction to reality, and innovating for customers to create delightful results.
