Why is Elastic Load Balancing unequally routing my load balancer traffic?

Last updated: 2019-12-12

I've configured my load balancer to route traffic equally between instances or across Availability Zones. However, Elastic Load Balancing (ELB) routes more traffic to one instance or Availability Zone than the others. Why is this happening and how can I fix it?

Short Description

ELB might unequally route traffic to your targets if:

  • Clients are sending requests to an outdated IP address of a load balancer node because they cached a DNS record past its TTL.
  • Sticky sessions (session affinity) are enabled for the load balancer. Sticky sessions use cookies to help the client maintain a connection to the same instance over a cookie's lifetime, which can cause imbalances over time.
  • Available healthy instances aren’t evenly distributed across Availability Zones.
  • Instances of a specific capacity type aren’t equally distributed across Availability Zones.
  • There are long-lived TCP connections between clients and instances.

Resolution

Confirm traffic imbalance

Analyze ELB access logs, if available, to confirm the traffic imbalance. Use command line tools to count the requests that the load balancer routes to each target.

For Application Load Balancers:

awk '{print $5}' *.log | awk -F ":" '{print $1}' | sort | uniq -c | sort -rn

For Classic Load Balancers:

awk '{print $4}' *.log | awk -F ":" '{print $1}' | sort | uniq -c | sort -rn
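
To make the imbalance easier to read, the same counts can be expressed as a share of all requests. The following is a minimal sketch, not an official tool: target_share is a hypothetical helper name, and it assumes that the log files are already uncompressed and that you pass the column that holds target:port (5 for Application Load Balancers, 4 for Classic Load Balancers, as in the commands above).

```shell
# Hypothetical helper: request count and percentage share per target.
# $1: column that holds target:port (5 for Application Load Balancer logs,
#     4 for Classic Load Balancer logs); remaining arguments: log files.
# Assumes the log files are uncompressed.
target_share() {
  col=$1
  shift
  awk -v col="$col" '
    { split($col, parts, ":"); count[parts[1]]++; total++ }
    END {
      for (t in count)
        printf "%d %.1f%% %s\n", count[t], 100 * count[t] / total, t
    }' "$@" | sort -rn
}

# Usage (always pass at least one file, otherwise awk waits on stdin):
#   target_share 5 *.log
```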

ELB adds individual files for each ELB node to your bucket. You can compare the number of lines in your access log files over a specific time period.
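
One way to make that comparison is to total the lines of each file by node. The sketch below assumes that the log files are downloaded and uncompressed, and that the file names follow the account_elasticloadbalancing_region_name_end-time_ip-address_random-string.log pattern, so that the node IP address is the sixth underscore-separated field. count_lines_per_node is a hypothetical helper name.

```shell
# Hypothetical helper: total access-log lines per load balancer node.
# Assumes uncompressed files named
#   <account>_elasticloadbalancing_<region>_<lb-name>_<end-time>_<node-ip>_<random>.log
# so that the node IP is the 6th "_"-separated field of the file name.
count_lines_per_node() {
  for f in "$@"; do
    node_ip=$(basename "$f" | awk -F "_" '{print $6}')
    lines=$(wc -l < "$f")
    echo "$node_ip $lines"
  done |
    awk '{ total[$1] += $2 } END { for (ip in total) print total[ip], ip }' |
    sort -rn
}

# Usage (from the directory that holds the downloaded logs):
#   count_lines_per_node *.log
```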

Flush the DNS cache

Routing based on out-of-date DNS entries results in an imbalanced RequestCount pattern across different Availability Zones. For more information, see Application Load Balancer Metrics or Classic Load Balancer Metrics. Flush your client's DNS cache to be sure that it uses current DNS records for load balancer nodes.
Note: When cross-zone load balancing is enabled, the load balancer is still able to evenly balance requests at the instance level.
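
Before you flush anything, it can help to check whether a client is actually sending traffic to addresses that DNS no longer returns. The following sketch assumes that dig is installed on the client and that you already know which node IP addresses the client is using (for example, from its own connection logs). current_node_ips and stale_node_ips are hypothetical helper names.

```shell
# Hypothetical helper: IPv4 addresses that the load balancer DNS name
# currently resolves to. $1: load balancer DNS name.
current_node_ips() {
  dig +short "$1" A | grep -E '^[0-9.]+$' | sort -u
}

# Hypothetical helper: print the IP addresses that the client still uses
# but that DNS no longer returns.
# $1: load balancer DNS name; remaining arguments: IPs the client sends to.
stale_node_ips() {
  name=$1
  shift
  current=$(current_node_ips "$name")
  for ip in "$@"; do
    echo "$current" | grep -qxF "$ip" || echo "$ip"
  done
}

# Usage (the DNS name and IPs are placeholders):
#   stale_node_ips my-lb-1234567890.us-east-1.elb.amazonaws.com 10.0.0.1 10.0.0.9
```

Any address that this prints is a candidate for a stale cache entry on that client.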

For Linux clients using nscd for DNS caching, run one of the following commands:

sudo /etc/init.d/nscd restart
sudo service nscd restart
sudo service nscd reload

For Linux clients using dnsmasq for DNS caching, run one of the following commands:

sudo /etc/init.d/dnsmasq restart
sudo service dnsmasq restart

For Linux clients using BIND for DNS caching, run one of the following commands:

sudo /etc/init.d/named restart
sudo rndc flush

For Windows clients, run the following command:

ipconfig /flushdns

Note: If you cleared your client’s DNS cache but still experience caching issues, be sure that your client application isn’t caching DNS records.

Check the configuration of sticky sessions

If you use duration-based session stickiness, configure an appropriate cookie expiration time for your specific use case.

If you set session stickiness from individual applications, use session cookies instead of persistent cookies where possible. For more information, see Application-Controlled Session Stickiness (Classic Load Balancers).
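
For an Application Load Balancer, the cookie expiration time is a target group attribute. The following is a minimal sketch using the AWS CLI, assuming that the CLI is installed and configured. The function name set_alb_stickiness_duration is hypothetical, and the target group ARN and the duration are values that you supply.

```shell
# Sketch: set duration-based stickiness on an Application Load Balancer
# target group.
# $1: target group ARN, $2: cookie lifetime in seconds (both supplied by you).
set_alb_stickiness_duration() {
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$1" \
    --attributes \
      Key=stickiness.enabled,Value=true \
      Key=stickiness.type,Value=lb_cookie \
      Key=stickiness.lb_cookie.duration_seconds,Value="$2"
}

# Usage (placeholder ARN, one-hour cookie):
#   set_alb_stickiness_duration "$TARGET_GROUP_ARN" 3600
```

A shorter duration lets clients rebalance sooner, at the cost of losing affinity more often.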

Check healthy instance distribution across Availability Zones

If there's an unequal number of available healthy instances in your Availability Zones and cross-zone load balancing is disabled, ELB must balance requests across fewer instances in the affected Availability Zones. The remaining healthy instances process a higher number of requests to compensate, which can negatively impact performance.
Note: A traffic load imbalance across instances or Availability Zones doesn’t necessarily mean that resource utilization is also imbalanced. For example, an imbalance can happen when one or more instances that are behind a load balancer process requests faster than the others.

Maintain an equal number of instances in each enabled Availability Zone, and register more instances as load balancer targets where a zone is short.

For Classic Load Balancers and Network Load Balancers, consider enabling cross-zone load balancing to distribute requests at the instance level instead of the Availability Zone level. For more information, see Cross-Zone Load Balancing (Network Load Balancers) or Configure Cross-Zone Load Balancing for Your Classic Load Balancer. Cross-zone load balancing is always enabled for Application Load Balancers.
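
To see how healthy instances are spread across Availability Zones, you can combine two AWS CLI calls. The sketch below targets a Classic Load Balancer and assumes that the AWS CLI is installed and configured. healthy_instances_per_az and enable_clb_cross_zone are hypothetical helper names, and the load balancer name is a value that you supply.

```shell
# Hypothetical helper: count InService instances per Availability Zone
# for a Classic Load Balancer. $1: load balancer name.
healthy_instances_per_az() {
  ids=$(aws elb describe-instance-health \
          --load-balancer-name "$1" \
          --query "InstanceStates[?State=='InService'].InstanceId" \
          --output text)
  [ -n "$ids" ] || return 0
  # $ids is intentionally unquoted so each instance ID becomes its own argument.
  aws ec2 describe-instances --instance-ids $ids \
    --query 'Reservations[].Instances[].Placement.AvailabilityZone' \
    --output text |
    tr '\t' '\n' | sort | uniq -c
}

# Hypothetical helper: turn on cross-zone load balancing for a
# Classic Load Balancer. $1: load balancer name.
enable_clb_cross_zone() {
  aws elb modify-load-balancer-attributes \
    --load-balancer-name "$1" \
    --load-balancer-attributes '{"CrossZoneLoadBalancing":{"Enabled":true}}'
}

# Usage (placeholder name):
#   healthy_instances_per_az my-load-balancer
```

If the counts differ between zones and cross-zone load balancing is disabled, the zone with fewer healthy instances is the likely hot spot.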

Check instance type distribution

A Classic Load Balancer with HTTP or HTTPS listeners might route more traffic to higher-capacity instance types. This distribution aims to prevent lower-capacity instance types from having too many outstanding requests, so an imbalance in favor of higher-capacity instance types is desirable in this scenario. For more information, see Instance Types. It’s a best practice to use similar instance types and configurations to reduce the likelihood of capacity gaps and traffic imbalances.

A traffic imbalance might also occur if you have instances of similar capacities running on different Amazon Machine Images (AMIs).
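
A quick way to spot mixed instance types or AMIs is to group the registered instances by both values. This is a sketch only, assuming that the AWS CLI is installed and configured and that you pass the IDs of the instances registered with the load balancer. instance_type_summary is a hypothetical helper name.

```shell
# Hypothetical helper: count instances per (instance type, AMI) pair.
# Arguments: one or more EC2 instance IDs.
instance_type_summary() {
  aws ec2 describe-instances --instance-ids "$@" \
    --query 'Reservations[].Instances[].[InstanceType,ImageId]' \
    --output text |
    sort | uniq -c
}

# Usage (placeholder IDs):
#   instance_type_summary i-0123456789abcdef0 i-0fedcba9876543210
```

More than one line of output means that the fleet is not uniform.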

Check for long-lived TCP connections

Elastic Load Balancing routes TCP traffic using a round-robin algorithm. Long-lived TCP connections between clients and instances cause uneven traffic load distribution by design, so new instances take longer to reach connection equilibrium. Check your metrics for long-lived TCP connections that might be causing issues.

With TCP listeners, the load balancer distributes traffic only at the connection level. For example, clients that frequently reuse connections to send and receive multiple HTTP requests might produce unbalanced traffic at the instance level. Consider moving to a Layer 7 load balancer if your application supports higher-layer network protocols such as HTTP, HTTPS, WebSocket, or HTTP/2.
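
On an individual instance, you can get a rough view of connection skew by counting established TCP connections per peer. The sketch below assumes a Linux instance with the ss utility (iproute2) and IPv4 peers. established_per_peer is a hypothetical helper name. It counts connections only and doesn't measure their age, so treat a persistently skewed count as a hint, not proof, of long-lived connections.

```shell
# Hypothetical helper: count established TCP connections per peer IP,
# as seen on a backend instance. Assumes Linux with ss (iproute2) and
# IPv4 peers. Run it on the instance itself.
established_per_peer() {
  ss -tn |
    awk '$1 == "ESTAB" { sub(/:[0-9]+$/, "", $5); count[$5]++ }
         END { for (ip in count) print count[ip], ip }' |
    sort -rn
}

# Usage:
#   established_per_peer
```

Running it a few times over several minutes and comparing the output across instances shows whether newer instances are catching up.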

Check your load balancer’s RequestCount patterns and other relevant metrics.
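
RequestCount can be compared per Availability Zone from the command line. The following is a sketch for an Application Load Balancer, assuming that the AWS CLI is installed and configured. request_count_for_az is a hypothetical helper name, and the LoadBalancer dimension value, the Availability Zone, and the time range are values that you supply. For a Classic Load Balancer, the namespace and dimension names differ.

```shell
# Hypothetical helper: total RequestCount for one Availability Zone of an
# Application Load Balancer over a time range, in 5-minute periods.
# $1: LoadBalancer dimension value (the app/<name>/<id> part of the ARN)
# $2: Availability Zone, $3: start time, $4: end time (ISO 8601)
request_count_for_az() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name RequestCount \
    --dimensions Name=LoadBalancer,Value="$1" Name=AvailabilityZone,Value="$2" \
    --start-time "$3" --end-time "$4" \
    --period 300 --statistics Sum \
    --query 'sum(Datapoints[].Sum)' --output text
}

# Usage (placeholder values):
#   request_count_for_az "$LB_DIMENSION" us-east-1a 2019-12-12T00:00:00Z 2019-12-12T01:00:00Z
```

Running it once per enabled Availability Zone for the same time range gives totals that you can compare directly.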

