Why is Elastic Load Balancing unequally routing my load balancer traffic?

Last updated: 2022-01-25

I've configured my load balancer to route traffic equally between instances or across Availability Zones. However, Elastic Load Balancing (ELB) routes more traffic to one instance or Availability Zone than the others. Why is this happening and how can I fix it?

Short description

ELB might unequally route traffic to your targets if:

  • Clients are caching a DNS record past its TTL and routing requests to an outdated IP address of a load balancer node.
  • Sticky sessions (session affinity) are enabled for the load balancer. Sticky sessions use cookies to help the client maintain a connection to the same instance over a cookie's lifetime, which can cause imbalances over time.
  • Available healthy instances aren’t evenly distributed across Availability Zones.
  • Instances of a specific capacity type aren’t equally distributed across Availability Zones.
  • There are long-lived TCP connections between clients and instances.
  • The connection uses a WebSocket.

Resolution

Confirm traffic imbalance

Analyze ELB access logs, if available, to confirm the traffic imbalance. Use command line tools to count the requests that the load balancer routes to each target.

For Application Load Balancers:

awk '{print $5}' *.log | awk -F ":" '{print $1}' | sort | uniq -c | sort -rn

For Classic Load Balancers:

awk '{print $4}' *.log | awk -F ":" '{print $1}' | sort | uniq -c | sort -rn

ELB adds individual files for each ELB node to your bucket. You can compare the number of lines in your access log files over a specific time period.
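To make the imbalance easier to read, the per-target counts can also be expressed as a share of all requests. The following is a minimal sketch, not an official tool: the function name target_share is hypothetical, and it assumes the same field positions as the commands above (field 5 for Application Load Balancers, field 4 for Classic Load Balancers).

```shell
#!/usr/bin/env bash
# Print "target count share%" for each target found in access log files.
# First argument: the field that holds target:port (5 for ALB, 4 for CLB).
# Remaining arguments: the log files to read.
target_share() {
  local field="$1"; shift
  awk -v f="$field" '
    {
      split($f, parts, ":")   # drop the port, keep the address
      count[parts[1]]++
      total++
    }
    END {
      for (t in count)
        printf "%s %d %.1f%%\n", t, count[t], 100 * count[t] / total
    }
  ' "$@" | sort -k2,2nr       # busiest target first
}

# Example: target_share 5 *.log
```

If your log files are compressed (for example, they end in .gz), decompress them before you run the function.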

Flush the DNS cache

Routing based on out-of-date DNS entries results in an imbalanced RequestCount pattern across different Availability Zones. For more information, see Application Load Balancer metrics or Classic Load Balancer metrics. Flush your client's DNS cache to be sure that it uses current DNS records for load balancer nodes.

Note: When cross-zone load balancing is enabled, the load balancer can still balance requests evenly at the instance level.

For Linux clients using nscd for DNS caching, run one of the following commands:

sudo /etc/init.d/nscd restart
sudo service nscd restart
sudo service nscd reload

For Linux clients using dnsmasq for DNS caching, run one of the following commands:

sudo /etc/init.d/dnsmasq restart
sudo service dnsmasq restart

For Linux clients using BIND for DNS caching, run one of the following commands:

sudo /etc/init.d/named restart
sudo rndc flush

For Windows clients, run the following command:

ipconfig /flushdns

Note: If you cleared your client’s DNS cache but still experience caching issues, then be sure that your client application isn’t caching DNS records.
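To check which load balancer node addresses a client currently resolves, and the remaining TTL on each record, you can query DNS from that client. The following is a sketch that assumes dig is installed. The function names and the DNS name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Read `dig +noall +answer` output on stdin and print "TTL ADDRESS"
# for every A record, sorted by address.
parse_a_records() {
  awk '$4 == "A" { print $2, $5 }' | sort -k2,2
}

# Resolve a load balancer DNS name through the client's default resolver.
lb_records() {
  dig +noall +answer "$1" A | parse_a_records
}

# Example (hypothetical DNS name):
#   lb_records my-load-balancer-1234567890.us-east-1.elb.amazonaws.com
```

If different clients consistently report different address sets, or an address persists after it no longer appears on other clients, then a stale cache is a likely cause.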

Check the configuration of sticky sessions

If you use duration-based session stickiness, then configure an appropriate cookie expiration time for your specific use case. For more information, see the sticky sessions documentation for your load balancer type.

If you set session stickiness from individual applications, then use session cookies instead of persistent cookies where possible. For more information, see Application-controlled session stickiness (Classic Load Balancers).
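For an Application Load Balancer, you can inspect and adjust duration-based stickiness on the target group with the AWS CLI. The following is a sketch that assumes the AWS CLI is installed and configured. The function names are hypothetical, and the target group ARN is a value that you supply.

```shell
#!/usr/bin/env bash
# Show the stickiness attributes of a target group.
# Argument: target group ARN.
show_stickiness() {
  aws elbv2 describe-target-group-attributes \
    --target-group-arn "$1" \
    --query "Attributes[?starts_with(Key, 'stickiness.')]" \
    --output table
}

# Set the lifetime of the load balancer generated cookie, in seconds.
# Arguments: target group ARN, duration in seconds.
set_stickiness_duration() {
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$1" \
    --attributes "Key=stickiness.lb_cookie.duration_seconds,Value=$2"
}

# Example: set_stickiness_duration "$TARGET_GROUP_ARN" 3600
```

A shorter duration lets clients rebalance sooner, at the cost of losing affinity more often.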

Check healthy instance distribution across Availability Zones

If there's an unequal number of available healthy instances in your Availability Zones and cross-zone load balancing is disabled, then ELB must balance requests across fewer instances in the affected Availability Zones. The remaining healthy instances process a higher number of requests to compensate, which can negatively impact performance.
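The effect is easy to see with a little arithmetic. The following sketch assumes that each enabled Availability Zone receives an equal share of client traffic and that cross-zone load balancing is disabled. The function name is hypothetical, and each argument must be a healthy instance count greater than zero.

```shell
#!/usr/bin/env bash
# Per-instance share of total traffic when every zone receives an equal
# share and requests stay within the zone.
# Arguments: the healthy instance count of each enabled zone.
per_instance_share() {
  local zones=$#
  local n
  for n in "$@"; do
    awk -v z="$zones" -v n="$n" \
      'BEGIN { printf "%d instances: %.2f%% each\n", n, 100 / z / n }'
  done
}

per_instance_share 2 8
# 2 instances: 25.00% each
# 8 instances: 6.25% each
```

In this example, each instance in the smaller zone handles four times the traffic of an instance in the larger zone.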

Note: A traffic load imbalance across instances or Availability Zones doesn’t necessarily mean that resource utilization is also imbalanced. For example, an imbalance can happen when one or more instances that are behind a load balancer process requests faster than the others.

Maintain an equal number of instances in each enabled Availability Zone. To add more instances as load balancer targets, see the instance or target registration documentation for your load balancer type.

For Classic Load Balancers and Network Load Balancers, consider enabling cross-zone load balancing to distribute requests at the instance level instead of the Availability Zone level. For more information, see Cross-zone load balancing (Network Load Balancers) or Configure cross-zone Load Balancing for your Classic Load Balancer. Cross-zone load balancing is always enabled for Application Load Balancers.
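As a sketch, cross-zone load balancing can be enabled from the AWS CLI as follows. This assumes that the AWS CLI is installed and configured. The function names are hypothetical, and the load balancer name or ARN is a value that you supply.

```shell
#!/usr/bin/env bash
# Enable cross-zone load balancing on a Classic Load Balancer.
# Argument: load balancer name.
enable_cross_zone_classic() {
  aws elb modify-load-balancer-attributes \
    --load-balancer-name "$1" \
    --load-balancer-attributes '{"CrossZoneLoadBalancing":{"Enabled":true}}'
}

# Enable cross-zone load balancing on a Network Load Balancer.
# Argument: load balancer ARN.
enable_cross_zone_nlb() {
  aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn "$1" \
    --attributes Key=load_balancing.cross_zone.enabled,Value=true
}

# Example: enable_cross_zone_classic my-load-balancer
```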

Check instance type distribution

A Classic Load Balancer with HTTP or HTTPS listeners might route more traffic to higher-capacity instance types. This distribution aims to prevent lower-capacity instance types from having too many outstanding requests. For more information, see Instance types. It’s a best practice to use similar instance types and configurations to reduce the likelihood of capacity gaps and traffic imbalances.

A traffic imbalance might also occur if you have instances of similar capacities running on different Amazon Machine Images (AMIs). When instance capacities do differ, an imbalance in favor of the higher-capacity instance types is desirable.

Check for long-lived TCP connections

Classic Load Balancers route TCP traffic using a round-robin algorithm, and Network Load Balancers select a target for each connection using a flow hash algorithm. In both cases, the load balancer distributes TCP traffic only at the connection level, so long-lived TCP connections between clients and instances cause uneven traffic distribution by design. For example, clients that frequently reuse connections to send and receive multiple HTTP requests might produce unbalanced traffic at the instance level. New instances also take longer to reach connection equilibrium.

Check your metrics for long-lived TCP connections that might be causing issues. Consider moving to a Layer 7 load balancer if your application supports higher-layer network protocols such as HTTP, HTTPS, WebSocket, or HTTP/2.

Check your load balancer’s RequestCount patterns and other relevant metrics. For more information, see the CloudWatch metrics documentation for your load balancer type.
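On a Linux instance, you can also compare the number of established connections directly. The following is a sketch that assumes a version of ss that supports the -H (no header) option. With a single state filter, ss omits the State column, so the peer address is the fourth column. The function name and the port in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Read `ss -Htn state established` output on stdin and print
# "COUNT PEER" for each remote address, highest count first.
count_peers() {
  awk '{
         peer = $4
         sub(/:[0-9]+$/, "", peer)   # strip the remote port
         n[peer]++
       }
       END { for (p in n) print n[p], p }' | sort -k1,1nr -k2,2
}

# Example, for a service that listens on port 80:
#   ss -Htn state established '( sport = :80 )' | count_peers
```

Run the same command on each instance behind the load balancer. A large difference in totals between instances suggests that long-lived connections are holding traffic on the older instances.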

Check for WebSocket connections

A WebSocket connection is a 1:1 connection between the client and the target. The connection stays bound to that target for the duration of the WebSocket session, which causes unequal traffic distribution. Application Load Balancers provide native support for WebSockets. When you add new targets, only new HTTP requests that are upgraded to WebSockets go to the new targets. WebSockets also work with Classic Load Balancers and Network Load Balancers that use Layer 4 (TCP) listeners.

For more information, see Listener configuration.