How do I troubleshoot high latency on my ELB Classic Load Balancer?

Last updated: 2022-05-17

I am experiencing high latency when connecting to web applications running on Amazon Elastic Compute Cloud (Amazon EC2) instances registered to a Classic Load Balancer. How do I troubleshoot the Elastic Load Balancing latency?

Short description

High latency on a Classic Load Balancer can occur for the following reasons:

  • Network connectivity issues
  • Improper configuration of the Classic Load Balancer
  • High memory (RAM) utilization on backend instances
  • High CPU utilization on backend instances
  • Improper web server configuration on backend instances
  • Problems with web application dependencies running on backend instances, such as external databases or Amazon Simple Storage Service (Amazon S3) buckets

Resolution

1.    Troubleshoot network connectivity issues for your Classic Load Balancer.

2.    Configure the Classic Load Balancer for your use case.

3.    Determine which backend instances are experiencing high latency by checking the access logs for your Classic Load Balancer. Review backend_processing_time to find backend instances with latency issues.
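One way to find the slowest backends is to pull the backend address and backend_processing_time fields out of the access log. The sketch below assumes the standard Classic Load Balancer access log layout, where the backend address is the 4th space-separated field and backend_processing_time is the 6th; the sample file and its entries are fabricated for illustration.

```shell
# Create a tiny sample access log for illustration ("elb-access.log" and the
# entries in it are placeholders, not real log data).
cat > elb-access.log <<'EOF'
2022-05-17T00:00:00Z my-elb 192.0.2.10:5123 10.0.0.1:80 0.001 0.512 0.001 200 200 0 57 "GET http://example.com/ HTTP/1.1" "curl/7.79" - -
2022-05-17T00:00:01Z my-elb 192.0.2.11:5124 10.0.0.2:80 0.001 0.034 0.001 200 200 0 57 "GET http://example.com/ HTTP/1.1" "curl/7.79" - -
EOF

# Print each backend address with its backend_processing_time, slowest first.
awk '{print $4, $6}' elb-access.log | sort -k2 -rn | head -10
```

Running this against your real access logs surfaces the backend instances that contribute most to the Latency metric.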

To verify that a backend instance's web application server is experiencing high latency, use curl to measure response times. For example:

[ec2-user@ip-192.0.2.0 ~]$ for X in `seq 6`; do curl -Ik -w "HTTPCode=%{http_code} TotalTime=%{time_total}\n" http://www.example.com/ -so /dev/null; done

High latency sample output:
HTTPCode=200 TotalTime=2.452
HTTPCode=200 TotalTime=1.035

Low latency sample output:
HTTPCode=200 TotalTime=0.515
HTTPCode=200 TotalTime=0.013

4.    Check the Average statistic of the CloudWatch Latency metric for your Classic Load Balancer. If the value is high, the problem is likely with your backend instances or with application dependency servers.

5.    Check the Maximum statistic of the Latency metric. If the value meets or exceeds the configured idle timeout value, requests are timing out, which results in HTTP 504 errors.

6.    Check for patterns in the Latency metric. Metric spikes at regular intervals can indicate performance problems on backend instances caused by overhead from scheduled tasks, such as cron jobs.
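The Latency statistics in steps 4-6 can be pulled with the AWS CLI. This is a sketch: "my-classic-elb" is a placeholder load balancer name, the `date` syntax is GNU-specific, and the call requires configured AWS credentials.

```shell
# Timestamps covering the last hour (GNU date syntax).
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Query the Average and Maximum Latency statistics at one-minute resolution.
# "my-classic-elb" is a placeholder; requires configured AWS credentials.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ELB \
  --metric-name Latency \
  --dimensions Name=LoadBalancerName,Value=my-classic-elb \
  --start-time "$START" --end-time "$END" \
  --period 60 \
  --statistics Average Maximum
```

Comparing the Average and Maximum datapoints side by side helps distinguish a generally slow fleet from occasional slow requests on a single instance.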

7.    Check the CloudWatch SurgeQueueLength metric for your Classic Load Balancer. When the surge queue is full (the maximum queue length is 1,024 requests), new requests are rejected and the load balancer returns an HTTP 503 error. The Sum statistic of the SpilloverCount metric measures the total number of rejected requests. For more information, see How do I troubleshoot Classic Load Balancer capacity issues in Elastic Load Balancing?

8.    Check for memory issues by monitoring the Apache processes on your backend instances.

Example command:

watch -n 1 "echo -n 'Apache Processes: ' && ps -C apache2 --no-headers | wc -l && free -m"

Example output:

Every 1.0s: echo -n 'Apache Processes: ' && ps -C apache2 --no-
headers | wc -l && free -m
Apache Processes: 27
          total     used     free     shared     buffers     cached
Mem:      8204      7445     758      0          385         4567
-/+ buffers/cache:  2402     5801
Swap:     16383     189      16194

9.    Check the CloudWatch CPUUtilization metric of your backend instances. Look for high CPU utilization or spikes in CPU utilization. For high CPU utilization, consider upgrading your instances to a larger instance type.
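To corroborate the CloudWatch CPUUtilization metric from inside an instance, the following is a rough sketch that samples the aggregate "cpu" line of /proc/stat one second apart (Linux-only; it counts user+nice+system as busy and ignores iowait and irq time, so the figure is approximate):

```shell
# Read the aggregate cpu counters twice, one second apart (Linux-only).
# Fields on the "cpu" line: user, nice, system, idle, ... (in jiffies).
read -r _ u1 n1 s1 i1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 _ < /proc/stat

# Busy time is the delta of user+nice+system; total adds the idle delta.
busy=$(( (u2 + n2 + s2) - (u1 + n1 + s1) ))
total=$(( busy + (i2 - i1) ))
echo "Approximate CPU utilization: $(( 100 * busy / total ))%"
```

If this on-instance figure is consistently high while CloudWatch shows spikes, the instance type is likely undersized for the workload.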

10.    Check the MaxClients setting for the web servers on your backend instances, which defines how many simultaneous requests the instance can serve. For instances with appropriate memory and CPU utilization that are still experiencing high latency, consider increasing the MaxClients value.

Compare the number of processes generated by Apache (httpd) with the MaxClients setting. If the number of Apache processes frequently reaches the MaxClients value, consider increasing the value.

Example command:

[root@ip-192.0.2.0 conf]# ps aux | grep httpd | wc -l
15

Example Apache prefork configuration:

<IfModule prefork.c>
StartServers         10
MinSpareServers      5
MaxSpareServers      10
ServerLimit          15
MaxClients           15
MaxRequestsPerChild  4000
</IfModule>

11.    Check for dependencies that cause latency issues on your backend instances. These include, but aren’t limited to: shared databases, external resources (such as S3 buckets), external resource connections such as network address translation (NAT) instances, remote web services, and proxy servers.
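A quick way to isolate a slow dependency is to time each phase of a request to it with curl's built-in timers. The helper below is a sketch; "timing_check" is a name introduced here, and "dependency.example.com" is a placeholder for your actual dependency endpoint.

```shell
# timing_check URL -- print DNS lookup, TCP connect, time to first byte, and
# total time for a single request to URL.
timing_check() {
  curl -s -o /dev/null \
    -w "dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n" \
    "$1"
}

# Example usage (replace with your dependency's endpoint):
# timing_check https://dependency.example.com/health
```

A large gap between connect and ttfb points at the dependency server itself, while a large dns or connect time points at name resolution or network path issues (for example, an overloaded NAT instance).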

