How can I troubleshoot high latency on my Application Load Balancer?

Last updated: 2020-10-02

I'm experiencing high latency and timeouts when I try to access web applications running on targets registered behind an Application Load Balancer. How can I fix these issues?

Short description

Possible causes of high latency on an Application Load Balancer include:

  • Network connectivity issues
  • High memory (RAM) utilization on backend instances
  • High CPU utilization on backend instances
  • Incorrect web server configuration on backend instances
  • Problems with web application dependencies running on backend instances, such as external databases or Amazon Simple Storage Service (Amazon S3) buckets

Resolution

1.    Check for network connectivity issues using the troubleshooting steps in Troubleshoot your Application Load Balancers.

2.    Use curl to measure the time to first byte and to check whether slow DNS resolution is contributing to the latency.

curl -kso /dev/null https://www.example.com -w "==============\n\n\
 | dnslookup: %{time_namelookup}\n\
 | connect: %{time_connect}\n\
 | appconnect: %{time_appconnect}\n\
 | pretransfer: %{time_pretransfer}\n\
 | starttransfer: %{time_starttransfer}\n\
 | total: %{time_total}\n\
 | size: %{size_download}\n\
 | HTTPCode=%{http_code}\n\n"

Example output:

 | dnslookup: 0.005330
 | connect: 0.006682
 | appconnect: 0.026540
 | pretransfer: 0.026636
 | starttransfer: 0.076980
 | total: 0.077111
 | size: 12130
 | HTTPCode=200

Perform these tests through the Application Load Balancer. Then, perform the tests while bypassing the Application Load Balancer to targets. This approach helps to isolate the component that's inducing latency.
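
The curl timers are cumulative, so subtracting adjacent values shows how long each phase took. The following is a minimal sketch of that arithmetic. The function name curl_phases is hypothetical (it is not part of curl), and it assumes awk is installed. Pass the six timers in the order they appear in the example output.

```shell
# curl_phases: turn cumulative curl timers into per-phase durations.
# Arguments (seconds): namelookup connect appconnect pretransfer starttransfer total
# The function name is illustrative only.
curl_phases() {
  awk -v dns="$1" -v con="$2" -v app="$3" -v pre="$4" -v first="$5" -v total="$6" 'BEGIN {
    printf "dns=%.6f\n",      dns            # DNS resolution
    printf "tcp=%.6f\n",      con - dns      # TCP handshake
    printf "tls=%.6f\n",      app - con      # TLS handshake (meaningful for HTTPS only)
    printf "wait=%.6f\n",     first - pre    # request sent until first byte received
    printf "download=%.6f\n", total - first  # response body transfer
  }'
}

# Values taken from the example output in this step.
curl_phases 0.005330 0.006682 0.026540 0.026636 0.076980 0.077111
```

Run the same breakdown for a request sent through the Application Load Balancer and for one sent directly to a target. If the wait value is similar in both runs, the delay is more likely in the target or its dependencies than in the load balancer.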

3.    Check the Average statistic of the Amazon CloudWatch TargetResponseTime metric for your Application Load Balancer. If the value is high, there's likely a problem with the backend instances or application dependency servers.
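
One way to retrieve this metric is with the AWS CLI. The sketch below wraps aws cloudwatch get-metric-statistics in a small function. The function name and the load balancer value in the usage comment are placeholders, and the 5-minute period is an assumption that you can adjust. It assumes the AWS CLI is installed and configured with permission to read CloudWatch metrics.

```shell
# alb_target_response_time: list the Average TargetResponseTime per 5-minute period.
# $1 = LoadBalancer dimension value (the part of the load balancer ARN that starts with app/)
# $2 = start time, $3 = end time (ISO 8601, UTC)
# The function name is illustrative only.
alb_target_response_time() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name TargetResponseTime \
    --dimensions "Name=LoadBalancer,Value=$1" \
    --start-time "$2" \
    --end-time "$3" \
    --period 300 \
    --statistics Average \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
    --output text
}

# Usage (placeholder load balancer name and time range):
# alb_target_response_time app/my-load-balancer/1234567890123456 \
#   2020-10-02T00:00:00Z 2020-10-02T01:00:00Z
```

The values are reported in seconds. Compare periods of high latency with periods of normal behavior to see when the targets began to respond slowly.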

4.    Determine which backend instances are experiencing high latency by checking the access log entries for your Application Load Balancer. Check target_processing_time to find backend instances with latency issues. Also, review the request_processing_time and response_processing_time fields to verify any issues with the Application Load Balancer.
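
To summarize the access logs by target, you can average target_processing_time per target address. The following is a minimal sketch, not an official tool. The function name slow_targets is hypothetical. It relies on the documented field order of Application Load Balancer access log entries, where the fifth field is target:port and the seventh is target_processing_time, and it assumes that you have already downloaded the log files.

```shell
# slow_targets: average target_processing_time per target, slowest first.
# Reads ALB access log entries on stdin.
# $5 = target:port, $7 = target_processing_time (seconds).
# Entries with a value of -1 (the request was not dispatched to a target) are skipped.
# The function name is illustrative only.
slow_targets() {
  awk '$7 >= 0 { sum[$5] += $7; n[$5]++ }
       END { for (t in sum) printf "%s %.3f %d\n", t, sum[t] / n[t], n[t] }' |
    sort -k2,2nr
}

# Usage (access logs are delivered as gzip files):
# zcat *.log.gz | slow_targets
```

Each output line shows the target, its average processing time in seconds, and the number of requests counted. A target whose average is much higher than its peers is a good candidate for the instance-level checks in the following steps.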

5.    Check the CloudWatch CPUUtilization metric of your backend instances. Look for high CPU utilization or spikes in CPU utilization. For high CPU utilization, consider upgrading your instances to a larger instance type.

6.    Check for memory issues by reviewing the Apache processes on your backend instances.

Example command:

watch -n 1 "echo -n 'Apache Processes: ' && ps -C apache2 --no-headers | wc -l && free -m"

Example output:

Every 1.0s: echo -n 'Apache Processes: ' && ps -C apache2 --no-headers | wc -l && free -m
Apache Processes: 27
          total     used     free     shared     buffers     cached
Mem:      8204      7445     758      0          385         4567
-/+ buffers/cache:  2402     5801
Swap:     16383     189      16194
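
In the example output, the Mem: row reports 7445 MB used, but a large part of that is buffers and page cache that the kernel can reclaim. The -/+ buffers/cache row gives a better picture of what applications are using. The sketch below turns free -m output into a single percentage. The function name mem_used_pct is hypothetical. It handles the older layout shown in this step and, as an assumption about newer versions of free, the layout that has an available column instead.

```shell
# mem_used_pct: percentage of memory in use by applications, excluding reclaimable cache.
# Reads `free -m` output on stdin and prints a whole-number percentage.
# The function name is illustrative only.
mem_used_pct() {
  awk '
    /available/  { modern = 1 }                  # newer header has an "available" column
    $1 == "Mem:" { total = $2; avail = $7 }
    $1 == "-/+"  { used = $3; unused = $4; legacy = 1 }
    END {
      if (legacy)      pct = used * 100 / (used + unused)
      else if (modern) pct = (total - avail) * 100 / total
      else             exit 1                    # unrecognized layout
      printf "%d\n", pct
    }'
}

# Usage:
# free -m | mem_used_pct
```

For the example output in this step, the result is 29, which suggests that memory is not the constraint on that instance even though the free column looks low.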

7.    Check the MaxClients setting for the web servers on your backend instances. This setting defines how many simultaneous requests the instance can serve. (In Apache 2.4, this directive is named MaxRequestWorkers; the old name is still accepted.) For instances that have acceptable memory and CPU utilization but still experience high latency, consider increasing the MaxClients value.

Compare the number of processes generated by Apache (httpd) with the MaxClients setting. If the number of Apache processes frequently reaches the MaxClients value, consider increasing the value.

Example command and output:

[root@ip-192.0.2.0 conf]# ps aux | grep httpd | wc -l
15

Example configuration:

<IfModule prefork.c>
StartServers         10
MinSpareServers      5
MaxSpareServers      10
ServerLimit          15
MaxClients           15
MaxRequestsPerChild  4000
</IfModule>
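
The comparison can be scripted so that it's easy to repeat during periods of high latency. The following is a minimal sketch. The function name worker_headroom is hypothetical, and the process name and configuration path in the usage comment are assumptions that vary by distribution.

```shell
# worker_headroom: compare a worker count with the MaxClients value in a config file.
# $1 = number of running Apache processes, $2 = path to the Apache configuration file
# Uses the first MaxClients line found; adjust if the file has several MPM sections.
# The function name is illustrative only.
worker_headroom() {
  limit="$(awk '$1 == "MaxClients" { print $2; exit }' "$2")"
  [ -n "$limit" ] || { echo "MaxClients not found in $2" >&2; return 1; }
  echo "$1 of $limit workers in use ($(( $1 * 100 / limit ))%)"
}

# Usage (process name and path are examples):
# worker_headroom "$(pgrep -c httpd)" /etc/httpd/conf/httpd.conf
```

If the percentage stays at or near 100 while CPU and memory utilization remain moderate, the worker limit is a likely bottleneck.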

8.    Check for dependencies of your backend instances that might be causing latency issues. Dependencies might include shared databases or external resources (such as Amazon S3 buckets). Dependencies might also include external resource connections, such as network address translation (NAT) instances, remote web services, or proxy servers.

9.    Use the following Linux tools to identify performance bottlenecks on the server.

uptime – Shows load averages to help determine the number of tasks (processes) waiting to run. On Linux systems, this number includes processes waiting to run on the CPU, as well as processes blocked in uninterruptible I/O (usually disk I/O). This data provides a high-level look at resource load (or demand) that must be interpreted using other tools. When Linux load averages increase, there's a higher demand for resources. To determine which resources are in higher demand, you must use other metrics. For example, for CPUs you can use mpstat -P ALL 1 to measure per-CPU utilization, or top or pidstat 1 to measure per-process CPU utilization.

mpstat -P ALL 1 – Shows CPU time breakdowns per CPU, which you can use to check for an imbalance. A single hot CPU might be evidence of a single-threaded application.

pidstat 1 – Shows per-process CPU utilization and prints a rolling summary that's useful for watching patterns over time.

dmesg | tail – Shows the last 10 system messages, if there are any. Look for errors that might cause performance issues.

iostat -xz 1 – Shows the workload applied for block devices (disks) and the resulting performance.

free -m – Shows the amount of free memory. Check that the buffers and cached values aren't near zero, which can lead to higher disk I/O (confirm using iostat) and decreased performance.

sar -n DEV 1 – Shows network interface throughput (rxkB/s and txkB/s) as a measure of workload. Check if any limits have been reached.

sar -n TCP,ETCP 1 – Shows key TCP metrics, including: active/s (number of locally-initiated TCP connections per second), passive/s (number of remotely-initiated TCP connections per second), and retrans/s (number of TCP retransmits per second).

iftop – Shows the connections between your server and a remote IP address that are consuming the most bandwidth. iftop is available in a package with the same name on Red Hat and Debian-based distributions. However, with Red Hat-based distributions, you might instead find iftop in a third-party repository.
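
The load averages that uptime reports are easier to interpret when divided by the number of CPUs. The following is a minimal sketch of that calculation. The function name load_per_cpu is hypothetical, and the usage comment assumes a Linux host where /proc/loadavg and nproc are available.

```shell
# load_per_cpu: divide a load average by the number of CPUs.
# $1 = load average (for example, the 1-minute value from uptime), $2 = CPU count
# The function name is illustrative only.
load_per_cpu() {
  awk -v avg="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", avg / cpus }'
}

# Usage on a Linux host:
# load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

As a rough guide, a sustained value well above 1.00 suggests that more tasks are runnable or blocked on I/O than the CPUs can service. Because the Linux load average includes tasks in uninterruptible I/O, use mpstat, pidstat, or iostat to confirm which resource is in demand.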

