Clients experience high latency when connecting to web applications running on EC2 instances registered to an Elastic Load Balancing (ELB) load balancer.

High latency can be caused by several factors, such as:

  • Network connectivity
  • ELB configuration
  • Backend web application server issues including but not limited to:
    • Memory utilization – One of the most common causes of web application latency is when most or all available physical memory (RAM) has been consumed on the host EC2 instance.
    • CPU utilization – High CPU utilization on the host EC2 instance can significantly degrade web application performance and in some cases cause a server crash.
    • Web server configuration – If a backend web application server exhibits high latency in the absence of excessive memory or CPU utilization, the web server configuration should be reviewed for potential problems.
    • Web application dependencies – If a backend web application server exhibits high latency after ruling out memory, CPU, and web server configuration issues, then web application dependencies such as external databases or Amazon S3 buckets may be causing performance bottlenecks.

Follow these steps to determine what is causing high latency:

Run a Linux curl command to measure the first byte response to determine if one or more backend web application servers are experiencing high latency:
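As a minimal sketch (the backend hostname below is a placeholder), the following curl command reports time to first byte; run it against each registered instance directly, bypassing the load balancer:

```shell
# Measure time to first byte (TTFB) for a backend web server.
# Replace backend.example.internal with the address of the instance under test.
curl -o /dev/null -s \
  -w "DNS: %{time_namelookup}s  Connect: %{time_connect}s  TTFB: %{time_starttransfer}s  Total: %{time_total}s\n" \
  http://backend.example.internal/
```

Running the same command against several backends makes the slow one easy to spot; a large gap between time_connect and time_starttransfer points at the application rather than the network.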


You can also determine which backend web application servers are experiencing high latency by reviewing the ELB access logs and checking which web application servers are associated with a high "backend_processing_time" value.

Focus web application troubleshooting on any backend servers that exhibit high latency.

The Latency metric represents the time elapsed, in seconds, from when the request leaves the load balancer until the load balancer receives a response from a registered instance. The preferred statistic for this metric is average, which reports the average latency across all requests. A high average Latency value typically indicates a problem with the backend server(s) rather than with the load balancer itself. Check the maximum statistic to determine whether any latency data points reach or exceed the load balancer idle timeout value. When latency data points meet or exceed the idle timeout, some requests are likely timing out, which causes the load balancer to return an HTTP 504 (Gateway Timeout) response to clients.
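As a sketch, you can pull these statistics with the AWS CLI (the load balancer name my-clb and the region are placeholders; AWS/ELB is the Classic Load Balancer namespace):

```shell
# Retrieve the average and maximum Latency for the last hour, in 5-minute periods.
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ELB \
  --metric-name Latency \
  --dimensions Name=LoadBalancerName,Value=my-clb \
  --statistics Average Maximum \
  --start-time "$START" --end-time "$END" \
  --period 300 \
  --region us-east-1
```

Substituting --metric-name SurgeQueueLength (statistic Maximum) or SpilloverCount (statistic Sum) retrieves the queue-related metrics the same way.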


If the Latency maximum statistic appears to spike at regular intervals or follow a particular pattern, this may indicate performance problems on backend web application servers or application dependency servers due to overhead incurred when running scheduled tasks.

The SurgeQueueLength metric provides a count of the total number of requests that are queued (pending submission) for a registered instance. The surge queue can hold at most 1,024 requests. Once the queue is full, additional requests are rejected, and the sum statistic of the SpilloverCount metric measures the total number of requests rejected because the queue was full. For more information about troubleshooting problems with a high SurgeQueueLength value, see How do I troubleshoot Elastic Load Balancing capacity issues?

Open the ELB access log to locate the cause of the high latency. The access log captures "backend_processing_time," which records the total time elapsed (in seconds) from when the load balancer sends the request (HTTP listener) or first byte (TCP listener) to a registered instance until the instance begins sending the response headers or first byte. For details on the access log, see Monitor Your Load Balancer Using Elastic Load Balancing Access Logs. If you notice higher-than-expected values for "request_processing_time" or "response_processing_time," contact AWS Support.

Check available memory – Running out of memory can cause high latency. When this happens, the operating system attempts to free RAM by moving some of it to swap, which is a reserved amount of space on your hard drive. A web application server should avoid swapping memory to disk because swapping significantly increases the latency of each request. This in turn may cause users to attempt reloading pages, which increases load and exacerbates the problem. Monitor the number of Apache processes and the total RAM used. Run the following Linux command to display the number of Apache processes and the total RAM used, in a tabular format that updates every second:

watch -n 1 "echo -n 'Apache Processes: ' && ps -C apache2 --no-headers | wc -l && free -m"

The output shows the current number of Apache processes followed by the memory summary from free -m, refreshed every second.

Check CPU utilization – The web server's CPU is used to obtain and serve web pages to your visitors, whether those pages are static or dynamic. More CPU cycles are used when your web pages are served dynamically from a database or script. Check the CloudWatch CPUUtilization metric to monitor CPU usage and determine whether upgrading to a larger instance type is needed. The average statistic gives a general idea of overall CPU utilization. You can also check the maximum statistic for CPU spikes, which can cause latency issues.
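To see which processes are consuming CPU on the instance itself, a simple sketch:

```shell
# List the top five CPU-consuming processes (header row plus five entries).
ps -eo pcpu,pmem,comm --sort=-pcpu | head -n 6
```

If a single Apache or application process dominates the list, the bottleneck is likely in the application code rather than overall instance sizing.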


Check the web server configuration – Most web servers provide a setting that caps the number of worker processes or threads, such as Apache's MaxClients directive (renamed MaxRequestWorkers in Apache 2.4). This setting limits how many clients your web server can serve simultaneously. If your web server has plenty of available RAM and CPU resources but still exhibits high latency, check whether this value is set too low. If it is, client connections are deferred to the queue and can eventually time out.

Again using the Apache web server as an example: if Apache is using the prefork multi-processing module (MPM), you can count the number of processes launched by Apache to measure the number of concurrent connections. This works because the prefork MPM uses multiple child processes with one thread each, and each process handles one connection at a time.

Run the following command to determine the number of processes created by an Apache web server (httpd):

[root@ip-  conf]# ps aux | grep httpd | wc -l
15

Note that this count includes the grep process itself and the Apache parent process, so the number of worker processes is slightly lower; pgrep -c httpd gives a more precise count.

The output of this command lets you compare the total number of Apache processes with the MaxClients setting in the Apache web server configuration file.
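As an illustrative sketch (the values shown are the Apache 2.2 prefork defaults, not a recommendation), the relevant section of httpd.conf looks like this:

```
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
```

In Apache 2.4 and later the directive is named MaxRequestWorkers, but the comparison against the process count works the same way.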


If the number of Apache processes consistently reaches the value set for MaxClients, there is a good chance your end users are experiencing slowness.

Check web server dependencies – If the CPU, memory, and configuration of your backend instances are all healthy, consider their dependencies. Questions to ask when evaluating web server dependencies include:

  • Do your web servers depend on a shared database, which might be overloaded?
  • Do the web servers connect to external resources (an S3 bucket, for example)?
  • How does your web server connect to external resources? For example, a connection through an improperly sized NAT instance may be limiting the throughput necessary to provide adequate performance.
  • Are the web servers calling a remote web service that is running slowly?
  • Are the web servers connecting to external resources through a backend load balancer that is serving as a proxy server to other backend instances?

These are just a few examples of considerations that may apply when evaluating the impact of web server dependencies on web application server performance and high latency.
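One quick way to test a suspect dependency is to time it directly from a backend instance (the URL below is a placeholder for the real dependency endpoint):

```shell
# Time a request to a dependency (for example, an S3 object or an internal API)
# from the web server itself. Replace the URL with the actual dependency endpoint.
curl -o /dev/null -s -w "Connect: %{time_connect}s  Total: %{time_total}s\n" \
  https://dependency.example.internal/healthcheck
```

If the dependency responds slowly from the instance but quickly from elsewhere, suspect the network path (for example, an undersized NAT instance); if it is slow everywhere, suspect the dependency itself.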

