Clients sometimes have trouble connecting to my load balancer and receive HTTP 503: Service Unavailable or HTTP 504: Gateway Timeout errors. I am monitoring my load balancer using Amazon CloudWatch and have noticed a significant increase in the max statistic of the SurgeQueueLength metric, which counts the requests that are queued while pending submission to a registered instance.

Note
The maximum SurgeQueueLength value is 1024. Once the queue is full, additional requests are rejected, and the sum statistic of the SpilloverCount metric reports the total number of requests that were rejected because the queue was full.

Surge queue length is the number of requests that are queued by Elastic Load Balancing (ELB). Requests are queued when back-end systems cannot process incoming requests as fast as they are received. Some of the reasons that a load balancer can have a high max statistic for the SurgeQueueLength metric include:

  • Overloaded back-end instance(s) – Back-end instance resources—CPU, memory, and network—might be overloaded and unable to adequately process incoming requests.
  • Application dependency issues – Modern web applications can have multiple dependencies on external resources such as databases, S3 buckets, or other applications. If an application's external dependencies have performance issues, the application's performance is affected. For example, if an application depends on a database table that is not properly indexed, slow database queries can hinder the application's performance.
  • Max connections reached – Back-end web servers might have reached their maximum allowable connection limit and be unable to process new requests.
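
The relationship between SurgeQueueLength and SpilloverCount described above can be sketched as a small simulation. This is a simplified illustration, not how Elastic Load Balancing is implemented: the function name, the per-tick model, and the arrival and service rates are all assumptions chosen for the example. Only the 1024 queue limit comes from this article.

```python
SURGE_QUEUE_CAPACITY = 1024  # maximum SurgeQueueLength stated in this article


def simulate_surge_queue(arrivals_per_tick, served_per_tick,
                         capacity=SURGE_QUEUE_CAPACITY):
    """Return (max_queue_length, spillover_count) for a toy bounded queue.

    arrivals_per_tick: request counts arriving in each time step (hypothetical).
    served_per_tick:   requests the back end can complete per step (hypothetical).
    """
    queued = 0
    max_queue_length = 0
    spillover_count = 0
    for arrivals in arrivals_per_tick:
        pending = queued + arrivals
        # The back end completes as many pending requests as it can this tick.
        pending -= min(pending, served_per_tick)
        # Anything beyond the queue capacity is rejected (spillover).
        if pending > capacity:
            spillover_count += pending - capacity
            pending = capacity
        queued = pending
        max_queue_length = max(max_queue_length, queued)
    return max_queue_length, spillover_count


if __name__ == "__main__":
    # Back end keeps up: nothing queues and nothing is rejected.
    print(simulate_surge_queue([50] * 5, served_per_tick=100))
    # Back end falls behind: the queue fills, then requests spill over.
    print(simulate_surge_queue([500] * 5, served_per_tick=100))
```

In the second call the back end completes only 100 of every 500 requests, so the queue grows by 400 per tick until it reaches the limit, after which every additional request is counted as spillover.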

To alleviate problems with high surge queue length and potential high spillover count, follow these steps:

  1. Enable Auto Scaling with Elastic Load Balancing, as described in Load Balance Your Auto Scaling Group in the Auto Scaling Developer Guide.
  2. Use Amazon CloudWatch to monitor your load balancer, as described in Monitor Your Load Balancer Using Amazon CloudWatch.
  3. Use Amazon CloudWatch to monitor the CPUUtilization metric of your back-end instances, as described in Enabling or Disabling Detailed Monitoring on an Amazon EC2 Instance, to determine whether CPU utilization is spiking excessively. To get statistics for a specific Amazon EC2 instance, see Get Statistics for a Specific EC2 Instance. If CPU utilization is spiking, your back-end instances are probably too busy processing existing requests to accept new ones. In this scenario, it may help to scale out by adding back-end instances, or to scale up to an instance type with additional processing power, such as a compute optimized (C4) instance. For a description of the types and capabilities of Amazon EC2 instances, see Amazon EC2 Instances.
  4. Follow the recommendations for troubleshooting specific HTTP errors returned by your load balancer described at Troubleshooting Elastic Load Balancing: HTTP Errors. Load-related issues typically return HTTP 503: Service Unavailable or HTTP 504: Gateway Timeout errors.
  5. If you determine that surge queue length is increasing because back-end web servers are exceeding their maximum allowable connection limit, you might need to increase the number of child processes or the number of threads available to each process. For example, the following directives in an Apache web server configuration file can be modified to increase child processes or threads per process:
       StartServers          8
       MinSpareServers       5
       MaxSpareServers       20
       ServerLimit           256
       MaxClients            256
       MaxRequestsPerChild   4000
  6. Over time, it is not unusual for a web server to experience increased traffic. When this occurs, your instances might become overutilized, which in turn affects performance. To address this, consider increasing capacity as described in Resizing Your Instance.
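
As a starting point for the monitoring steps above, the following minimal sketch shows one way to read the max SurgeQueueLength and the sum SpilloverCount for a load balancer with the CloudWatch GetMetricStatistics API. It assumes a Classic Load Balancer that reports to the AWS/ELB namespace under the LoadBalancerName dimension. The function names and the load balancer name my-load-balancer are hypothetical, and the client is passed in so that the sketch does not depend on how your credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_elb_metric_request(load_balancer_name, metric_name, statistic,
                             minutes=60, period=60, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ELB",  # assumption: Classic Load Balancer metrics
        "MetricName": metric_name,
        "Dimensions": [{"Name": "LoadBalancerName",
                        "Value": load_balancer_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": [statistic],
    }


def summarize_surge_metrics(cloudwatch, load_balancer_name, minutes=60):
    """Return the peak queue length and total rejected requests in the window.

    cloudwatch: any object with a get_metric_statistics(**kwargs) method,
    for example boto3.client("cloudwatch").
    """
    surge = cloudwatch.get_metric_statistics(**build_elb_metric_request(
        load_balancer_name, "SurgeQueueLength", "Maximum", minutes))
    spill = cloudwatch.get_metric_statistics(**build_elb_metric_request(
        load_balancer_name, "SpilloverCount", "Sum", minutes))
    peak = max((p["Maximum"] for p in surge["Datapoints"]), default=0)
    rejected = sum(p["Sum"] for p in spill["Datapoints"])
    return {"peak_surge_queue_length": peak, "rejected_requests": rejected}


# Example usage (requires boto3 and AWS credentials; the name is hypothetical):
#   import boto3
#   print(summarize_surge_metrics(boto3.client("cloudwatch"), "my-load-balancer"))
```

A peak near 1024 together with a nonzero count of rejected requests suggests that the queue filled during the window and that the capacity or connection-limit steps above are worth pursuing.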


