My Amazon CloudWatch metric SurgeQueueLength for my Classic Load Balancer has an increased maximum statistic. Clients also receive HTTP 503 Service Unavailable or HTTP 504 Gateway Timeout errors when they try to connect to my Classic Load Balancer. How do I troubleshoot these Elastic Load Balancing capacity issues?
The Classic Load Balancer metric SurgeQueueLength measures the total number of requests (HTTP listener) or connections (TCP listener) that are queued by your Classic Load Balancer while they wait to be routed to a healthy instance. An increased maximum statistic for SurgeQueueLength indicates that backend systems can't process incoming requests as fast as the requests arrive. Possible reasons for a high SurgeQueueLength metric include:
- Overloaded Amazon Elastic Compute Cloud (Amazon EC2) instances behind the Classic Load Balancer that are unable to process all incoming requests
- Slow external resources that the application depends on
- Instances that have reached their maximum allowable connection limit
The maximum SurgeQueueLength is 1024. When the queue is full, the load balancer rejects additional requests, and the SpilloverCount metric counts those rejected requests.
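The relationship between the two metrics can be sketched in code. The following is a minimal illustration, not an official tool: it assumes boto3 is installed and credentials are configured for the fetch step, and the function names, the status labels, and the 80 percent warning ratio are hypothetical choices made for this example.

```python
from datetime import datetime, timedelta, timezone

SURGE_QUEUE_LIMIT = 1024  # maximum SurgeQueueLength for a Classic Load Balancer


def summarize_surge_queue(surge_datapoints, spillover_datapoints, warn_ratio=0.8):
    """Classify queue pressure from CloudWatch datapoints.

    surge_datapoints: datapoints for SurgeQueueLength with a "Maximum" key.
    spillover_datapoints: datapoints for SpilloverCount with a "Sum" key.
    warn_ratio and the status labels are assumptions made for this sketch.
    """
    peak_queue = max((dp["Maximum"] for dp in surge_datapoints), default=0)
    rejected = sum(dp["Sum"] for dp in spillover_datapoints)

    if rejected > 0 or peak_queue >= SURGE_QUEUE_LIMIT:
        status = "saturated"   # the queue filled up and requests were rejected
    elif peak_queue >= warn_ratio * SURGE_QUEUE_LIMIT:
        status = "near-limit"  # close to the 1024 limit
    elif peak_queue > 0:
        status = "queuing"     # backends fell behind at some point
    else:
        status = "healthy"
    return {"peak_queue": peak_queue, "rejected": rejected, "status": status}


def fetch_datapoints(load_balancer_name, metric_name, statistic, hours=1):
    """Fetch one ELB metric from CloudWatch at one-minute granularity."""
    import boto3  # imported here so the summary logic runs without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ELB",
        MetricName=metric_name,
        Dimensions=[{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=60,
        Statistics=[statistic],
    )
    return response["Datapoints"]
```

For example, with a placeholder load balancer name, `summarize_surge_queue(fetch_datapoints("my-load-balancer", "SurgeQueueLength", "Maximum"), fetch_datapoints("my-load-balancer", "SpilloverCount", "Sum"))` reports the peak queue depth and the number of rejected requests over the last hour.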
To troubleshoot these capacity issues, try the following:
- Configure Auto Scaling groups with your Classic Load Balancer to scale your instances based on demand.
- Configure CloudWatch to monitor your Classic Load Balancer.
- Enable detailed monitoring for instances behind your Classic Load Balancer to monitor the CPUUtilization metric. You can also get statistics for a specific instance. If CPU utilization spikes, your instances might be too busy processing existing requests to accept new requests. Consider scaling out by adding instances, or scaling up to an instance type with more processing power.
- Troubleshoot HTTP errors for your Classic Load Balancer. Load-related issues typically return 503 or 504 HTTP errors.
- If your EC2 instances run Apache, see How do I tune memory allocation for an Apache web server running on an Amazon EC2 Linux instance?
- For performance issues related to normal traffic increases over time, consider increasing your instance capacity.
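
As one way to act on the monitoring steps above, the following sketch builds a CloudWatch alarm on the maximum SurgeQueueLength. It assumes boto3 is installed and credentials are configured for the final call. The alarm name pattern, the threshold of 100, the three evaluation periods, and the function names are illustrative assumptions, not recommended values.

```python
def surge_queue_alarm_params(load_balancer_name, threshold=100, sns_topic_arn=None):
    """Build put_metric_alarm arguments for a SurgeQueueLength alarm.

    The threshold, period, and evaluation periods are placeholders;
    tune them to your own traffic pattern.
    """
    params = {
        "AlarmName": f"{load_balancer_name}-surge-queue-high",
        "Namespace": "AWS/ELB",
        "MetricName": "SurgeQueueLength",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds
        "EvaluationPeriods": 3,   # three consecutive one-minute periods
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle load balancer reports no datapoints; don't treat that as a breach.
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Optional notification target, for example an Amazon SNS topic.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported here so the parameters can be built without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(surge_queue_alarm_params("my-load-balancer"))` with your own load balancer name creates the alarm. Keeping the parameter builder separate from the API call makes it possible to review the settings before anything is created.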