How do I troubleshoot 504 errors returned while using an Application Load Balancer?

Last updated: 2022-04-25

I found HTTP 504 errors in Application Load Balancer access logs, Amazon CloudWatch metrics, or when connecting to my service through an Application Load Balancer. How do I fix this?

Short description

An HTTP 504 error is an HTTP status code that indicates a gateway or proxy has timed out.

Application Load Balancer HTTP 504 errors can occur if:

  • The load balancer failed to establish a connection to the target before the connection timeout expired (10 seconds).
  • The load balancer established a connection to the target but the target didn't respond before the idle timeout period elapsed.
  • The network ACL for the subnet didn't allow traffic from the targets to the load balancer nodes on the ephemeral ports (1024-65535).
  • The target returned a Content-Length header value that was larger than the entity body, so the load balancer timed out waiting for the missing bytes.
  • The target is an AWS Lambda function and the service didn't respond before the connection timeout expired.
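The Content-Length cause in the list above can be hard to spot by eye. The following is a minimal sketch, assuming you have captured a target's response headers and body yourself (for example, with a direct request to the target). The function name and sample values are hypothetical.

```python
def missing_body_bytes(headers: dict, body: bytes) -> int:
    """Return how many bytes the declared Content-Length exceeds the body by."""
    # If the header is absent, assume the declared length matches the body.
    declared = int(headers.get("Content-Length", len(body)))
    return max(declared - len(body), 0)


# Hypothetical response: declares 20 bytes but sends only 11.
headers = {"Content-Length": "20"}
body = b"hello world"
print(missing_body_bytes(headers, body))  # 9
```

A non-zero result means the load balancer would keep waiting for bytes that never arrive.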

Resolution

Check your load balancer's idle timeout and modify it if necessary

Load balancer HTTP 504 errors can occur if the backend instance didn't respond to the request within the configured idle timeout period. By default, the idle timeout for Application Load Balancer is 60 seconds.

If CloudWatch metrics are enabled, check the CloudWatch metrics for your Application Load Balancer. The HTTPCode_ELB_5XX_Count metric indicates that a 5XX error originated from the load balancer, and the HTTPCode_ELB_504_Count metric counts the 504 errors in particular. If there aren't any HTTPCode_ELB_504_Count metric datapoints, then the 504 errors are being returned by your application servers, not the load balancer.
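Access logs can make the same distinction for each request. The following is a minimal sketch, assuming the documented Application Load Balancer access log field order, where elb_status_code and target_status_code are the ninth and tenth fields. The sample log line is hypothetical and truncated.

```python
import shlex


def classify_504(log_line: str) -> str:
    """Report whether a 504 came from the load balancer or from the target."""
    fields = shlex.split(log_line)  # handles the quoted request and user-agent fields
    elb_status, target_status = fields[8], fields[9]
    if elb_status != "504":
        return "not-504"
    # A target_status_code of "-" means the target never sent a response.
    return "target" if target_status == "504" else "load-balancer"


# Hypothetical access log entry (trailing fields omitted).
SAMPLE = (
    'http 2022-04-25T10:00:00.000000Z app/my-alb/0123456789abcdef '
    '192.0.2.10:46532 10.0.0.5:80 0.000 -1 -1 504 - 120 0 '
    '"GET http://example.com:80/slow HTTP/1.1" "curl/7.79.1" - -'
)
print(classify_504(SAMPLE))  # load-balancer
```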

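Check the maximum and average values for the CloudWatch metric TargetResponseTime. This metric reports the time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received. Values that approach the idle timeout suggest that targets are responding too slowly.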
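As a rough illustration of this check, the sketch below flags datapoints whose maximum is close to the idle timeout. It assumes datapoints shaped like the "Datapoints" list that CloudWatch GetMetricStatistics returns. The 90 percent margin and the sample values are hypothetical.

```python
def datapoints_near_timeout(datapoints, idle_timeout_seconds=60.0, margin=0.9):
    """Return datapoints whose Maximum is within `margin` of the idle timeout."""
    threshold = idle_timeout_seconds * margin
    return [dp for dp in datapoints if dp["Maximum"] >= threshold]


# Hypothetical TargetResponseTime datapoints, in seconds.
sample = [
    {"Maximum": 2.1, "Average": 0.4},
    {"Maximum": 59.8, "Average": 12.3},
]
print(datapoints_near_timeout(sample))  # only the second datapoint is flagged
```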

To resolve this:

Modify the idle timeout for your load balancer so that the HTTP request completes within the idle timeout period.

-or-

Modify your application to respond to the HTTP request faster. Make sure that the application doesn't take longer to respond than the configured idle timeout.
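If you choose to change the idle timeout programmatically, the following sketch shows one way to do it with the AWS SDK for Python (boto3). It assumes that boto3 is installed and that credentials are configured. The helper names are hypothetical, and the load balancer ARN is a placeholder that you supply.

```python
def idle_timeout_request(load_balancer_arn: str, seconds: int) -> dict:
    """Build the arguments for ModifyLoadBalancerAttributes."""
    # Application Load Balancers accept idle timeouts from 1 to 4000 seconds.
    if not 1 <= seconds <= 4000:
        raise ValueError("idle timeout must be between 1 and 4000 seconds")
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Attributes": [
            {"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)}
        ],
    }


def apply_idle_timeout(load_balancer_arn: str, seconds: int) -> dict:
    """Send the change to Elastic Load Balancing (makes a real API call)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("elbv2")
    return client.modify_load_balancer_attributes(
        **idle_timeout_request(load_balancer_arn, seconds)
    )
```

Separating the request builder from the API call lets you review the arguments before anything is changed.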

(Optional) Add the following custom fields to the backend web servers' application log formats to help determine the cause of the slow response times:

  • Apache: %D in the log format
  • Nginx: $request_time and $upstream_response_time in the log format
  • IIS: "time-taken" in the log format
  • Apache Tomcat access logs: %D in the log format
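After these fields are logged, a short script can surface the slow requests. The following is a minimal sketch for Apache httpd, assuming that %D (request time in microseconds) is the last field of each log line. The sample lines are hypothetical, and other servers report time in different units, so adjust the conversion accordingly.

```python
def slow_apache_requests(lines, threshold_seconds=60.0):
    """Return (seconds, rest_of_line) for requests at or above the threshold."""
    slow = []
    for line in lines:
        # Assumption: the final space-separated field is %D in microseconds.
        prefix, _, micros = line.rstrip().rpartition(" ")
        seconds = int(micros) / 1_000_000
        if seconds >= threshold_seconds:
            slow.append((seconds, prefix))
    return slow


# Hypothetical access log lines with %D appended.
sample_lines = [
    '192.0.2.10 - - [25/Apr/2022:10:00:00 +0000] "GET /fast HTTP/1.1" 200 512 120000',
    '192.0.2.10 - - [25/Apr/2022:10:00:05 +0000] "GET /slow HTTP/1.1" 200 512 61500000',
]
for seconds, entry in slow_apache_requests(sample_lines):
    print(f"{seconds:.1f}s  {entry}")
```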

Make sure that your load balancer can exchange traffic with registered targets

Verify that the security groups associated with the load balancer and the backend targets allow traffic to and from each other on the traffic and health check ports. Make sure that the network ACL for the subnet allows traffic from the targets to the load balancer nodes on the ephemeral ports (1024-65535).
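The ephemeral port requirement can be checked against a network ACL's rules. The following is a simplified sketch, assuming entries shaped like the "Entries" list that EC2 DescribeNetworkAcls returns. It evaluates rules in ascending rule-number order for TCP and deliberately ignores CIDR blocks, so treat the result as a hint rather than a full evaluation. The function names and sample rules are hypothetical.

```python
def first_match_action(entries, port, egress):
    """Return 'allow' or 'deny' for one TCP port using first-match rule order."""
    candidates = [e for e in entries if e["Egress"] == egress]
    for entry in sorted(candidates, key=lambda e: e["RuleNumber"]):
        if entry["Protocol"] not in ("6", "-1"):  # "6" is TCP, "-1" is all protocols
            continue
        port_range = entry.get("PortRange")  # absent when the rule covers all ports
        if port_range is None or port_range["From"] <= port <= port_range["To"]:
            return entry["RuleAction"]
    return "deny"


def ephemeral_ports_allowed(entries, egress):
    """Check that every port in 1024-65535 is allowed in the given direction."""
    return all(
        first_match_action(entries, port, egress) == "allow"
        for port in range(1024, 65536)
    )


# Hypothetical outbound rules for a target subnet.
sample_entries = [
    {"RuleNumber": 100, "Protocol": "6", "RuleAction": "allow", "Egress": True,
     "PortRange": {"From": 1024, "To": 65535}},
    {"RuleNumber": 32767, "Protocol": "-1", "RuleAction": "deny", "Egress": True},
]
print(ephemeral_ports_allowed(sample_entries, egress=True))  # True
```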

Note: It's a best practice to use the recommended security group rules for your Application Load Balancer.

If traffic is blocked and you review the CloudWatch metric TargetConnectionErrorCount with the Sum statistic, then you are likely to see positive datapoints. This metric counts the connections that weren't successfully established between the load balancer and the target.

For more information, see Configure the idle timeout using the console.

Make sure that your Lambda function responds before the connection timeout expires

If your target is a Lambda function, check the Duration performance metric with the Maximum statistic to verify how long the function takes to process an event. For more information, see Using performance metrics.
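The Duration check can also be scripted. The following is a minimal sketch that uses the AWS SDK for Python (boto3) to request the Maximum statistic of the AWS/Lambda Duration metric, which is reported in milliseconds. It assumes that boto3 is installed and that credentials are configured. The helper names, the one-hour window, and the 60-second period are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def lambda_duration_query(function_name, hours=1, period_seconds=60, now=None):
    """Build the arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Maximum"],
    }


def max_duration_ms(function_name):
    """Return the largest Maximum datapoint in milliseconds (makes a real API call)."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**lambda_duration_query(function_name))
    return max((dp["Maximum"] for dp in response["Datapoints"]), default=None)
```

If no datapoints exist in the window, max_duration_ms returns None.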