How do I troubleshoot Application Load Balancer HTTP 502 errors?


I encounter HTTP 502 errors with my Application Load Balancer.

Short description

HTTP 502 (bad gateway) errors have several possible causes, and the source can be either your target or your Application Load Balancer. To identify the source of the error, use Amazon CloudWatch metrics and access logs.

Before you troubleshoot the error from your Application Load Balancer, make sure that you turn on access logging. To understand what each field means in the access log, see Access log entries.

If the target is an AWS Lambda function, then see Troubleshoot HTTP 502 errors when the target is a Lambda function in the Resolution section.

Resolution

Find the source of the HTTP 502 errors

Using CloudWatch metrics

If data points appear under the HTTPCode_ELB_502_Count metric, then your load balancer is the source of the HTTP 502 errors. If they appear under the HTTPCode_Target_5XX_Count metric, then your target is the source.
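As a rough sketch, the two metrics above could be queried with the boto3 CloudWatch client. The helper names, the one-hour window, and the 5-minute period are assumptions for illustration. The LoadBalancer dimension value is the final portion of the load balancer ARN, such as app/my-loadbalancer/50dc6c495c0c9188.

```python
from datetime import datetime, timedelta, timezone


def metric_query(metric_name, load_balancer, end=None, hours=1):
    """Build get_metric_statistics arguments for one Application Load Balancer metric."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # assumed 5-minute buckets
        "Statistics": ["Sum"],
    }


def metric_total(metric_name, load_balancer):
    """Sum the metric over the window; requires boto3 and AWS credentials."""
    import boto3  # imported here so metric_query works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**metric_query(metric_name, load_balancer))
    return sum(point["Sum"] for point in response["Datapoints"])
```

Calling metric_total once with HTTPCode_ELB_502_Count and once with HTTPCode_Target_5XX_Count gives the two totals to compare.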

Using access logs

If the elb_status_code is "502" and the target_status_code is "-", then your load balancer is the source of the HTTP 502 errors. If the elb_status_code is "502" and the target_status_code is "502", then your target is the source of the errors.
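The rule above can be sketched as a small classifier over a single access log entry. This is a minimal illustration, not an official tool. The function name is hypothetical, and the field positions assume the documented access log field order, where elb_status_code and target_status_code are the ninth and tenth space-separated fields.

```python
import shlex


def error_source(entry):
    """Classify one access log entry by its two status-code fields."""
    # shlex.split keeps quoted fields such as the request line together.
    fields = shlex.split(entry)
    elb_status, target_status = fields[8], fields[9]
    if elb_status != "502":
        return "not a 502"
    if target_status == "-":
        return "load balancer"
    if target_status == "502":
        return "target"
    return "unknown"
```

Passing each line of a downloaded access log file to error_source gives a quick count of how many 502 errors came from each side.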

Troubleshoot HTTP 502 errors

Note: Filter the access logs by elb_status_code = "502" and target_status_code to help you determine the cause. Then, complete the relevant steps for your use case.

The load balancer received a TCP RST from the target when attempting to establish a connection

If the load balancer receives a TCP RST from the target when it tries to establish a connection, then the load balancer can't complete the TCP three-way handshake with the target. As a result, the load balancer can't forward the user request to the target.

  • Check if there are data points for the TargetConnectionErrorCount metric. This metric represents the number of connections that aren't successfully established between the load balancer and the target.
  • Check if the request_processing_time, target_processing_time, and response_processing_time fields in the access logs are each set to -1. This value means that the load balancer couldn't dispatch the request to the target because it never established a connection.

The following is an example of an access log entry:

http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 "GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"

Note: In this access log entry, the request_processing_time, target_processing_time and response_processing_time are each set to -1.
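The check on the three timing fields can be sketched as follows, using the example entry from this section as input. The function name is hypothetical, and the field positions assume the documented access log field order.

```python
import shlex


def connection_never_established(entry):
    """Return True when all three processing-time fields are -1."""
    fields = shlex.split(entry)
    # Positions 5-7: request_processing_time, target_processing_time,
    # response_processing_time.
    return all(float(value) == -1 for value in fields[5:8])


entry = (
    'http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 '
    '"GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - '
    'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354"'
)
print(connection_never_established(entry))  # True
```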

The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection

  • Check if the request_processing_time, target_processing_time, and response_processing_time fields in the access logs are all set to -1.
  • Check if traffic is allowed from the load balancer subnets to the targets on the target port.
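One way to sketch the second check is to test a target security group's ingress rules against a load balancer subnet CIDR. The function name is hypothetical. The rule dictionaries follow the IpPermissions shape that the EC2 describe_security_groups API returns. This sketch covers only IPv4 CIDR-based rules and doesn't evaluate rules that reference another security group or network ACLs.

```python
import ipaddress


def ingress_allows(ip_permissions, source_cidr, port):
    """Return True if a CIDR-based ingress rule admits TCP from source_cidr on port."""
    source = ipaddress.ip_network(source_cidr)
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol not in ("tcp", "-1"):
            continue
        # Rules with protocol "-1" cover all ports and carry no port range.
        if protocol == "tcp" and not (rule["FromPort"] <= port <= rule["ToPort"]):
            continue
        for ip_range in rule.get("IpRanges", []):
            if source.subnet_of(ipaddress.ip_network(ip_range["CidrIp"])):
                return True
    return False


rules = [
    {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
]
print(ingress_allows(rules, "10.0.1.0/24", 80))   # True
print(ingress_allows(rules, "10.0.1.0/24", 443))  # False
```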

The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target

The load balancer receives a request and forwards it to the target. The target receives the request and starts to process it, but closes the connection to the load balancer too early. This usually occurs when the duration of the keep-alive timeout for the target is shorter than the idle timeout value of the load balancer. Make sure that the duration of the keep-alive timeout is greater than the idle timeout value.

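Check the values for the request_processing_time, target_processing_time, and response_processing_time fields. When the target closes the connection early, the first two fields contain valid times, but response_processing_time is -1.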

See the following example access log entry:

http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 0.001 4.205 -1 502 - 94 326 "GET http://example.com:80 HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"

Note: In this access log entry, the request_processing_time is 0.001, the target_processing_time is 4.205, and the response_processing_time is -1.
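The keep-alive comparison described in this section can be sketched as below. The function names are hypothetical. The response shape is the Attributes list that the elbv2 describe_load_balancer_attributes API returns, where the idle timeout is stored under the key idle_timeout.timeout_seconds. The target's keep-alive value must be read from your own web server configuration.

```python
def idle_timeout_seconds(attributes_response):
    """Extract the idle timeout from a describe_load_balancer_attributes response."""
    for attribute in attributes_response["Attributes"]:
        if attribute["Key"] == "idle_timeout.timeout_seconds":
            return int(attribute["Value"])
    raise KeyError("idle_timeout.timeout_seconds")


def keep_alive_is_safe(keep_alive_seconds, attributes_response):
    """The target keep-alive timeout must be greater than the idle timeout."""
    return keep_alive_seconds > idle_timeout_seconds(attributes_response)


# With boto3 and credentials, the response could come from:
#   boto3.client("elbv2").describe_load_balancer_attributes(LoadBalancerArn=arn)
# where arn is your load balancer ARN.
response = {"Attributes": [{"Key": "idle_timeout.timeout_seconds", "Value": "60"}]}
print(keep_alive_is_safe(75, response))  # True
print(keep_alive_is_safe(60, response))  # False
```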

The target response is malformed or contains HTTP headers that aren't valid

Perform a packet capture on the target for the timeframe of the issue to understand the target response.

The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target

The TCP connection from the load balancer to the target's HTTPS listener is successful, but the subsequent SSL handshake times out. As a result, the load balancer can't forward the request to the target.

Check if the target group uses the HTTPS protocol. If it doesn't use HTTPS protocol, then the SSL handshake timeout isn't the cause of the issue. If the target group is using the HTTPS protocol, then check the following points:

  • Check if the request_processing_time, target_processing_time, and response_processing_time fields in the access logs are all set to -1.
  • Check if there are data points for the TargetTLSNegotiationErrorCount metric.
  • Perform a packet capture on the target for the timeframe of the issue to validate that it's related to an SSL handshake. If it is, then complete the steps in the Perform a packet capture section.
  • Check if the ciphers or protocols are mismatched.

The deregistration delay period elapsed for a request that's handled by a target that was deregistered

In your CloudTrail events, check for an API call with the DeregisterTargets action during the timeframe of the issue. If an API call with DeregisterTargets happened during the timeframe of the issue, then the error is caused by a target that was deregistered too early. To resolve this issue, increase the deregistration delay period so that lengthy operations can complete without failing.
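A minimal sketch of the CloudTrail check, assuming boto3 and credentials are available. The function names are hypothetical. The lookup uses the CloudTrail lookup_events API filtered on the DeregisterTargets event name, and reads only the first page of results.

```python
def deregister_lookup(start, end):
    """Build lookup_events arguments for DeregisterTargets calls in a time window."""
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "DeregisterTargets"}
        ],
        "StartTime": start,
        "EndTime": end,
    }


def deregistration_events(start, end):
    """Return (time, user) pairs for matching events; first page only."""
    import boto3  # imported here so deregister_lookup works without boto3 installed

    cloudtrail = boto3.client("cloudtrail")
    response = cloudtrail.lookup_events(**deregister_lookup(start, end))
    return [(event["EventTime"], event.get("Username")) for event in response["Events"]]
```

Here start and end are datetime objects that bracket the timeframe of the issue.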

Troubleshoot the HTTP 502 errors when the target is a Lambda function

Note: For requests to a Lambda function that fail, the load balancer stores Lambda-specific error reason codes in the error_reason field of the access logs.

The target is a Lambda function, and the response body exceeds 1 MB

  • Check if there's a data point for the LambdaUserError metric.
  • Check if the error_reason field in the load balancer access log is set to LambdaResponseTooLarge.

The target is a Lambda function that didn't respond before its configured timeout was reached

  • Check the Lambda function timeout configuration.
  • Check if there's a data point for the LambdaUserError metric.
  • Check if the error_reason field in the load balancer access log is set to LambdaUnhandled.

The target is a Lambda function that returned an error, or the function was throttled by the Lambda service

Contact AWS Support for guidance on service throttling.
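The error_reason checks in this section can be sketched as a lookup on one access log entry. The names below are hypothetical, and the field position is an assumption based on the documented access log field order, where error_reason is the 25th field. Only the two reason codes named in this article are mapped; any other value is returned as-is.

```python
import shlex

# Causes named in this article, keyed by error_reason value.
KNOWN_REASONS = {
    "LambdaResponseTooLarge": "response body exceeded 1 MB",
    "LambdaUnhandled": "function didn't respond before its configured timeout",
}

ERROR_REASON_INDEX = 24  # assumption: zero-based position of error_reason


def lambda_502_cause(entry):
    """Map the error_reason field of one access log entry to a likely cause."""
    fields = shlex.split(entry)
    if len(fields) <= ERROR_REASON_INDEX:
        return "entry has no error_reason field"
    reason = fields[ERROR_REASON_INDEX]
    return KNOWN_REASONS.get(reason, f"other reason: {reason}")
```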

Perform a packet capture

For Linux, use the following command:

sudo tcpdump -i any -w filename.pcap

For Microsoft Windows, download and use the Wireshark application (from the Wireshark website).

For more details, see How do I troubleshoot network performance issues between EC2 Linux or Windows instances in a VPC and an on-premises host over the internet gateway? Refer to the Test packet capture samples using tcpdump and Take a packet capture sections.
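To keep the capture file small, the tcpdump command above can be narrowed to the target port. The sketch below builds that command line. The function name is hypothetical, and the port filter is an addition to the command shown in this section, using tcpdump's standard "port" filter expression.

```python
import shlex


def tcpdump_command(output_file, port=None, interface="any"):
    """Build a tcpdump argument list, optionally limited to one port."""
    command = ["sudo", "tcpdump", "-i", interface, "-w", output_file]
    if port is not None:
        command += ["port", str(port)]
    return command


print(shlex.join(tcpdump_command("filename.pcap", port=80)))
# sudo tcpdump -i any -w filename.pcap port 80
```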

AWS OFFICIAL
Updated a year ago