How do I troubleshoot and fix failing health checks for Application Load Balancers?
Last updated: 2020-01-15
The targets registered to my Application Load Balancer aren't healthy. How do I find out why my targets are failing health checks?
To troubleshoot and fix failing health checks for your Application Load Balancer:
1. Check the health of your target to find the reason code and description of your issue.
2. Follow the resolution steps below for the error you received.
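As a minimal sketch of step 1, the following Python lists the reason code and description for each target that isn't healthy. It assumes the boto3 SDK is installed and credentials are configured. The function names and the sample response used to exercise the summarizer are illustrative and not part of the original article.

```python
def summarize_target_health(response):
    """Return (id, port, state, reason, description) for each target that is not healthy.

    `response` is expected to have the shape returned by the ELBv2
    DescribeTargetHealth API (a "TargetHealthDescriptions" list).
    """
    rows = []
    for item in response.get("TargetHealthDescriptions", []):
        health = item.get("TargetHealth", {})
        if health.get("State") == "healthy":
            continue  # only report targets that need attention
        target = item.get("Target", {})
        rows.append((
            target.get("Id"),
            target.get("Port"),
            health.get("State"),
            health.get("Reason"),
            health.get("Description"),
        ))
    return rows


def fetch_target_health(target_group_arn):
    """Call DescribeTargetHealth for one target group (requires AWS access)."""
    import boto3  # imported lazily so the summarizer works without the SDK

    client = boto3.client("elbv2")
    return client.describe_target_health(TargetGroupArn=target_group_arn)
```

For example, `summarize_target_health(fetch_target_health(arn))` returns one tuple per failing target, where `arn` is the ARN of your own target group.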
Description: Initial health checks in progress.
Resolution: Before a target can receive requests from the load balancer, that target must pass initial health checks. Wait for your target to pass the initial health checks, and then recheck its health status.
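Waiting and rechecking can be automated with a small polling loop. The sketch below is a generic helper, not an AWS API. `get_state` is a hypothetical callable that you supply to return the current state string of the target, and the state names "initial" and "healthy" used in the comments are assumptions about what that callable returns.

```python
import time


def wait_for_state(get_state, wanted="healthy", timeout_s=300.0, interval_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `wanted` or `timeout_s` elapses.

    Returns True if the wanted state was observed, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == wanted:
            return True
        if clock() >= deadline:
            return False  # for example, the target is still "initial" or is "unhealthy"
        sleep(interval_s)
```

The state is always checked at least once, so a `timeout_s` of 0 performs a single check.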
Description: Target registration is in progress.
Resolution: The load balancer starts routing requests to the target as soon as the registration process completes and the target passes the initial health checks.
Description: Target deregistration is in progress.
Resolution: When you deregister a target, the load balancer waits until in-flight requests are complete. This is known as the deregistration delay. By default, Elastic Load Balancing waits 300 seconds before completing the deregistration process. However, you can customize this value.
If a deregistering target has no in-flight requests and no active connections, then Elastic Load Balancing deregisters the target immediately, without waiting for the deregistration delay to elapse. The initial state of a deregistering target is draining. After the deregistration delay elapses, the deregistration process completes and the state of the target is unused. If the target is part of an Auto Scaling group, then it can be terminated and replaced.
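One way to customize the deregistration delay is through the target group attributes. The sketch below assumes the boto3 SDK, the attribute key deregistration_delay.timeout_seconds, and an allowed range of 0 to 3600 seconds. Confirm these against the current Elastic Load Balancing documentation before relying on them. The function names are illustrative.

```python
def deregistration_delay_attribute(seconds):
    """Build the Attributes list for setting the deregistration delay.

    Assumes the allowed range is 0-3600 seconds (the default is 300).
    """
    if isinstance(seconds, bool) or not isinstance(seconds, int) or not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be an integer from 0 to 3600 seconds")
    # Attribute values are passed to the API as strings.
    return [{"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)}]


def set_deregistration_delay(target_group_arn, seconds):
    """Apply the delay to a target group (requires AWS access)."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("elbv2")
    return client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=deregistration_delay_attribute(seconds),
    )
```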
Description: The load balancer received an error while establishing a connection to the target, or the target response was malformed.
Resolution:
- Verify that your application is running. Use the service command to check the status of services on Linux targets. For Windows targets, check the Services tab of the Windows Task Manager. If the service is stopped, start the service. If the service isn't recognized, verify that the required service is installed.
- Verify that the target is listening for traffic on the health check port. You can use the ss command on Linux targets to verify which ports your server is listening on. For Windows targets, you can use the netstat command.
- Verify that your application responds correctly to the load balancer's health check requests. The following example shows a typical health check request from the Application Load Balancer, to which your targets must return a valid HTTP response. The Host header value contains the private IP address of the target, followed by the health check port. The User-Agent header is set to ELB-HealthChecker/2.0. The line terminator for message-header fields is the sequence CRLF, and the header terminates at the first empty line followed by a CRLF. If necessary, add a default virtual host to your web server configuration to receive the health check requests.
GET / HTTP/1.1
Host: 10.0.0.1:80
Connection: close
User-Agent: ELB-HealthChecker/2.0
Accept-Encoding: gzip, compressed
- The Target Type of your target group determines which network interface on the targets the load balancer sends health checks to. For example, you can register instance IDs, IP addresses, and Lambda functions. If the target type is instance ID, then the load balancer sends health check requests to the primary network interface of the targets. If the target type is IP address, then the load balancer sends health check requests to the network interface associated with the corresponding IP address. If your targets have multiple interfaces attached, then verify that your application is listening on the correct network interface.
- The ELBSecurityPolicy-2016-08 security policy is used for target connections and HTTPS health checks. Verify that the target provides a server certificate and a key in the format specified in the security policy. Also verify that the target supports one or more matching ciphers and a protocol provided by the load balancer to establish the TLS handshake.
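To see how your application answers a request shaped like the health check described above, you can send one yourself from a host inside the same network. The following sketch uses only the Python standard library and plain HTTP. It does not cover HTTPS health checks. The function names and the default timeout are illustrative assumptions.

```python
import socket


def build_health_check_request(host, port, path="/"):
    """Build a request resembling the load balancer's health check, with CRLF line endings."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: close",
        "User-Agent: ELB-HealthChecker/2.0",
        "Accept-Encoding: gzip, compressed",
    ]
    # Each header line ends with CRLF; an empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_code(raw_response):
    """Extract the HTTP status code from the first line of a raw response."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {status_line!r}")
    return int(parts[1])


def send_health_check(host, port, path="/", timeout_s=5.0):
    """Send the request over plain TCP and return the status code of the reply."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(build_health_check_request(host, port, path))
        buffer = b""
        while b"\r\n" not in buffer:  # read until the status line is complete
            data = sock.recv(4096)
            if not data:
                break
            buffer += data
    return parse_status_code(buffer)
```

A ValueError from `parse_status_code` suggests that the reply isn't valid HTTP. A connection error from `send_health_check` suggests that nothing is accepting connections on that address and port.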
Description: The target is in the stopped or terminated state.
Description: The IP address can't be used as a target because it's in use by a load balancer.
Resolution: When you create a target group, you specify its Target Type. When the target type is IP, don't choose an IP address that's already in use by a load balancer.
Description: The target group isn't used by any load balancer, or the target is in an Availability Zone that isn't enabled for its load balancer.
Resolution:
- Check the target group and verify that it's configured to receive traffic from the load balancer.
- Verify that the Availability Zone of the target is enabled for the load balancer.
Description: The target isn't registered to the target group.
Resolution: Verify that the target is registered to the target group.
Description: The health checks didn't return an expected HTTP code.
Resolution:
- Success codes are the HTTP codes to use when checking for a successful response from a target. You can specify values or ranges of values between 200 and 499. The default value is 200. Check your load balancer health check configuration to verify which success codes it expects to receive. Then, inspect your web server access logs to see whether the expected success codes are being returned. Modify the success code value if necessary.
- Verify that the ping path is valid. The ping path is the destination on the targets for health checks. Be sure to specify a valid URI (/path?query). The default is /. Modify the ping path value if necessary.
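To check whether a status code from your access logs would count as a success, you can model the comparison as in the sketch below. It assumes that success codes are written as a comma-separated list of single values ("200,202") or ranges ("200-299"). The function names are illustrative.

```python
def parse_success_codes(matcher):
    """Expand a success-code string such as "200,202" or "200-299" into a set of ints."""
    allowed = set()
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(piece) for piece in part.split("-", 1))
        else:
            low = high = int(part)
        if not 200 <= low <= high <= 499:
            raise ValueError(f"success codes must lie between 200 and 499: {part!r}")
        allowed.update(range(low, high + 1))
    return allowed


def is_success(status_code, matcher="200"):
    """Return True if status_code is one of the configured success codes."""
    return status_code in parse_success_codes(matcher)
```

For instance, a target that answers the ping path with a 301 redirect fails under the default value of 200 but passes if the success codes are set to 200-399.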
Description: Request timed out.
Resolution: First, check whether you can connect to the target on the health check port. If you can connect, then the target page might not respond before the health check timeout period. Most web servers, such as nginx and IIS, let you log how long the server takes to respond. If your health check requests take longer than the configured timeout, you can:
- Choose a simpler target page for the health check.
- Adjust the health check settings.
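If server-side timing logs aren't available, you can measure the response time from another host in the same network. The sketch below uses only the Python standard library. The function name is illustrative, and the 5-second default is an assumption, so substitute your configured health check timeout. Unlike a health check, this client follows redirects, so treat the measurement as an approximation.

```python
import time
import urllib.error
import urllib.request


def time_health_check(url, timeout_s=5.0):
    """Request `url` and return (status_code_or_None, elapsed_seconds).

    The status is None if the request failed or did not complete within timeout_s.
    """
    # An empty ProxyHandler bypasses any configured proxy so the request goes
    # straight to the target.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except OSError:
        status = None  # connection failure or timeout (URLError is an OSError)
    return status, time.monotonic() - start
```

If the elapsed time is regularly close to or above the configured timeout, a simpler target page or a longer timeout is likely to help.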
If you can't connect to the target:
- Verify that the security group associated with the target allows traffic from the load balancer using the health check port and health check protocol. You can add a rule to the security group to allow all traffic from the load balancer security group. Also, the security group for your load balancer must allow traffic to the targets.
- Verify that the network ACL associated with the subnets for your target allows inbound traffic on the health check port. Verify that it also allows outbound traffic on the ephemeral ports (1024-65535).
- Verify that the network ACL associated with the subnets for your load balancer nodes allows inbound traffic on the ephemeral ports. Verify that it also allows outbound traffic on the health check and ephemeral ports.
- Verify that any OS-level firewalls on the target are allowing health check traffic in and out.
- Verify that the route table for the subnets associated with the target contains an entry that allows health check traffic back to the load balancer.
- Verify that the memory and CPU utilization of your targets are within acceptable limits. If your memory or CPU utilization is too high, add targets or increase the capacity of your Auto Scaling group. If your target is an EC2 instance, you can also upgrade the instance to a larger instance type.
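A quick TCP probe from a host in the load balancer's subnets can help narrow down which of the checks above to start with. The sketch below uses only the Python standard library. The function name and the interpretation of each outcome are heuristics, not definitive diagnoses.

```python
import socket


def probe_port(host, port, timeout_s=3.0):
    """Attempt a TCP connection and classify the outcome.

    "open":    something accepted the connection.
    "refused": the host answered but rejected the connection, which often means
               that nothing is listening on the port.
    "timeout": no answer arrived, which is typical when a security group,
               network ACL, firewall, or route drops the traffic.
    "error":   any other network failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, `probe_port("10.0.0.1", 80)` checks the address and port from the sample health check request shown earlier. Replace them with the private IP address and health check port of your own target.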