How can I troubleshoot unhealthy Route 53 health checks?

Last updated: 2020-06-17

The Amazon Route 53 health checks I created are reporting as unhealthy. How can I troubleshoot and fix the issues?

Resolution

First, determine the reason for the last health check failure in the AWS Management Console, or run the get-health-check-last-failure-reason command in the AWS Command Line Interface (AWS CLI). After you identify the failure reason and the health check type, complete the corresponding troubleshooting steps to identify and fix the issue.
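
For the AWS CLI route, a minimal sketch might look like the following. It assumes that the AWS CLI is installed and configured with permission to read Route 53 health checks. The function name last_failure_reason is a placeholder chosen for illustration, and the --query expression only trims the output to the Region, checker IP address, and status text.

```shell
# Print one row per health checker: Region, checker IP address, and the
# status text that explains the most recent failure.
last_failure_reason() {
  aws route53 get-health-check-last-failure-reason \
    --health-check-id "$1" \
    --query 'HealthCheckObservations[].[Region,IPAddress,StatusReport.Status]' \
    --output table
}

# Usage (replace the placeholder with your own health check ID):
#   last_failure_reason <health-check-id>
```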

Note: Regardless of the health check type, be sure to check the "Invert health check status" option. If this option is set to "true", then Route 53 considers the health check unhealthy when the health checkers report it as healthy, and vice versa.
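
To make the effect of inversion concrete, here is a small sketch. The first function reads the Inverted flag with the AWS CLI (assumed to be installed and configured). The second function, reported_status, is purely illustrative and not part of any AWS tool: it shows how the observed result and the flag combine into the status that Route 53 reports.

```shell
# Read the "Invert health check status" flag for a health check.
get_inverted_flag() {
  aws route53 get-health-check \
    --health-check-id "$1" \
    --query 'HealthCheck.HealthCheckConfig.Inverted' \
    --output text
}

# Illustrative helper: given what the health checkers observed ("healthy"
# or "unhealthy") and the Inverted flag, print the reported status.
reported_status() {
  observed=$1
  inverted=$2
  if [ "$inverted" = "true" ] || [ "$inverted" = "True" ]; then
    if [ "$observed" = "healthy" ]; then echo "unhealthy"; else echo "healthy"; fi
  else
    echo "$observed"
  fi
}

# Usage (replace the placeholder with your own health check ID):
#   reported_status healthy "$(get_inverted_flag <health-check-id>)"
```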

Troubleshoot a health check that monitors an endpoint

Cause: This issue is indicated by the error message "The health checker could not establish a connection within the timeout limit." The error occurs when the health checkers time out while trying to establish a connection with the configured endpoint. The maximum time allowed to establish a connection depends on the health check protocol (TCP, HTTP, or HTTPS):

  • For TCP health checks, the TCP connection between the health checkers and the endpoint must happen within ten seconds.
  • For HTTP and HTTPS health checks, the TCP connection between the health checkers and the endpoint must happen within four seconds. The endpoint must respond with a 2xx or 3xx HTTP status code within two seconds after establishing a connection. For more information, see How Amazon Route 53 determines whether a health check is healthy.

Steps:

1.    In the health check configuration, note the "Domain name" or "IP address" of the endpoint.

2.    Access the endpoint. Confirm that the firewall or server allows connections from the Route 53 public IP addresses for the Regions enabled in the health check configuration. See IP ranges and search for "service": "ROUTE53_HEALTHCHECKS". If the endpoint resources are on AWS, configure security groups and network access control lists (NACLs) to allow the IP addresses of the Route 53 health checkers.

3.    Use the following tools to test connectivity with the configured endpoint over the internet. Be sure to replace the placeholders in the commands with your respective values.

TCP test:

$ telnet <domain name / IP address> <port>

HTTP/HTTPS test:

$ curl -Ik -w "HTTPCode=%{http_code} TotalTime=%{time_total}\n" <http/https>://<domain-name/ip address>:<port>/<path> -so /dev/null

Compare the output of the preceding commands with the timeout values for the health check type. Then, confirm that your application responds within the respective time limits.

4.    If enabled, use the Latency graph option in the health check configuration to check the metrics graph for "TCP Connection Time," "Time to first byte," and "Time to complete SSL handshake." For more information, see Monitoring the latency between health checkers and your endpoint.

Note: You can't turn on the Latency graph for an existing health check. If it isn't enabled, you must create a new health check with the Latency graph option enabled.
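
Steps 2 and 3 can be scripted roughly as follows. This is a sketch under several assumptions: it uses the aws route53 get-checker-ip-ranges command as an alternative to searching the IP ranges file, bash's /dev/tcp feature and the coreutils timeout command for the TCP test, and curl's time_connect and time_starttransfer timers for the HTTP/HTTPS test. All function names are placeholders. The timings are measured from your own machine rather than from the Route 53 health checkers, and for HTTPS the interval after the connection also includes the TLS handshake, so treat the result as an approximation.

```shell
# Step 2: list the CIDR ranges used by Route 53 health checkers, one per line,
# to compare with firewall, security group, and network ACL rules.
checker_ip_ranges() {
  aws route53 get-checker-ip-ranges --query 'CheckerIpRanges[]' --output text |
    tr '\t' '\n'
}

# Step 3 (TCP): succeed only if a TCP connection opens within ten seconds.
# Arguments: host, port.
tcp_check() {
  timeout 10 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Step 3 (HTTP/HTTPS): compare measurements with the limits described above.
# Arguments: HTTP status code, seconds until connected, seconds until first
# byte (both counted from the start of the request, as curl reports them).
within_limits() {
  awk -v s="$1" -v c="$2" -v t="$3" 'BEGIN {
    ok = (c <= 4) && ((t - c) <= 2) && (s >= 200) && (s < 400)
    print (ok ? "PASS" : "FAIL")
  }'
}

# Probe a URL with curl and print PASS or FAIL.
probe_endpoint() {
  # Word splitting of curl's three-field output is intentional here.
  set -- $(curl -Ik -so /dev/null \
    -w '%{http_code} %{time_connect} %{time_starttransfer}' "$1")
  within_limits "$1" "$2" "$3"
}

# Usage (replace the placeholders with your own values):
#   tcp_check <domain-name> <port> && echo "TCP connection OK"
#   probe_endpoint <http/https>://<domain-name>:<port>/<path>
```

A FAIL from probe_endpoint only says that one of the three conditions was not met from where you ran it. Rerun the curl command from the steps above to see which value is out of range.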

Troubleshoot health checks with string match condition

Cause: This issue is indicated when the endpoint server returns "200 OK", but Route 53 marks the health check as unhealthy. Health checkers must establish a TCP connection with the endpoint within four seconds. Health checkers must then receive an HTTP status code of 2xx or 3xx in the next two seconds. Then, the configured string must appear in the first 5,120 bytes of the response body within the next two seconds. If the string isn't present in the first 5,120 bytes, Route 53 marks the health check as unhealthy.

Steps:

To verify whether the string appears entirely in the first 5,120 bytes of the response body, use the following command. Be sure to replace <search-string> with the actual string, and keep the single quotes so that the shell passes the string to grep unchanged.

$ curl -sL <http/https>://<domain-name>:<port> | head -c 5120 | grep -F -- '<search-string>'
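
The same check can be wrapped in a small function that reads the response body on standard input and reports the result through its exit status. This is a sketch: the function name string_in_first_5120 is a placeholder, and it uses grep -F so that the string is matched literally rather than as a regular expression.

```shell
# Succeed (exit status 0) only if the literal string in $1 appears entirely
# within the first 5,120 bytes read from standard input.
string_in_first_5120() {
  head -c 5120 | grep -qF -- "$1"
}

# Usage (replace the placeholders with your own values):
#   curl -sL <http/https>://<domain-name>:<port> | string_in_first_5120 'your string' \
#     && echo "string found" || echo "string missing or beyond 5,120 bytes"
```

A string that starts before byte 5,120 but ends after it is cut off by head, so the function fails in that case.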

Troubleshoot a health check that monitors a CloudWatch alarm

Cause: Route 53 monitors the metric data stream rather than the state of the Amazon CloudWatch alarm, so it doesn't wait for the alarm to go into the ALARM state. As a result, the health check status can differ from the alarm state.

Steps:

1.    If the CloudWatch alarm is in the INSUFFICIENT_DATA state, verify the configuration of the health check. When the metric data stream doesn't provide enough information to determine the state of the alarm, the health check status depends on the "InsufficientDataHealthStatus" setting. The options for this setting are "healthy", "unhealthy", or "last known status".

2.    If you've updated the configuration of the CloudWatch alarm, then the new settings do not automatically appear in the associated health check. In the Route 53 console, choose Health Checks. Select the health check, and then choose Synchronize configuration. This action synchronizes the health check configuration with the updated CloudWatch alarm's configuration.
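
For step 1, the setting can also be read and changed with the AWS CLI, as sketched below. This assumes a configured AWS CLI with permission to read and update health checks. The function names are placeholders. In the API, the three options are written as Healthy, Unhealthy, and LastKnownStatus.

```shell
# Show how the health check behaves when the alarm has insufficient data.
show_insufficient_data_status() {
  aws route53 get-health-check \
    --health-check-id "$1" \
    --query 'HealthCheck.HealthCheckConfig.InsufficientDataHealthStatus' \
    --output text
}

# Change the setting. $2 must be Healthy, Unhealthy, or LastKnownStatus.
set_insufficient_data_status() {
  case "$2" in
    Healthy|Unhealthy|LastKnownStatus) ;;
    *) echo "invalid status: $2" >&2; return 1 ;;
  esac
  aws route53 update-health-check \
    --health-check-id "$1" \
    --insufficient-data-health-status "$2"
}

# Usage (replace the placeholder with your own health check ID):
#   show_insufficient_data_status <health-check-id>
#   set_insufficient_data_status <health-check-id> LastKnownStatus
```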

