How do I troubleshoot health check failures for Amazon ECS tasks on Fargate?

Last updated: 2021-01-08

I'm getting health check failures on my Amazon Elastic Container Service (Amazon ECS) tasks on AWS Fargate.

Resolution

Troubleshoot the most common load balancer errors

If you receive either of the following errors:

  • (service AWS-service) (port 8080) is unhealthy in (target-grouparn:uxyztargetgroup/aws-targetgroup/123456789) due to (reason Health checks failed with these codes: [502]) or [request timeout]
  • (service AWS-Service) (port 8080) is unhealthy in target-group tf-20190411170 due to (reason Health checks failed)

Try these troubleshooting steps:

  • If your container is mapped to port 80, confirm that your container security group allows inbound traffic on port 80 from the load balancer.
  • Confirm that the ping port value for your load balancer health check is configured correctly. If this port isn't configured correctly, then your load balancer could deregister the container.
  • Define a health check grace period that's long enough for your application to start. This instructs the service scheduler to ignore Elastic Load Balancing health checks for a predefined time period after a task starts.
  • Monitor the CPU and memory metrics of the service. For example, high CPU can make your application unresponsive and result in a 502 error.
  • Check your application logs for application errors.
  • Confirm that the health check path is configured correctly and matches a path that your application serves.
  • If your application depends on a backend database, such as an Amazon Relational Database Service (Amazon RDS) database, make sure that your tasks can connect to it. An application that can't communicate with its database can fail health checks.
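
The port and path checks above can be reproduced by hand. The following is a minimal sketch, assuming that your health check uses HTTP and that you run it from a host that can reach the task's private IP address. The function name probe_health_check and its parameters are hypothetical and aren't part of any AWS SDK. Unlike a load balancer health check, urlopen follows redirects, so a path that redirects can look healthy here and still fail the real check.

```python
import urllib.error
import urllib.request


def probe_health_check(host, port, path="/", timeout=5.0, healthy_codes=(200,)):
    """Send one GET request to the health check port and path and report the result.

    Hypothetical helper: returns a (healthy, detail) tuple.
    """
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The target answered, but with a 4xx or 5xx status code.
        status = err.code
    except OSError as err:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False, f"no response from {url}: {err}"
    if status in healthy_codes:
        return True, f"{url} returned {status}"
    return False, f"{url} returned {status}, expected one of {list(healthy_codes)}"
```

For example, probe_health_check("10.0.1.23", 8080, "/health") checks a task at an assumed private IP address. A refused connection or a timeout points to the security group or the port mapping. An unexpected status code points to the health check path or the application itself.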

Troubleshoot 504 errors

You can receive a 504 error for any of the following reasons:

  • Your load balancer failed to establish a connection to the target before the connection timeout expired (10 seconds).
  • Your load balancer established a connection to the target, but the target didn't respond before the idle timeout period elapsed.
  • The network access control list (network ACL) for your subnet didn't allow traffic from the targets to the load balancer nodes on the ephemeral ports (1024-65535).
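
The network ACL condition in the last item can be checked offline. The following is a rough sketch, assuming that you already have the rule entries in the shape of the Entries list that the Amazon EC2 DescribeNetworkAcls API returns. The helper names rule_covers and blocked_ports are hypothetical, and the sketch ignores CIDR blocks, so it only shows whether any rule could block TCP traffic on the ephemeral ports.

```python
EPHEMERAL_PORTS = range(1024, 65536)


def rule_covers(entry, port):
    """Return True if a network ACL entry applies to TCP traffic on this port."""
    if entry["Protocol"] == "-1":  # -1 means all protocols and all ports
        return True
    if entry["Protocol"] != "6":  # 6 is the protocol number for TCP
        return False
    port_range = entry.get("PortRange")
    if port_range is None:
        return True
    return port_range["From"] <= port <= port_range["To"]


def blocked_ports(entries, egress, ports=EPHEMERAL_PORTS):
    """List the ports whose first matching rule is not an allow rule.

    Simplification (assumption): CIDR blocks are not evaluated.
    """
    rules = sorted(
        (e for e in entries if e["Egress"] == egress),
        key=lambda e: e["RuleNumber"],
    )
    blocked = []
    for port in ports:
        # Rules are evaluated in ascending rule number order, and the
        # first rule that matches decides the outcome.
        action = next(
            (r["RuleAction"] for r in rules if rule_covers(r, port)), "deny"
        )
        if action != "allow":
            blocked.append(port)
    return blocked
```

Run blocked_ports with egress=True for the network ACL of the subnet that the tasks use, and with egress=False for the network ACL of the load balancer subnet. An empty list means that no rule blocks the ephemeral port range in that direction.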

If you receive a 504 error, such as the following:

  • (service AWS-Service) (port 8080) is unhealthy in target-group due to (reason Health checks failed with these codes: [504])

Try these troubleshooting steps:

  • Confirm that the backend returns a successful response without delay.
  • Set the response timeout value correctly.
    Note: The response timeout is the amount of time that your container has to return a response to the health check ping. If this value is lower than the amount of time required for a response, then the health check fails.
  • Check the access logs of your load balancer for more information about errors.
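
The note about the response timeout comes down to a comparison between observed response times and the configured value. The following is a small sketch of that comparison, assuming that you have already collected response times in seconds, for example from your access logs. The function name check_timeout is hypothetical.

```python
def check_timeout(latencies, timeout_seconds):
    """Compare observed response times with the health check response timeout.

    A response that takes as long as the timeout or longer is counted as a
    failed health check.
    """
    too_slow = [t for t in latencies if t >= timeout_seconds]
    return {
        "slowest": max(latencies),
        "failures": len(too_slow),
        "timeout_ok": not too_slow,
    }
```

For example, check_timeout([0.8, 1.2, 6.5], timeout_seconds=5) reports one failure, because one response took longer than the timeout. In that case, either make the health check endpoint respond faster or raise the timeout on the target group.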

Troubleshoot failed container health checks

If you receive the following error, then your service isn't integrated with your load balancer, but the containers in your task are using health checks that your service can't pass:

  • (service AWS-Service) (task ff3e71a4-d7e5-428b-9232-2345657889) failed container health checks

Try the following troubleshooting steps:

  • Confirm that the health check command that you're passing to the container is correct and uses the right syntax.
  • If the task has been running for a while, check your application logs and Amazon CloudWatch Logs.
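
The command syntax check above can be partly automated. The following is a minimal sketch, assuming that the health check is written as the healthCheck object of a container definition, with a command list and optional interval, timeout, retries, and startPeriod values. The function name health_check_problems is hypothetical, and the numeric limits are my reading of the Amazon ECS task definition documentation, so verify them against the current documentation.

```python
def health_check_problems(health_check):
    """Return a list of problems found in a container healthCheck dict."""
    problems = []
    command = health_check.get("command") or []
    if not command or command[0] not in ("CMD", "CMD-SHELL"):
        # A plain string instead of a list is also caught here.
        problems.append("command must be a list that starts with CMD or CMD-SHELL")
    elif len(command) < 2:
        problems.append("command has no arguments after " + command[0])

    # Assumed limits, in seconds except for retries.
    limits = {
        "interval": (5, 300),
        "timeout": (2, 60),
        "retries": (1, 10),
        "startPeriod": (0, 300),
    }
    for name, (low, high) in limits.items():
        value = health_check.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    return problems
```

A definition such as {"command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"], "interval": 30, "timeout": 5, "retries": 3} passes these checks. The check covers syntax only. The command can still fail at run time, for example if curl isn't installed in the container image.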

Note: You can't access the underlying host because Fargate is managed by AWS. For further troubleshooting, launch your Amazon ECS tasks in Amazon EC2. Then, connect to your EC2 instances using SSH.