How do I troubleshoot issues related to blue/green deployments in Amazon ECS?

Last updated: 2022-04-13

I want to troubleshoot issues related to blue/green deployments for services hosted on Amazon Elastic Container Service (Amazon ECS).

Short description

The most common issues related to blue/green deployments for services hosted on Amazon ECS are the following:

AWS Identity and Access Management (IAM) related issues:

  • You're unable to create your ECS service because you're getting this error: Please create your Service role for CodeDeploy
  • You're getting this error: service failed to launch a task with (error ECS was unable to assume the role that was provided for this task. Please verify that the role being passed has the proper trust relationship and permissions and that your IAM user has permissions to pass this role)

Load balancer/ECS related issues:

  • Your ECS service is failing to stabilize due to health check failures.
  • You're getting this error: The ELB could not be updated due to the following error: Primary taskset target group must be behind listener
  • Traffic is still routed to the blue target group after successful deployment.
  • Your ECS tasks running in the ECS service are failing Application Load Balancer health checks only during a new green deployment.
  • Your ECS tasks are inconsistently failing Application Load Balancer health checks.
  • Your ECS Service is unable to place a task because no container instance meets all the requirements. The closest matching container instance has insufficient CPU units available.

AWS CloudFormation related issues (if you're performing blue/green deployment through CloudFormation):

  • When you create a change set that triggers a blue/green deployment, the CloudFormation stack fails with an Internal Failure error.
  • You're getting an error when creating a change set to trigger the blue/green deployment: 'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: The TaskDefinition logical Id [ ] is the same between initial and final template, CodeDeploy can't perform BlueGreen style update properly

Resolution

You are unable to create your ECS Service because you get the error: Please create your Service role for CodeDeploy:

You get this error because AWS CodeDeploy doesn't have the IAM permissions that it needs to run the blue/green deployment. You must grant the CodeDeploy service permissions to update your Amazon ECS service on your behalf.

To troubleshoot this error, verify that your CodeDeploy IAM role is created correctly and has the required permissions.

To create an IAM role for CodeDeploy, do the following:

  1. Open the IAM console.
  2. In the navigation pane, choose Roles.
  3. Choose Create role.
  4. In the Select type of trusted entity section, choose AWS service, and then choose CodeDeploy.
  5. In the Select your use case section, choose CodeDeploy - ECS, and then choose Next: Permissions.
    Note: Keep the default AWSCodeDeployRoleForECS policy. This policy includes the permissions that CodeDeploy requires to interact with Amazon ECS and other services.
  6. Choose Next: Tags.
  7. (Optional) Enter a tag name, and then choose Next: Review.
  8. For Role name, enter ecsCodeDeployRole.
  9. Choose Create role.
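
If you prefer to script the console steps above, the following is a minimal Python sketch rather than a definitive implementation. It assumes that iam_client is a boto3 IAM client (for example, boto3.client("iam")) and that the AWS managed policy ARN is arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS. Confirm the ARN in your account before you rely on it. The function names are hypothetical.

```python
import json

# Assumption: ARN of the AWS managed policy that the console attaches by default.
CODEDEPLOY_ECS_POLICY_ARN = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"


def codedeploy_trust_policy():
    """Build a trust policy that lets the CodeDeploy service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "codedeploy.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_codedeploy_role(iam_client, role_name="ecsCodeDeployRole"):
    """Create the role, attach the managed policy, and return the role ARN.

    iam_client is assumed to be a boto3 IAM client.
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(codedeploy_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=CODEDEPLOY_ECS_POLICY_ARN,
    )
    return response["Role"]["Arn"]
```

The client is passed in as a parameter so that the same function works with any credentials or Region that you configure, and so that you can exercise it with a stub before you run it against a real account.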

You are getting the error: service failed to launch a task with (error ECS was unable to assume the role that was provided for this task. Please verify that the role being passed has the proper trust relationship and permissions and that your IAM user has permissions to pass this role):

Check the IAM role that's named in the error message, and make sure that its trust policy allows the ECS tasks service principal, ecs-tasks.amazonaws.com, to assume the role. The trust relationship for your role must look similar to the following example, which trusts both Amazon Elastic Compute Cloud (Amazon EC2) and the ECS tasks service:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com",
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
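
To check a trust policy like the one above without reading it by eye, you can use a small helper such as the following Python sketch. It's an illustration under stated assumptions, not a full IAM policy evaluator: the function name trusts_service is hypothetical, and the check ignores Condition blocks and wildcard actions.

```python
import json


def _as_list(value):
    """IAM accepts either a single string or a list for Service and Action."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def trusts_service(trust_policy, service="ecs-tasks.amazonaws.com"):
    """Return True if an Allow statement lets `service` call sts:AssumeRole.

    trust_policy can be a dict or a JSON string.
    """
    if isinstance(trust_policy, str):
        trust_policy = json.loads(trust_policy)
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            # A bare string principal such as "*" names no service explicitly.
            continue
        services = _as_list(principal.get("Service"))
        actions = _as_list(statement.get("Action"))
        if service in services and "sts:AssumeRole" in actions:
            return True
    return False
```

With boto3, the trust policy is typically available as iam.get_role(RoleName=...)["Role"]["AssumeRolePolicyDocument"]. If the helper returns False for the role in the error message, then add ecs-tasks.amazonaws.com to the role's trust relationship.
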
  • Your ECS service is failing to stabilize due to health check failures: Be sure that the port mappings in your task definition match the ports of your target groups. For more information, see How can I get my Amazon ECS tasks running using the Amazon EC2 launch type to pass the Application Load Balancer health check in Amazon ECS?
  • You are getting the error: The ELB could not be updated due to the following error: Primary taskset target group must be behind listener: You get this error when your Elastic Load Balancing listeners or target groups are misconfigured. Be sure that the ELB primary listener and test listener both point to the primary target group that's currently serving your workloads.
  • Traffic is still routed to the blue target group after successful deployment: After the deployment is complete, CodeDeploy automatically updates the primary listener of your load balancer to point to the green target group. However, CodeDeploy updates only the production listener that you specified. If traffic doesn't switch after the deployment, then the listener that you specified might not be the one that receives your production traffic. Be sure that you specified the correct protocol and port for the primary ELB listener.
  • Your ECS tasks running in the ECS service are failing Application Load Balancer health checks only during a new green deployment: Check whether another ECS service is registering its tasks to the same green target group, which causes a conflict. Update the load balancer configuration so that only one ECS service or port is registered to each target group.
  • Your ECS tasks are inconsistently failing Application Load Balancer health checks: This issue can occur when your containers take longer than expected to start. Check your container application code to find the cause of the delay, and then optimize the code. If you still can't resolve the issue, then set a health check grace period on your ECS service so that the containers have enough time to start.
  • Your ECS service is unable to place a task because no container instance meets all of its requirements. The closest matching container instance has insufficient CPU units available: During a blue/green deployment, the blue and green tasks run at the same time. Be sure that your container instances have enough free resources for both sets of tasks before you start the deployment.
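
For the port mapping check in the list above, a quick comparison can flag mismatches before you deploy. The following Python sketch is illustrative only. It assumes that container_definitions has the shape of the containerDefinitions list in an ECS task definition and that load_balancers has the shape of an ECS service's loadBalancers list. The function name is hypothetical.

```python
def unmatched_load_balancer_ports(container_definitions, load_balancers):
    """Return (containerName, containerPort) pairs that the service's load
    balancer configuration references but the task definition doesn't declare.
    """
    declared = set()
    for container in container_definitions:
        for mapping in container.get("portMappings", []):
            declared.add((container["name"], mapping["containerPort"]))

    missing = []
    for load_balancer in load_balancers:
        pair = (load_balancer["containerName"], load_balancer["containerPort"])
        if pair not in declared:
            missing.append(pair)
    return missing
```

An empty result means that every container name and port that the service registers with a target group is declared in the task definition. The sketch doesn't verify the target group's own port or health check settings, so check those separately.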

Note: The following troubleshooting steps apply only if you're using CloudFormation for your blue/green deployment.

  • When you create a change set that triggers a blue/green deployment, the CloudFormation stack fails with an Internal Failure error: To mitigate this issue, create a CloudFormation service role and attach it to your CloudFormation stack. Be sure that the service role has the necessary permissions to run all stack operations. Note that you can't remove the service role from the stack after the stack is created.
  • You're getting an error when creating a change set to trigger the blue/green deployment: 'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: The TaskDefinition logical Id [ ] is the same between initial and final template, CodeDeploy can't perform BlueGreen style update properly: If you specify a test listener that's already pointing to the green target group, then the CodeDeploy hook fails with this error. Before you run the blue/green deployment, be sure that your test listener isn't already pointing to the green target group.

Important: Don't use the UpdateService API to cancel and roll back the blue/green deployment. Instead, use the CreateDeployment API. To roll back a deployment, use the CodeDeploy StopDeployment API.
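
As a rough illustration of rolling back through CodeDeploy, the following Python sketch wraps the StopDeployment call. It assumes that codedeploy_client is a boto3 CodeDeploy client (for example, boto3.client("codedeploy")) and that you already know the deployment ID. The function name roll_back_deployment is hypothetical.

```python
def roll_back_deployment(codedeploy_client, deployment_id):
    """Stop a deployment and ask CodeDeploy to roll it back.

    codedeploy_client is assumed to be a boto3 CodeDeploy client.
    Returns the (status, statusMessage) pair from the StopDeployment response.
    """
    response = codedeploy_client.stop_deployment(
        deploymentId=deployment_id,
        # Request a rollback to the previous version when the deployment stops.
        autoRollbackEnabled=True,
    )
    return response.get("status"), response.get("statusMessage")
```

The call only requests the stop and rollback. Check the deployment's status in CodeDeploy afterward to confirm that traffic is back on the original task set.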