How do I resolve network interface provision errors for Amazon ECS on Fargate?

Last updated: 2022-09-23

I want to resolve network interface provision errors for Amazon Elastic Container Service (Amazon ECS) on AWS Fargate.

Short description

You can receive the following errors when Fargate has intermittent API issues with the underlying host:

  • If the Fargate service tries to attach an elastic network interface to the underlying infrastructure that the task is meant to run on, then you can receive the following error message: "Timeout waiting for network interface provisioning to complete."
  • If your Fargate tasks can't launch because the elastic network interface wasn't created during the task provisioning state, then you can receive the following error message: "Network interface provision complete error timeout wait for network interface provision."

Note: To determine whether the issue is related to the creation of the elastic network interface, manually create a test elastic network interface in the same subnet as your Fargate task. You can also check the AWS Service Health Dashboard for API issues.
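The manual test from the preceding note can be sketched in Python. This is a minimal sketch, not a definitive implementation. It assumes that `ec2_client` is a boto3 EC2 client (for example, `boto3.client("ec2")`) and that the caller has permission to create and delete network interfaces. The function name `probe_eni_creation` and the description text are hypothetical.

```python
def probe_eni_creation(ec2_client, subnet_id):
    """Create a throwaway network interface in subnet_id, then delete it.

    ec2_client is assumed to be a boto3 EC2 client. Returns the ID of the
    interface that was created and removed. An exception from the create
    call suggests that the problem is with interface creation in the subnet.
    """
    response = ec2_client.create_network_interface(
        SubnetId=subnet_id,
        Description="Temporary interface to test provisioning for Fargate",
    )
    eni_id = response["NetworkInterface"]["NetworkInterfaceId"]
    # Delete the test interface right away so that it doesn't hold an IP address.
    ec2_client.delete_network_interface(NetworkInterfaceId=eni_id)
    return eni_id
```

If the call fails, the error from Amazon EC2 (for example, a message about insufficient free addresses in the subnet) usually points to the cause more directly than the Fargate timeout message does.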

Resolution

If the Fargate task is part of an ECS service, then the ECS Service Scheduler attempts to launch the task automatically again.

A task that's launched with the RunTask API follows an asynchronous workflow. If the workflow starts successfully, then the API returns a success code. However, the success code doesn't indicate that the task reached a RUNNING state. Tasks that are launched manually with the RunTask API require a manual reattempt.
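Because a successful RunTask response doesn't confirm a RUNNING task, a manual reattempt needs to check the task status before it retries. The following is a minimal sketch of that loop, assuming that `ecs_client` is a boto3 ECS client and that `run_task_kwargs` holds your RunTask arguments (including `cluster`). The function name, the default values, and the injectable `sleep` parameter are illustrative assumptions.

```python
import time


def run_task_until_running(ecs_client, run_task_kwargs, max_attempts=3,
                           interval_seconds=10, backoff_rate=2.0,
                           poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Call RunTask, then poll DescribeTasks until the task reports RUNNING.

    ecs_client is assumed to be a boto3 ECS client. Returns the task ARN, or
    raises RuntimeError after max_attempts launches fail to reach RUNNING.
    """
    delay = interval_seconds
    for attempt in range(1, max_attempts + 1):
        response = ecs_client.run_task(**run_task_kwargs)
        tasks = response.get("tasks", [])
        if tasks:
            task_arn = tasks[0]["taskArn"]
            for _ in range(max_polls):
                described = ecs_client.describe_tasks(
                    cluster=run_task_kwargs["cluster"], tasks=[task_arn])
                found = described.get("tasks", [])
                # A task that was just launched might not be listed yet.
                status = found[0]["lastStatus"] if found else "PENDING"
                if status == "RUNNING":
                    return task_arn
                if status == "STOPPED":
                    break  # for example, interface provisioning timed out
                sleep(poll_seconds)
        if attempt < max_attempts:
            sleep(delay)           # wait before the next launch attempt
            delay *= backoff_rate  # exponential backoff
    raise RuntimeError(
        "Task did not reach RUNNING after %d attempts" % max_attempts)
```

The `sleep` parameter exists only so that the waits can be replaced in a test. In normal use, leave the default.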

You can automate reattempts with exponential backoff and retry logic by using AWS Step Functions.

To create a Step Functions state machine that runs the ECS RunTask operation synchronously, complete the following steps:

1.    Open the Step Functions console.

2.    Choose Create State Machine.

3.    Choose Write your workflow in code.

4.    Choose Standard for Type. For more information on the different types of workflow, see Standard vs. Express Workflows.

5.    Replace the default content of the Definition section with the following code:

{
  "Comment": "Synchronous RunTask",
  "StartAt": "Run Synchronous ECS Task",
  "TimeoutSeconds": 3600,
  "States": {
    "Run Synchronous ECS Task": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "LaunchType": "FARGATE",
        "Cluster": "<ECS_CLUSTER_ARN>",
        "TaskDefinition": "<TASK_DEFINITION_ARN>",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": [
              "<SUBNET_1>",
              "<SUBNET_2>"
            ],
            "AssignPublicIp": "<ENABLED or DISABLED>"
          }
        }
      },
      "Retry": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "IntervalSeconds": 10,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ],
      "Next": "Notify Success",
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "Notify Failure"
        }
      ]
    },
    "Notify Success": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "Message": "AWS ECS Task started by Step Functions reached a RUNNING state",
        "TopicArn": "<SNS_TOPIC_ARN>"
      },
      "End": true
    },
    "Notify Failure": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "Message": "AWS ECS Task started by Step Functions failed to reach a RUNNING state",
        "TopicArn": "<SNS_TOPIC_ARN>"
      },
      "End": true
    }
  }
}
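The Retry block in the preceding definition sets IntervalSeconds to 10, MaxAttempts to 3, and BackoffRate to 2. As a rough check of the resulting schedule, this sketch computes the wait before each retry. It assumes the usual Amazon States Language behavior, where the first retry waits IntervalSeconds and each later wait is multiplied by BackoffRate. The helper name `retry_delays` is hypothetical.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Return the wait, in seconds, before each retry attempt."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(retry_delays(10, 3, 2))  # [10, 20, 40]
```

With these values, the state waits about 70 seconds in total across three retries before the Catch block routes the execution to Notify Failure. The 3600-second TimeoutSeconds value still caps the whole execution.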

6.    Choose Next.

7.    Enter a Name for your State Machine.

8.    Choose a Role to run the state machine and related resources. It's a best practice to select a role that grants least privilege and to include only the permissions that are necessary in your IAM policies.

The following code examples show least-privilege permissions:

ECS Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:RunTask"
      ],
      "Resource": [
        "arn:aws:ecs:*:123456789:task-definition/<TASK_DEFINITION>"
      ],
      "Condition": {
        "ArnLike": {
          "ecs:cluster": "arn:aws:ecs:*:123456789:cluster/<ECS_CLUSTER_NAME>"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "ecs-tasks.amazonaws.com"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:StopTask",
        "ecs:DescribeTasks"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:PutTargets",
        "events:PutRule",
        "events:DescribeRule"
      ],
      "Resource": [
        "arn:aws:events:us-east-1:123456788:rule/StepFunctionsGetEventsForECSTaskRule"
      ]
    }
  ]
}

SNS Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": [
        "arn:aws:sns:us-east-1:12345678:<TOPIC>"
      ]
    }
  ]
}

Amazon CloudWatch Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogDelivery",
        "logs:GetLogDelivery",
        "logs:UpdateLogDelivery",
        "logs:DeleteLogDelivery",
        "logs:ListLogDeliveries",
        "logs:PutResourcePolicy",
        "logs:DescribeResourcePolicies",
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    }
  ]
}

9.    Choose your Log Level. This creates the necessary Amazon CloudWatch Log Streams.

10.    Choose Create State Machine.
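After you create the state machine, you can start an execution from code to confirm that it runs. The following is a minimal sketch that assumes `sfn_client` is a boto3 Step Functions client (for example, `boto3.client("stepfunctions")`). The function name `start_run_task_workflow` and the empty default input are illustrative assumptions.

```python
import json


def start_run_task_workflow(sfn_client, state_machine_arn, payload=None):
    """Start one execution of the state machine and return the execution ARN.

    sfn_client is assumed to be a boto3 Step Functions client. The example
    definition doesn't read its input, so an empty JSON object is sent
    by default.
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload or {}),
    )
    return response["executionArn"]
```

You can pass the returned ARN to the client's `describe_execution` call to check whether the execution succeeded or failed.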

Integrating your step function with CloudWatch

1.    Open the Amazon EventBridge console.

2.    In the navigation panel, choose Events, and then choose Rules.

3.    Choose Create Rule.

4.    Choose Schedule. You can also choose Event if you want to have an event-driven response. To learn more, see Event Patterns in CloudWatch Events.

5.    Choose Add Target.

6.    Choose Step Functions state machine from the dropdown list.

7.    Choose the State Machine that you created.

8.    Choose a role with the appropriate permissions to run the State Machine.

9.    Choose Configure details and provide a name and description for your Rule.

10.    Choose Create Rule.
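The console steps above can also be sketched with the EventBridge API. This sketch assumes that `events_client` is a boto3 EventBridge client (for example, `boto3.client("events")`) and that `role_arn` identifies a role that EventBridge can assume to start the state machine. The function name, the target ID, and the example schedule expression are hypothetical.

```python
def schedule_state_machine(events_client, rule_name, schedule_expression,
                           state_machine_arn, role_arn):
    """Create a scheduled rule that starts the state machine.

    events_client is assumed to be a boto3 EventBridge client.
    schedule_expression is a rate or cron expression, such as
    "rate(5 minutes)". Returns the ARN of the rule.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )
    result = events_client.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "run-task-state-machine",  # hypothetical target ID
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
        }],
    )
    # PutTargets reports failures for individual targets in its response.
    if result.get("FailedEntryCount", 0):
        raise RuntimeError("Failed to add target: %s" % result["FailedEntries"])
    return rule["RuleArn"]
```

For an event-driven rule instead of a schedule, `put_rule` accepts an `EventPattern` argument in place of `ScheduleExpression`.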

