AWS Compute Blog

Setting up AWS PrivateLink for Amazon ECS and Amazon ECR

Amazon ECS and Amazon ECR now have support for AWS PrivateLink. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner. It keeps all the network traffic within the AWS network. When you create AWS PrivateLink endpoints for ECR and ECS, these service endpoints appear as elastic network interfaces with a private IP address in your VPC.

Before AWS PrivateLink, your Amazon EC2 instances had to use an internet gateway to download Docker images stored in ECR or to communicate with the ECS control plane. Instances in a public subnet with a public IP address used the internet gateway directly. Instances in a private subnet used a network address translation (NAT) gateway hosted in a public subnet. The NAT gateway would then use the internet gateway to talk to ECR and ECS.

Now that AWS PrivateLink support has been added, instances in both public and private subnets can use it to get private connectivity to download images from Amazon ECR. Instances can also communicate with the ECS control plane via AWS PrivateLink endpoints without needing an internet gateway or NAT gateway.

This networking architecture is considerably simpler. It enables enhanced security by allowing you to deny your private EC2 instances access to anything other than these AWS services. That’s assuming that you want to block all other outbound internet access for those instances. For this to work, you must create some AWS PrivateLink resources:

  • AWS PrivateLink endpoints for ECR. These endpoints allow instances in your VPC to communicate with ECR to download image manifests.
  • Gateway VPC endpoint for Amazon S3. This endpoint allows instances to download the image layers from the underlying private Amazon S3 buckets that host them.
  • AWS PrivateLink endpoints for ECS. These endpoints allow instances to communicate with the telemetry and agent services in the ECS control plane.

This post explains how to create these resources.

Create an AWS PrivateLink interface endpoint for ECR

ECR requires two interface endpoints:

  • com.amazonaws.region.ecr.api
  • com.amazonaws.region.ecr.dkr

In the VPC console, create the interface VPC endpoints for ECR using the endpoint creation wizard. Choose AWS services and select one of the service names listed above, substituting your AWS Region for region. Repeat the wizard for the second service name.

Next, specify the VPC and subnets to which the AWS PrivateLink interface should be added. Make sure that you select the same VPC in which your ECS cluster is running. To be on the safe side, enable the endpoint in every Availability Zone that appears in the list. Each zone shows its available subnets, and an interface endpoint can be attached to only one subnet per Availability Zone.

However, depending on your networking needs, you might also choose to enable the AWS PrivateLink endpoint only in your private subnets from each Availability Zone. In that case, instances running in a public subnet continue to communicate with ECR through the public subnet’s internet gateway.

Next, enable Private DNS Name, which is required for the com.amazonaws.region.ecr.dkr endpoint.

A private hosted zone enables you to access the resources in your VPC using the Amazon ECR default DNS domain names. You don’t need to use the private IPv4 address or the private DNS hostnames provided by Amazon VPC endpoints. The Amazon ECR DNS hostname that the AWS CLI and Amazon ECR SDKs use by default (https://api.ecr.region.amazonaws.com) resolves to your VPC endpoint.

If you enabled a private hosted zone for com.amazonaws.region.ecr.api and you are using an SDK released before January 24, 2019, you must specify the endpoint explicitly when using the SDK or the AWS CLI. For example:

aws --endpoint-url https://api.ecr.region.amazonaws.com ecr describe-repositories

If you don’t enable a private hosted zone, use the following command:

aws --endpoint-url https://VPC_Endpoint_ID.api.ecr.region.vpce.amazonaws.com ecr describe-repositories

If you enabled a private hosted zone and you are using an SDK released on January 24, 2019 or later, use the following command:

aws ecr describe-repositories
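The three cases above differ only in which endpoint URL the CLI targets. As a rough sketch, a small shell helper can build that URL. The function name ecr_api_endpoint, the example Region, and the endpoint ID are hypothetical; the URL patterns are the ones shown in the commands above.

```shell
# Sketch: build the ECR API endpoint URL for the cases described above.
# ecr_api_endpoint is a hypothetical helper, not part of the AWS CLI.
# usage: ecr_api_endpoint REGION [VPC_ENDPOINT_ID]
ecr_api_endpoint() {
  region="$1"
  vpce_id="${2:-}"
  if [ -n "$vpce_id" ]; then
    # No private hosted zone: address the VPC endpoint by its own hostname.
    printf 'https://%s.api.ecr.%s.vpce.amazonaws.com\n' "$vpce_id" "$region"
  else
    # Private hosted zone enabled: the default hostname resolves to the endpoint.
    printf 'https://api.ecr.%s.amazonaws.com\n' "$region"
  fi
}

# Example (not run here):
# aws --endpoint-url "$(ecr_api_endpoint us-east-1)" ecr describe-repositories
```

With a recent SDK or CLI and a private hosted zone, no helper is needed, because the default hostname already resolves to the VPC endpoint.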

Lastly, specify a security group for the interface itself. This security group controls which hosts can talk to the interface. It should allow inbound connections on port 443 from the instances in your cluster.

You may have a security group that is applied to all the EC2 instances in the cluster, perhaps using an Auto Scaling group. You can create a rule that allows the VPC endpoint to be accessed by any instance in that security group.
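If you prefer the AWS CLI to the console, a rule like this could be added as in the following sketch. Both security group IDs are hypothetical placeholders, and the wrapper function name is only for illustration.

```shell
# Sketch: allow HTTPS from the cluster instances to the endpoint interface.
# Both security group IDs are hypothetical placeholders.
ENDPOINT_SG="sg-0aaaaaaaaaaaaaaaa"   # attached to the VPC endpoint
INSTANCE_SG="sg-0bbbbbbbbbbbbbbbb"   # attached to the EC2 instances in the cluster

allow_cluster_to_endpoint() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$ENDPOINT_SG" \
    --protocol tcp \
    --port 443 \
    --source-group "$INSTANCE_SG"
}

# Review the IDs above, then run:
# allow_cluster_to_endpoint
```

Referencing the instance security group as the source means that instances added later by the Auto Scaling group are covered without changing the rule.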

Finally, choose Create endpoint. The new endpoint appears in the list.
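The console steps above can also be approximated with the AWS CLI. The following is a minimal sketch, not a definitive script: the Region, VPC ID, subnet IDs, and security group ID are all hypothetical placeholders, and the subnets are assumed to be in different Availability Zones.

```shell
# Sketch: create both ECR interface endpoints with the AWS CLI.
# Every ID below is a hypothetical placeholder; replace before running.
REGION="us-east-1"
VPC_ID="vpc-0123456789abcdef0"
# Assumed to be in different Availability Zones.
SUBNET_IDS="subnet-0aaaaaaaaaaaaaaaa subnet-0bbbbbbbbbbbbbbbb"
ENDPOINT_SG="sg-0aaaaaaaaaaaaaaaa"

create_ecr_endpoints() {
  for svc in ecr.api ecr.dkr; do
    # SUBNET_IDS is left unquoted on purpose so each ID becomes its own argument.
    aws ec2 create-vpc-endpoint \
      --region "$REGION" \
      --vpc-id "$VPC_ID" \
      --vpc-endpoint-type Interface \
      --service-name "com.amazonaws.${REGION}.${svc}" \
      --subnet-ids $SUBNET_IDS \
      --security-group-ids "$ENDPOINT_SG" \
      --private-dns-enabled
  done
}

# Review the values above, then run:
# create_ecr_endpoints
```

The --private-dns-enabled flag corresponds to the Private DNS Name option in the wizard.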

Add a gateway VPC endpoint for S3

The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.

S3 uses a slightly different endpoint type called a gateway. Be careful about adding an S3 gateway to your VPC if your application is actively using S3. With gateway endpoints, your application’s existing connections to S3 may be briefly interrupted while the gateway is being added. You may have a busy cluster with many active ECS deployments, each causing image layer downloads from S3. Or, your application itself may make heavy use of S3. In either case, it’s best to create a new VPC with an S3 gateway, then migrate your ECS cluster and its containers into that VPC.

To add the S3 gateway endpoint, select com.amazonaws.region.s3 from the list of AWS services and select the VPC hosting your ECS cluster. Gateway endpoints are added to the VPC route tables for the subnets. Select each route table associated with a subnet that should use the S3 gateway.

Instead of using a security group, the gateway endpoint uses an IAM policy document to limit access to the service. This policy is similar to an IAM policy but does not replace the default level of access that your applications have through their IAM role. It just further limits what portions of the service are available via the gateway.

It’s okay to just use the default Full Access policy. Any restrictions you have put on your task IAM roles or other IAM user policies still apply on top of this policy. For information about a minimal access policy, see Minimum Amazon S3 Bucket Permissions for Amazon ECR.
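If you do want to restrict the gateway, a policy along the following lines is one possible starting point. This is a sketch under an assumption: that ECR image layers are served from a bucket named prod-region-starport-layer-bucket, which is how I recall the AWS documentation describing it. Confirm the bucket name for your Region before relying on it. The function name and Sid are hypothetical.

```shell
# Sketch: print a restrictive S3 gateway endpoint policy.
# Assumption: ECR layers live in a bucket named prod-REGION-starport-layer-bucket;
# confirm this against the AWS documentation for your Region before use.
REGION="us-east-1"   # hypothetical example Region

s3_endpoint_policy() {
  cat <<EOF
{
  "Statement": [
    {
      "Sid": "AllowEcrLayerDownloads",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::prod-${REGION}-starport-layer-bucket/*"]
    }
  ]
}
EOF
}

# Example: save the policy to a file for later use.
# s3_endpoint_policy > s3-endpoint-policy.json
```

Keep in mind that a policy this narrow blocks every other S3 request that travels through the gateway, so it only suits subnets whose sole use of S3 is pulling ECR image layers.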

Choose Create to add this gateway endpoint to your VPC. When you view the route tables in your VPC subnets, you see an S3 gateway that is used whenever ECR Docker image layers are being downloaded from S3.
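For reference, the same gateway endpoint could be created from the AWS CLI roughly as follows. The Region, VPC ID, and route table IDs are hypothetical placeholders. The same caution about interrupting active S3 connections applies.

```shell
# Sketch: create the S3 gateway endpoint and attach it to route tables.
# Every ID below is a hypothetical placeholder; replace before running.
REGION="us-east-1"
VPC_ID="vpc-0123456789abcdef0"
ROUTE_TABLE_IDS="rtb-0aaaaaaaaaaaaaaaa rtb-0bbbbbbbbbbbbbbbb"

create_s3_gateway_endpoint() {
  # ROUTE_TABLE_IDS is left unquoted on purpose so each ID becomes its own argument.
  aws ec2 create-vpc-endpoint \
    --region "$REGION" \
    --vpc-id "$VPC_ID" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${REGION}.s3" \
    --route-table-ids $ROUTE_TABLE_IDS
}

# Omitting --policy-document leaves the default full-access policy in place.
# Review the values above, then run:
# create_s3_gateway_endpoint
```

Unlike the interface endpoints, a gateway endpoint takes route table IDs rather than subnets and security groups.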

Create an AWS PrivateLink interface endpoint for ECS

In addition to downloading Docker images from ECR, your EC2 instances must also communicate with the ECS control plane to receive orchestration instructions.

ECS requires three endpoints:

  • com.amazonaws.region.ecs-agent
  • com.amazonaws.region.ecs-telemetry
  • com.amazonaws.region.ecs

Create these three interface endpoints in the same way that you created the endpoints for ECR: add each endpoint, then set its subnets and security group.
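With six endpoints in total, it is easy to miss one. The following sketch lists which of the service names from this post do not yet have an endpoint in a given VPC. The Region and VPC ID are hypothetical placeholders, the function name is for illustration only, and the check does not look at endpoint state, so treat the result as a hint rather than a guarantee.

```shell
# Sketch: report which endpoint services named in this post are still
# missing from a VPC. The Region and VPC ID are hypothetical placeholders.
REGION="us-east-1"
VPC_ID="vpc-0123456789abcdef0"

missing_endpoint_services() {
  # Text output separates list items with tabs; put one service name per line.
  existing="$(aws ec2 describe-vpc-endpoints \
    --region "$REGION" \
    --filters "Name=vpc-id,Values=${VPC_ID}" \
    --query 'VpcEndpoints[].ServiceName' \
    --output text | tr '\t' '\n')"
  for svc in ecr.api ecr.dkr s3 ecs-agent ecs-telemetry ecs; do
    name="com.amazonaws.${REGION}.${svc}"
    # -x matches whole lines and -F disables regex, so "ecs" does not match "ecs-agent".
    printf '%s\n' "$existing" | grep -qxF "$name" || printf '%s\n' "$name"
  done
}

# Example (not run here); empty output means nothing is missing:
# missing_endpoint_services
```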

After the endpoints are created and added to your VPC, there is one additional step. Make sure that your ECS agent is upgraded to version 1.25.1 or higher. For more information, see the instructions for upgrading the ECS agent.
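To see which agent version an instance is running, you can query the agent locally. The sketch below makes two assumptions: that the agent’s introspection API is reachable at localhost:51678 on the instance, and that it reports its version in a string such as "Amazon ECS Agent - v1.25.1 (...)". Both helper names are hypothetical.

```shell
# Sketch: check whether the local ECS agent meets the minimum version.
# Assumptions: the agent introspection API listens on localhost:51678 and
# reports a string such as "Amazon ECS Agent - v1.25.1 (abc123)".

# usage: version_at_least REQUIRED ACTUAL (three numeric fields each)
version_at_least() {
  # Sort both versions numerically field by field and keep the lower one.
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)"
  # ACTUAL is new enough when REQUIRED is the lower (or an equal) version.
  [ "$lowest" = "$1" ]
}

# Print the bare version number reported by the agent on this instance.
agent_version() {
  curl -s http://localhost:51678/v1/metadata |
    sed -n 's/.*Agent - v\([0-9][0-9.]*\).*/\1/p'
}

# Example, run on a container instance:
# version_at_least 1.25.1 "$(agent_version)" && echo "agent is new enough"
```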

If you are already running the right version of the ECS agent, restart any ECS agents that are currently running in the VPC. The ECS agent uses a persistent WebSocket connection to the ECS backend, and adding VPC endpoints does not interrupt existing connections. Unless you restart it, the agent continues to use its existing connection instead of establishing a new one through the new endpoint.

To restart the agent with no disruption to your application containers, you can connect using SSH to each EC2 instance in the cluster and issue the following command:

sudo docker restart ecs-agent

This restarts the ECS agent without stopping any of the other application containers on the host. Alternatively, your application may be stateless and safe to stop at any time, or you may not have or want SSH access to the underlying hosts. In that case, you can just reboot each EC2 instance in the cluster, one at a time. Rebooting restarts the agent on that host, and ECS relaunches that host’s service-launched tasks on a different host.

Conclusion

In this post, I showed you how to add AWS PrivateLink endpoints to your VPC for ECS and ECR, including an S3 gateway for ECR layer downloads.

The instances in your ECS cluster can communicate directly with the ECS control plane. They should be able to download Docker images directly without needing to make any connections outside of your VPC using an internet gateway or NAT gateway. All container orchestration traffic stays inside the VPC.

If you have questions or suggestions, please comment below.