How can I activate a DataSync agent across AWS Regions or across accounts using an Amazon VPC endpoint?

Last updated: 2020-08-28

I want to use AWS DataSync to transfer data between the following locations:

  • Between on-premises and AWS
  • Between AWS Regions
  • Between AWS accounts

How can I set up my environments and DataSync agent for this scenario in a private network using an Amazon Virtual Private Cloud (Amazon VPC) endpoint?

Resolution

Important: The following configuration assumes that:

  • The resources don't connect to the public internet. Their only connection to AWS goes through the private endpoints.
  • The source of the data transfer is an on-premises or remote VPC environment with an NFS or SMB data source. The destination of the data transfer is an Amazon VPC that has access to Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), or Amazon FSx. After you complete the setup, you can reverse the transfer direction based on the supported location combinations for DataSync.

Set up the source network environment (NFS or SMB data source)

The DataSync agent runs in the source network, close to the NFS or SMB data source. For this configuration, the source network can be either on-premises or a private Amazon VPC.

Note: If you want to set up transfers between VPCs using VPC peering, review the limitations of VPC peering to confirm that the feature supports your configuration.

Set up the destination network environment (Amazon S3, Amazon EFS, or Amazon FSx)

For this configuration, the destination network must be a private Amazon VPC that can access a destination location such as Amazon S3, Amazon EFS, or Amazon FSx. Additionally, you must set up the following on the destination private VPC:

1.    Create a VPC endpoint for DataSync.

2.    Confirm that the subnet associated with the VPC endpoint has at least four IP addresses available for DataSync execution endpoints.

Note: Each DataSync task uses four IP addresses for the task execution endpoints.

3.    Configure a security group for the DataSync VPC endpoints. The security group must allow:

  • Inbound traffic on TCP port 443 to the endpoint
  • Outbound ephemeral traffic
  • Inbound traffic on TCP port range 1024-1062 to the destination VPC endpoint
  • Inbound traffic on TCP port 22, if you want to open an AWS Support channel
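
The three destination-side steps above can be sketched with the AWS CLI as follows. This is a minimal sketch, not an official script: the function names (check_subnet_capacity, create_datasync_endpoint, allow_datasync_ingress) are hypothetical, the Region, VPC, subnet, security group, and CIDR values are placeholders, and it assumes that the DataSync interface endpoint service is named com.amazonaws.<AWS region>.datasync and that the source network can be described by a single CIDR block.

```shell
#!/usr/bin/env bash
# Sketch only: every argument is a placeholder (Region, IDs, source CIDR).

# Step 2: return non-zero unless the subnet has at least 4 free IP addresses,
# one for each DataSync task execution endpoint.
check_subnet_capacity() {
  local region="$1" subnet_id="$2" free
  free=$(aws ec2 describe-subnets --region "$region" --subnet-ids "$subnet_id" \
    --query 'Subnets[0].AvailableIpAddressCount' --output text) || return 1
  case "$free" in
    ''|*[!0-9]*) echo "unexpected value: $free" >&2; return 1 ;;
  esac
  if [ "$free" -lt 4 ]; then
    echo "subnet $subnet_id has only $free free IP addresses" >&2
    return 1
  fi
}

# Step 1: create an interface endpoint for DataSync and print its ID.
# Assumption: the service name follows com.amazonaws.<region>.datasync.
create_datasync_endpoint() {
  local region="$1" vpc_id="$2" subnet_id="$3" sg_id="$4"
  aws ec2 create-vpc-endpoint --region "$region" --vpc-id "$vpc_id" \
    --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${region}.datasync" \
    --subnet-ids "$subnet_id" --security-group-ids "$sg_id" \
    --query 'VpcEndpoint.VpcEndpointId' --output text
}

# Step 3: add the inbound rules listed above, scoped to the source network's CIDR.
# A newly created security group already allows all outbound traffic,
# which covers the outbound ephemeral traffic requirement.
allow_datasync_ingress() {
  local region="$1" sg_id="$2" source_cidr="$3" port
  # 443 and 1024-1062 are required; 22 is only for an AWS Support channel.
  for port in 443 1024-1062 22; do
    aws ec2 authorize-security-group-ingress --region "$region" \
      --group-id "$sg_id" --protocol tcp --port "$port" \
      --cidr "$source_cidr" || return 1
  done
}

# Example:
#   check_subnet_capacity us-east-1 subnet-0123456789abcdef0 &&
#   allow_datasync_ingress us-east-1 sg-0123456789abcdef0 10.10.0.0/16 &&
#   create_datasync_endpoint us-east-1 vpc-0123456789abcdef0 \
#     subnet-0123456789abcdef0 sg-0123456789abcdef0
```

Scoping each rule to the source CIDR, rather than 0.0.0.0/0, keeps the endpoint reachable only from the network that hosts the agent. Drop 22 from the port list if you don't need a Support channel.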

Set up the network connection between the source and destination environments

For this configuration, the data transfer can be from an on-premises source environment to a destination private VPC, or between private VPCs that are in different AWS Regions or belong to different AWS accounts. You must set up the following connection and network requirements between the source and destination environments:

1.    Set up an active network connection between the source environment and the destination VPC endpoint. For example, you can set up this connection using AWS Direct Connect, VPC peering, or a transit VPC.

2.    Confirm that the private network address spaces (CIDR blocks) of the source and destination environments don't overlap.

3.    Confirm that the route tables for both the source subnet and the destination subnet route traffic between the two networks. For example, if you're using VPC peering, add routes for the peering connection to both route tables.

4.    If there's a firewall between the source and destination networks, you must allow the following:

  • Traffic on TCP port 443 to the destination VPC endpoint
  • Traffic on TCP port range 1024-1062 to the destination VPC endpoint
  • Traffic on TCP port 22, if you want to open an AWS Support channel

5.    Confirm that all security groups and firewalls allow outbound traffic on ephemeral ports, or that they use connection tracking so that return traffic is allowed.
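
For step 2, you can compare the CIDR blocks programmatically instead of by eye. The following is a minimal bash sketch that handles IPv4 only; ip_to_int and cidrs_overlap are hypothetical helpers, not part of any AWS tool. Two CIDR blocks overlap exactly when their network bits match under the shorter of the two prefix lengths, because one block then contains the other.

```shell
#!/usr/bin/env bash
# Sketch: IPv4-only CIDR overlap check for step 2 above.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if the two CIDR blocks share at least one address, 1 otherwise.
cidrs_overlap() {
  local net1="${1%/*}" len1="${1#*/}" net2="${2%/*}" len2="${2#*/}"
  # Compare both networks under the shorter (broader) prefix.
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$net1") & mask) == ($(ip_to_int "$net2") & mask) ))
}

# Example: compare the on-premises network with the destination VPC.
#   cidrs_overlap 10.0.0.0/16 10.0.128.0/20 && echo "overlap: choose another range"
```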

Set up the machine that you'll use to activate the DataSync agent

You can use a physical computer, a virtual machine, or an Amazon Elastic Compute Cloud (Amazon EC2) instance to activate the DataSync agent. You must set up the following on the machine:

1.    Connect the machine to one of the private networks in the source or destination environment, and configure valid network routes so that it can reach both networks.

2.    If there's no internet connection, you must set up network access to the DataSync agent on TCP port 80 (HTTP).

3.    Install cURL, which you use to get the activation key.

4.    Install the AWS Command Line Interface (AWS CLI) to activate the DataSync agent.

5.    Configure the AWS CLI with AWS Identity and Access Management (IAM) credentials whose policy allows you to activate the DataSync agent, similar to the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "datasync:*"
      ],
      "Resource": "arn:aws:datasync:<AWS region>:<AWS Account ID>:*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": [
        "ec2:*VpcEndpoint*",
        "ec2:*subnet*",
        "ec2:*SecurityGroup*"
      ],
      "Resource": "*"
    }
  ]
}

Note: If you're using an Amazon EC2 instance to activate the agent, you can instead attach an IAM role with these permissions to the instance through an instance profile.
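
Before you activate the agent, you can confirm the machine's setup from a shell. This is a minimal sketch with hypothetical function names. It assumes bash, checks only that curl and the AWS CLI are on the PATH, and uses aws sts get-caller-identity to show which IAM identity the CLI's credentials resolve to; it doesn't verify the DataSync or EC2 permissions themselves.

```shell
#!/usr/bin/env bash
# Sketch: pre-activation checks for the machine described above.

# Confirm that curl and the AWS CLI are installed, then print the ARN
# of the IAM identity that the AWS CLI uses.
check_activation_prereqs() {
  local tool missing=0
  for tool in curl aws; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  aws sts get-caller-identity --query Arn --output text
}

# Return 0 if the agent answers HTTP on TCP port 80 within 5 seconds.
# Any HTTP response counts; only connection failures or timeouts return non-zero.
agent_http_reachable() {
  local agent_ip="$1"
  curl --silent --output /dev/null --max-time 5 "http://${agent_ip}/"
}

# Example (192.0.2.10 is a placeholder agent IP):
#   check_activation_prereqs && agent_http_reachable 192.0.2.10
```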

Activate the DataSync agent

1.    Deploy the DataSync agent on a virtual machine (on-premises) or on an EC2 instance (private VPC).

2.    From the machine that you set up in the previous steps, get the DataSync agent's activation key by running the following cURL command:

curl -vvv -G \
  --data-urlencode "activationRegion=<AWS region>" \
  --data-urlencode "gatewayType=SYNC" \
  --data-urlencode "endpointType=PRIVATE_LINK" \
  --data-urlencode "privateLinkEndpoint=<VPC endpoint IP address>" \
  --data-urlencode "redirect_to=https://<AWS region>.console.aws.amazon.com/datasync/home?region=<AWS region>#/agents/create" \
  "http://<DataSync Agent IP>"

Note: You can optionally include --data-urlencode "no_redirect" to simplify and shorten the command and output.

3.    Note the activation key from the command output.

4.    Using the AWS CLI, run the describe-vpc-endpoints command to get the destination VPC endpoint ID:

aws ec2 describe-vpc-endpoints --region <AWS region>

5.    Note the "VpcEndpointId" from the command output, similar to the following:

"VpcEndpointId": "vpce-0ba3xxxxx3752b63"

6.    Using the AWS CLI, run the describe-security-groups command to get the security group ID of the destination VPC. This is the security group that the DataSync execution endpoints will use to connect to the DataSync VPC endpoint.

Note: We recommend that you use the same security group as the VPC endpoint to reduce the complexity of the configuration.

aws ec2 describe-security-groups --region <AWS region>

7.    Note the "GroupId" from the command output, similar to the following:

"GroupId": "sg-000e8edxxxx4e4701"

8.    Using the AWS CLI, run the describe-subnets command to get the ARN of the subnet that's associated with the VPC endpoint:

Note: It's a best practice that you use the same subnet as the VPC endpoint to reduce the complexity of the configuration.

aws ec2 describe-subnets --region <AWS region>

9.    Note the "SubnetArn" from the command output, similar to the following:

"SubnetArn": "arn:aws:ec2:<AWS region>:<AWS Account ID>:subnet/subnet-03dc4xxxx6905bb76"

10.    Using the AWS CLI, run the create-agent command to activate the DataSync agent:

  • For --activation-key, enter the activation key that you got in step 3.
  • For --vpc-endpoint-id, enter the "VpcEndpointId" that you got in step 5.
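  • For --security-group-arns, enter the ARN of the security group whose GroupId you got in step 7, in the format arn:aws:ec2:<AWS region>:<AWS Account ID>:security-group/<GroupId>.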
  • For --subnet-arns, enter the SubnetArn that you got in step 9.

aws datasync create-agent \
  --agent-name <Agent Name> \
  --vpc-endpoint-id vpce-0cxxxxxxxxxxxxf57 \
  --activation-key UxxxQ-0xxxB-LxxxL-AUxxV-JxxxN \
  --subnet-arns arn:aws:ec2:<AWS region>:<AWS Account ID>:subnet/subnet-0cxxxxxxxxxxxx3 \
  --security-group-arns arn:aws:ec2:<AWS region>:<AWS Account ID>:security-group/sg-xxxxxxxxxxxxxx \
  --region <AWS region>

11.    The command returns the DataSync agent's Amazon Resource Name (ARN):

{
    "AgentArn": "arn:aws:datasync:<AWS region>:<AWS Account ID>:agent/agent-0bxxxxxxxxxxxxxx57c"
}

12.    Run the list-agents command to confirm that you created the agent successfully:

aws datasync list-agents --region <AWS region>

13.    Confirm that your DataSync agent's ARN is returned in the output:

{
    "Agents": [
        {
            "AgentArn": "arn:aws:datasync:<AWS region>:<AWS Account ID>:agent/agent-0bxxxxxxxxxxxxxx57c",
            "Status": "ONLINE",
            "Name": "<Agent Name>"
        }
    ]
}
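
Steps 2 through 12 can be combined into one script. The sketch below is illustrative, not an official tool: the function names are hypothetical, the arguments are placeholders, and it assumes that (a) including no_redirect makes the agent return the activation key as the response body, (b) there's exactly one DataSync interface endpoint in the Region, (c) you use the endpoint's own subnet and first security group, as the notes in steps 6 and 8 recommend, and (d) the Region is in the standard aws partition.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-12. Arguments are placeholders; function names are hypothetical.

# Step 2: request a private activation key from the agent over HTTP (TCP 80).
# Assumption: with "no_redirect", the agent prints the key as the response body.
get_activation_key() {
  local region="$1" endpoint_ip="$2" agent_ip="$3"
  curl --silent --show-error --max-time 30 -G \
    --data-urlencode "activationRegion=${region}" \
    --data-urlencode "gatewayType=SYNC" \
    --data-urlencode "endpointType=PRIVATE_LINK" \
    --data-urlencode "privateLinkEndpoint=${endpoint_ip}" \
    --data-urlencode "no_redirect" \
    "http://${agent_ip}/"
}

# Steps 4-9: print the endpoint ID, its first subnet ID, and its first
# security group ID, tab-separated. Assumes one DataSync endpoint in the Region.
lookup_datasync_endpoint() {
  local region="$1"
  aws ec2 describe-vpc-endpoints --region "$region" \
    --filters "Name=service-name,Values=com.amazonaws.${region}.datasync" \
    --query 'VpcEndpoints[0].[VpcEndpointId,SubnetIds[0],Groups[0].GroupId]' \
    --output text
}

# Step 10: build the subnet and security group ARNs, activate the agent,
# and print the agent ARN.
activate_agent_end_to_end() {
  local region="$1" agent_name="$2" agent_ip="$3" endpoint_ip="$4"
  local vpce_id subnet_id sg_id account key
  read -r vpce_id subnet_id sg_id < <(lookup_datasync_endpoint "$region") || return 1
  account=$(aws sts get-caller-identity --query Account --output text) || return 1
  key=$(get_activation_key "$region" "$endpoint_ip" "$agent_ip") || return 1
  aws datasync create-agent --region "$region" --agent-name "$agent_name" \
    --activation-key "$key" --vpc-endpoint-id "$vpce_id" \
    --subnet-arns "arn:aws:ec2:${region}:${account}:subnet/${subnet_id}" \
    --security-group-arns "arn:aws:ec2:${region}:${account}:security-group/${sg_id}" \
    --query AgentArn --output text
}

# Steps 12-13: print the agent's status (ONLINE when activation succeeded).
agent_status() {
  aws datasync describe-agent --region "$1" --agent-arn "$2" \
    --query Status --output text
}

# Example:
#   arn=$(activate_agent_end_to_end us-east-1 my-agent 192.0.2.10 10.0.1.25)
#   agent_status us-east-1 "$arn"
```

Using --query with --output text avoids copying IDs by hand between steps, which is where the subnet and security group ARN mistakes described in the troubleshooting section tend to creep in.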

After your DataSync agent is activated, you can use the DataSync console to create locations and tasks for your transfers.
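
If you prefer the AWS CLI to the console, the following is a minimal sketch for one supported combination: an NFS source and an Amazon S3 destination. The function name is hypothetical and every argument is a placeholder. It assumes that you already have an IAM role that DataSync can assume to access the bucket, and it creates the task with default options.

```shell
#!/usr/bin/env bash
# Sketch: NFS source -> Amazon S3 destination using the activated agent.
# Prints the new task's ARN. All arguments are placeholders.
create_nfs_to_s3_task() {
  local region="$1" agent_arn="$2" nfs_host="$3" nfs_export="$4"
  local bucket_arn="$5" role_arn="$6" src dst
  # Source location: the NFS export, reached through the agent.
  src=$(aws datasync create-location-nfs --region "$region" \
    --server-hostname "$nfs_host" --subdirectory "$nfs_export" \
    --on-prem-config "AgentArns=${agent_arn}" \
    --query LocationArn --output text) || return 1
  # Destination location: the bucket, accessed through the given role.
  dst=$(aws datasync create-location-s3 --region "$region" \
    --s3-bucket-arn "$bucket_arn" \
    --s3-config "BucketAccessRoleArn=${role_arn}" \
    --query LocationArn --output text) || return 1
  aws datasync create-task --region "$region" \
    --source-location-arn "$src" --destination-location-arn "$dst" \
    --query TaskArn --output text
}

# Example:
#   create_nfs_to_s3_task us-east-1 "$agent_arn" 10.10.1.20 /exports/data \
#     arn:aws:s3:::amzn-s3-demo-bucket arn:aws:iam::111122223333:role/DataSyncS3Access
```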

Troubleshoot errors during DataSync agent activation

1.    The cURL command returns "errorType=PRIVATE_LINK_ENDPOINT_UNREACHABLE" and doesn't return the activation key

This error typically occurs when traffic on TCP port 443 to the VPC endpoint isn't allowed, for example by the endpoint's security group or by a firewall between the networks.

2.    "An error occurred (InvalidRequestException) when calling the CreateAgent operation: Private link configuration is invalid: VPC Endpoint Id should remain unspecified for public-endpoint activation keys"

This error typically occurs when you enter a public activation key for the --activation-key parameter in the create-agent command. In this configuration, you must enter the private activation key, which you get by running the cURL command with endpointType=PRIVATE_LINK.

3.    "An error occurred (InvalidRequestException) when calling the CreateAgent operation: Invalid EC2 subnet, ARN: arn:aws:ec2:<AWS region>:<AWS Account ID>:subnet/subnet-41xxxx08, reason: invalid subnet, StatusCode: 403"

-or-

"An error occurred (InvalidRequestException) when calling the CreateAgent operation: Invalid EC2 security group, ARN: arn:aws:ec2:<AWS region>:<AWS Account ID>:security-group/sg-000e8xxxx9d4e4701, reason: invalid security group, StatusCode: 403"

-or-

"An error occurred (InvalidRequestException) when calling the CreateAgent operation: Private link configuration is invalid: VPC endpoint vpce-0ba34edxxxx752b63 is not valid"

These errors typically occur when the IAM identity configured on your AWS CLI has insufficient permissions. Confirm that your IAM identity's policy grants the EC2 permissions for VPC endpoints, subnets, and security groups that are shown in the example policy in the "Set up the machine that you'll use to activate the DataSync agent" section.
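
To narrow down the first and third errors, you can test the network path and the IAM policy directly. This is a minimal sketch with hypothetical function names. endpoint_port_open relies on bash's /dev/tcp redirection and the coreutils timeout command, and you run it from a host in the same network as the agent. simulate_activation_permissions uses the IAM policy simulator; the EC2 action names it checks are examples chosen to match the resources named in the error messages, so adjust them to match your policy.

```shell
#!/usr/bin/env bash
# Sketch: checks for the activation errors above. Names are hypothetical.

# Error 1: return 0 if a TCP connection to the endpoint IP and port
# (default 443) succeeds within 5 seconds.
endpoint_port_open() {
  local ip="$1" port="${2:-443}"
  timeout 5 bash -c "exec 3<>/dev/tcp/${ip}/${port}" 2>/dev/null
}

# Error 3: print each action with IAM's decision (allowed, implicitDeny,
# or explicitDeny) for the given user or role ARN.
simulate_activation_permissions() {
  local principal_arn="$1"
  aws iam simulate-principal-policy --policy-source-arn "$principal_arn" \
    --action-names ec2:DescribeVpcEndpoints ec2:DescribeSubnets \
      ec2:DescribeSecurityGroups \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' --output text
}

# Example (placeholder values):
#   endpoint_port_open 10.0.1.25 || echo "TCP 443 to the endpoint is blocked"
#   simulate_activation_permissions arn:aws:iam::111122223333:user/datasync-admin
```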