AWS Storage Blog

Replicate objects using AWS DataSync with Amazon S3 compatible storage on Snowball Edge

Users have successfully leveraged the Amazon Web Services (AWS) Snow Family to transfer petabytes of data between on-premises locations and AWS Regions since its launch in 2015 with the AWS Snowball device. Increasingly, users are not just migrating data with the AWS Snow Family but are also using AWS Snowball Edge Compute Optimized devices to host applications requiring data processing in locations with denied, disrupted, intermittent, and limited (DDIL) network connectivity. Although data processing at the edge enables faster insights, users often store edge-captured data in enterprise data lakes, with archival for long-term retention. The simplest way to transfer data back to an AWS Region has been to return the Snow device as part of an import job process. However, a single import job is a one-time solution for moving data from on premises to AWS, and it is subject to additional time delays related to shipment return, inbound processing, and data ingestion.

Import jobs are a great choice for large, offline data backups to the cloud related to business continuity planning and legacy data migration, where network connectivity is limited or intermittent. However, a single import job won’t meet real-time or near-real-time data pipeline requirements for machine learning (ML) model retraining or enterprise business intelligence reporting use cases where ongoing, selective data transfers may be needed.

With the 2023 launches of Amazon S3 compatible storage on AWS Snowball Edge Compute Optimized devices and AWS DataSync for Amazon S3 compatible storage on Snow, processing and storing data across the cloud continuum while meeting data lifecycle requirements is possible.

In this post, we walk through loading an AWS DataSync agent onto an AWS Snowball Edge device as an Amazon Elastic Compute Cloud (Amazon EC2)-compatible compute instance. We also configure DataSync to compare and move Amazon Simple Storage Service (Amazon S3) objects between an AWS Snowball Edge S3 compatible storage bucket and an S3 bucket in an AWS Region. This method has built-in retry and network resiliency mechanisms for operating over an intermittent network connection. Furthermore, DataSync tasks support defining maximum bandwidth usage when operating with limited network connectivity; both intermittent and limited connectivity are common in remote or adverse environments.

Solution overview

In our scenario, we have a camera storing high-resolution, uncompressed video to an S3 bucket on a Snowball Edge device, where an artificial intelligence (AI) application running on an Amazon EC2-compatible compute instance on the Snowball Edge processes the video to identify objects of interest. After analysis, the raw video must be downsampled and archived off-site for potential human review.

The solution includes the following AWS services and features:

  • AWS Snowball Edge Compute Optimized device operating on premises, issued from a local compute and storage-only job
    • Amazon S3 compatible storage on Snow with two buckets for data and Amazon Machine Images (AMI)
    • Amazon EC2 DataSync agent configured to replicate objects to an S3 bucket in an AWS Region
  • S3 bucket in an AWS Region
  • DataSync


Figure 1: Snowball Edge S3 DataSync Architecture

Prerequisites

The following prerequisites are necessary to follow along with this post:

  • An AWS account with administrative access to at least one AWS Region (we use the us-east-1 Region in our example). Create a standalone AWS account.
  • An AWS Snowball Edge with Amazon S3 compatible storage on Snow on-premises, access to the unlock code and manifest file, and internet connectivity. See Getting started with Snowball Edge for more information.
  • Facilities support for the Snowball Edge and a workstation: power, network cabling, and networking devices to inter-connect the Snow device, your workstation, and the internet.
  • A Windows or Mac workstation with an additional 90 gigabytes (GB) of free storage and network connectivity to the AWS Snowball Edge, as well as the software used in this walkthrough installed: the Snowball Edge client (CLI), the AWS Command Line Interface (AWS CLI), and Terraform.

Walkthrough

To replicate objects from Amazon S3 compatible storage on Snow Family devices to an S3 bucket, complete the following steps:

  1. Start the Amazon S3 compatible storage service on your AWS Snowball Edge device.
  2. Configure the AWS CLI to interact with the S3 Control (bucket) and S3 API (object) endpoints.
  3. Create two Amazon S3 compatible storage buckets on your AWS Snowball Edge device.
  4. Create an AWS Identity and Access Management (IAM) Role and Policy to enable the VM Import/Export (VMIE) service on your AWS Snowball Edge device.
  5. Create a DataSync AMI from an EC2 instance and export it locally.
  6. Import and start the DataSync agent as an Amazon EC2-compatible compute instance on the AWS Snowball Edge device.
  7. Deploy Infrastructure-as-Code (IaC) to activate the agent and create the DataSync task and locations to replicate from the AWS Snowball Edge device to an Amazon S3 Regional bucket on a user-defined schedule.
  8. Validate the DataSync replication task and execution.

Step 1: Start the Amazon S3 compatible storage service on your AWS Snowball Edge device

After you have ordered, received, installed, and established network connectivity to the Snowball Edge device, the next steps are to connect to the device, then configure and start the Amazon S3 compatible storage service. On the device, this is called the S3-snow service, and it is referenced as such in the following configuration steps.

Note that we use the Snowball Edge (SBE) CLI to unlock and start the S3-snow service. Alternatively, you can use AWS OpsHub for Snow Family for a GUI-driven workflow.

  1. Download a copy of the manifest from the AWS Snow Family Management Console. Then, write down the unlock code that appears when you download your manifest. We recommend you keep the unlock code and manifest in separate locations to prevent unauthorized access to the AWS Snowball Edge device while it is at your facility.
  2. Using the Snowball Edge CLI on your workstation, configure a profile “snowsbe” with the Snowball Edge credentials:
snowballEdge configure --profile snowsbe

You are asked to input the path of the manifest file, the unlock code, and the Snowball Edge endpoint in the format: https://<IP ADDRESS>.

  3. Unlock the Snowball Edge device by using this command:
snowballEdge unlock-device --profile snowsbe
  4. Validate that you have unlocked the device by running the following command:
snowballEdge describe-device --profile snowsbe

Once the device has been unlocked, you must set up the Amazon S3 compatible storage on Snow service. The service needs two Virtual Network Interfaces (VNIs): one for the S3 API (object) endpoint and another for the S3 Control (bucket) endpoint. Later, you set a Terraform variable value that points to the host name or Internet Protocol (IP) address of the S3 endpoint that the DataSync agent interacts with to execute the task of synchronizing the buckets between the edge device and the AWS Region. The simplest way to differentiate between the two endpoints is to determine whether you are performing actions on a bucket or an object: the S3 Control endpoint is for bucket operations, while the S3 API endpoint is for object operations.
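For example, creating a bucket (as in Step 3) is a bucket operation against the S3 Control endpoint, while uploading an object (as in Step 8) is an object operation against the S3 API endpoint. The following pair of commands, using the same placeholder convention and AWS CLI profile as the rest of this post, illustrates the split:

aws s3control create-bucket --bucket <bucket-name> --profile snowsbe --endpoint-url https://<S3_CONTROL_IP>

aws s3 cp <local-file> s3://<bucket-name>/<object-key> --profile snowsbe --endpoint-url https://<S3_OBJECT_IP>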

  5. Retrieve the physical network interface ID of the device by running the following command:
snowballEdge describe-device --profile snowsbe
  6. Identify two available IP addresses on the same subnet as your Snowball Edge device. Run the following command to create a VNI and retrieve the “VirtualNetworkingInterfaceArn” value. You must run this twice, once for each Amazon S3 endpoint (enter each IP address separately):
snowballEdge create-virtual-network-interface --ip-address-assignment static --physical-network-interface-id "<PHYSICAL_INT_ID>" --static-ip-address-configuration IpAddress=<IP_ADDRESS>,NetMask=<NETMASK> --profile snowsbe
  7. Start the S3-snow service, specifying the VNI Amazon Resource Names (ARNs) for the IP addresses created previously (the order in which they’re specified dictates whether they are a control or object endpoint):
snowballEdge start-service --service-id s3-snow --device-ip-addresses <SNOWBALL_IP> --virtual-network-interface-arns <S3_CONTROL_VNI_ARN> <S3_OBJECT_VNI_ARN> --profile snowsbe

Optional: Validate that the service is Active by running the following command:

snowballEdge describe-service --service-id s3-snow --profile snowsbe

Step 2: Configure the AWS CLI to interact with the S3 Control (bucket) and S3 API (object) endpoints

Now that we have the Amazon S3 compatible storage on Snow service enabled, we must configure the AWS CLI credentials to use the S3 Control and S3 API commands.

1. Run the following command to retrieve the access key from the Snowball Edge through the Snowball Edge CLI:

snowballEdge list-access-keys --profile snowsbe

2. Use the preceding access key to retrieve the paired secret access key:

snowballEdge get-secret-access-key --access-key-id <ACCESS_KEY_ID> --profile snowsbe

3. List the Snowball Edge certificates and their respective ARNs:

snowballEdge list-certificates --profile snowsbe

4. Use the preceding certificate ARN value to download the Snowball Edge certificate, and save the output to a local file with a “.pem” extension:

snowballEdge get-certificate --certificate-arn <CERT_ARN> --profile snowsbe > ~/.aws/snowsbe_cert.pem

5. Set the permissions of the .pem file to read-only and inaccessible to other users on the system:

a. On Linux or Mac: chmod 400 ~/.aws/snowsbe_cert.pem
b. On Windows: right-click the file > Properties > select the Read-only attribute (see the icacls sketch that follows)
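On Windows, the Read-only attribute alone does not block access by other users. As a hedged sketch, icacls can strip inherited permissions and grant only your user read access (run in Command Prompt; adjust the path if you saved the certificate elsewhere):

icacls %USERPROFILE%\.aws\snowsbe_cert.pem /inheritance:r /grant:r %USERNAME%:R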

6. Edit the ~/.aws/config file, and create a profile for this Snowball Edge device:

Figure 2: AWS CLI profile configuration for the Snowball Edge device
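The file contents are not reproduced here; the following is a minimal sketch of what the profile can look like, assuming the certificate path from Step 2.4 and the keys retrieved in Steps 2.1 and 2.2 (the Region value for Amazon S3 compatible storage on Snow is snow):

# ~/.aws/config
[profile snowsbe]
region = snow
ca_bundle = ~/.aws/snowsbe_cert.pem

# ~/.aws/credentials
[snowsbe]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>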

Step 3: Create two Amazon S3 compatible storage buckets on your AWS Snowball Edge device

After setting up the AWS CLI profile for the Snowball Edge device, you can access the Amazon S3 Control API through the AWS CLI to create an S3 bucket.

  1. Create a bucket on the Snowball Edge to store the downsampled videos:
aws s3control create-bucket --bucket <downsampled-bucket-video> --profile snowsbe --endpoint-url https://<S3_CONTROL_IP>
  2. Next, create a bucket to store your DataSync agent as an AMI:
aws s3control create-bucket --bucket <ami-bucket> --profile snowsbe --endpoint-url https://<S3_CONTROL_IP>
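Optional: validate that both buckets exist. A hedged check against the S3 Control endpoint (list-regional-buckets requires your AWS account ID):

aws s3control list-regional-buckets --account-id <ACCOUNT_ID> --profile snowsbe --endpoint-url https://<S3_CONTROL_IP>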

Step 4: Create an IAM Role and Policy to enable VMIE service on your AWS Snowball Edge Device

To import the DataSync agent as a Snow EC2-compatible compute instance, we use the VMIE service, which performs the import on our behalf. The Snow VMIE service needs an IAM Role and Policy that it assumes to obtain the permissions needed for the import process.

In our example, we use the CLI to create an IAM Policy and Role, attach the policy to the role, and define a trust policy that allows the Snowball VMIE service to assume the role. For a walkthrough of configuring policies using the OpsHub GUI, review Step 3 in this Amazon Storage post.

  1. Download the IAM trust policy file locally and name it “trust-policy.json”:
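If the linked file is unavailable, the standard VMIE trust relationship looks like the following (a sketch; verify it against the downloaded trust-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "vmie.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}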
  2. Create the IAM Role with the referenced trust-policy.json:
aws iam create-role --role-name vmimport --assume-role-policy-document file://trust-policy.json --profile snowsbe --endpoint https://<SNOWBALL_IP>:6089
  3. Download the IAM Policy file locally and name it “iam-policy.json”. Edit the file to reflect your AWS Account ID, Snow Job ID, and the <ami-bucket> created in Step 3.2.
  4. Create the IAM Policy on the Snowball Edge using the iam-policy.json file:
aws iam create-policy --policy-name vmimport-resource-policy --policy-document file://iam-policy.json --profile snowsbe --endpoint https://<SNOWBALL_IP>:6089
  5. Attach the vmimport-resource-policy IAM Policy to the vmimport IAM Role by using the IAM Policy ARN from the previous step:
aws iam attach-role-policy --role-name vmimport --policy-arn arn:aws:iam::<ACCOUNT-ID>:policy/vmimport-resource-policy --profile snowsbe --endpoint https://<SNOWBALL_IP>:6089

Step 5: Create a DataSync AMI from an EC2 instance and export it locally

The first step in setting up DataSync is to deploy a DataSync agent. When choosing where to deploy the DataSync agent, place it as close as possible to the Snowball Edge to reduce latency and use DataSync’s in-line compression, lowering transfer times and network transit costs. In our example, we placed the DataSync agent on the Snowball Edge directly to use the localized compute capability while lowering latency to the Amazon S3 compatible storage service.

Create a private DataSync agent AMI in the AWS Region by deploying the latest DataSync agent on an EC2 instance. This method allows local console access to the DataSync agent over SSH when it runs on the AWS Snowball Edge. Use the following steps:

1. Prepare to export the DataSync agent image to Amazon S3 by completing the necessary regional VMIE service precursors, similar to guidance in Step 4 for the VMIE service on Snowball Edge:

a. Create an S3 bucket for storage of your image in the same AWS Region as the DataSync service.
b. Create a service role for VMIE to assume with the appropriate permissions.

2. Deploy a DataSync agent as an Amazon EC2 instance with the following guidance:

a. Use the same AWS Region for the DataSync AMI as the DataSync service and S3 bucket.
b. Launch your instance with the following settings:

i. Use the c4.xlarge instance type, as it runs on the previous-generation Xen hypervisor, which ensures the needed networking and storage drivers are installed prior to importing onto the AWS Snowball Edge.
ii. Proceed without a key pair for login, as you can create a key pair locally on the Snowball Edge device later.
iii. Choose any of the Region’s default VPC subnets.
iv. Disable auto-assign public IP, as no public connectivity is needed.
v. Select the existing default security group to restrict inbound connectivity.
vi. Do not encrypt the Amazon Elastic Block Store (Amazon EBS) volume, as you cannot export an image with encrypted EBS snapshots in the block device mapping. Confirm your AWS Region is not encrypting EBS volumes by default.
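As a hedged sketch of Steps 5.2a and 5.2b, you can look up the latest DataSync agent AMI through its documented SSM public parameter and launch it with the settings above (the subnet and security group IDs are placeholders):

# Look up the latest DataSync agent AMI ID in your Region
aws ssm get-parameter --name /aws/service/datasync/ami --region us-east-1 --query 'Parameter.Value' --output text

# Launch it on a c4.xlarge with no key pair, no public IP, and the default security group
aws ec2 run-instances --image-id <DATASYNC_AMI_ID> --instance-type c4.xlarge --region us-east-1 --network-interfaces '[{"DeviceIndex":0,"SubnetId":"<DEFAULT_SUBNET_ID>","Groups":["<DEFAULT_SG_ID>"],"AssociatePublicIpAddress":false}]'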

3. After the instance runs for a few minutes, stop the instance to prepare for creating an AMI.

4. Create an AMI from the stopped DataSync EC2 instance.

5. Export the AMI using the VMIE service.

a. Use the ami-id value from Step 5.4.
b. Use the disk-image-format of RAW.
c. Use the Step 5.1 S3 bucket name for “S3Bucket=<name>”.
d. Use the default Amazon S3 prefix value of “S3Prefix=export/”.
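Putting Steps 5.5a through 5.5d together, the export command and a status check look roughly like the following (hedged; monitor the task until its Status is completed before proceeding):

aws ec2 export-image --image-id <AMI_ID> --disk-image-format RAW --s3-export-location S3Bucket=<export bucket name>,S3Prefix=export/

aws ec2 describe-export-image-tasks --export-image-task-ids <EXPORT_TASK_ID>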

6. Terminate the DataSync agent EC2 instance.

7. Download the .raw image file locally to your machine from the S3 bucket created in Step 5.1. The AWS Management Console can be used, but the AWS CLI is optimized for large data transfers. We recommend using a high-speed internet connection, as the file is ~80 GB in size:

aws s3 cp s3://<export bucket name>/export/<image-name>.raw <path for local object storage>/datasync-agent.raw

Step 6: Import and start the DataSync agent as an Amazon EC2-compatible compute instance on the AWS Snowball Edge device

The DataSync agent image is ready to be sideloaded onto the Snowball Edge. If you need to access the DataSync agent local console to retrieve the agent activation key or for operational troubleshooting, then make sure you create a local Snowball Edge key pair and launch the instance with that associated key.

In our example, we deployed the DataSync agent on the Snowball Edge device without an associated SSH key pair, as we activated the agent through Terraform. Use the following steps:

1. Upload the DataSync agent image to run as an Amazon EC2-compatible compute instance on your Snowball Edge:

a. Upload the datasync-agent.raw image file to the ami-bucket Amazon S3 compatible storage on Snow bucket (created in Step 3.2). The .raw file is approximately 85 GB, so we recommend a high-bandwidth connection if you are not locally connected to the device.

aws s3 cp datasync-agent.raw s3://<ami-bucket>/datasync-agent --profile snowsbe --endpoint-url https://<S3_OBJECT_IP>

b. Import the uploaded datasync-agent.raw image as a snapshot:

aws ec2 import-snapshot --disk-container "Format=RAW,UserBucket={S3Bucket=<ami-bucket>,S3Key=datasync-agent}" --description "DataSync Image" --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

c. Confirm that the import from the .raw file to a snapshot has completed; the status should change from Pending to Completed:

aws ec2 describe-import-snapshot-tasks --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

d. Register the snapshot as an AMI. Make sure to capture the SnapshotId from the previous step, and run the version of the following command that matches your terminal.

Mac terminal:

aws ec2 register-image --name DataSync --description "DataSync agent" --block-device-mappings "[{\"DeviceName\": \"/dev/sda1\",\"Ebs\":{\"Encrypted\":false,\"DeleteOnTermination\":false,\"SnapshotId\":\"<SNAPSHOT_ID>\",\"VolumeSize\":80}}]" --root-device-name /dev/sda1 --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

Windows PowerShell terminal:

aws ec2 register-image --name DataSync --description "DataSync agent" --block-device-mappings '[{"DeviceName": "/dev/sda1","Ebs":{"Encrypted":false,"DeleteOnTermination":true,"SnapshotId":"<SNAPSHOT_ID>","VolumeSize":80}}]' --root-device-name /dev/sda1 --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

2. Create a security group for the DataSync agent. The default Snowball Edge security group permits all traffic inbound and outbound. Read Using and Managing Security Groups on AWS Snowball Edge devices for more details on security groups with the Snow Family, and review the DataSync network requirements to fit your use case needs. We restricted inbound network connectivity for testing and activation of the DataSync agent using the following steps:

a. Create a new security group named 'datasync-agent':

aws ec2 create-security-group --group-name datasync-agent --description "Security group for the DataSync agent operating as EC2-compatible compute instance" --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

b. Create an ingress rule permitting ICMP for connectivity testing, using the group-id value from the previous command along with your local area network subnet:

aws ec2 authorize-security-group-ingress --group-id <s.sg-id> --ip-permissions '[{"IpProtocol": "icmp", "FromPort": -1, "ToPort": -1, "IpRanges": [{"CidrIp": "<Local Area Network Subnet/CIDR>", "Description": "ICMP inbound from local area network"}]}]' --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

c. Create an ingress rule permitting HTTP (TCP/80) to activate your Amazon EC2 DataSync agent from the Terraform workstation (for example, your local workstation). This rule is necessary if you do not retrieve the DataSync agent activation key from the EC2 instance local console through SSH:

aws ec2 authorize-security-group-ingress --group-id <s.sg-id> --protocol tcp --port 80 --cidr <Terraform Host IP/32> --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

Optional: create an ingress rule permitting SSH (TCP/22) inbound for local console access to the DataSync agent from your workstation IP:

aws ec2 authorize-security-group-ingress --group-id <s.sg-id> --protocol tcp --port 22 --cidr <local workstation IP/32> --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

d. Validate the security group rule configurations:

aws ec2 describe-security-groups --group-id <s.sg-id> --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

3. Start the DataSync agent as an EC2-compatible compute instance on your Snowball Edge:

a. Launch the DataSync agent AMI:

aws ec2 run-instances --image-id <IMAGE_ID> --instance-type sbe-c.2xlarge --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

b. Check the instance state until the “Name” value is Running:

aws ec2 describe-instances --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

c. Once the instance is in a Running state, use the Snowball Edge CLI to assign an IP address to the DataSync agent EC2 instance by first creating a VNI with a static IP address assignment. You can identify the physical network interface ID from Step 1.5 earlier. Choose an available IP on the local area subnet for the instance:

snowballEdge create-virtual-network-interface --physical-network-interface-id <PHYSICAL_INT_ID> --ip-address-assignment STATIC --static-ip-address-configuration IpAddress=<IP_ADDRESS>,Netmask=<NETMASK> --profile snowsbe

d. Associate the VNI to the instance:

aws ec2 associate-address --public-ip <IP_ADDRESS> --instance-id <DATASYNC_INSTANCE_ID> --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

e. Assign the new datasync-agent security group to the instance, replacing the default:

aws ec2 modify-instance-attribute --instance-id <datasync instance ID> --groups <s.sg-id> --profile snowsbe --endpoint https://<SNOWBALL_IP>:8243

f. Activate the DataSync agent using one of the following two methods:

i. If the DataSync agent is accessible from the Terraform workstation, then assign the IP address of the DataSync agent to the Terraform variable datasync_agent_ip_address. When the following Terraform is applied, the agent is activated automatically.
ii. If the DataSync agent is not accessible from Terraform, then obtain the activation key through the Console (make sure the workstation accessing the Console has access to the DataSync agent at http://<AGENT_ADDRESS>):

1. Open the DataSync console and select Agents. Make sure that the Region is the same as the Snowball Edge device.
2. Select Create agent.
3. Under the Activation key section, enter the IP address of the DataSync agent in the Agent address field and select Get key.

Figure 3: Entering the DataSync agent IP address in the Agent address field, with “Automatically get the activation key from your agent” selected

4. Copy the Activation key and assign it to the Terraform variable datasync_agent_activation_key.

Figure 4: The activation key needed to create an agent, displayed under the Activation key section

Step 7: Deploy IaC to activate the agent and create the DataSync task and locations to replicate from the AWS Snowball Edge device to an Amazon S3 Regional bucket on a user-defined schedule

Now that the DataSync agent has been deployed, we use Terraform to accomplish these tasks:

  1. Activate the agent.
  2. Create two AWS Key Management Service (AWS KMS) keys for encrypting the S3 bucket and Amazon CloudWatch log group.
  3. Create an encrypted S3 bucket to receive the downsampled videos from the DataSync task.
  4. Create an encrypted CloudWatch log group in which to store the DataSync task execution logs.
  5. Create a DataSync task for the agent to execute. You can use this walkthrough guide for the Console instead.
  6. Configure the source and destination buckets for the DataSync task.
  7. Schedule the DataSync task to run nightly.

In the DataSync task configuration, we save time and cost by transferring only data that has changed or differs between the source and destination buckets. Because an external application regularly purges the downsampled video files from the S3-Snow bucket, we also preserve deleted files (the default) so that they are not removed from the destination bucket. You can also use bucket lifecycle rules to automatically delete files after a certain amount of time. Note that we retain the CloudWatch logs for DataSync executions for only 30 days for cost optimization. Make sure to update this for your log retention needs.
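As a rough, illustrative sketch of the core resources that such a configuration wires together (the arguments follow the Terraform AWS provider’s DataSync resources; the ARNs, names, and cron expression are placeholders, supporting resources such as the destination bucket, IAM role, KMS keys, and log group are elided, and the main.tf downloaded below is the authoritative version):

# Sketch only: key resources resembling what main.tf provisions
resource "aws_datasync_agent" "snowball" {
  name       = "snowball-edge-agent"
  ip_address = var.datasync_agent_ip_address
  # or: activation_key = var.datasync_agent_activation_key
}

resource "aws_datasync_location_object_storage" "snow_source" {
  agent_arns         = [aws_datasync_agent.snowball.arn]
  server_hostname    = var.local_storage_hostname   # <S3_OBJECT_IP>
  bucket_name        = var.local_storage_bucket     # <downsampled-bucket-video>
  access_key         = var.local_storage_ak
  secret_key         = var.local_storage_sk
  server_protocol    = "HTTPS"
  server_certificate = file(pathexpand(var.local_storage_certificate))
}

resource "aws_datasync_location_s3" "regional_destination" {
  s3_bucket_arn = "arn:aws:s3:::<regional-destination-bucket>"
  subdirectory  = "/"
  s3_config {
    bucket_access_role_arn = "arn:aws:iam::<ACCOUNT-ID>:role/<datasync-s3-access-role>"
  }
}

resource "aws_datasync_task" "nightly_replication" {
  name                     = "snow-to-region-nightly"
  source_location_arn      = aws_datasync_location_object_storage.snow_source.arn
  destination_location_arn = aws_datasync_location_s3.regional_destination.arn
  cloudwatch_log_group_arn = "arn:aws:logs:us-east-1:<ACCOUNT-ID>:log-group:<datasync-log-group>"
  schedule {
    schedule_expression = "cron(0 2 * * ? *)"   # nightly
  }
  options {
    transfer_mode          = "CHANGED"    # only transfer data that differs
    preserve_deleted_files = "PRESERVE"   # keep destination copies of purged source files
    log_level              = "TRANSFER"
  }
}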

a. Download the provider.tf file locally and load into your Terraform working directory.
b. Download the variables.tf file locally and load into your Terraform working directory.
c. Download the main.tf file locally and load into your Terraform working directory.

Set the variables to values applicable to your environment, such as within terraform.tfvars (a sketch follows the list below). Also modify the following variables.tf variables before applying:

  • local_storage_hostname: set to the <S3_OBJECT_IP>, in quotation marks, as type string
  • local_storage_bucket: set to the <downsampled-bucket-video> name, in quotation marks, as type string
  • local_storage_certificate: set to the certificate path and file name as set in Step 2.6
  • aws_region: change if you are not using us-east-1
  • datasync_agent_ip_address or datasync_agent_activation_key: set only one of these, depending on your activation choice in Step 6.3f, in quotation marks as type string, replacing null
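For example, a terraform.tfvars might look like the following (illustrative values only; the variable names come from the downloaded variables.tf):

local_storage_hostname    = "<S3_OBJECT_IP>"
local_storage_bucket      = "<downsampled-bucket-video>"
local_storage_certificate = "~/.aws/snowsbe_cert.pem"
aws_region                = "us-east-1"
datasync_agent_ip_address = "<DATASYNC_AGENT_IP>"   # or set datasync_agent_activation_key instead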

Furthermore, set the following shell variables to the access key and secret access key, respectively, for your Snowball Edge device:

export TF_VAR_local_storage_ak="SNOWBALLEDGE_ACCESS_KEY_ID"
export TF_VAR_local_storage_sk="SNOWBALLEDGE_SECRET_ACCESS_KEY"

Alternatively, you can enter the key information when prompted during the terraform apply.

Change to the working directory containing the Terraform files and apply the manifests:

terraform init
terraform apply

Step 8: Validate the DataSync replication task and execution

To verify that everything is configured properly, upload a file to the S3-Snow bucket <downsampled-bucket-video> and manually execute the DataSync task:

  1. Upload a test file to the Snowball bucket:
aws s3 cp <testvideo.mpg> s3://<downsampled-bucket-video>/<testvideo.mpg> --profile snowsbe --endpoint-url https://<S3_OBJECT_IP>
  2. Execute the DataSync task, using the task ARN found on the last line of the terraform apply output or in the Console on the DataSync service page under Tasks:
aws datasync start-task-execution --task-arn <DATASYNC_TASK_ARN>
  3. Check the status of the task using the task execution ARN provided in the previous step’s output:
aws datasync describe-task-execution --task-execution-arn <TASK_EXECUTION_ARN>

If the task returns “Success”, then the test file (testvideo.mpg) should now be available in the destination bucket in Amazon S3. If the task ends with a status other than “Success”, then you can troubleshoot by reviewing the CloudWatch logs for the task execution. Check the CloudWatch log group with the prefix “/aws/datasync/” followed by a random 12-character string.
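For example, you can locate the log group and tail recent task execution logs with the AWS CLI (version 2); a hedged sketch:

aws logs describe-log-groups --log-group-name-prefix /aws/datasync

aws logs tail <LOG_GROUP_NAME> --since 1h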

You may wish to set up an Amazon EventBridge rule to monitor task status and generate notifications on failures. You may also wish to increase the CloudWatch log retention beyond 30 days for auditing purposes and enable Regional S3 Bucket Keys for cost optimization.

To replicate more than 20 million files, objects, and directories, change the DataSync agent instance type from sbe-c.2xlarge to sbe-c.4xlarge to get the required 64 GB of memory.

Lastly, you may wish to explore Amazon S3 request costs when using DataSync to better understand how DataSync uses these requests and how they might affect your Amazon S3 costs.

Cleaning up

To tear down the Terraform-deployed infrastructure, first permanently delete all objects stored in your S3 bucket in the Region. You must also comment out or delete line 77 of the main.tf code to remove the deletion prevention on the newly created S3 bucket.

Finally, run the following from your terminal window:

terraform destroy

Conclusion

In this post, we showed how to configure AWS DataSync to automatically and efficiently migrate data from the edge to an AWS Region. We detailed the steps necessary to configure Amazon S3 compatible storage on an AWS Snowball Edge device, including creating two buckets for AMI storage and application data. We described how to deploy a DataSync agent on a Snowball Edge and programmatically configure a DataSync task to automatically migrate changed data from the Snowball Edge S3 compatible bucket to an S3 bucket on a regular basis.

This solution can be used in whole or in part to simplify data synchronization and reduce delays in moving media between Snowball Edge devices in the field and your preferred AWS Region, thereby enabling use cases such as ML operational pipelines or enterprise data lake ingestion.

If you have feedback or suggestions, leave a comment in the comment section.

Brad Beaulieu

Brad Beaulieu is a Principal on Booz Allen’s Chief Technology Office (CTO) BrightLabs team leading Edge Cloud investments for the U.S. Federal Government and Commercial clients. He has worked for Booz Allen since 2009 on a range of cloud security and managed services initiatives. He enjoys the outdoors whether gardening in the backyard or hiking in the mountains.

Christopher Smith

Christopher Smith is a Principal Solutions Architect on the U.S. Federal Partners team at Amazon Web Services (AWS). He’s supported the U.S. Federal Government since entering the commercial workforce in 2003. He manages his work-life balance by focusing his free time on his family and also enjoys trivia and being outdoors whenever possible.

Dru Grote

Dru Grote is a Senior Lead Technologist at Booz Allen Hamilton. He is passionate about securely implementing automation and monitoring solutions in the cloud. Away from the keyboard you’ll likely find him taking photos or biking.