Cloud Data Migration

Move on-premises data to AWS for migrations or ongoing workflows

Data is a cornerstone of successful application deployments, analytics workflows, and machine learning innovations. When moving data to the cloud, you need to understand where you are moving it for different use cases, the types of data you are moving, and the network resources available, among other considerations. AWS offers a wide variety of services and partner tools to help you migrate your data sets, whether they are files, databases, machine images, block volumes, or even tape backups.

AWS Cloud Data Migration Services

The suite of AWS data transfer services includes many methods to help you migrate your data more effectively. You can think of them in two categories: online data transfer and hybrid cloud storage, and offline data migration to Amazon S3.

Online data transfer and hybrid cloud storage

These methods make it simple to create a network link to your VPC, transfer data to AWS, or use Amazon S3 for hybrid cloud storage with your existing on-premises applications. These services can help you lift and shift large data sets once, and also integrate existing process flows, such as backup and recovery or continuous data streams, directly with cloud storage.

AWS Direct Connect

Customers use AWS Direct Connect's dedicated physical connections to accelerate network transfers between their data centers and AWS.

AWS Direct Connect lets you establish a dedicated network connection between your network and one of the AWS Direct Connect locations. Using industry-standard 802.1Q VLANs, this dedicated connection can be partitioned into multiple virtual interfaces. This allows you to use the same connection to access public resources, such as objects stored in Amazon S3 using public IP address space, and private resources, such as Amazon EC2 instances running within an Amazon Virtual Private Cloud (VPC) using private IP space, while maintaining network separation between the public and private environments. Virtual interfaces can be reconfigured at any time to meet your changing needs.
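
As an illustration, once a Direct Connect connection is provisioned, a private virtual interface can be created over it with the AWS CLI. This is a sketch, not a full walkthrough; the connection ID, VLAN, BGP ASN, and virtual gateway ID below are placeholder values:

    # Partition an existing Direct Connect connection with a private
    # virtual interface (802.1Q VLAN 101) attached to a virtual private gateway.
    aws directconnect create-private-virtual-interface \
        --connection-id dxcon-EXAMPLE \
        --new-private-virtual-interface \
            virtualInterfaceName=my-private-vif,vlan=101,asn=65000,virtualGatewayId=vgw-EXAMPLE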

Explore our AWS Direct Connect Partner Bundles that help extend on-premises technologies to the cloud. 

AWS DataSync

AWS DataSync is a data transfer service that makes it easy for you to automate moving data between on-premises storage and Amazon S3 or Amazon Elastic File System (Amazon EFS). DataSync automatically handles many of the tasks that can slow down migrations or burden your IT operations, including running your own instances, handling encryption, managing scripts, optimizing the network, and validating data integrity. You can use DataSync to transfer data at speeds up to 10 times faster than open-source tools. You can copy data over AWS Direct Connect or internet links to AWS for one-time data migrations, recurring data processing workflows, and automated replication for data protection and recovery.
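
A typical DataSync workflow defines a source location, a destination location, and a task between them. The sketch below assumes an on-premises NFS export, an already-deployed DataSync agent, and an IAM role that grants DataSync access to the destination bucket; all ARNs, hostnames, and paths are placeholders:

    # Register the on-premises NFS export as the source location.
    SRC=$(aws datasync create-location-nfs \
        --server-hostname nfs.example.com \
        --subdirectory /exports/data \
        --on-prem-config AgentArns=arn:aws:datasync:us-east-1:123456789012:agent/agent-EXAMPLE \
        --query LocationArn --output text)

    # Register the S3 bucket as the destination location.
    DST=$(aws datasync create-location-s3 \
        --s3-bucket-arn arn:aws:s3:::my-migration-bucket \
        --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/DataSyncS3Role \
        --query LocationArn --output text)

    # Create the task and kick off a transfer.
    TASK=$(aws datasync create-task --source-location-arn "$SRC" \
        --destination-location-arn "$DST" --query TaskArn --output text)
    aws datasync start-task-execution --task-arn "$TASK"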

AWS Storage Gateway

The AWS Storage Gateway service simplifies on-premises adoption of AWS storage. Your existing applications connect to a local gateway via industry-standard file, block, and tape storage protocols to store data in Amazon S3 and Amazon S3 Glacier. Data is compressed and securely transferred to AWS.

  • The File Gateway presents SMB or NFS file shares that let on-premises applications store files as S3 objects and access them through traditional file interfaces (see the mount example after this list).
  • The Tape Gateway virtual tape library (VTL) configuration integrates seamlessly with your existing backup software for cost-effective tape replacement in Amazon S3 and long-term archival in S3 Glacier and S3 Glacier Deep Archive.
  • The Volume Gateway stores or caches block volumes locally, with point-in-time backups as EBS snapshots. These snapshots can be restored as EBS volumes in the cloud.
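
For example, once a File Gateway exposes a bucket as an NFS share, any NFS client can mount it with standard tools. The gateway IP address, bucket name, and mount path below are placeholders:

    # Mount a File Gateway NFS share backed by the S3 bucket "my-bucket".
    sudo mount -t nfs -o nolock,hard 203.0.113.10:/my-bucket /mnt/file-gateway

    # Files written here become S3 objects in my-bucket.
    cp report.pdf /mnt/file-gateway/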

Amazon S3 Transfer Acceleration

Amazon S3 Transfer Acceleration makes public Internet transfers to Amazon S3 faster. You can maximize your available bandwidth regardless of distance or varying Internet conditions, and there are no special clients or proprietary network protocols. Simply change the endpoint you use with your S3 bucket and acceleration is automatically applied.

This is ideal for recurring jobs that travel across the globe, such as media uploads, backups, and local data processing tasks that are regularly sent to a central location.
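
Acceleration is enabled per bucket and then used by pointing clients at the accelerate endpoint. With the AWS CLI, that looks roughly like the following sketch (the bucket name and file are placeholders):

    # Enable Transfer Acceleration on the bucket (one-time setup).
    aws s3api put-bucket-accelerate-configuration \
        --bucket my-bucket \
        --accelerate-configuration Status=Enabled

    # Tell the CLI to use the s3-accelerate endpoint for S3 commands...
    aws configure set default.s3.use_accelerate_endpoint true

    # ...then transfer as usual; the upload now rides the accelerated path.
    aws s3 cp large-media-file.mp4 s3://my-bucket/uploads/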

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is the easiest way to load streaming data into AWS. It can capture and automatically load streaming data into Amazon S3 and Amazon Redshift, enabling near real-time analytics with the business intelligence tools and dashboards you're already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. You can create a Firehose delivery stream from the AWS Management Console with a few clicks, and within minutes start sending data to it from hundreds of thousands of data sources to be loaded continuously into AWS.
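
Once a delivery stream exists, producers push records to it through the Firehose API. A minimal CLI sketch, assuming a stream named my-delivery-stream (note that recent versions of the AWS CLI expect the Data blob to be base64-encoded):

    # Send a single record to an existing Firehose delivery stream.
    # The payload is the base64 encoding of a small JSON document.
    aws firehose put-record \
        --delivery-stream-name my-delivery-stream \
        --record Data=$(echo '{"event":"page_view","user":42}' | base64)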

APN Partner Products

AWS has partnered with a number of industry vendors on physical gateway appliances that bridge the gap between traditional backup and the cloud. These appliances link existing on-premises data to AWS so you can make the move without impacting performance, while preserving existing backup catalogs.

  • Seamlessly integrate into existing infrastructure
  • May offer deduplication, compression, encryption, or WAN acceleration
  • Cache recent backups locally while vaulting everything to the AWS cloud

Offline data migration to Amazon S3

One should never underestimate the bandwidth of a semi truck filled with 100 petabytes of hard drives, or of a 100 TB suitcase-sized device. These offline data migration services, which use shippable, ruggedized devices, are ideal for moving large archives and data lakes, and for situations where your available bandwidth cannot carry your data volume within the desired time frame.

AWS Snowball

AWS Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Using Snowball addresses common challenges with large-scale data transfers including limited network bandwidth, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, and secure.

AWS Snowball Edge

AWS Snowball Edge is a petabyte-scale data transfer device with on-board storage and compute capabilities. You can use Snowball Edge to move large amounts of data into and out of AWS, as a temporary storage tier for large local data sets, or to support local workloads in remote or offline locations.

AWS Snowmobile

AWS Snowmobile is an exabyte-scale data transport solution that uses a secure 40-foot shipping container, hauled by a semi-trailer truck, to transfer large amounts of data into and out of AWS. Using Snowmobile addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowmobile is done through a custom engagement; it is fast, secure, and can cost as little as one-fifth the price of transferring over high-speed Internet.

Unmanaged cloud data migration tools

AWS also offers simple script- and CLI-based tools for moving data from your site into AWS cloud storage.

rsync

Customers use rsync, an open-source tool, along with third-party file system tools to copy data directly into S3 buckets.
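
One common pattern is to mount a bucket as a local file system with a third-party FUSE tool such as s3fs, then rsync into the mount point. This is a sketch; the bucket name, mount point, and s3fs credential options are placeholders and vary by setup:

    # Mount the bucket with s3fs (credentials come from the instance's IAM role here).
    s3fs my-bucket /mnt/s3 -o iam_role=auto

    # Incrementally copy a local directory into the bucket via the mount.
    rsync -av --progress /data/archive/ /mnt/s3/archive/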

S3 command line interface

Customers write commands with the Amazon S3 CLI to move data directly into S3 buckets.
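
For example, with the AWS CLI installed and credentials configured (the bucket and paths below are placeholders):

    # Copy a single file into a bucket.
    aws s3 cp backup.tar.gz s3://my-bucket/backups/

    # Recursively synchronize a directory, uploading only new or changed files.
    aws s3 sync /data/projects s3://my-bucket/projects/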

S3 Glacier command line interface

Customers use the Amazon S3 Glacier CLI to move data into S3 Glacier vaults.
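
For example, a vault can be created once and archives uploaded to it with the AWS CLI (the "-" tells the CLI to use the account of the current credentials; the vault name and file are placeholders):

    # Create a vault, then upload an archive to it.
    aws glacier create-vault --account-id - --vault-name my-vault
    aws glacier upload-archive --account-id - --vault-name my-vault \
        --body archive.tar.gz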

The common cloud data migration challenge

The daunting realities of data transport apply to most projects. How do you gracefully move from your current location to your new cloud with minimal disruption, cost, and time? What is the smartest way to actually move your GB, TB, or PB of data?

It's a basic underlying problem: how much data can move how far, how fast? For a best-case estimate, use this formula:

Number of days = (Total Bytes)/(Megabits per second * 125 * 1000 * Network Utilization * 60 seconds * 60 minutes * 24 hours)

For example, if you have a T1 connection (1.544 Mbps) and 1 TB (1024 * 1024 * 1024 * 1024 bytes) to move in or out of AWS, the theoretical minimum time it would take to load over your network connection at 80% network utilization is about 82 days.
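
The same arithmetic can be checked from a shell, for example with awk:

    # days = total_bytes / (Mbps * 125,000 bytes-per-second-per-Mbps
    #                        * utilization * 86,400 seconds-per-day)
    awk 'BEGIN {
        bytes = 1024 ^ 4        # 1 TB
        mbps  = 1.544           # T1 connection
        util  = 0.80            # 80% network utilization
        printf "%.1f days\n", bytes / (mbps * 125000 * util * 86400)
    }'
    # prints: 82.4 days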

Relax. We've done this before. We've found that customers approach this in two ways: they use very basic unmanaged migration tools to move their data, or they select one of the AWS services noted above.

As a general rule of thumb, for best results we suggest:

Connection          Data scale          Method
Less than 10 Mbps   Less than 500 GB    Unmanaged
More than 10 Mbps   More than 500 GB    Managed service