Cloud Data Migration
Data is a cornerstone of successful application deployments, analytics workflows, and machine learning innovations. When moving data to the cloud, you need to understand where you are moving it for different use cases, the types of data you are moving, and the network resources available, among other considerations. AWS offers a wide variety of services and partner tools to help you migrate your data sets, whether they are files, databases, machine images, block volumes, or even tape backups.
AWS Cloud Data Migration Services
AWS provides a portfolio of data transfer services to provide the right solution for any data migration project. The level of connectivity is a major factor in data migration, and AWS has offerings that can address your hybrid cloud storage, online data transfer, and offline data transfer needs.
Hybrid cloud storage
Many customers want to take advantage of the benefits of cloud storage, but have applications running on-premises that require low-latency access to their data, or need rapid data transfer to the cloud. AWS hybrid cloud storage architectures connect your on-premises applications and systems to cloud storage to help you reduce costs, minimize management burden, and innovate with your data.
AWS Storage Gateway simplifies on-premises adoption of AWS Storage. Storage Gateway lets you seamlessly connect and extend your on-premises applications to AWS Storage. Customers use Storage Gateway to replace tape libraries with cloud storage, provide cloud storage-backed file shares, or create a low-latency cache that gives on-premises applications access to data in AWS. The service provides three different types of gateways – File Gateway, Tape Gateway, and Volume Gateway.
- File Gateway presents SMB or NFS file shares for on-premises applications to store files as S3 objects and access them with traditional file interfaces.
- Tape Gateway provides a virtual tape library (VTL) configuration that integrates seamlessly with your existing backup software, enabling cost-effective tape replacement in Amazon S3 and long-term archival in S3 Glacier and S3 Glacier Deep Archive.
- Volume Gateway stores or caches block volumes locally, with point-in-time backups as EBS snapshots. These snapshots may be recovered in the cloud.
Customers can use an AWS Direct Connect dedicated physical connection to accelerate network transfers between their data centers and AWS.
AWS Direct Connect lets you establish a dedicated network connection between your network and one of the AWS Direct Connect locations. Using industry standard 802.1q VLANs, this dedicated connection can be partitioned into multiple virtual interfaces. This enables you to use the same connection to access public resources such as objects stored in Amazon S3 using public IP address space, and private resources such as Amazon EC2 instances running within an Amazon Virtual Private Cloud (VPC) using private IP space, while maintaining network separation between the public and private environments. Virtual interfaces can be reconfigured at any time to meet your changing needs.
Explore our AWS Direct Connect Partner Bundles that help extend on-premises technologies to the cloud.
Online data transfer
These services make it simple and easy to transfer your data into and out of AWS via online methods.
AWS DataSync is a data transfer service that makes it easy for you to automate moving data between on-premises storage and Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. DataSync automatically handles many of the tasks related to data transfers that can slow down migrations or burden your IT operations, including running your own instances, handling encryption, managing scripts, network optimization, and data integrity validation. You can use DataSync to transfer data at speeds up to 10 times faster than open-source tools. You can use DataSync to copy data over AWS Direct Connect or internet links to AWS for one-time data migrations, recurring data processing workflows, and automated replication for data protection and recovery.
The AWS Transfer Family provides fully managed support for file transfers directly into and out of Amazon S3. With support for Secure File Transfer Protocol (SFTP), File Transfer Protocol over SSL (FTPS), and File Transfer Protocol (FTP), the AWS Transfer Family helps you seamlessly migrate your file transfer workflows to AWS by integrating with existing authentication systems, and providing DNS routing with Amazon Route 53 so nothing changes for your customers and partners, or their applications. With your data in Amazon S3, you can use it with AWS services for processing, analytics, machine learning, and archiving. Getting started with the AWS Transfer Family is easy; there is no infrastructure to buy and set up.
Amazon S3 Transfer Acceleration makes public internet transfers to Amazon S3 faster. You can maximize your available bandwidth regardless of distance or varying internet weather, and there are no special clients or proprietary network protocols. Simply change the endpoint you use with your S3 bucket and acceleration is automatically applied.
This is ideal for recurring jobs that travel across the globe, such as media uploads, backups, and local data processing tasks that are regularly sent to a central location.
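As a minimal sketch of "simply change the endpoint": a Transfer Acceleration request addresses the bucket through the `s3-accelerate` hostname instead of the regular S3 hostname. The helper below only illustrates the hostname change (the bucket name is a placeholder, regional endpoint variants are omitted, and acceleration must also be enabled on the bucket itself before the accelerated endpoint will work):

```python
def s3_endpoint(bucket: str, accelerate: bool = False) -> str:
    """Return a virtual-hosted-style URL for a bucket.

    Enabling Transfer Acceleration only swaps the hostname to
    s3-accelerate; the rest of the request is unchanged.
    """
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"

# "example-bucket" is a placeholder bucket name.
print(s3_endpoint("example-bucket"))                    # → https://example-bucket.s3.amazonaws.com
print(s3_endpoint("example-bucket", accelerate=True))   # → https://example-bucket.s3-accelerate.amazonaws.com
```

In practice, SDKs and the AWS CLI expose this as a configuration flag rather than a hand-built URL, so no application logic changes are needed.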
AWS Snowcone is the smallest member of the AWS Snow Family of edge computing and data transfer devices. Snowcone is portable, rugged, and secure. You can use Snowcone to collect, process, and move data to AWS online with AWS DataSync. Running applications in disconnected environments and connected edge locations can be challenging because these locations often lack the space, power, and cooling needed for data center IT equipment. AWS Snowcone stores data securely in edge locations, and can run edge computing workloads that use AWS IoT Greengrass or Amazon EC2 instances. Snowcone devices are small and weigh 4.5 lbs. (2.1 kg), so you can carry one in a backpack or fit it in tight spaces for IoT, vehicular, or even drone use cases.
Amazon Kinesis Data Firehose is the easiest way to load streaming data into AWS. It can capture and automatically load streaming data into Amazon S3 and Amazon Redshift, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. You can easily create a Firehose delivery stream from the AWS Management Console, configure it with a few clicks, and start sending data to the stream from hundreds of thousands of data sources to be loaded continuously to AWS – all in just a few minutes.
AWS has partnered with a number of industry vendors on physical gateway appliances that bridge the gap between traditional backup and the cloud. These appliances link existing on-premises data to the AWS Cloud so you can make the move without impacting performance, while preserving existing backup catalogs.
- Seamlessly integrates into existing infrastructure
- May offer deduplication, compression, encryption or WAN acceleration
- Cache recent backups locally, vault everything to the AWS Cloud
Offline data transfer
The AWS Snow Family makes it simple to get your data into and out of AWS via offline methods.
AWS Snowcone, the smallest member of the AWS Snow Family (described under online data transfer above), can also move data to AWS offline: you copy your data onto the device and ship it back to AWS. This is useful for disconnected environments and connected edge locations where online transfer is impractical.
AWS Snowball is a petabyte-scale data transport and edge computing device that comes with on-board storage and compute capabilities and is available in two options. Snowball Edge Storage Optimized devices provide both block storage and Amazon S3-compatible object storage, and 40 vCPUs. They are well suited for local storage and large-scale data transfer. Snowball Edge Compute Optimized devices provide 52 vCPUs, block and object storage, and an optional GPU for use cases like advanced machine learning and full-motion video analysis in disconnected environments. You can use these devices for data collection, machine learning and processing, and storage in environments with intermittent connectivity (like manufacturing, industrial, and transportation) or in extremely remote locations (like military or maritime operations) before shipping them back to AWS. These devices may also be rack mounted and clustered together to build larger temporary installations.
AWS Snowmobile is an exabyte-scale data transport solution that uses a secure 40-foot shipping container, pulled by a semi-trailer truck, to transfer large amounts of data into and out of AWS. Snowmobile addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowmobile is done through a custom engagement; it is fast, secure, and can cost as little as one-fifth as much as transfer via high-speed internet.
Unmanaged cloud data migration tools
AWS also offers easy script or CLI tools to move data from your site into AWS cloud storage.
rsync
Customers use rsync, an open-source tool, along with third-party file system tools to copy data directly into S3 buckets.
S3 command line interface
Customers use the Amazon S3 CLI to write commands to move data directly into S3 buckets.
S3 Glacier command line interface
Customers use the Amazon S3 Glacier CLI to move data into S3 Glacier vaults.
The common cloud data migration challenge
The daunting realities of data transport apply to most projects. How do you gracefully move from your current location to your new cloud with minimal disruption, cost, and time? What is the smartest way to actually move your GB, TB, or PB of data?
It's a basic underlying problem: how much data can you move, how far, and how fast? For a best-case scenario, use this formula:
Number of days = (Total Bytes)/(Megabits per second * 125 * 1000 * Network Utilization * 60 seconds * 60 minutes * 24 hours)
For example, if you have a T1 connection (1.544 Mbps) and 1 TB (1024 * 1024 * 1024 * 1024 bytes) to move in or out of AWS, the theoretical minimum time it would take to load over your network connection at 80% network utilization is 82 days.
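The worked example above can be checked with a short script; this is a direct transcription of the formula, not an AWS tool:

```python
def transfer_days(total_bytes: float, mbps: float, utilization: float) -> float:
    """Estimate days to move total_bytes over a link of `mbps` megabits/s.

    mbps * 125 * 1000 converts megabits per second to bytes per second;
    60 * 60 * 24 is seconds per day, matching the formula above.
    """
    bytes_per_second = mbps * 125 * 1000 * utilization
    return total_bytes / (bytes_per_second * 60 * 60 * 24)

# 1 TB over a T1 line (1.544 Mbps) at 80% utilization:
days = transfer_days(1024**4, 1.544, 0.8)
print(round(days))  # → 82
```

Plugging in your own link speed and data volume gives a quick feasibility check before choosing between online and offline transfer.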
Relax. We’ve done this before. We've found that customers approach this in two ways: they use very basic unmanaged migration tools to move their data or they select one of AWS's suite of services noted above.
As a general rule of thumb, for best results we suggest:
| Connection speed | Amount of data | Suggested approach |
| --- | --- | --- |
| Less than 10 Mbps | Less than 500 GB | Unmanaged tools |
| More than 10 Mbps | More than 500 GB | Managed service |
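The rule of thumb above can be sketched as a simple check. The thresholds are the suggested 10 Mbps / 500 GB guideline, not hard limits, and the mixed cases (slow link with a large data set, or fast link with a small one) are a judgment call; here they default to a managed service:

```python
def suggested_approach(link_mbps: float, data_gb: float) -> str:
    """Apply the rule of thumb: slow links *and* small data sets favor
    unmanaged tools; otherwise a managed service is suggested."""
    if link_mbps < 10 and data_gb < 500:
        return "Unmanaged tools"
    return "Managed service"

print(suggested_approach(1.544, 100))  # → Unmanaged tools
print(suggested_approach(100, 5000))  # → Managed service
```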