AWS Storage Blog

Enable large-scale database migrations with AWS DMS and AWS Snowball

At some point in any database migration, the bandwidth of your network becomes a limiting factor. Without high-speed internet links, it can take months to transfer large amounts of data. For example, 100 terabytes of data takes more than 100 days to transfer over a dedicated 100-Mbps connection. If you are in a remote location or face other network constraints, even the migration of a relatively small 500-GB database can be considerably slow. Additionally, the cumulative size of many simultaneous smaller database migrations can amplify this challenge.
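As a quick sanity check on that figure, here is a minimal back-of-the-envelope calculation in shell, assuming decimal terabytes and a perfectly utilized link (real-world throughput is lower once protocol overhead and contention are factored in):

# Estimate transfer time: 100 TB over a dedicated 100-Mbps link.
data_bits=$((100 * 1000**4 * 8))   # 100 TB (decimal) in bits
rate_bps=$((100 * 1000**2))        # 100 Mbps in bits per second
seconds=$((data_bits / rate_bps))
echo "$((seconds / 86400)) days"   # prints "92 days" at 100% utilization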

Another common scenario that can hinder or delay a database migration project is the lack of outside access to the database itself. You might find yourself all set to start your database migration from your source database, only to find that you’re not permitted access outside your corporate network. The scenarios described, and others like them, are where the AWS Snowball service and its integration with AWS Database Migration Service (AWS DMS) offer their greatest advantages: migrating large database environments offline.

More than 200,000 databases have been migrated to AWS using AWS Database Migration Service (AWS DMS), either as a one-time migration or with ongoing replication. AWS DMS works with the AWS Schema Conversion Tool (AWS SCT) to significantly simplify and expedite the database migration process in a low-cost, highly available manner. These tools work in concert with an AWS Snowball device installed in your data center. The device enables you to perform a database migration locally, then ship the AWS Snowball device back to AWS for import into your cloud storage target. This process alleviates constraints in network bandwidth that commonly slow or derail cloud migrations.

In this blog post, we’ll provision AWS services to receive your data, configure an AWS Snowball device, and configure the migration jobs required to actually move your data.

Procedure Outline and Architecture

AWS Snowball is a service built around ruggedized physical storage and compute devices, such as the AWS Snowball Edge Storage Optimized device, that enable you to move petabytes of data to AWS. The service helps overcome challenges that you can encounter with large-scale data transfers, including long transfer times, a lack of usable bandwidth, and security concerns. Just as you can order a book or consumer product on Amazon Prime knowing it shows up at your door two days later, you can order a Snowball Edge device from the AWS Management Console and it arrives at your data center a few days later with approximately 80 TB of usable encrypted storage.

What follows is an architecture diagram of the migration workflow (Figure 1). The key components located in the on-premises data center include the source database, local replication instance, AWS SCT, and the Snowball Edge device. In the remote AWS Region, you have an Amazon Simple Storage Service (Amazon S3) bucket that is used for staging, the target database, and the remote replication instance. The local replication instance and AWS SCT read the source database and copy the schema and existing table data from the source database to the Snowball Edge device. The Snowball Edge is then shipped to AWS, where the data is imported on your behalf into your Amazon S3 bucket. Incremental data is uploaded to S3 by the local replication instance and applied to the target database by the remote replication instance. Using the DMS task created by AWS SCT, the data is loaded into the target, and you can cut over when you are ready.

The local replication agent and AWS SCT are installed on virtual machines or bare-metal hosts. The DMS agent is only supported on Red Hat Enterprise Linux versions 6.2 through 6.8, 7.0, and 7.1 (64-bit), and SUSE Linux version 12 (64-bit). See the AWS DMS documentation for more details.


Figure 1: Integrated Solution Architecture

Some of the salient features of this integration architecture are:

  • You get the ability to physically attach a secure, hardened data transfer device directly inside your data center rather than opening ports to the outside.
  • You can now move large databases from on-premises to AWS Cloud.
  • This integration provides a “push” model to migrate databases instead of a “pull” model.
  • You can migrate one or more databases using the same Snowball Edge device without a need to upgrade your network infrastructure and consume dedicated bandwidth.
  • While migrating these multi-terabyte databases to AWS, your on-premises databases remain online. They can be decommissioned after the Snowball Edge device is shipped back to AWS and the data is automatically loaded onto your target Amazon Relational Database Service (Amazon RDS) or Amazon EC2–based database.
  • You can migrate existing data (one time) or optionally perform ongoing data replication to the target database.

A few notes about working with Snowball Edge and AWS DMS:

  • When you are performing a database migration, follow the step-by-step documentation.
  • For data transfer, we recommend that you order a Snowball Edge Storage Optimized device, which has a storage capacity of approximately 80 TB.
  • Two hosts running supported operating systems are required to run the required agents:
    • The AWS DMS Replication Agent (included in the AWS SCT package)
    • The AWS Schema Conversion Tool (AWS SCT)
  • The Snowball Edge must be on the same network as your source database.

You can use the following steps to migrate one or more databases using the integration of AWS DMS and Snowball Edge.

Preparation

The preparation involves setting up prerequisites, creating an Amazon S3 bucket, and getting and configuring your Snowball Edge. The step-by-step instructions and the best practices are covered in the Snowball Edge Migration Guide. You must obtain the Snowball Edge access key and secret key by following these detailed procedures.

Prerequisites

As prerequisites, you must set up the source and target databases. To do so, look at the documentation for the AWS DMS source configuration and target configuration.

Create an Amazon S3 bucket (staging S3)

When you’ve set up the source and target databases as described in the documentation, you create a bucket in Amazon S3. This bucket is also called “staging S3.”

This bucket acts as a temporary staging area for existing data and ongoing transactions during a database migration process. When database migration is complete and cutover to the target database is done, you can delete this staging S3 bucket. This bucket should be in the same AWS Region as the target database. Also, AWS DMS and AWS SCT need AWS Identity and Access Management (IAM) roles to access this bucket.
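As a sketch of this step with the AWS CLI, assuming the bucket name and Region below are placeholders that you replace with your own:

# Create the staging bucket in the same Region as the target database.
# (For us-east-1, omit the --create-bucket-configuration option.)
aws s3api create-bucket \
    --bucket my-dms-staging-bucket \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2

# Confirm the bucket Region before proceeding.
aws s3api get-bucket-location --bucket my-dms-staging-bucket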

For more information, see Prerequisites When Using S3 as a Source for AWS DMS in the AWS DMS documentation.

Order and configure the Snowball Edge

Next, you create an AWS Snowball job through the AWS Management Console and order your Snowball Edge device. As part of this step, you specify the Amazon S3 bucket (staging S3) you created in the previous step.

When your Snowball Edge device arrives, configure it on your local network following the steps in the Getting Started section of the Snowball Edge documentation. After the Snowball Edge device is connected to your network, install the Snowball client. Finally, unlock the Snowball Edge by downloading the manifest file and unlock code, and then following these instructions.
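As a minimal sketch of this sequence using the Snowball client, assuming the client is already installed and the manifest file and unlock code have been downloaded from the console:

# Store the device endpoint, manifest path, and unlock code interactively.
snowballEdge configure

# Unlock the device and confirm that its state is UNLOCKED.
snowballEdge unlock-device
snowballEdge describe-device

# Retrieve the local access key and secret key that AWS SCT needs later.
snowballEdge list-access-keys
snowballEdge get-secret-access-key --access-key-id <access-key-id>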

Configuration

Next, configure your migration by taking the following steps:

Step 1: Configure AWS SCT

In the first step of configuration, configure the global settings for AWS SCT. These settings include the AWS service profile and the database drivers for the source and target databases.

To do so, start AWS SCT, and from the Settings menu choose Global Settings, then AWS Service Profiles. The Global Settings page opens. Choose Add new AWS Service Profile.

Ensure you have added these elements to the Service Profile (see AWS DMS documentation for more details on configuring AWS SCT to use the Snowball Edge):

  • Profile Name
  • AWS Access Key
  • AWS Secret Key
  • Region
  • S3 Bucket folder

Add new AWS Service Profile with all the necessary details

Next, choose Test Connection to ensure that all your profile values are correct and that all tests show a status of Pass. If the test passes, choose OK, and then OK again to close the window and dialog box.

Choose Import Job, choose the Snowball Edge job from the list, and then choose OK.

Now configure AWS SCT to use the Snowball Edge. Enter the IP address of the Snowball Edge, the listening port on the device (the default is 8080), and the Snowball Edge access key and secret key you retrieved earlier. Choose OK to save your changes.

Configure AWS SCT to use the Snowball Edge

When the AWS service profile is configured in AWS SCT, you can use the source and target database details to create a new project in SCT. Then you can connect to both the source and target databases in this AWS SCT project.

Step 2: Install the source and target database drivers on the AWS DMS Replication Agent instance

AWS DMS uses a replication instance to connect to your source data store, read the source data, and format the data for consumption by the target data store. The formatted data is then copied to the Snowball Edge device. In this architecture, the remote DMS replication instance loads the data into the target data store. You must install the source and target database drivers on the AWS DMS Replication Agent instance.

Install the ODBC drivers required for your source databases on the replication agent instance. For example, to configure MySQL drivers on Linux, run the following commands. For information on how to configure these drivers for specific source and target databases, see the database documentation.

sudo yum install unixODBC

sudo yum install mysql-connector-odbc

After executing the preceding commands, make sure that the /etc/odbcinst.ini file has the following contents:

cat /etc/odbcinst.ini
[MySQL ODBC 5.3 Unicode Driver]
Driver=/usr/lib64/libmyodbc5w.so
UsageCount=1

[MySQL ODBC 5.3 ANSI Driver]
Driver=/usr/lib64/libmyodbc5a.so
UsageCount=1
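As a quick verification sketch, you can ask unixODBC to list the drivers it has registered; the two MySQL entries above should appear in the output:

# List the ODBC drivers registered in /etc/odbcinst.ini.
odbcinst -q -d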

Step 3: Configure the AWS DMS Replication Agent instance and install the AWS DMS Replication Agent

The local machine where an agent runs and connects to the source database or databases to migrate data is called an AWS DMS Replication Agent instance. The agent process running on this instance is called an AWS DMS Replication Agent.

Size this instance based on two considerations: the number of tasks that run on the machine, and the throughput required to migrate data from the source database to the Snowball Edge device.

To install the AWS DMS agent, locate the “dmsagent” directory in the location where the AWS SCT installation files were decompressed. Install the agent appropriate for your operating system. Detailed installation steps are documented in the AWS DMS documentation.

During the installation, you must provide the port number and password. This port number and password are used in the AWS SCT UI in the next step.

sudo /opt/amazon/aws-schema-conversion-tool-dms-agent/bin/configure.sh

Configure the AWS DMS Replication Agent
Note: You will use these parameters when configuring agent in AWS Schema Conversion Tool

Please provide the password for the AWS DMS Replication Agent
Use minimum 8 and up to 20 alphanumeric characters with at least one digit and one capital case character
Password: *******
...
[set password command] Succeeded

Please provide port number the AWS DMS Replication Agent will listen on
Note: You will have to configure your firewall rules accordingly
Port: 8192
Starting service...
...
AWS DMS Replication Agent was started
You can always reconfigure the AWS DMS Replication Agent by running the script again.
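As the installer output notes, the chosen port must be reachable through the host firewall. A minimal sketch for a RHEL 7 host using firewalld, assuming the port 8192 chosen above (RHEL 6 hosts use iptables instead):

# Open the agent's listening port and reload the firewall rules.
sudo firewall-cmd --permanent --add-port=8192/tcp
sudo firewall-cmd --reload

# Confirm that the agent is listening on the port.
sudo ss -tlnp | grep 8192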

Step 4: Configure an AWS DMS replication instance using the console

For AWS DMS and Snowball Edge integration, the AWS DMS replication instance is called an AWS DMS remote replication instance. It is named this way because in this case, the instance is running on the AWS Cloud. This placement contrasts with that of the AWS DMS Replication Agent instance, which runs on your local host. For clarification on the two replication instances, see the architecture diagram (Figure 1).

For information on how to create an AWS DMS remote replication instance using the AWS Management Console, see the AWS DMS documentation.
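If you prefer the AWS CLI over the console, a minimal sketch follows; the identifier, instance class, and storage size are illustrative placeholders:

# Create the remote replication instance in the target Region.
aws dms create-replication-instance \
    --replication-instance-identifier dms-remote-replication \
    --replication-instance-class dms.c4.large \
    --allocated-storage 100 \
    --region us-west-2

# Wait until the instance status is "available" before creating tasks.
aws dms describe-replication-instances \
    --query 'ReplicationInstances[].[ReplicationInstanceIdentifier,ReplicationInstanceStatus]' \
    --output table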

Execution

Now that you’ve set up configuration, you can run the migration by using the following steps:

Step 1: Connect AWS SCT to the replication agent

In this step, you connect to the replication agent using the host name, port number, and password you provided in configuration step 3.

In the AWS SCT user interface, navigate to View, Database Migration View (Local & DMS), and choose Register.

Specify the IP address of the host, the port number, and the password used for the AWS DMS replication agent configuration, as shown in the following screenshot:

This is how it looks after registration:

Step 2: Create local and DMS tasks in AWS SCT

You can now create tasks on the local and remote AWS DMS replication instances. AWS DMS tasks are the actual workhorses that conduct the data migration.

The following steps show you how to create local and remote tasks in a single step from the AWS SCT UI:

  1. First, open the context (right-click) menu for the source schema in SCT, and choose Create Local & DMS Task.
  2. Details, including agents, replication instances, IAM roles, AWS import job names, and so on, are prepopulated from AWS SCT profile configurations and your AWS DMS resources in the AWS Region.
  3. Choose the appropriate agent, replication instance, migration type, and IAM role. Choose the job name, and type the Snowball Edge IP address. Also, type the local Amazon S3 access key and local S3 secret key you obtained during the preparation phase.

Create Local and DMS Task

As a result, two tasks are created, and you can see them in the DMS console (and the AWS SCT UI, as shown in the next section):

  1. Local task – This task migrates existing data from the source database to the Snowball Edge device, and sends any ongoing transactions to the staging S3 bucket.
  2. DMS task – This task is the one that you are used to seeing in the AWS DMS console. This task migrates existing data from the staging S3 bucket to the target database. It then applies any ongoing transactions to the target database. For clarification of the two tasks, see the architecture diagram (Figure 1).

AWS DMS Console:

Step 3: Test the connections, start the tasks, and monitor progress

You are now ready to test the connection to the source database, Snowball Edge device, and S3 bucket from the AWS DMS Replication Agent instance. To do so, choose Test on the AWS SCT UI Tasks tab.

Doing this also tests the connectivity to the staging S3 bucket and the target database from the AWS DMS remote replication instance.

Choosing Test on the AWS SCT Tasks tab tests the connectivity to the staging S3 bucket and the target database

You cannot start the tasks until all of these connection tests succeed.

The AWS DMS task remains in the running state in the console until the Snowball Edge device is shipped to AWS and the data is loaded into your staging S3 area.

The following screenshot shows the loaded data streams:

The loaded data streams

As mentioned, when the Snowball Edge is sent to AWS, the data is imported to the staging S3 bucket. There, the AWS DMS task automatically starts loading existing data into the target database (full load). The task then applies the change data capture (CDC) logs for ongoing replication. You can monitor the progress of the replication tasks in the console.
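You can also poll the same task status from the AWS CLI; a minimal sketch (the query expression simply selects a few fields from each task description):

# Show each task's identifier, status, and full-load progress.
aws dms describe-replication-tasks \
    --query 'ReplicationTasks[].[ReplicationTaskIdentifier,Status,ReplicationTaskStats.FullLoadProgressPercent]' \
    --output table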

When all existing data is migrated, and the ongoing replication process brings the source and target databases to the same transaction level, you can cut over to the target database. Your applications can now point to the new database on AWS.

Congratulations! You have migrated your multi-terabyte database or databases to AWS using AWS DMS and Snowball Edge integration.

We want to highlight the fact that you can migrate your database in this “push” model without using the Snowball Edge device at all! In that case, the local task or tasks copy the existing data from the source database to the staging S3 bucket, including the ongoing database transactions.

Additionally, by locating the extraction logic closer to the source, you can gain efficiencies in Change Data Capture (CDC). For instance, if your use case only requires migrating a subset of database tables in a highly active source database, DMS can filter out extraneous changes before transmitting to AWS. This procedure could significantly lower the network bandwidth required to migrate your data.
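Table selection in AWS DMS is expressed through table-mapping rules supplied to the task. A minimal sketch follows, writing a hypothetical mapping file that includes only a single sales.orders table (the schema and table names are placeholders):

# Hypothetical table mapping: migrate only sales.orders and filter
# out changes to every other table at the source.
cat > table-mappings.json <<'EOF'
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-orders",
      "object-locator": {
        "schema-name": "sales",
        "table-name": "orders"
      },
      "rule-action": "include"
    }
  ]
}
EOF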

The DMS tasks on the AWS DMS remote replication instance then load existing data directly to the target database. The tasks start loading the ongoing transactions once existing data is migrated. You can also use this staging S3 flow to verify that the entire process works well by testing on a small table or two before you order your Snowball Edge.

Cleaning up

As part of this migration project, you have deployed resources in your on-premises data center to run the local replication agent and the AWS Schema Conversion Tool. You have also deployed additional instances and a staging S3 bucket in the cloud. These resources incur costs for as long as they run. Be sure to remove all unwanted resources and clean up your work when you have finished with the migration.
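A minimal cleanup sketch with the AWS CLI, assuming the placeholder names used earlier in this post (substitute your own task and instance ARNs):

# Delete the DMS tasks and the remote replication instance.
aws dms delete-replication-task --replication-task-arn <task-arn>
aws dms delete-replication-instance --replication-instance-arn <instance-arn>

# Empty and remove the staging bucket after cutover is verified.
aws s3 rb s3://my-dms-staging-bucket --force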

Summary

Many AWS features and services arise from AWS teams carefully listening to real-life customer experiences and needs. This integration between AWS DMS and Snowball Edge is an excellent example of implementing the ideas that emerge from that process.

After following the procedure discussed, we were able to migrate a database hosted on-premises to an AWS managed database service (Amazon RDS) while minimizing the network footprint that such a migration typically consumes. This procedure enables customers to quickly and securely migrate databases from on-premises environments that have limited connectivity to the internet. AWS Database Migration Service and Snowball Edge devices eliminate the need for on-premises shared storage for staging database files before migration. Also, this solution does not require major changes to your network and access policies. You benefit from increased security, virtually unlimited scale, and cost savings by migrating your databases to the AWS Cloud. Given all of these advantages, your teams can focus on developing new products and capabilities that benefit your organization, rather than dealing with the operational headaches of typical database migrations.

For more information about this feature, read the AWS documentation. Please consider leaving a comment below if you have any questions or feedback, thanks!

This blog is an update to a post by Ejaz Sayyed and Mitchell Gurspan in 2017; thanks to them for their prior work.

Vinod Pulkayath

Vinod Pulkayath is a Storage and Migration Specialist at AWS. He is a technology enthusiast and has worked in the industry for over three decades in various roles. Outside of work, he enjoys spending time with the family, gardening, and martial arts.

Ryan Starck

Ryan Starck is a Senior Partner Solutions Architect at AWS based in Nashville, Tennessee. Ryan primarily supports systems integrators in the AWS Partner Network based in the Southeast US. An IT professional with nearly 20 years of experience in various engineering, architecture, and management roles, he has a variety of technology interests but tends to focus on storage and networking solutions that save costs and optimize IT operations.