AWS Storage Blog

Seamlessly migrate large SQL databases using AWS Snowball and AWS DataSync

Many of our customers use native SQL Server backup and restore features to migrate on-premises SQL Server databases to AWS. When using the native SQL Server backup and restore functionality, you can simplify the database migration process by performing a full backup restore on the target SQL instance. In addition, with the help of differential and log backups, you can significantly reduce your application cutover time.

However, migrating large (10+ TB) SQL Server databases can include many terabytes of information, and can be cumbersome for several reasons. Local network bandwidth limits may throttle your migration, or the size of your database backup may exceed the throughput of your internet, VPN, or AWS Direct Connect link to AWS. Retaining incremental backups for a long time on premises while full backups are being transferred can incur additional storage capacity costs and increase operational overhead.

In this post, I walk you through a two-step process that enables you to migrate large SQL Server databases to AWS. The first step includes transferring a one-time full backup to Amazon S3 using AWS Snowball Edge Storage Optimized to create a replica of your SQL Server in AWS. The second step uses AWS DataSync and Amazon FSx for Windows File Server (Amazon FSx) to transfer incremental backups to keep the SQL Server in AWS up to date until the cutover. As part of this solution, I cover how to simplify your large database migration using AWS Snowball Edge Storage Optimized when there is a local network bandwidth constraint. I also demonstrate how you can transfer incremental backups to free up on-premises backup storage, while durably storing SQL Server data on AWS. Doing so reduces cost for your organization by moving that data off expensive storage systems.

Solution overview

The AWS Snowball Edge Storage Optimized device is a member of the AWS Snow Family. You can use a Snowball to physically transport and migrate a large SQL Server database, and for the initial transfer of your database backups to AWS. Once you have shipped your Snowball back to AWS, your backup data is uploaded into your S3 bucket, enabling you to restore your databases to SQL Server on Amazon EC2.

Suspending operations of your on-premises SQL Server while you are waiting for the Snow device to reach AWS, for the backups to be uploaded to your S3 bucket, and then for the backups to be restored to your target SQL Server platform can be costly.

You may need to process and transfer ongoing changes in your active databases that are occurring while your backups are being delivered into S3. You can implement a hybrid solution with the initial backup transferred over the Snow device, and subsequent incremental changes transferred as differential and transaction log backups using AWS DataSync and Amazon FSx. After the initial backup reaches AWS and is restored to your target platform, your target SQL Server on EC2 can catch up by restoring subsequent backups. It can stay in sync with your production database through regular transaction log restores. When you are ready to cut over to AWS, you only need a short period of downtime to process the last transaction log backups.

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving incremental backup between on-premises to Amazon FSx file share. Amazon FSx provides fully managed SMB-based file storage, which acts as the target storage for your backup files.

The following diagram illustrates the solution architecture:

Simplify your large SQL Server databases migrations using AWS Snowball Edge

Figure 1: Solution architecture

Prerequisites

Before you get started, you must complete the following prerequisites:

  1. Set up and configure the AWS Snowball by following the AWS Snowball data migration guide.
  2. Prepare your target SQL Server database running on Amazon EC2 for migration.
  3. Make sure that you have created an AWS Managed or self-managed Active Directory for Microsoft Windows authentication.
  4. Create an Amazon FSx file system to store backups in AWS, which is the target storage for your SQL Server backup files. You can join the file system to your existing Active Directory. By joining the file system to your Active Directory, the file permissions are the same in Amazon FSx as they are on premises. The result is that your users have the same file access. AWS has created steps that guide you through the process of creating an Amazon FSx file system.
  5. Modify the source databases to use the full recovery model before taking a full database backup to permit log backups.
  6. Create a local file share on-premises. SQL Server backups will be directed to some local share, which will be monitored by AWS DataSync agent. New or changed files are copied to AWS DataSync and can be stored at Amazon FSx.
  7. Choose your Network connectivity for AWS DataSync. The two most common options are to communicate over the internet using the public service endpoints or over a private connection using VPC endpoints using AWS PrivateLink. Follow the network requirements for DataSync to configure your network environment.

Solution walkthrough

The solution consists of two stages.

In stage one, we use AWS Snowball for transferring one-time full backup to AWS and then restore backup to the SQL Server database on Amazon EC2:

  1. Create a full backup of SQL Server database from the on-premises source.
  2. Transfer the full backup files to AWS Snowball, and ship the Snow device back to AWS.
  3. After your data is loaded into an Amazon S3 bucket, transfer the data to Amazon FSx from Amazon S3 using AWS DataSync.
  4. Access your Amazon FSx file system from the EC2 instance that your SQL Server database is running on.
  5. Restore the full database backup file system to the target SQL Server running on EC2 in a “Restoring” state.

In stage two, we use AWS DataSync for ongoing data changes:

  1. Create SQL Server incremental backups to local file share on premises.
  2. Install the AWS DataSync agent on premises.
  3. Create DataSync tasks for continuous synchronization of file data from source to target Amazon FSx file system.
  4. Run the DataSync task.
  5. Monitor your DataSync task and cutover the database-dependent application to use Amazon EC2.

(Stage 1.1) Create a full backup of SQL Server database from the on-premises source

Create a full database backup by running the following T-SQL code.

BACKUP DATABASE <database_name> TO DISK = '<Filepath\database_name.bak>'
WITH NOFORMAT, NOINIT, NAME = '<database_name Full Database Backup>',
SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

To reduce the backup size and to speed up transferring of data, consider using the backup compression feature in SQL Server. See backup compression in the Microsoft documentation.

(Stage 1.2) Transfer the full backups to Snow Device and ship the snow device to AWS

The next step is to transfer the full database backup files to your Snow device and ship the Snow device to AWS. AWS has created a getting start guide and Knowledge Center article that walks you through the process on how to transfer data using An AWS Snow Family device.

(Stage 1.3) Transfer the data to Amazon FSx from Amazon S3 bucket

The next step is to create a task for DataSync by choosing the source location type as Amazon S3 bucket, destination location type as Amazon FSx for Windows File Server, and your migration settings. For instructions on how to create data transfer from S3 to in-cloud file systems, see the documentation on transferring data from S3 to in-cloud file systems in AWS documentation.

Creating a DataSync task to transfer data from Amazon S3 to Amazon FSx

Once you have specified the task settings and created the DataSync task, the next step is to start the task. You should see your backup files appearing on your Amazon FSx file system.

Once you start your task, you should see your backup files appearing on your Amazon FSx file system

(Stage 1.4) Access your Amazon FSx file system from the EC2 instance that your SQL Server database is running on

Amazon FSx file system stores your backups in AWS, and you need access to your file system on the target EC2 machine so that the backup files can be directly restored to the SQL Server running on Amazon EC2.

To access your file shares, use the Windows Network Drive Mapping functionality to map a drive letter on your compute instance to your Amazon FSx file system.

After your file system is mapped, your applications and users can access files and folders on your file system as if they are local files and folders. Refer to our documentation on how to map a file share on an Amazon EC2 Windows instance.

(Stage 1.5) Restore the SQL Server database from the Amazon FSx file share

After you transferred the file to Amazon FSx file share, it’s available for SQL Server on Amazon EC2 to restore. Restore a full database backup by executing the following T-SQL code.

USE [master]
RESTORE DATABASE <database_name> FROM DISK = '<Filepath\database_name.bak>' WITH NORECOVERY
GO

Restore a full database backup by executing the following T-SQL code

The NORECOVERY option leaves the database in a “Restoring” state after the restore has completed. This allows you to restore additional backups in the current recovery path.

(Stage 2.1) Create SQL Server incremental backups to local file share on premises

Automate your SQL Server backups in an SQL Server agent job to run on a scheduled basis and directed to your local Server Message Block (SMB) file share on premises. Create an incremental log backup to local share by executing the following T-SQL code.

BACKUP LOG <database_name>   
TO DISK = '<Filepath\database_name.trn>'
GO

(Stage 2.2) Install the AWS DataSync agent

Start by deploying a DataSync agent on-premises, where it can access your source storage location via SMB file share with SQL Server backup files. AWS provides an image for the agent deployment can be downloaded from the DataSync console.

  • If your SQL Servers are running on-premises, you can deploy AWS DataSync agent on a virtual machine.
  • If your SQL Servers are running in AWS on Amazon EC2, you can deploy AWS DataSync agent as an Amazon EC2 instance.
  • Regardless of which environment you deploy into, make sure you follow our guidance on system requirements when setting up AWS DataSync agent.

(Stage 2.3) Create DataSync migration task for scheduling backup file share transfers from the source to Amazon FSx

The next step is to create a task for DataSync. To do so, choose the source location type as Server Message Block (SMB), destination location type as Amazon FSx for Windows File Server, and your migration settings. For instructions on how to create a location, see the documentation on working with locations.

Source location type as Server Message Block (SMB), destination location type as Amazon FSx for Windows File Server

Validation: In the Task execution configuration section, choose the option to Verify only the data transferred.

Bandwidth: Set a bandwidth limit if you want to control the amount of bandwidth that AWS DataSync uses to replicate the data. By default, AWS DataSync scales to use the available bandwidth to expedite the data transfer.

Configuring DataSync task settings - task name and execution configuration

Schedule: On the Schedule option, you can specify when you want the AWS DataSync task to run.

On the Schedule option, you can specify when you want the AWS DataSync task to run

(Stage 2.4) Run the DataSync task

When you start the task, you should see the subsequent changes transferred as differential and transaction log backups to the target Amazon FSx file system.

When you start the task, you should see the subsequent changes transferred as differential and transaction log backups to Amazon FSx

After the initial backup reaches AWS and is restored to your target database in the “restoring” state, you can now start restoring the subsequent log backups and kept in sync with production database with regular transaction log restore.

(Stage 2.5) Monitor your DataSync task and cutover the database-dependent application to use Amazon EC2

You can monitor the task using CloudWatch. When the backup file is created at the source location and begins replicating to the target, it is replicated based on the task schedule configured in the previous step. If you are concerned about impacting your internet bandwidth during business hours, you can configure the AWS DataSync task to run on a custom schedule and specify off-peak hours.

When the application is ready for cutover, perform the following steps:

  1. Take the final tail T-log backup on source database.
  2. Run your DataSync task one more time and monitor the copy and make sure that the T-log file is transferred to Amazon FSx.
  3. From Amazon FSx, backup files can be directly restored to SQL Server on EC2.
  4. Restore the tail log backup on target SQL Server database using recovery option.
  5. When the application is ready for cutover to start using the target database on Amazon EC2, simply point the application to the target database using the SQL Server database endpoint in AWS.

Monitor your DataSync task and cutover the application to use Amazon EC2 - SQL Server management studio (SSMS) client

Cleaning up

When you’re finished using the resources in this post, clean up the AWS resources to avoid incurring unwanted charges. Specifically, delete the RDS for SQL Server instance, Amazon FSx file system, and DataSync objects (as in the DataSync agent, DataSync task, DataSync source location, and DataSync destination location).

Conclusion

In this blog post, I covered how to simplify large SQL Server database migrations by leveraging AWS Snowball, AWS DataSync, and Amazon FSx. I demonstrated how you can optimize a one-time full backup transfer to AWS when there is a local network bandwidth constraint using an AWS Snowball device. I also showed how you can transfer incremental backups using AWS DataSync and Amazon FSx for Windows File Server, keeping your SQL Server in AWS up to date until your application cutover. This solution can help you automate and accelerate data transfer, freeing up your on-premises storage capacity. By doing so, you can reduce your organizational costs by moving that data off expensive storage systems.

Thanks for reading this blog post. If you have any comments or questions, don’t hesitate to leave them in the comments section. To learn more about the services used in this post, check out the following resources: