AWS Database Blog

Introducing Ongoing Replication from Amazon RDS for SQL Server Using AWS Database Migration Service

We are excited to announce a new feature in AWS Database Migration Service (AWS DMS) and Amazon RDS for SQL Server that supports ongoing replication from Amazon RDS for SQL Server sources. AWS DMS helps you migrate databases to AWS quickly and more securely. It also helps you migrate data within AWS. You can migrate your data to and from most widely used commercial and open source databases, such as Oracle, Microsoft SQL Server, and PostgreSQL. The service supports homogeneous migrations, such as SQL Server to SQL Server, and also heterogeneous migrations between different database platforms, such as SQL Server to Amazon Aurora MySQL or SQL Server to Amazon RDS for MySQL.

This post gives you a quick overview of the ongoing replication process from Microsoft SQL Server sources. It also introduces a new feature to stream ongoing changes from an Amazon RDS for SQL Server source using MS-CDC (change data capture in SQL Server) and AWS DMS.

Background
AWS DMS is designed for heterogeneous migrations (for example, SQL Server to MySQL). But it also supports homogeneous migrations (for example, SQL Server to SQL Server). This design required access to the actual changes going on in the source instance—until now.

For tables that have primary keys, AWS DMS is designed to be used in the following manner by default:

  1. When you set up a task for migrating ongoing changes from SQL Server, AWS DMS first enables the database for transactional replication with the following command:
    use master
    exec sp_replicationdboption @dbname = 'db_name', @optname = 'publish', @value = 'true'
    go

    Running the sp_replicationdboption command requires sysadmin permissions on the source SQL Server. Hence, one of the requirements to use SQL Server as a source with AWS DMS (without prior setup) is to have a user with sysadmin permissions. To learn how to prepare the SQL Server source ahead of time before migrating to a supported target using AWS DMS without sysadmin permissions, see Using a SQL Server Database as a Source for AWS DMS.

  2. You then create a distribution database and a publication that contains a filter that is never true (1=0). This step is done only to make sure that you have enough logging in the transaction log in SQL Server. Doing this gives AWS DMS access to the entire update statement running on the source so that it can get more information about ongoing changes.
  3. After the preceding step is done, AWS DMS does not use any replication artifacts. When all the required information is available in the transaction log or transaction log backup, AWS DMS uses the fn_dblog() and fn_dump_dblog() functions to read changes directly from the transaction logs or transaction log backups using the log sequence number (LSN).

For tables without primary keys, AWS DMS uses MS-CDC to source the changes. MS-CDC makes the details of the changes available in an easily consumed relational format that AWS DMS can understand and migrate. The source of change data for change data capture is the SQL Server transaction log. As inserts, updates, and deletes are applied to tracked source tables, entries that describe those changes are added to the log. The log serves as input to the capture process. AWS DMS uses the log access functions described earlier to read changes from the active and backed up transaction logs based on the LSN.

Amazon RDS for SQL Server is a managed database service that helps automate time-consuming tasks like patches, backups, high availability, and so on. Given that it is a managed database service, Amazon RDS does not allow sysadmin permissions or expose all available engine features. SQL Server replication is one such feature that is not yet available in Amazon RDS. As a result, the AWS DMS team worked with the Amazon RDS SQL Server team to make the MS-CDC feature available in Amazon RDS SQL Server instances. This feature allows you to migrate/replicate from Amazon RDS SQL Server instances using AWS DMS.

By default, AWS DMS can migrate changes from an Amazon RDS SQL Server table using MS-CDC regardless of whether the table has a primary key. Note that MS-CDC in SQL Server is an enterprise edition-only feature until SQL Server 2016 SP1, in which it is also supported in standard editions of SQL Server.

Setup: Using Amazon RDS for SQL Server as a source for CDC in AWS DMS

  1. Log in as the master user in the DB instance, and set up the database for MS-CDC:
    exec msdb.dbo.rds_cdc_enable_db 'db_name'
  2. Run the following for each table to enable MS-CDC:
    exec sys.sp_cdc_enable_table 
    @source_schema = N'db_name', 
    @source_name = N'table_name', 
    @role_name = NULL, 
    @supports_net_changes = 1 
    GO
  3. Set the retention period for the changes to be available on the source using the following command:
    EXEC sys.sp_cdc_change_job @job_type = 'capture' ,@pollinginterval = 86400

The polling interval depicts the time in seconds during which the changes are available in the source DB instance for AWS DMS to consume these changes. Note that this prevents transaction log reuse for this duration and could result in an increased usage of the underlying storage as the transaction log is growing. We recommend that you set this to 24 hours so that AWS DMS has ample time to read the changes. But you could also test it with a lower polling interval based on the rate of change on the source.

Note:  Accessing the fn_dblog() function requires master user permissions in Amazon RDS for SQL Server. For Multi-AZ SQL Server Amazon RDS instances, the CDC parameters that are modified on the primary instance are not carried over to the standby instance. To ensure that these parameters are applied on the standby instance, use the rds_set_configuration stored procedure in Amazon RDS for SQL Server.

After completing this setup, you can create a source endpoint, a target endpoint, and a migration task to migrate from an Amazon RDS for SQL Server source instance to any of our supported targets. For a step-by-step example demonstrating migration from SQL Server to Aurora MySQL, see Step-by-Step Migration.

Conclusion
The ability to continuously stream data from an Amazon RDS for SQL Server instance to any AWS DMS supported target opens up support for a new source from which you can replicate data and make it available in multiple locations for further processing.

Thanks for reading! Have a successful migration/ongoing replication.


About the Author

Arun Thiagarajan is a database engineer with the Database Migration Service (DMS) & Schema Conversion Tool (SCT) team at Amazon Web Services. He works on DB migrations related challenges and works closely with customers to help them realize the true potential of the DMS service. He has helped migrate 100s of databases into the AWS cloud using DMS and SCT.