AWS Database Blog
Migrate SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking with a linked server
Customers typically choose AWS Database Migration Service (AWS DMS) to migrate SQL Server databases to Babelfish for Aurora PostgreSQL. AWS DMS supports all editions of SQL Server for full load migrations. However, for continuous replication with ongoing changes, the source SQL Server must either have transactional replication or change data capture (CDC) enabled. This allows the AWS DMS to read changes from either the transaction log file of the database or the transaction log backup and replicate them to the target database.
SQL Server editions such as Enterprise, Standard, and Developer come with transactional replication and CDC capabilities. Therefore, you can implement continuous replication from these editions to Babelfish for Aurora PostgreSQL. However, if you’re using certain editions like SQL Server Web Edition or running SQL Server workload on Azure SQL, it doesn’t support either transactional replication or CDC. In such cases, you can only perform full load without continuous replication.
To overcome this limitation of replicating ongoing changes, a workaround is to use change tracking in conjunction with a linked server. This approach allows you to track changes in the Azure SQL or SQL Server Web Edition and replicate them to the target database effectively.
In this post, we provide instructions to replicate ongoing changes using the change tracking feature available in SQL Server Web Edition (source) with the linked server feature available in the Babelfish for Aurora PostgreSQL (target).
Solution overview
Change tracking is a mechanism to track data changes in a database. When change tracking is enabled on a table, SQL Server internally tracks the modifications (insert, update, or delete) in a separate internal table that contains the primary key column of the table.
All changes since tracking was enabled can be retrieved using a function named CHANGETABLE. To learn more about change tracking, refer to Work with change tracking (SQL Server).
The following diagram illustrates the solution architecture.
To implement the solution, complete the following high-level setup steps:
- Enable change tracking at the database and table level on the source server.
- Migrate the source database with initial full load using AWS DMS to the target.
- Create a linked server at the target to the source SQL Server.
- Create an anchor table at the target.
- Extract the changed data from the source and load it to the target table.
Prerequisites
To test this solution, you need the following prerequisites:
- Source SQL Server or Azure SQL instance
- Northwind database on SQL Server
- Babelfish for Aurora PostgreSQL instance version 4.0 or later
- Knowledge on functions in SQL Server and PostgreSQL
- SQL Server Management Studio (SSMS) or another client tool for SQL Server connectivity
This solution involves the creation and utilization of new AWS resources. Therefore, it will incur costs on your account. Refer to AWS Pricing for more information. We strongly recommend that you set this up in a non-production instance and run end-to-end validations before you implement this solution in a production environment.
Enable change tracking on the source
To enable change tracking on the source server, follow these steps:
- Connect to the SQL Server instance using SSMS.
- Choose the Northwind database and choose New Query.
- Enable change tracking at the database level using the following command:
- Enable change tracking on the tables that you want to include as part of continuous replication to the target. Here, we enable change tracking on the customers table using the following command:
- Run the following command to verify change tracking is enabled on the tables:
- To test how change tracking will retrieve the updated rows, run the following commands:
- Note the current change tracking version using the function CHANGE_TRACKING_CURRENT_VERSION (). Let’s assume this value is 26.
- Update a row in the customers table.
- To find which row in customer table is updated, use the CHANGETABLE function with the version number (26 in this example) as a parameter.
It’s essential to enable change tracking on the tables that need continuous replication before starting a full load with AWS DMS. This enables the system to keep track of all data modification operations (DML) on the designated tables from the beginning of the full load.
Migrate the source database with initial full load using AWS DMS to target
To migrate the source database with full load using AWS DMS to target, perform the following steps. Refer to Migrate SQL Server to Babelfish for Aurora PostgreSQL using the Compass tool and AWS DMS to create and connect to a Babelfish for Aurora PostgreSQL cluster. The post also outlines the steps to migrate the data from SQL Server to Babelfish using AWS DMS.
- On the target server, create the schema of the Northwind database. You can download a Northwind sample schema from the GitHub repo.
- Migrate the data from the source to the target using the full load setting in AWS DMS. For instructions, refer to Using Babelfish for Aurora PostgreSQL as a target for AWS Database Migration Service.
Create a linked server at the target to the source SQL Server
To configure a linked server in Babelfish for Aurora PostgreSQL, refer to Babelfish supports linked servers. Complete the following steps:
- Connect to the target database instance using SSMS or SQLCMD and run the following command. Here, we use SQLCMD to connect to the Babelfish for Aurora PostgreSQL instance. Replace the parameters with appropriate values.
- Install the tds_fdw extension:
- Create a linked server on the target instance using the following command. Replace the @datasrc, @rmtuser, and @rmtpassword parameters with appropriate values.
- Use OPENQUERY() T-SQL as shown in the following code, or a standard four-part naming convention to reference a table, view, or other supported objects, on the remote server:
- Run the following command to get modified records in the customers table based on the initial change tracking version (for this example, we use version 0):
The following screenshot shows our output.
You have successfully queried the modified records at the source based on the change tracking version. To automate the solution, you need to store the change tracking version. You can store the change tracking versions in a separate table and call it an anchor table. This helps you retrieve only the changes since the last synchronization.
Create an anchor table at the target
After you can query the records at the source using linked server, create an anchor table at the target. Here, we create a table named CT_ANCHOR_NORTHWIND in the Northwind database using the following command. This table acts as an anchor and holds the change tracking version numbers each time the data is replicated.
After you create an anchor table, insert a row for each table to be tracked. Here, we insert the schema name, customer name, fetched change tracking version, and next change tracking version values for the customers table in the Northwind database:
The following screenshot shows our output.
During each replication, the column CT_NEXT_VER needs to be updated manually or using an automation script with the current version ID of the change tracking table. You can retrieve the next set of changed records using this value.
The following figure describes how to maintain the version numbers during each synchronization.
Extract the changed data from the source and load it to the target table
To synchronize the data between the source SQL Server and the target Babelfish for Aurora PostgreSQL, perform the following steps. You may also run these steps on a scheduled basis.
- Connect to the target database and get the value of ct_next_ver (the next version to fetch the records) from the anchor table (ct_northwind_anchor) in a variable @ct_next_ver:
- Get the current change tracking version from source in a variable @ct_src_current_ver. You can get the value using a linked server query on the target.
- Delete the records at the target matching the primary key values of the source table changed rows based on a specific change tracking version (@ct_next_ver):
- Extract the new records from the source based on a specific change tracking version (@ct_next_ver) into their destination:
- Update the column values in the anchor table with appropriate values. These values will be used to fetch the next set during the next run.
- Verify the target table values with the source to check the synchronization:
Clean up
To avoid future charges and remove the components created while testing this use case, complete the following steps:
- Connect to the source SQL Server instance using SSMS.
- Choose the master database and choose New Query.
- Execute the following command:
Summary
In this post, we demonstrated how migrate SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking with a linked server. This configuration helps when migrating your SQL Server workloads to Babelfish for Aurora PostgreSQL with minimal downtime. You can automate this solution by running scripts as a scheduled job.
If you have any comments or feedback, leave them in the comments section.
About the Authors
Chandra Pathivada is a Senior Database Specialist Solutions Architect with Amazon Web Services. He works with the Amazon RDS team, focusing on open source database engines like Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. He enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.
Minesh Chande is Senior Database Specialist Solutions Architect at Amazon Web Services. He helps customers across different industry verticals design, migrate, and optimize their SQL Server workloads to a managed database platform like Amazon RDS, Amazon RDS Custom and Babelfish for Aurora PostgreSQL.