Migrate SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking with a linked server

Customers typically choose AWS Database Migration Service (AWS DMS) to migrate SQL Server databases to Babelfish for Aurora PostgreSQL. AWS DMS supports all editions of SQL Server for full load migrations. However, for continuous replication with ongoing changes, the source SQL Server must either have transactional replication or change data capture (CDC) enabled. This allows the AWS DMS to read changes from either the transaction log file of the database or the transaction log backup and replicate them to the target database.

SQL Server editions such as Enterprise, Standard, and Developer come with transactional replication and CDC capabilities. Therefore, you can implement continuous replication from these editions to Babelfish for Aurora PostgreSQL. However, if you’re using certain editions like SQL Server Web Edition or running SQL Server workload on Azure SQL, it doesn’t support either transactional replication or CDC. In such cases, you can only perform full load without continuous replication.

To overcome this limitation of replicating ongoing changes, a workaround is to use change tracking in conjunction with a linked server. This approach allows you to track changes in the Azure SQL or SQL Server Web Edition and replicate them to the target database effectively.

In this post, we provide instructions to replicate ongoing changes using the change tracking feature available in SQL Server Web Edition (source) with the linked server feature available in the Babelfish for Aurora PostgreSQL (target).

Solution overview

Change tracking is a mechanism to track data changes in a database. When change tracking is enabled on a table, SQL Server internally tracks the modifications (insert, update, or delete) in a separate internal table that contains the primary key column of the table.

All changes since tracking was enabled can be retrieved using a function named CHANGETABLE. To learn more about change tracking, refer to Work with change tracking (SQL Server).

The following diagram illustrates the solution architecture.

To implement the solution, complete the following high-level setup steps:

Enable change tracking at the database and table level on the source server.
Migrate the source database with initial full load using AWS DMS to the target.
Create a linked server at the target to the source SQL Server.
Create an anchor table at the target.
Extract the changed data from the source and load it to the target table.

Prerequisites

To test this solution, you need the following prerequisites:

Source SQL Server or Azure SQL instance
Northwind database on SQL Server
Babelfish for Aurora PostgreSQL instance version 4.0 or later
Knowledge on functions in SQL Server and PostgreSQL
SQL Server Management Studio (SSMS) or another client tool for SQL Server connectivity

This solution involves the creation and utilization of new AWS resources. Therefore, it will incur costs on your account. Refer to AWS Pricing for more information. We strongly recommend that you set this up in a non-production instance and run end-to-end validations before you implement this solution in a production environment.

Enable change tracking on the source

To enable change tracking on the source server, follow these steps:

Connect to the SQL Server instance using SSMS.
Choose the Northwind database and choose New Query.

Enable change tracking at the database level using the following command:

ALTER DATABASE NORTHWIND SET CHANGE_TRACKING = ON(CHANGE_RETENTION = 5 DAYS, AUTO_CLEANUP = OFF)

Enable change tracking on the tables that you want to include as part of continuous replication to the target. Here, we enable change tracking on the customers table using the following command:
```
ALTER TABLE DBO.CUSTOMERS ENABLE CHANGE_TRACKING
```

Run the following command to verify change tracking is enabled on the tables:

SELECT schema_name(T.schema_id) as SCH_NAME ,T.NAME,CT.min_valid_version  ,ct.*
FROM northwind.sys.tables T LEFT OUTER JOIN sys.change_tracking_tables CT on T.object_id = CT.object_id
WHERE min_valid_version is not null

To test how change tracking will retrieve the updated rows, run the following commands:
1. Note the current change tracking version using the function CHANGE_TRACKING_CURRENT_VERSION (). Let’s assume this value is 26.
2. Update a row in the customers table.
3. To find which row in customer table is updated, use the CHANGETABLE function with the version number (26 in this example) as a parameter.

DECLARE @CTCV INT;
SELECT @CTCV = CHANGE_TRACKING_CURRENT_VERSION () ;

-- In this example , the current value is 26
SELECT @CTCV

-- update a record in customer table
UPDATE dbo.CUSTOMERS SET CompanyName ='camilo2' where CUSTOMERID = 'ALFKI'

-- Get all the changes that are made after version 26 in this example.
SELECT P.*,'--',CT.* FROM customers AS P JOIN CHANGETABLE(CHANGES northwind.dbo.customers,@CTCV ) AS CT ON  P.customerid = CT.customerid;

It’s essential to enable change tracking on the tables that need continuous replication before starting a full load with AWS DMS. This enables the system to keep track of all data modification operations (DML) on the designated tables from the beginning of the full load.

Migrate the source database with initial full load using AWS DMS to target

To migrate the source database with full load using AWS DMS to target, perform the following steps. Refer to Migrate SQL Server to Babelfish for Aurora PostgreSQL using the Compass tool and AWS DMS to create and connect to a Babelfish for Aurora PostgreSQL cluster. The post also outlines the steps to migrate the data from SQL Server to Babelfish using AWS DMS.

On the target server, create the schema of the Northwind database. You can download a Northwind sample schema from the GitHub repo.
Migrate the data from the source to the target using the full load setting in AWS DMS. For instructions, refer to Using Babelfish for Aurora PostgreSQL as a target for AWS Database Migration Service.

Create a linked server at the target to the source SQL Server

To configure a linked server in Babelfish for Aurora PostgreSQL, refer to Babelfish supports linked servers. Complete the following steps:

Connect to the target database instance using SSMS or SQLCMD and run the following command. Here, we use SQLCMD to connect to the Babelfish for Aurora PostgreSQL instance. Replace the parameters with appropriate values.
```
sqlcmd -S your-DB-instance.aws-region.rds.amazonaws.com -U test -P password
```

Install the tds_fdw extension:

EXEC sp_execute_postgresql N'CREATE EXTENSION tds_fdw';

Create a linked server on the target instance using the following command. Replace the @datasrc, @rmtuser, and @rmtpassword parameters with appropriate values.

Use Northwind; 
GO 
EXEC master.dbo.sp_addlinkedserver @server=N'ls_northwind', @srvproduct=N'', @provider=N'SQLNCLI', @datasrc=N'myserver.xxxxx.US-WEST-2.RDS.AMAZONAWS.COM', @catalog='northwind'; 
GO 
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N' ls_northwind',@useself=N'False',@locallogin=NULL,@rmtuser=N'username',@rmtpassword='password';

Use OPENQUERY() T-SQL as shown in the following code, or a standard four-part naming convention to reference a table, view, or other supported objects, on the remote server:
```
SELECT * FROM OPENQUERY(ls_northwind, 'SELECT * FROM customers');
```
Run the following command to get modified records in the customers table based on the initial change tracking version (for this example, we use version 0):
```
SELECT * FROM OPENQUERY(ls_northwind, 'SELECT C.* FROM customers AS C JOIN CHANGETABLE(CHANGES northwind.dbo.customers,0 ) AS CT ON  C.customerid = CT.customerid');
```
The following screenshot shows our output.

You have successfully queried the modified records at the source based on the change tracking version. To automate the solution, you need to store the change tracking version. You can store the change tracking versions in a separate table and call it an anchor table. This helps you retrieve only the changes since the last synchronization.

Create an anchor table at the target

After you can query the records at the source using linked server, create an anchor table at the target. Here, we create a table named CT_ANCHOR_NORTHWIND in the Northwind database using the following command. This table acts as an anchor and holds the change tracking version numbers each time the data is replicated.

CREATE TABLE ct_anchor_northwind
(
sch_name varchar(256),
tbl_name varchar(256),
ct_fetched_ver int, -- 0 during first execution
ct_next_ver int  -- current version at the source database , to fetch next version of the records
)

After you create an anchor table, insert a row for each table to be tracked. Here, we insert the schema name, customer name, fetched change tracking version, and next change tracking version values for the customers table in the Northwind database:

insert into ct_anchor_northwind (sch_name,tbl_name,ct_fetched_ver,ct_next_ver) values ('dbo' ,'customers',0,0)

The following screenshot shows our output.

During each replication, the column CT_NEXT_VER needs to be updated manually or using an automation script with the current version ID of the change tracking table. You can retrieve the next set of changed records using this value.

The following figure describes how to maintain the version numbers during each synchronization.

Extract the changed data from the source and load it to the target table

To synchronize the data between the source SQL Server and the target Babelfish for Aurora PostgreSQL, perform the following steps. You may also run these steps on a scheduled basis.

Connect to the target database and get the value of ct_next_ver (the next version to fetch the records) from the anchor table (ct_northwind_anchor) in a variable @ct_next_ver:

declare @ct_next_ver varchar(9)
SELECT @ct_next_ver = ct_next_ver FROM ct_anchor_northwind WHERE sch_name = 'dbo'  and tbl_name =  'customers'
print @ct_next_ver
-- result: 0

Get the current change tracking version from source in a variable @ct_src_current_ver. You can get the value using a linked server query on the target.

declare @ct_src_current_ver varchar(9)
SELECT @ct_src_current_ver  = src_current_ver FROM OPENQUERY(ls_northwind,'select   change_tracking_current_version ()  as src_current_ver from northwind.sys.change_tracking_tables')
print @ct_src_current_ver
-- result: 10

Delete the records at the target matching the primary key values of the source table changed rows based on a specific change tracking version (@ct_next_ver):

DELETE FROM [dbo].[customers] from [dbo].[customers] x
JOIN ( SELECT * from openquery(ls_northwind,'select x.[customerid] FROM [northwind].[dbo].[customers] x  JOIN changetable(changes [northwind].[dbo].[customers],0) as y on x.[customerid]=y.[customerid] ')) y on x.[customerid]=y.[customerid] ;
-- result : 2 rows affected

Extract the new records from the source based on a specific change tracking version (@ct_next_ver) into their destination:

INSERT INTO [dbo].[customers]([customerid], [companyname], [contactname], [contacttitle], [address], [city], [region], [postalcode], [country], [phone], [fax])
SELECT * FROM OPENQUERY(ls_northwind,'select x.[customerid], x.[companyname], x.[contactname], x.[contacttitle], x.[address], x.[city], x.[region], x.[postalcode], x.[country], x.[phone], x.[fax] from [northwind].[dbo].[customers] x  join changetable(changes [northwind].[dbo].[customers],0) as y on x.[customerid]=y.[customerid] ')
-- result: 2 rows affected

Update the column values in the anchor table with appropriate values. These values will be used to fetch the next set during the next run.

UPDATE ct_anchor_northwind  SET ct_fetched_ver = @ct_next_ver  , ct_next_ver = @ct_src_current_ver
WHERE sch_name = 'dbo' and tbl_name = 'customers'

Verify the target table values with the source to check the synchronization:
```
SELECT * FROM customers WHERE customerid  = 'alfki'
```

Clean up

To avoid future charges and remove the components created while testing this use case, complete the following steps:

Connect to the source SQL Server instance using SSMS.
Choose the master database and choose New Query.
Execute the following command:

Drop database Northwind
GO

Summary

In this post, we demonstrated how migrate SQL Server databases to Babelfish for Aurora PostgreSQL using change tracking with a linked server. This configuration helps when migrating your SQL Server workloads to Babelfish for Aurora PostgreSQL with minimal downtime. You can automate this solution by running scripts as a scheduled job.

If you have any comments or feedback, leave them in the comments section.

About the Authors

Chandra Pathivada is a Senior Database Specialist Solutions Architect with Amazon Web Services. He works with the Amazon RDS team, focusing on open source database engines like Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. He enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

Minesh Chande is Senior Database Specialist Solutions Architect at Amazon Web Services. He helps customers across different industry verticals design, migrate, and optimize their SQL Server workloads to a managed database platform like Amazon RDS, Amazon RDS Custom and Babelfish for Aurora PostgreSQL.

AWS Database Blog