AWS Database Blog
Configure change data capture parameters on Amazon RDS for SQL Server
AWS Database Migration Service (AWS DMS) is a managed migration and replication service that helps you move your database and analytic workloads to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. AWS DMS can migrate data to and from most widely used commercial and open source databases.
SQL Server is a relational database developed by Microsoft. Amazon Relational Database Service (Amazon RDS) for SQL Server makes it straightforward to set up, operate, and scale SQL Server deployments in the cloud. Amazon RDS supports data replication via change data capture (CDC), and having CDC enabled is one of the prerequisites for using Amazon RDS for SQL Server with AWS DMS. CDC captures changes that are made to the data in the tables. It stores metadata about each change, which you can access later.
In this post, we do a deep dive on CDC parameters and explain their impact when configuring AWS DMS, in addition to discussing a few best practices.
Prerequisites
To follow along with this post, you should have familiarity with the following AWS services:
- AWS DMS
- Amazon RDS for SQL Server
Additionally, you need an AWS account with sufficient privileges to launch the resources necessary for this solution.
How AWS DMS works with Amazon RDS for SQL Server
For Amazon RDS for SQL Server, AWS DMS reads the transaction log (T-log) using Microsoft functions and gets the top 50,000 events by default. AWS DMS starts by querying the database log on the specific partition IDs that relate to the tables defined in the AWS DMS task. Partition IDs are read during each table reload, task restart, and task resume during both full load and CDC. AWS DMS retrieves the object IDs and obtains the data partition IDs corresponding to those object IDs. After you get the partition IDs, you fetch the relevant partitions from the T-log. This cycle runs in intervals of every second.
The following diagram illustrates the architecture. In this example, we use Amazon RDS for SQL Server as a source. The target for the AWS DMS task can be any supported endpoint.
AWS DMS reads the T-log using Microsoft functions, and requires CDC to be enabled on the source database and the tables that will be in scope of the AWS DMS task.
Why is CDC required for AWS DMS?
When CDC is enabled on a table, SQL Server creates a table in the cdc
schema for this table. The changed table is populated with the change data and is assigned a name based on the schema and table being tracked. For instance, if you have a table called customer
under the dbo
schema, a table named cdc.customer_CT
would be created on the cdc
schema to record all changes against the dbo.customer
table.
AWS DMS doesn’t read from the change tables. AWS DMS requires CDC to be enabled to make sure enhanced logging is captured in the T-log for AWS DMS to read the changes. As explained in the previous section, AWS DMS uses Microsoft functions to read the T-log. Consider the following table on the source:
If you issue an UPDATE statement on this table and update the [name]
column, you can see the difference between the [RowLog Contents 0]
and [RowLog Contents 1]
captured with and without CDC. For the sake of brevity, we have included a snippet of the following query that AWS DMS runs on the source:
The output of the query shows the complete information captured in the T-log in the second record (the UPDATE statement was issued after enabling CDC).
Current LSN | operation | RowLog Contents 0 | RowLog Contents 1 |
0000014f:0000c16d:0002 | LOP_MODIFY_ROW | 0x1800746573747573657267 | 0x190074657374757365726162 |
0000014f:0000c9ba:0016 | LOP_MODIFY_ROW | 0x30000800020000000200000100190074657374757365726162 | 0x300008000200000002000001001800746573747573657267 |
Understanding CDC parameters
For CDC, two jobs are created:
- Capture job – Scans through the T-log file to read the changes and pushes those changes to the change tracking tables
- Cleanup job – Cleans up the records in the change tracking tables which exhausted the retention period
The following are the CDC parameters that pertain to AWS DMS:
- max_trans – Maximum number of transactions to process in each scan cycle
- max_scans – Maximum number of scan cycles to run in order to extract all rows from the log
- continuous – Indicates whether the capture job is to run continuously (1) or only once (0)
- polling_interval – Number of seconds between log scan cycles
- retention – Number of minutes that change rows are to be retained in change tables
Although AWS DMS doesn’t read the change tables, you must tune the CDC parameters to control the retention of changes in the T-log.
In the next section, we explain how the parameters max_trans
, max_scans
, and polling_interval
help in retaining the log records in the T-log and how to tune them so that changes are retained for sufficient duration for AWS DMS to capture changes.
CDC parameters in action
We walk through the following steps to illustrate these parameters:
- Create a database called
dmscdc
and a table calleddmstestcdc
under the database: - Enable CDC on the database
dmscdc
and the tabledmstestcdc
:You have to tune the CDC parameters to make sure that the log records are retained for sufficient period so that AWS DMS will be able to query the transaction record—that is, the specific log sequence number (LSN) it is looking for in the transaction log file of the source database. These are subjected to the following factors:
- How many transactions is the target behind compared to the source?
- What is the polling interval, specifically the frequency with which the CDC job runs?
- What is the value of the
Maxtrans
andMaxscans
? These parameters determine how many transactions CDC will process on each run.
- Configure the capture job as follows. You must stop and start the CDC job every time you change the capture job parameters, which in our case is
pollinginterval
: - Confirm the CDC parameters by running the following command:
job_id job_type job_name maxtrans maxscans continuous pollinginterval retention threshold A49487C5-BF3C-4A8C-9385-6AFA7A3541B9 capture cdc.dmscdc_capture
500 10 1 3599 0 0 17511020-59D2-4C9E-BEA9-0578C0D23B11 cleanup cdc.dmscdc_cleanup
0 0 0 0 4320 5000 With the preceding setting, the capture job will process 5,000 records (
maxtrans
*maxscans
) with a frequency of 1 hour. - Insert a few records into the table
dmstestcdc
to confirm this:The capture job reads the preceding transactions from the T-log and marks those as
replicated
, which is 100,001 records in our case. When the CDC job runs, the capture job will mark those transactions as done. - Check the CDC sessions by running the following query, which should fetch 10 rows. The query will tell us how many records CDC processed, which is 5,000 in our case.
tran_count | start_time | end_time | scan_phase |
500 | 2023-12-07 20:34:15.100 | 2023-12-07 20:34:15.123 | Done |
500 | 2023-12-07 20:34:15.067 | 2023-12-07 20:34:15.083 | Done |
500 | 2023-12-07 20:34:15.037 | 2023-12-07 20:34:15.053 | Done |
500 | 2023-12-07 20:34:15.003 | 2023-12-07 20:34:15.023 | Done |
500 | 2023-12-07 20:34:14.963 | 2023-12-07 20:34:14.990 | Done |
500 | 2023-12-07 20:34:14.927 | 2023-12-07 20:34:14.950 | Done |
500 | 2023-12-07 20:34:14.883 | 2023-12-07 20:34:14.910 | Done |
500 | 2023-12-07 20:34:14.840 | 2023-12-07 20:34:14.870 | Done |
500 | 2023-12-07 20:34:14.797 | 2023-12-07 20:34:14.827 | Done |
500 | 2023-12-07 20:34:14.540 | 2023-12-07 20:34:14.773 | Done |
The preceding records will be purged from the T-log when the backup of the T-log happens on Amazon RDS for SQL Server, typically every 5 minutes, which helps in maintaining the size of the T-log and moving the LSN forward. The remaining records (95,001) will be picked up by the subsequent run of the capture job.
SQL Server doesn’t flush the T-log until after the transactions have been read by CDC. You need to strike a balance between how many records you’re retaining in the T-log and the AWS DMS replication lag. In this case, we make the capture job parameters aggressive by defining a shorter polling interval; then there can be a scenario that the LSN might be missing from the T-log. To avoid T-log truncation and make sure changes are retained in the T-log for sufficient duration, we recommend setting the polling interval to 1 day by running the following command:
Capture historical information of CDC
To monitor the historical information of the capture job, you can query the sys.dm_cdc_log_scan_sessions table. The table contains one row for each log scan session in the current database. It contains up to 32 scan sessions. Run the following query to get the latest 10 records:
The following is a sample output.
session_id | start_time | end_time | duration | scan_phase | error_count | tran_count | command_count | last_commit_cdc_time | latency | empty_scan_count | failed_sessions_count |
0 | 2023-12-07 19:21:27.283 | 2023-12-08 00:34:12.837 | 6 | Aggregate | 0 | 125001 | 125001 | 2023-12-07 19:50:32.657 | 17020 | 0 | 0 |
651 | 2023-12-08 00:34:12.820 | 2023-12-08 00:34:12.837 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:32.657 | 17020 | 0 | 0 |
650 | 2023-12-08 00:34:12.790 | 2023-12-08 00:34:12.810 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:31.700 | 17021 | 0 | 0 |
649 | 2023-12-08 00:34:12.760 | 2023-12-08 00:34:12.780 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:30.707 | 17022 | 0 | 0 |
648 | 2023-12-08 00:34:12.703 | 2023-12-08 00:34:12.723 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:29.757 | 17023 | 0 | 0 |
647 | 2023-12-08 00:34:12.670 | 2023-12-08 00:34:12.693 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:28.620 | 17024 | 0 | 0 |
646 | 2023-12-08 00:34:12.633 | 2023-12-08 00:34:12.660 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:27.523 | 17025 | 0 | 0 |
645 | 2023-12-08 00:34:12.587 | 2023-12-08 00:34:12.620 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:26.527 | 17026 | 0 | 0 |
644 | 2023-12-08 00:34:12.530 | 2023-12-08 00:34:12.573 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:25.490 | 17027 | 0 | 0 |
643 | 2023-12-08 00:34:12.500 | 2023-12-08 00:34:12.520 | 0 | Done | 0 | 500 | 500 | 2023-12-07 19:50:24.450 | 17028 | 0 | 0 |
Best practices and known issues
In this section, we discuss some best practices and considerations pertaining to the CDC parameters.
T-Log records truncated on failover in Multi-AZ instances
Always make sure that when the CDC parameters are changed on the primary instance, they are retained on the failover by running the rds_set_configuration command.
For instance, you can run the following sample command on the database dms_test
to set the maxtrans
and pollinginterval
parameters:
Run the following commands to make sure these values are retained post-failover:
Planned failovers or maintenance of the AWS DMS replication instance
For Amazon RDS for SQL Server, you need to make sure that every time the AWS DMS task is stopped for any maintenance activity on the source or during a planned scaling of the related AWS DMS replication instance, the capture job doesn’t run. When the capture job runs, the scanned events will be purged from the T-log when the T-log backup happens on Amazon Simple Storage Service (Amazon S3) every 5 minutes.
- Stop the capture job by running the following command:
- Stop the AWS DMS task.
- Finish your desired maintenance.
- Resume the AWS DMS task.
- Wait for the source latency to be 0.
- Start the capture job by running the following command:
The AWS DMS task will fail with the following error message if the aformentioned sequence of steps are not followed:
If you observe that the LSN is getting truncated on the source after stopping the capture job, there might not be any CDC events in the active T-log that could have prevented the truncation. This can arise when the database is idle or has fewer transactions. In this scenario, the sequence of steps is as follows:
- Stop the capture job by running the following command:
- Make sure there are some transactions or changes in the CDC-enabled database before stopping the AWS DMS task. You can run a script that runs DML statements every second. If you want to create a test script, you can follow the instructions given later in this section.
- Stop the AWS DMS task.
- Finish your desired maintenance.
- Resume the AWS DMS task.
- Wait for it to sync by monitoring the source latency.
- Stop the script that you set up in Step 2.
- Start the capture job by running the following command:
Follow these instructions to set up a script to run the test script mentioned in Step 2. In the following script, you create a table called test_table under the dbo” schema and then enable CDC on the test_table table. You then set up a SQL Server agent job that will insert a record and delete the record into the aforementioned table. This makes sure there are changes in the T-log that need to be picked up by the CDC job and therefore will prevent T-log truncation.
- Create the test table:
- Add the new table to CDC:
- Create a SQL Server agent job in Amazon RDS to insert or delete a record every 1 minute. Use the appropriate owner_login_name and database_name values in your agent job:
- On the AWS DMS console, exclude this table from your AWS DMS task using mapping rules if you’re using any wildcards (%) in the table selection rules of your AWS DMS task that would replicate this table:
Planned restart or failover of the RDS for SQL Server instance
The RDS for SQL Server agent service restarts whenever there is a reboot of the RDS for SQL Server instance or a failover and this causes the CDC job to rerun after the reboot or failover. To avoid the truncation of the T-log, follow these steps:
- Stop the AWS DMS task.
- Capture the current
maxtrans
andmaxscans
values, which you will revert after the failover: - Change the CDC configuration to set
maxtrans
andmaxscans
to 1: - Run the following statement so that the CDC parameters are retained after the failover:
- Restart the RDS for SQL Server instance.
- Resume the AWS DMS task.
- Restart the captured job with the restored configuration. In the following script, we have assumed
maxtrans
of 500 andmaxscans
of 10, but you should use the values that were captured in Step 2: - Run the following statement so that the CDC parameters are retained after the failover:
Clean up
To avoid incurring recurring charges, clean up your resources:
- On the AWS DMS console, delete any AWS DMS task that you set up.
- Drop the database by running the following command:
Conclusion
In this post, we shared the importance of configuring CDC parameters when using Amazon RDS for SQL Server as a source for configuring AWS DMS tasks, and also discussed some best practices. If you have any feedback or questions, leave them in the comments.
About the Authors
Suchindranath Hegde is a Data Migration Specialist Solutions Architect at Amazon Web Services. He works with our customers to provide guidance and technical assistance on data migration to the AWS Cloud using AWS DMS.
Abhishek Chaturvedi is a Sr. Database Engineer on the Amazon Web Services DMS team.
Mahesh Kansara is a Database Engineering manager at Amazon Web Services. He closely works with development and engineering teams to improve the migration and replication service. He also works with our customers to provide guidance and technical assistance on various database and analytical projects, helping them improving the value of their solutions when using AWS.
Junu Thankappan is a Senior Database Engineer with Amazon Web Services. He works with the AWS RDS team, focusing on commercial database engines and SQL Server.