AWS Database Blog

How AWS DMS handles open transactions when starting a full load and CDC task

For a relational database management system (RDBMS), consistency is one of the crucial properties of a transaction – it defines the rules for maintaining data points in a correct state after a transaction. The consistency of the data ensures that transactions only make changes to tables in predefined, predictable ways thereby preventing unintended consequences for the integrity of your data. For a typical migration approach from an RDBMS with reduced downtime, a consistent snapshot should first be taken from the source database. Then a transaction position from where the consistent snapshot was taken should be recorded in order to establish replication to be continuously consistent with the source database. Conceptually, if there are any gaps or missing transactions between the snapshot and the start of the replication, the target database of the migration won’t be consistent.

AWS Database Migration Service (AWS DMS) helps you migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. To manage the initial data load or the AWS DMS full load phase and the transaction position where change data capture (CDC) starts from a consistent state, AWS DMS has implemented the task setting TransactionConsistencyTimeout to handle open transactions when the AWS DMS task starts. In this post, we discuss how AWS DMS handles open transactions when starting a full load and CDC tasks for various engines, and share best practices to avoid consistency timeout issues.

Common questions about TransactionConsistencyTimeout

The following are some typical questions about TransactionConsistencyTimeout depending on your source database:

  • Oracle source – Why does my task hang in a Before load state for 10 minutes before it starts? What does an open transaction mean in my task log? What is the consequence of an open transaction timeout?
  • SQL Server source – Why does my task hang in a Before load state when I start my task?
  • PostgreSQL source – Why did my task fail with a statement timeout at the beginning of the task though I don’t have statement_timeout set?
  • MySQL source – Will long transactions block the full load and CDC task from starting?

To answer these questions, let’s look at what open transactions mean in AWS DMS.

Open transactions in AWS DMS

An open transaction in AWS DMS is a transaction in the source relational database that hasn’t committed or rolled back when the AWS DMS full load and CDC task start. Similarly, when using the migration approach mentioned earlier, AWS DMS needs a consistent state for the initial bulk load to start and the transaction position for CDC. Because different RDBMS engines implement transaction logs differently, AWS DMS also behaves differently for various RDBMS sources. In the following sections, we look at how AWS DMS handles open transactions for Oracle, SQL Server, PostgreSQL, and MySQL as the source.

How AWS DMS handles open transactions for an Oracle source

For a full load and CDC task, the task setting TransactionConsistencyTimeout defines the number of seconds that AWS DMS waits for transactions to close before beginning a full load operation. The default value is 600 (10 minutes). If any transaction is open when the task starts, AWS DMS waits for 10 minutes by default. After the TransactionConsistencyTimeout value is reached, even if there are open transactions, the full load starts and transactions that are longer than TransactionConsistencyTimeout are missed in the future replication.

In the following example graph, when the full load and CDC task starts, there are two open transactions identified by the AWS DMS source capture process, out of six transactions total:

Open Transaction handle by AWS DMS

  • Transaction 1 – Although it’s an open transaction when the source capture starts, it commits within the TransactionConsistencyTimeout time frame and before the full load starts. Therefore, it’s included within the full load phase.
  • Transaction 2 – It starts and commits during the TransactionConsistencyTimeout time frame and before full load starts. Therefore, it’s included within the full load phase.
  • Transactions 3 and 4 – They commit during the full load phase, so become cached changes, and are applied after the full load finishes.
  • Transaction 5 – It starts during full load phase, and commits after the full load finishes, so it becomes a CDC change, and is applied after cached changes are applied.
  • Transaction 6 – Task continues as FailOnTransactionConsistencyBreached is false by default. Open transaction B is skipped, which results in data loss because it was committed after TransactionConsistencyTimeout. We will discuss best practices in a later section regarding the setting for TransactionConsistencyTimeout and FailOnTransactionConsistencyBreached.

The following log shows an example of typical AWS DMS log signatures when there is an open transaction. The SOURCE_CAPTURE component identifies the number of open transactions, and SORTER prints a warning for transaction consistency timeout after TransactionConsistencyTimeout.

xxxx-xx-xxT00:00:00 [SOURCE_CAPTURE  ]I:  Opened transaction list contains '6' transactions, these in-flight transaction(s) will not be copied  (oracle_endpoint_capture.c:967)
…
xxxx-xx-xxT00:00:00 [SORTER          ]I:  6 open transactions. Waiting for transaction consistency  (sorter_transaction.c:322)
…
xxxx-xx-xxT00:10:00 [SORTER          ]W:  Transaction consistency timeout occurred. 6 transactions are still open  (sorter_transaction.c:3575)

How AWS DMS handles open transactions for SQL Server source

Similarly, when the full load and CDC task starts using SQL Server as source, if there are any open transactions, AWS DMS SOURCE_CAPTURE identifies and prints logs as follows:

[SOURCE_CAPTURE  ]I:  do_get_best_lowset_position_for_highest_lsn_lookup(...) There is an outstanding active transaction!! It's oldest LSN is - '00000140:000217da:0001' It may be used as a '2nd chance'  SELECT MAX() candidate.  (sqlserver_log_queries.c:750)

For the READ COMMITTED isolation level when READ_COMMITTED_SNAPSHOT is OFF, which is the default isolation in SQL Server, the statements aren’t allowed to read data that has been modified but not yet committed by other transactions. As a result, for a table with open transactions that is being migrated by a full load and CDC task, select * from the exact table is blocked until the active transactions of that table in the task scope get committed or roll back. If an open transaction is on the table that isn’t in the task scope, AWS DMS waits for some time and then starts the task. AWS DMS doesn’t wait for commit or rollback to start the full load and CDC task, and it doesn’t make any difference.

As a result of this behaviour, there is no data loss. However, the full load and CDC task are constantly in a Before load state until the open transactions associated with the table in the task scope commit or roll back.

How AWS DMS handles open transactions for a PostgreSQL source

For a PostgreSQL source, AWS DMS uses the PostgreSQL plugin pglogical or test_decoding to read the Write-Ahead Log (WAL) via the SOURCE_CAPTURE component. However, when there are active transactions, creating a replication slot like select * from pg_create_logical_replication_slot('slot_test','test_decoding'); will hang. As a result, AWS DMS can’t create a replication slot to capture changes. It waits for TransactionConsistencyTimeout before the statement is timed out. As long as all open transactions commit or roll back within TransactionConsistencyTimeout, the full load and CDC task starts normally and there is no data loss. If open transactions run longer than TransactionConsistencyTimeout, it fails with the CDC prerequisite check and prints logs as follows:

[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: 42P01 NativeError: 1 Message: ERROR: relation "pglogical.replication_set" does not exist; No query has been executed with that handle [1022502] (ar_odbc_stmt.c:3752)

[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: 57014 NativeError: 1 Message: ERROR: canceling statement due to statement timeout; Error while executing the query [1022502] (ar_odbc_stmt.c:2581)

[SOURCE_CAPTURE ]E: Could not find any supported plugins available on source (postgres_plugin.c:269)

How AWS DMS handles open transactions for a MySQL source

For a MySQL source, AWS DMS uses binlog to capture changes. MySQL binlog doesn’t contain uncommitted transactions. Therefore, when the full load and CDC task starts, AWS DMS can only see committed transactions. Any open transactions are treated as cached changes or CDC changes depending on when the changes are committed, that is, during full load or after full load. As a result, there isn’t a transaction consistency timeout issue on the full load and CDC task when using MySQL as the source.

Best practices

The following are best practices for handling open transactions:

  • As part of data migration assessment, check the source database well in advance for long-running transactions and work with the application or service owner to have a process to minimize the long-running transactions in the database, at least during the migration phase.
  • For an Oracle source, there can be data loss when transaction consistency timeout errors occur. Therefore, it’s better to fail the task earlier by setting FailOnTransactionConsistencyBreached to true.
  • Start the full load and CDC task during a low traffic window of the source database. It’s better to wait until a scheduled dump or extract, transform, and load (ETL) job to finish in the source database before starting the full load and CDC task. Not only does this reduce the chance of encountering a transaction consistency timeout issue, it also reduces the number of ongoing changes to capture for the task to catch up with the source faster.
  • When the full load and CDC task starts and stays in Before load for a long time, it’s recommended to check for any open transactions from the source database.
  • TransactionConsistencyTimeout is 10 minutes by default. If the source database constantly has transactions longer than 10 minutes, increase TransactionConsistencyTimeout.
  • Use AWS DMS validation to identify data missing and consistency issues.

Conclusion

Data loss is undesirable for any database migration project. In this post, we described open transactions and how AWS DMS handles them when starting a full load and CDC task for different database engines. We shared best practices to follow during the planning and running of AWS DMS tasks to avoid open transaction-related issues.

If you have any questions or suggestions, leave a comment.


About the authors

Wanchen Zhao is a Partner Solutions Architect at AWS

Rishi Raj Srivastava is a database engineer with the AWS DMS team at Amazon Web Services.

Sudip Acharya is a Consultant with the AWS ProServe team in India. He works with internal and external Amazon customers to provide guidance and technical assistance on database modernization projects, helping them improve the value of their solutions when using AWS. His passion is to design and implementation of database or migration related tools.