What are the best practices for migrating a RDBMS database to Amazon Redshift using AWS DMS?
Last updated: 2022-08-18
I have a Relational Database Management System (RDBMS) database that I want to migrate to Amazon Redshift using the AWS Database Migration Service (AWS DMS). What are some best practices that I can follow when migrating an RDBMS database to Amazon Redshift?
Use the best practices in this article to help you migrate RDBMS databases to Amazon Redshift using AWS DMS.
- Use parallel load with your table-setting rules when you migrate data to an Amazon Redshift endpoint. For more information on using parallel load, and how it can be used to speed up migration, see Using parallel load for selected tables, views, and collections.
- AWS DMS versions 3.4.5 and later include support for the use of the ParallelLoad task settings during full load to Amazon Redshift target endpoints. For more information, see Multithreaded full load task settings for Amazon Redshift.
- If you are migrating a large amount of data to Amazon Redshift, increase the endpoint attribute values for maxFileSize and fileTransferUploadStreams. Doing this can help improve full load performance. Then, increase CommitRate value to improve performance.
- Because AWS DMS doesn't support Full Lob Mode when you use Amazon Redshift as a target for your databases, be sure to configure LobMaxSize to an appropriate value.
- Make sure that your Amazon Redshift tables have a primary key. If your target table doesn't have a primary key, then inserts are made in BatchApply mode. But because updates and deletes are then applied one-by-one, this affects performance. So, be sure that AWS DMS migrates CDC data in batches to the Amazon Redshift target to avoid any performance issues.
- Adjust the batch apply settings according to the workload to reduce the number of commits on the target. Batch apply settings include BatchApplyTimeoutMin, BatchApplyTimeoutMax, and BatchApplyMemoryLimit.
- Use BatchSplitSize to set the maximum number of changes applied in a single batch. The default value is 0, which means that there is no limit applied.
- Create target Amazon Redshift tables with the appropriate distribution and sort keys. When AWS DMS creates tables for Amazon Redshift targets, it creates tables with a default distribution key and no sort keys. Amazon Redshift stores data on disk in a sorted order, according to the sort key. The Amazon Redshift query optimizer uses sort order when determining optimal query plans. So, when you run a query, the query optimizer redistributes rows to the compute nodes as needed to perform any joins and aggregations. For more information, see Choose the best sort key and Working with data distribution styles.
- If you have a heavy transactional workload, use settings similar to these. These example settings mean that an 8000-MB batch buffer is filled in 1800 seconds, using 32 parallel threads, and with a 250-MB maximum file size.
Amazon Redshift endpoint settings:
BatchApplyEnabled=true; BatchSplitSize =8000; BatchApplyTimeoutMax =1800; BatchApplyTimeoutMin =1800; ParallelApplyThreads=32; ParallelApplyBufferSize=100;