What are the best practices to use when migrating PostgreSQL databases to a target RDS for PostgreSQL database using AWS DMS?

Last updated: 2022-08-29

I have a PostgreSQL source database that I want to migrate to a target Amazon Relational Database Service (Amazon RDS) for PostgreSQL source database. What are some best practices I can use to migrate from one PostgreSQL database to another using AWS Database Migration Service (AWS DMS)?

Short description

When you migrate homogeneous databases using AWS DMS, copy the initial data using your engine's native tools, such as pg_dump. Then, perform pg_restore on the target. You can also use logical replication, and the COPY command. For more information, see Best practices for migrating PostgreSQL databases to Amazon RDS and Amazon Aurora.

To migrate from an RDS for PostgreSQL database to another RDS for PostgreSQL database, take a snapshot, and then restore the snapshot as the target. For more information, see Migrating a snapshot of an RDS for PostgreSQL DB instance to an Aurora PostgreSQL DB cluster.

Be aware that migrating all of your data from a source database to a target database using AWS DMS full load can take a long time. You might experience delays due to factors like:

  • Bandwidth
  • Source capacity to push large amounts of data
  • Replication engine capacity to store, process, and forward bulk loads
  • Target capacity to consume data from the source

Comparatively, change replication contains only changes from the source to the target, so workloads like this can be very light.

Create and determine the current log sequence number (LSN)

Before you take a backup, you must get a marker to instruct your AWS DMS task where to begin migrating changes from.

On the source PostgreSQL database, run these queries to create and determine the current LSN.

Create the replication slot:

SELECT * FROM
pg_create_logical_replication_slot('replication_slot_name','tset_decoding')

Get the current LSN:

SELECT restart_LSN  FROM pg_replication_slots WHERE slot_name = 'replication_slot_name';

The restart_LSN command tells the AWS DMS task where to start migrating changes from source to target.

Resolution

Use these best practices for migrating data from PostgreSQL to PostgreSQL using AWS DMS tasks.

Don't use foreign keys and triggers during full load

When full load is taking place, make sure that foreign keys and triggers aren't included for migration. AWS DMS migrates tables in alphabetical order, but it doesn't know which tables are parent tables and which are child tables. So, AWS DMS might try to migrate child tables first. AWS DMS then stops the tables from migrating because of a foreign key violation. So, either suppress or don't include foreign keys on the target during migration.

Triggers must never be present on a target during migration because they perform several processes that might corrupt data on the target. Add any triggers at cut-over.

Turn on Full LOB Mode when migrating JSON Data

When you migrate LOB in the form of JSON, turn on full LOB Mode so that JSON format isn't truncated. If you use Limited LOB mode, data truncation might occur. Then, AWS DMS makes sure that the table fails because the JSON is in the wrong format.

Make sure that the primary key field is not TEXT data type

Check that the primary key field is not TEXT, especially if full LOB mode is turned on. You might experience DUPLICATES of NULLs because AWS DMS treats TEXT data type as LOB. So, during full load, AWS DMS tries to make the primary key NULL, and then reports a duplicate because there are many text columns with the same value. The error is never treated as "NULL not allowed on Primary Key", but as duplicates. This can be a difficult issue to discover and resolve, so always confirm that your primary key field is not TEXT to avoid the issue.

Allow AWS DMS to create target tables

If full load is taking place, allow AWS DMS to create tables on the target. When AWS DMS creates tables, it also creates the matching fields without DEFAULT values for the columns. Default values on columns can cause unexpected behavior in AWS DMS. For example, SERIAL makes AWS DMS migration fail because this field wants to auto create a value. See this example:

CREATE TABLE COMPANY (
   ID SERIAL PRIMARY KEY,
   NAME TEXT      NOT NULL);

If the target is structured like this example, then PostgreSQL internals wants to generate the value of the ID column. But, the source also contains a value to INSERT which causes an issue. So make sure that DEFAULTS are not part of the target during migration.

Define partitions as source tables on task table mappings

When you migrate partitioned tables, make sure that you define partitions as source tables on task table mappings, not parent tables. This is because WAL logs retain partitioned table information. Parent tables must only be used during full load, so don't use parent tables for the CDC phase. If you define parent tables during CDC, you might encounter duplicate errors that affect migration.

Also, when you map the target tables, make sure that all partitions are remapped to the parent. This means that the parent is used to automatically distribute to its partitions.

Define all facets on the source when you use PGLOGICAL

When you use PGLOGICAL for migration, make sure that you define every facet that is required on the source. If you skip one area, you'll see unexpected behavior. Issues that occur as a result of this are very difficult to troubleshoot, so check that you have defined these areas before you begin migration with PGLOGICAL.

For Amazon RDS, define these items:

Parameter group:

shared_preload_libraries = pglocical

Database level:

create extension pglogical;

For On-premises, define these items:

postgresql.conf:

shared_preload_libraries = pglogical

Database level:

create extension pglogical;

Make sure that all PG plugins that are defined on the source are defined on the target

Be sure that all of the PG plugins that you define on the source are also defined on the target. This helps with compatibility and smooth processing of data.

Make sure that tasks don't remain in the Stop/Error state

When tasks take a long time in the Stop or Error states, storage fills up on the replication slot.

Delete manually created replication slots from the source

If you created any replication slots manually, delete them from the source when migration is complete. If the replication slots remain on the source, then they accumulate in size and storage fills up.

Migrate tables with a primary key/unique index

It;s a best practice to make sure that the tables that you're migrating have a primary key/unique index. If a table doesn't have a primary key, then the UPDATE and DELETE statements might not be applied on the table because they aren't logged in WAL logs. For tables without a primary key, use REPLICATE IDENTITY FULL, but be aware that this generates a lot of information in the logs.


Did this article help?


Do you need billing or technical support?