Achieving minimum downtime for major version upgrades in Amazon Aurora for PostgreSQL using AWS DMS
AWS provides two managed PostgreSQL options: Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. When Amazon RDS or Aurora supports a new major version of a database engine, for example, PostgreSQL 10 to 11, you can upgrade your DB instances to the new version. Major version upgrades can contain database changes that may not be backward-compatible with existing applications. For more information, see Upgrading the PostgreSQL DB Engine for Aurora PostgreSQL and Best practices for upgrading Amazon RDS to major and minor versions of PostgreSQL.
Both Amazon RDS and Aurora provide an option to manually initiate a major version upgrade by modifying your DB instance. This is also known as an in-place upgrade and requires downtime for your applications during the upgrade process. Additionally, you must restore the latest backup in the event of any issues with the upgrade. Therefore, this option may not be desirable for all workload types. An alternative approach is using AWS Database Migration Service (AWS DMS) for major version upgrades. AWS DMS uses PostgreSQL logical replication for near-real-time synchronization of data between two major versions. Consider using AWS DMS when any of the following applies:
- There is no in-place upgrade option available for a particular major version
- You need to upgrade a few or selective databases in an Aurora cluster
- You want to minimize the downtime required for the upgrade process and have a faster rollback option to the older instance in case of any issues with cutover
The AWS DMS solution requires more AWS resources and additional planning compared with the in-place upgrade option. Because AWS DMS is based on outbound logical replication, it increases the load on the source in terms of CPU utilization and read and write IOPS. The increase varies based on factors such as change activity and the number of active tasks during the migration. As a best practice, test the DMS procedure in a non-production environment and make sure your source environment is sized to handle this additional load. You also need an optimal configuration for the number of concurrent tasks, the DMS replication instance, and the necessary monitoring.
This post walks you through upgrading your Aurora PostgreSQL 10 database to Aurora PostgreSQL 11 using AWS DMS with minimal downtime. The walkthrough provides many reusable artifacts that can help you to get started quickly. Although this post uses Aurora, the instructions are also valid for an Amazon RDS for PostgreSQL instance.
At a high level, this post includes the following steps:
- Setting up the necessary AWS resources using AWS CloudFormation
- Setting up the source environment based on Aurora PostgreSQL 10.x version and loading the sample data
- Setting up the target environment based on Aurora PostgreSQL 11.x
- Performing schema migration using native PostgreSQL utilities
- Setting up data migration (full load and change data capture (CDC)) using AWS DMS
- Monitoring, testing, and cutover to the target environment
The following diagram illustrates the high-level architecture.
Before getting started, you must have the following prerequisites:
- An AWS account with administrator IAM privileges. You can also use the AWS managed policy AdministratorAccess.
- Familiarity with the following AWS services:
- Experience using psql.
Setting up the environment
To set up your environment, complete the following steps:
- Download the CloudFormation template Aurora_PostgreSQL_DBLaunch.yaml from the GitHub repo.
- Launch it on the AWS Management Console.
- Provide a name for the stack.
- Specify the source and target database configurations.
- Leave other values at their default.
This post uses the US East (N. Virginia) Region, but you can use any preferred Region. When using AWS CLI commands, make sure that you set AWS_DEFAULT_REGION to your preferred Region.
The CloudFormation stack deploys the following resources:
- VPC (10.0.0.0/25)
- Internet Gateway
- Two private subnets
- Two public subnets
- AWS Cloud9 environment
- Aurora cluster with PostgreSQL compatibility 10.x (aurora-source) and 11.x (aurora-target) in a VPC, each launched with a single db.r5.large instance (Writer instance) only
- Sample database with the name demo

You can refer to the AWS CloudFormation documentation to learn more about the stack creation process. The CloudFormation stack creation takes approximately 15 minutes. You can see the progress by looking in the Events section.
- Record the Aurora master user name pgadmin. The default password for the master user is auradmin.
This post uses the AWS Cloud9 IDE to run SQL scripts and load data. You can also launch an Amazon EC2 instance in the same VPC.
- After the launch is successful, log in to the AWS Cloud9 environment.
- Install the PostgreSQL client tools.
- Clone the AWS DMS sample repo from GitHub and load the data into the source database. See the following code:
- Navigate to the
- Configure the environment variables for the source and target Aurora endpoints. See the following code:
You can obtain the cluster endpoint names from the AWS CloudFormation output section.
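For example, the exports might look like the following sketch. The endpoint values are placeholders; substitute the cluster endpoints from your own stack output:

```shell
# Placeholder endpoints -- replace with the values from your CloudFormation output
export SOURCE_EP=aurora-source.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com
export TARGET_EP=aurora-target.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com
# Master user and database name created by the stack
export DB_USER=pgadmin
export DB_NAME=demo
```

Later psql and pg_dump commands in this post can then reference these variables instead of hard-coded endpoints.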
- Log in as the master user to aurora-source using the psql utility and set up the sample data. The following code creates the schema dms_sample and loads schema objects and data:
The default installation takes 30–45 minutes and loads approximately 7 GB of data. You may see some psql errors such as role does not exist or psql: install-postgresql.sql:30: INFO: pg_hint_plan: hint syntax error at or near APPEND. You can ignore them.
- Verify the tables are set up properly and the data load is complete.
- To verify the list of tables and their sizes, run the following psql command:
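A typical size query, sketched here using standard catalog views (the schema name dms_sample comes from the sample install):

```sql
-- List tables in the dms_sample schema with their total size (including indexes)
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
WHERE schemaname = 'dms_sample'
ORDER BY pg_total_relation_size(relid) DESC;
```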
- Clone the GitHub repo that contains the scripts and SQLs used by this post. See the following code:
Setting up the source environment
It is important to thoroughly review the prerequisites, limitations, and best practices when you configure your source environment. This post highlights a few important considerations. For more information, see Using a PostgreSQL Database as a Source for AWS DMS.
Enabling logical replication
Enable logical replication by setting rds.logical_replication=1 in the aurora-source cluster parameter group and rebooting the instance. For more information, see Using PostgreSQL Logical Replication with Aurora. See the following code:
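A sketch of the parameter change via the AWS CLI; the parameter group and instance names here are assumptions, so substitute the names created by your stack:

```shell
# Enable logical replication (static parameter, requires a reboot)
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name aurora-source-cluster-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# Reboot the writer instance so the change takes effect
aws rds reboot-db-instance --db-instance-identifier aurora-source-instance-1
```

After the reboot, `show rds.logical_replication;` in psql should return `on`.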
AWS DMS doesn’t support all PostgreSQL datatypes when migrating data from PostgreSQL to PostgreSQL. As of the date this blog post was published, you can’t migrate composite datatypes and timestamps with time zones. Additionally, AWS DMS streams some data types as strings if the data type is unknown. Some data types, such as XML and JSON, can successfully migrate as small files, but can fail if they are large documents. If you have tables with such datatypes, you should use native PostgreSQL replication tools like pg_dump or Publisher/Subscriber logical replication to migrate such tables. For more information, see Migrating from PostgreSQL to PostgreSQL Using AWS DMS. See the following code:
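One way to scan for such columns is the following sketch; adjust the type list against the AWS DMS documentation for your engine version:

```sql
-- Find columns whose types AWS DMS cannot migrate natively:
-- composite (user-defined) types and timestamp with time zone
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'dms_sample'
  AND data_type IN ('USER-DEFINED', 'timestamp with time zone');
```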
You do not see rows for the preceding query because you don’t have any unsupported datatypes in your setup.
Another key consideration is to identify tables with a NUMERIC data type without precision and scale. When transferring data that is a NUMERIC data type but without precision and scale, AWS DMS uses NUMERIC(28,6) (a precision of 28 and scale of 6) by default. For example, the value 0.611111104488373 from the source is converted to 0.611111 on the PostgreSQL target. See the following code:
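The following query is a sketch for finding such columns in the sample schema:

```sql
-- Find NUMERIC columns declared without precision and scale
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'dms_sample'
  AND data_type = 'numeric'
  AND numeric_precision IS NULL;
```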
You should evaluate the impact of this precision issue for your workload and adjust the precision for the table.
If your application must retain the precision and scale on the target database, you need to modify the source database by using ALTER TABLE. The ALTER TABLE command is a data definition language (DDL) command that acquires an exclusive lock on the table, which is held until the end of the transaction; this can cause database connections to pile up and lead to an application outage, especially for large tables. Therefore, roll out such changes during a maintenance window after careful analysis.
If this is not an issue for your workload, you can replicate with the as-is setup, with the caveat that you cannot enable AWS DMS validation for the table involved. For more information, see Validating AWS DMS Tasks.
Missing primary keys
A captured table must have a primary key. If a table doesn’t have a primary key, AWS DMS ignores UPDATE record operations for that table. You also need a primary key for CDC and data validation purposes.
To identify tables that don’t have primary keys, enter the following code:
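A sketch of such a query against the information schema:

```sql
-- List user tables in dms_sample that have no primary key
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = 'dms_sample'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;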
To manage such tables, consider the following suggestions:
- Identify any column that can serve as a primary key. This could be a column with a unique index and no null constraint.
- If no such key exists, try adding a surrogate key, such as a GUID column. Another option is to create a composite key from all the columns in the table.
- If the table receives only inserts and doesn’t accept any updates or deletes (for example, you are using it as a history table), then you can leave it as is and DMS will copy the inserts.
In this walkthrough, you create primary keys on all the tables (except seat) to make sure CDC and AWS DMS validation occur. Because you skip the primary key creation step for the seat table, you may notice that AWS DMS reports the validation state as no primary key and doesn’t perform any data validation on this table. See the following code:
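The primary key creation follows this pattern; the table and column names below (person, id) are illustrative, so use the columns identified by the missing-primary-key query for your own tables:

```sql
-- Illustrative only: add a primary key to a table that lacks one
ALTER TABLE dms_sample.person
  ADD CONSTRAINT person_pkey PRIMARY KEY (id);

-- The seat table is intentionally skipped, so AWS DMS reports
-- "no primary key" for it and performs no data validation on it
```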
Data definition language propagation
You can replicate DDL statements with AWS DMS, but there are exceptions. For example, when using CDC, AWS DMS does not support TRUNCATE operations. For more information, see Limitations on Using a PostgreSQL Database as a Source for AWS DMS.
As a best practice, apply DDL statements manually on the source and target databases during a maintenance window. Based on your DDL strategy, turn DDL capture on or off by configuring the extra connection attribute captureDDLs during endpoint creation and the AWS DMS task policy settings. For more information, see Task Settings for Change Processing DDL Handling.
Other considerations before the upgrade
PostgreSQL version 11 contains several changes that may affect compatibility with previous releases. For example, the column relhaspkey has been removed from the pg_class catalog, and you should use the pg_index catalog instead to check for primary keys. If you have any custom monitoring queries that depend on such columns, you need to amend them accordingly. For more information about changes that may affect compatibility, see the release notes for the particular major version (such as the PostgreSQL 11.0 release notes on the PostgreSQL website).
If you are using extensions like pg_hint_plan, you need to create them manually in the target database. While creating them, check whether there is a version mismatch between the source and target database and review the extension release notes. For example, pg_repack added support for PostgreSQL 11 in version 1.4.4, which means you must upgrade your pg_repack client to 1.4.4. You can verify the installed extensions using the following code:
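A simple catalog query lists what is installed and at which version:

```sql
-- List installed extensions and their versions
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;
```

Run this on both the source and the target, and compare the results before cutover.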
After you set up the source environment and validate that it’s ready, you can set up the target environment.
Setting up the target environment
It is important to carefully review the prerequisites, limitations, and best practices when configuring your target environment. For more information, see Using a PostgreSQL Database as a Target for AWS Database Migration Service. In this section, you perform the schema migration and configure the required parameters in the target environment.
This post provides a custom parameter group based on PostgreSQL version 11. You can customize this further based on your existing configurations.
Set session_replication_role=replica to temporarily disable all triggers on the instance until the migration is complete. See the following code:
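A sketch of that change via the AWS CLI; the parameter group name is an assumption, so substitute the target cluster parameter group created by your stack:

```shell
# Disable triggers and foreign key enforcement on the target during migration
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name aurora-target-cluster-params \
    --parameters "ParameterName=session_replication_role,ParameterValue=replica,ApplyMethod=immediate"
```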
To proceed with this migration, you must first create the schema and associated objects in the target database. Schema migration includes two steps: migrating the users, roles, and grants, and migrating the schema definitions. Because this is a homogeneous migration, you use native PostgreSQL tools such as pg_dump and pg_dumpall.
Migrating users, roles, and system grants
Use pg_dumpall to dump global objects that are common to all databases. This includes information about database roles and properties such as access permissions that apply to whole databases (pg_dump does not export these objects). See the following code:
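The dump command looks like the following sketch; `$SOURCE_EP` stands in for your aurora-source cluster endpoint:

```shell
# Dump global objects (roles, grants) from the source.
# Passwords cannot be exported because Aurora blocks pg_authid,
# hence --no-role-passwords; run this with the PostgreSQL 10 client.
pg_dumpall -h $SOURCE_EP -U pgadmin --globals-only --no-role-passwords -f db_roles.sql
```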
Amazon RDS for PostgreSQL and Aurora PostgreSQL block access to the catalog table pg_authid. Therefore, you have to use the --no-role-passwords option in pg_dumpall to dump the user and role definitions. Additionally, you need to use the PostgreSQL 10 client for this.
The db_roles.sql file has all the user and role information, including rdsadmin and other rds_* roles. Identify the relevant users for your environment, exclude unwanted users and tablespace definitions from the SQL script, and run it on the target database. This makes sure you have a consistent set of users and roles in both the source and target environments. Because the passwords are not exported, you must synchronize the passwords manually using alter user commands.
Migrating schema objects
To copy the schema DDL, you use the pg_dump command. See the following code:
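A sketch of the schema-only export; `$SOURCE_EP` stands in for your aurora-source cluster endpoint:

```shell
# Export schema definitions only (no data) for the demo database
pg_dump -h $SOURCE_EP -U pgadmin -d demo --schema-only -f pg_schema.sql
```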
The pg_schema.sql file includes all the DDL statements.
As a best practice, you should create objects in the following order for efficient data loading:
- Create sequences, tables, and primary key constraints for initiating full load.
- After the full load is complete, you can have tasks stop before applying CDC changes. You can manage this in the AWS DMS task settings.
- Create additional secondary indexes and other remaining objects.
This approach makes sure that secondary indexes do not slow down the full load process.
This post has already extracted the dms_sample schema-related DDLs from the pg_schema.sql script and created multiple DDL files based on the object type. You can find these SQL files in the TargetDB directory. To create the dms_sample schema in the target environment, enter the following code:
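The commands follow this pattern; the file names below are illustrative, so run the actual DDL files present in the TargetDB directory (tables and primary keys first, secondary indexes only after the full load):

```shell
# Create sequences, tables, and primary keys on the target before full load
psql -h $TARGET_EP -U pgadmin -d demo -f TargetDB/create_table.sql
psql -h $TARGET_EP -U pgadmin -d demo -f TargetDB/create_primarykey.sql
```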
Verify that the schema objects are in the target before proceeding to database migration.
In this step, you create an AWS DMS replication instance (AWS DMS version 3.3.1) and the source and target endpoints via AWS CloudFormation using the stack name DMSRepforBlog. Because you use AWS CLI commands to create the replication tasks, you must create two IAM roles: dms-vpc-role and dms-cloudwatch-logs-role. For instructions, see Creating the IAM Roles to Use With the AWS CLI and AWS DMS API.
Launch the CloudFormation template with the following code:
Stack creation takes up to 5 minutes. When it is complete, test the connection for the source and target with the following code:
After you get the ARNs for the replication instance and endpoints, you can proceed with testing your connection. See the following code:
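A sketch of the connection test; `$RI_ARN` and `$SOURCE_ENDPOINT_ARN` are placeholders for the ARNs from the CloudFormation Outputs tab:

```shell
# Kick off a connection test from the replication instance to an endpoint
aws dms test-connection \
    --replication-instance-arn $RI_ARN \
    --endpoint-arn $SOURCE_ENDPOINT_ARN

# Check the result (Status should become "successful")
aws dms describe-connections \
    --filter "Name=endpoint-arn,Values=$SOURCE_ENDPOINT_ARN"
```

Repeat the same two commands with the target endpoint ARN.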
To refresh the schemas’ source endpoint, enter the following code:
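A sketch using the same placeholder ARNs as above:

```shell
# Ask AWS DMS to (re)discover the schemas on the source endpoint
aws dms refresh-schemas \
    --endpoint-arn $SOURCE_ENDPOINT_ARN \
    --replication-instance-arn $RI_ARN
```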
Setting up AWS DMS
AWS DMS supports full load and CDC for this migration. Alternatively, you can use native tools such as
pg_dump for full load and AWS DMS for CDC only. This post uses AWS DMS to carry out both full load and CDC.
In this walkthrough, you create two tasks: one for the large tables and the other for the rest of the tables to improve the parallelism using different task settings. Transactional consistency is maintained within a task, so tables in separate tasks don’t participate in common transactions.
During the migration, the full load of large tables (in terms of rows or physical size of the table) can take a considerable time. The following are best practices for AWS DMS full load:
- Identify the large tables and divide them into multiple chunks based on numeric primary keys. Split these chunks into multiple tasks across your DMS replication instances.
- To copy large partitioned tables, manually split the tasks or use auto-partition available in AWS DMS from version 1.2 or higher.
- Choose the appropriate replication instance class and number of replication instances after doing a couple of iterations in a non-production environment. For more information, see Choosing the Optimum Size for a Replication Instance.
- Plan your full load during off-peak hours at the source to reduce the burden and minimize the storage requirements to keep the database changes in Write Ahead Logs (WALs).
- Turn autovacuum off on the target database and turn it back on after the full load is complete.
For more information, see Improving the Performance of an AWS DMS Migration.
You use AWS CLI commands to create two DMS tasks:
- Task 1 includes ticket_purchase_hist and the tables related to it through foreign key relationships
- Task 2 includes the remaining tables
The Outputs tab of the DMS replication instance CloudFormation template has values for the replication instance and endpoint ARNs. You can also use the jq utility to extract the required ARN. See the following code:
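For example, the following jq filter pulls a single ARN out of a CloudFormation Outputs array. It runs here against a canned JSON fragment to show the filter shape; against a live stack you would pipe the output of `aws cloudformation describe-stacks --stack-name DMSRepforBlog --query "Stacks[0].Outputs"` into the same filter (the OutputKey names are assumptions -- match them to your template):

```shell
# Extract one OutputValue by OutputKey from a CloudFormation Outputs array
outputs='[{"OutputKey":"ReplicationInstanceArn","OutputValue":"arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE"},
          {"OutputKey":"SourceEndpointArn","OutputValue":"arn:aws:dms:us-east-1:123456789012:endpoint:SRC"}]'
echo "$outputs" | jq -r '.[] | select(.OutputKey=="ReplicationInstanceArn") | .OutputValue'
```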
Make sure you are following the specific instructions for configuring tasks and endpoints based on the AWS DMS engine version. For example, if you use an AWS DMS version lower than 3.1.0, you must do some additional configuration at the source.
You can start the tasks when they are ready. To check their status, use the following code, and make sure that ReplicationTaskStatus moves to running:
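A sketch of the status check via the AWS CLI:

```shell
# Show the identifier and current status of each replication task
aws dms describe-replication-tasks \
    --query "ReplicationTasks[].{Task:ReplicationTaskIdentifier,Status:Status}" \
    --output table
```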
Note: After the task starts, you can verify that AWS DMS created the replication slots in the aurora-source cluster by running select * from pg_replication_slots. There is one slot created per task. This is true only for full load and CDC tasks. If you are using CDC-only tasks, you must create the slot manually and specify it in the source endpoint configuration. Make sure you clean up the tasks and any manual slots after they have completed or when they are no longer required.
After the full load is complete, the tasks are configured to stop. You can now create additional indexes and remaining objects. See the following code:
You have to resume the tasks after index creation for continuous replication. See the following code:
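A sketch of resuming a stopped task; `$TASK1_ARN` is a placeholder for your task ARN:

```shell
# Resume a task that stopped after full load, continuing with cached CDC changes
aws dms start-replication-task \
    --replication-task-arn $TASK1_ARN \
    --start-replication-task-type resume-processing
```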
You can now perform DML activity on the source and see AWS DMS replicate it to the target. See the following code:
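For example, a change like the following on aurora-source should appear on aurora-target within the replication latency window; the table and column names are illustrative of the dms_sample schema:

```sql
-- Illustrative DML on the source to exercise CDC
UPDATE dms_sample.person
SET last_name = upper(last_name)
WHERE id = 100;

-- Then query the same row on aurora-target to confirm the change replicated
```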
You can monitor the task progress via the console. The following screenshot shows the Database migration tasks page.
The validation state for all the tables is Validated, except for the seat table, which has the state No primary key. See the following screenshot.
Monitoring the migration
You can monitor the progress of the migration using Amazon CloudWatch metrics. For more information, see Monitoring AWS DMS Tasks. As a best practice, make sure that you set up CloudWatch alarms for CDCLatencyTarget. This helps you get timely alerts on replication lag so you can take appropriate action. For more information about sample AWS CLI commands and setting up monitoring for CDCLatencyTarget, see the DMS Task Monitoring file on GitHub. You can also monitor the CPUUtilization metric on the aurora-source cluster. For more information, see Monitoring an Amazon Aurora DB Cluster.
During the migration of complex and multi-terabyte databases, you might face issues like errors or slow data copy processing. In those scenarios, you can refer to the following posts, which cover various troubleshooting techniques and best practices for using AWS DMS:
- Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 1)
- Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 2)
- Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong? (Part 3)
Testing your application
After the replication is caught up via AWS DMS, you can start testing your application by connecting to the aurora-target cluster. Before you proceed with testing, gather statistics for the database using utilities such as vacuumdb or the ANALYZE command. For example, see the following code:
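A sketch using vacuumdb; `$TARGET_EP` stands in for your aurora-target cluster endpoint:

```shell
# Gather optimizer statistics on the target in stages
# (coarse statistics quickly, then progressively finer)
vacuumdb -h $TARGET_EP -U pgadmin -d demo --analyze-in-stages
```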
You can use the following testing strategies:
- For read-only testing, point your application directly to the aurora-target cluster while the replication is on.
- For write or any kind of stress testing on the target, create a DB cluster snapshot or clone the database of the aurora-target cluster. This way, you don’t have to break the replication process. Also, make sure you enable triggers, indexes, and any custom parameters before testing. It’s important to test your workload thoroughly against the new version to check for any incompatibility or performance issues.
- Review the existing custom parameters from the source, test them, and configure them appropriately in the target environment.
After the testing is complete from both a functional and non-functional standpoint, you are ready to point your applications to the new major version environment. Make sure you have defined a success criteria for cutover along with a rollback plan. This post provides the following checklist as a reference; you should tailor it to your environment. Complete the checklist in the following order:
- Make sure that all users and roles are set up with appropriate permissions in the aurora-target cluster.
- Verify that the database object counts (such as indexes and functions) match in the source and target environments. Refer to the sample queries in the repo to perform the object comparison.
- Set up your monitoring on target databases, such as CloudWatch metrics, to be the same as the source.
- Check for any long-running transactions on the source using pg_stat_activity and stop or terminate them. See the following code:
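A sketch of the check (the 5-minute threshold is an assumption; tune it for your workload):

```sql
-- Find transactions open longer than 5 minutes
SELECT pid, usename, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes';

-- Terminate a specific backend if required
-- SELECT pg_terminate_backend(<pid>);
```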
- Stop all the applications that write to aurora-source. This is the point at which your downtime starts.
- Make sure that there is no AWS DMS replication delay by monitoring the CloudWatch metrics.
- Make sure that there are no data validation errors in AWS DMS. You can also verify the row counts for the key tables using SQL commands.
- Stop the AWS DMS task.
- Update the aurora-target cluster parameter group to set session_replication_role=origin and apply it immediately.
- Make sure that sequences (if any) are in sync with the current value at the source by manually adjusting the last value. See the following code:
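A sketch of the adjustment; the sequence name sporting_event_seq and the value 12345 are illustrative:

```sql
-- On the source: capture the current value of each sequence
SELECT last_value FROM dms_sample.sporting_event_seq;

-- On the target: advance the sequence past the captured value
SELECT setval('dms_sample.sporting_event_seq', 12345);
```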
- Modify your application to point to the aurora-target endpoint and start the application.
- Conduct necessary tests to verify that the application is working correctly.
- Enable the target for production use. This is the point at which your downtime stops.
- Monitor your target environment and logs for any issues.
- Delete the AWS DMS tasks and clean up the AWS resources after successful testing.
This post shared step-by-step instructions for migrating your data between major versions of Aurora PostgreSQL and Amazon RDS for PostgreSQL, along with key considerations for your planning and execution. As a best practice, review the AWS DMS, Aurora, and PostgreSQL documentation for the latest information on major version upgrades. This post also provided code templates, SQL scripts, task settings, and best practices that you can use to migrate your databases quickly. If you have any questions or comments about this post, please share your thoughts in the comments section.
About the Authors
Gowri Balasubramanian is a Principal Database Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on both relational and NoSQL database services, helping them improve the value of their solutions when using AWS.
Amit Bansal is a Senior Consultant with the Professional Services team at Amazon Web Services. He focuses on database migrations to AWS and works with customers to design and implement Amazon RDS, Aurora and Redshift architectures.
HariKrishna Boorgadda is a Senior Consultant with the Professional Services team at Amazon Web Services. He focuses on database migrations to AWS and works with customers to design and implement Amazon RDS and Aurora architectures.