Achieving minimum downtime for major version upgrades in Amazon Aurora for PostgreSQL using AWS DMS
AWS provides two managed PostgreSQL options: Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. When Amazon RDS or Aurora supports a new major version of a database engine, for example, PostgreSQL 10 to 11, you can upgrade your DB instances to the new version. Major version upgrades can contain database changes that may not be backward-compatible with existing applications. For more information, see Upgrading the PostgreSQL DB Engine for Aurora PostgreSQL and Best practices for upgrading Amazon RDS to major and minor versions of PostgreSQL.
Both Amazon RDS and Aurora provide an option to manually initiate a major version upgrade by modifying your DB instance. This is also known as an in-place upgrade and requires downtime for your applications during the upgrade process. Additionally, you must restore the latest backup in the event of any issues with the upgrade. Therefore, this option may not be desirable for all workload types. An alternative approach is using AWS Database Migration Service (AWS DMS) for major version upgrades. AWS DMS uses PostgreSQL logical replication for near-real-time synchronization of data between two major versions. Consider using AWS DMS when any of the following applies:
- There is no in-place upgrade option available for a particular major version
- You need to upgrade a few or selective databases in an Aurora cluster
- You want to minimize the downtime required for the upgrade process and have a faster rollback option to the older instance in case of any issues with cutover
The AWS DMS solution requires more AWS resources and additional planning compared with the in-place upgrade option. Because AWS DMS is based on outbound logical replication, it increases the load on the source in terms of CPU utilization and read and write IOPS. The increase varies based on factors such as change activity and the number of active tasks during the migration. As a best practice, test the DMS procedure in a non-production environment and make sure your source environment is sized to handle this additional load. You also need an optimal configuration for the number of concurrent tasks, the DMS replication instance, and the necessary monitoring.
This post walks you through upgrading your Aurora PostgreSQL 10 database to Aurora PostgreSQL 11 using AWS DMS with minimal downtime. The walkthrough provides many reusable artifacts that can help you to get started quickly. Although this post uses Aurora, the instructions are also valid for an Amazon RDS for PostgreSQL instance.
At a high level, this post includes the following steps:
- Setting up the necessary AWS resources using AWS CloudFormation
- Setting up the source environment based on Aurora PostgreSQL 10.x version and loading the sample data
- Setting up the target environment based on Aurora PostgreSQL 11.x
- Performing schema migration using native PostgreSQL utilities
- Setting up data migration (full load and change data capture (CDC)) using AWS DMS
- Monitoring, testing, and cutover to the target environment
The following diagram illustrates the high-level architecture.
Before getting started, you must have the following prerequisites:
- An AWS account with administrator IAM privileges. You can also use the AWS managed policy AdministratorAccess.
- Familiarity with the following AWS services:
- Experience using psql.
Setting up the environment
To set up your environment, complete the following steps:
- Download the CloudFormation template Aurora_PostgreSQL_DBLaunch.yaml from the GitHub repo.
- Launch it on the AWS Management Console.
- Provide a name for the stack.
- Specify the source and target database configurations.
- Leave other values at their default.
This post uses the US East (N. Virginia) Region, but you can use any preferred Region. When using AWS CLI commands, make sure that you set AWS_DEFAULT_REGION to your preferred Region.
The CloudFormation stack deploys the following resources:
- VPC (10.0.0.0/25)
- Internet Gateway
- Two private subnets
- Two public subnets
- AWS Cloud9 environment
- Aurora cluster with PostgreSQL compatibility 10.x (aurora-source) and 11.x (aurora-target) in a VPC, each launched with a single db.r5.large instance (Writer instance) only
- Sample database with the name demo

You can refer to the AWS CloudFormation documentation to learn more about the stack creation process. The CloudFormation stack creation takes approximately 15 minutes. You can see the progress by looking in the Events section.
- Record the Aurora master user name pgadmin. The default password for the master user is auradmin.
This post uses the AWS Cloud9 IDE to run SQL scripts and load data. You can also launch an Amazon EC2 instance in the same VPC.
- After the launch is successful, log in to the AWS Cloud9 environment.
- Install the PostgreSQL client tools.
- Clone the AWS DMS sample repo from GitHub and load the data into the source database. See the following code:
- Navigate to the
- Configure the environment variables for the source and target Aurora endpoints. See the following code:
You can obtain the cluster endpoint names from the AWS CloudFormation output section.
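For example, the exports might look like the following sketch. The endpoint values are placeholders; substitute the cluster endpoints from your own stack output:

```shell
# Placeholder endpoints -- replace with the values from your CloudFormation output
export SOURCE_EP=aurora-source.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com
export TARGET_EP=aurora-target.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com
# Master user and database name created by the stack
export DB_USER=pgadmin
export DB_NAME=demo
```

Later psql and pg_dump commands in this post can then reference these variables instead of hard-coded endpoints.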
- Log in as the master user to aurora-source using the psql utility and set up the sample data. The following code creates the schema dms_sample and loads schema objects and data:
The default installation takes 30–45 minutes and loads approximately 7 GB of data. You may see some psql errors such as role does not exist or psql: install-postgresql.sql:30: INFO: pg_hint_plan: hint syntax error at or near APPEND. You can ignore them.
- Verify the tables are set up properly and the data load is complete.
- To verify the list of tables and their sizes, run the following psql command:
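A typical size query, sketched here using standard catalog views (the schema name dms_sample comes from the sample install):

```sql
-- List tables in the dms_sample schema with their total size (including indexes)
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
WHERE schemaname = 'dms_sample'
ORDER BY pg_total_relation_size(relid) DESC;
```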
- Clone the GitHub repo that contains the scripts and SQLs used by this post. See the following code:
Setting up the source environment
It is important to thoroughly review the prerequisites, limitations, and best practices when you configure your source environment. This post highlights a few important considerations. For more information, see Using a PostgreSQL Database as a Source for AWS DMS.
Enabling logical replication
Enable logical replication by setting rds.logical_replication=1 in the aurora-source cluster parameter group and rebooting the instance. For more information, see Using PostgreSQL Logical Replication with Aurora. See the following code:
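A sketch of the parameter change via the AWS CLI; the parameter group and instance names here are assumptions, so substitute the names created by your stack:

```shell
# Enable logical replication (static parameter, requires a reboot)
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name aurora-source-cluster-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# Reboot the writer instance so the change takes effect
aws rds reboot-db-instance --db-instance-identifier aurora-source-instance-1
```

After the reboot, `show rds.logical_replication;` in psql should return `on`.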
AWS DMS doesn’t support all PostgreSQL datatypes when migrating data from PostgreSQL to PostgreSQL. As of the date this blog post was published, you can’t migrate composite datatypes and timestamps with time zones. Additionally, AWS DMS streams some data types as strings if the data type is unknown. Some data types, such as XML and JSON, can successfully migrate as small files, but can fail if they are large documents. If you have tables with such datatypes, you should use native PostgreSQL replication tools like pg_dump or Publisher/Subscriber logical replication to migrate such tables. For more information, see Migrating from PostgreSQL to PostgreSQL Using AWS DMS. See the following code:
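One way to scan for such columns is the following sketch; adjust the type list against the AWS DMS documentation for your engine version:

```sql
-- Find columns whose types AWS DMS cannot migrate natively:
-- composite (user-defined) types and timestamp with time zone
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'dms_sample'
  AND data_type IN ('USER-DEFINED', 'timestamp with time zone');
```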
You do not see rows for the preceding query because you don’t have any unsupported datatypes in your setup.
Another key consideration is to identify tables with a NUMERIC data type without precision and scale. When transferring data that is a NUMERIC data type but without precision and scale, AWS DMS uses NUMERIC(28,6) (a precision of 28 and scale of 6) by default. For example, the value 0.611111104488373 from the source is converted to 0.611111 on the PostgreSQL target. See the following code:
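The following query is a sketch for finding such columns in the sample schema:

```sql
-- Find NUMERIC columns declared without precision and scale
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'dms_sample'
  AND data_type = 'numeric'
  AND numeric_precision IS NULL;
```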
You should evaluate the impact of this precision issue for your workload and adjust the precision for the table.
If your application must retain the precision and scale on the target database, you need to modify the source database by using ALTER TABLE. The ALTER TABLE command is a data definition language (DDL) command that acquires an exclusive lock on the table, which is held until the end of the transaction; this can cause database connections to pile up and lead to an application outage, especially for large tables. Therefore, roll out such changes during a maintenance window after careful analysis.
If this is not an issue for your workload, you can replicate with the as-is setup, with the caveat that you cannot enable AWS DMS validation for the table involved. For more information, see Validating AWS DMS Tasks.
Missing primary keys
A captured table must have a primary key. If a table doesn’t have a primary key, AWS DMS ignores UPDATE record operations for that table. You also need a primary key for CDC and data validation purposes.
To identify tables that don’t have primary keys, enter the following code:
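A sketch of such a query against the information schema:

```sql
-- List user tables in dms_sample that have no primary key
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = 'dms_sample'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;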
To manage such tables, consider the following suggestions:
- Identify any column that can serve as a primary key. This could be a column with a unique index and no null constraint.
- If no such key exists, try adding a surrogate key, such as a GUID column. Another option is to create a composite key from all the columns in the table.
- If the table receives only inserts and doesn’t accept any updates or deletes (for example, you are using it as a history table), then you can leave it as is and DMS will copy the inserts.
In this walkthrough, you create primary keys on all the tables (except seat) to make sure CDC and AWS DMS validation occur. Because you skip the primary key creation step for the seat table, you may notice that AWS DMS reports the validation state as no primary key and doesn’t perform any data validation on this table. See the following code:
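The primary key creation follows this pattern; the table and column names below (person, id) are illustrative, so use the columns identified by the missing-primary-key query for your own tables:

```sql
-- Illustrative only: add a primary key to a table that lacks one
ALTER TABLE dms_sample.person
  ADD CONSTRAINT person_pkey PRIMARY KEY (id);

-- The seat table is intentionally skipped, so AWS DMS reports
-- "no primary key" for it and performs no data validation on it
```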
Data definition language propagation
You can replicate DDL statements with AWS DMS, but there are exceptions. For example, when using CDC, AWS DMS does not support TRUNCATE operations. For more information, see Limitations on Using a PostgreSQL Database as a Source for AWS DMS.
As a best practice, apply DDL statements manually on the source and target databases during a maintenance window. Based on your DDL strategy, turn DDL capture on or off by configuring the extra connection attribute captureDDLs during endpoint creation and the AWS DMS task policy settings. For more information, see Task Settings for Change Processing DDL Handling.
Other considerations before the upgrade
PostgreSQL version 11 contains several changes that may affect compatibility with previous releases. For example, the column relhaspkey has been removed from the pg_class catalog, and you should use the pg_index catalog instead to check for primary keys. If you have any custom monitoring queries that depend on such columns, you need to amend them accordingly. For more information about changes that may affect compatibility, see the release notes for the particular major version (such as the PostgreSQL 11.0 release notes on the PostgreSQL website).
If you are using extensions like pg_hint_plan, you need to create them manually in the target database. While creating them, check whether there is a version mismatch between the source and target database and review the extension release notes. For example, pg_repack added support for PostgreSQL 11 in version 1.4.4, which means you must upgrade your pg_repack client to 1.4.4. You can verify the installed extensions using the following code:
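A simple catalog query lists what is installed and at which version:

```sql
-- List installed extensions and their versions
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;
```

Run this on both the source and the target, and compare the results before cutover.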
After you set up the source environment and validate that it’s ready, you can set up the target environment.
Setting up the target environment
It is important to carefully review the prerequisites, limitations, and best practices when configuring your target environment. For more information, see Using a PostgreSQL Database as a Target for AWS Database Migration Service. In this section, you perform the schema migration and configure the required parameters in the target environment.
This post provides a custom parameter group based on PostgreSQL version 11. You can customize this further based on your existing configurations.
Set session_replication_role=replica to temporarily disable all triggers on the instance until the migration is complete. See the following code:
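A sketch of that change via the AWS CLI; the parameter group name is an assumption, so substitute the target cluster parameter group created by your stack:

```shell
# Disable triggers and foreign key enforcement on the target during migration
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name aurora-target-cluster-params \
    --parameters "ParameterName=session_replication_role,ParameterValue=replica,ApplyMethod=immediate"
```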
To proceed with this migration, you must first create the schema and associated objects in the target database. Schema migration includes two steps: migrating the users, roles, and grants, and migrating the schema definitions. Because this is a homogeneous migration, you use native PostgreSQL tools such as pg_dump and pg_dumpall.
Migrating users, roles, and system grants
Use pg_dumpall to dump global objects that are common to all databases. This includes information about database roles and properties such as access permissions that apply to whole databases (pg_dump does not export these objects). See the following code:
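The dump command looks like the following sketch; `$SOURCE_EP` stands in for your aurora-source cluster endpoint:

```shell
# Dump global objects (roles, grants) from the source.
# Passwords cannot be exported because Aurora blocks pg_authid,
# hence --no-role-passwords; run this with the PostgreSQL 10 client.
pg_dumpall -h $SOURCE_EP -U pgadmin --globals-only --no-role-passwords -f db_roles.sql
```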
Amazon RDS for PostgreSQL and Aurora PostgreSQL block access to the catalog table pg_authid. Therefore, you have to use the --no-role-passwords option in pg_dumpall to dump the user and role definitions. Additionally, you need to use the PostgreSQL 10 client for this.
The db_roles.sql file has all the user and role information, including rdsadmin and other rds_* roles. Identify the relevant users for your environment, exclude unwanted users and tablespace definitions from the SQL script, and run it on the target database. This makes sure you have a consistent set of users and roles in both the source and target environments. Because the passwords are not exported, you must synchronize the passwords manually using alter user commands.
Migrating schema objects
To copy the schema DDL, you use the pg_dump command. See the following code:
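A sketch of the schema-only export; `$SOURCE_EP` stands in for your aurora-source cluster endpoint:

```shell
# Export schema definitions only (no data) for the demo database
pg_dump -h $SOURCE_EP -U pgadmin -d demo --schema-only -f pg_schema.sql
```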
The pg_schema.sql file includes all the DDL statements.
As a best practice, you should create objects in the following order for efficient data loading:
- Create sequences, tables, and primary key constraints for initiating full load.
- After the full load is complete, you can have tasks stop before applying CDC changes. You can manage this in the AWS DMS task settings.
- Create additional secondary indexes and other remaining objects.
This approach makes sure that secondary indexes do not slow down the full load process.
This post has already extracted the dms_sample schema-related DDLs from the pg_schema.sql script and created multiple DDL files based on the object type. You can find these SQL files in the TargetDB directory. To create the dms_sample schema in the target environment, enter the following code:
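The commands follow this pattern; the file names below are illustrative, so run the actual DDL files present in the TargetDB directory (tables and primary keys first, secondary indexes only after the full load):

```shell
# Create sequences, tables, and primary keys on the target before full load
psql -h $TARGET_EP -U pgadmin -d demo -f TargetDB/create_table.sql
psql -h $TARGET_EP -U pgadmin -d demo -f TargetDB/create_primarykey.sql
```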
Verify that the schema objects are in the target before proceeding to database migration.
In this step, you create an AWS DMS replication instance (AWS DMS version 3.3.1) and the source and target endpoints via AWS CloudFormation using the stack name DMSRepforBlog. Because you use AWS CLI commands to create the replication tasks, you must create two IAM roles: dms-vpc-role and dms-cloudwatch-logs-role. For instructions, see Creating the IAM Roles to Use With the AWS CLI and AWS DMS API.
Launch the CloudFormation template with the following code:
Stack creation takes up to 5 minutes. When it is complete, test the connection for the source and target with the following code:
After you get the ARNs for the replication instance and endpoints, you can proceed with testing your connection. See the following code:
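A sketch of the connection test; `$RI_ARN` and `$SOURCE_ENDPOINT_ARN` are placeholders for the ARNs from the CloudFormation Outputs tab:

```shell
# Kick off a connection test from the replication instance to an endpoint
aws dms test-connection \
    --replication-instance-arn $RI_ARN \
    --endpoint-arn $SOURCE_ENDPOINT_ARN

# Check the result (Status should become "successful")
aws dms describe-connections \
    --filter "Name=endpoint-arn,Values=$SOURCE_ENDPOINT_ARN"
```

Repeat the same two commands with the target endpoint ARN.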
To refresh the schemas’ source endpoint, enter the following code:
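A sketch using the same placeholder ARNs as above:

```shell
# Ask AWS DMS to (re)discover the schemas on the source endpoint
aws dms refresh-schemas \
    --endpoint-arn $SOURCE_ENDPOINT_ARN \
    --replication-instance-arn $RI_ARN
```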
Setting up AWS DMS
AWS DMS supports full load and CDC for this migration. Alternatively, you can use native tools such as
pg_dump for full load and AWS DMS for CDC only. This post uses AWS DMS to carry out both full load and CDC.
In this walkthrough, you create two tasks: one for the large tables and the other for the rest of the tables to improve the parallelism using different task settings. Transactional consistency is maintained within a task, so tables in separate tasks don’t participate in common transactions.
During the migration, the full load of large tables (in terms of rows or physical size of the table) can take a considerable time. The following are best practices for AWS DMS full load:
- Identify the large tables and divide them into multiple chunks based on numeric primary keys. Split these chunks into multiple tasks across your DMS replication instances.
- To copy large partitioned tables, manually split the tasks or use auto-partition available in AWS DMS from version 1.2 or higher.
- Choose the appropriate replication instance class and number of replication instances after doing a couple of iterations in a non-production environment. For more information, see Choosing the Optimum Size for a Replication Instance.
- Plan your full load during off-peak hours at the source to reduce the burden and minimize the storage requirements to keep the database changes in Write Ahead Logs (WALs).
- Turn autovacuum off on the target database and turn it back on after the full load is complete.
For more information, see Improving the Performance of an AWS DMS Migration.
You use AWS CLI commands to create two DMS tasks:
- Task 1 includes ticket_purchase_hist and the tables related to it through foreign key relationships
- Task 2 includes the remaining tables
The Outputs tab of the DMS replication instance CloudFormation template has values for the replication instance and endpoint ARNs. You can also use the jq utility to extract the required ARN. See the following code:
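For example, the following jq filter pulls a single ARN out of a CloudFormation Outputs array. It runs here against a canned JSON fragment to show the filter shape; against a live stack you would pipe the output of `aws cloudformation describe-stacks --stack-name DMSRepforBlog --query "Stacks[0].Outputs"` into the same filter (the OutputKey names are assumptions -- match them to your template):

```shell
# Extract one OutputValue by OutputKey from a CloudFormation Outputs array
outputs='[{"OutputKey":"ReplicationInstanceArn","OutputValue":"arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE"},
          {"OutputKey":"SourceEndpointArn","OutputValue":"arn:aws:dms:us-east-1:123456789012:endpoint:SRC"}]'
echo "$outputs" | jq -r '.[] | select(.OutputKey=="ReplicationInstanceArn") | .OutputValue'
```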
Make sure you are following the specific instructions for configuring tasks and endpoints based on the AWS DMS engine version. For example, if you use an AWS DMS version lower than 3.1.0, you must do some additional configuration at the source.
You can start the tasks when they are ready. To check their status, use the following code, and make sure that ReplicationTaskStatus moves to running:
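A sketch of the status check via the AWS CLI:

```shell
# Show the identifier and current status of each replication task
aws dms describe-replication-tasks \
    --query "ReplicationTasks[].{Task:ReplicationTaskIdentifier,Status:Status}" \
    --output table
```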
Note: After the task starts, you can verify that AWS DMS created the replication slots in the aurora-source cluster by running select * from pg_replication_slots. There is one slot created per task. This is true only for full load and CDC tasks. If you are using CDC-only tasks, you must create the slot manually and specify it in the source endpoint configuration. Make sure you clean up the tasks and any manual slots after they have completed or when they are no longer required.
After the full load is complete, the tasks are configured to stop. You can now create additional indexes and remaining objects. See the following code:
You have to resume the tasks after index creation for continuous replication. See the following code:
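A sketch of resuming a stopped task; `$TASK1_ARN` is a placeholder for your task ARN:

```shell
# Resume a task that stopped after full load, continuing with cached CDC changes
aws dms start-replication-task \
    --replication-task-arn $TASK1_ARN \
    --start-replication-task-type resume-processing
```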
You can now perform DML activity on the source and see AWS DMS replicate it to the target. See the following code:
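For example, a change like the following on aurora-source should appear on aurora-target within the replication latency window; the table and column names are illustrative of the dms_sample schema:

```sql
-- Illustrative DML on the source to exercise CDC
UPDATE dms_sample.person
SET last_name = upper(last_name)
WHERE id = 100;

-- Then query the same row on aurora-target to confirm the change replicated
```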
You can monitor the task progress via the console. The following screenshot shows the Database migration tasks page.
The validation state for all the tables is Validated, except for the seat table, which has the state No primary key. See the following screenshot.
Monitoring the migration
You can monitor the progress of the migration using Amazon CloudWatch metrics. For more information, see Monitoring AWS DMS Tasks. As a best practice, make sure that you set up CloudWatch alarms for CDCLatencyTarget. This helps you get timely alerts on replication lag so you can take appropriate action. For more information about sample AWS CLI commands and setting up monitoring for CDCLatencyTarget, see the DMS Task Monitoring file on GitHub. You can also monitor the CPUUtilization metric on the aurora-source cluster. For more information, see Monitoring an Amazon Aurora DB Cluster.
During the migration of complex and multi-terabyte databases, you might face issues like errors or slow data copy processing. In those scenarios, you can refer to the following posts, which cover various troubleshooting techniques and best practices for using AWS DMS:
- Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 1)
- Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 2)
- Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong? (Part 3)
Testing your application
After the replication is caught up via AWS DMS, you can start testing your application by connecting to the aurora-target cluster. Before you proceed with testing, gather statistics for the database using utilities such as vacuumdb or the ANALYZE command. For example, see the following code:
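A sketch using vacuumdb; `$TARGET_EP` stands in for your aurora-target cluster endpoint:

```shell
# Gather optimizer statistics on the target in stages
# (coarse statistics quickly, then progressively finer)
vacuumdb -h $TARGET_EP -U pgadmin -d demo --analyze-in-stages
```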
You can use the following testing strategies:
- For read-only testing, point your application directly to the aurora-target cluster while the replication is on.
- For write or any kind of stress testing on the target, create a DB cluster snapshot or clone the database of the aurora-target cluster. This way, you don’t have to break the replication process. Also, make sure you enable triggers, indexes, and any custom parameters before testing. It’s important to test your workload thoroughly against the new version to check for any incompatibility or performance issues.
- Review the existing custom parameters from the source, test them, and configure them appropriately in the target environment.
After the testing is complete from both a functional and non-functional standpoint, you are ready to point your applications to the new major version environment. Make sure you have defined a success criteria for cutover along with a rollback plan. This post provides the following checklist as a reference; you should tailor it to your environment. Complete the checklist in the following order:
- Make sure that all users and roles are set up with appropriate permissions in the aurora-target cluster.
- Verify that the database object counts (such as indexes and functions) match in the source and target environments. Refer to the sample queries in the repo to perform the object comparison.
- Set up your monitoring on target databases, such as CloudWatch metrics, to be the same as the source.
- Check for any long-running transactions on the source using pg_stat_activity and stop or terminate them. See the following code:
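A sketch of the check (the 5-minute threshold is an assumption; tune it for your workload):

```sql
-- Find transactions open longer than 5 minutes
SELECT pid, usename, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes';

-- Terminate a specific backend if required
-- SELECT pg_terminate_backend(<pid>);
```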
- Stop all the applications that write to aurora-source. This is the point at which your downtime starts.
- Make sure that there is no AWS DMS replication delay by monitoring the CloudWatch metrics.
- Make sure that there are no data validation errors in AWS DMS. You can also verify the row counts for the key tables using SQL commands.
- Stop the AWS DMS task.
- Update the aurora-target cluster parameter group to set session_replication_role=origin and apply it immediately.
- Make sure that sequences (if any) are in sync with the current value at the source by manually adjusting the last value. See the following code:
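A sketch of the adjustment; the sequence name sporting_event_seq and the value 12345 are illustrative:

```sql
-- On the source: capture the current value of each sequence
SELECT last_value FROM dms_sample.sporting_event_seq;

-- On the target: advance the sequence past the captured value
SELECT setval('dms_sample.sporting_event_seq', 12345);
```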
- Modify your application to point to the aurora-target endpoint and start the application.
- Conduct necessary tests to verify that the application is working correctly.
- Enable the target for production use. This is the point at which your downtime stops.
- Monitor your target environment and logs for any issues.
- Delete the AWS DMS tasks and clean up the AWS resources after successful testing.
This post shared step-by-step instructions for migrating your data between major versions of Aurora PostgreSQL and Amazon RDS for PostgreSQL, along with key considerations for your planning and execution. As a best practice, review the AWS DMS, Aurora, and PostgreSQL documentation for the latest information on major version upgrades. This post also provided code templates, SQL scripts, task settings, and best practices that you can use to migrate your databases quickly. If you have any questions or comments about this post, please share your thoughts in the comments section.
About the Authors
Gowri Balasubramanian is a Principal Database Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on both relational and NoSQL database services, helping them improve the value of their solutions when using AWS.
Amit Bansal is a Senior Consultant with the Professional Services team at Amazon Web Services. He focuses on database migrations to AWS and works with customers to design and implement Amazon RDS, Aurora and Redshift architectures.
HariKrishna Boorgadda is a Senior Consultant with the Professional Services team at Amazon Web Services. He focuses on database migrations to AWS and works with customers to design and implement Amazon RDS and Aurora architectures.