AWS Database Blog

Migrate to an Amazon Aurora PostgreSQL instance from another PostgreSQL source

January 2024: This post was reviewed and updated for accuracy.

Amazon Aurora PostgreSQL-Compatible Edition combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. Aurora provides this by scaling storage across three Availability Zones in the same Region, and supports up to 15 read replica instances for scaling out read workloads and high availability within a single Region. With Amazon Aurora Global Database, you can now replicate PostgreSQL databases in up to five Regions for remote read access and disaster recovery in the event of failure of an entire Region.

Due to these features, many customers are migrating their databases to Aurora PostgreSQL-Compatible. This post discusses migrating to an Aurora PostgreSQL-Compatible instance from another PostgreSQL source, which can be an existing Amazon RDS for PostgreSQL instance or a self-managed PostgreSQL instance installed on an Amazon Elastic Compute Cloud (Amazon EC2) instance or on-premises.

Prerequisites

To reduce the duration of the migration, you can reduce the size of the database before you start. You have the following options:

  • Run VACUUM FULL/REINDEX or pg_repack on the database
  • Archive or remove any old or temporary data

Running VACUUM FULL or pg_repack on the database

You can check bloat on the database using a bloat query, which shows the current size of each table and index alongside the space that can be reclaimed by running VACUUM FULL or REINDEX on that object. Note that these maintenance operations take an ACCESS EXCLUSIVE lock on the object, which blocks all SQL against it, including SELECT.

You can use the pg_repack extension to remove bloat from tables and indexes. This extension is supported on Amazon RDS for PostgreSQL versions 9.6.3 and later. For more information, see Working with the pg_repack Extension.
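As a sketch, a pg_repack run looks like the following; the host, user, database, and table names are placeholders.

```shell
# Install the extension in the database to be repacked (run once).
psql -h source-host -U dbadmin -d mydb \
  -c "CREATE EXTENSION IF NOT EXISTS pg_repack;"

# Rebuild a bloated table online. Unlike VACUUM FULL, pg_repack holds the
# ACCESS EXCLUSIVE lock only briefly at the start and end of the rebuild.
# -k (--no-superuser-check) is required on Amazon RDS, where there is no superuser.
pg_repack -h source-host -U dbadmin -d mydb -k -t public.orders
```

The client version of pg_repack must match the extension version installed in the database.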

You can also remove duplicate or unused indexes from your database. To identify unused indexes, use the following query:

WITH table_scans as (
    SELECT relid,
        tables.idx_scan + tables.seq_scan as all_scans,
        ( tables.n_tup_ins + tables.n_tup_upd + tables.n_tup_del ) as writes,
                pg_relation_size(relid) as table_size
        FROM pg_stat_user_tables as tables
),
all_writes as (
    SELECT sum(writes) as total_writes
    FROM table_scans
),
indexes as (
    SELECT idx_stat.relid, idx_stat.indexrelid,
        idx_stat.schemaname, idx_stat.relname as tablename,
        idx_stat.indexrelname as indexname,
        idx_stat.idx_scan,
        pg_relation_size(idx_stat.indexrelid) as index_bytes,
        indexdef ~* 'USING btree' AS idx_is_btree
    FROM pg_stat_user_indexes as idx_stat
        JOIN pg_index
            USING (indexrelid)
        JOIN pg_indexes as indexes
            ON idx_stat.schemaname = indexes.schemaname
                AND idx_stat.relname = indexes.tablename
                AND idx_stat.indexrelname = indexes.indexname
    WHERE pg_index.indisunique = FALSE
),
index_ratios AS (
SELECT schemaname, tablename, indexname,
    idx_scan, all_scans,
    round(( CASE WHEN all_scans = 0 THEN 0.0::NUMERIC
        ELSE idx_scan::NUMERIC/all_scans * 100 END),2) as index_scan_pct,
    writes,
    round((CASE WHEN writes = 0 THEN idx_scan::NUMERIC ELSE idx_scan::NUMERIC/writes END),2)
        as scans_per_write,
    pg_size_pretty(index_bytes) as index_size,
    pg_size_pretty(table_size) as table_size,
    idx_is_btree, index_bytes
    FROM indexes
    JOIN table_scans
    USING (relid)
),
index_groups AS (
SELECT 'Never Used Indexes' as reason, *, 1 as grp
FROM index_ratios
WHERE
    idx_scan = 0
    and idx_is_btree
UNION ALL
SELECT 'Low Scans, High Writes' as reason, *, 2 as grp
FROM index_ratios
WHERE
    scans_per_write <= 1
    and index_scan_pct < 10
    and idx_scan > 0
    and writes > 100
    and idx_is_btree
UNION ALL
SELECT 'Seldom Used Large Indexes' as reason, *, 3 as grp
FROM index_ratios
WHERE
    index_scan_pct < 5
    and scans_per_write > 1
    and idx_scan > 0
    and idx_is_btree
    and index_bytes > 100000000
UNION ALL
SELECT 'High-Write Large Non-Btree' as reason, index_ratios.*, 4 as grp
FROM index_ratios, all_writes
WHERE
    ( writes::NUMERIC / ( total_writes + 1 ) ) > 0.02
    AND NOT idx_is_btree
    AND index_bytes > 100000000
ORDER BY grp, index_bytes DESC )
SELECT reason, schemaname, tablename, indexname,
    index_scan_pct, scans_per_write, index_size, table_size
FROM index_groups;

To identify duplicate indexes, use the following query:

SELECT
    indrelid::regclass AS TableName
    ,array_agg(indexrelid::regclass) AS Indexes
FROM pg_index
GROUP BY
    indrelid
    ,indkey
HAVING COUNT(*) > 1;

Archiving or removing old or temporary data

If some data was created temporarily, for example during testing or development, and was never cleaned up, you can remove that old or temporary data to reduce the size of the database and therefore the snapshot time. You can use DROP TABLE or TRUNCATE to get rid of unused or temporary tables.

You can implement an efficient archiving or purging strategy by using the aws_s3 extension or AWS Database Migration Service (AWS DMS). Describing the different archiving and purging strategies is beyond the scope of this post, but you can find details in related AWS Database Blog posts.
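For example, an archive-then-purge pass with the aws_s3 extension might look like the following sketch. The table, bucket, and connection details are hypothetical, and the instance needs an IAM role that allows writing to the bucket.

```shell
psql -h source-host -U dbadmin -d mydb <<'SQL'
-- One-time setup on RDS for PostgreSQL or Aurora PostgreSQL:
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- Export rows older than one year to Amazon S3 ...
SELECT *
FROM aws_s3.query_export_to_s3(
    'SELECT * FROM orders WHERE created_at < now() - interval ''1 year''',
    aws_commons.create_s3_uri('my-archive-bucket', 'orders/archive.csv', 'us-east-1'),
    options := 'format csv'
);

-- ... then purge them from the live table.
DELETE FROM orders WHERE created_at < now() - interval '1 year';
SQL
```

Verify the exported object in Amazon S3 before running the DELETE.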

Migrating from RDS PostgreSQL to Aurora PostgreSQL

Amazon RDS offers two native options to migrate from RDS for PostgreSQL to Aurora PostgreSQL-Compatible:

  • Use an RDS for PostgreSQL snapshot
  • Use an Aurora Read Replica for RDS for PostgreSQL

Your migration option depends on your use case and allowed application downtime. In this section, we discuss the purposes of these options and how to complete the migration.

Using an RDS PostgreSQL snapshot

In this option, you use the snapshot created for your existing RDS for PostgreSQL instance to create an Aurora PostgreSQL-Compatible instance. You may want to use this migration strategy with the following use cases:

  • You want your Aurora PostgreSQL-Compatible instance in a different Region than your existing RDS PostgreSQL instance
  • You want your Aurora PostgreSQL-Compatible instance in a different AWS account than your existing account
  • Your existing RDS for PostgreSQL is version 9.5 or older

For our use case, you have the RDS for PostgreSQL instance in a us-east Region and want to create an Aurora PostgreSQL-Compatible instance in a us-west Region. For this, you need to complete the following high-level steps:

  1. Create a snapshot of RDS for PostgreSQL.
  2. Copy the snapshot.
  3. Share the snapshot to the required AWS Region or account.
  4. Create Aurora PostgreSQL-Compatible using the snapshot.

Creating a snapshot for RDS PostgreSQL

To create your snapshot, complete the following steps:

  1. On the Amazon RDS console, choose Databases.
  2. Select the RDS for PostgreSQL instance that you want to migrate.
  3. From the Actions drop-down menu, choose Take snapshot.

  1. For Snapshot name, enter a name for your snapshot (for example, to-create-aurorapg).
  2. Choose Take Snapshot.

To see your snapshot, choose Snapshots from the navigation pane. The snapshot is listed on the Manual tab.

For full instructions, see Creating a DB Snapshot.
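The same step can be scripted with the AWS CLI; the instance and snapshot identifiers below are the example names used in this post.

```shell
# Take a manual snapshot of the source RDS for PostgreSQL instance.
aws rds create-db-snapshot \
    --db-instance-identifier my-postgres-instance \
    --db-snapshot-identifier to-create-aurorapg \
    --region us-east-1

# Block until the snapshot is ready to be copied or shared.
aws rds wait db-snapshot-available \
    --db-snapshot-identifier to-create-aurorapg \
    --region us-east-1
```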

Copying the snapshot

In this step, you copy an encrypted or unencrypted DB snapshot, in the same Region or across Regions.

  1. On the Amazon RDS console, choose Snapshots.
  2. On the Manual tab, select the snapshot you want to copy.
  3. From the Actions drop-down menu, choose Copy Snapshot.

For this post, we copy the snapshot from us-east to us-west. The destination Region must have the same database engine version available as the source Region.

  1. For Destination Region, choose US West (N. California).
  2. For New DB Snapshot Identifier, enter to-create-aurorapg-from-useast.
  3. For Encryption, select Disable encryption, or select Enable encryption if you want the copy to be encrypted, regardless of whether the source snapshot is encrypted.
  4. If you enable encryption, for Master Key, specify the AWS Key Management Service (AWS KMS) key identifier to use to encrypt the DB snapshot copy.
  5. Choose Copy snapshot.

You can view the progress of the snapshot on the Snapshots page. In the following screenshot, the snapshot is in the process of being copied in us-west.
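The cross-Region copy can also be done with the AWS CLI, run against the destination Region; the snapshot ARN, account ID, and KMS key alias below are placeholders.

```shell
# Copy the snapshot from us-east-1 into us-west-1.
# --kms-key-id (a key in the destination Region) is only needed
# when the copy should be encrypted.
aws rds copy-db-snapshot \
    --source-db-snapshot-identifier \
        arn:aws:rds:us-east-1:123456789012:snapshot:to-create-aurorapg \
    --target-db-snapshot-identifier to-create-aurorapg-from-useast \
    --source-region us-east-1 \
    --region us-west-1 \
    --kms-key-id alias/my-destination-key
```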

Sharing a DB snapshot

You can share a manual DB snapshot with up to 20 AWS accounts, and you can also stop sharing the snapshot with one or more of those accounts.

  1. On the Amazon RDS console, choose Snapshots.
  2. Select the snapshot you want to share.
  3. From the Actions drop-down menu, choose Share Snapshot.

  1. For DB snapshot visibility, select Public or Private.
  2. For AWS account ID, enter the account you want to share the snapshot with and choose Add.
  3. Repeat to include additional AWS accounts, up to 20.
  4. Choose Save.

To stop sharing a manual DB snapshot with an AWS account, repeat steps 1-3. On the Snapshot permissions page, enter the ID of the account you want to remove and choose Delete.
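From the CLI, sharing a manual snapshot is a change to its restore attribute; the account ID below is a placeholder.

```shell
# Share the manual snapshot with another AWS account.
aws rds modify-db-snapshot-attribute \
    --db-snapshot-identifier to-create-aurorapg \
    --attribute-name restore \
    --values-to-add 123456789012

# Stop sharing the snapshot with that account.
aws rds modify-db-snapshot-attribute \
    --db-snapshot-identifier to-create-aurorapg \
    --attribute-name restore \
    --values-to-remove 123456789012
```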

Creating an Aurora PostgreSQL instance using the snapshot

After you create the snapshot of your RDS for PostgreSQL instance and share with the Region or account that you want, you can create an Aurora PostgreSQL-Compatible instance using that snapshot.

  1. On the Amazon RDS console, choose Snapshots.
  2. Select the snapshot you want to migrate to an Aurora PostgreSQL-Compatible DB cluster.
  3. From the Actions drop-down menu, choose Migrate snapshot.

  1. For Migrate to DB Engine, choose your database engine (for this post, aurora-postgresql).
  2. For DB Engine Version, enter the version number of your database engine (for this post, 4).

For more information about setting those values, see Migrating an RDS PostgreSQL DB Snapshot to an Aurora PostgreSQL DB Cluster.
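The console's Migrate snapshot action corresponds to restoring the snapshot as an Aurora cluster and then adding a writer instance. The identifiers, instance class, and engine version below are examples, not prescriptions.

```shell
# Restore the RDS for PostgreSQL snapshot as an Aurora PostgreSQL cluster.
aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier aurora-pg-cluster \
    --snapshot-identifier to-create-aurorapg \
    --engine aurora-postgresql \
    --engine-version 15.4

# The restored cluster has no instances yet; add the writer.
aws rds create-db-instance \
    --db-instance-identifier aurora-pg-writer \
    --db-cluster-identifier aurora-pg-cluster \
    --db-instance-class db.r6g.large \
    --engine aurora-postgresql
```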

Using an Aurora Read Replica for RDS PostgreSQL

As a second option, you can migrate from an RDS for PostgreSQL DB instance to an Aurora PostgreSQL-Compatible DB cluster by using an Aurora Read Replica. This is the best approach if you want to migrate your RDS for PostgreSQL to Aurora PostgreSQL-Compatible within the same Region and account. However, this migration may take some time depending on the data size, roughly several hours per tebibyte (TiB) of data. Large databases under heavy load take a long time to sync. While the migration is in progress, your RDS for PostgreSQL instance accumulates write ahead log (WAL) segments for the initial sync and catch-up to happen. Make sure that your database instance has enough free storage available to hold these WAL files.

In this case, Amazon RDS uses the PostgreSQL DB engine’s streaming replication functionality to create a special type of DB cluster for the source PostgreSQL DB instance. This type of DB cluster is called an Aurora Read Replica. Updates made to the source PostgreSQL DB instance are asynchronously replicated to the Aurora Read Replica.

This migration option has two steps:

  1. Create an Aurora Read Replica for RDS for PostgreSQL.
  2. Promote the Aurora Read Replica to an Aurora PostgreSQL-Compatible cluster.

Creating an Aurora Read Replica

To create an Aurora Read Replica for a PostgreSQL DB instance, complete the following steps:

  1. On the Amazon RDS console, choose Databases.
  2. Select the PostgreSQL DB instance you want to use as the source for your read replica.
  3. From the Actions drop-down menu, choose Create Aurora read replica.

  1. On the Create Aurora Read Replica page, choose your options accordingly.
  2. Choose Create read replica.

For full instructions, see Creating an Aurora Read Replica.
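With the AWS CLI, the equivalent is a cluster that replicates from the source instance's ARN, plus a DB instance inside that cluster; names, Region, account ID, and instance class are placeholders.

```shell
# Create an Aurora cluster that streams from the RDS for PostgreSQL source.
aws rds create-db-cluster \
    --db-cluster-identifier aurora-replica-cluster \
    --engine aurora-postgresql \
    --replication-source-identifier \
        arn:aws:rds:us-east-1:123456789012:db:my-postgres-instance

# Add a DB instance to the replica cluster.
aws rds create-db-instance \
    --db-instance-identifier aurora-replica-instance-1 \
    --db-cluster-identifier aurora-replica-cluster \
    --db-instance-class db.r6g.large \
    --engine aurora-postgresql
```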

Promoting the Aurora PostgreSQL Read Replica to an Aurora PostgreSQL-Compatible Cluster

You’re now ready to promote your read replica to an Aurora PostgreSQL-Compatible cluster.

  1. On the Amazon RDS console, choose Instances.
  2. Select the DB instance for the read replica.
  3. From the Actions drop-down menu, choose Promote.
  4. Choose Promote Read Replica.
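From the CLI, the promotion itself is a single call once replication has caught up:

```shell
# Detach the Aurora Read Replica from its source and make it a
# standalone, writable Aurora PostgreSQL cluster.
aws rds promote-read-replica-db-cluster \
    --db-cluster-identifier aurora-replica-cluster
```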

Migrating from a self-managed PostgreSQL instance to an Aurora PostgreSQL-Compatible instance

If you have an existing PostgreSQL deployment on an EC2 instance or on-premises that you want to move to an Aurora PostgreSQL-Compatible instance, then the complexity of your task depends on the size of your database and the types of database objects that you’re transferring. For example, consider a database that contains datasets on the order of gigabytes, along with stored procedures and triggers. Such a database is more complicated than a simple database with only a few megabytes of test data and no triggers or stored procedures. For this use case, you can consider using the following options:

Using pg_dump and pg_restore

pg_dump and pg_restore are native PostgreSQL tools to create and restore a backup, respectively. You can run the backup against a production environment because it doesn't block readers or writers. However, because this isn't continuous replication, you have to stop the application, or at least stop writes to the database, before you initiate the backup so that changes made afterward aren't lost. For instructions, see Importing a PostgreSQL Database from an Amazon EC2 Instance.
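A minimal dump-and-restore pass might look like the following; hosts, credentials, and database names are placeholders.

```shell
# Dump the source database in custom format (-Fc), which lets
# pg_restore load it with parallel jobs.
pg_dump -h onprem-host -U dbadmin -Fc -f mydb.dump mydb

# Create the target database on the Aurora writer endpoint, then
# restore with 4 parallel jobs (-j requires custom or directory format).
createdb -h aurora-writer.cluster-abc123.us-west-1.rds.amazonaws.com \
    -U dbadmin mydb
pg_restore -h aurora-writer.cluster-abc123.us-west-1.rds.amazonaws.com \
    -U dbadmin -d mydb -j 4 mydb.dump
```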

Using Logical Replication

Logical Replication is another way to migrate data from one PostgreSQL instance to another. It replicates data objects and their changes based on their replication identity (usually a primary key). Native logical replication was introduced in PostgreSQL 10; if your source runs an older version, you can use the pglogical extension instead.

For more information about enabling and using logical replication, see Using PostgreSQL Logical Replication with Aurora. For instructions on setting up logical replication, see Migrating PostgreSQL from on-premises or Amazon EC2 to Amazon RDS using logical replication.
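At its core, the setup is a publication on the source and a subscription on the Aurora target. The sketch below assumes wal_level is set to logical on the source and that the schema has already been loaded on the target (for example, with pg_dump --schema-only); all names and credentials are placeholders.

```shell
# On the source (publisher): publish every table in the database.
psql -h source-host -U dbadmin -d mydb \
  -c "CREATE PUBLICATION aurora_migration FOR ALL TABLES;"

# On the Aurora target (subscriber): subscribe to the publication.
# This copies the existing rows first, then streams ongoing changes.
psql -h aurora-writer.cluster-abc123.us-west-1.rds.amazonaws.com -U dbadmin -d mydb \
  -c "CREATE SUBSCRIPTION aurora_migration_sub
        CONNECTION 'host=source-host dbname=mydb user=repluser password=secret'
        PUBLICATION aurora_migration;"
```

You can monitor lag on the source with the pg_stat_replication view and drop the subscription after cutover.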

Using AWS DMS replication

You can use AWS DMS to migrate data from your database that is on-premises, on an RDS DB instance, or in a database on an EC2 instance to a database on an AWS service like Aurora PostgreSQL-Compatible. Because you can replicate the changes continuously using AWS DMS, you don’t need to stop your application. For instructions, see Using DMS to migrate data from PostgreSQL to PostgreSQL.
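Once source and target endpoints and a replication instance exist, the continuous migration itself is a single task; the ARNs and the table-mappings file below are placeholders you would substitute with your own.

```shell
# Full load of existing data followed by ongoing change replication (CDC),
# so the application can keep writing during the migration.
aws dms create-replication-task \
    --replication-task-identifier pg-to-aurora-pg \
    --source-endpoint-arn "$SOURCE_ENDPOINT_ARN" \
    --target-endpoint-arn "$TARGET_ENDPOINT_ARN" \
    --replication-instance-arn "$REPLICATION_INSTANCE_ARN" \
    --migration-type full-load-and-cdc \
    --table-mappings file://table-mappings.json
```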

For more information about AWS DMS, see the AWS DMS User Guide.

Summary

This post talked about the different available options to import data to an Aurora PostgreSQL-Compatible instance. Although you can choose from multiple options, each one has a specific purpose or use case:

  • If you want to migrate from RDS for PostgreSQL to an Aurora PostgreSQL-Compatible instance in a different AWS Region or AWS account, you can consider using a snapshot.
  • If you want to migrate from RDS for PostgreSQL to Aurora PostgreSQL-Compatible within the same Region, the best option is to create an Aurora Read Replica for your RDS for PostgreSQL instance.
  • If you want to migrate RDS for PostgreSQL across Regions or accounts, or migrate PostgreSQL running on Amazon EC2 or on-premises, with minimal downtime, consider using AWS DMS.

If you have comments or questions about this solution, please submit them in the comments section.


About the Authors

Baji Shaik is a Consultant with AWS ProServe, GCC India. His background spans a wide depth and breadth of expertise and experience in SQL/NoSQL database technologies. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on-premises to Amazon RDS and Aurora PostgreSQL. He is an eminent author, having written several books on PostgreSQL. A few of his recent works include “PostgreSQL Configuration“, “Beginning PostgreSQL on the Cloud”, and “PostgreSQL Development Essentials“. Furthermore, he has delivered several conference and workshop sessions.

Domenico di Salvia is a Senior Database Specialist Solutions Architect at AWS. In his role, Domenico works with customers in EMEA region to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using or migrating to AWS, designing scalable, secure, performant, sustainable, cost-effective and robust database architectures in the AWS Cloud.


Audit History

Last reviewed and updated in January 2024 by Domenico di Salvia | Sr. RDS PostgreSQL Solutions Architect