AWS Database Blog
Migrating databases using RDS PostgreSQL Transportable Databases
April 2023: This post was reviewed and updated with a new section on Limitations
Amazon Relational Database Service (Amazon RDS) for PostgreSQL now supports Transportable Databases, a high-speed data import and export method supported on versions 11.5 and later and 10.10 and later. If you need to import a PostgreSQL database from one RDS PostgreSQL instance into another, you can use native tools such as pg_dump and pg_restore, or load data into target tables using the \copy command. With transportable databases, you can move data much faster than with these traditional tools. This feature uses a new extension called pg_transport, which provides a per-database physical transport mechanism. Physical transport can move data much faster and with minimal downtime because it streams the database files with minimal processing.
This post discusses common use cases for pg_transport and how to configure and use it in your RDS PostgreSQL environment using the AWS CLI and the psql utility.
Use cases for pg_transport
Many customers run their own SaaS or multi-tenant and enterprise applications on RDS PostgreSQL because Amazon RDS makes it simple to set up, operate, and scale PostgreSQL deployments in AWS. It’s common to have multiple databases on a single RDS PostgreSQL instance serving different customers or applications. Transportable databases are useful in such environments; you can address the following use cases with minimal downtime:
- Moving a database from one RDS instance to another for better resource management or isolation. For example, as a SaaS provider, you might need to move a customer's data to a different configuration (larger instance type, provisioned IOPS, or a different instance family) after they reach a specific capacity.
- Reorganizing data for security and compliance reasons. For example, moving a database from an RDS instance in one AWS account to an RDS instance in an account with additional security controls.
- Providing a complete copy of your database for downstream applications or testing new features as needed.
Prerequisites
This post assumes you are familiar with RDS PostgreSQL and Amazon Elastic Compute Cloud (Amazon EC2) instances. You should also meet the following prerequisites:
- Launch two RDS instances with version 11.5 and instance type r5.large in the same or different AWS accounts within the same AWS Region. Name the source RDS instance pgsource.<identifier>.<AWS region>.rds.amazonaws.com and the target pgtarget.<identifier>.<AWS region>.rds.amazonaws.com. Both the source and target RDS instances should be of the same major version of PostgreSQL. For more information, see Create and Connect to a PostgreSQL Database.
- Make sure that the VPC security group for your source database instance allows inbound traffic from the destination instance.
- Set up an Amazon EC2 instance with the psql client to work with PostgreSQL instances. Configure your database security groups to allow traffic from this EC2 instance. For more information, see Connecting to a DB Instance Running the PostgreSQL Database Engine.
- Configure the AWS CLI with your credentials. You should have RDS administrator IAM privileges. For more information, see Identity and Access Management in Amazon RDS. You can also consider using the AWS managed policy DatabaseAdministrator to manage the RDS instances.
For more information about the known limitations and configuration parameters related to this feature, see Transporting a PostgreSQL Database Using the transport.import_from_server Function.
Limitations
- The source and destination DB instances must run the same major version of PostgreSQL.
- You cannot transport databases on read replicas or on parent instances of read replicas.
- You cannot use “reg” data types in any database tables that you plan to transport using this method.
- No extensions other than pg_transport should be enabled on the source database instance.
- The database being transported must not already exist on the target, or else the transport fails.
- The access privileges and ownership from the source database are not transferred to the target database.
- All source database objects must be in the default pg_default tablespace.
- The tables within the database cannot be renamed, included, or excluded. Everything is migrated as is.
- A single DB instance can support up to 32 concurrent transports, including both imports and exports, if worker processes have been configured properly.
For more details, check out the AWS Prescriptive Guidance.
Setting up the source environment
Before using this feature, enable the necessary configuration parameters and pg_transport extension on the source and target RDS environments. Complete the following steps:
- Log in to your EC2 instance. Create a new parameter group and attach it to the pgsource (source RDS) instance. See the following command:
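A command along the following lines creates the parameter group (the group name pgsource-transport-pg is illustrative; the postgres11 family matches the version 11.5 instances used in this post):

```bash
aws rds create-db-parameter-group \
    --db-parameter-group-name pgsource-transport-pg \
    --db-parameter-group-family postgres11 \
    --description "Custom parameter group for pg_transport (source)"
```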
- Configure the pg_transport-related parameters pg_transport.num_workers, pg_transport.timing, and pg_transport.work_mem. Load the extension via shared_preload_libraries and update max_worker_processes. For more information about pg_transport-related parameters, see Transporting PostgreSQL databases between DB instances. See the following command:
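A sketch with illustrative values; pg_transport.num_workers=4 keeps max_worker_processes=24 consistent with the (pg_transport.num_workers * 3) + 9 formula used later in this post, and pg_transport.work_mem is in kilobytes (131072 KB = 128 MB). Tune these for your workload:

```bash
aws rds modify-db-parameter-group \
    --db-parameter-group-name pgsource-transport-pg \
    --parameters \
        "ParameterName=pg_transport.num_workers,ParameterValue=4,ApplyMethod=pending-reboot" \
        "ParameterName=pg_transport.timing,ParameterValue=1,ApplyMethod=pending-reboot" \
        "ParameterName=pg_transport.work_mem,ParameterValue=131072,ApplyMethod=pending-reboot" \
        "ParameterName=shared_preload_libraries,ParameterValue=pg_transport,ApplyMethod=pending-reboot" \
        "ParameterName=max_worker_processes,ParameterValue=24,ApplyMethod=pending-reboot"
```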
- Modify the source RDS instance with the newly created parameter group. You have to reboot the instance because you modified some of the static parameters, such as shared_preload_libraries and max_worker_processes. See the following command:
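A sketch, assuming the hypothetical group name from the previous step:

```bash
# Attach the new parameter group to the source instance
aws rds modify-db-instance \
    --db-instance-identifier pgsource \
    --db-parameter-group-name pgsource-transport-pg \
    --apply-immediately

# Reboot once the instance reports the parameter group as pending-reboot
aws rds reboot-db-instance --db-instance-identifier pgsource
```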
- After the RDS instance is up, check the connectivity to the instance via psql and set up a sample database to test pg_transport. This post uses two databases of different sizes, but you can use any databases from your environment:
a.) Load the imdb database with data from the IMDB dataset (approximately 14 GB). There are many sample datasets available in the open-source community for the PostgreSQL database. For more information, see Sample Databases in the PostgreSQL wiki.
b.) Load the benchdb database using the pgbench utility. This post uses a scale factor of 10000 to initialize the pgbench database (approximately 130 GB). Enter the following command to load the data into benchdb:
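A sketch of the benchdb load, with an illustrative endpoint; pgbench -i initializes the standard pgbench tables and -s sets the scale factor:

```bash
# Create the database, then populate it at scale factor 10000 (~130 GB)
psql -h pgsource.<identifier>.<region>.rds.amazonaws.com -U postgres \
     -c "CREATE DATABASE benchdb;"

pgbench -i -s 10000 \
    -h pgsource.<identifier>.<region>.rds.amazonaws.com \
    -U postgres benchdb
```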
- Verify the source database environment configuration and create the pg_transport extension for each database. After connecting to the source instance, enter the following commands:
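A sketch via psql, creating the extension in each of the two sample databases:

```sql
\c imdb
CREATE EXTENSION pg_transport;
\c benchdb
CREATE EXTENSION pg_transport;
```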
- Verify that the source database configuration parameters were updated via the new parameter group by entering the following query:
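A query along these lines confirms the settings:

```sql
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'pg_transport%'
   OR name IN ('shared_preload_libraries', 'max_worker_processes');
```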
You now have a fully configured source environment.
Setting up the target environment
To set up the target environment, complete the following steps:
- Create a new parameter group and attach it to the pgtarget (target RDS) instance. See the following command:
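As on the source, a sketch with an illustrative group name:

```bash
aws rds create-db-parameter-group \
    --db-parameter-group-name pgtarget-transport-pg \
    --db-parameter-group-family postgres11 \
    --description "Custom parameter group for pg_transport (target)"
```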
- Configure the pg_transport-related parameters. Load the extension via shared_preload_libraries and update max_worker_processes. You should set max_worker_processes on both the source and destination DB instances to at least three times the destination DB instance's setting for the pg_transport.num_workers parameter, plus nine: max_worker_processes = (pg_transport.num_workers * 3) + 9. Add a few more to provide for non-transport background worker processes. This post configures the value of max_worker_processes as 24. See the following command:
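A sketch with illustrative values (num_workers = 4 gives (4 * 3) + 9 = 21, rounded up to 24 for headroom):

```bash
aws rds modify-db-parameter-group \
    --db-parameter-group-name pgtarget-transport-pg \
    --parameters \
        "ParameterName=pg_transport.num_workers,ParameterValue=4,ApplyMethod=pending-reboot" \
        "ParameterName=shared_preload_libraries,ParameterValue=pg_transport,ApplyMethod=pending-reboot" \
        "ParameterName=max_worker_processes,ParameterValue=24,ApplyMethod=pending-reboot"
```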
- Update the target RDS instance with the newly created parameter group. You have to reboot the instance due to the modification of static parameters such as shared_preload_libraries and max_worker_processes. See the following command:
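A sketch, mirroring the source setup:

```bash
aws rds modify-db-instance \
    --db-instance-identifier pgtarget \
    --db-parameter-group-name pgtarget-transport-pg \
    --apply-immediately

aws rds reboot-db-instance --db-instance-identifier pgtarget
```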
- Verify that the target database configuration parameters were updated via the new parameter group by entering the following query:
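The same pg_settings query used on the source works here:

```sql
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'pg_transport%'
   OR name IN ('shared_preload_libraries', 'max_worker_processes');
```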
- After the RDS instance is up, check the connectivity to the instance via psql. Make sure the target environment does not contain a database with the same name as the source database. See the following command:
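A sketch; the \l meta-command lists the databases on the target so you can check for a name clash:

```bash
psql -h pgtarget.<identifier>.<region>.rds.amazonaws.com -U postgres -c "\l"
```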
- Set up the pg_transport extension in the destination instance with the following command:
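A sketch, run in an existing database on the target (for example, the default postgres database, since the transported database must not exist yet):

```sql
CREATE EXTENSION pg_transport;
```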
Testing pg_transport
You now have fully configured source and target environments and are ready to test this feature. pg_transport provides a simple option to validate both environments before the actual transfer of data (a dry run of your migration).
- Use the transport.import_from_server function to perform the migration. Set the parameter dry_run=true. See the following command:
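A sketch of the dry run, executed on the target instance; the endpoint and passwords are placeholders:

```sql
SELECT transport.import_from_server(
    'pgsource.<identifier>.<region>.rds.amazonaws.com', -- source endpoint
    5432,                                               -- source port
    'postgres',                                         -- source user
    '<source-user-password>',
    'imdb',                                             -- database to transport
    '<destination-user-password>',
    true                                                -- dry_run
);
```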
- After a successful dry run, initiate the database migration. Change the parameter value to false (dry_run=false). For this post, transferring the 14-GB imdb database took less than 60 seconds. See the following command:
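The same call with dry_run set to false performs the actual transport:

```sql
SELECT transport.import_from_server(
    'pgsource.<identifier>.<region>.rds.amazonaws.com', 5432,
    'postgres', '<source-user-password>',
    'imdb', '<destination-user-password>',
    false);
```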
To understand the performance of pg_transport, this post tested importing the same imdb database using the pg_dump and pg_restore method, which took approximately 8 minutes, as opposed to pg_transport, which took less than a minute. See the following command:
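A sketch of the comparison run, with illustrative endpoints:

```bash
# Dump the source database in custom format
pg_dump -Fc -h pgsource.<identifier>.<region>.rds.amazonaws.com \
    -U postgres imdb > imdb.dump

# Create the target database and restore into it
createdb -h pgtarget.<identifier>.<region>.rds.amazonaws.com -U postgres imdb
pg_restore -h pgtarget.<identifier>.<region>.rds.amazonaws.com \
    -U postgres -d imdb imdb.dump
```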
While the database transport is underway, you can run only read-only queries. For example, if you attempt to update the title_rating table in the imdb database, the statement fails with an error.
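A sketch of such an attempt; the column names are assumptions based on the IMDb ratings dataset, and the exact error text is not reproduced here:

```sql
-- Fails during transport: only read-only queries are allowed
UPDATE title_rating SET averagerating = 9.9 WHERE tconst = 'tt0000001';
```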
- Repeat the same steps to migrate the pgbench database benchdb. It took approximately 569 seconds with pg_transport.num_workers=8 and max_worker_processes=40. There were no other workloads or queries running on the source and target RDS instances when these tests were conducted. Your transfer speed may vary due to factors such as source and target system load, instance type, and RDS parameter configurations. See the following command:
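The call mirrors the imdb transport, with benchdb as the database name (num_workers and max_worker_processes are raised via the parameter groups, not the function call):

```sql
SELECT transport.import_from_server(
    'pgsource.<identifier>.<region>.rds.amazonaws.com', 5432,
    'postgres', '<source-user-password>',
    'benchdb', '<destination-user-password>',
    false);
```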
Best practices
For large databases (approximately 200 GB or more), you can modify the parameters related to pg_transport. For example, you can increase pg_transport.num_workers to 8 and raise max_worker_processes to (pg_transport.num_workers * 3) + 9 accordingly; visit Transporting PostgreSQL databases between DB instances for more information. The pg_transport processes consume memory resources at the instance level, which may impact other databases running on both the source and target. Therefore, plan and test your configuration in a development environment before applying changes in the production environment.
pg_transport.work_mem is allocated for each worker process (that is, once per pg_transport.num_workers worker). Make sure you have sufficient freeable memory available in your instance. For more information, see Best Practices for Amazon RDS.
Setting up pg_transport requires rebooting the instance due to the modification of static parameters such as max_worker_processes and shared_preload_libraries. Therefore, it's a good idea to test and standardize these parameters based on your database environment to avoid frequent changes.
After transporting the database, set up appropriate roles and permissions at the target as per your database access requirements. Additionally, you can enable extensions as needed by your application.
Summary
This post provided use cases for the RDS PostgreSQL Transportable Databases feature and highlighted important considerations when configuring the pg_transport extension, as well as its performance advantages over the traditional dump-and-load method. We encourage you to try out this feature in your environment. As always, AWS welcomes feedback, so please leave comments or questions below.
About the Author
Gowri Balasubramanian is a Principal Database Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on both relational and NoSQL database services, helping them improve the value of their solutions when using AWS.
Rinisha Marar is an Associate RDS PostgreSQL Solutions Architect at Amazon Web Services, specializing in both relational (RDS/Aurora PostgreSQL) and non-relational (Amazon QLDB) database services. She helps customers by providing technical and architectural guidance, solutions, and recommendations on best practices when using AWS services.