I want to use AWS Data Pipeline either to copy an Amazon Relational Database Service (Amazon RDS) for PostgreSQL table to an Amazon Redshift table, or to copy an Amazon RDS for MySQL table to an Amazon RDS for PostgreSQL table. How can I do this?

Data Pipeline default templates support copying an Amazon RDS for MySQL table to Amazon Redshift. For more information, see Amazon RDS to Redshift Templates. You can also modify the scripts from the default template to copy a table between other types of databases.

Download and use these scripts to copy a table from one database to another using Data Pipeline. Before you begin, modify the sample definition file with your database, EC2 instance details, and the path to your hosted shell script in Amazon Simple Storage Service (Amazon S3).
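After modifying the definition file, you can stage the script and register the pipeline with the AWS CLI. The following is a minimal sketch; the bucket, file names, and pipeline name are placeholders, and the helper echoes each command so you can review it (drop the echo to execute the commands for real):

```shell
#!/bin/sh
# Sketch only: the bucket, script, and definition file names are placeholders.
BUCKET="s3://example-bucket"
SCRIPT="dbconv_psqlRDStoRedshift.sh"
DEFINITION="PostgresqlRdsToRedshift.json"

# run() echoes each AWS CLI call for review; drop the echo to execute them.
run() { echo "$@"; }

# Host the conversion script in Amazon S3 so ShellCommandActivity can fetch it.
run aws s3 cp "$SCRIPT" "$BUCKET/scripts/$SCRIPT"

# Create the pipeline, upload the modified definition file, and activate it.
# df-EXAMPLE stands in for the pipeline ID returned by create-pipeline.
run aws datapipeline create-pipeline --name rds-copy --unique-id rds-copy
run aws datapipeline put-pipeline-definition --pipeline-id df-EXAMPLE \
    --pipeline-definition "file://$DEFINITION"
run aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE
```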

Copy an Amazon RDS for PostgreSQL table to an Amazon Redshift table using Data Pipeline

PostgresqlRdsToRedshift is a sample pipeline definition file that shows all activities and pipeline objects for copying an Amazon RDS for PostgreSQL table to an Amazon Redshift table. This sample file is based on the default Data Pipeline template used to copy a MySQL table to an Amazon Redshift table, with changes to the shell script used in the ShellCommandActivity. The sequence of activities in the pipeline is as follows:

1.    RDSToS3CopyActivity - CopyActivity copies from an RDS table to the Amazon S3 staging path. CopyActivity stages the entire RDS table to the EC2 instance running the activity. Allocate a larger EBS volume to your EC2 instance if you require additional space to fit your entire RDS table.

2.    RedshiftTableCreateActivity - ShellCommandActivity creates the target Amazon Redshift table. This activity runs the dbconv_psqlRDStoRedshift.sh script, which is hosted in your S3 bucket.

3.    S3ToRedshiftCopyActivity - RedshiftCopyActivity uses the Amazon Redshift COPY command to copy the Amazon S3 CSV file to the target Amazon Redshift table.

4.    S3StagingCleanupActivity - ShellCommandActivity cleans up the intermediate Amazon S3 CSV file.
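The load in step 3 is a standard Amazon Redshift COPY command against the staged CSV file. The following sketch shows the shape of that statement; the table name, S3 path, and IAM role ARN are hypothetical placeholders:

```shell
#!/bin/sh
# Builds the Redshift COPY statement that loads the staged CSV file.
# All identifiers below are placeholders for illustration.
build_copy_sql() {
    table="$1"; s3_path="$2"; iam_role="$3"
    echo "COPY $table FROM '$s3_path' IAM_ROLE '$iam_role' CSV;"
}

# The statement RedshiftCopyActivity would run against the cluster,
# for example via: psql -h my-cluster.example.redshift.amazonaws.com -c "$SQL"
SQL=$(build_copy_sql target_table \
    "s3://example-bucket/staging/source_table.csv" \
    "arn:aws:iam::123456789012:role/RedshiftCopyRole")
echo "$SQL"
```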

Copy an Amazon RDS for MySQL table to an Amazon RDS for PostgreSQL table using Data Pipeline

mysqlRDS-psqlRDS-copy-using-shell-definition.json is a sample pipeline definition file that shows all activities and pipeline objects for copying an Amazon RDS for MySQL table to an Amazon RDS for PostgreSQL table. This sample file is based on the default Data Pipeline template used to copy a MySQL table to an Amazon Redshift table, with changes to the shell script used in the ShellCommandActivity. The sequence of activities in the pipeline is as follows:

1.    RDSToS3CopyActivity - CopyActivity copies from the RDS table to the Amazon S3 staging path. CopyActivity stages the entire RDS table to the EC2 instance running the activity. Allocate a larger EBS volume to your EC2 instance if you require additional space to fit your entire RDS table.

2.    SyncS3CsvToEc2 - ShellCommandActivity syncs the Amazon S3 CSV file to the local directory of the EC2 instance, because Amazon RDS for PostgreSQL does not support copying directly from Amazon S3.

3.    JdbcTableCreateActivity - ShellCommandActivity creates the target Amazon RDS for PostgreSQL table and writes the table data that was synced to the local EC2 instance into that target table. This activity runs the dbconv_mysqlRDS-to-postgresqlRDS.sh script, which is hosted in your S3 bucket.

4.    S3StagingCleanupActivity - ShellCommandActivity cleans up the intermediate Amazon S3 CSV file.
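Steps 2 and 3 above can be sketched as follows. Because the load source must be local to the instance, the CSV file is first downloaded from Amazon S3 and then bulk-loaded with psql's client-side \copy. The host, user, database, table definition, and paths are placeholders, and each command is echoed for review (drop the echo to execute the commands for real):

```shell
#!/bin/sh
# Sketch only: host, user, bucket, and table names are placeholders.
S3_CSV="s3://example-bucket/staging/source_table.csv"
LOCAL_CSV="/tmp/source_table.csv"

# run() echoes each command for review; drop the echo to execute them.
run() { echo "$@"; }

# Step 2 (SyncS3CsvToEc2): pull the staged CSV file down to the instance,
# because Amazon RDS for PostgreSQL cannot read from Amazon S3 directly.
run aws s3 cp "$S3_CSV" "$LOCAL_CSV"

# Step 3 (JdbcTableCreateActivity): create the target table, then bulk-load
# the local file with psql's client-side \copy command.
run psql -h my-pg.example.rds.amazonaws.com -U my_pg_user -d my_db \
    -c "CREATE TABLE IF NOT EXISTS target_table (id int, name text)"
run psql -h my-pg.example.rds.amazonaws.com -U my_pg_user -d my_db \
    -c "\copy target_table FROM '$LOCAL_CSV' WITH (FORMAT csv)"
```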



Published: 2018-01-09