AWS Database Blog
Federated query support for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL – Part 2
This post is a continuation of Federated query support for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL – Part 1. To recap, in 2003, the SQL standard added a new specification called SQL/MED (SQL Management of External Data). In 2011, PostgreSQL 9.1 was released with read-only support of this standard, and PostgreSQL 9.3 added write support in 2013. PostgreSQL implements SQL/MED through an interface called foreign data wrappers (FDW). There are several FDWs that let PostgreSQL connect to different external data sources, ranging from SQL database engines to flat files.
FDWs are helpful if you organization uses external data sources that you’d like to query from your PostgreSQL database. For example, you may want to analyze data using Amazon QuickSight and use an Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance as the data source. If some of your data set resides in a separate Amazon RDS for MySQL database instance, you can use the MySQL FDW (mysql_fdw) from your RDS for PostgreSQL instance to keep the source database the same for Amazon QuickSight.
In part 1, we discussed how you can use one implementation of PostgreSQL FDW via the extension postgres_fdw
to interact with data residing in other PostgreSQL-based databases from within Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL. In this post, we review other FDW extensions that let you use your Amazon RDS for PostgreSQL database instance to query MySQL-compatible engines, Microsoft SQL Server and Sybase engines that support the tabular data stream (TDS) protocol, and Oracle databases.
Solution overview
In June 2021, Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL compatible started supporting the oracle_fdw extension, which allows your PostgreSQL database to connect and retrieve data stored in separate Oracle databases.
In March 2022, Amazon RDS for PostgreSQL added support for mysql_fdw, which allows your PostgreSQL database to connect and retrieve data stored in separate databases in Amazon Aurora MySQL-Compatible Edition, Amazon RDS for MySQL, and Amazon RDS for MariaDB, as well as self-managed MySQL/MariaDB instances on Amazon Elastic Compute Cloud (Amazon EC2). Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL compatibility also added support for tds_fdw, which allows your PostgreSQL database to connect and retrieve data stored in separate SQL Server databases.
The mysql_fdw
extension is supported on Amazon RDS for PostgreSQL version 14.2, 13.6 and higher releases while tds_fdw
is supported by both Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL compatibility.
The following table represents the extension name, supported DML and the versions supported for both RDS for PostgreSQL and Aurora PostgreSQL Compatible-edition.
Extension Name | Supported DML | RDS for PostgreSQL Support | Aurora PostgreSQL-compatible Support |
postgres_fdw |
Reads and writes | All available versions | All available versions |
mysql_fdw |
Reads and writes | 14.2, 13.6 and higher releases. | Not supported |
oracle_fdw |
Reads and writes | 12.7 or higher releases. | 12.7 or higher releases. |
tds_fdw |
Reads only | 14.2, 13.6 and higher releases. | 14.2, 13.6 and higher releases. |
In the following sections, we learn how to set up the three forms of FDWs and use them for tasks such as performing complex joins and aggregates on data residing in different data sources like MySQL-compatible, SQL Server, and Oracle databases with data residing in PostgreSQL to derive insights.
Prerequisites
Before getting started, make sure you have the following prerequisites:
- A primary Aurora PostgreSQL-compatible or RDS for PostgreSQL instance as your source machine.
- A remote database instance with information like user name, password, and database name. You can use any of the following databases:
- MySQL (Aurora, Amazon RDS, or self-managed on Amazon EC2 or on premises) for
mysql_fdw
- SQL Server (Amazon RDS or self-managed on Amazon EC2 or on premises) for
tds_fdw
- Oracle (Amazon RDS or self-managed on Amazon EC2 or on premises) for
oracle_fdw
- MySQL (Aurora, Amazon RDS, or self-managed on Amazon EC2 or on premises) for
- Network connectivity between the primary and remote database. For more information, see How can I troubleshoot connectivity to an Amazon RDS DB instance that uses a public or private subnet of a VPC?
- Tables that you query in the foreign (remote) server. For this post, we create one table in the source database
pgfdwsource
and one table in target database (MySQL, Oracle, or SQL Server).
In pgfdwsource
database, create a salary
table with the columns emailid
and salary
with dummy content. See the following script:
For the target database, we create a table that corresponds to customer1
with the columns id
, name
, emailid
, projectname
, and contactnumber
with dummy content. The following sections show how to set up this table in each database engine.
Setting up the MySQL database
For a target MySQL foreign database, use the following script:
Setting up the Oracle database
For a target Oracle foreign database, use the following script:
Setting up the SQL Server database
For a target SQL Server foreign database, use the following script:
In all tables, the column emailid
is common and can be used to derive insights into the salary corresponding to the customer1
employee information.
In the upcoming sections of this post, we set up a connection in between the source and target database and run federated queries to derive insights from the data present in MySQL compatible, SQL Server, and Oracle with PostgreSQL database engine.
Connect to the source instance
All the steps in the following sections are performed after logging in with the role pgfdwsource
into the primary database instance pgfdwsource
.
You connect to your source PostgreSQL instance with a primary user or via a normal user that has rds_superuser
permissions. You can use client tools like psql or pgAdmin. For more information, see Connecting to a DB instance running the PostgreSQL database engine. Create the extension mysql_fdw
(for connecting with a MySQL foreign database), oracle_fdw
(for connecting with an Oracle foreign database), and tds_fdw
(for connecting with a SQL Server foreign database) with CREATE EXTENSION:
Create a foreign server
We use CREATE SERVER to create our foreign (remote) server to encapsulate connection information that an FDW uses to access an external data resource.
For a target MySQL foreign server, see the following script:
For a target Oracle foreign server, see the following script:
For a target SQL Server foreign server, see the following script:
By default, PostgreSQL only lets a privileged account create and access a foreign server. In Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition, this is any user member of the rds_superuser
role. To allow an unprivileged user to access data on a foreign server, you must explicitly grant the USAGE
privilege on that foreign server to the user. The following command provides an example for how to grant usage on a foreign server to an unprivileged user:
Create a user mapping
CREATE USER MAPPING defines a mapping of a user to a foreign server. A user mapping encapsulates connection information that a foreign-data wrapper uses with the information from a foreign server to connect to an external data resource.
The owner of a foreign server can create user mappings for that server for any user.
In the following script, when the user pgfdwsource
is connecting to the foreign server (remote database), they use the login information provided in the user mapping.
For a target MySQL foreign server, see the following script:
For a target Oracle foreign server, see the following script:
For a target SQL Server foreign server, see the following script:
Create a foreign table
CREATE FOREIGN TABLE creates a foreign table in the source database that is mapped to the table in the foreign server. For example, in the following script, the foreign table customer1_fdw
points to the table customer1
, which resides in the remote database. When you need to query data from table customer1
in the remote database, the user in the local database queries using the foreign table customer1_fdw
.
For a target MySQL foreign server, see the following script:
For a target Oracle foreign server, see the following script:
For a target SQL Server foreign server, see the following script:
Furthermore, the name of the foreign table or table reference can be different from the name of the table in the remote database. This flexibility can be useful when you want to make the database name consistent for the application that’s accessing the foreign table from the PostgreSQL database.
Run federated queries with FDW
mysql_fdw
and oracle_fdw
allow you to run SELECT, INSERT, UPDATE, and DELETE statements in a remote MySQL or Oracle database. tds_fdw
allows you to run only SELECT statements on a remote SQL Server database. Unlike other PostgreSQL foreign data wrappers, tds_fdw
doesn’t support write mode, and can only read from the mapped foreign table within SQL Server. Alternatively, if you are working with Amazon Aurora PostgreSQL-Compatible Edition, you can work with Babelfish for Aurora PostgreSQL. With Babelfish, Amazon Aurora PostgreSQL understands T-SQL, Microsoft SQL Server’s proprietary SQL dialect, and supports the same communications protocol, so your application that were originally written for SQL Server can work with Aurora with fewer code changes. Refer to Get started with Babelfish for Aurora PostgreSQL to learn more.
With the preceding steps complete, you can now query the table customer1, which resides in the corresponding foreign server using the appropriate FDW, from within the instance pgfdwsource.xxxx.us-west-2.rds.amazonaws.com
, without the need for replication or any kind of pipeline. See the following script:
For a target MySQL foreign server, see the following script:
You can also run join queries to perform aggregates (and more) to derive insights.
In the following script, only the data for a particular project is queried. The project name details are stored in the remote server. In the explain plan of the query, the WHERE clause has been pushed down to the remote server. This ensures that only the data required to realize the result of the query is retrieved from the remote server and not the entire table. This is one of the optimizations that FDW offers that essentially reduces the data transferred over the network as well.
For a target Oracle foreign server, see the following script:
You can also run join queries to perform aggregates (and more) to derive insights.
In the following script, only the data for a particular project is queried. The project name details are stored in the remote server. In the explain plan of the query, the WHERE clause has been pushed down to the remote server. This makes sure that only the data required to realize the result of the query is retrieved from the remote server and not the entire table. This is one of the optimizations that FDW offers that essentially reduces the data transferred over the network as well.
For a target SQL Server foreign server, see the following script:
You can also run join queries to perform aggregates (and more) to derive insights.
In the following script, only the data for a particular project is queried. The project name details are stored in the remote server. In the explain plan of the query, the WHERE clause has been pushed down to the remote server. This makes sure that only the data required to realize the result of the query is retrieved from the remote server and not the entire table. This is one of the optimizations that FDW offers that essentially reduces the data transferred over the network as well.
You can pull data from different database engines by using the appropriate FDW extension, as seen in the preceding examples. Because the data is pulled in real time from a foreign database, you can achieve this without needing to set up any kind of replication or reserving space in the source database.
Additional capabilities
In addition to SELECT, mysql_fdw
and oracle_fdw
allows you to run UPDATE, INSERT, and DELETE on a foreign table.
For a target MySQL foreign server, see the following script:
Note that you receive an error if a column with the UNIQUE constraint isn’t defined at the remote MySQL server.
For a target Oracle foreign server, see the following script:
For a SQL Server foreign server or tds_fdw
, you can only do SELECT operations on foreign server. If you attempt to run an insert or update on the foreign table, it fails with the following error:
ERROR: cannot insert into foreign table “sqlservertab”
FDWs come with optimizations, such as maintaining the table schema locally. As a result, rather than doing a SELECT *
to pull all the data from a table in remote server, the FDW sends query WHERE clauses to the remote server to run, and doesn’t retrieve table columns that aren’t needed for the current query. It can also push down joins, aggregates, sorts, and more.
Clean up
You can follow the steps in this section to clean up the resources created in previous steps:
For a target MySQL foreign server, see the following script:
- Drop the foreign table:
- Drop the user mapping that maps the database user to the foreign server user:
- Drop the foreign server definition that provides the local PostgreSQL server with the foreign (remote) server connection information:
- Drop the extension:
For a target Oracle foreign server, see the following script:
- Drop the foreign table:
- Drop the user mapping that maps the database user to the foreign server user:
- Drop the foreign server definition that provides the local PostgreSQL server with the foreign (remote) server connection information:
- Drop the extension:
For a target SQL Server foreign server, see the following script:
- Drop the foreign table:
- Drop the user mapping that maps the database user to the foreign server user:
- Drop the foreign server definition that provides the local PostgreSQL server with the foreign (remote) server connection information:
- Drop the extension:
If you do not need to use the Amazon Aurora and Amazon RDS instances that you created for these examples, we recommend that you delete them.
Summary
The FDWs discussed in this post are quite powerful; this post covers just the basic functionality. For more information about these optimizations, see the following GitHub repos:
This two-part series showed what you can achieve with federated query capabilities and the foreign data wrappers postgres_fdw
, mysql_fdw
, oracle_fdw
, and tds_fdw
in Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-compatible edition to query data from a PostgreSQL-based, MySQL, Oracle, or SQL Server remote server, both managed and self-managed on Amazon EC2 or on premises. Query federation is a powerful functionality that can be used in real time for running analytics queries on external databases, migrating data from external database to PostgreSQL or for developing basic ETL pipelines.
If you have questions or suggestions on the content covered in this post, leave them in the comments section.
About the Authors
Vibhu Pareek is a Solutions Architect at AWS. He joined AWS in 2016 and specializes in providing guidance on cloud adoption through the implementation of well architected, repeatable patterns and solutions that drive customer innovation. He has keen interest in open source databases like PostgreSQL. In his free time, you’d find him enjoying a game of football.
Asmita Varma is a Technical Account Manager with Amazon Web Services. She helps customers navigate the challenges and complexities of the AWS Cloud to attain operational excellence. Her focus area has been Databases.
Chirag Tyagi is a Senior Solutions Architect at amazon web services. He’s passionate about helping customers achieve their business objectives with focus on Databases and Analytics.