AWS Database Blog
Federated query support for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL – Part 1
PostgreSQL is one of the most widely used database engines and is supported by a very large and active community. It’s a viable open-source option to use compared to many commercial databases, which require heavy license costs. Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition are AWS managed offerings that take away the heavy lifting required for setting up the platform, configuring high availability, monitoring, and much more. This allows DBAs to spend more time on business-critical problems like doing schema design early on or query tuning.
In 2003, a new specification called SQL/MED (“SQL Management of External Data”) was added to the SQL standard. It’s a standardized way of handling access to remote objects from SQL databases. In 2011, PostgreSQL 9.1 was released with read-only support of this standard, and in 2013, write support was added with PostgreSQL 9.3. The implementation of this concept in PostgreSQL was called foreign data wrappers (FDW). Several FDWs are available that help connect PostgreSQL Server to different remote data stores, ranging from other SQL database engines to flat files. However, most FDWs are independent open-source projects implemented as Postgres extensions, and not officially supported by the PostgreSQL Global Development Group.
In this post, we discuss one FDW implementation that comes with PostgreSQL source as a contrib extension module called postgres_fdw. postgres_fdw allows you to implement federated query capability to interact with any remote PostgreSQL-based database, both managed and self-managed on Amazon Elastic Compute Cloud (Amazon EC2) or on premises. This is available in all present versions supported for Amazon RDS for PostgreSQL and Aurora PostgreSQL. The following diagram illustrates this architecture.
Important: The PostgreSQL Foreign Data Wrapper (FDW) does not officially support Amazon Redshift as a target database. As a result, certain functionalities might not work as expected or may encounter compatibility issues when attempting to connect to or query data in a Redshift instance. This post is intended for educational purposes only, and any implementation should be thoroughly tested in a non-production environment before considering it for use in a production setting.
Use cases
In this post, we primarily focus on two use cases to give an overview on the capability. However, you can easily extended the solution for other federated query use cases.
When working with independent software vendors (ISVs), we occasionally see them offering Amazon RDS for PostgreSQL and Aurora PostgreSQL in a multi-tenant setup in which they use one database per customer and a shared database within a single instance. Federated query capability implemented via FDW allows you to pull data from a shared database to other databases as needed.
An organization could have multiple systems catering to different departments. For example, a payroll database has employee salary information. This data maybe required by the HR and tax systems to calculate hike or decide tax incurred, respectively. One solution for such a problem is to copy the salary information in both the HR and Tax systems. However, this kind of duplication may lead to problems, like ensuring data accuracy, extra storage space incurred, or double writes. FDWs avoid duplication while providing access to required data that resides in a different foreign database. The following diagram illustrates this architecture.
Furthermore, in today’s world of purpose-built databases you host hot or active data in Amazon RDS for PostgreSQL and Aurora PostgreSQL, you have separate data warehouse solutions like Amazon Redshift for data archival. Without federated query support from an active database, you have to stream the active data to the data warehouse on an almost near-real-time basis to run analytical queries, which requires extra efforts and costs to set up a data pipeline and an additional overhead to the data warehouse. With federated query capability, you can derive insights while joining data at the transactional database from within itself as well as a data warehouse like Amazon Redshift. The following diagram illustrates this architecture.
For more information about querying data in your Aurora PostgreSQL or Amazon RDS for PostgreSQL remote server from Amazon Redshift as the primary database, see Querying data with federated queries in Amazon Redshift.
Prerequisites
Before getting started, make sure you have the following prerequisites:
- A primary Aurora PostgreSQL or Amazon RDS for PostgreSQL instance as your source machine.
- A remote PostgreSQL-based instance with information like username, password, and database name. You can use any of the following databases:
- Aurora PostgreSQL
- Amazon RDS for PostgreSQL
- Amazon Redshift
- Self-managed PostgreSQL on Amazon EC2
- On-premises PostgreSQL database
- Network connectivity between the primary and remote database. The remote database can be a different database within the same primary database instance, a separate database instance, Amazon Redshift cluster within the same or different Amazon Virtual Private Cloud (Amazon VPC), or even on-premises PostgreSQL-based database servers that can be reached using VPC peering or AWS managed VPN or AWS Direct Connect. The remote PostgreSQL-based database instance or Amazon Redshift cluster can be in a same or different AWS account with established network connectivity in place. For more information, see How can I troubleshoot connectivity to an Amazon RDS DB instance that uses a public or private subnet of a VPC?
- Tables that you query in the foreign (remote) server. For this post, we create one table in source database
pdfdwsource
and one table in target databasepdfdwtarget
.
For pdfdwsource
, create a salary table with the columns emailid
and salary
with dummy content. See the following code:
For pdfdwtarget
, create a table corresponding to customer1
with the columns id
, name
, emailid
, projectname
, and contactnumber
with dummy content. See the following code:
In both tables, the column emailid
is common and can be used to derive insights into salary corresponding to customer1
employee information.
Now let’s see this in action.
Configuring your source instance, foreign server, user mapping, and foreign table
All the steps in this section are performed after logging in with the role pgfdwsource
into the primary database instance pgfdwsource
.
Connecting to the source instance
You connect to your source instance with a master user or via a normal user that has rds_superuser
permissions. You can use client tools like psql or pgAdmin. For more information, see Connecting to a DB instance running the PostgreSQL database engine. Create the extension postgres_fdw
with CREATE EXTENSION:
Creating a foreign server
We use CREATE SERVER to create our foreign (remote) server mapping as the PostgreSQL-based server from which we pull the data. A foreign server typically encapsulates connection information that an FDW uses to access an external data resource. It uses the same connection options as libpq. SSLMODE
‘require
’ makes sure that the data is encrypted in transit. See the following code:
About the Authors
Vibhu Pareek is a Solutions Architect at AWS. He joined AWS in 2016 and specializes in providing guidance on cloud adoption through the implementation of well architected, repeatable patterns and solutions that drive customer innovation. He has keen interest in open source databases like PostgreSQL. In his free time, you’d find him spending hours on automobile reviews, enjoying a game of football or engaged in pretentious fancy cooking.
Gaurav Sahi is a Principal Solutions Architect based out of Bangalore, India and has rich experience across embedded, telecom, broadcast & digital video and cloud technologies. In addition to helping customers in their transformation journey to cloud, his current passion is to explore and learn AI/ML services. He likes to travel, enjoy local food and spend time with family and friends in his free time.