Federated query support between SQL Server and Amazon Aurora PostgreSQL
When you have a business need to integrate the data between Microsoft SQL Server and Amazon Aurora PostgreSQL-Compatible Edition, you can use the open-source TDS_FDW PostgreSQL extension to create federation between these two databases. With TDS_FDW with Aurora PostgreSQL-Compatible, you can modernize and extend the functionality of your existing applications to meet changing business needs. Alternatively, Babelfish for Aurora PostgreSQL can be used as well.
In this post, we provide five common use cases, and steps for implementing TDS_FDW with Aurora PostgreSQL-Compatible and using it to query the data within a sample SQL Server database. Finally, we cover best practices around the usage of TDS_FDW.
You need to have the following prerequisites:
- SQL Server, loaded with the sample database
DMS_SAMPLEwith data (refer to this GitHub repo).
- An Aurora PostgreSQL cluster with a sample schema
DMS_SAMPLEwith no data, configured to access the SQL Server cluster using TDS_FDW. The sample schema can be found on GitHub.
- Network connectivity configured between the source and target database clusters.
Federated queries from Aurora PostgreSQL to SQL Server
Complete the following steps:
- Connect to the Aurora PostgreSQL cluster endpoint and connect to the schema
- Get the list of the tables that exist in this database (ensure
TDS_FDWwill function with the foreign table):
- Create the
- Create a foreign server to access external SQL Server, which acts as a source database to migrate the data from:
- Create a user mapping between the local database user with the SQL Server user:
- Create the schema
f_dmsto host all the foreign tables:
These foreign tables don’t hold any data, but act like views to query the tables from the external SQL Server.
- Create the foreign data tables in the
f_dmsschema for all the user tables in the
dms_sampledatabase in SQL Server:
- Verify the list of the tables imported and query the tables to validate the access. The following statement lists all foreign data tables in the schema
Alternately , you can use the metacommand
- Query the external SQL Server tables through the foreign data tables in schema
- Copy the data from SQL Server to Aurora PostgreSQL:
In this post, we primarily focused on the following use cases to give an overview on the capability. However, you can easily extend the solution for other federated query use cases:
- Running ad hoc analytics queries on an external SQL Server – Organizations could have third-party applications using SQL Server as a backend, and data from these applications is often needed to query to get the recent data. For example, a third-party sales lead application has recent data about sales volume, and this data in external databases in SQL Server often needs to be queried along with the finance database to get forecasts on sales.
- Migrating data from SQL Server to PostgreSQL – You might want to move away from proprietary commercial database platforms to license-free managed databases like Aurora PostgreSQL to save on costs associated with licensing, managing, and maintaining the databases. Out of many migration options available, which include AWS Database Migration Service (AWS DMS), CSV export/import, and SQL Server Integration Services (SSIS), you can also use TDS_FDW if the data isn’t subject to certain limitations (which we discuss later in this post).
- Data sharding – In hybrid environments where multiple database engines are used, there is often a need to store specific silos of data in specific geolocations or AWS Regions. Due to these requirements, it can be useful to use TDS_FDW as a means of sharding datasets such that data can be written and stored by SQL Server, but made available for usage by Aurora PostgreSQL.
- Avoiding data duplication – You can use TDS_FDW to interact with data within a SQL Server database without first exporting and reloading the data into Aurora PostgreSQL. This results in less operational overhead related to data movement across engines and allows for cost savings on storage and data transfer (because the data isn’t being reloaded for the specific purpose of its usage in Aurora PostgreSQL). Additionally, by setting up direct access between SQL Server and Aurora PostgreSQL via
TDS_FDW, you don’t need to set up a change data capture (CDC) solution to keep the requisite tables in sync with each other.
- Alternative to standard ETL – Although extract, transform, and load (ETL) workloads vary significantly in size and scope, in many basic customer use cases data just needs to be copied from source to target (often with no transformation). In the event that you only need to extract and load data between Aurora PostgreSQL and SQL Server,
TDS_FDWallows for a simplified workflow. When you use a foreign data wrapper for this type of ETL use case, the data is only queried when needed using the applicable foreign table (instead of being constantly reloaded by a more complex reoccurring process).
In this section, we discuss some best practices when using
Pushdown is a mechanism used by PostgreSQL foreign data wrappers that allows for the remote run of clauses in client SQL statements (such as WHERE and ORDER BY). For instance, pushdown of a WHERE clause reduces bottlenecks between the local and remote databases, because the target data is filtered out on the remote side (reducing the amount of data transferred between the local (Aurora PostgreSQL) and remote (SQL Server). In a query where 100 out of 1,000 rows of foreign data are requested by specifying the WHERE clause, the quantity of data transferred is reduced by 90%.
Local and remote schemas are not automatically updated
Remote schemas are not automatically kept up to date with their local PostgreSQL equivalents, and sudden or extreme schema changes can break your foreign table. We recommend refreshing your
TDW_FDW user, schema, and table mapping whenever user or schema changes are made on the SQL Server that could impact your connectivity. In the event that Aurora PostgreSQL experiences a major version upgrade, you also need to perform this procedure.
See the following sample code snippets:
Another strategy that you can use for interacting with remote SQL Server data using
TDW_FDW is a PostgreSQL feature called materialized views. Materialized views are native PostgreSQL objects that are a combination of a view and a table with a predefined dataset. Materialized views are useful when dealing with complex queries (sometimes consisting of joins between internal and external tables, functional conditions, and more).
Because materialized views are native PostgreSQL objects (and can use data pulled in via foreign data wrappers), they can be indexed independently from an underlying data source. One of the downsides of using this method for querying external data is that these PostgreSQL schema objects require manual updates by triggers or other events (and must be fully recreated when refresh is required). Because this manual update is required, this implies some staleness between data in a created materialized view and data in the source (remote) table.
TDW_FDW has the following limitations:
- Writes to remote not yet supported – Although we addressed one of the
TDW_FDWlimitations in the best practices section, there are several others that must be mentioned. Unlike other PostgreSQL foreign data wrappers,
TDW_FDWdoesn’t support write mode, and can only read from the mapped foreign table within SQL Server.
- Workload considerations for performance – Although we touched on improving performance by limiting the amount of data returned using WHERE and ORDER BY clauses, it must be mentioned that if the amount of data returned by queries involving
TDW_FDWremote tables can’t be limited (and operations such as
SELECT *need to be implemented), performance may suffer (compared to querying data from native PostgreSQL tables). If you need to evaluate your use case for this FDW, please contact your AWS support team for an architectural review.
- Join pushdown not supported – Different PostgreSQL foreign data wrappers support different modes of pushdown (as described in the best practices section). Join pushdown is an operation that allows for pushing down an entire join operation to the remote server in such a way that only the results need be fetched by the local server. Those familiar with
postgres_fdwwill know and use this pushdown functionality regularly. That said,
TDS_FDWcurrently only supports WHERE and column pushdowns, which needs to be taken into consideration when qualifying your workloads for usage with
TDS_FDW in Aurora PostgreSQL is an option for connectivity between SQL Server and Aurora PostgreSQL. In this post, we demonstrated a workflow in which we are able to query a SQL Server database from Aurora PostgreSQL, and then read the entire contents of the foreign table into an Aurora PostgreSQL local table. In addition to demonstrating two sample workflows, we also mentioned additional use cases covered by this TDS foreign data wrapper.
If you have questions or suggestions on the content covered in this post, leave them in the comments section.
About the Authors
Peter Celentano is a Senior Specialist Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He works with AWS customers to design scalable, secure, performant, and robust database architectures on the cloud.
Chandra Pathivada is a Senior Database Specialist Solutions Architect with Amazon Web Services. He works with AWS RDS team, focusing on Open source database engines like RDS for PostgreSQL and Aurora PostgreSQL. He enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.