Federated query support between SQL Server and Amazon Aurora PostgreSQL

When you have a business need to integrate the data between Microsoft SQL Server and Amazon Aurora PostgreSQL-Compatible Edition, you can use the open-source TDS_FDW PostgreSQL extension to create federation between these two databases. With TDS_FDW with Aurora PostgreSQL-Compatible, you can modernize and extend the functionality of your existing applications to meet changing business needs. Alternatively, Babelfish for Aurora PostgreSQL can be used as well.

In this post, we provide five common use cases, and steps for implementing TDS_FDW with Aurora PostgreSQL-Compatible and using it to query the data within a sample SQL Server database. Finally, we cover best practices around the usage of TDS_FDW.

Prerequisites

You need to have the following prerequisites:

SQL Server, loaded with the sample database DMS_SAMPLE with data (refer to this GitHub repo).
An Aurora PostgreSQL cluster with a sample schema DMS_SAMPLE with no data, configured to access the SQL Server cluster using TDS_FDW. The sample schema can be found on GitHub.
Network connectivity configured between the source and target database clusters.

Federated queries from Aurora PostgreSQL to SQL Server

Complete the following steps:

Connect to the Aurora PostgreSQL cluster endpoint and connect to the schema dms_sample:

psql --host=your-cluster-name-instance-1.aws-region.rds.amazonaws.com --port=5432 --username=test –-password

dms_sample=# \c dms_sample

Get the list of the tables that exist in this database (ensure TDS_FDW will function with the foreign table):

dms_sample=# select schemaname,tablename from pg_tables where schemaname='dms_sample';

dms_sample=# select schemaname,tablename from pg_tables where schemaname=’dms_sample’;

schemaname | tablename
-----------+------------------

dms_sample | mlb data
dms_sample | name data
dms_sample | nfl data
dms_sample | nfl stadium data
dms_sample | person
dms_sample | player
dms_sample | seat
dms_sample | seat_type
dms_sample | sporting_event
dms_sample | sporting_event ticket
dms_sample | sport_division
dms_sample | sport_league
dms_sample | sport _ location
dms_sample | sport team
dms_sample | sport _ type
dms_sample | ticket_purchase_hist

Create the tdw_fdw extension:

dms_sample=# create extension tds_fdw;

CREATE EXTENSION
dms_sample=#

Create a foreign server to access external SQL Server, which acts as a source database to migrate the data from:

dms_sample=# create server fdw_ms_sql Foreign Data Wrapper tds_fdw OPTIONS (database 'dms_sample', servername '<ENTER SQL SERVER NAME>', port '1433',use_remote_estimate 'on');

CREATE SERVER
dms_sample=#

Create a user mapping between the local database user with the SQL Server user:

dms_sample=# CREATE USER MAPPING FOR postgres SERVER fdw_ms_sql OPTIONS (username '<ENTER USER NAME>', password 'ENTER PWD');

CREATE USER MAPPING
dms_sample=#

Create the schema f_dms to host all the foreign tables:
```
dms_sample=# create schema f_dms;

CREATE SCHEMA
dms_sample=#
```
These foreign tables don’t hold any data, but act like views to query the tables from the external SQL Server.

Create the foreign data tables in the f_dms schema for all the user tables in the dms_sample database in SQL Server:

dms_sample=# IMPORT FOREIGN SCHEMA dbo LIMIT TO (mlb_data,name_data,nfl_data,nfl_stadium_data,seat_type,sport_type,sport_league,awsdms_truncation_safeguard,sport_location,sport_division,sport_team,seat,player,person,sporting_event,sporting_event_ticket,ticket_purchase_hist) FROM SERVER fdw_ms_sql INTO f_dms;

IMPORT FOREIGN SCHEMA
dms_sample=#

Verify the list of the tables imported and query the tables to validate the access. The following statement lists all foreign data tables in the schema f_dms:

dms_sample=# select foreign_table_name from information_schema.foreign_tables where foreign_table_catalog = 'babelfish_db' and foreign_table_schema = 'f_dms';

babelfish_db-g-# SELECT foreign_table_name FROM information_schema.foreign_tables where foreign_table_catalag - 'babelfish_db' and foreign_table_schema - 'f_dms';

foreign_table_name
------------------------------

awsdms_truncation_safeguard
mlb_dataname data
name_data
nfl_data
nfl_stadium_data
person
player
seat
seat_type
sport_division
sport_league
sport_locaticn
sport_team
sport_type
sporting_event
sporting_event_ticket
ticket_purchase)hist
( 17 rows)

Alternately , you can use the metacommand \det as well.

Query the external SQL Server tables through the foreign data tables in schema f_dms:
```
>dms_sample=# select count(*) from f_dms.sporting_event_ticket;
```

Copy the data from SQL Server to Aurora PostgreSQL:

dms_sample=# insert into dms_sample.sporting_event_ticket select * from f_dms.sporting_event_ticket;

INSERT 0 57588768
dms_sample=#

Use cases

In this post, we primarily focused on the following use cases to give an overview on the capability. However, you can easily extend the solution for other federated query use cases:

Running ad hoc analytics queries on an external SQL Server – Organizations could have third-party applications using SQL Server as a backend, and data from these applications is often needed to query to get the recent data. For example, a third-party sales lead application has recent data about sales volume, and this data in external databases in SQL Server often needs to be queried along with the finance database to get forecasts on sales.
Migrating data from SQL Server to PostgreSQL – You might want to move away from proprietary commercial database platforms to license-free managed databases like Aurora PostgreSQL to save on costs associated with licensing, managing, and maintaining the databases. Out of many migration options available, which include AWS Database Migration Service (AWS DMS), CSV export/import, and SQL Server Integration Services (SSIS), you can also use TDS_FDW if the data isn’t subject to certain limitations (which we discuss later in this post).
Data sharding – In hybrid environments where multiple database engines are used, there is often a need to store specific silos of data in specific geolocations or AWS Regions. Due to these requirements, it can be useful to use TDS_FDW as a means of sharding datasets such that data can be written and stored by SQL Server, but made available for usage by Aurora PostgreSQL.
Avoiding data duplication – You can use TDS_FDW to interact with data within a SQL Server database without first exporting and reloading the data into Aurora PostgreSQL. This results in less operational overhead related to data movement across engines and allows for cost savings on storage and data transfer (because the data isn’t being reloaded for the specific purpose of its usage in Aurora PostgreSQL). Additionally, by setting up direct access between SQL Server and Aurora PostgreSQL via TDS_FDW, you don’t need to set up a change data capture (CDC) solution to keep the requisite tables in sync with each other.
Alternative to standard ETL – Although extract, transform, and load (ETL) workloads vary significantly in size and scope, in many basic customer use cases data just needs to be copied from source to target (often with no transformation). In the event that you only need to extract and load data between Aurora PostgreSQL and SQL Server, TDS_FDW allows for a simplified workflow. When you use a foreign data wrapper for this type of ETL use case, the data is only queried when needed using the applicable foreign table (instead of being constantly reloaded by a more complex reoccurring process).

Best practices

In this section, we discuss some best practices when using TDS_FDW.

Understanding pushdown

Pushdown is a mechanism used by PostgreSQL foreign data wrappers that allows for the remote run of clauses in client SQL statements (such as WHERE and ORDER BY). For instance, pushdown of a WHERE clause reduces bottlenecks between the local and remote databases, because the target data is filtered out on the remote side (reducing the amount of data transferred between the local (Aurora PostgreSQL) and remote (SQL Server). In a query where 100 out of 1,000 rows of foreign data are requested by specifying the WHERE clause, the quantity of data transferred is reduced by 90%.

Local and remote schemas are not automatically updated

Remote schemas are not automatically kept up to date with their local PostgreSQL equivalents, and sudden or extreme schema changes can break your foreign table. We recommend refreshing your TDW_FDW user, schema, and table mapping whenever user or schema changes are made on the SQL Server that could impact your connectivity. In the event that Aurora PostgreSQL experiences a major version upgrade, you also need to perform this procedure.

See the following sample code snippets:

BEGIN;
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
DROP SERVER IF EXISTS app CASCADE;
CREATE SERVER … ;
CREATE USER MAPPING … ;
DROP SCHEMA IF EXISTS app;
CREATE SCHEMA app;
IMPORT FOREIGN SCHEMA … ;
COMMIT;

Materialized views

Another strategy that you can use for interacting with remote SQL Server data using TDW_FDW is a PostgreSQL feature called materialized views. Materialized views are native PostgreSQL objects that are a combination of a view and a table with a predefined dataset. Materialized views are useful when dealing with complex queries (sometimes consisting of joins between internal and external tables, functional conditions, and more).

Because materialized views are native PostgreSQL objects (and can use data pulled in via foreign data wrappers), they can be indexed independently from an underlying data source. One of the downsides of using this method for querying external data is that these PostgreSQL schema objects require manual updates by triggers or other events (and must be fully recreated when refresh is required). Because this manual update is required, this implies some staleness between data in a created materialized view and data in the source (remote) table.

Limitations

TDW_FDW has the following limitations:

Writes to remote not yet supported – Although we addressed one of the TDW_FDW limitations in the best practices section, there are several others that must be mentioned. Unlike other PostgreSQL foreign data wrappers, TDW_FDW doesn’t support write mode, and can only read from the mapped foreign table within SQL Server.
Workload considerations for performance – Although we touched on improving performance by limiting the amount of data returned using WHERE and ORDER BY clauses, it must be mentioned that if the amount of data returned by queries involving TDW_FDW remote tables can’t be limited (and operations such as SELECT * need to be implemented), performance may suffer (compared to querying data from native PostgreSQL tables). If you need to evaluate your use case for this FDW, please contact your AWS support team for an architectural review.
Join pushdown not supported – Different PostgreSQL foreign data wrappers support different modes of pushdown (as described in the best practices section). Join pushdown is an operation that allows for pushing down an entire join operation to the remote server in such a way that only the results need be fetched by the local server. Those familiar with postgres_fdw will know and use this pushdown functionality regularly. That said, TDS_FDW currently only supports WHERE and column pushdowns, which needs to be taken into consideration when qualifying your workloads for usage with TDS_FDW.

Conclusion

TDS_FDW in Aurora PostgreSQL is an option for connectivity between SQL Server and Aurora PostgreSQL. In this post, we demonstrated a workflow in which we are able to query a SQL Server database from Aurora PostgreSQL, and then read the entire contents of the foreign table into an Aurora PostgreSQL local table. In addition to demonstrating two sample workflows, we also mentioned additional use cases covered by this TDS foreign data wrapper.

If you have questions or suggestions on the content covered in this post, leave them in the comments section.

About the Authors

Peter Celentano is a Senior Specialist Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He works with AWS customers to design scalable, secure, performant, and robust database architectures on the cloud.

Chandra Pathivada is a Senior Database Specialist Solutions Architect with Amazon Web Services. He works with AWS RDS team, focusing on Open source database engines like RDS for PostgreSQL and Aurora PostgreSQL. He enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

AWS Database Blog