AWS Open Source Blog

Want more PostgreSQL? You just might like Babelfish

“The greatest force in legacy databases is inertia,” a widely regarded industry analyst once told me. Not superior functionality. Not better performance. Not lower cost. None of the above. Just inertia. Developers might say they prefer to run PostgreSQL to proprietary alternatives (and they do), but enterprises have spent years building data models in Microsoft SQL Server or Oracle. Change feels hard.

This is true even if there are compelling reasons to migrate. You might be happy running SQL Server, for example, but want PostgreSQL benefits, such as transparent security; or handling of specific data types, such as spatial or geographical objects (PostGIS). Or you’d like to save money currently spent on SQL Server Enterprise licenses to get compression or concurrent index creation, both of which are free with PostgreSQL. Or maybe you want more freedom in where and how you run your database workloads: on-premises, self-managed in the cloud of your choice, or fully managed by your preferred cloud vendor.

Whatever your reasons for wanting to migrate to PostgreSQL, we’d like to offer an easier way to do that.

Today we announced plans to open source Babelfish for PostgreSQL, a SQL Server-compatible end-point for PostgreSQL to make PostgreSQL fluent in understanding communication from apps written for SQL Server (i.e., including understanding T-SQL and TDS communication protocol). This isn’t the first effort to facilitate migrations from SQL Server to PostgreSQL. Ian Harding, for example, has been helping people do this for nearly 20 years, and an array of companies also offer migration services, including Crunchy Data, Severalnines, EnterpriseDB and, yes, AWS.

However, we believe Babelfish stands out because it’s not another migration service, as useful as those can be. Babelfish enables PostgreSQL to understand database requests—both the command and the protocol—from applications written for Microsoft SQL Server without changing libraries, database schema, or SQL statements. This means much faster “migrations” with minimal developer effort. It’s also centered on “correctness,” meaning applications designed to use SQL Server functionality will behave the same on PostgreSQL as they would on SQL Server.

The tl;dr? Babelfish helps you have more PostgreSQL in your life, without compromise.

Lots of love for PostgreSQL

PostgreSQL has always been a popular option for developers, but over the past decade it has increased in popularity, as measured by Stack Overflow and other means. This isn’t just feel-good survey data, either: DB-Engines has spent years tracking database popularity across a number of factors, including job postings, technical Q&A on sites such as Stack Overflow, and more, and it’s clear that this developer love has translated into broad adoption relative to incumbents:

One thing that many love about PostgreSQL is that it’s completely community-driven. As noted PostgreSQL contributor Bruce Momjian has written, “[T]here is an unwritten rule that the PostgreSQL core team should not have over half of its members from a single company.” This has helped to generate high levels of trust that the PostgreSQL community will do what’s right for the database (and its users), not for any particular vendor. In turn, this has led to a database that does the essential things well (store and not lose data), while evolving to meet changing requirements (JSON support, etc.).

And yet, as mentioned, for all PostgreSQL’s benefits (and there are many more not mentioned here), migrating a large application from its existing database to PostgreSQL all at once can be hard. For too many workloads, the perceived or actual risk of disrupting operations through migration is sometimes deemed too high. Babelfish aims to change this.

Getting more PostgreSQL in your life

Babelfish adds an endpoint to PostgreSQL that understands the SQL Server wire protocol Tabular Data Stream (TDS), as well as commonly used T-SQL commands used by SQL Server. Support for T-SQL includes elements such as the SQL dialect, cursors, catalog views, data types, triggers, stored procedures, and functions. With Babelfish enabled, you don’t have to swap out database drivers or take on the significant effort of rewriting and verifying all of your applications’ database requests.

This is the biggest difference between Babelfish and traditional migration services, including those offered by AWS, such as AWS Database Migration Service (DMS) and AWS Schema Conversion Tool (SCT). Although these services facilitate the conversion of database schema from SQL Server to PostgreSQL, then move data from SQL Server and load it into PostgreSQL, they don’t take care of the often laborious process of rewriting application code. Babelfish eliminates the need to change the application to use PostgreSQL database drivers. It also removes the need to rewrite your internal application code to work with PostgreSQL.

Again, Babelfish lets you load your data into PostgreSQL with no conversion and lets your application think it’s still talking to SQL Server.

Those are the mechanics, but developers need to be certain that Babelfish truly speaks SQL Server’s language in a dependable, predictable way. As such, the guiding principle for Babelfish is correctness, with no compromises. What do I mean by correctness? Namely, that applications designed to use SQL Server semantics will behave the same on PostgreSQL as they would on SQL Server.

Over its 35 years in existence, SQL Server has evolved to meet a wide array of use cases. When first made available on GitHub, Babelfish won’t be able to handle every use case, but will be able to tackle the most common application scenarios. Most importantly, Babelfish will meet the correctness objective. That is, if Babelfish doesn’t yet support specific SQL Server functionality, it will return an error to the application, rather than defaulting to PostgreSQL behavior. Why? Because, again, developers (and the enterprises for which they work) must be able to depend absolutely on the correctness of SQL Server compatibility.

A simplistic example may be helpful here.

Imagine a customer application built against SQL Server that processes and stores monetary values, such as an ERP system. A commonly used datatype to store monetary values is the MONEY data type. In SQL Server, the MONEY data type’s behavior is fixed using four digits to the right of the decimal (e.g., $12.8123). However, in PostgreSQL, the MONEY data type is fixed using two digits to the right of the decimal. So, when the application tries to store a value of $12.8123, by example, PostgreSQL will round to $12.81. This seemingly subtle difference in rounding behavior could have a significant impact on downstream processes, such as financial reporting. To ensure correctness in Babelfish, we need to ensure such differences, small and large, are handled with absolute fidelity.

To fully accomplish this, we believe a community effort will be needed, with diverse participation that covers the large surface area of how applications use SQL Server.

Building a future on Postgres, together

If you are an independent developer, or a developer employed by a software vendor, system integrator, or enterprise, and you have experience with SQL Server, PostgreSQL, or both—you can help. If you’re involved in migration efforts, you can help to improve Babelfish by extending it to meet your particular needs and contribute to cover more of the SQL Server surface area. Working together Babelfish will become even more powerful—faster—allowing enterprises and ISVs to use PostgreSQL in more places with less work.

We are releasing Babelfish under the Apache 2.0 license. We invite others to become active in the project, and we will see it as a sign of success when developers outside of AWS become committers or maintainers. You can help by adding or extending Babelfish functionality, submitting feature requests, working on documentation, and contributing test cases. Anything that helps developers move to PostgreSQL is greatly appreciated.

We are open sourcing Babelfish in 2021. Until then, you can use Babelfish on Amazon Aurora in a preview to see how it works and to get a sense for whether this is the right approach for you. If you’re interested in participating in the Babelfish community, sign up here to be notified when we release it. We hope you’ll both use and contribute to Babelfish, and are grateful for your interest.

 

Matt Asay

Matt Asay

Matt Asay (pronounced "Ay-see") has been involved in open source and all that it enables (cloud, machine learning, data infrastructure, mobile, etc.) for nearly two decades, working for a variety of open source companies and writing regularly for InfoWorld and TechRepublic. You can follow him on Twitter (@mjasay).