Amazon Athena adds support for running SQL queries across relational, non-relational, object, and custom data sources.

Posted on: Nov 13, 2020

Federated queries in Amazon Athena let users run SQL queries across data stored in relational, non-relational, object, and custom data sources. The feature, now generally available in the us-east-1, us-west-2, and us-east-2 regions, lets customers submit a single SQL query that scans data from multiple sources, whether those sources run on-premises or in the cloud.
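
A federated query is submitted the same way as any other Athena query. The following is a minimal sketch. It assumes the boto3 Athena client (`start_query_execution` and `get_query_execution`) and an S3 output prefix of your choosing. The helper name `run_federated_query` is hypothetical. The client is passed in rather than created inside the helper, so the sketch does not depend on any particular credentials or region.

```python
import time
from typing import Any


def run_federated_query(client: Any, sql: str, output_location: str,
                        poll_seconds: float = 1.0) -> str:
    """Submit one SQL statement to Athena and block until it finishes.

    `client` is assumed to behave like boto3.client("athena").
    Returns the query execution ID on success.
    """
    response = client.start_query_execution(
        QueryString=sql,
        # Athena writes the query results under this S3 prefix.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until Athena reports a terminal state (QUEUED/RUNNING are not terminal).
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} finished in state {state}")
    return query_id
```

For example, with `client = boto3.client("athena", region_name="us-east-1")`, calling `run_federated_query(client, sql, "s3://example-bucket/athena-results/")` returns the execution ID once the query succeeds. The bucket name is a placeholder.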

Running analytics on data spread across applications can be complex and time-consuming. The data that analytics requires is often spread across relational, key-value, document, in-memory, search, graph, object, time-series, and ledger data stores. To analyze data across these sources, analysts build complex pipelines to extract, transform, and load the data into a warehouse before it can be queried, and accessing each source often requires learning new programming languages and data access constructs. Federated SQL queries in Athena remove this complexity by letting users query data in place, wherever it resides. Analysts can use familiar SQL constructs to JOIN data across multiple data sources for quick analysis and store the results in Amazon S3 for subsequent use.
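
The following sketch illustrates the join-and-store pattern. It builds a CREATE TABLE AS SELECT (CTAS) statement that joins a table exposed by a DynamoDB connector with a table exposed by a MySQL connector, then writes the result to S3 as Parquet. Several names are hypothetical: the catalog names `ddb` and `mysql`, the database, table, and column names, and the helper `build_join_ctas`. The sketch also assumes each connector has already been registered in Athena under that catalog name.

```python
def build_join_ctas(target_table: str, s3_location: str) -> str:
    """Build a CTAS statement that joins two federated sources and writes
    the joined rows to S3 as Parquet. All identifiers are hypothetical."""
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    # ddb.default.orders: assumed table exposed by a DynamoDB connector.
    # mysql.crm.customers: assumed table exposed by a JDBC (MySQL) connector.
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{s3_location}') AS\n"
        "SELECT o.order_id, o.amount, c.segment\n"
        "FROM ddb.default.orders AS o\n"
        "JOIN mysql.crm.customers AS c\n"
        "  ON o.customer_id = c.customer_id"
    )
```

The returned string can be submitted like any other Athena query. Once the CTAS completes, the new table reads from S3, so later analysis does not touch the source systems again. Athena expects the `external_location` prefix to be empty when the statement runs.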

Athena executes federated queries using data source connectors that run on AWS Lambda. AWS has open-sourced data source connectors for Amazon DynamoDB, Apache HBase, Amazon DocumentDB, Amazon Redshift, Amazon CloudWatch, Amazon CloudWatch Metrics, and JDBC-compliant relational databases such as MySQL and PostgreSQL under the Apache 2.0 license. Customers can use these connectors to run federated SQL queries in Athena across these data sources. Additionally, developers can use the Athena Query Federation SDK to build connectors to any data source, enabling Athena to run SQL queries against that source. Connectors built with the SDK extend the benefits of federated querying beyond the AWS-provided connectors. Because connectors run on AWS Lambda, customers do not have to manage infrastructure or plan for scaling to peak demand.
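
Before a connector can be referenced by name in SQL, its Lambda function is typically registered as an Athena data catalog. This is a minimal sketch. It assumes the boto3 Athena client's `create_data_catalog` call with the `LAMBDA` catalog type and the single-function `function` parameter. The helper `register_lambda_catalog`, the catalog name, and the ARN used in the usage note are hypothetical.

```python
from typing import Any


def register_lambda_catalog(client: Any, name: str, function_arn: str,
                            description: str = "") -> dict:
    """Register a deployed connector Lambda function as an Athena data catalog.

    `client` is assumed to behave like boto3.client("athena").
    Returns the request that was sent, for logging or inspection.
    """
    # Basic sanity check that this looks like a Lambda ARN (any partition).
    if not (function_arn.startswith("arn:") and ":lambda:" in function_arn):
        raise ValueError("expected a Lambda function ARN")
    request = {
        "Name": name,
        "Type": "LAMBDA",
        # Single-function form: one Lambda serves both metadata and records.
        "Parameters": {"function": function_arn},
    }
    if description:
        # Only send a description when one is given; empty strings are omitted.
        request["Description"] = description
    client.create_data_catalog(**request)
    return request
```

For example, after registering a DynamoDB connector with `register_lambda_catalog(boto3.client("athena"), "ddb", "arn:aws:lambda:us-east-1:123456789012:function:my-ddb-connector")`, its tables can be referenced in Athena SQL as `ddb.<database>.<table>`.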

With this release, Athena federated query is generally available in the us-east-1, us-west-2, and us-east-2 regions. 

To learn more about the feature, see the documentation here.
To get started with an existing connector, follow this guide.
To learn how to build your own data source connector with the Athena Query Federation SDK, visit this link.