AWS Big Data Blog

Query 10 new data sources with Amazon Athena

When we first launched Amazon Athena, our mission was to make it simple to query data stored in Amazon Simple Storage Service (Amazon S3). Athena customers found it easy to get started and develop analytics on petabyte-scale data lakes, but told us they needed to join their Amazon S3 data with data stored elsewhere. We added connectors to sources including Amazon DynamoDB and Amazon Redshift to give data analysts, data engineers, and data scientists the ability to run SQL queries on data stored in databases running on-premises or in the cloud alongside data stored in Amazon S3.

Today, thousands of AWS customers from nearly every industry use Athena federated queries to surface insights and make data-driven decisions from siloed enterprise data—using a single AWS service and SQL dialect.

We’re excited to expand your ability to derive insights from more of your data with today’s release of 10 new data source connectors, which include some of the most widely used data stores on the market.

New data sources for Athena

You can now use Athena to query and surface insights from 10 new data sources:

  • SAP HANA (Express Edition)
  • Teradata
  • Cloudera
  • Hortonworks
  • Snowflake
  • Microsoft SQL Server
  • Oracle
  • Azure Data Lake Storage (ADLS) Gen2
  • Azure Synapse
  • Google BigQuery

Today’s release greatly expands the number of data sources supported by Athena. For a complete list of supported data sources, see Using Athena Data Source Connectors.

To coincide with this release, we enhanced the Athena console to help you browse available sources and connect to your data in fewer steps. You can now search, sort, and filter the available connectors on the console, and then follow the guided setup wizard to connect to your data.

Just as before, we’ve open-sourced the new connectors to invite contributions from the developer community. For more information, see Writing a Data Source Connector Using the Athena Query Federation SDK.

Connect the dots in your analytics strategy with Athena

With the breadth of data storage options available today, it’s common for data-driven organizations to choose a data store that meets the requirements of specific use cases and applications. Although this flexibility is ideal for architects and developers, it can add complexity for analysts, data scientists, and data engineers, which prevents them from accessing the data they need. To get around this, many users resort to workarounds that often involve learning new programming languages and database concepts or building data pipelines to prepare the data before it can be analyzed. Athena helps cut through this complexity with support for over 25 data sources and its simple-to-use, pay-as-you-go, serverless design.

With Athena, you can use your existing SQL knowledge to extract insights from a wide range of data sources without learning a new language, developing scripts to extract (and duplicate) data, or managing infrastructure. Athena allows you to do the following:

  • Run on-demand analysis on data spread across multiple cloud providers and systems of record using a single tool and single SQL dialect
  • Visualize data in business intelligence applications that use Athena to perform complex, multi-source joins
  • Design self-service extract, transform, and load (ETL) pipelines and event-based data processing workflows with Athena’s integration with AWS Step Functions
  • Unify diverse data sources to produce rich input features for machine learning model training workflows
  • Develop user-facing data-as-a-product applications that surface insights across data mesh architectures
  • Support analytics use cases while your organization migrates on-premises sources to the AWS Cloud

Get started with Athena’s data source connectors

To get started with federated queries for Athena, on the Athena console, choose Data Sources in the navigation pane, choose a data source, and follow the guided setup experience to configure your connector. After the connection is established and the source is registered with Athena, you can query the data via the Athena console, API, AWS SDK, and compatible third-party applications. To learn more, see Using Amazon Athena Federated Query and Writing Federated Queries.

You can also share a data source connection with team members, allowing them to use their own AWS account to query the data without setting up a duplicate connector. To learn more, see Enabling Cross-Account Federated Queries.

Conclusion

We encourage you to evaluate Athena and federated queries on your next analytics project. For help getting started, we recommend the following resources:


About the Authors

Scott Rigney is a Senior Technical Product Manager with Amazon Web Services (AWS) and works with the Amazon Athena team based out of Arlington, Virginia. He is passionate about building analytics products that enable enterprises to make data-driven decisions.

Jean-Louis Castro-Malaspina is a Senior Product Marketing Manager with Amazon Web Services (AWS) based in Hershey, Pennsylvania. He enjoys highlighting how customers use Analytics and Amazon Athena to unlock innovation. Outside of work, Jean-Louis enjoys spending time with his wife and daughter, running, and following international soccer.

Suresh_90Suresh Akena is a Principal WW GTM Leader for Amazon Athena. He works with the startups, enterprise and strategic customers to provide leadership on large scale data strategies including migration to AWS platform, big data and analytics and ML initiatives and help them to optimize and improve time to market for data driven applications when using AWS.