AWS Big Data Blog

Accessing and visualizing data from multiple data sources with Amazon Athena and Amazon QuickSight

Amazon Athena now supports federated query, a feature that allows you to query data in sources other than Amazon Simple Storage Service (Amazon S3). You can use federated queries in Athena to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3. With Athena Federated Query, you can run SQL queries across data stored in relational, non-relational, object, and custom data sources. Athena queries including federated queries can be run from the Athena console, a JDBC or ODBC connection, the Athena API, the Athena CLI, the AWS SDK, or AWS Tools for Windows PowerShell.

The goal for this post is to discuss how we can use different connectors to run federated queries with complex joins across different data sources with Athena and visualize the data with Amazon QuickSight.

Athena Federated Query

Athena uses data source connectors that run on AWS Lambda to run federated queries. A data source connector is a piece of code that translates between your target data source and Athena. You can think of a connector as an extension of the Athena query engine. Prebuilt Athena data source connectors exist for data sources like Amazon CloudWatch Logs, Amazon DynamoDB, Amazon DocumentDB (with MongoDB compatibility), Amazon OpenSearch Service, Amazon ElastiCache for Redis, and JDBC-compliant relational data sources such as MySQL, PostgreSQL, and Amazon Redshift under the Apache 2.0 license. You can also use the Athena Query Federation SDK to write custom connectors. After you deploy data source connectors, the connector is associated with a catalog name that you can specify in your SQL queries. You can combine SQL statements from multiple catalogs and span multiple data sources with a single query.

When a query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the tables that need to be read, manages parallelism, and pushes down filter predicates. Based on the user submitting the query, connectors can provide or restrict access to specific data elements. Connectors use Apache Arrow as the format for returning data requested in a query, which enables connectors to be implemented in languages such as C, C++, Java, Python, and Rust. Because connectors run in Lambda, you can use them to access data from any data source in the cloud or on premises that is accessible from Lambda.

Prerequisites

Before creating your development environment, you must have the following prerequisites:

Configuring your data source connectors

After you deploy your CloudFormation stack, follow the instructions in the post Extracting and joining data from multiple data sources with Athena Federated Query to configure various Athena data source connectors for HBase on Amazon EMR, DynamoDB, ElastiCache for Redis, and Amazon Aurora MySQL.

You can run Athena federated queries in the AmazonAthenaPreviewFunctionality workgroup created as part of the CloudFormation stack or you could run them in the primary workgroup or other workgroups as long as you’re running with Athena engine version 2. As of this writing, Athena Federated Query is generally available in the Asia Pacific (Mumbai), Asia Pacific (Tokyo), Europe (Ireland), US East (N. Virginia), US East (Ohio), US West (N. California), and US West (Oregon) Regions. If you’re running in other Regions, use the AmazonAthenaPreviewFunctionality workgroup.

For information about changing your workgroup to Athena engine version 2, see Changing Athena Engine Versions.

Configuring QuickSight

The next step is to configure QuickSight to use these connectors to query data and visualize with QuickSight.

  1. On the AWS Management Console, navigate to QuickSight.
  2. If you’re not signed up for QuickSight, you’re prompted with the option to sign up. Follow the steps to sign up to use QuickSight.
  3. After you log in to QuickSight, choose Manage QuickSight under your account.

After you log in to QuickSight, choose Manage QuickSight under your account.

  1. In the navigation pane, choose Security & permissions.
  2. Under QuickSight access to AWS services, choose Add or remove.

Under QuickSight access to AWS services, choose Add or remove.

A page appears for enabling QuickSight access to AWS services.

  1. Choose Athena.

Choose Athena.

  1. In the pop-up window, choose Next.

In the pop-up window, choose Next.

  1. On the S3 tab, select the necessary S3 buckets. For this post, I select the athena-federation-workshop-<account_id> bucket and another one that stores my Athena query results.
  2. For each bucket, also select Write permission for Athena Workgroup.

For each bucket, also select Write permission for Athena Workgroup.

  1. On the Lambda tab, select the Lambda functions corresponding to the Athena federated connectors that Athena federated queries use. If you followed the post Extracting and joining data from multiple data sources with Athena Federated Query when configuring your Athena federated connectors, you can select dynamo, hbase, mysql, and redis.

For information about registering a data source in Athena, see the appendix in this post.

  1. Choose Finish.

Choose Finish.

  1. Choose Update.
  2. On the QuickSight console, choose New analysis.
  3. Choose New dataset.
  4. For Datasets, choose Athena.
  5. For Data source name, enter Athena-federation.
  6. For Athena workgroup, choose primary.
  7. Choose Create data source. 

As stated earlier, you can use the AmazonAthenaPreviewFunctionality workgroup or another workgroup as long as you’re running Athena engine version 2 in a supported Region.

You can use the AmazonAthenaPreviewFunctionality workgroup or another workgroup as long as you’re running Athena engine version 2 in a supported Region.

  1. For Catalog, choose the catalog that you created for your Athena federated connector.

For information about creating and registering a data source in Athena, see the appendix in this post.

For information about creating and registering a data source in Athena, see the appendix in this post.

  1. For this post, I choose the dynamo catalog, which does a federation to the Athena DynamoDB connector.

For this post, I choose the dynamo catalog, which does a federation to the Athena DynamoDB connector.

I can now see the database and tables listed in QuickSight.

  1. Choose Edit/Preview data to see the data.
  2. Choose Save & Visualize to start using this data for creating visualizations in QuickSight.

22. Choose Save & Visualize to start using this data for creating visualizations in QuickSight.

  1. To do a join with another Athena data source, choose Add data and select the catalog and table.
  2. Choose the join link between the two datasets and choose the appropriate join configuration.
  3. Choose Apply.

Choose Apply

You should be able to see the joined data.

You should be able to see the joined data.

Running a query in QuickSight

Now we use the custom SQL option in QuickSight to run a complex query with multiple Athena federated data sources.

  1. On the QuickSight console, choose New analysis.
  2. Choose New dataset.
  3. For Datasets, choose Athena.
  4. For Data source name, enter Athena-federation.
  5. For the workgroup, choose primary.
  6. Choose Create data source.
  7. Choose Use custom SQL.
  8. Enter the query for ProfitBySupplierNation.
  9. Choose Edit/Preview data.

Choose Edit/Preview data.

Under Query mode, you have the option to view your query in either SPICE or direct query. SPICE is the QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. Using SPICE can save time and money because your analytical queries process faster, you don’t need to wait for a direct query to process, and you can reuse data stored in SPICE multiple times without incurring additional costs. You also can refresh data in SPICE on a recurring basis as needed or on demand. For more information about refresh options, see Refreshing Data.

With direct query, QuickSight doesn’t use SPICE data and sends the query every time to Athena to get the data.

  1. Select SPICE.
  2. Choose Apply.
  3. Choose Save & visualize.

Choose Save & visualize.

  1. On the Visualize page, under Fields list, choose nation and sum_profit.

QuickSight automatically chooses the best visualization type based on the selected fields. You can change the visual type based on your requirement. The following screenshot shows a pie chart for Sum_profit grouped by Nation.

The following screenshot shows a pie chart for Sum_profit grouped by Nation.

You can add more datasets using Athena federated queries and create dashboards. The following screenshot is an example of a visual analysis over various datasets that were added as part of this post.

The following screenshot is an example of a visual analysis over various datasets that were added as part of this post.

When your analysis is ready, you can choose Share to create a dashboard and share it within your organization.

Summary

QuickSight is a powerful visualization tool, and with Athena federated queries, you can run analysis and build dashboards on various data sources like DynamoDB, HBase on Amazon EMR, and many more. You can also easily join relational, non-relational, and custom object stores in Athena queries and use them with QuickSight to create visualizations and dashboards.

For more information about Athena Federated Query, see Using Amazon Athena Federated Query and Query any data source with Amazon Athena’s new federated query.


Appendix

To register a data source in Athena, complete the following steps:

  1. On the Athena console, choose Data sources.

On the Athena console, choose Data sources.

  1. Choose Connect data source.

Choose Connect data source.

  1. Select Query a data source.
  2. For Choose a data source, select a data source (for this post, I select Redis).
  3. Choose Next.

Choose Next.

  1. For Lambda function, choose your function.

For this post, I use the redis Lambda function, which I configured as part of configuring the Athena federated connector in the post Extracting and joining data from multiple data sources with Athena Federated Query.

  1. For Catalog name, enter a name (for example, redis).

The catalog name you specify here is the one that is displayed in QuickSight when selecting Lambda functions for access.

  1. Choose Connect.

Choose Connect.

When the data source is registered, it’s available in the Data source drop-down list on the Athena console.

When the data source is registered, it’s available in the Data source drop-down list on the Athena console.


About the Author

Saurabh Bhutyani is a Senior Big Data Specialist Solutions Architect at Amazon Web Services. He is an early adopter of open-source big data technologies. At AWS, he works with customers to provide architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena, AWS Glue, and AWS Lake Formation. In his free time, he likes to watch movies and spend time with his family.