AWS Partner Network (APN) Blog

How to Use ThoughtSpot to Create Live Queries Against Amazon Athena Tables

By Darien Hong, Solutions Architect – AWS
By BP Yau, Sr. Partner Solutions Architect – AWS
By Ali Alladin, Sr. Partner Solutions Architect – ThoughtSpot

ThoughtSpot-AWS-Partners-2022
ThoughtSpot
Connect with ThoughtSpot-1

Amazon Athena is a serverless, interactive analytics service that provides a simplified and flexible way to analyze petabytes of data where it lives.

You can use Amazon Athena to directly run SQL queries against that data, or leverage modern analytics providers such as ThoughtSpot to derive business insights and create live visualizations using powerful artificial intelligence (AI) features.

In this post, you will learn how to create a connection between Amazon Athena and ThoughtSpot for visual analytics.

ThoughtSpot is an AWS Specialization Partner and AWS Marketplace Seller with the Data and Analytics Competency Partner. It enables business users to easily ask questions using search and AI directly on billions of rows of data in the cloud.

Create an Access Key for Your IAM User

There are a number of ways to grant access to different types of Amazon Web Services (AWS) resources; one of those methods is to create AWS Identify and Access Management (IAM) users. Once an IAM user is created, you can create an access key under that IAM user. Access keys are long-term credentials for an IAM user that grants programmatic access to your resources.

Access keys consist of two parts: an access key ID and a secret access key. In order to connect third-party applications such as ThoughtSpot to Amazon Athena, you need to create an access key. To do this, log in to the AWS console and go to the navigation bar on the upper right, choose your username, and then select Security credentials.

In the Access keys section, choose Create access key. If you already have two access keys, this button is deactivated and you must delete an access key before you can create a new one. If you already have active access keys and have the key and secret key saved, you can use that one for the ThoughtSpot connection and skip this section.

Figure 1 - Creating an access key

Figure 1 – Create access key.

Next, select the use case Third-party service and confirm you understand the recommendation, and then click Next.

On the Retrieve access keys page, choose either Show to reveal the value of your user’s secret access key, or Download .csv file. This is your only opportunity to save your secret access key. After you’ve saved your secret access key in a secure location, choose Done.

Create Two Amazon S3 Buckets

Another requirement when creating the connection between Amazon Athena and ThoughtSpot is to have an Amazon Simple Storage Service (Amazon S3) bucket location where you want to store query results. You can use an existing bucket you have or create a new one. To create a new bucket, navigate to Amazon S3 on the AWS console and select Create bucket.

To create a bucket, you’ll need to specify a globally unique bucket name. Select a region for the bucket that’s the same as the region where you use Athena. Note this region down as you’ll need it for the connection.

Keep the rest of the default settings and select Create bucket.

Repeat the steps above to create another S3 bucket to store the data you want to analyze.

Create an Amazon Athena Table

For this walkthrough, we’ll be using the sample datasets from this Athena Workshop. To load the sample data into your S3 bucket, click the CloudShell icon in the top navigation bar.

In the CloudShell terminal, execute the following commands by replacing <your-bucket-name> with the bucket you created to store the data:

aws s3 cp s3://ws-assets-prod-iad-r-iad-ed304a55c2ca1aee/9981f1a1-abdc-49b5-8387-cb01d238bb78/v1/csv/customers.csv ./customers.csv
aws s3 cp s3://ws-assets-prod-iad-r-iad-ed304a55c2ca1aee/9981f1a1-abdc-49b5-8387-cb01d238bb78/v1/csv/sales.csv ./sales.csv
aws s3 cp customers.csv s3://<your-bucket-name>;/basics/csv/customers/customers.csv
aws s3 cp sales.csv s3://<your-bucket-name>;/basics/csv/sales/sales.csv
echo "----- done -----"

Navigate back to the S3 console to verify the dataset has been loaded.

Figure 2 - Dataset in S3

Figure 2 – Verifying Amazon S3 dataset.

You should see a folder called basics/, with a subfolder called csv/, where there will be two folders, one labeled customers/ and one labeled sales/. The CSV files in these folders will be used to create Athena tables and connect to ThoughtSpot.

You can create Athena tables using AWS Glue crawlers, using Hive DDL, or by using the Athena Create table form. In this walkthrough, the tables will be created using an AWS Glue crawler.

Navigate to the Athena Query editor. Select Create from Tables and View, and then choose AWS Glue crawler.

Start by giving your crawler a name. Unlike S3 buckets, the name of your crawler doesn’t need to be globally unique. In this example, the crawler is called s3-crawler. You can optionally add a description to define what the crawler is used for, and then select Next.

When you define a crawler, you need to add a data source. AWS Glue can crawl many different sources, such as Amazon S3, Amazon DynamoDB, Apache Iceberg, Amazon Redshift, and more. In this example, the data is stored in S3.

Select Add a data source, choose S3 as the data source, and define the S3 path where the data is located. This can be an entire S3 bucket or a sub-folder. You can define how the crawler runs subsequently after the initial run. It can crawl the entire data source again, only the new subfolders that were added after the initial crawl, or based on S3 events.

For now, select Crawl all sub-folders and then Add an S3 data source.

Figure 3 - Define Glue Crawler data source

Figure 3 – Create AWS Glue crawler.

Classifiers evaluate the format of your data to infer a schema. There are many built-in classifiers for various formats, including JSON, CSV, web logs, and many database systems. You can skip adding a custom classifier for now, and select Next to proceed.

In order for the AWS Glue crawler to access the appropriate resources, you need to define an IAM role. You can either use an existing role or Create a new IAM role. A new IAM role created by the Glue console will have the prefix AWSGlueServiceRole-. You can add anything after the prefix. Select Create and then Next.

For the output, you can select an existing database or create a new one by selecting Add database. This takes you to the page to create a database. All you need to define is a database name. In this case, a database called default was created and selected.

You can define a time-based schedule for your crawler. It can be run hourly, daily, weekly, monthly, on a customer schedule, or on demand. Select On demand and then Next.

Figure 4 - Scheduling crawler

Figure 4 – Set crawler schedule.

Review all of the configurations and select Create crawler, and select Run crawler.

Depending on the amount of data that’s being crawled, the run can take anywhere from a few seconds to a few minutes. Once the status is Completed, navigate back to the Athena console to verify the tables have been created properly.

Go to the Query editor and select the database you created; in this case default. Run a simple query against the tables created to verify there is data in the tables.

Figure 5 - Athena query on data

Figure 5 – Verify dataset using Amazon Athena query editor.

Create ThoughtSpot Connection

Login to your ThoughtSpot instance. From the Data tab, select Add connection and enter a connection name, then select Athena and click Continue.

For the connection details, enter the information from the steps above:

  • Region: Enter the region where your Athena and S3 bucket lies.
  • Access key: This is the access key you saved from section 1.
  • Secret access key: This is the secret access key that you saved from section 1.
  • S3 output location: This is the S3 bucket you created in section 2 to store query result, or an existing bucket you already have. The path should be formatted as s3://<bucket-name>

The other fields are optional. You can specify a specific Data Source you want to connect to in Athena, and if you’re using a private link connection you can specify a VPC Endpoint.

Select Continue and the databases and tables will be listed in the left panel.

Figure 6 - Configure connection to Athena

Figure 6 – Create ThoughtSpot-to-Athena connection.

Select the appropriate tables and columns you’d like ThoughtSpot to use. Note the Selected tab updates to include the currently selected items, and sample data is included so you can confirm your choices.

Figure 7 - Select the appropriate tables

Figure 7 – Select appropriate tables for analysis.

Once you have selected all the data you want to share with ThoughtSpot, select Create connection and then Create. When the connection is complete, the data is immediately available for searching.

Figure 8 - Successful connection to the data

Figure 8 – Using ThoughtSpot for analytics.

To ensure the connection was successful, you can navigate to the Search data tab and select the tables you just added from Athena. You can select specific columns and create a visualization based on the data. If ThoughtSpot is able search and access data from the tables, the connection was successful.

Summary

You have successfully connected ThoughtSpot to your Amazon Athena data source and can now run live queries against that data. With the power of ThoughtSpot, you’re able to create live visualizations and search through your data with natural language.

Check out AWS Marketplace and sign up for a free trial with ThoughtSpot.

.
ThoughtSpot-APN-Blog-Connect-2022
.


ThoughtSpot – AWS Partner Spotlight

ThoughtSpot is an AWS Specialization Partner that enables business users to easily ask questions using search and AI directly on billions of rows of data in the cloud.

Contact ThoughtSpot | Partner Overview | AWS Marketplace