AWS Partner Network (APN) Blog
Using ‘athena-express’ to Simplify SQL Queries on Amazon Athena
By Gary Arora, APN Ambassador at Deloitte
In this post, I will show you how to integrate an application with Amazon Athena to execute SQL queries with athena-express
.
athena-express
is a wrapper around the AWS Software Development Kit (SDK) that can simplify executing SQL queries in Amazon Athena and fetch the JSON results in the same synchronous call—a capability well suited for many web applications.
Here’s an example of using athena-express
to query data from Athena using SQL:
Figure 1 – SQL query execution using ‘athena-express’ with expected output from Amazon Athena.
Amazon Athena Background
Amazon Athena makes it easier to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Under the covers, it uses Presto, which is an open source SQL engine developed by Facebook in 2012 to query their 300 Petabyte data-warehouse. It’s incredibly powerful!
Athena combines the strength of Presto with the serverless and self-managed capabilities of AWS. By simply pointing Athena to your data in S3, you can start querying using standard SQL. Most results are delivered within seconds, and there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
How ‘athena-express’ Simplifies Using Amazon Athena
athena-express
simplifies integrating Amazon Athena with any Node.JS application—running as a standalone application or as an AWS Lambda function. As a wrapper on the AWS SDK, athena-express
bundles the following steps listed on the official AWS documentation:
- Initiates a query execution
- Keeps checking until the query has finished executing
- Fetches the results of the query execution from Amazon S3
And as added features:
- Formats the results into a clean, user-friendly JSON array
- Handles errors by recursively retrying for
ThrottlingException
,NetworkingError
, andTooManyRequestsException
- Provides optional helpful stats including cost per query in USD
Integrating with Athena without athena-express
requires you to identify the appropriate API methods in the AWS SDK, stitch them together sequentially, and then build out an error handling and retry mechanism for each of those methods.
athena-express
can help you save time and effort in setting up this integration so that you can focus on core application development. It’s lightweight (~4KB uncompressed) and has zero dependencies.
How is ‘athena-express’ Being Used?
The most common use case is integrating a web frontend with Amazon Athena using athena-express
as a backend. This backend could be a NodeJS application or a Lambda function.
Here’s a real-world example of athena-express
running inside a Lambda function to integrate a web interface with Athena:
Figure 2 – Real-world example of `athena-express` running inside a Lambda function to integrate with Amazon.
This architecture has a web frontend that invokes an API endpoint hosted on Amazon API Gateway by passing a query request. The query request can be as simple as SELECT * FROM movies LIMIT 3
.
This API Gateway triggers a Lambda function that contains the athena-express
library.
Prerequisites
You will need an AWS Identity and Access Management (IAM) Role if using Lambda or Amazon Elastic Compute Cloud (Amazon EC2), or an IAM User with accessKeyId
and secretAccessKey
if using a standalone NodeJS application. This IAM role/user must have AmazonAthenaFullAccess
and AmazonS3FullAccess
policies attached.
Note: As an alternative to granting AmazonS3FullAccess
, you could granularize and limit write access to a specific bucket that you must specify during athena-express
initialization.
Configuration Options
athena-express
needs an AWS SDK object created with relevant permissions, as mentioned in the prerequisites. This AWS object is passed within the constructor so it can invoke the Amazon Athena SDK.
It’s up to you how you create this AWS object. Here are few options:
- Create an AWS object by explicitly passing in the
accessKeyId
andsecretAccessKey
generated in prerequisites.
- Or, if using Lambda, provide an IAM execution role with
AmazonAthenaFullAccess
andAmazonS3FullAccess
policies attached.
- Use instance profiles when using Amazon EC2 instances.
- Use environment variables.
Option 1: Simple Configuration
Simple configuration requires only the AWS SDK object to be passed as a parameter to initialize athena-express
. Default values are assumed for all parameter options and athena-express
creates a new S3 bucket in your AWS account for Athena to store the query results in.
Option 2: Advanced Configuration
Advanced configuration specifies all parameters. You can pick and choose any of the optional parameters below.
Advanced config parameters are:
- s3 – (String) S3 bucket name/prefix to store Athena query results
- db – (String) Athena database the SQL queries will be executed in
- formatJson – (Boolean default true) Override as false if you’d rather get the raw unformatted JSON from Athena
- retry – (Integer default 200 milliseconds) Interval between re-checking if the specific Athena query has finished executing
- getStats – (Boolean default false) Get stats for your query. These stats include data scanned in megabytes, Athena execution time in milliseconds, item count, and query cost in USD based on the Athena Pricing Documentation.
Here’s the output when getStats=true:
Usage: Invoking ‘athena-express’
Using Promises:
Using Async/Await:
Full Example
Using standard NodeJS application:
Using AWS Lambda:
Results
Here are the Console.log of results returned by athena-express
. This JSON response includes the Items
array containing the formatted data rows returned from Amazon Athena. Alongside Items
, in the same hierarchy, are other attributes providing the metadata for this specific query.
Figure 3 – Console.log of results returned by ‘athena-express.’
Summary
Amazon Athena is a powerful serverless product that enables anyone with SQL knowledge to quickly analyze large-scale datasets on Amazon S3. athena-express
is a light-weight open source library that helps accelerate the integration of Athena with any NodeJS backend (including AWS Lambda) by wrapping around the AWS SDK.
In addition to executing SQL queries, athena-express
also formats the query results into a clean, friendly JSON array, handles common errors by recursively retrying, and provides helpful stats including cost per query in USD.
Developers can use athena-express
to help save time and effort in setting up the integration and focus on core application development.
Consider These Next Steps
- Get started with Amazon Athena by following the official AWS tutorial.
- Integrate
athena-express
with your application by installing it from NPM package manager. - Contribute to
athena-express
on its GitHub repository. - Create similar libraries in other languages by exploring the AWS SDK for Amazon Athena.
- Reach out to
athena-express
author, Gary Arora, for support related queries.
The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.
.
Deloitte – APN Partner Spotlight
Deloitte is an APN Premier Partner and MSP. They assist customers in turning complex business issues into opportunities for growth. Deloitte’s market-leading teams help organizations transform in the digital era.
Contact Deloitte | Practice Overview | Customer Success
*Already worked with Deloitte? Rate this Partner
*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.