AWS Partner Network (APN) Blog

Using ‘athena-express’ to Simplify SQL Queries on Amazon Athena

By Gary Arora, APN Ambassador at Deloitte

In this post, I will show you how to integrate an application with Amazon Athena to execute SQL queries with athena-express.

athena-express is a wrapper around the AWS Software Development Kit (SDK) that can simplify executing SQL queries in Amazon Athena and fetch the JSON results in the same synchronous call—a capability well suited for many web applications.

Here’s an example of using athena-express to query data from Athena using SQL:

Athena-Express-1.1

Figure 1 – SQL query execution using ‘athena-express’ with expected output from Amazon Athena.

Amazon Athena Background

Amazon Athena makes it easier to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Under the covers, it uses Presto, which is an open source SQL engine developed by Facebook in 2012 to query their 300 Petabyte data-warehouse. It’s incredibly powerful!

Athena combines the strength of Presto with the serverless and self-managed capabilities of AWS. By simply pointing Athena to your data in S3, you can start querying using standard SQL. Most results are delivered within seconds, and there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

How ‘athena-express’ Simplifies Using Amazon Athena

athena-express simplifies integrating Amazon Athena with any Node.JS application—running as a standalone application or as an AWS Lambda function. As a wrapper on the AWS SDK, athena-express bundles the following steps listed on the official AWS documentation:

  1. Initiates a query execution
  2. Keeps checking until the query has finished executing
  3. Fetches the results of the query execution from Amazon S3

And as added features:

  1. Formats the results into a clean, user-friendly JSON array
  2. Handles errors by recursively retrying for ThrottlingException, NetworkingError, and TooManyRequestsException
  3. Provides optional helpful stats including cost per query in USD

Integrating with Athena without athena-express requires you to identify the appropriate API methods in the AWS SDK, stitch them together sequentially, and then build out an error handling and retry mechanism for each of those methods.

athena-express can help you save time and effort in setting up this integration so that you can focus on core application development. It’s lightweight (~4KB uncompressed) and has zero dependencies.

How is ‘athena-express’ Being Used?

The most common use case is integrating a web frontend with Amazon Athena using athena-express as a backend. This backend could be a NodeJS application or a Lambda function.

Here’s a real-world example of athena-express running inside a Lambda function to integrate a web interface with Athena:

Athena-Express-2

Figure 2 – Real-world example of `athena-express` running inside a Lambda function to integrate with Amazon.

This architecture has a web frontend that invokes an API endpoint hosted on Amazon API Gateway by passing a query request. The query request can be as simple as SELECT * FROM movies LIMIT 3.

This API Gateway triggers a Lambda function that contains the athena-express library.

Prerequisites

You will need an AWS Identity and Access Management (IAM) Role if using Lambda or Amazon Elastic Compute Cloud (Amazon EC2), or an IAM User with accessKeyId and secretAccessKey if using a standalone NodeJS application. This IAM role/user must have AmazonAthenaFullAccess and AmazonS3FullAccess policies attached.

Note: As an alternative to granting AmazonS3FullAccess, you could granularize and limit write access to a specific bucket that you must specify during athena-express initialization.

Configuration Options

athena-express needs an AWS SDK object created with relevant permissions, as mentioned in the prerequisites. This AWS object is passed within the constructor so it can invoke the Amazon Athena SDK.

It’s up to you how you create this AWS object. Here are few options:

  • Create an AWS object by explicitly passing in the accessKeyId and secretAccessKey generated in prerequisites.
 const aws = require("aws-sdk");
 const awsCredentials = {
     region: "us-east-1",
     accessKeyId: "AKIAIHV5B6DGMEXVCXGA",
     secretAccessKey: "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"
 };
 aws.config.update(awsCredentials);

 const athenaExpressConfig = { aws }; //configuring athena-express with aws sdk object
 const athenaExpress = new AthenaExpress(athenaExpressConfig);
  • Or, if using Lambda, provide an IAM execution role with AmazonAthenaFullAccess and AmazonS3FullAccess policies attached.
 const aws = require("aws-sdk");
 const athenaExpressConfig = { aws }; //configuring athena-express with aws sdk object
 const athenaExpress = new AthenaExpress(athenaExpressConfig);

Option 1: Simple Configuration

Simple configuration requires only the AWS SDK object to be passed as a parameter to initialize athena-express. Default values are assumed for all parameter options and athena-express creates a new S3 bucket in your AWS account for Athena to store the query results in.

const AthenaExpress = require("athena-express"),
	aws = require("aws-sdk"),
	awsCredentials = { 
		region: "STRING_VALUE",
		accessKeyId: "STRING_VALUE",
		secretAccessKey: "STRING_VALUE"
	};

aws.config.update(awsCredentials);

//configuring athena-express with aws sdk object
const athenaExpressConfig = {
	aws, /* required */
};

//Initializing athena-express
const athenaExpress = new AthenaExpress(athenaExpressConfig);

Option 2: Advanced Configuration

Advanced configuration specifies all parameters. You can pick and choose any of the optional parameters below.

const AthenaExpress = require("athena-express"),
	aws = require("aws-sdk"),
	awsCredentials = { 
		region: "STRING_VALUE",
		accessKeyId: "STRING_VALUE",
		secretAccessKey: "STRING_VALUE"
	};

aws.config.update(awsCredentials);

//AthenaExpress config object
const athenaExpressConfig = {
	aws, /* required */
	s3: "STRING_VALUE", /* optional */
    db: "STRING_VALUE", /* optional */
	formatJson: BOOLEAN, /* optional default=true */
	retry: Integer, /* optional default=200 */
    getStats: BOOLEAN /* optional default=false */
};

//Initializing AthenaExpress
const athenaExpress = new AthenaExpress(athenaExpressConfig);

Advanced config parameters are:

  • s3 – (String) S3 bucket name/prefix to store Athena query results
  • db – (String) Athena database the SQL queries will be executed in
//So you can execute Athena queries simply by passing the SQL statement
athenaExpress.query("SELECT * FROM movies LIMIT 3");

//Instead of specifying the DB name in every query
athenaExpress.query({
	sql: "SELECT * FROM movies LIMIT 3",
	db: "moviedb"
});
  • formatJson – (Boolean default true) Override as false if you’d rather get the raw unformatted JSON from Athena
  • retry – (Integer default 200 milliseconds) Interval between re-checking if the specific Athena query has finished executing
  • getStats – (Boolean default false) Get stats for your query. These stats include data scanned in megabytes, Athena execution time in milliseconds, item count, and query cost in USD based on the Athena Pricing Documentation.

Here’s the output when getStats=true:

{  
	DataScannedInMB: 6,
	QueryCostInUSD: 0.00004768,
	EngineExecutionTimeInMillis: 2234,
	Count: 5,
   	Items: [  
      {  
         ...
      },
   ]
}

Usage: Invoking ‘athena-express’

Using Promises:

let query = {
	sql: "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3" /* required */,
	db: "sampledb" /* optional. You could specify a database here or in the advance configuration option mentioned above*/
};

athenaExpress
	.query(query)
	.then(results => {
		console.log(results);
	})
	.catch(error => {
		console.log(error);
	});

Using Async/Await:

(async () => {
	let query = {
		sql: "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3" /* required */,
		db: "sampledb" /* optional. You could specify a database here or in the configuration constructor*/
	};

	try {
		let results = await athenaExpress.query(query);
		console.log(results);
	} catch (error) {
		console.log(error);
	}
})();

Full Example

Using standard NodeJS application:

"use strict";

const AthenaExpress = require("athena-express"),
	aws = require("aws-sdk"),
	awsCredentials = {
		region: "us-east-1",
		accessKeyId: "AKIAIHV5B6DGMEXVCXGA",
		secretAccessKey: "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"
	};

aws.config.update(awsCredentials);

const athenaExpressConfig = {
	aws,
	s3: "s3://my-bucket-for-storing-athena-results-us-east-1",
    getStats: true
};

const athenaExpress = new AthenaExpress(athenaExpressConfig);

//Invoking a query on Amazon Athena
(async () => {
	let query = {
		sql: "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3",
		db: "sampledb",
		getStats: true 
	};

	try {
		let results = await athenaExpress.query(query);
		console.log(results);
	} catch (error) {
		console.log(error);
	}
})();

Using AWS Lambda:

"use strict";

const AthenaExpress = require("athena-express"),
	aws = require("aws-sdk");

	/* AWS Credentials are not required here 
    /* because the IAM Role assumed by this Lambda 
    /* has the necessary permission to execute Athena queries 
    /* and store the result in Amazon S3 bucket */

const athenaExpressConfig = {
	aws,
	db: "sampledb",
	getStats: true
};
const athenaExpress = new AthenaExpress(athenaExpressConfig);

exports.handler = async (event, context, callback) => {
	const sqlQuery = "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3";

	try {
		let results = await athenaExpress.query(sqlQuery);
		callback(null, results);
	} catch (error) {
		callback(error, null);
	}
};

Results

Here are the Console.log of results returned by athena-express. This JSON response includes the Items array containing the formatted data rows returned from Amazon Athena. Alongside Items, in the same hierarchy, are other attributes providing the metadata for this specific query.

Athena-Express-3

Figure 3 – Console.log of results returned by ‘athena-express.’

Summary

Amazon Athena is a powerful serverless product that enables anyone with SQL knowledge to quickly analyze large-scale datasets on Amazon S3. athena-express is a light-weight open source library that helps accelerate the integration of Athena with any NodeJS backend (including AWS Lambda) by wrapping around the AWS SDK.

In addition to executing SQL queries, athena-express also formats the query results into a clean, friendly JSON array, handles common errors by recursively retrying, and provides helpful stats including cost per query in USD.

Developers can use athena-express to help save time and effort in setting up the integration and focus on core application development.

Consider These Next Steps

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.

.


Deloitte Logo-1
Connect with Deloitte-1

Deloitte – APN Partner Spotlight

Deloitte is an APN Premier Partner and MSP. They assist customers in turning complex business issues into opportunities for growth. Deloitte’s market-leading teams help organizations transform in the digital era.

Contact Deloitte | Practice Overview | Customer Success

*Already worked with Deloitte? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.