AWS Big Data Blog

Separating queries and managing costs using Amazon Athena workgroups

Amazon Athena is a serverless query engine for data on Amazon S3. Many customers use Athena to query application and service logs, schedule automated reports, and integrate with their applications, enabling new analytics-based capabilities.

Different types of users rely on Athena, including business analysts, data scientists, and security and operations engineers. But how do you separate and manage these workloads so that users get the best experience while minimizing costs?

In this post, I show you how to use workgroups to do the following:

  • Separate workloads.
  • Control user access.
  • Manage query usage and costs.

Separate workloads

By default, all Athena queries execute in the primary workgroup.  As an administrator, you can create new workgroups to separate different types of workloads.  Administrators commonly turn to workgroups to separate analysts running ad hoc queries from automated reports.  Here’s how to build out that separation.

First, create two workgroups: one for ad hoc users (ad-hoc-users) and another for automated reports (reporting).

Next, select a specific output location. All queries executed inside this workgroup save their results to this output location. Routing results to a single secure location helps make sure users only access data they are permitted to see. You can also enforce encryption of query results in S3 by selecting the appropriate encryption configuration.

Workgroups also help you simplify the onboarding of new users to Athena. By selecting override client-side settings, you enforce a predefined configuration on all queries within a workgroup. Users no longer have to configure a query results output location or S3 encryption keys. These settings default to the parameters defined for the workgroup where those queries execute. Additionally, each workgroup maintains a unique query history and saved query inventory, making queries easier for you to track down.

Finally, when creating a workgroup, you can add up to 50 key-value pair tags to help identify your workgroup resources. Tags are also useful when attempting to allocate Athena costs between groups of users. Create Name and Dept tags for the ad-hoc-users and reporting workgroups with their name and department association.
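The settings described above (output location, result encryption, override of client-side settings, and tags) can also be expressed as a request to the Athena CreateWorkGroup API action. The following is a minimal sketch rather than the exact setup used in this post: the helper name buildCreateWorkGroupParams, the reporting bucket path, the Dept values, and the choice of SSE_S3 encryption are assumptions made for illustration.

```javascript
// Hypothetical helper: builds CreateWorkGroup request parameters for one workgroup.
function buildCreateWorkGroupParams(name, dept, outputLocation) {
  return {
    Name: name,
    Configuration: {
      ResultConfiguration: {
        OutputLocation: outputLocation,
        // Assumption: encrypt query results with S3-managed keys.
        EncryptionConfiguration: { EncryptionOption: 'SSE_S3' }
      },
      // Corresponds to "override client-side settings" in the console.
      EnforceWorkGroupConfiguration: true,
      PublishCloudWatchMetricsEnabled: true
    },
    Tags: [
      { Key: 'Name', Value: name },
      { Key: 'Dept', Value: dept }
    ]
  };
}

// Bucket paths and Dept values below are placeholders.
const workgroups = [
  buildCreateWorkGroupParams('ad-hoc-users', 'analytics', 's3://demo/workgroups/adhocusers/'),
  buildCreateWorkGroupParams('reporting', 'finance', 's3://demo/workgroups/reporting/')
];

// With the AWS SDK for JavaScript v2, each object could be sent as:
//   await new AWS.Athena({region: 'us-east-1'}).createWorkGroup(params).promise()
console.log(JSON.stringify(workgroups, null, 2));
```

Building the parameters separately from the API call keeps the two workgroup definitions easy to compare and review.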

Control user access to workgroups

Now that you have two workgroups defined, ad-hoc-users and reporting, you must control who can use and update them. Remember that workgroups are IAM resources and therefore have an ARN. You can use this ARN in the IAM policy that you associate with your users. In this example, create a single IAM user representing the team of ad hoc users and add that user to an IAM group. The group contains a policy that defines which actions these users can perform.

Start by reviewing IAM Policies for Accessing Workgroups and Workgroup Example Policies to familiarize yourself with policy options. Use the following IAM policy to set up permissions for your analyst user. Grant this user only the permissions required for working in the ad-hoc-users workgroup. Make sure that you tweak this policy to match your exact needs:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:GetQueryExecutions",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup",
                "athena:ListTagsForResource"
            ],
            "Resource": "arn:aws:athena:us-east-1:112233445566:workgroup/ad-hoc-users"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": "arn:aws:s3:::demo/workgroups/adhocusers/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:Get*"
            ],
            "Resource": [
                "arn:aws:glue:us-east-1:112233445566:catalog",
                "arn:aws:glue:us-east-1:112233445566:database/amazon",
                "arn:aws:glue:us-east-1:112233445566:table/amazon/*"
            ]
        }
    ]
}

Now your analyst user can execute queries only in the ad-hoc-users workgroup. The analyst user can switch to other workgroups, but any action they attempt there is denied. They are further restricted to listing and querying only those tables that belong to the amazon database. For more information about controlling access to AWS Glue resources such as databases and tables, see AWS Glue Resource Policies for Access Control.

The following screenshot shows what the analyst user sees in the Athena console:

I’ve created a simple Node.js tool that executes SQL queries stored as files in a given directory. You can find my Athena test runner code in the athena_test_runner GitHub repo. You can use this code to simulate a reporting tool, after configuring it to use a workgroup. To do that, create an IAM role with permissions like those previously defined for the analyst user. This time, restrict access to the reporting workgroup.
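One way to derive the reporting role's Athena permissions is to reuse the shape of the analyst policy statement and swap in the reporting workgroup ARN. The sketch below assumes the same Region and account ID as the analyst policy; the helper names are hypothetical, and the action list is shortened to a set that a report runner would plausibly need, so adjust it to your own requirements.

```javascript
// Hypothetical helper: formats the ARN of an Athena workgroup.
function workgroupArn(region, accountId, name) {
  return `arn:aws:athena:${region}:${accountId}:workgroup/${name}`;
}

// Hypothetical helper: builds one IAM policy statement scoped to a single workgroup.
function buildWorkgroupStatement(region, accountId, name) {
  return {
    Effect: 'Allow',
    Action: [
      'athena:StartQueryExecution',
      'athena:GetQueryExecution',
      'athena:GetQueryResults',
      'athena:StopQueryExecution',
      'athena:GetWorkGroup'
    ],
    Resource: workgroupArn(region, accountId, name)
  };
}

// Region and account ID are taken from the analyst policy shown earlier.
const statement = buildWorkgroupStatement('us-east-1', '112233445566', 'reporting');
console.log(JSON.stringify(statement, null, 2));
```

The role would still need S3 and AWS Glue statements comparable to those in the analyst policy.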

The following JavaScript code example shows how to select a workgroup programmatically when executing queries:

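// AWS SDK for JavaScript v2; the Region matches the ARNs used in the IAM policy.
const AWS = require('aws-sdk')
const athena = new AWS.Athena({region: 'us-east-1'})

function executeQueries(files) {
    // The files argument is not used in this simplified excerpt.
    const params = {
      "QueryString": "",
      "ResultConfiguration": {
        // Left empty here; the workgroup's output location applies
        // when the workgroup overrides client-side settings.
        "OutputLocation": ""
      },
      "QueryExecutionContext": {
        "Database": "default"
      },
      // Run the query in the reporting workgroup.
      "WorkGroup": "reporting"
    }

    params.QueryString = "SELECT * FROM amazon.final_parquet LIMIT 10"
    return new Promise((resolve, reject) => {
        athena.startQueryExecution(params, (err, results) => {
            if (err) {
                reject(err.message)
            } else {
                resolve(results)
            }
        })
    })
}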

Run sample automated reports under the reporting workgroup, with the following command:

node index.js testsuite/

Query histories remain isolated between workgroups. A user logging into the Athena console as an analyst using the ad-hoc-users workgroup doesn’t see any automated reports that you ran under the reporting workgroup.
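The same isolation shows up in the API: the ListQueryExecutions action accepts a WorkGroup parameter and returns only that workgroup's query IDs. The following sketch assumes a client shaped like the AWS SDK for JavaScript v2 Athena client; the helper name and the stand-in client with made-up IDs are hypothetical and exist only so the example runs without AWS credentials.

```javascript
// Hypothetical helper: collects the query execution IDs recorded for one workgroup.
// `athena` is any object exposing listQueryExecutions(params).promise(),
// such as an AWS.Athena client from the AWS SDK for JavaScript v2.
async function listWorkgroupQueryIds(athena, workGroup) {
  const ids = [];
  let nextToken;
  do {
    const params = { WorkGroup: workGroup };
    if (nextToken) params.NextToken = nextToken;
    const page = await athena.listQueryExecutions(params).promise();
    ids.push(...(page.QueryExecutionIds || []));
    nextToken = page.NextToken; // undefined once the last page is reached
  } while (nextToken);
  return ids;
}

// Stand-in client: only the reporting workgroup has (made-up) query IDs.
const fakeAthena = {
  listQueryExecutions: (params) => ({
    promise: async () =>
      params.WorkGroup === 'reporting'
        ? { QueryExecutionIds: ['report-query-1', 'report-query-2'] }
        : { QueryExecutionIds: [] }
  })
};

listWorkgroupQueryIds(fakeAthena, 'ad-hoc-users').then((ids) => console.log(ids));
```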

Manage query usage and costs

You have two workgroups configured: one for ad hoc users and another for automated reports. Now, you must safeguard against bad queries. In this use case, two potential situations for query usage should be monitored and controlled:

  • Make sure that users don’t run queries that scan more data than allowed by their budget.
  • Safeguard against automated script bugs that could cause indefinite query retries.

First, configure data usage controls for your ad-hoc-users workgroup. There are two types of data usage controls: per-query and per-workgroup.

Set the per-query control for analysts to be 1 GB. This control cancels any query run in the ad-hoc-users workgroup that tries to scan more than 1 GB.
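Outside the console, the per-query limit corresponds to the BytesScannedCutoffPerQuery setting of the UpdateWorkGroup API action. The sketch below only builds the request parameters. The helper name is hypothetical, and treating 1 GB as 1024^3 bytes is an assumption; the console may convert units differently.

```javascript
// Assumption: 1 GB is interpreted as 1024^3 bytes.
const BYTES_PER_GB = 1024 ** 3;

// Hypothetical helper: builds UpdateWorkGroup parameters for a per-query scan limit.
function buildPerQueryLimitParams(workGroup, limitGb) {
  return {
    WorkGroup: workGroup,
    ConfigurationUpdates: {
      BytesScannedCutoffPerQuery: Math.round(limitGb * BYTES_PER_GB)
    }
  };
}

const limitParams = buildPerQueryLimitParams('ad-hoc-users', 1);

// With the AWS SDK for JavaScript v2, this could be sent as:
//   await new AWS.Athena({region: 'us-east-1'}).updateWorkGroup(limitParams).promise()
console.log(JSON.stringify(limitParams, null, 2));
```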

To observe this limit in action, choose Update, return to the query editor, and run a query that would scan more than 1 GB. This query triggers the error message, “Query cancelled! : Bytes scanned limit was exceeded”. Remember that you incur charges for data the query scanned up to the point of cancellation. In this case, you incur charges for 1 GB of data.
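To put the charge for a cancelled query in perspective, here is a rough estimate of what a given amount of scanned data costs. This is a sketch under stated assumptions: the price of 5 USD per TB is a placeholder that you should replace with the current rate for your Region, 1 TB is treated as 1024^4 bytes, and the function name is hypothetical. It ignores any rounding or minimums that billing may apply.

```javascript
const BYTES_PER_TB = 1024 ** 4; // assumption: binary terabytes

// Hypothetical helper: estimates the cost of scanning `bytesScanned` bytes.
function estimateScanCostUsd(bytesScanned, pricePerTbUsd) {
  return (bytesScanned / BYTES_PER_TB) * pricePerTbUsd;
}

// A query cancelled at the 1 GB limit, priced at the assumed 5 USD per TB.
const oneGb = 1024 ** 3;
const cancelledQueryCost = estimateScanCostUsd(oneGb, 5);
console.log(`Estimated cost: ${cancelledQueryCost.toFixed(4)} USD`);
```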

Now, switch to your reporting workgroup. For this workload, you’re not worried about individual queries scanning too much data. However, you want to control the aggregate amount of data scanned by all queries in this workgroup.

Create a per-workgroup data usage control for the reporting workgroup. You can configure the maximum amount of data scanned by all queries in the workgroup during a specific period.

For the automated reporting workload, you probably have a good idea of how long the process should take and the total amount of data that queries scan during this time. You only have a few reports to run, so you can expect them to run in a few minutes, only scanning a few megabytes of data. Begin by setting up a low watermark alarm to notify you when your queries have scanned more data than you would expect in five minutes. The following example is for demo purposes only. In most cases, this period would be longer. I configured the alarm to send a notification to an Amazon SNS topic that I created.

To validate the alarm, I made a minor change to my test queries, causing them to scan more data. This change triggered the SNS alarm, shown in the following Amazon CloudWatch dashboard:

Next, create a high watermark alarm that is triggered when the queries in your reporting workgroup scan more than 1 GB of data over 15 minutes. In this case, the alarm triggers an AWS Lambda function that disables the workgroup, making sure that no additional queries execute in it. This alarm protects you from incurring costs caused by faulty automation code or runaway queries.
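The Athena console creates these alarms for you. As a rough illustration of what such an alarm looks like, the sketch below builds parameters for the CloudWatch PutMetricAlarm API action. Several details are assumptions rather than a description of what the console generates: the ProcessedBytes metric name, the use of WorkGroup as the only dimension, the alarm name, the topic ARN, and the helper name. Check which metrics and dimension combinations Athena publishes in your account before relying on it.

```javascript
// Hypothetical helper: builds PutMetricAlarm parameters for a workgroup-wide scan limit.
function buildUsageAlarmParams(workGroup, thresholdBytes, periodSeconds, topicArn) {
  return {
    AlarmName: `${workGroup}-bytes-scanned-${periodSeconds}s`, // placeholder name
    Namespace: 'AWS/Athena',
    MetricName: 'ProcessedBytes', // assumption: metric that tracks bytes scanned
    Dimensions: [{ Name: 'WorkGroup', Value: workGroup }],
    Statistic: 'Sum',
    Period: periodSeconds,
    EvaluationPeriods: 1,
    Threshold: thresholdBytes,
    ComparisonOperator: 'GreaterThanThreshold',
    AlarmActions: [topicArn]
  };
}

// High watermark: more than 1 GB (taken here as 1024^3 bytes) within 15 minutes.
const highWatermark = buildUsageAlarmParams(
  'reporting',
  1024 ** 3,
  15 * 60,
  'arn:aws:sns:us-east-1:112233445566:athena-high-watermark' // placeholder topic ARN
);
console.log(JSON.stringify(highWatermark, null, 2));
```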

Before creating the data usage control, create a Node.js Lambda function to disable the workgroup. Paste in the following code:

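const AWS = require('aws-sdk')

exports.handler = async (event) => {
    const athena = new AWS.Athena({region: 'us-east-1'})

    // The CloudWatch alarm notification arrives as a JSON string in the SNS message.
    const msg = JSON.parse(event.Records[0].Sns.Message)
    // Pull the workgroup name from the alarm's WorkGroup dimension.
    const wgname = msg.Trigger.Dimensions.filter((i) => i.name === 'WorkGroup')[0].value

    // Await the request; without .promise() or a callback, the SDK never sends it.
    await athena.updateWorkGroup({WorkGroup: wgname, State: 'DISABLED'}).promise()

    const response = {
        statusCode: 200,
        body: JSON.stringify(`Workgroup ${wgname} has been disabled`),
    };
    return response;
}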

This code grabs the workgroup name from the SNS message body and calls the UpdateWorkGroup API action with the name and the state of DISABLED. The Athena API requires the most recent version of the AWS SDK. When you create the Lambda bundle, include the latest AWS SDK version in that bundle.
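To test the name extraction without deploying anything, you can isolate it in a small function and feed it a hand-built event. The sample event below is a trimmed, made-up approximation of an SNS-wrapped CloudWatch alarm notification, and the function name is hypothetical.

```javascript
// Hypothetical helper: extracts the workgroup name from an SNS-wrapped alarm notification.
function workgroupFromAlarmEvent(event) {
  const message = JSON.parse(event.Records[0].Sns.Message);
  const dimension = message.Trigger.Dimensions.find((d) => d.name === 'WorkGroup');
  if (!dimension) {
    throw new Error('Alarm notification has no WorkGroup dimension');
  }
  return dimension.value;
}

// Made-up sample event containing only the fields the function reads.
const sampleEvent = {
  Records: [{
    Sns: {
      Message: JSON.stringify({
        AlarmName: 'reporting-high-watermark',
        Trigger: {
          Namespace: 'AWS/Athena',
          Dimensions: [{ name: 'WorkGroup', value: 'reporting' }]
        }
      })
    }
  }]
};

console.log(workgroupFromAlarmEvent(sampleEvent));
```

Failing loudly when the dimension is missing avoids calling UpdateWorkGroup with an undefined name.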

Next, create a new SNS topic and a subscription. For Protocol, select AWS Lambda. Then, select the Lambda function that you created in the previous step.
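If you script this step instead of using the console, two requests are generally involved: an SNS Subscribe call with the lambda protocol, and a Lambda AddPermission call that lets SNS invoke the function. The sketch below only builds the parameters for both; the ARNs, the statement ID, and the helper name are placeholders chosen for illustration.

```javascript
// Hypothetical helper: builds the parameters needed to connect an SNS topic to a Lambda function.
function buildSnsToLambdaWiring(topicArn, functionArn) {
  return {
    // Parameters for the SNS Subscribe action.
    subscribe: {
      TopicArn: topicArn,
      Protocol: 'lambda',
      Endpoint: functionArn
    },
    // Parameters for the Lambda AddPermission action.
    addPermission: {
      FunctionName: functionArn,
      StatementId: 'AllowSnsInvoke', // placeholder statement ID
      Action: 'lambda:InvokeFunction',
      Principal: 'sns.amazonaws.com',
      SourceArn: topicArn
    }
  };
}

// Placeholder ARNs; substitute your own topic and function.
const wiring = buildSnsToLambdaWiring(
  'arn:aws:sns:us-east-1:112233445566:athena-high-watermark',
  'arn:aws:lambda:us-east-1:112233445566:function:disable-workgroup'
);
console.log(JSON.stringify(wiring, null, 2));
```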

In the Athena console, create the second alarm, 1 GB over 15 minutes, and point it to the SNS topic that you created earlier. When triggered, this SNS topic calls the Lambda function that disables the reporting workgroup. No more queries can execute in this workgroup. You see this error message in the console when a workgroup is disabled:

Athena exposes other aggregated metrics per workgroup under the AWS/Athena namespace in CloudWatch, such as the query status and the query type (DDL or DML) per workgroup. To learn more, see Monitoring Athena Queries with CloudWatch Metrics.

Cost allocation tags

When you created your ad-hoc-users and reporting workgroups, you added Name and Dept tags. These tags can be used in your Billing and Cost Management console to determine the usage per workgroup.
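The same breakdown can be requested programmatically through the Cost Explorer GetCostAndUsage API action, grouped by the Dept tag. The sketch below builds the request parameters only. It assumes that the Dept tag has been activated as a cost allocation tag and that Athena appears under the service name "Amazon Athena"; the dates and the helper name are placeholders.

```javascript
// Hypothetical helper: builds GetCostAndUsage parameters grouped by a cost allocation tag.
function buildCostByTagParams(tagKey, startDate, endDate) {
  return {
    TimePeriod: { Start: startDate, End: endDate }, // YYYY-MM-DD, End is exclusive
    Granularity: 'MONTHLY',
    Metrics: ['UnblendedCost'],
    GroupBy: [{ Type: 'TAG', Key: tagKey }],
    // Assumption: the service name under which Athena usage is reported.
    Filter: { Dimensions: { Key: 'SERVICE', Values: ['Amazon Athena'] } }
  };
}

// Placeholder dates covering one month.
const costParams = buildCostByTagParams('Dept', '2019-03-01', '2019-04-01');

// With the AWS SDK for JavaScript v2, this could be sent as:
//   await new AWS.CostExplorer({region: 'us-east-1'}).getCostAndUsage(costParams).promise()
console.log(JSON.stringify(costParams, null, 2));
```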

Summary

In this post, you learned how to use workgroups in Athena to isolate different query workloads, manage access, and define data usage controls to protect yourself from runaway queries. Metrics exposed to CloudWatch help you monitor query performance and make sure that your users are getting the best experience possible. For more details, see Using Workgroups to Control Query Access.

About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics, and business intelligence needs. Roy is a big Manchester United fan who enjoys cheering his team on and hanging out with his family.