AWS Big Data Blog

Query cross-account Amazon DynamoDB tables using Amazon Athena Federated Query

Amazon DynamoDB is ideal for applications that need a flexible NoSQL database with low read and write latencies and the ability to scale storage and throughput up or down as needed without code changes or downtime. You can use DynamoDB for use cases including mobile apps, gaming, digital ad serving, live voting, audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for Amazon S3 objects, e-commerce shopping carts, and web session management.

What if you have the need to allow other AWS accounts to query your DynamoDB table? What if other accounts need to join data on your DynamoDB table with their data stored in data sources like Amazon CloudWatch, Amazon DocumentDB, Amazon Redshift, Amazon OpenSearch, MySQL, PostgreSQL connected with Athena data source connectors, and Amazon S3?

Amazon Athena cross-account federated query enables you to run SQL queries across data stored in relational, non-relational, object, and custom data sources where data source and its connector are in different AWS accounts from the user querying the data. There are no new charges for querying connectors in another account, but Athena’s standard rates for data scanned, Lambda usage, and other services apply.

This post will demonstrate Athena in an AWS account accessing a DynamoDB table of another AWS account by using the Athena cross-account federated query. It also explains deploying Amazon Athena DynamoDB connector using AWS Serverless Application Repository and setting up Athena cross-account federation between two accounts for the Demo.

Walkthrough

The solution has the following steps to demonstrate Athena cross-account federated query:

  • Set up Athena federation – To deploy a Lambda function for the data source connector and connect it to a data source.
  • Set up Athena cross-account federation – To set up IAM permissions for Athena cross-account federation.
  • Test Athena cross-account federated query – To show a demo of how an AWS account can share its DynamoDB table as an Athena data source with another AWS account.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • Two AWS Accounts
  • AWS resources: Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon DynamoDB, AWS Lambda

Data source connectors

A data source connector is a piece of code that can translate between your target data source and Athena. Athena uses data source connectors that run on AWS Lambda to run federated queries. You can think of a connector as an extension of Athena’s query engine.

Connectors use Apache Arrow as the format for returning data requested in a query, which enables connectors to be implemented in languages such as C, C++, Java, Python, and Rust.

Athena uses data source connectors that run on AWS Lambda to run federated queries. Since connectors are processed in Lambda, they can be used to access data from any data source on the cloud or on premises that is accessible from Lambda

To use a connector in your Athena queries, deploy it to your account using one of the following ways:

This blog uses the AWS Serverless Application Repository to deploy the Amazon Athena DynamoDB connector.

After you deploy data source connectors, the connector is associated with a catalog that you can specify in SQL queries. You can combine SQL statements from multiple catalogs and span multiple data sources with a single query. When a query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the tables that need to be read, manages parallelism, and pushes down filter predicates. Based on the user submitting the query, connectors can provide or restrict access to specific data elements.

Architecture

  • AWS Account-A has a DynamoDB table called Music.
  • Account-A has an Athena data source connector to federate into DynamoDB.
  • AWS Account-B has Analysts who need to query the DynamoDB table.
  • Account-A is sharing the Athena data source with Account-B by using Athena cross-account federated query.

The following figure shows Amazon Athena cross-account federation for Account-B to access DynamoDB in Account-A.

To demonstrate the Athena cross-account federation, create a sample DynamoDB table called music in Account-A. Follow the instructions at Getting started with DynamoDB to create the table Music and load thesample data.

Set up Athena federation

Preparing to create federated queries is a two-part process: deploying a Lambda function for the data source connector and connecting the Lambda function to a data source. For more details, see Enabling cross-account federated queries.

Deploy AthenaDynamoDBConnector using AWS Serverless Application Repository

  1. Sign in as an administrator to AWS Account-A.
  2. Open the Serverless Application Repository.
  3. In the navigation pane, choose Available applications.
  4. Select the option Show apps that create custom IAM roles or resource policies.
  5. In the search box, type the name of the connector AthenaDynamoDBConnector.

  6. Choosing a connector opens the Lambda function’s Application details page in the AWS Lambda console.
  7. On the right side of the details page, for Application settings, fill in the required information.
    • Application name – Name of AWS CloudFormation Stack to deploy the connector: AthenaDynamoDBConnector.
    • AthenaCatalogName – It is the catalog name to create in Athena. It is also the name of the Lambda function. Give it in lowercase: acct1dynamodb.
    • SpillBucket – Specify an existing S3 bucket (spill-bucket) in your account to receive data from any large response payloads that exceed Lambda function response size limits.
  8. Select I acknowledge that this app creates custom IAM roles and resource policies. For more information, choose the Info link.
  9. At the bottom right of the Application settings section, choose Deploy.
  10. Serverless Application Repository will create an AWS CloudFormation stack to deploy the connector.
  11. When the deployment is complete, you will see the Lambda function in the Resources section of the AWS CloudFormation stack. Note down the Lambda function name.

Connect Athena to the data source

  1. Go to Athena console in Account-A.
  2. Choose Data sources. Click Create Data source.
  3. In Choose data source, search for Amazon DynamoDB and select it.
  4. In Data source details, give a Data source name acct1dynamodb
  5. For Lambda function in the Connection details section, choose the name of the function acct1dynamodb from the dropdown.
  6. On the Review and create page, review the data source details, and then choose Create data source.

  7. You will see the data source acctdynamodb in the Data sources.
  8. Go to Query editor. Choose the Data Source acct1dynamodb from the dropdown.
  9. You will see all the tables in the shared data source.
  10. Run the following SQL in Athena Query editor 
    SELECT songtitle, albumtitle, cast(awards as int) as awards 
    FROM "acct1dynamodb"."default"."music" 
    WHERE artist = 'Acme Band' 
    limit 2;
  11. Verify Athena federation works.

Set up Athena cross-account federation

In Account-A: Set up IAM permissions for cross-account

  1. Sign in as an administrator to Account-A.
  2. On the S3 spill bucket (of the Lambda function), grant GetObject and ListBucket permissions to the IAM user analyst of Account-B.

Note: Replace Account-B-id with your actual AWS cross-account id to which you want to share the DynamoDB table. Replace spill-bucket with your actual S3 bucket in Account-A.

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": ["arn:aws:iam::Account-B-id:user/analyst"]
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
             ],
            "Resource": [
                "arn:aws:s3::: spill-bucket",
                "arn:aws:s3::: spill-bucket/*"
            ]
        }
     ]
 }
  1. Grant InvokeFunction on Lambda function acct1dynamodb to IAM user analyst of Account-B.

Note: Replace Account-A-id with your actual AWS account id where you have the DynamoDB table. Replace Account-B-id with your actual AWS cross-account id to which you want to share the DynamoDB table.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CrossAccountInvocationStatement",
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::Account-B-id:user/analyst"]
      }, 
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:aws-region:Account-A-id:function:acct1dynamodb"
    }
  ]
}
  1. Go to the Lambda function acct1dynamodb. Choose Configuration and Permissions.

  1. Go to Resource-based policy and Add permissions.

When you save the above permissions, you can see them under Policy statements in Resource-based policy of the Lambda function.

In Account-B: Set up IAM permissions for cross-account

  1. Sign in as an administrator to AWS Account-B.
  2. Create IAM role called AthenaCrossAccountFederated-Account-A-id for Account-A to assume. Add the following inline policy to the role.

Note: Replace Account-B-id with your actual AWS cross-account id to which you want to share the DynamoDB table.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "athena:CreateDataCatalog",
            "Resource": "arn:aws:athena:aws-region:Account-B-id:datacatalog/*"
        }
    ]
}
  1. Grant permission to the IAM user analyst to invoke the Lambda function acct1dymanodb of Account-A

Note: Replace Account-A-id with your actual AWS account id where you have the DynamoDB table.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:aws-region:Account-A-id:function:acct1dynamodb"
        }
    ]
}

Share the Athena Data source with Account-B

After the permissions are in place, you share a data connector in your account (Account-A) with another account (Account-B). Account-A retains full control and ownership of the connector. When Account-A makes configuration changes to the connector, the updated configuration applies to the shared connector in Account-B.

  1. Sign in as an administrator to Account-A.
  2. On Athena, go to Data sources, choose data source acct1dynamodb you want to share. Go to the Share option in the top right corner.

  1. For Account ID, enter the Account-B-id to share your data source with Account-B and click Share.

Test Athena cross-account federated query: Access the shared data source from Account-B

  1. Sign in as IAM user analyst to Account-B.
  2. In Athena, go to Data sources. You will see the data source acct1dynamodb.

  1. Go to Query editor. Choose the Data Source acct1dynamodb from the dropdown.

  1. You will see all the tables in the shared data source.

  1. Run the following SQL in Athena Query editor
SELECT songtitle, albumtitle, cast(awards as int) as awards 
FROM "acct1dynamodb"."default"."music" 
WHERE artist = 'Acme Band' 
limit 2;

  1. Athena cross-account federated has worked! This validates that user analyst in Account-B can see the data of the DynamoDB table of Account-A.

Clean up

To avoid incurring future charges, delete the following resources that were provisioned for this demo:

  • S3 spill bucket used in AWS Lambda
  • Lambda function used for the data source connector
  • Sample DynamoDB table

Conclusion

In this post, we saw how you can access a cross-account DynamoDB table using Athena Federated Query to query the data in place. You can execute a single SQL query to join this data across data sources like Amazon CloudWatch, Amazon DocumentDB, Amazon Redshift, Amazon OpenSearch, MySQL, PostgreSQL, Oracle, SQL Server, HBase, Redis, BigQuery, Snowflake, Teradata with Athena data source connectors and Amazon S3.


About the author

Satya Adimula is a Senior Data Architect at AWS based in Boston. With extensive experience in data and analytics, Satya helps organizations derive their business insights from the data at scale.