Use an AD FS user and Tableau to securely query data in AWS Lake Formation
Security-conscious customers often adopt a Zero Trust security architecture. Zero Trust is a security model centered on the idea that access to data shouldn’t be solely based on network location, but rather require users and systems to prove their identities and trustworthiness and enforce fine-grained identity-based authorization rules before granting access to applications, data, and other systems.
Some customers rely on third-party identity providers (IdPs) like Active Directory Federated Services (AD FS) as a system to manage credentials and prove identities and trustworthiness. Users can use their AD FS credentials to authenticate to various related yet independent systems, including the AWS Management Console (for more information, see Enabling SAML 2.0 federated users to access the AWS Management Console).
In the context of analytics, some customers extend Zero Trust to data stored in data lakes, which includes the various business intelligence (BI) tools used to access that data. A common data lake pattern is to store data in Amazon Simple Storage Service (Amazon S3) and query the data using Amazon Athena.
AWS Lake Formation allows you to define and enforce access policies at the database, table, and column level when using Athena queries to read data stored in Amazon S3. Lake Formation supports Active Directory and Security Assertion Markup Language (SAML) identity providers such as OKTA and Auth0. Furthermore, Lake Formation securely integrates with the AWS BI service Amazon QuickSight. QuickSight allows you to effortlessly create and publish interactive BI dashboards, and supports authentication via Active Directory. However, if you use alternative BI tools like Tableau, you may want to use your Active Directory credentials to access data stored in Lake Formation.
In this post, we show you how you can use AD FS credentials with Tableau to implement a Zero Trust architecture and securely query data in Amazon S3 and Lake Formation.
In this architecture, user credentials are managed by Active Directory, and not Amazon Identity and Access Management (IAM). Although Tableau provides a connector to connect Tableau to Athena, the connector requires an AWS access key ID and an AWS secret access key normally used for programmatic access. Creating an IAM user with programmatic access for use by Tableau is a potential solution, however some customers have made an architectural decision that access to AWS accounts is done via a federated process using Active Directory, and not an IAM user.
In this post, we show you how you can use the Athena ODBC driver in conjunction with AD FS credentials to query sample data in a newly created data lake. We simulate the environment by enabling federation to AWS using AD FS 3.0 and SAML 2.0. Then we guide you through setting up a data lake using Lake Formation. Finally, we show you how you can configure an ODBC driver for Tableau to securely query your data in the lake data using your AD FS credentials.
The following prerequisites are required to complete this walkthrough:
- An understanding of IAM roles and concepts
- A basic understanding of Lake Formation and Athena
- A copy of Tableau with a 14-day trail or fully licensed software
- An understanding of the concepts of Active Directory, and how to join a computer to an Active Directory domain
- An understanding of configuring ODBC components on a Windows machine
Create your environment
To simulate the production environment, we created a standard VPC in Amazon Virtual Private Cloud (Amazon VPC) with one private subnet and one public subnet. You can do the same using the VPC wizard. Our Amazon Elastic Compute Cloud (Amazon EC2) instance running the Tableau client is located in a private subnet and accessible via an EC2 bastion host. For simplicity, connecting out to Amazon S3, AWS Glue, and Athena is done via the NAT gateway and internet gateway set up by the VPC wizard. Optionally, you can replace the NAT gateway with AWS PrivateLink endpoints (AWS Security Token Service (AWS STS), Amazon S3, Athena, and AWS Glue endpoints are required) to make sure traffic remains within the AWS network.
The following diagram illustrates our environment architecture.
After you create your VPC with its private and public subnets, you can continue to build out the other requirements, such as Active Directory and Lake Formation. Let’s begin with Active Directory.
Enable federation to AWS using AD FS 3.0 and SAML 2.0
AD FS 3.0, a component of Windows Server, supports SAML 2.0 and is integrated with IAM. This integration allows Active Directory users to federate to AWS using corporate directory credentials, such as a user name and password from Active Directory. Before you can complete this section, AD FS must be configured and running.
To set up AD FS, follow the instructions in Setting up trust between AD FS and AWS and using Active Directory credentials to connect to Amazon Athena with ODBC driver. The first section of the post explains in detail how to set up AD FS and establish the trust between AD FS and Active Directory. The post ends with setting up an ODBC driver for Athena, which you can skip. The post creates a group name called
ArunADFSTest. This group relates to a role in your AWS account, which you use later.
When you have successfully verified that you can log in using your IdP, you’re ready to configure your Windows environment ODBC driver to connect to Athena.
Set up a data lake using Lake Formation
Lake Formation is a fully managed service that makes it easy for you to build, secure, and manage data lakes. Lake Formation provides its own permissions model that augments the IAM permissions model. This centrally defined permissions model enables fine-grained access to data stored in data lakes through a simple grant/revoke mechanism. We use this permissions model to grant access to the AD FS role we created earlier.
- On the Lake Formation console, you’re prompted with a welcome box the first time you access Lake formation.
The box asks you to select the initial administrative user and roles.
- Choose Add myself and choose Get Started.
We use the sample database provided by Lake Formation, but you’re welcome to use your own dataset. For instructions on loading your own dataset, see Getting Started with Lake Formation.With Lake Formation configured, we must grant read access to the AD FS role (
ArunADFSTest) we created in the previous step.
- In the navigation pane, choose Databases.
- Select the database
- On the Actions menu, choose Grant.
We grant the
SamlOdbcAccessrole access to
- For Principals, select IAM users and roles.
- For IAM users and roles, choose the role
- Select Named data catalog resources.
- For Databases, choose the database
- For Tables¸ choose All tables.
- Set the table permissions to Select and Describe.
- For Data permissions, select All data access.
- Choose Grant.
Our AD FS user assumes the role
ArunADFSTest, which has been granted access to
sampledbby Lake Formation. However, the
ArunADFSTestrole requires access to Lake Formation, Athena, AWS Glue, and Amazon S3. Following the practice of least privilege, AWS defines policies for specific Lake Formation personas. Our user fits the Data Analyst persona, which requires enough permissions to run queries.
- Add the
AmazonAthenaFullAccessmanaged policy (for instructions, see Adding and removing IAM identity permissions) and the following inline policy to the
Each time Athena runs a query, it stores the results in an S3 bucket, which is configured as the query result location in Athena.
- Create an S3 bucket, and in this new bucket create a new folder called
- Update the settings on the Athena console to use your newly created folder.
Tableau uses Athena to run the query and read the results from Amazon S3, which means that the
ArunADFSTestrole requires access to your newly created S3 folder.
- Attach the following inline policy to the
Our AD FS user can now assume a role that has enough privileges to query the sample database. The next step is to configure the ODBC driver on the client.
Configure an Athena ODBC driver
Athena is a managed serverless and interactive query service that allows you to analyze your data in Amazon S3 using standard Structured Query Language (SQL). You can use Athena to directly query data that is located in Amazon S3 or data that is registered with Lake Formation. Athena provides you with ODBC and JDBC drivers to effortlessly integrate with your data analytics tools (such as Microsoft Power BI, Tableau, or SQL Workbench) to seamlessly gain insights about your data in minutes.
To connect to our Lake Formation environment, we first need to install and configure the Athena ODBC driver on our Windows environment.
- Download the Athena ODBC driver relevant to your Windows environment.
- Install the driver by choosing the driver file you downloaded (in our case,
- Choose Next on the welcome page.
- Read the End-User License Agreement, and if you agree to it, select I Accept the terms in the License Agreement and choose Next.
- Leave the default installation location for the ODBC driver and choose Next.
- Choose Install to begin the installation.
- If the User Access Control pop-up appears, choose Yes to allow the driver installation to continue.
- When the driver installation is complete, choose Finish to close the installer.
- Open the Windows ODBC configuration application by selecting the Start bar and searching for ODBC.
- Open the version corresponding to the Athena ODBC version you installed, in our case 64 bit.
- On the User DSN tab, choose Add.
- Choose the Simba Athena ODBC driver and choose Finish.
Configure the ODBC driver to AD FS authentication
We now need to configure the driver.
- Choose the driver on the Driver configuration page.
- For Data Source Name, enter
- For Description, enter
Lake Formation Sample Database.
- For AWS Region, enter
eu-west-1or the Region you used when configuring Lake Formation.
- For Metadata Retrieval Method, choose Auto.
- For S3 Output Location, enter
- For Encryption Options, choose NOT_SET.
- Clear the rest of the options.
- Choose Authentication Options.
- For Authentication Type, choose ADFS.
- For User, enter
- For Password, enter your domain user password.
- For Preferred Role, enter
The preferred role is the same role configured in the previous section (
- For IdP Host, enter the AD federation URL you configured during AD FS setup.
- For IdP Port, enter
- Select SSL Insecure.
- Choose OK.
- Choose Test on the initial configuration page to test the connection.
- When you see a success confirmation, choose OK.
We can now connect to our Lake Formation sample database from our desktop environment using the Athena ODBC driver. The next step is to use Tableau to query our data using the ODBC connection.
Connect to your data using Tableau
To connect to your data, complete the following steps:
- Open your Tableau Desktop edition.
- Under To a Server, choose More.
- On the list of available Tableau installed connectors, choose Other Databases (ODBC).
- Choose the ODBC database you created earlier.
- Choose Connect.
- Choose Sign In.
When the Tableau workbook opens, select the database, schema, and table that you want to query.
- For Database, choose the database as listed in the ODBC setup (for this post,
- For Schema, choose your schema (
- For Table, search for and choose your table (
- Drag the table to the work area to start your query and further report development.
AWS Lake Formation provides database-, table-, column-, and tag-based access controls, and cross-account sharing at no charge. Lake Formation charges a fee for transaction requests and for metadata storage. In addition to providing a consistent view of data and enforcing row-level and cell-level security, the Lake Formation Storage API scans data in Amazon S3 and applies row and cell filters before returning results to applications. There is a fee for this filtering. To make sure that you’re not charged for any of the services that you no longer need, stop any EC2 instances that you created. Remove any objects in Amazon S3 you no longer require, because you pay for objects stored in S3 buckets.
Lastly, delete any Active Directory instances you may have created.
Lake Formation makes it simple to set up a secure data lake and then use the data lake with your choice of analytics and machine learning services, including Tableau. In this post, we showed you how you can connect to your data lake using AD FS credentials in a simple and secure way by using the Athena ODBC driver. Your AD FS user is configured within the ODBC driver, which then assumes a role in AWS. This role is granted access to only the data you require via Lake Formation.
About the Authors
Jason Nicholls is an Enterprise Solutions Architect at AWS. He’s passionate about building scalable web and mobile applications on AWS. He started coding on a Commodore VIC 20, which lead to a career in software development. Jason holds a MSc in Computer Science with specialization in coevolved genetic programming. He is based in Johannesburg, South Africa.
Francois van Rensburg is a Partner Management Solutions Architect at AWS. He has spent the last decade helping enterprise organizations successfully migrate to the cloud. He is passionate about networking and all things cloud. He started as a Cobol programmer and has built everything from software to data centers. He is based in Denver, Colorado.