Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

November 2024: This post was reviewed and updated for accuracy.

Amazon Redshift is a petabyte-scale, enterprise-grade cloud data warehouse service delivering the best price-performance. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift to cost-effectively and quickly analyze their data using standard SQL and existing business intelligence (BI) tools.

Amazon Redshift now makes it easier for you to run queries in AWS data lakes by automatically mounting the AWS Glue Data Catalog. You no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog. The recommended way to access the Data Catalog from Amazon Redshift is AWS IAM Identity Center authentication, which provides single sign-on for browsing the Data Catalog. Alternatively, you can use AWS Identity and Access Management (IAM) credentials or an IAM role to browse the Data Catalog and query data lake tables directly from Amazon Redshift Query Editor v2 or your preferred SQL client.

This feature is now available in all AWS commercial and AWS GovCloud (US) Regions where Amazon Redshift RA3, Amazon Redshift Serverless, and AWS Glue are available. To learn more about auto-mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog.

Enabling easy analytics for everyone

Amazon Redshift is helping tens of thousands of customers manage analytics at scale. Amazon Redshift offers a powerful analytics solution that provides access to insights for users of all skill levels. You can take advantage of the following benefits:

It enables organizations to analyze diverse data sources, including structured, semi-structured, and unstructured data, facilitating comprehensive data exploration
With its high-performance processing capabilities, Amazon Redshift handles large and complex datasets, ensuring fast query response times and supporting real-time analytics
Amazon Redshift provides features like Multi-AZ and cross-Region snapshot copy for high availability and disaster recovery, and provides authentication and authorization mechanisms to make it reliable and secure
With features like Amazon Redshift ML, it democratizes ML capabilities across a variety of user personas
The flexibility to utilize different table formats such as Apache Hudi, Delta Lake, and Apache Iceberg optimizes query performance and storage efficiency
Integration with advanced analytical tools empowers you to apply sophisticated techniques and build predictive models
Scalability and elasticity allow for seamless expansion as data and workloads grow

Overall, Amazon Redshift empowers organizations to uncover valuable insights, enhance decision-making, and gain a competitive edge in today’s data-driven landscape.

The new automatic mounting of the AWS Glue Data Catalog feature enables you to directly query AWS Glue objects in Amazon Redshift without the need to create an external schema for each AWS Glue database you want to query. With automatic mounting of the Data Catalog, Amazon Redshift mounts the cluster account’s default Data Catalog as an external database named awsdatacatalog, either during boot or when a user opts in.

Relevant use cases for automatic mounting of the AWS Glue Data Catalog feature

You can use tools like Amazon EMR to create new data lake schemas in various formats, such as Apache Hudi, Delta Lake, and Apache Iceberg. However, before analysts can run queries against these schemas, administrators must create an external schema in Amazon Redshift for each AWS Glue database. You can now simplify this integration using automatic mounting of the AWS Glue Data Catalog.

The following diagram illustrates this architecture.

Solution overview

You can now use SQL clients like Amazon Redshift Query Editor v2 to browse and query awsdatacatalog. The authentication method falls into one of two categories: with or without IAM Identity Center integration. The following sections explain both scenarios.

Scenario 1: Authentication using IAM Identity Center

Complete the following high-level steps to integrate the automatic mounting of the Data Catalog using Query Editor V2 and a third-party SQL client:

As a prerequisite, set up single sign-on (SSO) to Amazon Redshift by following the blog post Integrate Identity Provider (IdP) with Amazon Redshift Query Editor V2 and SQL Client using AWS IAM Identity Center for seamless Single Sign-On.
Create datasets in your S3 data lake and populate AWS Glue Data Catalog objects.
Set up AWS Lake Formation as an IAM Identity Center (IDC) application.
Register the data with Lake Formation.
Configure IAM Identity Center group permissions on catalog resources using AWS Lake Formation.
Grant usage on awsdatacatalog and discover the auto-mounted objects.
Connect with Amazon Redshift and query the Data Catalog as an IAM Identity Center user using Query Editor V2 or third-party clients.

Scenario 2: Authentication using non IAM Identity Center methods

In Query Editor V2, to connect to the awsdatacatalog database, choose the following:

For a Redshift provisioned cluster, use the authentication method Temporary credentials using your IAM identity.
For a Redshift Serverless workgroup, use the authentication method Federated user.

Complete the following high-level steps to integrate the automatic mounting of the Data Catalog using Query Editor V2 and a third-party SQL client:

Provision resources with AWS CloudFormation to populate Data Catalog objects.
Connect Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2.
Connect with Redshift provisioned and query the Data Catalog using Query Editor V2.
Configure permissions on catalog resources using AWS Lake Formation.
Federate with Redshift Serverless and query the Data Catalog using Query Editor V2 and a third-party SQL client.
Discover the auto-mounted objects.
Connect with Redshift provisioned and query the Data Catalog as a federated user using a third-party client.
Connect with Amazon Redshift and query the Data Catalog as an IAM user using third-party clients.

The following diagram illustrates the solution workflow in both scenarios explained before.

Let us dive deeper into each of these scenarios and understand them in detail.

Scenario 1: Authentication using IAM Identity Center

Prerequisites:

This solution walkthrough assumes you have set up a Lake Formation administrator role or a similar role to follow along with the instructions in this post. To learn more about setting up permissions for a data lake administrator, explore Data Lake administrators persona in Lake Formation.

Additionally, set up single sign-on (SSO) to Amazon Redshift by following the blog post Integrate Identity Provider (IdP) with Amazon Redshift Query Editor V2 and SQL Client using AWS IAM Identity Center for seamless Single Sign-On.

Once the AWS Lake Formation administrator persona and the Amazon Redshift SSO setup with IAM Identity Center (IDC) integration are complete, you will have met the following prerequisites:

An Okta account integrated with IAM Identity Center to sync users and groups
A Redshift managed application with IAM Identity Center
A Redshift source cluster with IAM Identity Center integration enabled
A Redshift target cluster with IAM Identity Center integration enabled (you can skip the section to set up Amazon Redshift role-based access)
Users and groups from IAM Identity Center assigned to the Redshift application
A permission set assigned to AWS accounts to enable Redshift Query Editor v2 access

Create an IAM policy with the following permissions and attach it to the role you used when creating the IAM IDC Redshift application.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
            "Resource": "*"
        }
    ]
}

Attach the policy created earlier to IAMIDCRedshiftRole, the role used for the IAM IDC Redshift application. Also attach the following policies and trust relationship to the role.

redshift-idc-policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeQev2IdcApplications",
                "redshift-serverless:ListNamespaces",
                "redshift-serverless:ListWorkgroups",
                "redshift-serverless:GetWorkgroup"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sso:DescribeApplication",
                "sso:DescribeInstance"
            ],
            "Resource": [
                "arn:aws:sso:::instance/ssoins-123012345678",
                "arn:aws:sso::123012345678:application/ssoins-123sd1wss5678abbv/*"
            ]
        }
    ]
}

lakeformation-redshift-idc-policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetResourceLFTags",
                "glue:SearchTables",
                "glue:GetDatabase",
                "lakeformation:SearchDatabasesByLFTags",
                "glue:GetTables",
                "lakeformation:GetDataAccess",
                "glue:GetPartitions",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "glue:GetDatabases",
                "glue:GetTable"
            ],
            "Resource": "*"
        }
    ]
}
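The action list in lakeformation-redshift-idc-policy appears to be the same set as in the first policy of this section, only in a different order. The following is a minimal Python sketch for checking that, with both action lists copied from the policy documents above; the variable and function names are illustrative and not part of any AWS tooling.

```python
# Actions copied from the first IAM policy in this section.
FIRST_POLICY_ACTIONS = [
    "lakeformation:GetDataAccess",
    "glue:GetTable",
    "glue:GetTables",
    "glue:SearchTables",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetPartitions",
    "lakeformation:GetResourceLFTags",
    "lakeformation:ListLFTags",
    "lakeformation:GetLFTag",
    "lakeformation:SearchTablesByLFTags",
    "lakeformation:SearchDatabasesByLFTags",
]

# Actions copied from lakeformation-redshift-idc-policy.
LF_IDC_POLICY_ACTIONS = [
    "lakeformation:GetResourceLFTags",
    "glue:SearchTables",
    "glue:GetDatabase",
    "lakeformation:SearchDatabasesByLFTags",
    "glue:GetTables",
    "lakeformation:GetDataAccess",
    "glue:GetPartitions",
    "lakeformation:SearchTablesByLFTags",
    "lakeformation:ListLFTags",
    "lakeformation:GetLFTag",
    "glue:GetDatabases",
    "glue:GetTable",
]


def diff_actions(first, second):
    """Return (only_in_first, only_in_second) as sorted lists."""
    first_set, second_set = set(first), set(second)
    return sorted(first_set - second_set), sorted(second_set - first_set)


if __name__ == "__main__":
    only_first, only_second = diff_actions(FIRST_POLICY_ACTIONS, LF_IDC_POLICY_ACTIONS)
    print("only in first policy:", only_first)
    print("only in lakeformation-redshift-idc-policy:", only_second)
```

If both lists come back empty, the two policies grant the same AWS Glue and Lake Formation actions, so the same comparison can be reused to spot drift if either policy is edited later.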

Trust Relationship

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "redshift.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetContext"
            ]
        }
    ]
}

Create datasets in your S3 data lake and populate AWS Glue Data Catalog objects

Log in to the AWS Management Console as an IAM administrator.
Go to AWS CloudShell or an alternate system with AWS command line interface (AWS CLI) and run the following AWS CLI command, providing your bucket name to copy the data.
```
aws s3 sync s3://redshift-demos/data/NY-Pub/ s3://<bucketname>/data/NY-Pub/
```
On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers. Create a new crawler named NYTaxiCrawler and run it to populate the ny_pub table under the automountdb database.

Note: A walkthrough of how to create objects in the AWS Glue Data Catalog from public S3 bucket data is provided later in this post, in the section Provision resources with AWS CloudFormation to populate Data Catalog objects under Scenario 2: Authentication using non IAM Identity Center methods. You can edit the crawler NYTaxiCrawler and, under Choose data sources and classifiers, change the S3 path to the bucket you used (s3://<bucketname>/data/NY-Pub/). Alternatively, you can follow the steps in the section Use case 1: Redshift Spectrum with Lake Formation of the Lake Formation blog post Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider.

Set up AWS Lake Formation as an IAM Identity Center (IDC) application

On the Lake Formation console, under Administration in the navigation pane, choose IAM Identity Center integration. Review the options and choose Create to enable Lake Formation integration. The integration status will update to Success.

Register the data with Lake Formation

Complete the following steps to register the ny_pub data with Lake Formation.

On the Lake Formation console, under Administration in the navigation pane, choose Data lake locations.
Choose Register location.
For Amazon S3 path, enter the bucket where the table data resides (s3://<bucketname>/data/NY-Pub/).

For IAM role, choose a Lake Formation user-defined role. For more information, refer to Requirements for roles used to register locations. We create LFRegisterBucketRole used to register data lake location with following policy and trust relationship.

Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::testidclf/data/NY-Pub/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::testidclf"
            ]
        }
    ]
}

Trust Relationship

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DataCatalogViewDefinerAssumeRole1",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "lakeformation.amazonaws.com",
                    "glue.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetContext"
            ]
        }
    ]
}

For Permission mode, select Lake Formation.
Choose Register location.
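The S3 policy shown for LFRegisterBucketRole is written for one specific bucket (testidclf) and prefix. As a rough sketch, the same document can be generated for your own bucket; the function name and parameters below are hypothetical helpers, and the statements simply mirror the policy shown earlier in this section.

```python
import json


def register_location_policy(bucket: str, prefix: str) -> dict:
    """Build the S3 policy for the role that registers a data lake location.

    bucket and prefix are placeholders for your own values, for example
    bucket="testidclf" and prefix="data/NY-Pub".
    """
    prefix = prefix.strip("/")  # tolerate leading or trailing slashes
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access, limited to the registered prefix.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(register_location_policy("testidclf", "data/NY-Pub/"), indent=4))
```

Keeping the bucket and prefix as parameters makes it harder for the registered Amazon S3 path and the role policy to drift apart.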

Next, verify that the IAMAllowedPrincipals group doesn’t have permission on the database.

On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
Select localdb and on the Actions menu, choose View permissions.
If IAMAllowedPrincipals is listed, select the principal and choose Revoke.
Repeat these steps to verify permissions for the table ny_pub.

Configure IAM Identity Center group permissions on catalog resources using AWS Lake Formation

Complete the following steps to grant database permissions to the IAM Identity Center group.

On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
Select the database localdb and on the Actions menu, choose Grant.
Choose Grant database.
Under Principals, select IAM Identity Center and choose Add.
In the pop-up window, if this is the first time assigning users and groups, choose Get started.
Enter the IAM Identity Center group in the search bar and choose the group.
Choose Done.
Under LF-Tags or catalog resources, localdb is already selected for Databases.
Select Describe for Database permissions.
Choose Grant to apply the permissions.

Next, you grant table permissions to the IAM Identity Center group.

Under Data catalog in the navigation pane, choose Databases.
Select the database localdb and on the Actions menu, choose Grant.
Under Principals, select IAM Identity Center and choose Add.
Enter the IAM Identity Center group in the search bar and choose the group.
Choose Assign.
Under LF-Tags or catalog resources, localdb is already selected for Databases.
For Tables, choose ny_pub.
Select Select and Describe for Table permissions.
Choose Grant to apply the permissions.

Grant usage on awsdatacatalog and discover the auto-mounted objects

As a data warehouse admin, run the following grant statements in your Amazon Redshift data warehouse for the IAM IDC groups awssso-sales and awssso-finance, which map to the Redshift roles awsidc:awssso-sales and awsidc:awssso-finance, respectively.

grant usage on database awsdatacatalog to role "awsidc:awssso-sales";
grant usage on database awsdatacatalog to role "awsidc:awssso-finance";
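When many IAM Identity Center groups need access, the grant statements can be generated instead of typed by hand. The sketch below assumes the awsidc: namespace prefix used in this post; your prefix may differ depending on how the Redshift IAM Identity Center application was set up, and the helper names are hypothetical.

```python
def idc_role_name(group: str, namespace: str = "awsidc") -> str:
    """Map an IAM Identity Center group to its Redshift role name.

    The "awsidc" namespace is the one used in this post (assumption:
    yours may differ).
    """
    return f"{namespace}:{group}"


def grant_usage_statement(group: str, database: str = "awsdatacatalog") -> str:
    """Return a GRANT USAGE statement for the role mapped to the group."""
    # Escape embedded double quotes so the quoted identifier stays valid.
    role = idc_role_name(group).replace('"', '""')
    return f'GRANT USAGE ON DATABASE {database} TO ROLE "{role}";'


if __name__ == "__main__":
    for group in ["awssso-sales", "awssso-finance"]:
        print(grant_usage_statement(group))
```

The generated statements are plain SQL text; run them as a data warehouse admin in the same way as the statements above.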

Connect with Amazon Redshift and query the Data Catalog as an IAM Identity Center user using Query Editor V2 or third-party clients

Perform the following steps to federate with Amazon Redshift using IAM Identity Center authentication and query the auto-mounted Data Catalog from Query Editor V2:

In the IAM IDC Management Account, navigate to the IAM Identity Center console and copy the AWS access portal URL from the dashboard.
Now sign out from the management account and paste the AWS access portal URL in a new browser window.
A browser pop-up opens and takes you to your IdP login page, where you enter your IdP credentials.
After successful authentication, you’ll be logged into the AWS console as a federated user.
Select the permission set named Amazon Redshift Query Editor V2.
Once you federate to Query Editor V2, choose your Redshift instance, right-click and choose Create connection.
To authenticate using IAM Identity Center, choose the authentication method IAM Identity Center.
A pop-up appears. Because your IdP credentials are already cached, Query Editor V2 reuses them and connects to Amazon Redshift using IAM Identity Center authentication.
Explore awsdatacatalog database and select the table ny_pub.

Automatic mounting of the AWS Glue Data Catalog is now set up for users who authenticate to Amazon Redshift through the IAM Identity Center integration and connect using Query Editor V2 (QEV2). Third-party clients provide a similar experience once the prerequisites and the Lake Formation setup steps described in Scenario 1 are complete.

If you connect to Amazon Redshift using non-IDC methods, such as native database users, console users with an IAM role, or SSO users federated through SAML or a native IdP, continue with Scenario 2, which walks you through automatic mounting of the Data Catalog in those cases.

Scenario 2: Authentication using non IAM Identity Center methods

Prerequisites

You should have the following prerequisites:

An AWS account. If you don’t have one, you can sign up for one.
A Redshift cluster. For setup instructions, see Create a sample Amazon Redshift cluster.
Alternatively, you could use a Redshift Serverless endpoint. For setup instructions, see Getting started with Amazon Redshift Serverless.
The latest Amazon Redshift JDBC driver version.
A SQL client such as SQL Workbench/J.

Provision resources with AWS CloudFormation to populate Data Catalog objects

In this post, we use an AWS Glue crawler to create the external table ny_pub stored in Apache Parquet format in the Amazon Simple Storage Service (Amazon S3) location s3://redshift-demos/data/NY-Pub/. In this step, we create the solution resources using AWS CloudFormation to create a stack named CrawlS3Source-NYTaxiData in either us-east-1 (use the yml download or launch stack) or us-west-2 (use the yml download or launch stack). Stack creation performs the following actions:

Creates the crawler NYTaxiCrawler along with the new IAM role AWSGlueServiceRole-RedshiftAutoMount
Creates automountdb as the AWS Glue database

When the stack is complete, perform the following steps:

On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
Open NYTaxiCrawler and choose Run crawler.

After the crawler is complete, you can see a new table called ny_pub in the Data Catalog under the automountdb database.

Alternatively, you can follow the manual instructions from the Amazon Redshift labs to create the ny_pub table.

Connect with Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2

In this section, we use an IAM role with principal tags to enable fine-grained federated authentication to Redshift Serverless and access the auto-mounted AWS Glue objects.

Complete the following steps:

Create an IAM role (automountrole in this post) and add the following permissions. For this post, we add full AWS Glue, Amazon Redshift, and Amazon S3 permissions for demo purposes. In an actual production scenario, it’s recommended to apply more granular permissions.
On the Tags tab, create a tag with Key as RedshiftDbRoles and Value as automount.
In Query Editor V2, run the following SQL statement as an admin user to create a database role named automount:
```
CREATE ROLE automount;
```

Grant usage privileges to the database role:

GRANT USAGE ON DATABASE awsdatacatalog to role automount;

Switch the role to automountrole by passing the account number and role name.
In the Query Editor v2, choose your Redshift Serverless endpoint (right-click) and choose Create connection.
For Authentication, select Federated user.
For Database, enter the database name you want to connect to.
Choose Create connection.

You’re now ready to explore and query the automatic mounting of the Data Catalog in Redshift Serverless.

Connect with Redshift provisioned cluster and query the Data Catalog using Query Editor V2

To connect with Redshift provisioned cluster and access the Data Catalog, make sure you have completed the steps in the preceding section. Then complete the following steps:

Connect to Redshift Query Editor V2 using the database user name and password authentication method. For example, connect to the dev database using the admin user and password.
In an editor tab, run the following SQL statement to grant the IAM role access to the Data Catalog (this assumes the corresponding IAMR user is already present in Amazon Redshift):
```
GRANT USAGE ON DATABASE awsdatacatalog to "IAMR:automountrole";
```
As an admin user, choose the Settings icon, choose Account settings, and select Authenticate with IAM credentials.
Choose Save.
Switch roles to automountrole by passing the account number and role name.
Create or edit the connection and use the authentication method Temporary credentials using your IAM identity.

For more information about this authentication method, see Connecting to an Amazon Redshift database.

You are ready to explore and query the automatic mounting of the Data Catalog in Amazon Redshift.

Discover the auto-mounted objects

This section illustrates the SHOW commands for discovery of auto-mounted objects. See the following code:

-- Discovery of Glue databases at the schema level
SHOW SCHEMAS FROM DATABASE awsdatacatalog;

-- Discovery of Glue tables
-- Syntax: SHOW TABLES FROM SCHEMA awsdatacatalog.<glue_db_name>;
SHOW TABLES FROM SCHEMA awsdatacatalog.automountdb;

-- Discovery of Glue table columns
-- Syntax: SHOW COLUMNS FROM TABLE awsdatacatalog.<glue_db_name>.<glue_table_name>;
SHOW COLUMNS FROM TABLE awsdatacatalog.automountdb.ny_pub;
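For scripted discovery, the three SHOW statements can be generated from an AWS Glue database name and table name. This is a minimal sketch with hypothetical helper names; the identifier check is a deliberately conservative assumption (lowercase letters, digits, and underscores only) rather than the full set of names that AWS Glue or Amazon Redshift accept.

```python
import re
from typing import List

# Conservative assumption: accept only simple lowercase identifiers.
_SIMPLE_IDENT = re.compile(r"[a-z0-9_]+")


def _check_ident(name: str) -> str:
    """Return name unchanged if it is a simple identifier, else raise."""
    if not _SIMPLE_IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def show_statements(glue_db: str, glue_table: str,
                    catalog: str = "awsdatacatalog") -> List[str]:
    """Build the SHOW statements for schemas, tables, and columns."""
    db = _check_ident(glue_db)
    table = _check_ident(glue_table)
    return [
        f"SHOW SCHEMAS FROM DATABASE {catalog};",
        f"SHOW TABLES FROM SCHEMA {catalog}.{db};",
        f"SHOW COLUMNS FROM TABLE {catalog}.{db}.{table};",
    ]


if __name__ == "__main__":
    for statement in show_statements("automountdb", "ny_pub"):
        print(statement)
```

Rejecting anything outside the simple pattern avoids building malformed or unsafe SQL from untrusted input; loosen the check only if your Glue names require it.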

Configure permissions on catalog resources using AWS Lake Formation

To maintain backward compatibility with AWS Glue, Lake Formation has the following initial security settings:

The Super permission is granted to the group IAMAllowedPrincipals on all existing Data Catalog resources
The Use only IAM access control setting is enabled for new Data Catalog resources

These settings effectively cause access to Data Catalog resources and Amazon S3 locations to be controlled solely by IAM policies. Individual Lake Formation permissions are not in effect.

In this step, we will configure permissions on catalog resources using AWS Lake Formation. Before you create the Data Catalog, you need to update the default settings of Lake Formation so that access to Data Catalog resources (databases and tables) is managed by Lake Formation permissions:

Change the default security settings for new resources. For instructions, see Change the default permission model.
Change the settings for existing Data Catalog resources. For instructions, see Upgrading AWS Glue data permissions to the AWS Lake Formation model.

For more information, refer to Changing the default settings for your data lake.

Federate with Redshift Serverless and query the Data Catalog using Query Editor V2 and a third-party SQL client

With Redshift Serverless, you can connect to awsdatacatalog from a third-party client as a federated user from any identity provider (IdP). In this section, we configure permissions on catalog resources for the federated IAM role in AWS Lake Formation. When you use AWS Lake Formation with Amazon Redshift, permissions can currently be applied at the IAM user or IAM role level.

To connect as a federated user, we will be using Redshift Serverless. For setup instructions, refer to Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

Additional changes are required on the following resources:

In Amazon Redshift, as an admin user, grant the usage to each federated user who needs access on awsdatacatalog:
```
GRANT USAGE ON DATABASE awsdatacatalog to "IAMR:ethan.doe@gmail.com";
```

If the user doesn’t exist in Amazon Redshift, you may need to create the IAM user with the password disabled as shown in the following code and then grant usage on awsdatacatalog:

CREATE USER "IAMR:ethan.doe@gmail.com" WITH PASSWORD DISABLE;

On the Lake Formation console, assign permissions on the AWS Glue database to the IAM role that you created as part of the federated setup.
1. Under Principals, select IAM users and roles.
2. Choose IAM role oktarole.
3. Apply catalog resource permissions, selecting automountdb database and granting appropriate table permissions.
Update the IAM role used in the federation setup. In addition to the permissions already added to the IAM role, you need to add AWS Glue permissions and Amazon S3 permissions to access objects from Amazon S3. For this post, we add full AWS Glue and Amazon S3 permissions for demo purposes. In an actual production scenario, it’s recommended to apply more granular permissions.

Now you’re ready to connect to Redshift Serverless using the Query Editor V2 and federated login.

Use the SSO URL from Okta and log in to your Okta account with your user credentials. For this demo, we log in with user Ethan.
In the Query Editor v2, choose your Redshift Serverless instance (right-click) and choose Create connection.
For Authentication, select Federated user.
For Database, enter the database name you want to connect to.
Choose Create connection.
Run the command select current_user to validate that you are logged in as a federated user.

User Ethan will be able to explore and access awsdatacatalog data.

To connect Redshift Serverless with a third-party client, make sure you have followed all the previous steps.

For SQL Workbench/J setup, refer to the section Configure the SQL client (SQL Workbench/J) in Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

The following screenshot shows that federated user ethan is able to query the awsdatacatalog tables using three-part notation:

Connect with Redshift provisioned cluster and query the Data Catalog as a federated user using third-party clients

With Redshift provisioned cluster, you can connect with awsdatacatalog from a third-party client as a federated user from any IdP.

To connect as a federated user with the Redshift provisioned cluster, you need to follow the steps in the previous section that detailed how to connect with Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2 and a third-party SQL client.

An additional change is required in the IAM policy. Update the IAM policy with the following code to allow the GetClusterCredentialsWithIAM API:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:ListGroups",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentialsWithIAM",
            "Resource": "arn:aws:redshift:us-east-2:01234567891:dbname:redshift-cluster-1/dev"
        }
    ]
}
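The Resource in the second statement is a database-level ARN, and the account ID shown there is a placeholder that you replace with your own 12-digit AWS account ID. As a sketch, the ARN can be assembled and sanity-checked as follows; the function name and its parameters are hypothetical, and the ARN layout simply follows the example in the policy above.

```python
import re


def dbname_arn(region: str, account_id: str, cluster: str, database: str,
               partition: str = "aws") -> str:
    """Build the dbname ARN used with redshift:GetClusterCredentialsWithIAM.

    Layout follows the policy example above:
    arn:<partition>:redshift:<region>:<account>:dbname:<cluster>/<database>
    """
    # AWS account IDs are exactly 12 digits; fail early on anything else.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be exactly 12 digits")
    return f"arn:{partition}:redshift:{region}:{account_id}:dbname:{cluster}/{database}"


if __name__ == "__main__":
    # 123456789012 is a stand-in account ID for illustration only.
    print(dbname_arn("us-east-2", "123456789012", "redshift-cluster-1", "dev"))
```

Validating the account ID length up front catches a common copy-and-paste mistake before the policy is attached.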

Now you’re ready to connect to Redshift provisioned cluster using a third-party SQL client as a federated user.

For SQL Workbench/J setup, refer to the section Configure the SQL client (SQL Workbench/J) in the post Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

Make the following changes:

Use the latest Redshift JDBC driver, because only the latest driver supports querying the auto-mounted Data Catalog tables for federated users.
For URL, enter jdbc:redshift:iam://<cluster endpoint>:<port>/<databasename>?groupfederation=true. For example, jdbc:redshift:iam://redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com:5439/dev?groupfederation=true.

In the preceding URL, groupfederation is a mandatory parameter that allows you to authenticate with the IAM credentials.

The following screenshot shows that federated user ethan is able to query the awsdatacatalog tables using three-part notation.

Connect and query the Data Catalog as an IAM user using third-party clients

In this section, we provide instructions to set up a SQL client to query the auto-mounted awsdatacatalog.

Use three-part notation to reference the awsdatacatalog table in your SELECT statement. The first part is the database name, the second part is the AWS Glue database name, and the third part is the AWS Glue table name:

SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;

You can build on this in various ways, for example by reading Data Catalog data and using it to populate Redshift tables.
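As an illustration of the three-part notation, the sketch below builds a SELECT statement and a CREATE TABLE AS statement that copies an auto-mounted table into a local Redshift table. It assumes your permissions allow reading the external table and creating the target; the helper names and the target table public.ny_pub_local are hypothetical, and the identifier check is intentionally conservative.

```python
import re

# Conservative assumption: simple lowercase identifiers only.
_IDENT = re.compile(r"[a-z0-9_]+")


def _check(name: str) -> str:
    """Return name unchanged if it is a simple identifier, else raise."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def three_part_name(glue_db: str, glue_table: str,
                    catalog: str = "awsdatacatalog") -> str:
    """Return <catalog>.<glue database>.<glue table>."""
    return f"{catalog}.{_check(glue_db)}.{_check(glue_table)}"


def select_all(glue_db: str, glue_table: str) -> str:
    """SELECT using three-part notation, as in the statement above."""
    return f"SELECT * FROM {three_part_name(glue_db, glue_table)};"


def ctas_from_catalog(target_schema: str, target_table: str,
                      glue_db: str, glue_table: str) -> str:
    """CREATE TABLE AS that populates a local table from the Data Catalog."""
    target = f"{_check(target_schema)}.{_check(target_table)}"
    source = three_part_name(glue_db, glue_table)
    return f"CREATE TABLE {target} AS SELECT * FROM {source};"


if __name__ == "__main__":
    print(select_all("automountdb", "ny_pub"))
    print(ctas_from_catalog("public", "ny_pub_local", "automountdb", "ny_pub"))
```

In practice you would usually select specific columns and add a filter rather than copy the whole table; SELECT * is kept here only to mirror the statement above.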

For this post, we use SQL Workbench/J as the SQL client to query the Data Catalog. To set up SQL Workbench/J, complete the following steps:

Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).

You must use the latest Redshift JDBC driver, because only the latest driver supports querying the auto-mounted Data Catalog tables.

For URL, enter jdbc:redshift:iam://<cluster endpoint>:<port>/<databasename>?profile=<profilename>&groupfederation=true. For example, jdbc:redshift:iam://redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com:5439/dev?profile=user2&groupfederation=true.

We are using profile-based credentials as an example. You can use any AWS profile or IAM credential-based authentication as per your requirement. For more information on IAM credentials, refer to Options for providing IAM credentials.
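Because the JDBC URL is easy to mistype, a small helper can assemble it from its parts. This is a sketch with hypothetical function and parameter names; it only reproduces the URL shape used in the examples in this post (endpoint, port, database, an optional profile, and groupfederation=true) and does not cover other driver options.

```python
from typing import Optional
from urllib.parse import urlencode


def redshift_iam_jdbc_url(endpoint: str, port: int, database: str,
                          profile: Optional[str] = None,
                          group_federation: bool = True) -> str:
    """Build a jdbc:redshift:iam:// URL like the examples in this post."""
    params = {}
    if profile:
        # Named AWS profile that supplies the IAM credentials.
        params["profile"] = profile
    if group_federation:
        # Needed for querying the auto-mounted Data Catalog, per this post.
        params["groupfederation"] = "true"
    base = f"jdbc:redshift:iam://{endpoint}:{port}/{database}"
    query = urlencode(params)
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    print(redshift_iam_jdbc_url(
        "redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com",
        5439, "dev", profile="user2"))
```

Calling the helper without a profile yields the URL form used for federated users, and passing a profile yields the form used for IAM users.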

The following screenshot shows that IAM user johndoe is able to list the awsdatacatalog tables using the SHOW command.

The following screenshot shows that IAM user johndoe is able to query the awsdatacatalog tables using three-part notation:

If you get the following error while using groupfederation=true, you need to use the latest Redshift driver:

Something unusual has occurred to cause the driver to fail. Please report this exception:Authentication with plugin is not supported for group federation [SQL State=99999]

Clean up

Complete the following steps to clean up your resources:

If you set up IAM Identity Center integration with Amazon Redshift or AWS Lake Formation using other blog posts for Scenario 1 of this post, follow the Clean up section of those posts as well. If your solution was based on Scenario 2, complete the following cleanup activities:
Delete the IAM role automountrole.
Delete the CloudFormation stack CrawlS3Source-NYTaxiData to clean up the crawler NYTaxiCrawler, the automountdb database from the Data Catalog, and the IAM role AWSGlueServiceRole-RedshiftAutoMount.
Update the default settings of Lake Formation:
1. In the navigation pane, under Data catalog, choose Settings.
2. Select both access control options and choose Save.
3. In the navigation pane, under Permissions, choose Administrative roles and tasks.
4. In the Database creators section, choose Grant.
5. Search for IAMAllowedPrincipals and select Create database permission.
6. Choose Grant.

Considerations

Note the following considerations:

The Data Catalog auto-mount provides ease of use to analysts or database users. The security setup (setting up the permissions model or data governance) is owned by account and database administrators.
- To achieve fine-grained access control, build a permissions model in AWS Lake Formation.
- If the permissions have to be maintained at the Redshift database level, leave the AWS Lake Formation default settings as is and then run grant/revoke in Amazon Redshift.
If you are using a third-party SQL editor and your query tool does not support browsing multiple databases, you can use the SHOW commands to list your AWS Glue databases and tables. You can also query awsdatacatalog objects using three-part notation (SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;), provided you have access to the external objects based on the permission model.

Conclusion

In this post, we introduced the automatic mounting of AWS Glue Data Catalog, which makes it easier for customers to run queries in their data lakes. This feature streamlines data governance and access control, eliminating the need to create an external schema in Amazon Redshift to use the data lake tables cataloged in AWS Glue Data Catalog. We showed how you can manage permission on auto-mounted AWS Glue-based objects using Lake Formation. The permission model can be easily managed and organized by administrators, allowing database users to seamlessly access external objects they have been granted access to.

As we strive for enhanced usability in Amazon Redshift, we prioritize unified data governance and fine-grained access control. This feature minimizes manual effort while ensuring the necessary security measures for your organization are in place.

For more information about automatic mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog.

About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Rohit Vashishtha is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, Texas. He has 17 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.

AWS Big Data Blog