How can I provide cross-account access to resources in the AWS Glue Data Catalog?

Last updated: 2021-08-13

I want to use services such as Amazon EMR, Amazon Athena, and AWS Glue with the AWS Glue Data Catalog in another account.

Resolution

The method that you use to access AWS Glue Data Catalog resources in another account depends on two things: the AWS service that you connect with, and whether you use AWS Lake Formation to control access to the Data Catalog.

If you aren't sure whether you're using Lake Formation and you have a table that you want to share, then run the following AWS Command Line Interface (AWS CLI) command:

aws glue get-table --database-name DOC-EXAMPLE-DB --name DOC-EXAMPLE-TABLE --query 'Table.IsRegisteredWithLakeFormation'

Be sure to replace the following values in the above command:

  • DOC-EXAMPLE-DB with the name of the database
  • DOC-EXAMPLE-TABLE with the name of the table

If the command returns true, then you're using Lake Formation.
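
The same check can be sketched in Python. This is a minimal sketch, not an official utility: the function name is hypothetical, and it assumes that you pass in a boto3 AWS Glue client (or any object with a compatible get_table method).

```python
def is_registered_with_lake_formation(glue_client, database_name, table_name):
    """Return True if the table reports IsRegisteredWithLakeFormation."""
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    # Treat a missing field the same as False.
    return bool(response["Table"].get("IsRegisteredWithLakeFormation", False))


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   glue = boto3.client("glue")
#   print(is_registered_with_lake_formation(glue, "DOC-EXAMPLE-DB", "DOC-EXAMPLE-TABLE"))
```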

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

The Data Catalog can be in a hybrid state, where some tables use Lake Formation permissions and others use AWS Glue permissions. To upgrade your Data Catalog to Lake Formation, see Upgrading AWS Glue data permissions to the AWS Lake Formation model.

Note: This article covers the solution options for cross-account access within a single AWS Region. Accessing resources in a different Region is beyond the scope of this article. If you want to replicate the Data Catalog from your account to an account in a different AWS Region, see AWS Glue Data Catalog Replication utility.

Accessing the Data Catalog without Lake Formation

If you aren't using Lake Formation, then do the following to grant account A resource-level permissions on the AWS Glue Data Catalog in account B.

Note: Account A is the account that runs the job or query, and account B is the account where the AWS Glue Data Catalog resources are located.

Accessing resources with an AWS Glue extract, transform, and load (ETL) job

1.    In account B, attach a resource policy similar to the following to the Data Catalog so that account A can access the databases and tables in account B. To attach the policy in the AWS Glue console, choose Settings in the navigation pane.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1111222233334444:root"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/DOC-EXAMPLE-DB",
        "arn:aws:glue:us-east-1:5555666677778888:table/DOC-EXAMPLE-DB/*"
      ]
    }
  ]
}

Be sure to replace the following values in the above policy:

  • 1111222233334444 with the account ID for account A
  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice
  • DOC-EXAMPLE-DB with the name of the database

You can also limit access to a specific role in account A, such as the role that runs the job, by using the role's ARN as the principal in the policy. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1111222233334444:role/service-role/AWSGlueServiceRole-Glue-Test"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/DOC-EXAMPLE-DB",
        "arn:aws:glue:us-east-1:5555666677778888:table/DOC-EXAMPLE-DB/*"
      ]
    }
  ]
}

Be sure to replace the following values in the above policy:

  • 1111222233334444 with the account ID for account A
  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice
  • DOC-EXAMPLE-DB with the name of the database
  • arn:aws:iam::1111222233334444:role/service-role/AWSGlueServiceRole-Glue-Test with the ARN of the role that's used to run the ETL job
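
To avoid editing the JSON by hand, the two policy variants above can be generated from parameters. The following is a minimal sketch: the function name and its arguments are hypothetical, and the output mirrors the example policies rather than any official template.

```python
import json


def build_catalog_resource_policy(consumer_account_id, owner_account_id, region,
                                  database_name, principal_arn=None):
    """Build a Data Catalog resource policy shaped like the examples above.

    If principal_arn is omitted, the whole consumer account (its root
    principal) is granted access; otherwise only the given role or user is.
    """
    if principal_arn is None:
        principal_arn = f"arn:aws:iam::{consumer_account_id}:root"
    arn_prefix = f"arn:aws:glue:{region}:{owner_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "glue:*",
                "Resource": [
                    f"{arn_prefix}:catalog",
                    f"{arn_prefix}:database/{database_name}",
                    f"{arn_prefix}:table/{database_name}/*",
                ],
            }
        ],
    }


policy = build_catalog_resource_policy(
    "1111222233334444", "5555666677778888", "us-east-1", "DOC-EXAMPLE-DB"
)
print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the Settings page of the AWS Glue console. If you prefer an API call, the boto3 AWS Glue client provides put_resource_policy, which takes the JSON string in its PolicyInJson parameter.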

2.    The IAM role that runs the ETL job in account A also needs access to the databases and tables in account B. In account A, attach an IAM policy similar to the following to the ETL job's IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetConnection",
        "glue:GetTable",
        "glue:GetPartition"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/default",
        "arn:aws:glue:us-east-1:5555666677778888:database/DOC-EXAMPLE-DB",
        "arn:aws:glue:us-east-1:5555666677778888:table/DOC-EXAMPLE-DB/*"
      ]
    }
  ]
}

Note: If you are using Athena with the AWS Glue Data Catalog, then be sure to include the default database in the policy for the GetDatabase and CreateDatabase actions to succeed. For more information, see Fine-grained access to databases and tables in the AWS Glue Data Catalog.

Be sure to replace the following values in the above policy:

  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice
  • DOC-EXAMPLE-DB with the name of the database
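
One way to attach this policy programmatically is as an inline policy on the job's role. The following is a minimal sketch under stated assumptions: the function name and the default policy name are hypothetical, and iam_client is expected to be a boto3 IAM client (or a compatible object) created with credentials for account A.

```python
import json


def attach_catalog_read_policy(iam_client, role_name, owner_account_id, region,
                               database_name,
                               policy_name="CrossAccountGlueCatalogRead"):
    """Attach an inline policy that allows reads from another account's Data Catalog."""
    arn_prefix = f"arn:aws:glue:{region}:{owner_account_id}"
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetConnection",
                    "glue:GetTable",
                    "glue:GetPartition",
                ],
                "Resource": [
                    f"{arn_prefix}:catalog",
                    f"{arn_prefix}:database/default",
                    f"{arn_prefix}:database/{database_name}",
                    f"{arn_prefix}:table/{database_name}/*",
                ],
            }
        ],
    }
    # put_role_policy expects the policy document as a JSON string.
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(document),
    )
    return document


# Example usage (assumes boto3 and credentials for account A):
#   import boto3
#   attach_catalog_read_policy(boto3.client("iam"), "AWSGlueServiceRole-Glue-Test",
#                              "5555666677778888", "us-east-1", "DOC-EXAMPLE-DB")
```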

3.    After you grant the required permissions to account A, create an ETL job with the following script to test whether account A can access the Data Catalog in account B.

"""Create a Spark session that uses a cross-account AWS Glue Data Catalog."""
from pyspark.sql import SparkSession

# hive.metastore.glue.catalogid points the session at the Data Catalog in account B.
spark_session = (
    SparkSession.builder.appName("Spark Glue Example")
    .config(
        "hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .config("hive.metastore.glue.catalogid", "5555666677778888")
    .enableHiveSupport()
    .getOrCreate()
)

# Backticks are needed when a database or table name contains hyphens.
table_df = spark_session.sql(
    "SELECT * FROM `DOC-EXAMPLE-DB`.`DOC-EXAMPLE-TABLE` LIMIT 10"
)

table_df.show()

Be sure to replace the following values in the script:

  • 5555666677778888 with the account ID for account B
  • DOC-EXAMPLE-DB with the name of the database
  • DOC-EXAMPLE-TABLE with the name of the table

Accessing resources with EMR

To access the Data Catalog in a different account with EMR, see How can I use Hive and Spark on Amazon EMR to query an AWS Glue Data Catalog that's in a different AWS account?

Accessing resources with Athena

To access the Data Catalog in a different account with Athena, see Registering an AWS Glue Data Catalog from another account.

Accessing the Data Catalog with Lake Formation

If you are using Lake Formation, then you can use either of the following methods to grant cross-account access to the Data Catalog:

  • Named resource
  • Tag-based access control (TBAC)

Important: To prevent new tables in the Data Catalog from having a default Super permission to IAMAllowedPrincipals, do the following:

  1. Open the AWS Lake Formation console.
  2. In the navigation pane, choose Data Catalog, and then choose Settings.
  3. Clear both Use only IAM access control for new databases and Use only IAM access control for new tables in new databases.
  4. Choose Save.

For more information, see Changing the default security settings for your data lake.

Granting cross-account permissions using the named resource method

The named resource method uses AWS Resource Access Manager (AWS RAM) to share Data Catalog resources with other accounts.

To grant Lake Formation permissions to account A for the Data Catalog resources in account B, do the following:

1.    Attach a resource policy similar to the following in the Data Catalog for account B:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:ShareResource"
      ],
      "Principal": {
        "Service": [
          "ram.amazonaws.com"
        ]
      },
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:table/*/*",
        "arn:aws:glue:us-east-1:5555666677778888:database/*",
        "arn:aws:glue:us-east-1:5555666677778888:catalog"
      ]
    }
  ]
}

Be sure to replace 5555666677778888 in the policy with the account ID for account B, and us-east-1 with the Region of your choice.

2.    Use the named resource method to grant Lake Formation permissions on the Data Catalog databases and tables in account B. For more information, see Granting Data Catalog permissions using the named resource method.
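
As an illustration, a named resource grant on a single table might look like the following in Python. This is a sketch, not a complete procedure: the function name is hypothetical, SELECT is only an example permission, and lakeformation_client is assumed to be a boto3 Lake Formation client created with credentials for account B.

```python
def grant_table_to_account(lakeformation_client, owner_account_id,
                           consumer_account_id, database_name, table_name,
                           permissions=("SELECT",)):
    """Grant Lake Formation permissions on one named table to another AWS account."""
    return lakeformation_client.grant_permissions(
        # For a cross-account grant, the principal is the consumer's account ID.
        Principal={"DataLakePrincipalIdentifier": consumer_account_id},
        Resource={
            "Table": {
                "CatalogId": owner_account_id,
                "DatabaseName": database_name,
                "Name": table_name,
            }
        },
        Permissions=list(permissions),
    )


# Example usage (assumes boto3 and credentials for account B):
#   import boto3
#   grant_table_to_account(boto3.client("lakeformation"), "5555666677778888",
#                          "1111222233334444", "DOC-EXAMPLE-DB", "DOC-EXAMPLE-TABLE")
```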

Granting cross-account permissions using TBAC

With TBAC, you can define policy tags and assign these tags to AWS Glue databases, tables, and columns. These tags can then be used to apply fine-grained access to these Data Catalog resources. For more information, see Tag-based access control in Lake Formation.

To grant Lake Formation permissions using TBAC, do the following:

1.    Attach a resource policy similar to the following in the Data Catalog for account B:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1111222233334444:root"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/*",
        "arn:aws:glue:us-east-1:5555666677778888:table/*"
      ],
      "Condition": {
        "Bool": {
          "glue:EvaluatedByLakeFormationTags": true
        }
      }
    }
  ]
}

Be sure to replace the following values in the policy:

  • 1111222233334444 with the account ID for account A
  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice

2.    Create the policy tags.

3.    Use the TBAC method to grant Lake Formation permissions on Data Catalog resources. For more information, see Granting Data Catalog permissions using the TBAC method.
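
Steps 2 and 3 can be sketched in Python as follows. Treat this as an illustration under stated assumptions: the function names, the tag key, and the tag values are hypothetical, SELECT is only an example permission, and lakeformation_client is assumed to be a boto3 Lake Formation client for account B. The boto3 API refers to policy tags as LF-tags. Assigning the tag to databases, tables, or columns is a separate step that isn't shown here.

```python
def create_policy_tag(lakeformation_client, owner_account_id, tag_key, tag_values):
    """Create a policy tag (LF-tag) in the owner account's Data Catalog."""
    return lakeformation_client.create_lf_tag(
        CatalogId=owner_account_id,
        TagKey=tag_key,
        TagValues=list(tag_values),
    )


def grant_tables_by_tag(lakeformation_client, owner_account_id,
                        consumer_account_id, tag_key, tag_values,
                        permissions=("SELECT",)):
    """Grant permissions on every table that matches the tag expression."""
    return lakeformation_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": consumer_account_id},
        Resource={
            "LFTagPolicy": {
                "CatalogId": owner_account_id,
                "ResourceType": "TABLE",
                "Expression": [
                    {"TagKey": tag_key, "TagValues": list(tag_values)}
                ],
            }
        },
        Permissions=list(permissions),
    )


# Example usage (assumes boto3 and credentials for account B; the tag is hypothetical):
#   import boto3
#   lf = boto3.client("lakeformation")
#   create_policy_tag(lf, "5555666677778888", "team", ["analytics"])
#   grant_tables_by_tag(lf, "5555666677778888", "1111222233334444",
#                       "team", ["analytics"])
```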