How can I provide cross-account access to resources in the AWS Glue Data Catalog?

I want to use services, such as Amazon EMR, Amazon Athena, and AWS Glue, with the AWS Glue Data Catalog in another account.

Resolution

The way that you access cross-account resources in the AWS Glue Data Catalog depends on the AWS service that you use to connect. That access method also depends on whether you use AWS Lake Formation to control access to the Data Catalog.

If you aren't sure whether you're using Lake Formation and you want to share a table, run the following AWS Command Line Interface (AWS CLI) command:

aws glue get-table --database-name DOC-EXAMPLE-DB --name DOC-EXAMPLE-TABLE --query 'Table.IsRegisteredWithLakeFormation'

Be sure to replace the following values in this command:

  • DOC-EXAMPLE-DB with the name of the database
  • DOC-EXAMPLE-TABLE with the name of the table

If the command returns true, then you're using Lake Formation.

Note: If you receive errors when running AWS CLI commands, make sure that you're using the most recent version of the AWS CLI.
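The same check can be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (Boto3) is installed and credentials are configured. The function name is_registered_with_lake_formation is hypothetical; it wraps the AWS Glue GetTable call and reads the same IsRegisteredWithLakeFormation field that the AWS CLI command queries.

```python
def is_registered_with_lake_formation(glue_client, database_name, table_name, catalog_id=None):
    """Return True when the table reports IsRegisteredWithLakeFormation.

    glue_client is expected to behave like boto3.client("glue").
    catalog_id is optional; pass the owning account ID to check a table
    in another account's Data Catalog.
    """
    params = {"DatabaseName": database_name, "Name": table_name}
    if catalog_id:
        params["CatalogId"] = catalog_id
    response = glue_client.get_table(**params)
    # A missing field is treated as "not registered".
    return bool(response["Table"].get("IsRegisteredWithLakeFormation", False))


# Example usage (assumes boto3 and valid credentials):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   print(is_registered_with_lake_formation(glue, "DOC-EXAMPLE-DB", "DOC-EXAMPLE-TABLE"))
```

Passing the client in as an argument keeps the helper easy to test with a stub and lets you reuse a client that's already configured for the correct Region.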

The Data Catalog could be in a hybrid environment with some tables using Lake Formation and some others using AWS Glue permissions. To upgrade your Data Catalog to Lake Formation, see Upgrading AWS Glue data permissions to the AWS Lake Formation model.

Note: This article covers the solution options for cross-account access within a single AWS Region. Accessing resources in a different Region is beyond the scope of this article. To replicate the Data Catalog from your account to an account in a different AWS Region, see AWS Glue Data Catalog replication utility on GitHub.

Accessing the Data Catalog without Lake Formation

If you aren't using Lake Formation, then do the following to grant account A resource-level permissions on the AWS Glue Data Catalog in account B.

Note: Account A is the extract, transform, and load (ETL) account, and account B is the account where the AWS Glue Data Catalog resources are located.

Accessing resources with an AWS Glue ETL job

1.    Attach a resource policy similar to the following in account B. This allows account A access to the databases and tables from account B. You can attach the policy using the AWS Glue console by choosing Catalog settings in the navigation pane.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1111222233334444:root"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/DOC-EXAMPLE-DB",
        "arn:aws:glue:us-east-1:5555666677778888:table/DOC-EXAMPLE-DB/*"
      ]
    }
  ]
}

Be sure to replace the following values in this policy:

  • 1111222233334444 with the account ID for account A
  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice
  • DOC-EXAMPLE-DB with the name of the database

You can also limit access to a specific role in account A that's used to run the job. Do this by including the ARN of the role in the policy. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1111222233334444:role/service-role/AWSGlueServiceRole-Glue-Test"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/DOC-EXAMPLE-DB",
        "arn:aws:glue:us-east-1:5555666677778888:table/DOC-EXAMPLE-DB/*"
      ]
    }
  ]
}

Be sure to replace the following values in this policy:

  • 1111222233334444 with the account ID for account A
  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice
  • DOC-EXAMPLE-DB with the name of the database
  • arn:aws:iam::1111222233334444:role/service-role/AWSGlueServiceRole-Glue-Test with the ARN of the role that's used to run the ETL job
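If you prefer to attach the resource policy programmatically instead of through the console, the following sketch shows one possible approach with Boto3. The helper names build_catalog_resource_policy and attach_catalog_resource_policy are hypothetical. The sketch assumes that it runs with credentials for account B and that the Glue PutResourcePolicy call sets the catalog's resource policy, so merge any existing statements into the document before you call it.

```python
import json


def build_catalog_resource_policy(principal_arn, catalog_account_id, region, database_name):
    """Build a resource policy like the examples above.

    principal_arn can be the account root of account A or the ARN of a
    specific role in account A.
    """
    prefix = f"arn:aws:glue:{region}:{catalog_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "glue:*",
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database_name}",
                    f"{prefix}:table/{database_name}/*",
                ],
            }
        ],
    }


def attach_catalog_resource_policy(glue_client, policy):
    """Attach the policy to the Data Catalog of the calling account (account B).

    glue_client is expected to behave like boto3.client("glue").
    """
    return glue_client.put_resource_policy(PolicyInJson=json.dumps(policy))


# Example usage (assumes boto3 and account B credentials):
#   import boto3
#   policy = build_catalog_resource_policy(
#       "arn:aws:iam::1111222233334444:root", "5555666677778888", "us-east-1", "DOC-EXAMPLE-DB")
#   attach_catalog_resource_policy(boto3.client("glue", region_name="us-east-1"), policy)
```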

2.    The AWS Identity and Access Management (IAM) role that runs the ETL job in account A needs access to the databases and tables in account B. In account A, attach an IAM policy similar to the following to the job's IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetConnection",
        "glue:GetTable",
        "glue:GetPartition"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/default",
        "arn:aws:glue:us-east-1:5555666677778888:database/DOC-EXAMPLE-DB",
        "arn:aws:glue:us-east-1:5555666677778888:table/DOC-EXAMPLE-DB/*"
      ]
    }
  ]
}

Note: If you're using Athena with the AWS Glue Data Catalog, then include the default database in the policy. This makes sure that the GetDatabase and CreateDatabase actions succeed. For more information, see Access policy to the Default database and catalog per AWS Region.

Be sure to replace the following values in this policy:

  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice
  • DOC-EXAMPLE-DB with the name of the database
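The identity policy for the job role can also be generated and attached from a script. The following is a rough sketch, assuming Boto3 and credentials for account A. The helper names and the default policy name CrossAccountGlueCatalogRead are hypothetical; the IAM PutRolePolicy call adds the document as an inline policy on the role.

```python
import json

# Read-only Data Catalog actions from the example policy above.
READ_ACTIONS = ["glue:GetDatabase", "glue:GetConnection", "glue:GetTable", "glue:GetPartition"]


def build_catalog_read_policy(catalog_account_id, region, database_name, include_default_database=True):
    """Build the identity policy for the ETL job role in account A."""
    prefix = f"arn:aws:glue:{region}:{catalog_account_id}"
    resources = [f"{prefix}:catalog"]
    if include_default_database:
        # Include the default database when Athena also uses this role.
        resources.append(f"{prefix}:database/default")
    resources.append(f"{prefix}:database/{database_name}")
    resources.append(f"{prefix}:table/{database_name}/*")
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": READ_ACTIONS, "Resource": resources}],
    }


def attach_to_job_role(iam_client, role_name, policy, policy_name="CrossAccountGlueCatalogRead"):
    """Attach the policy inline to the role. iam_client behaves like boto3.client("iam")."""
    return iam_client.put_role_policy(
        RoleName=role_name, PolicyName=policy_name, PolicyDocument=json.dumps(policy)
    )
```

Note that role_name is the role's name, not its full ARN.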

3.    After you grant the required permissions to account A, test whether account A can access the Data Catalog in account B. To test this, create an ETL job with the following script:

"""Create Spark Session with cross-account AWS Glue Data Catalog"""
from pyspark.sql import SparkSession

# Point the Hive metastore client at the Data Catalog in account B.
spark_session = (
    SparkSession.builder.appName("Spark Glue Example")
    .config(
        "hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .config("hive.metastore.glue.catalogid", "5555666677778888")
    .enableHiveSupport()
    .getOrCreate()
)

# Backticks let Spark SQL parse database and table names that contain hyphens.
table_df = spark_session.sql(
    "SELECT * FROM `DOC-EXAMPLE-DB`.`DOC-EXAMPLE-TABLE` LIMIT 10"
)

table_df.show()

Be sure to replace the following values in the script:

  • 5555666677778888 with the account ID for account B
  • DOC-EXAMPLE-DB with the name of the database
  • DOC-EXAMPLE-TABLE with the name of the table
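Before you run a full ETL job, a lighter check is to read the table's metadata from account A with Boto3. The following is a minimal sketch, assuming credentials for the job's role in account A. The function name describe_cross_account_table is hypothetical. The check confirms access to the Data Catalog only; it doesn't confirm access to the underlying data in Amazon S3.

```python
def describe_cross_account_table(glue_client, catalog_id, database_name, table_name):
    """Fetch table metadata from another account's Data Catalog.

    glue_client is expected to behave like boto3.client("glue").
    catalog_id is the account ID of account B. Any permission error
    raised by the call is left to propagate to the caller.
    """
    response = glue_client.get_table(
        CatalogId=catalog_id, DatabaseName=database_name, Name=table_name
    )
    table = response["Table"]
    storage = table.get("StorageDescriptor", {})
    return {
        "name": table["Name"],
        "location": storage.get("Location"),
        "columns": [column["Name"] for column in storage.get("Columns", [])],
    }


# Example usage (assumes boto3 and account A credentials):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   print(describe_cross_account_table(
#       glue, "5555666677778888", "DOC-EXAMPLE-DB", "DOC-EXAMPLE-TABLE"))
```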

Accessing resources with Amazon EMR

To access the Data Catalog in a different account with Amazon EMR, see How can I use Hive and Spark on Amazon EMR to query an AWS Glue Data Catalog that's in a different AWS account?

Accessing resources with Athena

To access the Data Catalog in a different account with Athena, see Cross-account access to AWS Glue data catalogs.

Accessing the Data Catalog with Lake Formation

If you're using Lake Formation, then you can use either of the following methods to grant cross-account access to the Data Catalog:

  • Named resource
  • Tag-based access control (TBAC)

Important: To prevent new tables in the Data Catalog from having a default Super permission to IAMAllowedPrincipals, do the following:

1.    Open the AWS Lake Formation console.

2.    In the navigation pane, choose Data Catalog, and then choose Settings.

3.    Clear both Use only IAM access control for new databases and Use only IAM access control for new tables in new databases.

4.    Choose Save.

For more information, see Changing the default security settings for your data lake.
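The console steps above can also be approximated with the Lake Formation API. The following is a sketch, assuming Boto3 and credentials for a data lake administrator in account B. The helper names are hypothetical. The sketch reads the current data lake settings, empties the two default permission lists, and writes the settings back, which is the API counterpart of clearing both check boxes.

```python
import copy


def without_default_iam_permissions(settings):
    """Return a copy of the data lake settings with both default permission lists emptied."""
    updated = copy.deepcopy(settings)
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def disable_iam_only_defaults(lakeformation_client):
    """Apply the change. lakeformation_client behaves like boto3.client("lakeformation")."""
    current = lakeformation_client.get_data_lake_settings()["DataLakeSettings"]
    updated = without_default_iam_permissions(current)
    lakeformation_client.put_data_lake_settings(DataLakeSettings=updated)
    return updated
```

Starting from the current settings keeps other values, such as the list of data lake administrators, unchanged.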

Granting cross-account permissions using the named resource method

The named resource method uses AWS Resource Access Manager (AWS RAM) to share Data Catalog resources with other accounts.

To grant Lake Formation permissions to account A for the Data Catalog resources in account B, do the following:

1.    Attach a resource policy similar to the following in the Data Catalog for account B:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:ShareResource"
      ],
      "Principal": {
        "Service": [
          "ram.amazonaws.com"
        ]
      },
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:table/*/*",
        "arn:aws:glue:us-east-1:5555666677778888:database/*",
        "arn:aws:glue:us-east-1:5555666677778888:catalog"
      ]
    }
  ]
}

Be sure to replace 5555666677778888 in the policy with the account ID for account B, and us-east-1 with the Region of your choice.

2.    Use the named resource method to grant Lake Formation permissions to the Data Catalog databases and tables in account B. For more information, see Granting Data Catalog permissions using the named resource method.
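As an illustration of step 2, the following sketch builds the arguments for the Lake Formation GrantPermissions call for a single table. It assumes Boto3 and credentials for a data lake administrator in account B. The helper name build_named_resource_grant and the default SELECT permission are assumptions made for this example; choose the permissions that your use case needs.

```python
def build_named_resource_grant(consumer_account_id, catalog_account_id,
                               database_name, table_name, permissions=("SELECT",)):
    """Build keyword arguments for lakeformation.grant_permissions.

    consumer_account_id is account A (the grantee).
    catalog_account_id is account B (the owner of the table).
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {
            "Table": {
                "CatalogId": catalog_account_id,
                "DatabaseName": database_name,
                "Name": table_name,
            }
        },
        "Permissions": list(permissions),
    }


# Example usage (assumes boto3 and account B credentials):
#   import boto3
#   lakeformation = boto3.client("lakeformation", region_name="us-east-1")
#   lakeformation.grant_permissions(**build_named_resource_grant(
#       "1111222233334444", "5555666677778888", "DOC-EXAMPLE-DB", "DOC-EXAMPLE-TABLE"))
```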

Granting cross-account permissions using TBAC

With TBAC, you can define policy tags and assign these tags to AWS Glue databases, tables, and columns. These tags can then be used to apply fine-grained access to these Data Catalog resources. For more information, see Data sharing using tag-based access control.

To grant Lake Formation permissions using TBAC, do the following:

1.    Attach a resource policy similar to the following in the Data Catalog for account B:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1111222233334444:root"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:us-east-1:5555666677778888:catalog",
        "arn:aws:glue:us-east-1:5555666677778888:database/*",
        "arn:aws:glue:us-east-1:5555666677778888:table/*"
      ],
      "Condition": {
        "Bool": {
          "glue:EvaluatedByLakeFormationTags": true
        }
      }
    }
  ]
}

Be sure to replace the following values in the policy:

  • 1111222233334444 with the account ID for account A
  • 5555666677778888 with the account ID for account B
  • us-east-1 with the Region of your choice

2.    Create the policy tags.

3.    Use the TBAC method to grant Lake Formation permissions on Data Catalog resources. For more information, see Granting Data Catalog permissions using the LF-TBAC method.
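Steps 2 and 3 can be sketched with the Lake Formation API as follows. The sketch assumes Boto3 and credentials for a data lake administrator in account B. The function name share_by_lf_tag, the single tag key and value, and the SELECT and DESCRIBE permissions are all assumptions made for illustration. The function creates a policy tag, assigns it to a database, and then grants account A permissions on tables that match the tag expression.

```python
def share_by_lf_tag(lf_client, catalog_account_id, consumer_account_id,
                    database_name, tag_key, tag_value):
    """Create a tag, assign it to a database, and grant account A access by tag.

    lf_client is expected to behave like boto3.client("lakeformation").
    """
    tag = {"TagKey": tag_key, "TagValues": [tag_value]}

    # Step 2: create the policy tag in account B's catalog.
    lf_client.create_lf_tag(CatalogId=catalog_account_id, **tag)

    # Assign the tag to the database so that it can be matched by tag expressions.
    lf_client.add_lf_tags_to_resource(
        CatalogId=catalog_account_id,
        Resource={"Database": {"CatalogId": catalog_account_id, "Name": database_name}},
        LFTags=[tag],
    )

    # Step 3: grant permissions on tables that match the tag expression.
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": consumer_account_id},
        Resource={
            "LFTagPolicy": {
                "CatalogId": catalog_account_id,
                "ResourceType": "TABLE",
                "Expression": [tag],
            }
        },
        Permissions=["SELECT", "DESCRIBE"],
    )
```

A grant with ResourceType set to DATABASE might also be needed if account A must describe the database itself.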


Related information

Granting cross-account access

Specifying AWS Glue resource ARNs

About upgrading to the Lake Formation permissions model

Migration between the Hive metastore and the AWS Glue Data Catalog

AWS Glue resource policies for access control

AWS OFFICIAL
Updated a year ago