AWS Big Data Blog

Integrate custom applications with AWS Lake Formation – Part 1

AWS Lake Formation makes it straightforward to centrally govern, secure, and globally share data for analytics and machine learning (ML).

With Lake Formation, you can centralize data security and governance using the AWS Glue Data Catalog, letting you manage metadata and data permissions in one place with familiar database-style features. It also delivers fine-grained data access control, so you can make sure users have access to the right data down to the row and column level.

Lake Formation also makes it straightforward to share data internally across your organization and externally, which lets you create a data mesh or meet other data sharing needs with no data movement.

Additionally, because Lake Formation tracks data interactions by role and user, it provides comprehensive data access auditing to verify the right data was accessed by the right users at the right time.

In this two-part series, we show how to integrate custom applications or data processing engines with Lake Formation using the third-party services integration feature.

In this post, we dive deep into the required Lake Formation and AWS Glue APIs. We walk through the steps to enforce Lake Formation policies within custom data applications. As an example, we present a sample Lake Formation integrated application implemented using AWS Lambda.

The second part of the series introduces a sample web application built with AWS Amplify. This web application showcases how to use the custom data processing engine implemented in the first post.

By the end of this series, you will have a comprehensive understanding of how to extend the capabilities of Lake Formation by building and integrating your own custom data processing components.

Integrate an external application

The process of integrating a third-party application with Lake Formation is described in detail in How Lake Formation application integration works.

In this section, we dive deeper into the steps required to establish trust between Lake Formation and an external application, the API operations that are involved, and the AWS Identity and Access Management (IAM) permissions that must be set up to enable the integration.

Lake Formation application integration external data filtering

In Lake Formation, it’s possible to control which third-party engines or applications are allowed to read and filter data in Amazon Simple Storage Service (Amazon S3) locations registered with Lake Formation.

To do so, you can navigate to the Application integration settings page on the Lake Formation console and enable Allow external engines to filter data in Amazon S3 locations registered with Lake Formation, specifying the AWS account IDs from where third-party engines are allowed to access locations registered with Lake Formation. In addition, you have to specify the allowed session tag values to identify trusted requests. We discuss in later sections how these tags are used.

LakeFormation Application integration

Lake Formation application integration involved AWS APIs

The following is a list of the main AWS APIs needed to integrate an application with Lake Formation:

  • sts:AssumeRole – Returns a set of temporary security credentials that you can use to access AWS resources.
  • glue:GetUnfilteredTableMetadata – Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
  • glue:GetUnfilteredPartitionsMetadata – Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
  • lakeformation:GetTemporaryGlueTableCredentials – Allows a caller in a secure environment to assume a role with permission to access Amazon S3. To vend such credentials, Lake Formation assumes the role associated with a registered location, for example an S3 bucket, with a scope down policy that restricts the access to a single prefix.
  • lakeformation:GetTemporaryGluePartitionCredentials – This API is identical to GetTemporaryTableCredentials except that it’s used when the target Data Catalog resource is of type Partition. Lake Formation restricts the permission of the vended credentials with the same scope down policy that restricts access to a single Amazon S3 prefix.

Later in this post, we present a sample architecture illustrating how you can use these APIs.

External application and IAM roles to access data

For an external application to access resources in an Lake Formation environment, it needs to run under an IAM principal (user or role) with the appropriate credentials. Let’s consider a scenario where the external application runs under the IAM role MyApplicationRole that is part of the AWS account 123456789012.

In Lake Formation, you have granted access to various tables and databases to two specific IAM roles:

  • AccessRole1
  • AccessRole2

To enable MyApplicationRole to access the resources that have been granted to AccessRole1 and AccessRole2, you need to configure the trust relationships for these access roles. Specifically, you need to configure the following:

  • Allow MyApplicationRole to assume each of the access roles (AccessRole1 and AccessRole2) using the sts:AssumeRole
  • Allow MyApplicationRole to tag the assumed session with a specific tag, which is required by Lake Formation. The tag key should be LakeFormationAuthorizedCaller, and the value should match one of the session tag values specified in the Application integration settings page on the Lake Formation console (for example, “application1“).

The following code is an example of the trust relationships configuration for an access role (AccessRole1 or AccessRole2):

[
    {
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/MyApplicationRole"
        },
        "Action": "sts:AssumeRole"
    },
    {
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/MyApplicationRole"
        },
        "Action": "sts:TagSession",
        "Condition": {
            "StringEquals": {
                "aws:RequestTag/LakeFormationAuthorizedCaller": "application1"
            }
        }
    }
]

Additionally, the data access IAM roles (AccessRole1 and AccessRole2) must have the following IAM permissions assigned in order to read Lake Formation protected tables:

{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "LakeFormationManagedAccess",
        "Effect": "Allow",
        "Action": [
            "lakeformation:GetDataAccess",
            "glue:GetTable",
            "glue:GetTables",
            "glue:GetDatabase",
            "glue:GetDatabases",
            "glue:GetPartition",
            "glue:GetPartitions"
        ],
        "Resource": "*"
    }
}

Solution overview

For our solution, Lambda serves as our external trusted engine and application integrated with Lake Formation. This example is provided in order to understand and see in action the access flow and the Lake Formation API responses. Because it’s based on a single Lambda function, it’s not meant to be used in production settings or with high volumes of data.

Moreover, the Lambda based engine has been configured to support a limited set of data files (CSV, Parquet, and JSON), a limited set of table configurations (no nested data), and a limited set of table operations (SELECT only). Due to these limitations, the application should not be used for arbitrary tests.

In this post, we provide instructions on how to deploy a sample API application integrated with Lake Formation that implements the solution architecture. The core of the API is implemented with a Python Lambda function. We also show how to test the function with Lambda tests. In the second post in this series, we provide instructions on how to deploy a web frontend application that integrates with this Lambda function.

Access flow for unpartitioned tables

The following diagram summarizes the access flow when accessing unpartitioned tables.

Solution Architecture - Unpartitioned tables

The workflow consists of the following steps:

  1. User A (authenticated with Amazon Cognito or other equivalent systems) sends a request to the application API endpoint, requesting access to a specific table inside a specific database.
  2. The API endpoint, created with AWS AppSync, handles the request, invoking a Lambda function.
  3. The function checks which IAM data access role the user is mapped to. For simplicity, the example uses a static hardcoded mapping (mappings={ "user1": "lf-app-access-role-1", "user2": "lf-app-access-role-2"}).
  4. The function invokes the sts:AssumeRole API to assume the user-related IAM data access role (lf-app-access-role-1AccessRole1). The AssumeRole operation is performed with the tag LakeFormationAuthorizedCaller, having as its value one of the session tag values specified when configuring the application integration settings in Lake Formation (for example, {'Key': 'LakeFormationAuthorizedCaller','Value': 'application1'}). The API returns a set of temporary credentials, which we refer to as StsCredentials1.
  5. Using StsCredentials1, the function invokes the glue:GetUnfilteredTableMetadata API, passing the requested database and table name. The API returns information like table location, a list of authorized columns, and data filters, if defined.
  6. Using StsCredentials1, the function invokes the lakeformation:GetTemporaryGlueTableCredentials API, passing the requested database and table name, the type of requested access (SELECT), and CELL_FILTER_PERMISSION as the supported permission types (because the Lambda function implements logic to apply row-level filters). The API returns a set of temporary Amazon S3 credentials, which we refer to as S3Credentials1.
  7. Using S3Credentials1, the function lists the S3 files stored in the table location S3 prefix and downloads them.
  8. The retrieved Amazon S3 data is filtered to remove those columns and rows that the user is not allowed access to (authorized columns and row filters were retrieved in Step 5) and authorized data is returned to the user.

Access flow for partitioned tables

The following diagram summarizes the access flow when accessing partitioned tables.

Solution Architecture - Partitioned tables

The steps involved are almost identical to the ones presented for partitioned tables, with the following changes:

  • After invoking the glue:GetUnfilteredTableMetadata API (Step 5) and identifying the table as partitioned, the Lambda function invokes the glue:GetUnfilteredPartitionsMetadata API using StsCredentials1 (Step 6). The API returns, in addition to other information, the list of partition values and locations.
  • For each partition, the function performs the following actions:
    • Invokes the lakeformation:GetTemporaryGluePartitionCredentials API (Step 7), passing the requested database and table name, the partition value, the type of requested access (SELECT), and CELL_FILTER_PERMISSION as the supported permissions type (because the Lambda function implements logic to apply row-level filters). The API returns a set of temporary Amazon S3 credentials, which we refer to as S3CredentialsPartitionX.
    • Uses S3CredentialsPartitionX to list the partition location S3 files and download them (Step 8).
  • The function appends the retrieved data.
  • Before the Lambda function returns the results to the user (Step 9), the retrieved Amazon S3 data is filtered to remove those columns and rows that the user is not allowed access to (authorized columns and row filters were retrieved in Step 5).

Prerequisites

The following prerequisites are needed to deploy and test the solution:

  • Lake Formation should be enabled in the AWS Region where the sample application will be deployed
  • The steps must be run with an IAM principal with sufficient permissions to create the needed resources, including Lake Formation databases and tables

Deploy solution resources with AWS CloudFormation

We create the solution resources using AWS CloudFormation. The provided CloudFormation template creates the following resources:

  • One S3 bucket to store table data (lf-app-data-<account-id>)
  • Two IAM roles, which will be mapped to client users and their associated Lake Formation permission policies (lf-app-access-role-1 and lf-app-access-role-2)
  • Two IAM roles used for the two created Lambda functions (lf-app-lambda-datalake-population-role and lf-app-lambda-role)
  • One AWS Glue database (lf-app-entities) with two AWS Glue tables, one unpartitioned (users_tbl) and one partitioned (users_partitioned_tbl)
  • One Lambda function used to populate the data lake data (lf-app-lambda-datalake-population)
  • One Lambda function used for the Lake Formation integrated application (lf-app-lambda-engine)
  • One IAM role used by Lake Formation to access the table data and perform credentials vending (lf-app-datalake-location-role)
  • One Lake Formation data lake location (s3://lf-app-data-<account-id>/datasets) associated with the IAM role created for credentials vending (lf-app-datalake-location-role)
  • One Lake Formation data filter (lf-app-filter-1)
  • One Lake Formation tag (key: sensitive, values: true or false)
  • Tag associations to tag the created unpartitioned AWS Glue table (users_tbl) columns with the created tag

To launch the stack and provision your resources, complete the following steps:

  1. Download the code zip bundle for the Lambda function used for the Lake Formation integrated application (lf-integrated-app.zip).
  2. Download the code zip bundle for the Lambda function used to populate the data lake data (datalake-population-function.zip).
  3. Upload the zip bundles to an existing S3 bucket location (for example, s3://mybucket/myfolder1/myfolder2/lf-integrated-app.zip and s3://mybucket/myfolder1/myfolder2/datalake-population-function.zip)
  4. Choose Launch Stack.

This automatically launches AWS CloudFormation in your AWS account with a template. Make sure that you create the stack in your intended Region.

  1. Choose Next to move to the Specify stack details section
  2. For Parameters, provide the following parameters:
    1. For powertoolsLogLevel, specify how verbose the Lambda function logger should be, from the most verbose to the least verbose (no logs). For this post, we choose DEBUG.
    2. For s3DeploymentBucketName, enter the name of the S3 bucket containing the Lambda functions’ code zip bundles. For this post, we use mybucket.
    3. For s3KeyLambdaDataPopulationCode, enter the Amazon S3 location containing the code zip bundle for the Lambda function used to populate the data lake data (datalake-population-function.zip). For example, myfolder1/myfolder2/datalake-population-function.zip.
    4. For s3KeyLambdaEngineCode, enter the Amazon S3 location containing the code zip bundle for the Lambda function used for the Lake Formation integrated application (lf-integrated-app.zip). For example, myfolder1/myfolder2/lf-integrated-app.zip.
  3. Choose Next.

Cloudformation Create Stack with properties

  1. Add additional AWS tags if required.
  2. Choose Next.
  3. Acknowledge the final requirements.
  4. Choose Create stack.

Enable the Lake Formation application integration

Complete the following steps to enable the Lake Formation application integration:

  1. On the Lake Formation console, choose Application integration settings in the navigation pane.
  2. Enable Allow external engines to filter data in Amazon S3 locations registered with Lake Formation.
  3. For Session tag values, choose application1.
  4. For AWS account IDs, enter the current AWS account ID.
  5. Choose Save.

LakeFormation Application integration

Enforce Lake Formation permissions

The CloudFormation stack created one database named lf-app-entities with two tables named users_tbl and users_partitioned_tbl.

To be sure you’re using Lake Formation permissions, you should confirm that you don’t have any grants set up on those tables for the principal IAMAllowedPrincipals. The IAMAllowedPrincipals group includes any IAM users and roles that are allowed access to your Data Catalog resources by your IAM policies, and it’s used to maintain backward compatibility with AWS Glue.

To confirm Lake Formations permissions are enforced, navigate to the Lake Formation console and choose Data lake permissions in the navigation pane. Filter permissions by Database=lf-app-entities and remove all the permissions given to the principal IAMAllowedPrincipals.

For more details on IAMAllowedPrincipals and backward compatibility with AWS Glue, refer to Changing the default security settings for your data lake.

Check the created Lake Formation resources and permissions

The CloudFormation stack created two IAM roles—lf-app-access-role-1 and lf-app-access-role-2—and assigned them different permissions on the users_tbl (unpartitioned) and users_partitioned_tbl (partitioned) tables. The specific Lake Formation grants are summarized in the following table.

IAM Roles
lf-app-entities (Database)
  users _tbl (Table) _tbl _partitioned_tbl (Table)
lf-app-access-role-1 No access Read access on columns uid, state, and city for all the records. Read access to all columns except for address only on rows with value state=united kingdom.
lf-app-access-role-2 Read access on columns with the tag sensitive = false Read access to all columns and rows.

To better understand the full permissions setup, you should review the CloudFormation created Lake Formation resources and permissions. On the Lake Formation console, complete the following steps:

  1. Review the data filters:
    1. Choose Data filters in the navigation pane.
    2. Inspect the lf-app-filter-1
  2. Review the tags:
    1. Choose LF-Tags and permissions in the navigation pane.
    2. Inspect the sensitive
  3. Review the tag associations:
    1. Choose Tables in the navigation pane.
    2. Choose the users_tbl
    3. Inspect the LF-Tags associated to the different columns in the Schema
  4. Review the Lake Formation permissions:
    1. Choose Data lake permissions in the navigation pane.
    2. Filter by Principal = lf-app-access-role-1 and inspect the assigned permissions.
    3. Filter by Principal = lf-app-access-role-2 and inspect the assigned permissions.

Test the Lambda function

The Lambda function created by the CloudFormation template accepts JSON objects as input events. The JSON events have the following structure:

 {
  "identity": {
    "username": "XXX"
  },
  "fieldName": "YYY",
  "arguments": {
    "AA": "BB",
    ...
  }
}

Although the identity field is always needed in order to identify the called identity, depending on the requested operation (fieldName), different arguments should be provided. The following table lists these arguments.

Operation Description Needed Arguments Output
getDbs List databases No arguments needed List of databases the user has access to
getTablesByDb List tables db: <db_name> List of tables inside a database the user has access to
getUnfilteredTableMetadata Return the table metadata

db: <db_name>

table: <table_name>

Returns the output of the glue:GetUnfilteredTableMetadata API
getUnfilteredPartitionsMetadata Return the table partitions metadata

db: <db_name>

table: <table_name>

Returns the output of the glue:GetUnfilteredPartitionsMetadata API
getTableData Get table data

db: <db_name>

table: <table_name>

noOfRecs: N (number of records to pull)

nonNullRowsOnly: true/false (true to filter out records with all null values)

location: Table location

authorizedData: records of the table the user has access to

allColumns: All the columns of the table (returned only for demonstration and comparison purposes)

allData: All the records of the table without any filtering (returned only for demonstration and comparison purposes)

cellFilters: Lake Formation filters (applied to allData to return authorizedData)

authorizedColumns: Columns to which the user has access to (projection applied to allData to return authorizedData)

To test the Lambda function, you can create some sample Lambda test events. Complete the following steps:

  1. On the Lambda console, choose Functions on the navigation pane.
  2. Choose the lf-app-lambda-engine
  3. On the Test tab, select Create new event.
  4. For Event JSON, enter a valid JSON (we provide some sample JSON events).
  5. Choose Test.

Creata Lambda Test

  1. Check the test results (JSON response).

Lambda Test Result

The following are some sample test events you can try to see how different identities can access different sets of information.

user1 user2
{ 
  "identity": {
    "username": "user1"
  },
  "fieldName": "getDbs"
}
{ 
  "identity": {
    "username": "user2"
  },
  "fieldName": "getDbs"
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getTablesByDb",
  "arguments": {
    "db": "lf-app-entities"
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getTablesByDb",
  "arguments": {
    "db": "lf-app-entities"
  }
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getUnfilteredTableMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_tbl" 
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getUnfilteredTableMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_tbl" 
  }
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getUnfilteredTableMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl" 
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getUnfilteredTableMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl" 
  }
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getUnfilteredPartitionsMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_tbl" 
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getUnfilteredPartitionsMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_tbl" 
  }
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getUnfilteredPartitionsMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl" 
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getUnfilteredPartitionsMetadata",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl" 
  }
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getTableData",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_tbl",
    "noOfRecs": 10,
    "nonNullRowsOnly": true
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getTableData",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_tbl",
    "noOfRecs": 10,
    "nonNullRowsOnly": true
  }
}
{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getTableData",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl",
    "noOfRecs": 10,
    "nonNullRowsOnly": true
  }
}
{
  "identity": {
    "username": "user2"
  },
  "fieldName": "getTableData",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl",
    "noOfRecs": 10,
    "nonNullRowsOnly": true
  }
}

As an example, in the following test, we request users_partitioned_tbl table data in the context of user1:

{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getTableData",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl",
    "noOfRecs": 10,
    "nonNullRowsOnly": true
  }
}

The following is the related API response:

{
  "database": "lf-app-entities",
  "name": "users_partitioned_tbl",
  "location": "s3://lf-app-data-123456789012/datasets/lf-app-entities/users_partitioned/",
  "authorizedColumns": [
    {
      "Name": "born_year",
      "Type": "string"
    },
    {
      "Name": "city",
      "Type": "string"
    },
    {
      "Name": "name",
      "Type": "string"
    },
    {
      "Name": "state",
      "Type": "string"
    },
    {
      "Name": "surname",
      "Type": "string"
    },
    {
      "Name": "uid",
      "Type": "int"
    }
  ],
  "authorizedData": [
    [
      "1980",
      "bristol",
      "emily",
      "united kingdom",
      "brown",
      4
    ],
    [
      "1980",
      "vancouver",
      "<FILTEREDCELL>",
      "canada",
      "<FILTEREDCELL>",
      5
    ],
    [
      "1980",
      "madrid",
      "<FILTEREDCELL>",
      "spain",
      "<FILTEREDCELL>",
      6
    ],
    [
      "1980",
      "mexico city",
      "<FILTEREDCELL>",
      "mexico",
      "<FILTEREDCELL>",
      10
    ],
    [
      "1980",
      "zurich",
      "<FILTEREDCELL>",
      "switzerland",
      "<FILTEREDCELL>",
      11
    ],
    [
      "1980",
      "buenos aires",
      "<FILTEREDCELL>",
      "argentina",
      "<FILTEREDCELL>",
      12
    ],
    [
      "1990",
      "london",
      "john",
      "united kingdom",
      "pike",
      1
    ],
    [
      "1990",
      "milan",
      "<FILTEREDCELL>",
      "italy",
      "<FILTEREDCELL>",
      2
    ],
    [
      "1990",
      "berlin",
      "<FILTEREDCELL>",
      "germany",
      "<FILTEREDCELL>",
      3
    ],
    [
      "1990",
      "munich",
      "<FILTEREDCELL>",
      "germany",
      "<FILTEREDCELL>",
      7
    ]
  ],
  "allColumns": [
    {
      "Name": "address",
      "Type": "string"
    },
    {
      "Name": "born_year",
      "Type": "string"
    },
    {
      "Name": "city",
      "Type": "string"
    },
    {
      "Name": "name",
      "Type": "string"
    },
    {
      "Name": "state",
      "Type": "string"
    },
    {
      "Name": "surname",
      "Type": "string"
    },
    {
      "Name": "uid",
      "Type": "int"
    }
  ],
  "allData": [
    [
      "beautiful avenue 123",
      "1980",
      "bristol",
      "emily",
      "united kingdom",
      "brown",
      4
    ],
    [
      "lake street 45",
      "1980",
      "vancouver",
      "david",
      "canada",
      "lee",
      5
    ],
    [
      "plaza principal 6",
      "1980",
      "madrid",
      "sophia",
      "spain",
      "luz",
      6
    ],
    [
      "avenida de arboles 40",
      "1980",
      "mexico city",
      "olivia",
      "mexico",
      "garcia",
      10
    ],
    [
      "pflanzenstrasse 34",
      "1980",
      "zurich",
      "lucas",
      "switzerland",
      "fischer",
      11
    ],
    [
      "avenida de luces 456",
      "1980",
      "buenos aires",
      "isabella",
      "argentina",
      "afortunado",
      12
    ],
    [
      "hidden road 78",
      "1990",
      "london",
      "john",
      "united kingdom",
      "pike",
      1
    ],
    [
      "via degli alberi 56A",
      "1990",
      "milan",
      "mario",
      "italy",
      "rossi",
      2
    ],
    [
      "green road 90",
      "1990",
      "berlin",
      "july",
      "germany",
      "finn",
      3
    ],
    [
      "parkstrasse 789",
      "1990",
      "munich",
      "oliver",
      "germany",
      "schmidt",
      7
    ]
  ],
  "filteredCellPh": "<FILTEREDCELL>",
  "cellFilters": [
    {
      "ColumnName": "born_year",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "city",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "name",
      "RowFilterExpression": "state='united kingdom'"
    },
    {
      "ColumnName": "state",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "surname",
      "RowFilterExpression": "state='united kingdom'"
    },
    {
      "ColumnName": "uid",
      "RowFilterExpression": "TRUE"
    }
  ]
}

To troubleshoot the Lambda function, you can navigate to the Monitoring tab, choose View CloudWatch logs, and inspect the latest log stream.

Clean up

If you plan to explore Part 2 of this series, you can skip this part, because you will need the resources created here. You can refer to this section at the end of your testing.

Complete the following steps to remove the resources you created following this post and avoid incurring additional costs:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Choose the stack you created and choose Delete.

Additional considerations

In the proposed architecture, Lake Formation permissions were granted to specific IAM data access roles that requesting users (for example, the identity field) were mapped to. Another possibility is to assign permissions in Lake Formation to SAML users and groups and then work with the AssumeDecoratedRoleWithSAML API.

Conclusion

In the first part of this series, we explored how to integrate custom applications and data processing engines with Lake Formation. We delved into the required configuration, APIs, and steps to enforce Lake Formation policies within custom data applications. As an example, we presented a sample Lake Formation integrated application built on Lambda.

The information provided in this post can serve as a foundation for developing your own custom applications or data processing engines that need to operate on an Lake Formation protected data lake.

Refer to the second part of this series to see how to build a sample web application that uses the Lambda based Lake Formation application.


About the Authors

Stefano Sandona Picture Stefano Sandonà is a Senior Big Data Specialist Solution Architect at AWS. Passionate about data, distributed systems, and security, he helps customers worldwide architect high-performance, efficient, and secure data platforms.

Francesco Marelli PictureFrancesco Marelli is a Principal Solutions Architect at AWS. He specializes in the design, implementation, and optimization of large-scale data platforms. Francesco leads the AWS Solution Architect (SA) analytics team in Italy. He loves sharing his professional knowledge and is a frequent speaker at AWS events. Francesco is also passionate about music.