AWS Big Data Blog

Category: AWS Lake Formation

Effective data lakes using AWS Lake Formation, Part 1: Getting started with governed tables

Thousands of customers are building their data lakes on Amazon Simple Storage Service (Amazon S3). You can use AWS Lake Formation to build your data lakes easily—in a matter of days as opposed to months. However, there are still some difficult challenges to address with your data lakes: Supporting streaming updates and deletes in your data […]

Read More

How FanDuel Group secures personally identifiable information in a data lake using AWS Lake Formation

This post is co-written with Damian Grech from FanDuel FanDuel Group is an innovative sports-tech entertainment company that is changing the way consumers engage with their favorite sports, teams, and leagues. The premier gaming destination in the US, FanDuel Group consists of a portfolio of leading brands across gaming, sports betting, daily fantasy sports, advance-deposit […]

Read More

Controlling data lake access across multiple AWS accounts using AWS Lake Formation

When deploying data lakes on AWS, you can use multiple AWS accounts to better separate different projects or lines of business. In this post, we see how the AWS Lake Formation cross-account capabilities simplify securing and managing distributed data lakes across multiple accounts through a centralized approach, providing fine-grained access control to the AWS Glue […]

Read More

Managing COVID-19 exposure with crowd tracing

This is a guest blog post by AWS partner Aspire Ventures As we enter winter, with fewer options to be outdoors, our personal choices can impact our risk of contracting the COVID-19 virus even more. The New England Journal of Medicine publication showed real-world examples of the effectiveness of masks and social distancing in mitigating […]

Read More

Creating a source to Lakehouse data replication pipe using Apache Hudi, AWS Glue, AWS DMS, and Amazon Redshift

February 2021 update – Please refer to the post Writing to Apache Hudi tables using AWS Glue Custom Connector to learn about an easier mechanism to write to Hudi tables using AWS Glue Custom Connector. In this post, we include the modified Apache Hudi JARs as an external dependency. The AWS Glue Custom Connector feature […]

Read More

AWS serverless data analytics pipeline reference architecture

Onboarding new data or building new analytics pipelines in traditional analytics architectures typically requires extensive coordination across business, data engineering, and data science and analytics teams to first negotiate requirements, schema, infrastructure capacity needs, and workload management. For a large number of use cases today however, business users, data scientists, and analysts are demanding easy, […]

Read More

Anonymize and manage data in your data lake with Amazon Athena and AWS Lake Formation

Most organizations have to comply with regulations when dealing with their customer data. For that reason, datasets that contain personally identifiable information (PII) is often anonymized. A common example of PII can be tables and columns that contain personal information about an individual (such as first name and last name) or tables with columns that, if joined with another table, can trace back to an individual. You can use AWS Analytics services to anonymize your datasets. In this post, I describe how to use Amazon Athena to anonymize a dataset.  You can then use AWS Lake Formation to provide the right access to the right personas.

Read More

Enable fine-grained data access in Zeppelin Notebook with AWS Lake Formation

This post explores how you can use AWS Lake Formation integration with Amazon EMR (still in beta) to implement fine-grained column-level access controls while using Spark in a Zeppelin Notebook. My previous post Extract Salesforce.com data using AWS Glue and analyzing with Amazon Athena showed you a simple use case for extracting any Salesforce object data using AWS Glue and Apache Spark, saving it to Amazon Simple Storage Service (Amazon S3), cataloging the data using the Data Catalog in Glue, and querying it using Amazon Athena.

Read More

Control data access and permissions with AWS Lake Formation and Amazon EMR

What if you could control the access to your data lake centrally? Would it be more convenient to share specific data securely with internal and external customers? With AWS Lake Formation and its integration with Amazon EMR, you can easily perform these administrative tasks. This post goes through a use case and reviews the steps to control the data access and permissions of your existing data lake.

Read More

Enable fine-grained permissions for Amazon QuickSight authors in AWS Lake Formation

This post demonstrates how to extend the Lake Formation security model to QuickSight users and groups, which allows data lake administrators to manage data catalog resource permissions centrally from one console. As organizations embark on the journey to secure their data lakes with Lake Formation, having the ability to centrally manage fine-grained permissions for QuickSight authors can extend the data governance and enforcement of security controls at the data consumption (business intelligence) layer. You can enable these fine-grained permissions for QuickSight users and groups at the database, table, or column level, and they’re reflected in the Athena dataset in QuickSight.

Read More