How to enforce Amazon S3 Access Grants with Immuta
Amazon Simple Storage Service (Amazon S3) is the most popular object storage platform for modern data lakes. Many organizations have adopted a lake house architecture that combines the scalability and cost effectiveness of data lakes with the performance and ease of use of data warehouses, and Amazon S3 plays an increasingly important role as the foundational storage layer for the modern lake house. However, as lake house architectures grow more complex and pair S3 with a growing number of popular products like Databricks and Snowflake, managing access across your entire data lake stack can become challenging.
What is Amazon S3 used for?
Organizations use Amazon S3 to store data for many use cases. In the world of data lakes, Amazon S3 is used for data science, machine learning (ML), and analytics. For example, health and life sciences solutions analyze medical images and patient records to diagnose, monitor, and provide medical recommendations. Retail solutions leverage sales and marketing data and social sentiment analysis to optimize user relations or create the best offers. Financial services solutions analyze user activities for fraud detection and risk mitigation.
Organizations choose Amazon S3 because of its scale. Today Amazon S3 stores more than 350 trillion objects and handles over 100 million requests per second for workloads powered by artificial intelligence (AI) and data analytics. Additionally, Amazon S3 can store all kinds of data. Some common data types include:
- Text or binary data: documents, images, videos, audio, and digital files
- Logs: data from servers, applications, and devices
- Machine learning: datasets for ML and analytics
- Mobile and IoT: data generated by mobile applications and Internet of Things (IoT) sensors and devices
Why do you need S3 Access Control?
Protecting Amazon S3 data and ensuring its safe and proper use is a top priority for data owners, platform administrators, compliance and governance officers, and security stakeholders. Data lake users want a simple, scalable, and centralized solution that protects Amazon S3 data and maintains compliance with internal and external regulations, while still enabling users to access and analyze that data to drive business outcomes. Non-tabular data stored in Amazon S3 needs governance and access controls as well. Some common use cases include:
- Data exploration and querying of non-tabular data files for data science projects.
- Staging and preparing of new raw data before it is loaded into tables.
- Training and optimizing ML models on large unstructured data sets such as images, audio, text, and video files.
- Supporting table-level access controls at the folder or partition level.
- Managing data access at the storage tier from S3A-based clients such as open-source Apache Spark or from managed AWS services such as Apache Spark on Amazon EMR.
Immuta’s native integration with Amazon S3 allows you to centralize permission management across even the most complex data lake stacks. By providing a metadata-driven approach to policy logic management, Immuta removes the need to focus on managing one bucket at a time. This saves time, makes teams more efficient, and delivers a more performant solution for data users. Let’s see how it works.
Enforcing S3 Access Control using S3 Access Grants
To adhere to the principle of least privilege, users define granular access to their Amazon S3 data based on applications, personas, groups, or organizational units (OUs). They use various approaches to achieve this, depending on the scale and complexity of the access patterns. The most common approach, which is highly effective for managing access to small-to-medium numbers of datasets in Amazon S3 and AWS Identity and Access Management (IAM) principals, is to define IAM permission policies and S3 bucket policies. This strategy works as long as the necessary policies fit within the policy size limits and the limits on the number of IAM principals per account.
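To make the policy size concern concrete, here is a minimal sketch of a prefix-scoped, read-only IAM policy generated in Python. The bucket name, the `team-N/datasets` prefixes, and the helper names are hypothetical and chosen only for illustration; the point is that the document grows with every prefix it has to enumerate.

```python
import json


def read_only_policy(bucket, prefixes):
    """Build an IAM policy document allowing list/read on the given prefixes."""
    cleaned = [p.strip("/") for p in prefixes]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, narrowed by the s3:prefix condition key.
                "Sid": "ListPrefixes",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{p}/*" for p in cleaned]}},
            },
            {
                # Reading is an object-level action, so each prefix is a resource ARN.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{bucket}/{p}/*" for p in cleaned],
            },
        ],
    }


def policy_size(bucket, prefix_count):
    """Return the compact JSON length of a policy covering prefix_count prefixes."""
    prefixes = [f"team-{i}/datasets" for i in range(prefix_count)]
    document = read_only_policy(bucket, prefixes)
    return len(json.dumps(document, separators=(",", ":")))


# Hypothetical bucket name; sizes grow roughly linearly with the prefix count.
sizes = {n: policy_size("example-data-lake", n) for n in (1, 10, 100)}
for n, size in sizes.items():
    print(f"{n:>3} prefixes -> {size} characters")
```

Every additional dataset adds two entries to the document, which is why this approach suits a small-to-medium number of datasets and principals.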
However, as the number of datasets and use cases scales, users look beyond IAM for solutions that are tailored to data lake access patterns. To address these needs, AWS developed S3 Access Grants. This feature provides a dynamic and scalable access control solution for Amazon S3 data used for data and analytics workloads. It supports users with a complex or large permission configuration and scales Amazon S3 data permissions for users, roles, and applications.
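As an illustration of what a single grant looks like, the sketch below assembles the parameters for the `create_access_grant` operation of the boto3 `s3control` client. The account ID, role ARN, and prefix are placeholders, the helper name `access_grant_request` is hypothetical, and the example assumes an S3 Access Grants instance whose default location is already registered. It only builds the request; nothing is sent to AWS.

```python
ALLOWED_PERMISSIONS = {"READ", "WRITE", "READWRITE"}
ALLOWED_GRANTEE_TYPES = {"IAM", "DIRECTORY_USER", "DIRECTORY_GROUP"}


def access_grant_request(account_id, location_id, sub_prefix,
                         grantee_type, grantee_id, permission):
    """Assemble keyword arguments for s3control.create_access_grant."""
    if permission not in ALLOWED_PERMISSIONS:
        raise ValueError(f"unsupported permission: {permission}")
    if grantee_type not in ALLOWED_GRANTEE_TYPES:
        raise ValueError(f"unsupported grantee type: {grantee_type}")
    return {
        "AccountId": account_id,
        "AccessGrantsLocationId": location_id,
        # The sub-prefix narrows the registered location to one dataset.
        "AccessGrantsLocationConfiguration": {"S3SubPrefix": sub_prefix},
        "Grantee": {"GranteeType": grantee_type, "GranteeIdentifier": grantee_id},
        "Permission": permission,
    }


# Placeholder values for illustration only.
request = access_grant_request(
    account_id="111122223333",
    location_id="default",
    sub_prefix="example-data-lake/finance/*",
    grantee_type="IAM",
    grantee_id="arn:aws:iam::111122223333:role/finance-analyst",
    permission="READ",
)
print(request)

# With AWS credentials configured, the request could then be submitted as:
#   import boto3
#   boto3.client("s3control").create_access_grant(**request)
```

Each grant is a small, independent record that ties one grantee to one prefix and one permission level, so adding a dataset means adding a grant rather than editing a shared policy document.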
Immuta collaborated with AWS to develop an integration that uses the new S3 Access Grants features. With this integration, users who have deployed complex data stacks, such as Amazon S3, Databricks, and Snowflake, can now centralize access control management using Immuta and can use Immuta’s attribute-based access control to grant read/write permissions against objects in Amazon S3.
Compute engines such as Apache Spark on Amazon EMR and open-source Spark (through the S3A connector) are also integrated with S3 Access Grants. As a result, Immuta’s scalable, attribute-based policies apply when users run Apache Spark jobs on Amazon EMR or open-source Spark jobs that access S3 data.
With Immuta’s integration with Amazon S3 through S3 Access Grants, you can centralize and simplify your data lake permission management across the stack, and provide greater visibility into your organization’s usage of Amazon S3 data.
How Immuta’s Amazon S3 integration works
Immuta’s Amazon S3 integration allows you to map object access to users or IAM roles based on user and object attributes. By leveraging Amazon Macie to detect file contents, you can use Immuta to attach data source-level tags to the Amazon S3 prefix-based data sources through the Immuta UI or API. Those tags are then used to create policies that protect data sources at the Amazon S3 prefix level.
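The attribute-based mapping described above can be sketched as a small in-memory model. This is not Immuta's API or its policy engine; the classes, field names, tags, and role ARNs are all hypothetical and exist only to show how one tag-driven rule can expand into many role-to-prefix permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    prefix: str       # S3 prefix registered as a data source
    tags: frozenset   # data source-level tags, e.g. from content detection


@dataclass(frozen=True)
class User:
    name: str
    iam_role: str
    attributes: frozenset


@dataclass(frozen=True)
class SubscriptionPolicy:
    required_tag: str        # policy applies to data sources carrying this tag
    required_attribute: str  # and subscribes users holding this attribute
    permission: str          # READ, WRITE, or READWRITE


def resolve_grants(policy, users, sources):
    """Expand one policy into sorted (iam_role, prefix, permission) tuples."""
    return sorted(
        (user.iam_role, source.prefix, policy.permission)
        for source in sources
        if policy.required_tag in source.tags
        for user in users
        if policy.required_attribute in user.attributes
    )


sources = [
    DataSource("s3://example-data-lake/claims/", frozenset({"PHI"})),
    DataSource("s3://example-data-lake/imaging/", frozenset({"PHI", "Images"})),
    DataSource("s3://example-data-lake/marketing/", frozenset({"Public"})),
]
users = [
    User("ana", "arn:aws:iam::111122223333:role/clinical", frozenset({"hipaa-trained"})),
    User("raj", "arn:aws:iam::111122223333:role/marketing", frozenset()),
]
policy = SubscriptionPolicy("PHI", "hipaa-trained", "READ")

grants = resolve_grants(policy, users, sources)
for role, prefix, permission in grants:
    print(f"{permission:<5} {role} -> {prefix}")
```

Tagging a new prefix with `PHI` in this model would extend access to every qualified user without anyone writing a new rule, which is the property that makes a metadata-driven approach scale.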
The Immuta policy editor allows any user, regardless of technical expertise, to create and manage subscription policies on their S3 objects, making sure global policies can be applied to meet organizational standards and encourage policy reuse. This reduces workflow bottlenecks since all users are empowered to understand, maintain, and approve policies.
With Immuta, you can define understandable policy logic that natively controls this access in Amazon S3, as shown in the following figure.
In turn, Immuta uses S3 Access Grants to control this access. Now, users who access data in Amazon S3 can only see the data permitted by this Immuta policy. Likewise, users accessing S3 data using Apache Spark on Amazon EMR can only access the data that meets these criteria.
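On the consumer side, a client covered by a grant asks S3 Access Grants for temporary, scoped credentials before reading data. The sketch below builds the parameters for the `get_data_access` operation of the boto3 `s3control` client. The account ID and target are placeholders, and the helper names `data_access_request` and `fetch_credentials` are hypothetical. `fetch_credentials` is defined but not called here, since calling it requires AWS credentials and a matching grant.

```python
def data_access_request(account_id, target, permission="READ", duration_seconds=3600):
    """Assemble keyword arguments for s3control.get_data_access."""
    if not target.startswith("s3://"):
        raise ValueError("target must be an S3 URI such as s3://bucket/prefix/*")
    if permission not in {"READ", "WRITE", "READWRITE"}:
        raise ValueError(f"unsupported permission: {permission}")
    return {
        "AccountId": account_id,
        "Target": target,
        "Permission": permission,
        "DurationSeconds": duration_seconds,
    }


def fetch_credentials(request):
    """Exchange the request for temporary credentials (needs AWS access)."""
    import boto3  # imported lazily so the request builder works without boto3

    response = boto3.client("s3control").get_data_access(**request)
    # Temporary credentials scoped to the matched grant.
    return response["Credentials"]


# Placeholder values for illustration only.
request = data_access_request("111122223333", "s3://example-data-lake/claims/*")
print(request)
```

If no grant covers the caller and target, the service denies the request, so the client never receives credentials for data outside the policy.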
Why do you need Immuta and S3 Access Grants?
Adeptly managing your storage layer is crucial, as your Amazon S3 data lake continues to grow in size and scope. Immuta offers the following benefits:
- Consistently apply policies across data platforms, such as Amazon S3, Snowflake, Databricks, Amazon Redshift, Google BigQuery, Azure Synapse, and Starburst (Trino).
- Reduce operational costs with fewer policies and eliminate manual coding.
- Empower any user to create and manage policies on and inside their buckets while making sure global policies are applied to meet company standards.
- Use the integrations between S3 Access Grants and Apache Spark on Amazon EMR, or open source Spark via the S3A connector, to process big data.
This combination of outcomes helps you unlock more value from your Amazon S3 data. LMI, a consultancy dedicated to powering a future-ready government, used Immuta and Amazon S3 to accelerate data access by 10x, while fully protecting more than one billion government equipment maintenance records. Read their full story.
“By leveraging this new release from Immuta that integrates with S3 Access Grants, we envision a single control plane for Booking.com data owners and governors to manage access at scale for all S3 resources ingested into our data lake (both structured and unstructured). Moreover, as this integration is based on a new S3 native access control capability, it gives us confidence that controls will be enforced consistently, no matter which technology data consumers will choose to access the data.”
– Luca Falsina, Principal Software Engineer I, Booking.com
Immuta continues to expand its market-leading Data Security Platform to secure all cloud data. The recently announced Amazon S3 integration allows you to centralize access control management in Immuta and use Immuta’s attribute-based access control to manage Amazon S3 permissions.