Simplify and scale access management to shared datasets with cross-account Amazon S3 Access Points

In today’s interconnected and data-centric world, businesses must have access to the right data for data-driven decision-making, ultimately driving better business results. Collecting all the relevant data takes time and capital, because it requires setting up data ingestion pipelines, hiring analysts to validate and interpret the data, and incorporating data insights that influence important business decisions. Instead, customers can get access to useful data much faster by implementing a distributed data architecture, also known as a data mesh architecture. Data mesh is an architectural framework that unites disparate data sources and links them together through centrally managed data sharing and governance guidelines. Data mesh makes data discoverable, widely accessible, secure, and interoperable, giving customers improved decision-making capabilities and faster time to value.

Customers can implement data mesh architecture internally across different business units in an organization, externally with the help of data marketplaces such as AWS Data Exchange, or both. Today, more customers are adopting a multi-account distributed data architecture internally to eliminate dependencies on a single owner to generate and manage data. In this model, AWS accounts are aligned to various business domains. Each account manages data-as-a-product independently and securely within its domain, and shares that data with multiple users and applications across the organization. Externally, with the help of data marketplace platforms, customers can start using the data they want in production as soon as they license it, without spending months building data ingestion pipelines to get it there. In this model, data providers (customers who want to share datasets) offer data-as-a-service to subscribers (customers who want to access these datasets). Customers can easily find advertising, business intelligence, demographics, research, market data, and more on data marketplaces.

These emerging data mesh architectures are creating a new model for data sharing across accounts. While such architectures reduce bottlenecks and silos in data management, managing access for shared data can become complex for data owners as their applications scale to support more datasets and users.

In this blog, we explain cross-account S3 Access Points and how this feature helps customers simplify and scale access management for cross-account access patterns, such as those found in data mesh architectures.

About cross-account Amazon S3 Access Points

In 2019, AWS launched Amazon S3 Access Points so that bucket owners can easily create thousands of access configurations without having to manage a single bucket policy that spans multiple access patterns as their application and storage footprint scales. At re:Invent 2022, AWS launched cross-account Amazon S3 Access Points, which let bucket owners delegate access management to trusted AWS accounts that create their own access points. With this feature, bucket owners can grant data access to cross-account users without managing IAM roles or multiple access point policies, and without maintaining duplicate copies of data. The trusted account can then define permissions that give specific access to its own end users. This simplifies access management for multiple trusted users and doesn’t require the bucket owner to configure each of those permissions themselves. These trusted accounts can enforce distinct permissions based on prefixes, object tags, and network controls on resources in the bucket owner’s account.

In the data-as-a-service case, a data provider can delegate access management to a data marketplace service such as AWS Data Exchange for Amazon S3, which then independently manages access permissions for all data subscribers on the provider’s behalf. Similarly, in internal distributed data architectures, individual domain data owners can delegate access management to other trusted partner accounts in the organization, which define more specific access policies on the shared data for their end users. This helps all data owners scale and manage complex, multi-tenant, cross-account access patterns with ease. Let’s learn how to create and use these cross-account access points.

Creating and using cross-account S3 Access Points

At a high level, creating and using a cross-account S3 Access Point involves the following steps:

1. Obtain the necessary permissions from the bucket owner to access the right set of data. The bucket owner sets the initial permissions boundary within which the cross-account access point owner can self-serve more detailed permissions.
2. Create an S3 Access Point in your account that points to the bucket in the other account.
3. Share the new access point’s information with the end users who need to access the data.
4. Access the data through the cross-account access point.

Prerequisites: Consider two accounts in an organization. Account A (123456789010), in the marketing domain, owns the marketing data in its Amazon S3 bucket, test-bucket-1. Account B (111122223333) has product managers and data scientists who need read access to that marketing data. These users can apply the data to multiple use cases that support their business objectives, for example, generating data insights and using those findings to drive product adoption. For this blog, we assume that Jane, an IAM user and data scientist in Account B, needs permission to GET objects with the prefix ResearchData from test-bucket-1. To grant access to these users, Account B creates a cross-account S3 Access Point on test-bucket-1. Before starting, confirm that you have the s3:CreateAccessPoint permission in Account B.

Step 1: Bucket owner grants permission to cross-account access point owner

The bucket owner in Account A updates the bucket policy to authorize requests that arrive through the cross-account access point. For this blog, the following example bucket policy allows GET requests on objects in the bucket, but only when they are made through an access point owned by Account B (111122223333).

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:GetObject",
			"Resource": [
				"arn:aws:s3:::test-bucket-1/*"
			],
			"Condition": {
				"StringEquals": {
					"s3:DataAccessPointAccount": "111122223333"
				}
			}
		}
	]
}
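
If you manage bucket policies as code, the same delegation pattern can be generated from a few parameters. The following Python sketch is our own illustration, not an AWS-provided helper: the function name delegation_bucket_policy and its optional prefix argument are hypothetical. With the default arguments it reproduces the statement above; passing a prefix narrows the grant to that part of the bucket.

```python
import json


def delegation_bucket_policy(bucket: str, ap_account_id: str,
                             actions: list, prefix: str = "") -> dict:
    """Allow `actions` on objects in `bucket`, but only for requests that
    arrive through an access point owned by `ap_account_id`.

    `prefix` (hypothetical option) narrows the grant, e.g. "ResearchData/".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Any principal, as long as the request comes through an
                # access point in the trusted account (see Condition).
                "Principal": {"AWS": "*"},
                "Action": actions,
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
                "Condition": {
                    "StringEquals": {"s3:DataAccessPointAccount": ap_account_id}
                },
            }
        ],
    }


policy = delegation_bucket_policy("test-bucket-1", "111122223333", ["s3:GetObject"])
print(json.dumps(policy, indent=2))
```

To apply the result, the bucket owner could pass json.dumps(policy) as the Policy argument of boto3’s put_bucket_policy on an S3 client authenticated in Account A. Leaving the prefix empty, as in the example above, deliberately leaves fine-grained scoping to the access point owner.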

Step 2: Create an S3 Access Point

Cross-account access points can be created with the S3 console, the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the Amazon S3 REST API. For the purposes of this blog, let’s use the S3 console. In the left navigation pane of the S3 console, choose Access Points, and then choose Create access point. Because we want to give the IAM user Jane permission to GET objects with the prefix ResearchData, we enter janes-researchdata-access-point in the Access point name field. Select the Specify a bucket in another account option, and enter the Bucket owner account ID (123456789010) and the Bucket name (test-bucket-1). In the Network origin section, you can restrict access to a specific Amazon virtual private cloud (VPC), which requires a VPC ID. In this example, we also want to allow access from outside a VPC, so we select Internet. To allow only requests made from a specific VPC, refer to the S3 documentation.
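
If you prefer to script this step, the console settings above map onto the parameters of the S3 Control CreateAccessPoint API. The Python sketch below only assembles those parameters; the helper names are hypothetical, and the naming rules it checks (3 to 50 characters; lowercase letters, digits, and hyphens; starting and ending with a letter or digit; not ending in -s3alias) reflect our reading of the S3 documentation, so verify them before relying on the check.

```python
import re
from typing import Optional

# Access point naming rules as we understand them (verify against the S3 docs):
# 3-50 characters, lowercase letters/digits/hyphens, starting and ending with a
# letter or digit, and not ending in "-s3alias".
_AP_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,48}[a-z0-9]")


def is_valid_access_point_name(name: str) -> bool:
    return _AP_NAME.fullmatch(name) is not None and not name.endswith("-s3alias")


def create_access_point_params(owner_account_id: str, name: str, bucket: str,
                               bucket_account_id: str,
                               vpc_id: Optional[str] = None) -> dict:
    """Keyword arguments for a CreateAccessPoint call on a bucket in another account."""
    if not is_valid_access_point_name(name):
        raise ValueError(f"invalid access point name: {name!r}")
    params = {
        "AccountId": owner_account_id,         # Account B: owns the access point
        "Name": name,
        "Bucket": bucket,                      # bucket name in Account A
        "BucketAccountId": bucket_account_id,  # Account A: owns the bucket
    }
    if vpc_id is not None:
        # Network origin "VPC": only requests from this VPC are accepted.
        params["VpcConfiguration"] = {"VpcId": vpc_id}
    # Without VpcConfiguration, the network origin is Internet.
    return params


params = create_access_point_params("111122223333", "janes-researchdata-access-point",
                                    "test-bucket-1", "123456789010")
```

With boto3, a call such as boto3.client("s3control", region_name="eu-west-3").create_access_point(**params), made with Account B credentials, should create the access point, assuming your boto3 version supports the BucketAccountId parameter. Omitting VpcConfiguration corresponds to selecting Internet as the network origin.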

Next, in the Block Public Access settings for this Access Point section, we leave the Amazon S3 Block Public Access settings unchanged.

Now we can add an access point policy that allows Jane to download the data. In this policy, the Principal is the IAM user Jane, and the Resource is the access point ARN followed by /object/ResearchData/*, which covers every object under the ResearchData/ prefix. For more examples of access point policies, see the S3 documentation.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::111122223333:user/Jane" },
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:eu-west-3:111122223333:accesspoint/janes-researchdata-access-point/object/ResearchData/*"
        }
    ]
}

Select Create access point.
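
The access point policy can also be generated programmatically, which helps avoid ARN typos such as a missing :: in the IAM user ARN (IAM ARNs leave the Region field empty). This Python sketch assumes, as in our example, that the user and the access point are in the same account; the function name and parameters are illustrative.

```python
import json


def access_point_policy(region: str, account_id: str, ap_name: str,
                        user_name: str, prefix: str,
                        actions: tuple = ("s3:GetObject",)) -> dict:
    """Allow an IAM user in `account_id` to perform `actions` on objects under
    `prefix`, through the access point `ap_name` owned by the same account."""
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{ap_name}"
    key_prefix = prefix.strip("/")
    # Object resources are addressed as "<access point ARN>/object/<key>".
    object_pattern = f"{key_prefix}/*" if key_prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # IAM ARNs have an empty Region field, hence "iam::".
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:user/{user_name}"},
                "Action": list(actions),
                "Resource": f"{ap_arn}/object/{object_pattern}",
            }
        ],
    }


policy = access_point_policy("eu-west-3", "111122223333",
                             "janes-researchdata-access-point", "Jane", "ResearchData/")
print(json.dumps(policy, indent=4))
```

The resulting document could then be attached with the S3 Control PutAccessPointPolicy API, for example through boto3’s put_access_point_policy(AccountId=..., Name=..., Policy=json.dumps(policy)).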

Step 3: Share access point information with users so they can access the data

To give users access to the data, share the access point’s Amazon Resource Name (ARN) with them. Access point ARNs are similar to bucket ARNs, but they are explicitly typed and include the access point’s AWS Region and the AWS account ID of the access point’s owner. Access point ARNs use the format arn:aws:s3:region:account-id:accesspoint/access-point-name. Alternatively, you can share the access point alias instead of the ARN. When you create an access point, Amazon S3 automatically generates an alias that you can use in place of an Amazon S3 bucket name for data plane operations. Learn more about these bucket-style aliases in the S3 documentation.
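
Because access point ARNs follow a fixed format, it can be handy to build and validate them in code before sharing them. The following Python sketch is a minimal parser that handles only the standard aws partition; the class name AccessPointArn is hypothetical. It cannot compute the access point alias, because S3 generates the alias when the access point is created, so retrieve the alias from S3 instead (for example, from the access point details in the console).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPointArn:
    """An S3 access point ARN in the standard "aws" partition (sketch only)."""
    region: str
    account_id: str
    name: str

    def __str__(self) -> str:
        return f"arn:aws:s3:{self.region}:{self.account_id}:accesspoint/{self.name}"

    @classmethod
    def parse(cls, arn: str) -> "AccessPointArn":
        # Split into: arn, partition, service, region, account-id, resource.
        parts = arn.split(":", 5)
        if len(parts) != 6 or parts[:3] != ["arn", "aws", "s3"]:
            raise ValueError(f"not an S3 ARN in the aws partition: {arn!r}")
        resource_type, _, rest = parts[5].partition("/")
        name = rest.split("/", 1)[0]  # drop any "/object/<key>" suffix
        if resource_type != "accesspoint" or not name or not parts[3] or not parts[4]:
            raise ValueError(f"not an access point ARN: {arn!r}")
        return cls(region=parts[3], account_id=parts[4], name=name)


ap = AccessPointArn("eu-west-3", "111122223333", "janes-researchdata-access-point")
print(ap)  # arn:aws:s3:eu-west-3:111122223333:accesspoint/janes-researchdata-access-point
```

Round-tripping through parse and str is a cheap sanity check on ARNs copied from emails or tickets before they are handed to end users.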

Step 4: Access data via cross-account access point

You can access the objects in the bucket through an access point using the AWS Management Console, the AWS CLI, the AWS SDKs, or the S3 REST API. To access the data in the S3 console, navigate to the Access Points page and select the name of the access point that you want to read data from or write data to. With the SDKs and the CLI, pass the access point ARN (arn:aws:s3:region:account-id:accesspoint/access-point-name) wherever you would normally pass a bucket name; you use access points the same way you use a bucket. For example, assuming I were authenticated as Jane, I could run the following:

aws s3api get-object --bucket arn:aws:s3:eu-west-3:111122223333:accesspoint/janes-researchdata-access-point --key ResearchData/file.zip download.zip
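
When S3 authorizes an object request made through an access point, it evaluates the access point policy against a resource of the form access-point-ARN/object/key, using the key exactly as sent. That makes a leading slash a common pitfall: /ResearchData/file.zip and ResearchData/file.zip are different keys, and only the second matches the ResearchData/* resource in Jane’s policy. The Python sketch below is a simplified wildcard matcher for illustration (it supports only * and ?), not the actual IAM evaluation engine; the helper names are hypothetical.

```python
import re


def iam_wildcard_match(pattern: str, value: str) -> bool:
    """Simplified IAM-style match: "*" = any characters (including "/"), "?" = one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def object_resource(access_point_arn: str, key: str) -> str:
    """Resource string for an object request through an access point; key used verbatim."""
    return f"{access_point_arn}/object/{key}"


AP = "arn:aws:s3:eu-west-3:111122223333:accesspoint/janes-researchdata-access-point"
ALLOWED = f"{AP}/object/ResearchData/*"  # Resource from Jane's access point policy

for key in ["ResearchData/file.zip", "/ResearchData/file.zip"]:
    print(key, iam_wildcard_match(ALLOWED, object_resource(AP, key)))
# ResearchData/file.zip True
# /ResearchData/file.zip False
```

From Python, the equivalent of the CLI command is boto3.client("s3", region_name="eu-west-3").get_object(Bucket=AP, Key="ResearchData/file.zip"), with the access point ARN passed in place of a bucket name.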

Alternatively, you can analyze the marketing data with Amazon Athena, a serverless, interactive analytics service that provides a simplified, flexible way to analyze petabytes of data where it lives. Customers can also use access point aliases with other AWS services, such as Amazon Redshift, Amazon EMR, and Amazon SageMaker Feature Store. Additionally, customers can use access point aliases with open-source frameworks, such as Apache Spark and Apache Hive, and with AWS Partner Network (APN) solutions without any code changes.

To view all the access points created in your account, you can use the ListAccessPoints API or visit the Access Points page in the S3 console.

aws s3control list-access-points --account-id 111122223333
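
ListAccessPoints returns results in pages. The following Python sketch follows NextToken until the list is exhausted and then keeps only the access points whose bucket belongs to a different account. It assumes that each returned entry carries Name and BucketAccountId fields; the helper names are hypothetical, and the client can be any object with a list_access_points method, such as a boto3 s3control client.

```python
from typing import Any, Dict, Iterator, List, Protocol


class S3ControlClient(Protocol):
    """Anything with a boto3-style list_access_points method."""
    def list_access_points(self, **kwargs: Any) -> Dict[str, Any]: ...


def iter_access_points(client: S3ControlClient, account_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every access point owned by `account_id`, following NextToken."""
    kwargs: Dict[str, Any] = {"AccountId": account_id}
    while True:
        page = client.list_access_points(**kwargs)
        yield from page.get("AccessPointList", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def cross_account_access_points(client: S3ControlClient, account_id: str) -> List[Dict[str, Any]]:
    """Access points in `account_id` that point at a bucket owned by another account."""
    return [
        ap for ap in iter_access_points(client, account_id)
        if ap.get("BucketAccountId", account_id) != account_id
    ]
```

Passing the client in, rather than creating it inside the function, lets a small fake client stand in for boto3 in tests.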

If you are a data provider offering data-as-a-service, then you can also turn on S3 Requester Pays on your bucket to assign the costs of requests and data downloads to your data subscribers.
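
With Requester Pays turned on, subscribers must acknowledge the charges on each request; S3 rejects requests from non-owners that omit this acknowledgement. The following Python sketch shows both sides using the boto3 S3 client methods put_bucket_request_payment and get_object with RequestPayer="requester". The function names are illustrative, and we assume the access point ARN is passed in place of a bucket name as in Step 4.

```python
from typing import Any


def enable_requester_pays(s3: Any, bucket: str) -> None:
    """Bucket owner (Account A): make requesters pay for requests and downloads."""
    s3.put_bucket_request_payment(
        Bucket=bucket,
        RequestPaymentConfiguration={"Payer": "Requester"},
    )


def read_object_as_requester(s3: Any, access_point_arn: str, key: str) -> bytes:
    """Subscriber: acknowledge Requester Pays charges on the request."""
    response = s3.get_object(
        Bucket=access_point_arn,   # access point ARN in place of a bucket name
        Key=key,
        RequestPayer="requester",  # without this, S3 rejects the request from a non-owner
    )
    return response["Body"].read()
```

In practice, the first function runs with Account A credentials and the second with a subscriber’s credentials; they are shown together only to make the handshake visible.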

Additional considerations

Here are a few things to keep in mind as you start using this new Amazon S3 feature:

Creating a cross-account access point doesn’t grant any access to the data unless the bucket owner explicitly grants permission. The bucket owner retains ultimate control of the data and chooses, through the bucket policy, what access control to delegate.
There is no extra charge for this feature beyond the normal request charges for configuring access points. Access to the data is billed at standard S3 GET and PUT rates. For more information, view the Amazon S3 pricing page.
IAM Access Analyzer generates a finding that shows bucket owners when a bucket is shared through an access point, and the finding’s details page includes the cross-account access point ARN. For more details, see the Access Analyzer for S3 documentation.
As a cross-account access point owner, you can view access point-level API events, such as CreateAccessPoint, ListAccessPoints, and GetAccessPoint, in AWS CloudTrail and in server access logs. As a bucket owner, you can view all object-level API events, such as GetObject, received from cross-account access points in the logs.
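
The first consideration above, that a cross-account access point grants nothing on its own, can be illustrated with a toy model of the two policy checks in our example: the access point policy in Account B and the bucket policy in Account A must both allow the request. This Python sketch is a deliberate simplification, not AWS’s actual policy evaluation logic: it handles only Allow statements, * and ? wildcards, and the s3:DataAccessPointAccount condition, and it ignores explicit denies, identity-based policies, service control policies, and Block Public Access. All names are hypothetical.

```python
import re
from dataclasses import dataclass, replace


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(pattern: str, value: str) -> bool:
    # IAM-style wildcards: "*" = any run of characters, "?" = one character.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


@dataclass(frozen=True)
class Request:
    principal: str   # IAM ARN of the caller
    action: str      # e.g. "s3:GetObject"
    bucket: str
    key: str
    ap_arn: str      # access point the request was sent through
    ap_account: str  # account that owns that access point


def _statement_allows(st, action, resource, principal=None, ap_account=None) -> bool:
    if st["Effect"] != "Allow":
        return False
    if not any(_matches(a, action) for a in _as_list(st["Action"])):
        return False
    if not any(_matches(r, resource) for r in _as_list(st["Resource"])):
        return False
    if principal is not None and not any(
            _matches(p, principal) for p in _as_list(st["Principal"]["AWS"])):
        return False
    required = st.get("Condition", {}).get("StringEquals", {}).get("s3:DataAccessPointAccount")
    return required is None or required == ap_account


def is_authorized(bucket_policy: dict, ap_policy: dict, req: Request) -> bool:
    # Access point policy: checked against "<access point ARN>/object/<key>".
    ap_ok = any(_statement_allows(st, req.action, f"{req.ap_arn}/object/{req.key}",
                                  principal=req.principal)
                for st in ap_policy["Statement"])
    # Bucket policy: checked against the object ARN, with the access point owner as context.
    bucket_ok = any(_statement_allows(st, req.action, f"arn:aws:s3:::{req.bucket}/{req.key}",
                                      ap_account=req.ap_account)
                    for st in bucket_policy["Statement"])
    return ap_ok and bucket_ok


AP_ARN = "arn:aws:s3:eu-west-3:111122223333:accesspoint/janes-researchdata-access-point"
BUCKET_POLICY = {"Statement": [{
    "Effect": "Allow", "Principal": {"AWS": "*"}, "Action": "s3:GetObject",
    "Resource": ["arn:aws:s3:::test-bucket-1/*"],
    "Condition": {"StringEquals": {"s3:DataAccessPointAccount": "111122223333"}}}]}
AP_POLICY = {"Statement": [{
    "Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::111122223333:user/Jane"},
    "Action": ["s3:GetObject"], "Resource": AP_ARN + "/object/ResearchData/*"}]}

jane = Request("arn:aws:iam::111122223333:user/Jane", "s3:GetObject", "test-bucket-1",
               "ResearchData/file.zip", AP_ARN, "111122223333")
print(is_authorized(BUCKET_POLICY, AP_POLICY, jane))                                  # True
print(is_authorized(BUCKET_POLICY, AP_POLICY, replace(jane, action="s3:PutObject")))  # False
```

Emptying BUCKET_POLICY’s statement list makes every request fail, even though the access point policy is unchanged, which mirrors the point that the bucket owner keeps ultimate control.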

Conclusion

In this blog, we discussed how to create and use cross-account access points to simplify and scale access to shared data. With cross-account S3 Access Points, customers sharing data internally across data mesh architectures, or externally through a data marketplace, can quickly scale access to a network of users. To further simplify and speed up access management, customers can set up a self-service application that automatically creates cross-account S3 Access Points in the trusted account when data consumers sign up for a shared dataset. Instead of spending months acquiring data on their own, these consumers can use shared datasets to innovate faster and generate data-driven insights that were previously more challenging to uncover. With faster access to relevant data, customers can make timely, well-informed decisions, ultimately leading to better business results.
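
As a rough illustration of such a self-service application, the following Python sketch shows what a sign-up handler running in the trusted account might do: derive an access point name, create the cross-account access point, and attach a read-only policy scoped to the dataset prefix. Everything here is hypothetical: the function names, the naming scheme, and the assumption that subscribers are IAM users in the access point account. A production version would also need error handling, idempotency, and name-collision checks. The client can be any object with the boto3 s3control methods create_access_point and put_access_point_policy.

```python
import json
import re


def access_point_name(user: str, prefix: str) -> str:
    # Hypothetical naming scheme: lowercase "<user>-<prefix>", runs of
    # non-alphanumerics collapsed to hyphens, truncated to 50 characters.
    slug = re.sub(r"[^a-z0-9]+", "-", f"{user}-{prefix}".lower()).strip("-")
    return slug[:50].rstrip("-")


def provision_subscriber_access_point(s3control, *, ap_account_id: str, region: str,
                                      bucket: str, bucket_account_id: str,
                                      subscriber_user: str, prefix: str) -> str:
    """Create a per-subscriber cross-account access point and return its ARN."""
    name = access_point_name(subscriber_user, prefix)
    s3control.create_access_point(
        AccountId=ap_account_id,            # trusted account (Account B)
        Name=name,
        Bucket=bucket,
        BucketAccountId=bucket_account_id,  # data owner (Account A)
    )
    ap_arn = f"arn:aws:s3:{region}:{ap_account_id}:accesspoint/{name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ap_account_id}:user/{subscriber_user}"},
            "Action": ["s3:GetObject"],
            "Resource": f"{ap_arn}/object/{prefix.strip('/')}/*",
        }],
    }
    s3control.put_access_point_policy(AccountId=ap_account_id, Name=name,
                                      Policy=json.dumps(policy))
    return ap_arn
```

The bucket owner’s delegation policy from Step 1 still bounds what any access point created this way can reach.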

Thanks for reading this blog. If you have any comments or questions, don’t hesitate to leave them in the comments section.