
Guidance for a Secure Data Mesh with Distributed Data Asset Ownership on AWS

Overview

This Guidance shows how you can build a data mesh architecture on AWS to implement a decentralized, domain-driven approach to data management. It gives domain teams the ownership and agility to deliver valuable data products, fostering better decision-making, personalized experiences, and operational efficiency. The Guidance shows how AWS services, users, and key resources address the data security challenges that arise from distributed, decentralized ownership in a typical data mesh design. With this Guidance, disparate data sources are united and linked through centrally managed data-sharing and governance guidelines, so you retain control over how shared data is accessed, who accesses it, and the format in which it is accessed.

How it works

Overview

This architecture diagram illustrates an overview of a data mesh design that allows for distributed data ownership and control while providing centralized data sharing and governance to address security challenges. The subsequent diagram highlights the essential AWS services used in implementing this design pattern.

Diagram illustrating a data sharing architecture with three sections: data producers (AWS accounts managing data storage and catalogs), central governance (AWS account handling data stewards, admins, access control, and audits), and data consumers (AWS accounts accessing data for search and compute).

Architecture and core AWS services

This architecture diagram shows the pivotal AWS services that allow the various components of this Guidance to function seamlessly within the data mesh architecture on AWS.

Diagram illustrating the core services of an AWS secure data mesh architecture, featuring components for data producers, central governance, and data consumers. Key AWS services include IAM Identity Center, IAM, Redshift, S3, Glue Crawler, Glue Data Catalog, DataZone, Lake Formation, KMS, Secrets Manager, CloudWatch, CloudTrail, SageMaker, QuickSight, Bedrock, Athena, and EMR. The diagram is organized into three main sections, showing service roles and interactions across producers, governance, and consumers.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

CloudWatch provides comprehensive visibility into your resources and services, enabling proactive monitoring, quick troubleshooting, and prompt incident response. CloudTrail allows you to audit your AWS account, supporting governance and compliance through detailed activity logs. Use these services to maintain the operational excellence of your architecture and respond effectively to events and incidents.

Read the Operational Excellence whitepaper
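As one illustration of this pattern, the parameters below sketch a CloudWatch alarm that fires on a custom metric derived from CloudTrail logs (for example, a metric filter counting AccessDenied errors). The namespace, metric name, account ID, and SNS topic are placeholders, not values from this Guidance.

```python
# Sketch: alarm parameters for boto3's cloudwatch.put_metric_alarm().
# All resource names here are hypothetical placeholders.
alarm_params = {
    "AlarmName": "data-mesh-unauthorized-api-calls",
    # Custom namespace/metric assumed to be populated by a CloudTrail
    # log metric filter matching AccessDenied error codes.
    "Namespace": "CloudTrailMetrics",
    "MetricName": "UnauthorizedAPICalls",
    "Statistic": "Sum",
    "Period": 300,                 # evaluate in 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 1,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    # Notify a security topic when the alarm fires (placeholder ARN).
    "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:security-alerts"],
}

# With credentials configured, this would be applied as:
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```

In practice you would pair this with a CloudTrail trail delivering logs to CloudWatch Logs, where the metric filter is defined.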

Prioritize the security of your data and resources with IAM and AWS KMS. IAM allows you to centrally manage fine-grained permissions, specifying who or what can access your AWS services and resources. AWS KMS allows you to create and manage the encryption keys used to encrypt your data at rest, preserving the confidentiality and integrity of your sensitive information.

Read the Security whitepaper
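The policy below is a minimal sketch of how these two services combine: an IAM policy granting a consumer read access to one domain's curated S3 prefix plus decrypt permission on that domain's KMS key. The bucket name, prefix, account ID, and key ID are illustrative placeholders.

```python
import json

# Sketch: least-privilege read access to a single data domain.
# Bucket, prefix, and key ARN are hypothetical examples.
domain_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadCuratedDomainData",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-sales-domain",
                "arn:aws:s3:::example-sales-domain/curated/*",
            ],
        },
        {
            "Sid": "DecryptWithDomainKey",
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        },
    ],
}

policy_document = json.dumps(domain_read_policy, indent=2)
```

The resulting `policy_document` could be attached to a consumer role; in a data mesh, Lake Formation permissions would typically layer table- and column-level grants on top of this.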

Safeguard the reliability of your data and applications with Amazon S3 and Data Catalog. Amazon S3 is designed to provide high durability and availability, automatically replicating your data across multiple Availability Zones. The Data Catalog serves as a centralized metadata repository, helping you maintain a consistent and reliable view of your data sources across different data stores.

Read the Reliability whitepaper
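To make the Data Catalog's role concrete, the structure below sketches a table definition as it might be passed to the Glue `create_table` API, registering a Parquet dataset stored in S3. The database, table, column names, and S3 location are assumptions for illustration.

```python
# Sketch: a Glue Data Catalog table definition (hypothetical names).
# This is the TableInput shape accepted by glue.create_table().
table_input = {
    "Name": "orders",
    "TableType": "EXTERNAL_TABLE",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "order_id", "Type": "string"},
            {"Name": "amount", "Type": "double"},
        ],
        # Data lives in S3; the catalog stores only metadata.
        "Location": "s3://example-sales-domain/curated/orders/",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
        },
    },
    "PartitionKeys": [{"Name": "order_date", "Type": "string"}],
}

# With credentials configured, this would be registered as:
# boto3.client("glue").create_table(
#     DatabaseName="sales_domain", TableInput=table_input)
```

In this Guidance, a Glue crawler can populate such definitions automatically; defining them explicitly shows what the crawler records on your behalf.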

Optimize the performance of your data processing and analytics with Amazon Redshift and Athena. Amazon Redshift is a fully managed, massively parallel processing (MPP) data warehouse service that helps you make fast and cost-effective business decisions. Athena, a serverless interactive query service, allows you to analyze data directly in Amazon S3 using standard SQL without the need to manage any infrastructure.

Read the Performance Efficiency whitepaper
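As a sketch of the Athena side, the parameters below show the shape of a `start_query_execution` request that aggregates the hypothetical orders table with standard SQL. The database name, query, and results bucket are placeholders.

```python
# Sketch: parameters for boto3's athena.start_query_execution().
# Database, table, and output bucket are hypothetical examples.
query_params = {
    "QueryString": (
        "SELECT order_date, SUM(amount) AS revenue "
        "FROM orders "
        "GROUP BY order_date "
        "ORDER BY order_date"
    ),
    # The database refers to a Glue Data Catalog database.
    "QueryExecutionContext": {"Database": "sales_domain"},
    # Athena writes result files to this S3 location.
    "ResultConfiguration": {
        "OutputLocation": "s3://example-athena-results/"
    },
}

# With credentials configured:
# response = boto3.client("athena").start_query_execution(**query_params)
# The returned QueryExecutionId is then polled with get_query_execution().
```

Because Athena reads directly from S3 via the Data Catalog, consumers can run such queries without producers provisioning any compute for them.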

As a fully managed, serverless service, Amazon S3 eliminates the need to provision and manage infrastructure, reducing the associated costs. Use the various storage classes offered by Amazon S3, including the Amazon S3 Intelligent-Tiering storage class, S3 Standard, S3 Standard-IA, and S3 Glacier, to match your data storage and access requirements with the most cost-effective options.

Read the Cost Optimization whitepaper
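One way to apply those storage classes is an S3 lifecycle configuration that tiers objects down as they age. The sketch below, with placeholder day thresholds and prefix, shows the structure accepted by `put_bucket_lifecycle_configuration`.

```python
# Sketch: S3 lifecycle rules tiering raw data to cheaper storage classes.
# Prefix and day thresholds are illustrative assumptions.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                # Infrequently accessed after a month.
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # Archive after a quarter.
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# With credentials configured:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-sales-domain",
#     LifecycleConfiguration=lifecycle_config)
```

For access patterns you cannot predict, S3 Intelligent-Tiering moves objects between tiers automatically instead of on a fixed schedule.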

Amazon DataZone helps reduce data redundancy, enforces data governance policies, and facilitates secure data sharing, leading to optimized storage usage and a reduced environmental impact. By centralizing your data and enabling collaborative data sharing, you can minimize the need for data duplication across your organization, contributing to a more sustainable data environment.

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.