Amazon DataZone FAQs

General

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. With Amazon DataZone, engineers, data scientists, product managers, analysts, and business users can quickly access data throughout an organization, then discover, use, and collaborate on it to derive data-driven insights. Administrators and data owners who oversee an organization's data assets can manage and govern access to that data. Amazon DataZone provides built-in workflows in which data consumers request access to data and data owners approve it.

Amazon DataZone gives data people a unified data management portal to catalog, discover, access, analyze, and govern data across the organization. This makes it easier for them to collaborate with data engineers and IT admins and to gain insights from their data faster. Through a web-based application, users can consume data assets from the business data catalog in Amazon Redshift Query Editor and Amazon Athena, so users who prefer an out-of-console experience do not need to log in to the AWS Management Console. To set up, configure, or integrate Amazon DataZone with existing processes programmatically, you can use its published APIs, which come with guidelines on how to use them.

You can use Amazon DataZone to manage data assets from AWS Lake Formation-managed AWS Glue tables and Amazon Redshift tables. Additionally, with AWS Glue connectors and its integration with Amazon AppFlow, assets from various sources can be cataloged to increase visibility across the organization. With general availability, you can also configure Amazon DataZone to catalog custom assets, where you define what the asset is.

Amazon DataZone projects are business use case–based groupings of users, data assets, and analytics tools. They provide a shared space where project members can collaborate and exchange data and artifacts. Projects are secured so that only users who are explicitly added to a project can access the data and tools within it.
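The membership rule described above can be illustrated with a small conceptual model. This is a sketch only, not Amazon DataZone code: the `Project` class, its fields, and the user and asset names are all hypothetical and exist purely to show the idea that access follows from explicit membership.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Toy model of a project: a named grouping of members and assets."""
    name: str
    members: set = field(default_factory=set)
    assets: set = field(default_factory=set)

    def add_member(self, user: str) -> None:
        # Users gain access only by being added explicitly.
        self.members.add(user)

    def can_access(self, user: str, asset: str) -> bool:
        # Both conditions must hold: the user is a member,
        # and the asset belongs to this project.
        return user in self.members and asset in self.assets


sales = Project("sales-analytics", assets={"orders_table"})
sales.add_member("alice")

print(sales.can_access("alice", "orders_table"))  # True
print(sales.can_access("bob", "orders_table"))    # False
```

In the real service, this boundary is enforced through the IAM roles and security groups that the project creates, not through an in-memory check like this one.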

When deployed, a project creates AWS Identity and Access Management (IAM) roles based on the capabilities selected for the project (for example, a data lake). These roles give users the access they need to do their jobs. Projects also provide work isolation inside the same account, as well as a security boundary (security group and IAM roles). To work with data within a project, you create environments. Environments likewise create IAM roles based on the selected tools and capabilities (for example, a data lake).

The Amazon DataZone business data catalog supports a business glossary. A business glossary is like a dictionary for an organization: it lists business terms with their definitions so that the same definitions are used organization-wide when discovering and analyzing data. The business data catalog also provides metadata forms that customize, mandate, or define additional metadata for assets, so that data people can learn about and understand an asset before using it in their analysis.

Amazon DataZone abstracts the process of sharing data between data producers and consumers by using Lake Formation constructs. It automates the fulfillment of data access to the underlying (Amazon DataZone managed) assets according to the policies applied by data publishers. Fulfillment requires neither an admin nor data movement.
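The request, approval, and automated fulfillment flow can be sketched as a small state machine. This is a conceptual illustration under stated assumptions, not the service's implementation: `ToyCatalog`, `AccessRequest`, `Status`, and all asset and project names are hypothetical, and the in-memory grant set merely stands in for the Lake Formation permissions that the service manages.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


@dataclass
class AccessRequest:
    asset: str
    consumer_project: str
    reason: str
    status: Status = Status.PENDING


class ToyCatalog:
    """Hypothetical stand-in for a catalog with publish/subscribe sharing."""

    def __init__(self):
        self._owners = {}     # asset name -> owning (publishing) project
        self._grants = set()  # (asset, consumer project) pairs with access

    def publish(self, asset, owner_project):
        self._owners[asset] = owner_project

    def request_access(self, asset, consumer_project, reason):
        if asset not in self._owners:
            raise KeyError(f"asset not published: {asset}")
        return AccessRequest(asset, consumer_project, reason)

    def decide(self, request, approver_project, approve):
        # Only the publishing project may approve or reject.
        if self._owners[request.asset] != approver_project:
            raise PermissionError("only the owning project can decide")
        if request.status is not Status.PENDING:
            raise ValueError("request already decided")
        if approve:
            request.status = Status.APPROVED
            # Fulfillment is automatic: no admin step and no data copy.
            self._grants.add((request.asset, request.consumer_project))
        else:
            request.status = Status.REJECTED

    def has_access(self, asset, project):
        return (asset, project) in self._grants


catalog = ToyCatalog()
catalog.publish("orders_table", owner_project="sales")
req = catalog.request_access("orders_table", "marketing", "campaign analysis")
catalog.decide(req, approver_project="sales", approve=True)
print(req.status.value, catalog.has_access("orders_table", "marketing"))
```

The point of the model is that approval and the grant happen in one step: once the owner approves, the consumer project has access without anyone copying data or applying permissions by hand.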

Amazon DataZone supports APIs, AWS CloudFormation, AWS Command Line Interface (AWS CLI), and AWS Cloud Development Kit (AWS CDK). For more details on API support, see the documentation.
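As a minimal sketch of programmatic use, the helper below creates a project through the AWS SDK for Python. It assumes the boto3 `datazone` client and its `create_project` operation with the `domainIdentifier`, `name`, and `description` parameters, and an `id` field in the response; verify these against the API reference before relying on them. The helper name, the domain ID, and the project name are hypothetical placeholders.

```python
def create_project(client, domain_id, name, description=None):
    """Create a DataZone project and return its ID.

    `client` is expected to behave like boto3.client("datazone").
    Passing it in keeps the helper easy to test with a stub.
    """
    params = {"domainIdentifier": domain_id, "name": name}
    if description:
        # Only send the optional field when a value was provided.
        params["description"] = description
    response = client.create_project(**params)
    return response["id"]


def main():
    # Requires boto3, AWS credentials, and an existing DataZone domain.
    import boto3

    client = boto3.client("datazone")
    # "dzd_example" is a placeholder, not a real domain ID.
    project_id = create_project(client, "dzd_example", "sales-analytics")
    print(project_id)
```

Calling `main()` makes a real API call, so run it only in an account where you intend to create the project.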