General

Q: What is Amazon DataZone?

Use Amazon DataZone to discover and share data at scale across organizational boundaries with built-in governance and access controls, making data and analytics tools accessible to everyone in the organization. The service helps business and data teams work with data faster, gain insights, and make informed decisions grounded in their own data. With Amazon DataZone, anyone in your organization can discover new and existing data from a personalized web application, without needing expertise in the underlying AWS data services.

Q: When should I use Amazon DataZone?

Use Amazon DataZone when you need a streamlined way to search for data. With the Amazon DataZone catalog, you can make data visible with business context so that users can find and understand it quickly. Amazon DataZone projects simplify access to AWS analytics tools by creating business use case–based groupings of teams, analytics tools, and data assets. With the Amazon DataZone automated publish/subscribe workflow, you can manage data ownership and secure the data shared between producers and consumers, so that people with the right permissions can access the right data for the right purpose. Use Amazon DataZone to apply federated data governance by letting those who know the data enforce security and access controls on relevant data assets.

Amazon DataZone components

Q: What are the main components of Amazon DataZone?

Amazon DataZone includes four main components:

  1. Organization-wide catalog. Make data visible with business context for everyone to find and understand data quickly. Catalog data across the organization, in Azure or Google Cloud, on-premises RDBMS databases, or SaaS applications like Salesforce, Google Analytics, and SAP, with rich metadata and business context.
  2. Publish/subscribe workflow with access management. Use the automated workflow to better secure data between producers and consumers and to make sure that you have access to the right data for the right purpose. Streamline auditing of who is using each dataset and for what business use cases, and monitor usage and costs across projects and lines of business (LOBs).
  3. Projects. Simplify access to AWS analytics by creating business use case–based groupings of people, data assets, and analytics tools. Amazon DataZone projects provide a space where members of the project are able to collaborate, exchange data, and share artifacts. Projects are better secured so that only those who are explicitly added to the project are able to access the data and analytics tools within it. Projects manage the ownership of data assets produced in accordance with policies applied by data stewards, decentralizing data ownership through federated governance.
  4. Portal (outside the AWS Management Console). The Amazon DataZone portal is an integrated data experience to promote exploration and drive innovation with a personalized homepage. The portal is an out-of-console experience that facilitates streamlined cross-functional collaboration while working with data and analytics tools in a self-service fashion. It uses existing credentials from your identity provider.
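
The way the catalog, the publish/subscribe workflow, and projects fit together can be sketched as a toy model. This is only an illustration of the concepts described above, not the Amazon DataZone API: the class names (Project, Catalog), the method names, and the rule that an approver must be a member of the owning project are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project: a use case-based grouping of people and assets."""
    name: str
    members: set[str] = field(default_factory=set)
    granted_assets: set[str] = field(default_factory=set)


@dataclass
class Catalog:
    """Hypothetical catalog: maps each published asset to its owning project."""
    owners: dict[str, Project] = field(default_factory=dict)

    def publish(self, asset: str, producer: Project) -> None:
        # The producing project owns the asset and can read it.
        self.owners[asset] = producer
        producer.granted_assets.add(asset)

    def subscribe(self, asset: str, consumer: Project, approver: str) -> bool:
        # Assumed rule: only a member of the owning project may approve.
        owner = self.owners.get(asset)
        if owner is None or approver not in owner.members:
            return False
        consumer.granted_assets.add(asset)
        return True


def can_access(user: str, project: Project, asset: str) -> bool:
    # A user reaches an asset only through a project they were added to.
    return user in project.members and asset in project.granted_assets


sales = Project("sales", members={"ana"})
marketing = Project("marketing", members={"bo"})
catalog = Catalog()
catalog.publish("orders", sales)
approved = catalog.subscribe("orders", marketing, approver="ana")
```

In this sketch, "bo" can read "orders" only after "ana", a member of the producing project, approves the subscription, which mirrors the idea of letting those who know the data control access to it.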

Q: What are Amazon DataZone domains?

With domains, you can more securely organize resources aligned to business-driven domains, such as LOBs. You have the flexibility to reflect your organization’s hierarchy through this scalable structure. Domains are a scalable container for you, your team, and related Amazon DataZone entities, including data assets and analytics tools such as the Amazon Athena and Amazon Redshift query editors. You can publish a data asset in the catalog with a particular domain that governs the data. You can then control which associated AWS accounts and resources can access that domain.

Q: How does Amazon DataZone support and integrate with other AWS services?

Amazon DataZone supports three types of integrations with other AWS services:

  1. Producer data sources. You can locally publish structured data assets, including XML, JSON, and CSV, to the Amazon DataZone catalog from the data stored on AWS Glue Data Catalog and Amazon Redshift tables and views. You can also use AWS Glue to ingest data into your AWS Glue Data Catalog from other sources. These sources include Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Relational Database Service (RDS), and SaaS providers—such as Salesforce, SAP, and Google Analytics—through Amazon AppFlow.
  2. Consumer data sources. You can access data assets using Amazon Athena, Amazon Redshift query editor, and Amazon Redshift Spectrum.
  3. Access fulfillment. Amazon DataZone can manage permissions for AWS Lake Formation-managed AWS Glue tables and for Amazon Redshift tables and views. Additionally, Amazon DataZone sends standard events related to your actions to Amazon EventBridge. You can use these standard events to integrate with other AWS services or third-party solutions for custom integrations.
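
As a rough illustration of the EventBridge integration, a custom integration would typically start from an event pattern that selects Amazon DataZone events. The sketch below builds such a pattern and checks a sample event against it with a deliberately simplified matcher. The source value "aws.datazone", the detail-type string, and the sample event fields are assumptions for illustration; confirm the actual values in the Amazon DataZone and Amazon EventBridge documentation.

```python
import json

# Assumed values: verify the real source and detail-type strings
# in the service documentation before relying on them.
DATAZONE_PATTERN = {
    "source": ["aws.datazone"],
    "detail-type": ["Subscription Request Created"],
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: supports exact-value lists and nested objects only.

    Real EventBridge patterns support more operators (prefix, numeric, etc.).
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event shaped like a standard EventBridge envelope.
sample_event = {
    "source": "aws.datazone",
    "detail-type": "Subscription Request Created",
    "detail": {"assetName": "orders"},
}

# Event patterns are supplied to EventBridge rules as a JSON string.
pattern_json = json.dumps(DATAZONE_PATTERN)
is_match = matches(DATAZONE_PATTERN, sample_event)
```

Testing a pattern locally against sample events like this can catch typos in field names before a rule is created.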

Q: Which Regions are supported for preview?

At preview, the Amazon DataZone root domain can only be provisioned in the AWS Regions of US East (N. Virginia), US West (Oregon), or Europe (Ireland). AWS IAM Identity Center, which is the successor to AWS Single Sign-On, must be configured in the same Region as the root domain. You can publish and consume data from any AWS Region. For example, you can publish data from an AWS Glue Data Catalog in any Region, and use Amazon Athena, Amazon Redshift, AWS Glue, or other AWS services in any other Region.
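
The two Region constraints above can be expressed as a small pre-flight check. This is a minimal sketch: the function name and message wording are hypothetical, and the Region codes are the standard identifiers for the three Regions listed (us-east-1, us-west-2, eu-west-1).

```python
# Region codes for US East (N. Virginia), US West (Oregon), Europe (Ireland).
ROOT_DOMAIN_REGIONS = {"us-east-1", "us-west-2", "eu-west-1"}


def check_preview_setup(root_domain_region: str,
                        identity_center_region: str) -> list[str]:
    """Return a list of problems; an empty list means the setup looks valid."""
    problems = []
    if root_domain_region not in ROOT_DOMAIN_REGIONS:
        problems.append(
            f"Root domain Region {root_domain_region} is not supported at preview."
        )
    if identity_center_region != root_domain_region:
        problems.append(
            "IAM Identity Center must be in the same Region as the root domain."
        )
    return problems


ok = check_preview_setup("eu-west-1", "eu-west-1")
mismatch = check_preview_setup("us-west-2", "us-east-1")
```

The check covers only the root domain and IAM Identity Center; the Regions where data is published or consumed are not restricted, so they are left out.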

Q: How do I get started with Amazon DataZone?

Sign up for the preview. When the Amazon DataZone preview opens in early 2023, you can access Amazon DataZone from the AWS Management Console.