Amazon DataZone introduces API-driven, OpenLineage-compatible data lineage visualization in preview

Posted on: Jun 27, 2024

Amazon DataZone introduces data lineage in preview, helping customers visualize lineage events from OpenLineage-enabled systems or sent through the API, and trace data movement from source to consumption. Amazon DataZone is a data management service that lets customers catalog, discover, share, and govern data at scale across organizational boundaries, with governance and access controls.

Amazon DataZone's data lineage feature captures and visualizes the transformations of data assets and columns, showing how data moves from source to consumption. Using Amazon DataZone's OpenLineage-compatible API, domain administrators and data producers can capture and store lineage events beyond what is already available in Amazon DataZone, including transformations in Amazon S3, AWS Glue, and other services. Data consumers in Amazon DataZone can gain confidence in an asset's origin from a complete view of its lineage, while data producers can assess the impact of changes to an asset by understanding how it is consumed. Additionally, Amazon DataZone versions lineage with each event, so users can visualize lineage at any point in time or compare transformations across an asset's or job's history. This historical lineage shows how data has evolved, which is essential for troubleshooting, auditing, and validating the integrity of data assets.
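To make the API path more concrete, here is a minimal sketch of building an OpenLineage RunEvent and handing it to Amazon DataZone. It assumes the standard OpenLineage RunEvent fields (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) and the boto3 `datazone` client's `post_lineage_event` operation, which takes a domain identifier and the event as bytes. The job and dataset names, the producer URI, the spec version in `schemaURL`, and the domain ID are hypothetical placeholders, not values from this announcement. Actually sending the event requires boto3 and AWS credentials.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical placeholders: replace with your own pipeline's identity and spec version.
PRODUCER_URI = "https://example.com/hypothetical-pipeline"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def build_run_event(job_namespace, job_name, inputs, outputs, event_type="COMPLETE"):
    """Build an OpenLineage RunEvent as a plain dict.

    inputs/outputs are lists of (namespace, name) tuples identifying datasets.
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},  # each run gets a unique UUID
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER_URI,
        "schemaURL": SCHEMA_URL,
    }


def post_to_datazone(domain_id, event):
    """Send one lineage event to an Amazon DataZone domain (needs AWS credentials)."""
    import boto3  # imported lazily so building events does not require boto3

    client = boto3.client("datazone")
    return client.post_lineage_event(
        domainIdentifier=domain_id,
        event=json.dumps(event).encode("utf-8"),
    )


if __name__ == "__main__":
    # Hypothetical Glue job reading raw S3 data and writing a curated table.
    event = build_run_event(
        job_namespace="glue://example-account",
        job_name="orders_etl",
        inputs=[("s3://raw-bucket", "orders")],
        outputs=[("s3://curated-bucket", "orders_clean")],
    )
    print(json.dumps(event, indent=2))
    # post_to_datazone("dzd_exampledomainid", event)  # hypothetical domain ID
```

Keeping event construction separate from the network call makes it easy to inspect or validate events locally before they are sent. Because Amazon DataZone versions lineage with each event, emitting one event per job run is what lets you later compare a job's transformations over time.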

Amazon DataZone data lineage is available for preview in all AWS Regions where Amazon DataZone is available.

To learn more, visit Amazon DataZone, read the AWS News Blog, and get started with the data lineage documentation.