Posted On: Dec 1, 2021

Amazon SageMaker now offers enhancements to its machine learning (ML) lineage tracking capability, which enables customers to track and query the lineage of artifacts such as data, features, and models across an ML workflow. Customers can now retrieve the end-to-end lineage graph spanning the entire workflow, from data preparation to model deployment, through a single query. This eliminates the undifferentiated heavy lifting of retrieving lineage information one workflow step at a time and manually stitching the results together. Customers can also retrieve lineage information for segments of the workflow by defining a step as the focal point and querying the lineage of the steps that are upstream or downstream of that focal point. For instance, customers can define a model as the focal entity and retrieve the location of the raw dataset from which features were extracted to train that model.
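As a rough illustration of the focal-point query described above, the sketch below calls the SageMaker `QueryLineage` API through a boto3-style client and walks upstream from a model artifact. The helper names `build_lineage_query` and `find_upstream_datasets` are hypothetical, and filtering on a vertex `Type` of `DataSet` is an assumption about how the data artifacts were registered in your account.

```python
from typing import Any, Dict, List

VALID_DIRECTIONS = {"Ascendants", "Descendants", "Both"}


def build_lineage_query(start_arn: str, direction: str = "Ascendants",
                        max_depth: int = 10) -> Dict[str, Any]:
    """Build keyword arguments for a QueryLineage call (hypothetical helper).

    "Ascendants" walks upstream of the focal entity, "Descendants" walks
    downstream, and "Both" walks in both directions.
    """
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction}")
    return {
        "StartArns": [start_arn],   # the focal entity
        "Direction": direction,
        "IncludeEdges": True,       # also return the associations between vertices
        "MaxDepth": max_depth,
    }


def find_upstream_datasets(sagemaker_client: Any, model_arn: str) -> List[str]:
    """Return ARNs of upstream vertices whose Type is "DataSet".

    Assumption: data artifacts carry the type "DataSet"; adjust the check
    if your artifacts use a different type name.
    """
    params = build_lineage_query(model_arn, "Ascendants")
    dataset_arns: List[str] = []
    while True:
        response = sagemaker_client.query_lineage(**params)
        for vertex in response.get("Vertices", []):
            if vertex.get("Type") == "DataSet":
                dataset_arns.append(vertex["Arn"])
        token = response.get("NextToken")
        if not token:
            return dataset_arns
        params["NextToken"] = token  # fetch the next page of results
```

With boto3 installed and credentials configured, this could be invoked as `find_upstream_datasets(boto3.client("sagemaker"), model_artifact_arn)`, where `model_artifact_arn` is the lineage artifact ARN of the model. Passing the client in as an argument keeps the helper easy to exercise against a stub.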

The new feature also enables tracking lineage information for workflow steps that span multiple AWS accounts. Creating separate accounts for different personas (data scientists, ML engineers, etc.) to organize an organization's resources is a common DevOps practice. To enable this feature, customers can share lineage resources across AWS accounts using AWS Resource Access Manager (AWS RAM). AWS RAM helps reduce operational overhead and provides visibility into shared resources. Once configured, customers can use the lineage querying API to track relationships between artifacts that span multiple AWS accounts.
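The sharing step might look roughly like the sketch below, which uses the AWS RAM `CreateResourceShare` API through a boto3-style client. The announcement says only "lineage resources", so treating a SageMaker lineage group ARN as the shared resource is an assumption here, and the helper names `build_share_request` and `share_lineage_group` are hypothetical. Additional permissions or policies may be needed in practice.

```python
from typing import Any, Dict, Iterable


def build_share_request(share_name: str, lineage_group_arn: str,
                        consumer_account_ids: Iterable[str]) -> Dict[str, Any]:
    """Build keyword arguments for a RAM CreateResourceShare call.

    Assumption: the lineage resource being shared is identified by
    lineage_group_arn; the source does not name the exact resource type.
    """
    principals = list(consumer_account_ids)
    if not principals:
        raise ValueError("at least one consumer account ID is required")
    return {
        "name": share_name,
        "resourceArns": [lineage_group_arn],
        "principals": principals,  # AWS account IDs that receive access
    }


def share_lineage_group(ram_client: Any, share_name: str, lineage_group_arn: str,
                        consumer_account_ids: Iterable[str]) -> str:
    """Create the resource share and return its ARN."""
    request = build_share_request(share_name, lineage_group_arn, consumer_account_ids)
    response = ram_client.create_resource_share(**request)
    return response["resourceShare"]["resourceShareArn"]
```

With boto3, the client would be created with `boto3.client("ram")` in the account that owns the lineage resources. Once the share is accepted, lineage queries issued from a consumer account can traverse the shared entities.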

The ML lineage information can be used to improve model governance, reproduce previous versions of artifacts, or troubleshoot workflows more efficiently. To get started, train a new ML model using SageMaker Studio or the SageMaker SDK, then use the lineage querying APIs to track lineage information. To learn more, visit our documentation pages on cross-account, graph-based lineage tracking and the lineage querying API.