Introducing Amazon SageMaker Pipelines, the first purpose-built CI/CD service for machine learning

Posted On: Dec 8, 2020

We’re excited to announce Amazon SageMaker Pipelines, a new capability of Amazon SageMaker to build, manage, automate, and scale end-to-end machine learning workflows. SageMaker Pipelines brings automation and orchestration to ML workflows, enabling you to accelerate machine learning projects and scale up to thousands of models in production.

Machine learning is an iterative process that requires collaboration among stakeholders such as data engineers, data scientists, ML engineers, and DevOps engineers. Building a scalable model building process is challenging because the number of steps across data preparation, feature engineering, training, and model evaluation can grow large, which makes data dependencies harder to manage. As the number of models rises, managing model versions and deploying them to production requires automation that is easy to use and scales. Finally, tracking lineage across the end-to-end pipeline requires custom tooling to track data, model artifacts, and actions.

Amazon SageMaker Pipelines enables data science and engineering teams to collaborate on ML projects and to streamline building, automating, and scaling end-to-end ML workflows. The Amazon SageMaker SDK makes it easy to construct model building pipelines by defining parameters and steps, which can include Amazon SageMaker Data Wrangler, Processing, Training, Batch Transform, conditional evaluation, and registering models to the central model registry. Once a pipeline is built, Amazon SageMaker takes care of executing it, and you can view pipeline executions along with real-time metrics and logs for each step in Amazon SageMaker Studio. Models are registered to the new Amazon SageMaker model registry, which automatically versions new models generated from pipelines and offers built-in approval workflows for selecting which models are deployed to production.
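To make the shape of such a pipeline concrete, here is a minimal, self-contained Python sketch of the ideas in the paragraph above: parameters, ordered steps, a conditional evaluation, and a registry that versions models automatically and gates deployment on approval. It deliberately does not use the SageMaker SDK. Every name in it (`ToyPipeline`, `ToyRegistry`, `Step`, `min_accuracy`, the `s3://example-bucket/...` path) is hypothetical, and the training and evaluation steps are stand-ins that return fixed values.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical toy model of a pipeline; this is NOT the SageMaker SDK API.


@dataclass
class ToyRegistry:
    """Versions registered models and tracks an approval status for each."""
    versions: List[Dict[str, Any]] = field(default_factory=list)

    def register(self, artifact: str, metrics: Dict[str, float]) -> int:
        version = len(self.versions) + 1  # version numbers are assigned automatically
        self.versions.append({"version": version, "artifact": artifact,
                              "metrics": metrics, "status": "Pending"})
        return version

    def approve(self, version: int) -> None:
        self.versions[version - 1]["status"] = "Approved"

    def deployable(self) -> List[int]:
        # Only approved versions are candidates for production.
        return [v["version"] for v in self.versions if v["status"] == "Approved"]


@dataclass
class Step:
    name: str
    run: Callable[[Dict[str, Any]], Any]  # receives the shared execution context
    condition: Optional[Callable[[Dict[str, Any]], bool]] = None


@dataclass
class ToyPipeline:
    parameters: Dict[str, Any]
    steps: List[Step]

    def execute(self, **overrides: Any) -> Dict[str, Any]:
        # Parameters have defaults and can be overridden per execution.
        ctx: Dict[str, Any] = {**self.parameters, **overrides}
        ctx["executed"] = []
        for step in self.steps:
            if step.condition is not None and not step.condition(ctx):
                continue  # conditional evaluation: skip the step when the check fails
            ctx[step.name] = step.run(ctx)
            ctx["executed"].append(step.name)
        return ctx


registry = ToyRegistry()

pipeline = ToyPipeline(
    parameters={"input_data": "s3://example-bucket/raw.csv", "min_accuracy": 0.8},
    steps=[
        Step("process", lambda c: c["input_data"] + ".processed"),
        Step("train", lambda c: "model-from-" + c["process"]),
        Step("evaluate", lambda c: {"accuracy": 0.9}),  # fixed stand-in metric
        Step("register",
             lambda c: registry.register(c["train"], c["evaluate"]),
             condition=lambda c: c["evaluate"]["accuracy"] >= c["min_accuracy"]),
    ],
)

first = pipeline.execute()                     # 0.9 >= 0.8, so the model is registered
second = pipeline.execute(min_accuracy=0.95)   # 0.9 < 0.95, so registration is skipped
registry.approve(first["register"])

print(first["executed"])      # ['process', 'train', 'evaluate', 'register']
print(second["executed"])     # ['process', 'train', 'evaluate']
print(registry.deployable())  # [1]
```

In this sketch the same pipeline definition is executed twice with different parameter values, and only the run that passes the accuracy check produces a registered model version. A real pipeline would define its steps with the SDK's own step types rather than plain callables.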

Amazon SageMaker Pipelines applies the DevOps best practices of continuous integration and continuous delivery (CI/CD) to machine learning, a practice known as MLOps, to automate and scale ML model building and deployment pipelines. It provides built-in MLOps templates so you can get started with CI/CD for ML projects, and it also lets you use custom MLOps templates. As a result, you can quickly scale your ML pipelines without relying on manual processes, and better ensure code consistency, integration and unit testing, and reliable model updates in production. Finally, Amazon SageMaker Pipelines automatically tracks lineage for each step of your ML pipeline, which may help with governance and audit requirements without the need to build custom tooling.
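As an illustration of what per-step lineage tracking means, the sketch below records the inputs and outputs of each step and then walks backward from an artifact to everything it was derived from. It is a conceptual example only, not the SageMaker lineage API; `LineageTracker`, its methods, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical illustration of lineage tracking; not the SageMaker lineage API.


@dataclass
class LineageTracker:
    # For each output artifact: the step that produced it and that step's inputs.
    produced_by: Dict[str, str] = field(default_factory=dict)
    inputs_of: Dict[str, List[str]] = field(default_factory=dict)

    def record(self, step: str, inputs: List[str], outputs: List[str]) -> None:
        for artifact in outputs:
            self.produced_by[artifact] = step
            self.inputs_of[artifact] = list(inputs)

    def upstream(self, artifact: str) -> Set[str]:
        """Return every artifact the given artifact was derived from."""
        seen: Set[str] = set()
        stack = list(self.inputs_of.get(artifact, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.inputs_of.get(current, []))
        return seen


tracker = LineageTracker()
tracker.record("process", ["raw.csv"], ["train.csv", "test.csv"])
tracker.record("train", ["train.csv"], ["model.tar.gz"])
tracker.record("evaluate", ["model.tar.gz", "test.csv"], ["report.json"])

print(tracker.produced_by["model.tar.gz"])      # train
print(sorted(tracker.upstream("model.tar.gz")))  # ['raw.csv', 'train.csv']
print(sorted(tracker.upstream("report.json")))
# ['model.tar.gz', 'raw.csv', 'test.csv', 'train.csv']
```

An audit question such as "which data produced this model?" then becomes a lookup over the recorded graph instead of a manual reconstruction.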

Amazon SageMaker Pipelines is now generally available in all commercial AWS Regions where Amazon SageMaker is available. The MLOps capabilities of Amazon SageMaker Pipelines are available only in the AWS Regions where AWS CodePipeline is also available. Read the documentation for more information and for sample notebooks. To learn how to use the feature, visit the blog post.
