ML Ops with Amazon SageMaker and Kubernetes

Simplify Kubernetes-based machine learning with Amazon SageMaker

Kubernetes is an open source system used to automate the deployment, scaling, and management of containerized applications. Kubeflow Pipelines is a workflow manager that offers an interface to manage and schedule machine learning (ML) workflows on a Kubernetes cluster. Open source tools offer flexibility and standardization, but it takes time and effort to set up infrastructure, provision notebook environments for data scientists, and stay up to date with the latest deep learning framework versions.

Amazon SageMaker Operators for Kubernetes and Components for Kubeflow Pipelines enable the use of fully managed SageMaker machine learning tools across the ML workflow natively from Kubernetes or Kubeflow. This eliminates the need for you to manually manage and optimize your Kubernetes-based ML infrastructure while still preserving control over orchestration and flexibility.
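To make "natively from Kubernetes" concrete, here is a minimal sketch of what a SageMaker training job might look like when expressed as a Kubernetes custom resource. It assumes the operator's `TrainingJob` resource under `sagemaker.aws.amazon.com/v1`; the field names follow the operator's published examples as best recalled and should be checked against the resource definitions installed in your cluster. The helper function, job name, image URI, role ARN, bucket, and instance type are all hypothetical placeholders.

```python
import json


def build_training_job_manifest(name, image, role_arn, region, s3_output_path,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Return a dict describing a TrainingJob custom resource.

    Hypothetical helper for illustration only. The field names are an
    assumption based on the operator's sample manifests; verify them
    against the resource definitions installed in your cluster.
    """
    return {
        "apiVersion": "sagemaker.aws.amazon.com/v1",
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": {
            "roleArn": role_arn,
            "region": region,
            "algorithmSpecification": {
                "trainingImage": image,
                "trainingInputMode": "File",
            },
            "outputDataConfig": {"s3OutputPath": s3_output_path},
            "resourceConfig": {
                "instanceCount": instance_count,
                "instanceType": instance_type,
                "volumeSizeInGB": 5,
            },
            "stoppingCondition": {"maxRuntimeInSeconds": 3600},
        },
    }


# Every value below is a placeholder, not a real account or resource.
manifest = build_training_job_manifest(
    name="example-training-job",
    image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-training-image:latest",
    role_arn="arn:aws:iam::123456789012:role/my-sagemaker-execution-role",
    region="us-west-2",
    s3_output_path="s3://my-bucket/output",
)

# Serialize to JSON, a format kubectl accepts for manifests.
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saved to a file, the printed JSON could be submitted with `kubectl apply -f`, after which the operator would be responsible for creating and tracking the corresponding job in SageMaker.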

Video: Scaling Machine Learning on Kubernetes and Kubeflow with Amazon SageMaker (48:26)


Simplify infrastructure setup

Amazon SageMaker Operators and Components eliminate the need to set up your own Kubernetes environment for machine learning by automatically provisioning the necessary resources, complete with autoscaling, based on your desired Amazon EC2 instance type.

Focus on innovation

With Amazon SageMaker Operators and Components, your teams get the latest deep learning framework versions, hyperparameter tuning tools, and other utilities such as reusable algorithms and AutoML, without constant updates and installs.

Provision quickly

Amazon SageMaker Studio and SageMaker Notebooks allow you to quickly provision development environments including Jupyter Notebooks, job management tools, and Python libraries for your data science teams working on Kubernetes-based ML platforms.

How it works

  • Amazon SageMaker Operators for Kubernetes
  • Amazon SageMaker Components for Kubeflow Pipelines

Use cases

Hybrid ML workflows

Sometimes, portions of the ML workflow need to take place on premises to accommodate constraints such as local data requirements, while other parts of the workflow, such as inference, can take place in the cloud. Amazon SageMaker Operators and Components connect on-premises infrastructure to the cloud so you can use fully managed ML services wherever the workflow allows.

Open source ML platforms

Many teams choose to build ML platforms on open source for flexibility and portability across environments. However, running open source platforms requires configuring Kubernetes settings. Amazon SageMaker Operators and Components let you keep your open source ML platform while using the cloud for the parts of the ML workflow where it makes sense for your business needs.

Business continuity

Significant time and effort go into thoughtfully constructing Kubernetes environments to meet business needs. Amazon SageMaker Operators and Components let you use a fully managed cloud service while continuing to use existing ML platforms built on Kubernetes or Kubeflow.

Customer stories


Cisco’s AI team built a hybrid cloud implementation using Kubeflow Pipelines so that it can adhere to local data requirements. Cisco trains its models on premises using its own hardware, then deploys them to AWS and performs inference using Amazon SageMaker, reducing its ML lifecycle TCO by 50%.


Bayer Crop Science

Bayer Crop Science applies ML to monitor test plots to evaluate the performance of potential new products, but the analytical models used can require effort to train and use correctly. Using Amazon SageMaker with Kubeflow Pipelines, Bayer has created reproducible templates for analytical model training to help improve data science across the organization.



“At iRobot, we use machine learning to build experiences that free customers of daily cleaning while they live and work at home. To achieve enterprise scale, our solutions are developed using Kubeflow on AWS. We love how we are able to run machine learning data processing, training, and validation pipelines with production-grade security and ability to scale. We’re excited by the direction of Amazon SageMaker making running Kubeflow on AWS even more seamlessly, allowing iRobot to deliver delightful experiences to our customers such as the iRobot Genius Home Intelligence platform.”

- Danielle Dean, PhD, Technical Director of ML


Video: Scaling MLOps on Kubernetes with Amazon SageMaker Operators (28:20)

Introducing Amazon SageMaker Components for Kubeflow Pipelines
By Shashank Prasanna and Alex Chung
June 2020