Amazon SageMaker Documentation

Amazon SageMaker is a fully managed service with features that help developers and data scientists prepare, build, train, and deploy machine learning (ML) models. SageMaker removes much of the heavy lifting from each step of the ML process so that you can develop high-quality models. It provides the components used for machine learning in a single toolset to help models get to production faster, with less effort and at lower cost.

Automatic Model Tuning

Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.
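The loop below is a minimal sketch of this process. It runs a fixed number of trials, samples each hyperparameter from its range, and keeps the configuration with the best objective metric. It uses plain random search rather than the strategies SageMaker offers. The `validation_score` objective and the `learning_rate` and `max_depth` names are hypothetical stand-ins for a real training job and metric.

```python
import random


def tune(objective, ranges, n_trials=20, seed=0):
    """Random-search tuning loop; returns (best_score, best_params), higher is better."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        # Each trial samples every hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = objective(params)  # stands in for one training job plus its metric
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical objective metric that peaks at learning_rate=0.1, max_depth=6.
def validation_score(params):
    return -((params["learning_rate"] - 0.1) ** 2) - 0.01 * (params["max_depth"] - 6) ** 2


RANGES = {"learning_rate": (0.01, 0.3), "max_depth": (2, 10)}  # max_depth kept continuous for brevity
best_score, best_params = tune(validation_score, RANGES, n_trials=50)
```

Random search gives no guidance between trials. That limitation is what the Bayesian and Hyperband strategies described in this section address.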

SageMaker Automatic Model Tuning works with default configuration settings that simplify implementation, and it removes the need to provision hardware, install the right software, and download the training data. You can save time and money by selecting the right compute infrastructure and by using experiment management tools to manage the training jobs that hyperparameter tuning launches.

SageMaker Automatic Model Tuning can scale to run multiple tuning jobs in parallel, use distributed clusters of compute instances, and support large volumes of data. It includes a failure-resistant workflow with built-in retry mechanisms for robustness.

SageMaker Automatic Model Tuning can save compute time and cost with early stopping. This feature uses information from previously evaluated configurations to predict whether a specific candidate is promising and, if it is not, stops its evaluation. In addition, SageMaker warm start can accelerate hyperparameter tuning and reduce the cost of tuning models. You start a new hyperparameter tuning job based on selected parent jobs, so that the training jobs run in those parent jobs are reused as prior knowledge.
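The check below sketches a median stopping rule, which is one common way to decide whether a running training job is still promising. The job is stopped if its latest objective value is worse than the median of earlier jobs' running averages at the same epoch. This is a simplified illustration, not SageMaker's implementation. The function name and the curve format (one objective value per epoch, higher is better) are assumptions.

```python
from statistics import mean, median


def should_stop(current_curve, previous_curves):
    """Median stopping rule (higher metric is better).

    current_curve: objective values of the running job, one per epoch so far.
    previous_curves: objective curves of earlier training jobs.
    """
    epoch = len(current_curve)
    # Running average of each earlier job up to the same epoch.
    running_avgs = [mean(c[:epoch]) for c in previous_curves if len(c) >= epoch]
    if not running_avgs:
        return False  # nothing to compare against yet
    return current_curve[-1] < median(running_avgs)


previous = [[0.6, 0.7, 0.8], [0.5, 0.65, 0.75], [0.55, 0.6, 0.7]]
print(should_stop([0.3, 0.35], previous))  # lagging job: candidate for early stopping
```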

SageMaker’s warm start can help you run your tuning jobs iteratively and improve model accuracy by seeding your tuning job with the hyperparameter evaluations from previous tuning tasks.

SageMaker Automatic Model Tuning offers a Bayesian search strategy designed to find the best model in the shortest time. It starts with a random search and then learns how the model behaves with respect to hyperparameter values. In subsequent steps, it uses this knowledge to choose hyperparameter values that are likely to improve the objective metric. When choosing the hyperparameters for the next training job, it considers everything it has learned about the problem so far, which lets the algorithm build on the best-known results.

SageMaker Automatic Model Tuning also supports Hyperband, a search strategy that can find the optimal set of hyperparameters faster than Bayesian search for large-scale models, such as deep neural networks for computer vision problems. Hyperband is a multi-fidelity tuning strategy. It uses both intermediate and final results of training jobs to dynamically reallocate resources to promising hyperparameter configurations, and it automatically stops underperforming training jobs.
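The following sketch implements the Hyperband algorithm as described in its original paper (Li et al.). It runs several brackets of successive halving, and each bracket trades off the number of configurations against the training resource (for example, epochs) that each one receives. This is not SageMaker's implementation. `sample_config`, `evaluate`, and the defaults `max_resource=27` and `eta=3` are illustrative assumptions, and lower loss is treated as better.

```python
import random


def hyperband(sample_config, evaluate, max_resource=27, eta=3, seed=0):
    """Return the best (loss, config) found by Hyperband; lower loss is better.

    sample_config(rng) -> a random hyperparameter configuration.
    evaluate(config, resource) -> validation loss after training with `resource` units.
    """
    rng = random.Random(seed)
    # Largest s with eta**s <= max_resource (integer loop avoids float log issues).
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    budget = (s_max + 1) * max_resource
    best = (float("inf"), None)
    # Bracket s starts many configurations cheaply (large s) or few at full resource (s = 0).
    for s in range(s_max, -1, -1):
        n = -(-(budget * eta ** s) // (max_resource * (s + 1)))  # ceiling division
        configs = [sample_config(rng) for _ in range(n)]
        # Successive halving: train survivors longer, keep the best 1/eta each round.
        for i in range(s + 1):
            n_i = n // eta ** i
            r_i = max_resource * eta ** i / eta ** s
            scored = sorted((evaluate(c, r_i), k) for k, c in enumerate(configs))
            if scored[0][0] < best[0]:
                best = (scored[0][0], configs[scored[0][1]])
            configs = [configs[k] for _, k in scored[: max(1, n_i // eta)]]
    return best
```

With the defaults, the first bracket starts 27 configurations at 1 unit of resource and narrows them to a single configuration trained at 27 units. The last bracket trains 4 configurations at the full 27 units from the start. This is the mechanism by which Hyperband stops underperforming jobs early.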

SageMaker Automatic Model Tuning works with the SageMaker built-in algorithms, including tree-based models such as XGBoost and neural network–based forecasting models such as DeepAR. It also works with scikit-learn models, and with algorithms and deep learning models that you bring yourself.

To save development time and effort, you can add a model tuning step in your SageMaker Pipelines workflow that will automatically invoke a hyperparameter tuning job as part of the model building workflow, without requiring custom integration code.

SageMaker Automatic Model Tuning is integrated into SageMaker JumpStart, which provides one-click fine-tuning and deployment of a variety of pretrained models across ML tasks, algorithms, and solutions for common business problems. It is also integrated into SageMaker Autopilot, where hyperparameter optimization (HPO) mode finds the best version of a model. HPO mode selects the algorithms that are most relevant to your dataset and the best ranges of hyperparameters to tune. It then runs up to 100 trials to find the optimal hyperparameter settings within those ranges.

Amazon SageMaker Autopilot

Amazon SageMaker Autopilot automates the process of building, training, and tuning the best machine learning models based on your data, while allowing you to maintain full control and visibility. 

You can use Amazon SageMaker Autopilot even when you have missing data. SageMaker Autopilot fills in the missing data, provides statistical insights about columns in your dataset, and automatically extracts information from non-numeric columns, such as date and time information from timestamps.
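As an illustration of this kind of preprocessing, the sketch below fills missing numeric values with the column median. It also expands an ISO 8601 timestamp into year, month, day-of-week, and hour features. The column names, the median strategy, and the chosen date parts are assumptions for illustration. Autopilot's actual imputation and feature extraction may differ.

```python
from datetime import datetime
from statistics import median


def prepare_rows(rows, numeric_col, timestamp_col):
    """Median-impute one numeric column and expand a timestamp column into parts."""
    observed = [r[numeric_col] for r in rows if r[numeric_col] is not None]
    fill = median(observed)
    prepared = []
    for row in rows:
        new = dict(row)
        if new[numeric_col] is None:
            new[numeric_col] = fill
        ts = datetime.fromisoformat(new.pop(timestamp_col))
        # day_of_week follows datetime.weekday(): Monday = 0, Sunday = 6.
        new.update(year=ts.year, month=ts.month, day_of_week=ts.weekday(), hour=ts.hour)
        prepared.append(new)
    return prepared


rows = [
    {"amount": 10.0, "ts": "2023-03-06T14:30:00"},
    {"amount": None, "ts": "2023-03-07T09:00:00"},
    {"amount": 30.0, "ts": "2023-03-11T22:15:00"},
]
prepared = prepare_rows(rows, "amount", "ts")
```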

Amazon SageMaker Autopilot infers the type of prediction that best suits your data, such as binary classification, multi-class classification, or regression. It then explores algorithms such as gradient boosted decision trees, feedforward deep neural networks, and logistic regression, and it trains and optimizes hundreds of models based on these algorithms to find the one that best fits your data.
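A very small heuristic can illustrate how a problem type might be inferred from the target column. Exactly two distinct values suggest binary classification. Text labels, or a small number of whole-number labels, suggest multi-class classification. Anything else is treated as regression. This is a hypothetical heuristic, not Autopilot's actual logic, and the `max_classes=20` cutoff is an arbitrary assumption.

```python
def infer_problem_type(target_values, max_classes=20):
    """Guess the ML problem type from a target column (illustrative heuristic)."""
    values = [v for v in target_values if v is not None]
    distinct = set(values)
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if len(distinct) == 2:
        return "binary_classification"
    # Non-numeric labels, or a small number of whole-number labels, look like classes.
    if not numeric or (
        all(float(v).is_integer() for v in values) and len(distinct) <= max_classes
    ):
        return "multiclass_classification"
    return "regression"


print(infer_problem_type(["yes", "no", "yes"]))
```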

Amazon SageMaker Autopilot allows you to review all the ML models that are generated for your data. You can view the list of models, ranked by metrics such as accuracy, precision, recall, and area under the curve (AUC), review model details such as the impact of features on predictions, and deploy the model that is best suited to your use case.
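To show what the ranking metrics mean, the sketch below computes accuracy, precision, and recall at a 0.5 threshold, along with area under the ROC curve (AUC). AUC is computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. It then ranks two candidate models by AUC. The model names and scores are made-up illustration data.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall at a threshold, plus ROC AUC from raw scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # AUC: share of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": wins / (len(pos) * len(neg)),
    }


y = [1, 1, 0, 0]
models = {"model_a": [0.9, 0.8, 0.3, 0.2], "model_b": [0.6, 0.4, 0.7, 0.1]}
ranked = sorted(models, key=lambda m: classification_metrics(y, models[m])["auc"], reverse=True)
```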

Amazon SageMaker Autopilot provides an explainability report, generated by Amazon SageMaker Clarify, that makes it easier for you to understand and explain how models created with SageMaker Autopilot make predictions. You can also identify how each attribute in your training data contributes to the predicted result as a percentage. The higher the percentage, the more strongly that feature impacts your model’s predictions.
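SageMaker Clarify's feature attributions are based on Shapley values, which Clarify approximates. For a model with only a few features, the sketch below computes exact Shapley values by enumerating every feature subset, treating "absent" features as taking a baseline value. It then converts each feature's share of the total absolute attribution into a percentage. The baseline, the toy linear model, and this particular percentage normalization are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction; absent features take baseline values."""
    n = len(x)

    def value(subset):
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def as_percentages(attributions):
    """Each feature's share of the total absolute attribution, in percent."""
    total = sum(abs(a) for a in attributions)
    return [100 * abs(a) / total for a in attributions]


def linear_model(z):  # hypothetical model: weights 2, 1, 0
    return 2 * z[0] + 1 * z[1] + 0 * z[2]


phi = shapley_values(linear_model, x=[1, 1, 1], baseline=[0, 0, 0])
shares = as_percentages(phi)
```

Enumeration costs grow exponentially with the number of features, which is why production tools approximate these values.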

With Amazon SageMaker Autopilot, you can customize steps in your AutoML workflow to help create high-quality ML models. You can apply your own data preprocessing and feature engineering using the preconfigured data transformations in SageMaker Data Wrangler, and then bring the resulting recipe to SageMaker Autopilot. You can also define a custom split of training and validation data or upload a custom validation dataset. In addition, you can select features for training, change data types, and select a training mode (ensemble or hyperparameter optimization) for your SageMaker Autopilot experiment.

You can generate an Amazon SageMaker Studio notebook for any model that Amazon SageMaker Autopilot creates. You can then dive into the details of how the model was created, refine it as desired, and recreate it from the notebook at any point in the future.

Amazon SageMaker Canvas

Amazon SageMaker Canvas expands access to machine learning (ML) by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own, without requiring any ML experience or having to write a single line of code. With SageMaker Canvas, you can browse and access disparate cloud and on-premises data sources. After data is imported, you can analyze, explore, and visualize the relationships between features, and even create new features using functions and operators. SageMaker Canvas then lets you build ML models and generate accurate predictions with a few clicks. You can also publish results and explain and interpret models. In addition, you can collaborate with data scientists within your organization. You can share models with them for review and updates, and they can share ML models built in other tools so that you can generate predictions on those models directly inside SageMaker Canvas.

You can browse and import data using the SageMaker Canvas visual interface. SageMaker Canvas supports CSV file types and discovers AWS data sources that your account has access to, including Amazon Simple Storage Service (Amazon S3) and Amazon Redshift. You can also drag and drop files from your local disk and use prebuilt connectors to import data from third-party sources such as Snowflake. In addition, you can use the join operation to join data across multiple sources and create new unified datasets for training prediction models. For example, you can join transactional data in Amazon Redshift that contains customer IDs with CSV tables in Amazon S3 that contain customer profile data to create a new dataset. In the SageMaker Canvas visual interface, you can verify that data was imported correctly, understand the data distribution with parameters such as mean and median, and determine if there are missing values in your data. You can also profile data and identify correlations between columns in your dataset.
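The join and profiling steps described above can be sketched in a few lines of Python. An inner join on `customer_id` combines transaction rows with profile rows, and then the script summarizes row count, missing values, and the mean and median of a column. The data and column names are hypothetical and stand in for tables imported from sources such as Amazon Redshift and Amazon S3.

```python
from statistics import mean, median

transactions = [  # hypothetical rows, e.g. exported from a warehouse table
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": None},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 40.0},
]
profiles = {1: {"age": 34}, 2: {"age": 51}}  # hypothetical CSV rows keyed by customer_id

# Inner join on customer_id: keep only transactions that have a matching profile.
joined = [{**t, **profiles[t["customer_id"]]} for t in transactions if t["customer_id"] in profiles]

amounts = [r["amount"] for r in joined if r["amount"] is not None]
summary = {
    "rows": len(joined),
    "missing_amount": sum(r["amount"] is None for r in joined),
    "mean_amount": mean(amounts),
    "median_amount": median(amounts),
}
```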

SageMaker Canvas offers exploratory data analysis (EDA), allowing you to prepare, explore, and analyze your data. You can impute missing values and replace outliers with custom values, visualize the relationships between features, and create new features using functions and operators.
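As a sketch of imputing missing values and replacing outliers with a custom value, the function below flags values outside the common 1.5 × IQR fences as outliers and replaces them with a value that you choose. It also fills missing entries with the median of the observed values. The IQR rule and the median fill are assumptions chosen for illustration, not a description of how Canvas decides what counts as an outlier.

```python
from statistics import median, quantiles


def clean_column(values, outlier_value):
    """Median-impute missing values and replace IQR outliers with outlier_value."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)  # quartiles of the observed values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    fill = median(observed)
    return [
        fill if v is None else (v if low <= v <= high else outlier_value)
        for v in values
    ]


cleaned = clean_column([10, 12, 11, None, 13, 12, 11, 10, 100], outlier_value=13.0)
```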

Once you connect to your data sources, select a dataset, and prepare your data, you can select the feature that you want to predict and initiate the model creation job. SageMaker Canvas will automatically identify the problem type, generate new relevant features, test hundreds of prediction models (using ML techniques such as linear regression, logistic regression, deep learning, time-series forecasting, and gradient boosting), and build the model that makes the most accurate predictions based on your dataset.

You can share your SageMaker Canvas models with data scientists who use SageMaker Studio. They can review, update, and share updated models with you, so you can analyze and generate predictions on updated models in SageMaker Canvas.

Amazon SageMaker Clarify

Amazon SageMaker Clarify provides machine learning (ML) developers with tools to gain greater insights into their ML training data and models. SageMaker Clarify detects and measures potential bias using a variety of metrics so that ML developers can address potential bias and explain model predictions.

SageMaker Clarify can detect potential bias during data preparation, after model training, and in your deployed model. For instance, you can check for bias related to age in your dataset or in your trained model and receive a detailed report that quantifies different types of potential bias. SageMaker Clarify also provides feature importance scores that help you explain how your model makes predictions, and it produces explainability reports in bulk or in real time through online explainability. You can use these reports to support customer or internal presentations or to identify potential issues with your model.

As part of Amazon SageMaker Data Wrangler, SageMaker Clarify lets you identify potential bias during data preparation without writing your own code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features. SageMaker Clarify then provides a visual report with a description of the metrics and measurements of potential bias so that you can identify steps to remediate it. For example, a financial dataset might contain only a few examples of business loans to one age group compared to others. In that case, the bias metrics will indicate the imbalance so that you can address it in your dataset and potentially reduce the risk of a model that is disproportionately inaccurate for that age group.
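Two of the pre-training bias metrics documented for SageMaker Clarify are class imbalance, CI = (n_a - n_d) / (n_a + n_d), and the difference in proportions of labels, DPL = q_a - q_d. Here a and d denote the advantaged and disadvantaged facets, and q is each facet's share of positive labels. The sketch below computes both for the age example. The loan rows, the `age < 25` facet definition, and the function name are hypothetical, and both facets are assumed to be present.

```python
def pretraining_bias(rows, facet, is_disadvantaged, label, positive=1):
    """Class imbalance (CI) and difference in proportions of labels (DPL)."""
    group_d = [r for r in rows if is_disadvantaged(r[facet])]
    group_a = [r for r in rows if not is_disadvantaged(r[facet])]
    n_a, n_d = len(group_a), len(group_d)  # assumes both groups are non-empty
    ci = (n_a - n_d) / (n_a + n_d)  # +1: only facet a present; -1: only facet d
    q_a = sum(r[label] == positive for r in group_a) / n_a  # positive-label rate, facet a
    q_d = sum(r[label] == positive for r in group_d) / n_d  # positive-label rate, facet d
    return {"CI": ci, "DPL": q_a - q_d}


loans = [
    {"age": a, "approved": y}
    for a, y in [(45, 1), (38, 1), (52, 0), (61, 1), (29, 1), (33, 0), (22, 0), (19, 1)]
]
metrics = pretraining_bias(loans, "age", lambda age: age < 25, "approved")
```

A CI of 0.5 here reflects that the under-25 group supplies only 2 of the 8 rows, which is the kind of imbalance described above.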

In case of imbalances, you can use SageMaker Data Wrangler to balance your data. SageMaker Data Wrangler offers three balancing operators: random undersampling, random oversampling, and SMOTE to rebalance data in your unbalanced datasets. 
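SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class rows instead of duplicating existing ones. Each synthetic point lies on the line segment between a randomly chosen minority sample and one of its k nearest minority neighbors. The sketch below is a minimal, unoptimized version of that idea for numeric tuples. The function name, `k=3`, and the sample points are assumptions, and this does not describe Data Wrangler's implementation.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]  # k nearest minority points
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_rows = smote(minority, n_new=10)
```

Random oversampling would instead copy existing minority rows, and random undersampling would drop majority rows. SMOTE's interpolation adds variety at the cost of possibly creating unrealistic points.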

After you’ve trained your model, you can run a SageMaker Clarify bias analysis through Amazon SageMaker Experiments to check your model for potential bias such as predictions that produce a negative result more frequently for one group than they do for another. You specify input features with respect to which you would like to measure bias in the model outcomes, such as age, and SageMaker runs an analysis and provides you with a visual report that identifies the different types of bias for each feature, such as whether older groups receive more positive predictions compared to younger groups.
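After training, the same idea applies to predicted labels. SageMaker Clarify documents post-training metrics such as the difference in positive proportions in predicted labels (DPPL = q'_a - q'_d) and disparate impact (DI = q'_d / q'_a). The sketch below computes both for the older-versus-younger example, treating younger applicants as the disadvantaged facet. The ages, predictions, and `age < 40` cutoff are hypothetical.

```python
def posttraining_bias(predictions, facets, is_disadvantaged, positive=1):
    """DPPL and DI computed from predicted labels (both facets assumed present)."""
    pred_a = [p for p, f in zip(predictions, facets) if not is_disadvantaged(f)]
    pred_d = [p for p, f in zip(predictions, facets) if is_disadvantaged(f)]
    q_a = sum(p == positive for p in pred_a) / len(pred_a)  # predicted positive rate, facet a
    q_d = sum(p == positive for p in pred_d) / len(pred_d)  # predicted positive rate, facet d
    return {"DPPL": q_a - q_d, "DI": q_d / q_a if q_a else float("inf")}


ages = [70, 65, 30, 25, 72, 28]
preds = [1, 1, 0, 1, 1, 0]
report = posttraining_bias(preds, ages, lambda age: age < 40)
```

A DPPL well above 0 (or a DI well below 1) indicates that the older group receives positive predictions more often.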

Fair Bayesian Optimization, an open-source method from AWS, can help mitigate bias by tuning a model’s hyperparameters. Read our blog post to learn how to apply Fair Bayesian Optimization to mitigate bias while optimizing the accuracy of an ML model.

SageMaker Clarify helps data scientists and ML engineers monitor predictions for bias on a regular basis. Bias can be introduced or exacerbated in deployed ML models when the training data differs from the live data that the model sees during deployment. For example, the outputs of a model for predicting home prices can become biased if the mortgage rates used to train the model differ from current mortgage rates. SageMaker Clarify bias detection capabilities are integrated into Amazon SageMaker Model Monitor so that when SageMaker detects bias beyond a certain threshold, it generates metrics that you can view in Amazon SageMaker Studio and through Amazon CloudWatch metrics and alarms.

SageMaker Clarify is integrated with SageMaker Experiments to provide scores detailing which features contributed the most to your model prediction on a particular input for tabular, natural language processing (NLP), and computer vision models. For tabular datasets, SageMaker Clarify can also output an aggregated feature importance chart which provides insights into the overall prediction process of the model. These details can help determine if a particular model input has more influence than expected on overall model behavior. For tabular data, in addition to the feature importance scores, you can also use partial dependence plots (PDPs) to show the dependence of the predicted target response on a set of input features of interest.
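A partial dependence plot is computed by forcing one feature to each value on a grid for every row, and then averaging the model's predictions at each grid value. The sketch below produces the points of such a curve. The house-price model, the `rooms` and `age` features, and the grid are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def price_model(row):  # hypothetical model: price grows with rooms and age
    return 3 * row["rooms"] + 0.5 * row["age"]


rows = [{"rooms": 2, "age": 10}, {"rooms": 4, "age": 30}]
curve = partial_dependence(price_model, rows, "rooms", grid=[1, 2, 3])
```

Plotting `curve` gives the PDP for `rooms`. For this linear model the curve rises by 3 per room, which matches the model's coefficient.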

SageMaker Clarify can also provide insights into computer vision and NLP models. For vision models, you can see which parts of the image the models found most important with SageMaker Clarify. For NLP models, SageMaker Clarify provides feature importance scores at the level of words, sentences, or paragraphs. 

Changes in live data can expose new behavior in your model. SageMaker Clarify is integrated with SageMaker Model Monitor to notify you through alerting systems such as CloudWatch if the importance of input features shifts and causes model behavior to change.
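One way to quantify a shift in feature importance is to compare the ranking of features by attribution on live data with the ranking from a training-time baseline. The sketch below uses normalized discounted cumulative gain (NDCG) for this comparison. A value of 1.0 means the rankings agree, and lower values mean the most important features have changed. The feature names, attribution values, and the 0.9 alert threshold are hypothetical.

```python
import math


def attribution_ndcg(baseline, live):
    """NDCG of the live feature ranking, scored with baseline attributions."""
    features = list(baseline)
    live_order = sorted(features, key=lambda f: abs(live[f]), reverse=True)
    ideal_order = sorted(features, key=lambda f: abs(baseline[f]), reverse=True)

    def dcg(order):
        # Higher-ranked positions get larger weights (1, 1/log2(3), 1/2, ...).
        return sum(abs(baseline[f]) / math.log2(i + 2) for i, f in enumerate(order))

    return dcg(live_order) / dcg(ideal_order)


baseline = {"income": 0.5, "age": 0.3, "zip": 0.2}
shifted = {"income": 0.1, "age": 0.2, "zip": 0.7}
score = attribution_ndcg(baseline, shifted)
if score < 0.9:  # hypothetical alert threshold
    print(f"feature attribution drift detected: NDCG={score:.2f}")
```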

SageMaker Clarify can provide scores detailing which features contributed the most to your model’s individual prediction after the model has been run on new data. These details can help determine if a particular input feature has more influence on the model predictions than expected. You can view these details for each prediction in real time through online explainability or get a report in bulk that uses batch processing of all the individual predictions. 

Amazon SageMaker Data Wrangler

Amazon SageMaker Data Wrangler can reduce the time it takes to aggregate and prepare data for machine learning (ML). With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization from a single visual interface. 

Using SageMaker Data Wrangler’s data selection tool, you can choose the data you want from various data sources and import it easily. SageMaker Data Wrangler contains built-in data transformations so you can normalize, transform, and combine features without having to write any code. With SageMaker Data Wrangler’s visualization templates, you can quickly preview these transformations and verify that they work as intended by viewing them in Amazon SageMaker Studio, a fully integrated development environment (IDE) for ML. Once your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines and save your prepared features for reuse in Amazon SageMaker Feature Store.

With the SageMaker Data Wrangler data selection tool, you can quickly access and select data from a wide variety of popular sources (such as Amazon Simple Storage Service [S3], Amazon Athena, Amazon Redshift, AWS Lake Formation, Snowflake, and Databricks Delta Lake) and over 40 other third-party sources (such as Salesforce, SAP, Facebook Ads, and Google Analytics). You can also write queries for data sources using SQL and import data directly into SageMaker from various file formats, such as CSV, Parquet, ORC, and JSON, and database tables.

SageMaker Data Wrangler provides a Data Quality and Insights report that automatically verifies data quality (such as missing values, duplicate rows, and data types) and helps detect anomalies (such as outliers, class imbalance, and data leakage) in your data. Once you can effectively verify data quality, you can apply domain knowledge to process datasets for ML model training.

SageMaker Data Wrangler helps you understand your data and identify potential errors and extreme values with a set of robust preconfigured visualization templates. Histograms, scatter plots, box and whisker plots, line plots, and bar charts are all available out of the box to apply to your data. SageMaker Data Wrangler also offers ML-specific visualizations (such as bias report, feature correlation, multicollinearity, target leakage, and time series) that show feature importance and feature correlations. You can access these by selecting the corresponding tools in the Analysis tab.

SageMaker Data Wrangler offers a selection of data transformations so you can transform your data into formats that models can use effectively, without writing a single line of code. Preconfigured transformations cover common use cases such as flattening JSON files, deleting duplicate rows, imputing missing data with the mean or median, one-hot encoding, and time-series–specific transformers that accelerate the preparation of time-series data for ML. You can also author custom transformations in PySpark, SQL, and pandas, and SageMaker Data Wrangler offers a library of code snippets to help you author them.
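To make three of these transformations concrete, the sketch below flattens nested JSON records into dotted column names, drops exact duplicate rows, and one-hot encodes a categorical column, all in plain Python. The function names and the sample records are hypothetical, and Data Wrangler's own column-naming conventions may differ.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = int(row[column] == c)
        encoded.append(new)
    return encoded


raw = [
    {"id": 1, "user": {"plan": "pro", "country": "DE"}},
    {"id": 1, "user": {"plan": "pro", "country": "DE"}},  # exact duplicate
    {"id": 2, "user": {"plan": "free", "country": "US"}},
]
rows = drop_duplicates([flatten(r) for r in raw])
encoded = one_hot(rows, "user.plan")
```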

The SageMaker Data Wrangler Quick Model feature provides an estimate of the expected predictive power of your data. Quick Model automatically splits your data into training and test datasets and trains an XGBoost model with default hyperparameters on the data. Based on the task you are solving (for example, classification or regression), SageMaker Data Wrangler provides a model summary, feature summary, and confusion matrix, which help you iterate on your data preparation flows.

You can launch or schedule a job to process your data or export it to a SageMaker Studio notebook. SageMaker Data Wrangler offers several export options, including Amazon SageMaker Data Wrangler jobs, Amazon SageMaker Feature Store, Amazon SageMaker Autopilot, and Amazon SageMaker Pipelines, providing you the ability to integrate your data preparation flow into your ML workflow. Alternatively, you can deploy your data preparation workflow to a SageMaker hosted endpoint.

Amazon SageMaker Debugger

Amazon SageMaker Debugger can reduce troubleshooting time during training by detecting common training errors, such as gradient values becoming too large or too small, and alerting you so that you can remediate them. Alerts can be viewed in Amazon SageMaker Studio or configured through Amazon CloudWatch. Additionally, the SageMaker Debugger SDK enables you to automatically detect new classes of model-specific errors, such as errors related to data sampling, hyperparameter values, and out-of-bound values.

Amazon SageMaker Debugger monitors utilization of system resources such as GPUs, CPUs, network, and memory, and profiles your training jobs to collect detailed ML framework metrics. You can inspect resource metrics visually through SageMaker Studio. SageMaker Debugger correlates anomalies in resource utilization with specific operations so that you can identify bottlenecks, such as over-utilized CPUs, and take corrective action. Additionally, you can download a detailed report for offline analysis. Training runs can be profiled either from the start of the training job or at any point while training is in progress.

Amazon SageMaker Debugger comes with built-in analytics that analyze data emitted during training, such as inputs, outputs, and transformations, known as tensors. As a result, you can detect whether a model is overfitting or overtraining, whether gradients are getting too large or too small, whether GPU resources are underutilized, and whether other bottlenecks occur during training. With SageMaker Debugger, you can also create your own custom conditions to test for specific behavior in your training jobs. These conditions can invoke actions such as stopping a training job and sending an SMS or email. Stopping training jobs early helps reduce training costs for suboptimal models and lets you develop better prototypes faster.
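The function below sketches the kind of custom condition described above. It flags vanishing or exploding gradients from per-step gradient norms and flags a loss that has increased for several consecutive steps. This is plain Python for illustration, not the SageMaker Debugger rule API. The thresholds (`low`, `high`, `patience`) and the function name are assumptions.

```python
import math


def check_training_step(grad_norms, losses, low=1e-7, high=1e3, patience=3):
    """Return the issues a custom rule would flag for the latest training step.

    grad_norms: gradient norms from the most recent step.
    losses: loss values recorded so far, oldest first.
    """
    issues = []
    if any(g < low for g in grad_norms):
        issues.append("vanishing_gradient")
    if any(g > high or math.isnan(g) for g in grad_norms):
        issues.append("exploding_gradient")
    # Loss rose at each of the last `patience` steps: training is not converging.
    recent = losses[-(patience + 1):]
    if len(recent) == patience + 1 and all(b > a for a, b in zip(recent, recent[1:])):
        issues.append("loss_not_decreasing")
    return issues


print(check_training_step([0.5, 2.0], [0.5, 0.6, 0.7, 0.9]))
```

A caller could react to a non-empty result by stopping the job, for example through the boto3 SageMaker client's `stop_training_job` call.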

Amazon SageMaker Debugger supports ML frameworks including TensorFlow, PyTorch, Apache MXNet, Keras, and XGBoost. SageMaker’s built-in containers for these frameworks come with SageMaker Debugger preinstalled, so you can monitor, profile, and debug your training scripts easily. By default, SageMaker Debugger monitors system hardware utilization and losses during training, so you don't need to write additional code to monitor each resource separately.

Amazon SageMaker Debugger is integrated with AWS Lambda so you can act on the results of alerts. For example, an AWS Lambda function can automatically stop a training job when it detects a non-converging condition, such as a loss that keeps increasing rather than decreasing over time. Stopping such training jobs automatically can reduce costs and help you achieve the desired results during the early stages of ML development and training.

Amazon SageMaker Model Deployment

Amazon SageMaker enables you to deploy ML models to make predictions (also known as inference) at the best price-performance for any use case. It provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. It is a fully managed service and integrates with MLOps tools, so you can scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden.

Amazon SageMaker provides scalable and cost-effective ways to deploy large numbers of ML models. With SageMaker’s multi-model endpoints and multi-container endpoints, you can deploy thousands of models on a single endpoint, improving cost-effectiveness while providing the flexibility to use models as often as you need them. Multi-model endpoints support both CPU and GPU instance types.

Amazon SageMaker inference supports built-in algorithms and prebuilt Docker images for some of the most common machine learning frameworks such as Apache MXNet, TensorFlow, and PyTorch. Or you can bring your own containers. SageMaker inference also supports most popular model servers such as TensorFlow Serving, TorchServe, NVIDIA Triton, and AWS Multi Model Server. With these options, you can deploy models quickly for virtually any use case.

Amazon SageMaker offers instance types with varying levels of compute and memory. These include Amazon EC2 Inf1 instances, which are based on AWS Inferentia (high-performance ML inference chips designed and built by AWS), and GPU instances such as Amazon EC2 G4dn. Alternatively, you can choose Amazon SageMaker Serverless Inference to easily scale to thousands of models per endpoint, millions of transactions per second (TPS) throughput, and sub-10-millisecond overhead latencies.

Amazon SageMaker Inference Recommender helps you choose the best available compute instance and configuration to deploy machine learning models for optimal inference performance and cost. SageMaker Inference Recommender selects the compute instance type, instance count, container parameters, and model optimizations for inference to maximize performance and minimize cost.

You can use scaling policies to scale the underlying compute resources to accommodate fluctuations in inference requests. With autoscaling, you can shut down instances when there is no usage to prevent idle capacity and reduce inference cost.
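As a back-of-the-envelope illustration of target-tracking scaling, the function below picks enough instances to keep the load per instance at a target value, clamped between a minimum and a maximum count. An example of such a load metric is invocations per instance (SageMaker exposes a predefined `SageMakerVariantInvocationsPerInstance` metric for this). Real scaling policies also involve CloudWatch alarms and cooldown periods, which are omitted here. The function name, the default limits, and the traffic numbers are assumptions.

```python
import math


def desired_instance_count(invocations_per_minute, target_per_instance, min_count=1, max_count=10):
    """Instances needed to keep per-instance load at or below the target, clamped to limits."""
    needed = math.ceil(invocations_per_minute / target_per_instance)
    return max(min_count, min(max_count, needed))


print(desired_instance_count(1000, 300))  # 1000 / 300 rounds up to 4 instances
```

Setting `min_count=0` models the idle case that the paragraph describes, where no traffic means no running instances. Whether scaling to zero is possible depends on the endpoint type.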

Amazon SageMaker Neo

Amazon SageMaker Neo enables developers to optimize machine learning (ML) models for inference on SageMaker in the cloud and supported devices at the edge.

ML inference is the process of using a trained machine learning model to make predictions. After training a model for high accuracy, developers often spend a lot of time and effort tuning the model for high performance. For inference in the cloud, developers often turn to large instances with lots of memory and powerful processing capabilities at higher costs to achieve better throughput. For inference on edge devices with limited compute and memory, developers often spend months hand-tuning the model to achieve acceptable performance within the device hardware constraints.

Amazon SageMaker Neo optimizes machine learning models to run faster for inference on cloud instances and edge devices. SageMaker Neo optimizes a trained model and compiles it into an executable. The compiler uses a machine learning model to apply the performance optimizations that best suit your model on the target cloud instance or edge device. You then deploy the model as a SageMaker endpoint or on supported edge devices and start making predictions.

For inference in the cloud, SageMaker Neo speeds up inference and saves cost by creating an inference optimized container in SageMaker hosting. For inference at the edge, SageMaker Neo can save developers months of manual tuning by tuning the model for the selected operating system and processor hardware. 

Amazon SageMaker Edge Manager

An increasing number of applications, such as industrial automation, autonomous vehicles, and automated checkouts, require machine learning (ML) models that run on devices at the edge so that predictions can be made in real time when new data is available. Amazon SageMaker Neo is an easy way to optimize ML models for edge devices, enabling you to train ML models once in the cloud and run them on any device. As devices proliferate, customers may have thousands of deployed models running across their fleets. Amazon SageMaker Edge Manager enables you to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, and mobile devices.

Amazon SageMaker Edge Manager provides a software agent that runs on edge devices. The agent comes with an ML model optimized with SageMaker Neo, so you don’t need to have the Neo runtime installed on your devices to take advantage of the model optimizations. The agent also collects prediction data and sends a sample of the data to the cloud for monitoring, labeling, and retraining so you can keep models accurate over time. All data can be viewed in the SageMaker Edge Manager dashboard, which reports on the operation of deployed models. Because SageMaker Edge Manager enables you to manage models separately from the rest of the application, you can update the model and the application independently, which can reduce costly downtime and service disruptions. SageMaker Edge Manager also cryptographically signs your models so you can verify that they were not tampered with as they move from the cloud to edge devices.

Amazon SageMaker Experiments

SageMaker Experiments is a managed service for tracking and analyzing ML experiments at scale. 

ML experiments are performed in diverse environments such as local notebooks and IDEs, training code running in the cloud, or managed IDEs in the cloud such as SageMaker Studio. With SageMaker Experiments, you can start tracking your experiments centrally from any environment or IDE using Python code.

The process of developing an ML model involves experimenting with various combinations of data, algorithms, and parameters, while evaluating the impact of incremental changes on model performance. With SageMaker Experiments, you can track your ML iterations and save the related metadata such as metrics, parameters and artifacts in a central place.

Finding the best model from multiple iterations requires analysis and comparison of model performance. SageMaker Experiments provides visualizations such as scatter plots, bar charts, and histograms. In addition, the SageMaker Experiments SDK lets you load the logged data into your notebook for offline analysis.

SageMaker Experiments is integrated with SageMaker Studio, allowing team members to access the same information and confirm that the experiment results are consistent, enabling collaboration. Use SageMaker Studio's search capability to find relevant experiments from the past.

Using SageMaker Experiments you can access and reproduce your ML workflow from the experiments you’ve tracked.

Amazon SageMaker Feature Store

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, retrieve, and share machine learning (ML) features.

Features are the attributes or properties models use during training and inference to make predictions. For example, in an ML application that recommends a music playlist, features could include song ratings, which songs were listened to previously, and how long songs were listened to. The accuracy of an ML model is based on a precise set and composition of features. Often, these features are used repeatedly by multiple teams training multiple models. And whichever feature set was used to train the model needs to be available to make real-time predictions (inference). Keeping a single source of features that is consistent and up-to-date across these different access patterns is a challenge as most organizations keep two different feature stores, one for training and one for inference.

Amazon SageMaker Feature Store is a purpose-built repository where you can store and access features so it’s easier to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent. SageMaker Feature Store keeps track of the metadata of stored features (e.g. feature name or version number) so that you can query the features for the right attributes in batches or in real time using Amazon Athena, an interactive query service. SageMaker Feature Store also keeps features updated, because as new data is generated during inference, the single repository is updated so new features are available for models to use during training and inference.
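The core idea behind serving the same features for training and inference can be sketched with a tiny in-memory store that keeps every version of a record with its event time. Inference reads the latest values, and training reads the values "as of" a past event time, so training rows never see values recorded later. The class and method names are hypothetical and are not the SageMaker Feature Store API.

```python
from bisect import bisect_right


class TinyFeatureStore:
    """Hypothetical in-memory feature group keyed by record id, with event times."""

    def __init__(self):
        self._history = {}  # record_id -> list of (event_time, features), sorted by time

    def put(self, record_id, event_time, features):
        rows = self._history.setdefault(record_id, [])
        rows.append((event_time, features))
        rows.sort(key=lambda r: r[0])

    def get_latest(self, record_id):
        """Online-style lookup: the most recent features, as used for real-time inference."""
        return self._history[record_id][-1][1]

    def get_as_of(self, record_id, event_time):
        """Training-style lookup: features as they were at event_time (None if none yet)."""
        rows = self._history[record_id]
        idx = bisect_right([t for t, _ in rows], event_time)
        return rows[idx - 1][1] if idx else None


store = TinyFeatureStore()
store.put("user1", 1, {"songs_played": 3})
store.put("user1", 5, {"songs_played": 10})
```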

Amazon SageMaker Ground Truth

Amazon SageMaker enables you to identify raw data, such as images, text files, and videos; add informative labels; and generate labeled synthetic data to create high-quality training datasets for your machine learning (ML) models. SageMaker offers two options, Amazon SageMaker Ground Truth Plus and Amazon SageMaker Ground Truth, which provide you with the flexibility to use a workforce to create and manage data labeling workflows on your behalf or manage your own data labeling workflows.

With SageMaker Ground Truth Plus, you can create high-quality training datasets without having to build labeling applications or manage labeling workforces on your own. SageMaker Ground Truth Plus helps reduce data labeling costs. SageMaker Ground Truth Plus provides a workforce that is trained on ML tasks and can help meet your data security, privacy, and compliance requirements. You upload your data, and then SageMaker Ground Truth Plus creates and manages data labeling workflows and the workforce on your behalf.

If you want the flexibility to build and manage your own data labeling workflows and workforce, you can use SageMaker Ground Truth. SageMaker Ground Truth is a data labeling service that makes it easy to label data and gives you the option to use human annotators through Amazon Mechanical Turk, third-party vendors, or your own private workforce.

You can also generate labeled synthetic data without manually collecting or labeling real-world data. SageMaker Ground Truth can generate hundreds of thousands of automatically labeled synthetic images on your behalf.

Amazon SageMaker JumpStart

Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can access built-in algorithms with pretrained models from model hubs, pretrained foundation models to help you perform tasks such as article summarization and image generation, and prebuilt solutions to solve common use cases. In addition, you can share ML artifacts, including ML models and notebooks, within your organization to accelerate ML model building and deployment. 

SageMaker JumpStart provides built-in algorithms with pretrained models from model hubs, including TensorFlow Hub, PyTorch Hub, Hugging Face, and MXNet GluonCV. You can also access built-in algorithms using the SageMaker Python SDK. Built-in algorithms cover common ML tasks, such as data classification (image, text, tabular) and sentiment analysis.

Foundation models are large-scale ML models that contain billions of parameters and are pretrained on terabytes of text and image data. As a result, they can perform a wide range of tasks, such as article summarization and text, image, or video generation. Because foundation models are already pretrained, they can help lower training and infrastructure costs while still allowing customization for your use case.

Prebuilt solutions can be used for common use cases and are fully customizable.

ML Ops with Amazon SageMaker and Kubernetes

Kubernetes is an open source system used to automate the deployment, scaling, and management of containerized applications. Kubeflow Pipelines is a workflow manager that offers an interface to manage and schedule machine learning (ML) workflows on a Kubernetes cluster. Using open source tools offers flexibility and standardization, but requires time and effort to set up infrastructure, provision notebook environments for data scientists, and stay up-to-date with the latest deep learning framework versions.

Amazon SageMaker Operators for Kubernetes and Components for Kubeflow Pipelines enable the use of fully managed SageMaker machine learning tools across the ML workflow natively from Kubernetes or Kubeflow. This eliminates the need to manually manage and optimize your Kubernetes-based ML infrastructure while still preserving control over orchestration and flexibility.

Amazon SageMaker Model Monitor

With Amazon SageMaker Model Monitor, you can select the data you would like to monitor and analyze without the need to write any code. SageMaker Model Monitor lets you select data from a menu of options, such as prediction output. It also captures metadata such as timestamp, model name, and endpoint, so you can analyze model predictions based on that metadata. For high-volume real-time predictions, you can specify the sampling rate of data capture as a percentage of overall traffic, and the captured data is stored in your own Amazon S3 bucket. You can also encrypt this data, configure fine-grained security, define data retention policies, and implement access control mechanisms for secure access.
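
Sampling as a percentage of traffic can be illustrated with a short Python sketch. It is not Model Monitor's implementation. The function `maybe_capture`, its parameters, and the JSON Lines record layout are hypothetical, and the upload to Amazon S3 is left out. Each request is kept with a probability equal to the sampling percentage. Every kept record carries the timestamp, model name, and endpoint metadata mentioned above.

```python
import json
import random
import time
from typing import Any, Dict, Optional


def maybe_capture(
    payload: Dict[str, Any],
    prediction: Any,
    sampling_percentage: float,
    endpoint_name: str,
    model_name: str,
    rng: random.Random,
) -> Optional[str]:
    """Return one JSON Lines record if this request is sampled, else None."""
    if not 0.0 <= sampling_percentage <= 100.0:
        raise ValueError("sampling_percentage must be between 0 and 100")
    # rng.random() is uniform on [0, 1), so this keeps about sampling_percentage % of calls.
    if rng.random() * 100.0 >= sampling_percentage:
        return None
    record = {
        "timestamp": time.time(),  # metadata used later to slice captured data
        "endpoint": endpoint_name,
        "model": model_name,
        "input": payload,
        "output": prediction,
    }
    return json.dumps(record)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the sketch is reproducible
    captured = []
    for i in range(10_000):
        line = maybe_capture({"features": [i]}, i % 2, 20.0, "demo-endpoint", "demo-model", rng)
        if line is not None:
            captured.append(line)
    print(f"captured {len(captured)} of 10000 requests")  # roughly 2000
```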

Amazon SageMaker Model Monitor offers built-in analysis in the form of statistical rules to detect drift in data and model quality. You can also write custom rules and specify a threshold for each rule, and then use the rules to analyze model performance. SageMaker Model Monitor runs the rules on the collected data, detects anomalies, and records rule violations.
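
Rules with thresholds can be sketched in plain Python. This is not Model Monitor's rule format. The names `make_baseline`, `mean_shift_rule`, `missing_values_rule`, and `run_rules` are hypothetical. The two rules are illustrative choices: a mean shift measured in baseline standard deviations, and a maximum fraction of missing values. Baseline statistics come from the training data. Each rule compares newly collected data against that baseline and records a violation when its threshold is crossed.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Violation:
    feature: str
    rule: str
    detail: str


Baseline = Dict[str, float]
Rule = Callable[[str, Baseline, List[Optional[float]]], Optional[Violation]]


def make_baseline(values: List[float]) -> Baseline:
    # Statistics computed once from the training data.
    return {"mean": statistics.fmean(values), "stdev": statistics.stdev(values)}


def mean_shift_rule(max_stdevs: float) -> Rule:
    """Flag a feature whose current mean moved more than max_stdevs baseline stdevs."""
    def check(feature: str, baseline: Baseline, current: List[Optional[float]]) -> Optional[Violation]:
        observed = [v for v in current if v is not None]
        if not observed:
            return None
        shift = abs(statistics.fmean(observed) - baseline["mean"])
        if shift > max_stdevs * baseline["stdev"]:
            return Violation(feature, "mean_shift", f"mean moved by {shift:.2f}")
        return None
    return check


def missing_values_rule(max_fraction: float) -> Rule:
    """A custom rule with its own threshold: too many missing values."""
    def check(feature: str, baseline: Baseline, current: List[Optional[float]]) -> Optional[Violation]:
        if not current:
            return None
        fraction = sum(v is None for v in current) / len(current)
        if fraction > max_fraction:
            return Violation(feature, "missing_values", f"{fraction:.0%} missing")
        return None
    return check


def run_rules(
    baselines: Dict[str, Baseline],
    current_data: Dict[str, List[Optional[float]]],
    rules: List[Rule],
) -> List[Violation]:
    # Apply every rule to every monitored feature and record each violation.
    violations = []
    for feature, values in current_data.items():
        for rule in rules:
            violation = rule(feature, baselines[feature], values)
            if violation is not None:
                violations.append(violation)
    return violations


if __name__ == "__main__":
    baselines = {"minutes_listened": make_baseline([30, 32, 28, 31, 29])}
    rules = [mean_shift_rule(max_stdevs=3.0), missing_values_rule(max_fraction=0.1)]
    for v in run_rules(baselines, {"minutes_listened": [45, 47, None, 44]}, rules):
        print(v)
```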

All metrics emitted by Amazon SageMaker Model Monitor can be collected and viewed in Amazon SageMaker Studio, so you can visually analyze your model performance without writing additional code. Not only can you visualize your metrics, but you can also run ad-hoc analysis in a SageMaker notebook instance to understand your models better.

Amazon SageMaker Model Monitor allows you to ingest data from your ML application in order to compute model performance. The data is stored in Amazon S3 and secured through access control, encryption, and data retention policies.

You can monitor your ML models by scheduling monitoring jobs through Amazon SageMaker Model Monitor. Monitoring jobs start automatically and analyze the model predictions made during a given time period. You can also attach multiple schedules to a single SageMaker endpoint.
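
Scheduled analysis over a given time period can be illustrated with a small Python sketch. The functions `analysis_windows` and `records_in` are hypothetical. The sketch assumes each scheduled run analyzes the captured records from the interval since the previous run. Two schedules with different intervals on the same endpoint then produce two independent sets of windows.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple

Window = Tuple[datetime, datetime]


def analysis_windows(first_run_after: datetime, until: datetime, every: timedelta) -> List[Window]:
    """Half-open [start, end) periods analyzed by each run of one schedule."""
    windows = []
    run_time = first_run_after + every
    while run_time <= until:
        # Each run looks back over the period since the previous run.
        windows.append((run_time - every, run_time))
        run_time += every
    return windows


def records_in(window: Window, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    start, end = window
    return [r for r in records if start <= r["timestamp"] < end]


if __name__ == "__main__":
    day_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1)
    # Two independent schedules on the same endpoint.
    hourly = analysis_windows(day_start, day_end, timedelta(hours=1))
    daily = analysis_windows(day_start, day_end, timedelta(days=1))
    print(len(hourly), len(daily))  # 24 1
    captured = [{"timestamp": day_start + timedelta(minutes=m)} for m in range(0, 180, 30)]
    print(len(records_in(hourly[0], captured)))  # 2 (minutes 0 and 30)
```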

Amazon SageMaker Model Monitor is integrated with Amazon SageMaker Clarify to improve visibility into potential bias. Although your initial data or model may not have been biased, changes in the world may cause bias to develop over time in a model that has already been trained. For example, a substantial change in home buyer demographics could cause a home loan application model to become biased if certain populations were not present in the original training data. Integration with SageMaker Clarify enables you to configure alerting systems such as Amazon CloudWatch to notify you if your model begins to develop bias.
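
The home loan example can be sketched with one simple bias measure. For illustration, the Python below assumes bias is tracked as the difference between two groups' rates of positive predictions (approvals) in each monitoring period, checked against a hypothetical threshold. Clarify computes its own set of bias metrics, and this is not its implementation. A period whose gap exceeds the threshold is where an alert, such as a CloudWatch alarm, would be raised.

```python
from typing import Dict, List, Sequence, Tuple


def positive_rate(predictions: Sequence[int]) -> float:
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(predictions) / len(predictions)


def positive_rate_gap(group_a: Sequence[int], group_b: Sequence[int]) -> float:
    # Difference in the share of positive (e.g. "approve") predictions between two groups.
    return positive_rate(group_a) - positive_rate(group_b)


def bias_alerts(
    periods: Dict[str, Tuple[Sequence[int], Sequence[int]]], max_gap: float
) -> List[str]:
    """Return the periods whose gap exceeds the threshold (where an alarm would fire)."""
    return [name for name, (a, b) in periods.items() if abs(positive_rate_gap(a, b)) > max_gap]


if __name__ == "__main__":
    # 1 = loan approved, 0 = declined; illustrative data only.
    periods = {
        "month-1": ([1, 1, 0, 1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]),
        "month-6": ([1, 1, 0, 1, 0, 1, 1, 0, 1, 0], [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]),
    }
    print(bias_alerts(periods, max_gap=0.1))  # ['month-6']
```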

The reports generated by monitoring jobs can be saved in Amazon S3 for further analysis. Amazon SageMaker Model Monitor emits metrics to Amazon CloudWatch where you can consume notifications to trigger alarms or corrective actions such as retraining the model or auditing data. The metrics include information such as rules that were violated and timestamp information. SageMaker Model Monitor also integrates with other visualization tools including TensorBoard, Amazon QuickSight, and Tableau.

Amazon SageMaker Notebooks

Amazon SageMaker offers two types of fully managed Jupyter Notebooks for data exploration and building ML models: Amazon SageMaker Studio notebooks and Amazon SageMaker notebook instances.

Amazon SageMaker Studio notebooks are quick-start, collaborative notebooks. They integrate with ML tools in SageMaker and other AWS services to support your ML development all in Amazon SageMaker Studio, an integrated development environment (IDE) for ML. That work ranges from preparing data at petabyte scale using Spark on Amazon EMR to training and debugging models, tracking experiments, deploying and monitoring models, and managing pipelines. You can dial compute resources up or down without interrupting your work, share notebooks with your team through a shareable link, or co-edit a single notebook at the same time.

Amazon SageMaker notebook instances are standalone, fully managed Jupyter Notebook instances in the Amazon SageMaker console. You can choose from the broadest selection of compute resources available in the cloud, including GPUs for accelerated computing, and work with the latest versions of open-source software that you trust.

Amazon SageMaker Pipelines

Amazon SageMaker Pipelines is a purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). With SageMaker Pipelines, you can create, automate, and manage end-to-end ML workflows at scale.

Using Amazon SageMaker Pipelines, you can create ML workflows with a Python SDK, and then visualize and manage your workflow using Amazon SageMaker Studio. You can be more efficient and scale faster by storing and reusing the workflow steps you create in SageMaker Pipelines. You can also get started with built-in templates to build, test, register, and deploy models.
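
Composing a workflow from reusable steps can be sketched in plain Python. This is not the SageMaker Pipelines SDK, which provides its own pipeline and step classes. `run_pipeline` and the step functions `process`, `train`, `evaluate`, and `register` are hypothetical. Steps are declared with their dependencies and run in a dependency-respecting order using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Any step function can therefore be reused in another pipeline.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(steps: Dict[str, Step], depends_on: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run each step after its upstream steps; return every step's output by name."""
    unknown = {d for deps in depends_on.values() for d in deps} - set(steps)
    if unknown:
        raise ValueError(f"unknown upstream steps: {sorted(unknown)}")
    graph = {name: set(depends_on.get(name, [])) for name in steps}
    outputs: Dict[str, Any] = {}
    # static_order() yields a dependency-respecting order and raises CycleError on cycles.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: outputs[dep] for dep in graph[name]}
        outputs[name] = steps[name](upstream)
    return outputs


# Reusable, hypothetical step functions: each receives its upstream outputs.
def process(_: Dict[str, Any]) -> List[int]:
    return list(range(10))


def train(up: Dict[str, Any]) -> Dict[str, float]:
    data = up["process"]
    return {"threshold": sum(data) / len(data)}  # a deliberately trivial "model"


def evaluate(up: Dict[str, Any]) -> float:
    data, threshold = up["process"], up["train"]["threshold"]
    # Toy accuracy against the label "x >= 5".
    return sum((x > threshold) == (x >= 5) for x in data) / len(data)


def register(up: Dict[str, Any]) -> str:
    return "Approved" if up["evaluate"] >= 0.9 else "Rejected"


if __name__ == "__main__":
    outputs = run_pipeline(
        {"process": process, "train": train, "evaluate": evaluate, "register": register},
        {"train": ["process"], "evaluate": ["process", "train"], "register": ["evaluate"]},
    )
    print(outputs["evaluate"], outputs["register"])  # 1.0 Approved
```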

With the SageMaker Pipelines model registry, you can track model versions in a central repository, which makes it easy to choose the right model for deployment based on your business requirements. You can use SageMaker Studio to browse and discover models, or you can access them through the SageMaker Python SDK.
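
A central registry that tracks versions and selects a model by business rules can be sketched as follows. The classes `ModelVersion` and `ToyModelRegistry`, the S3 URIs, and the metric values are hypothetical. The status strings are modeled on the registry's approval states. The assumed business rule picks the newest approved version that meets a minimum accuracy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    metrics: Dict[str, float]
    status: str = "PendingManualApproval"


@dataclass
class ToyModelRegistry:
    """Hypothetical central registry: one list of versions per model group."""

    groups: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, group: str, artifact_uri: str, metrics: Dict[str, float]) -> ModelVersion:
        versions = self.groups.setdefault(group, [])
        model = ModelVersion(len(versions) + 1, artifact_uri, metrics)  # versions start at 1
        versions.append(model)
        return model

    def set_status(self, group: str, version: int, status: str) -> None:
        self.groups[group][version - 1].status = status

    def pick_for_deployment(self, group: str, minimums: Dict[str, float]) -> Optional[ModelVersion]:
        # Business rule: newest approved version that meets every metric minimum.
        eligible = [
            m for m in self.groups.get(group, [])
            if m.status == "Approved"
            and all(m.metrics.get(name, float("-inf")) >= floor for name, floor in minimums.items())
        ]
        return max(eligible, key=lambda m: m.version, default=None)


if __name__ == "__main__":
    registry = ToyModelRegistry()
    for acc in (0.88, 0.93, 0.95):  # illustrative metric values
        registry.register("playlist-recommender", f"s3://example-bucket/model-{acc}.tar.gz", {"accuracy": acc})
    registry.set_status("playlist-recommender", 1, "Approved")
    registry.set_status("playlist-recommender", 2, "Approved")  # version 3 is still pending review
    chosen = registry.pick_for_deployment("playlist-recommender", {"accuracy": 0.9})
    print(chosen.version if chosen else None)  # 2
```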

Amazon SageMaker Pipelines logs every step of your workflow, creating an audit trail of model components such as training data, platform configurations, model parameters, and learning gradients. Audit trails can be used to recreate models and help support compliance requirements.

Amazon SageMaker Pipelines brings CI/CD practices to machine learning, such as maintaining parity between development and production environments, version control, on-demand testing, and end-to-end automation, helping you scale ML throughout your organization.

RStudio on Amazon SageMaker

RStudio on Amazon SageMaker is a fully managed, cloud-based RStudio Workbench. You can launch the RStudio integrated development environment (IDE) and dial the underlying compute resources up or down without interrupting your work. This lets you build machine learning (ML) and analytics solutions in R at scale.

Amazon SageMaker shadow testing

SageMaker helps you run shadow tests to evaluate a new machine learning (ML) model before production release by testing its performance against the currently deployed model. Shadow testing can help you catch potential configuration errors and performance issues before they impact end users. 
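
The core mechanics of a shadow test can be sketched in a few lines of Python. The names `serve_with_shadow` and `ShadowReport` and the two toy threshold models are hypothetical. For simplicity, the shadow model is called synchronously here. A real deployment would mirror traffic asynchronously so the shadow cannot add latency. Each request goes to both models and only the production answer is returned. Disagreements, latencies, and shadow failures are recorded for comparison.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Model = Callable[[Any], Any]


@dataclass
class ShadowReport:
    requests: int = 0
    disagreements: int = 0
    shadow_errors: int = 0
    production_latency_s: List[float] = field(default_factory=list)
    shadow_latency_s: List[float] = field(default_factory=list)


def timed(model: Model, payload: Any) -> Tuple[Any, float]:
    start = time.perf_counter()
    result = model(payload)
    return result, time.perf_counter() - start


def serve_with_shadow(payload: Any, production: Model, shadow: Model, report: ShadowReport) -> Any:
    """Answer with the production model; mirror the request to the shadow model."""
    report.requests += 1
    prod_out, prod_s = timed(production, payload)
    report.production_latency_s.append(prod_s)
    try:
        shadow_out, shadow_s = timed(shadow, payload)
        report.shadow_latency_s.append(shadow_s)
        if shadow_out != prod_out:
            report.disagreements += 1
    except Exception:
        # A failing shadow model is recorded but never affects the user.
        report.shadow_errors += 1
    return prod_out  # users only ever see the production answer


def current_model(score: int) -> bool:
    return score >= 5  # deployed model (toy threshold)


def candidate_model(score: int) -> bool:
    return score >= 6  # new model under test


if __name__ == "__main__":
    report = ShadowReport()
    answers = [serve_with_shadow(s, current_model, candidate_model, report) for s in range(10)]
    print(answers == [s >= 5 for s in range(10)], report.disagreements)  # True 1
```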

Amazon SageMaker Studio Lab

Amazon SageMaker Studio Lab is a free machine learning (ML) development environment that provides compute, storage (up to 15 GB), and security for anyone to learn and experiment with ML. To get started, you don’t need to configure infrastructure, manage identity and access, or even sign up for an AWS account. SageMaker Studio Lab accelerates model building through GitHub integration, and it comes preconfigured with popular ML tools, frameworks, and libraries so you can get started immediately. SageMaker Studio Lab saves your work, so you don’t need to start over between sessions.

Amazon SageMaker Studio

Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps, which can significantly improve data science team productivity. SageMaker Studio gives you access, control, and visibility into each step required to build, train, and deploy models. You can upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production, all in one place. ML development activities you can perform within SageMaker Studio include notebook development, experiment management, automatic model creation, debugging, and model and data drift detection.

Amazon SageMaker Model Training

Amazon SageMaker Model Training reduces the time and cost to train and tune machine learning (ML) models at scale without the need to manage infrastructure. You can take advantage of the highest-performing ML compute infrastructure currently available, and SageMaker can scale infrastructure up or down, from one to thousands of GPUs. Since you pay only for what you use, you can manage your training costs more effectively. To train deep learning models faster, SageMaker distributed training libraries can automatically split large models and training datasets across AWS GPU instances, or you can use third-party libraries, such as DeepSpeed, Horovod, or Megatron.

Distributed Training Libraries

Amazon SageMaker helps improve the training process for large deep learning models and datasets. Using partitioning algorithms, SageMaker's distributed training libraries split large deep learning models and training datasets across AWS GPU instances in a fraction of the time it takes to do so manually. SageMaker achieves these efficiencies through two techniques: data parallelism and model parallelism. Model parallelism splits a model that is too large to fit on a single GPU into smaller parts and distributes them across multiple GPUs for training. Data parallelism splits a large dataset so that its parts can be trained on concurrently, which improves training speed.
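
A small worked example in Python shows why data parallelism can speed up training without changing the result. It is an illustration under stated assumptions, not SageMaker's data parallel library. It assumes a toy linear model with mean squared error loss. The GPUs are simulated as a list of equal-sized data shards, and a plain average stands in for the all-reduce step. With equal shard sizes, the averaged per-shard gradient equals the full-batch gradient.

```python
import math
from typing import Sequence, Tuple


def mse_gradient(w: float, b: float, xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Gradient of mean squared error for the linear model y_hat = w * x + b."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    dw = sum(2 * r * x for r, x in zip(residuals, xs)) / n
    db = sum(2 * r for r in residuals) / n
    return dw, db


def data_parallel_gradient(
    w: float, b: float, xs: Sequence[float], ys: Sequence[float], workers: int
) -> Tuple[float, float]:
    if len(xs) % workers != 0:
        raise ValueError("equal-sized shards keep the averaged gradient exact")
    # Each simulated worker gets every `workers`-th example (a strided shard).
    shards = [(xs[i::workers], ys[i::workers]) for i in range(workers)]
    local = [mse_gradient(w, b, sx, sy) for sx, sy in shards]  # concurrent in practice
    # Stand-in for all-reduce: average the per-worker gradients.
    return (sum(g[0] for g in local) / workers, sum(g[1] for g in local) / workers)


if __name__ == "__main__":
    xs = [float(i) for i in range(8)]
    ys = [2.0 * x + 1.0 for x in xs]
    full = mse_gradient(0.5, 0.0, xs, ys)
    split = data_parallel_gradient(0.5, 0.0, xs, ys, workers=4)
    print(all(math.isclose(f, s) for f, s in zip(full, split)))  # True
```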

ML use cases such as image classification and text-to-speech demand ever more compute and ever larger datasets.

With just a few lines of additional code, you can add either data parallelism or model parallelism to your PyTorch and TensorFlow training scripts, and Amazon SageMaker applies your selected method for you. SageMaker splits your model using graph partitioning algorithms that balance the computation on each GPU while minimizing communication between GPU instances. SageMaker also optimizes your distributed training jobs with algorithms designed to make full use of AWS compute and network infrastructure. The result is near-linear scaling efficiency, so you can complete training more quickly than with manual implementations.
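
The balancing goal can be illustrated with a simplified Python sketch. It is not SageMaker's graph partitioning algorithm. It assumes the model is a simple chain of layers with known, hypothetical per-layer compute costs, and it splits that chain into contiguous stages. Contiguous stages mean activations cross between GPUs only at stage boundaries, which stands in for minimizing communication. A binary search over the maximum per-GPU load balances the computation.

```python
from typing import List


def stages_needed(costs: List[int], limit: int) -> int:
    """Greedy count of contiguous stages if no stage may exceed `limit`."""
    stages, load = 1, 0
    for cost in costs:
        if load + cost > limit:
            stages, load = stages + 1, cost
        else:
            load += cost
    return stages


def partition_layers(costs: List[int], num_gpus: int) -> List[List[int]]:
    """Split a chain of layers into at most num_gpus contiguous stages,
    minimizing the compute load of the busiest GPU."""
    if num_gpus < 1 or not costs:
        raise ValueError("need at least one GPU and one layer")
    # Binary search for the smallest feasible per-GPU load.
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(costs, mid) <= num_gpus:
            hi = mid
        else:
            lo = mid + 1
    # Rebuild the stages (as layer indices) for that load limit.
    stages: List[List[int]] = []
    current: List[int] = []
    load = 0
    for i, cost in enumerate(costs):
        if current and load + cost > lo:
            stages.append(current)
            current, load = [], 0
        current.append(i)
        load += cost
    stages.append(current)
    return stages


if __name__ == "__main__":
    layer_costs = [4, 2, 3, 7, 1, 5, 2, 4]  # illustrative per-layer compute estimates
    plan = partition_layers(layer_costs, num_gpus=3)
    print(plan)  # [[0, 1, 2], [3, 4], [5, 6, 7]]
    print([sum(layer_costs[i] for i in stage) for stage in plan])  # [9, 8, 11]
```

Real models are graphs rather than chains, and their costs include memory as well as compute. The sketch only captures the trade-off the paragraph describes.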

Training Compiler

Use Amazon SageMaker Training Compiler to train deep learning (DL) models faster on scalable GPU instances managed by SageMaker.

State-of-the-art DL models consist of complex, multi-layered neural networks with billions of parameters that can take thousands of GPU hours to train. Optimizing such models on training infrastructure requires expertise in both DL and systems engineering, which can be challenging even for narrow use cases.

SageMaker Training Compiler is a capability of SageMaker that makes these hard-to-implement optimizations to reduce training time on GPU instances. The compiler optimizes DL models to accelerate training by more efficiently using SageMaker machine learning (ML) GPU instances. SageMaker Training Compiler is available at no additional charge within SageMaker and can help reduce total billable time as it accelerates training.

SageMaker Training Compiler is integrated into AWS Deep Learning Containers (DLCs). Using the SageMaker Training Compiler-enabled AWS DLCs, you can compile and optimize training jobs on GPU instances. Bring your deep learning models to SageMaker and enable SageMaker Training Compiler to speed up your training jobs on SageMaker ML instances for accelerated computing.

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at, or other agreement between you and AWS governing your use of AWS’s services.