Amazon SageMaker for Data Scientists

Data Science

Tens of thousands of data scientists use Amazon SageMaker because it makes it easy to solve business problems using machine learning (ML). SageMaker Studio provides a fully integrated development environment (IDE) for ML, so you can prepare data and build, train, and deploy models in a single visual interface. Overall, data science teams can be up to 10x more productive using SageMaker.

Machine Learning

Transparency

Biases are imbalances in the accuracy of predictions across different groups, such as age or income bracket. Biases can result from the data or algorithm used to train your model. The field of machine learning provides an opportunity to address biases by detecting them in your data and model.
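
To make the definition above concrete, here is a minimal sketch that measures accuracy per group and the gap between the best and worst group. It is plain Python, not the SageMaker Clarify API, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: accuracy} for parallel lists of group labels, true labels, and predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(groups, y_true, y_pred):
    """Largest difference in accuracy between any two groups."""
    acc = accuracy_by_group(groups, y_true, y_pred)
    return max(acc.values()) - min(acc.values())


# Hypothetical toy data: predictions for two age brackets.
groups = ["18-30", "18-30", "18-30", "18-30", "60+", "60+", "60+", "60+"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

print(accuracy_by_group(groups, y_true, y_pred))
print(accuracy_gap(groups, y_true, y_pred))
```

A large gap suggests the model serves one group noticeably worse than another and is a cue to look at the training data for that group.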

Detect bias and understand predictions

Amazon SageMaker Clarify provides data to improve model quality through bias detection during data preparation and after training. SageMaker Clarify also provides model explainability reports so stakeholders can see how and why models make predictions.

Learn more »
SageMaker Clarify

Collect and prepare training data

Amazon SageMaker offers all the tools you need to create high-quality training data. You can easily access data from AWS and third-party data sources, label your data, automatically cleanse and transform data, and visualize data in order to engineer model features.

Prepare data for ML in minutes

With SageMaker Data Wrangler’s data selection tool, you can quickly select data from multiple data sources, such as Amazon Athena, Amazon Redshift, AWS Lake Formation, Amazon S3, and the Amazon SageMaker Feature Store. You can write queries for data sources and import data directly into SageMaker from various file formats, then use SageMaker Data Wrangler's visualization templates and built-in data transforms to check that the prepared data will result in accurate ML models.

Learn more »
SageMaker Data Wrangler

Data labeling

Amazon SageMaker Ground Truth helps you build highly accurate training datasets for machine learning. Get started with labeling your data in minutes through the SageMaker Ground Truth console using custom or built-in data labeling workflows, including workflows for 3D point clouds, video, images, and text.

Get started »
SageMaker Ground Truth

Low latency feature store

Amazon SageMaker Feature Store is a fully managed repository to store, update, retrieve, and share machine learning (ML) features. SageMaker Feature Store serves the exact same features in batch for training and in real time for inference, so you don’t need to write code to keep features consistent. You can easily add new features, update existing ones, retrieve features in batches for training, and get the same features with single-digit millisecond latency for real-time inference.

Learn more »
SageMaker Feature Store

Build models

After data is prepared, Amazon SageMaker provides all the tools you need to iteratively try different modeling techniques and evaluate their performance. You can pick from different algorithms, including over 15 that are built in and optimized for SageMaker, and from over 150 pre-built models from popular model zoos, available with just a few clicks. Inside SageMaker Studio, you can run the models on a small scale to see results and view reports on their performance, so you can arrive at high-quality working prototypes.

One-click Jupyter Notebooks

Amazon SageMaker Studio Notebooks are one-click Jupyter notebooks that can be spun up quickly. The underlying compute resources are fully elastic, so you can easily dial the available resources up or down, and the changes take place automatically in the background without interrupting your work. Notebooks can be shared with a single click, and your colleagues get the exact same notebook, saved in the same place.

Get started »
SageMaker Studio Notebook

Built-in algorithms

Amazon SageMaker also offers over 15 built-in algorithms, available in pre-built container images, that can be used to quickly train models and run inference.

Get started »
Built-in Algorithms

Local mode

Amazon SageMaker makes it possible to test and prototype locally. The Apache MXNet and TensorFlow Docker containers used in SageMaker are available on GitHub. You can download these containers to your local environment and use the SageMaker Python SDK to test your scripts before deploying to SageMaker training or hosting environments. 

Get started »
SageMaker Local Mode

Reinforcement learning

Amazon SageMaker supports reinforcement learning in addition to traditional supervised and unsupervised learning. SageMaker has built-in, fully managed reinforcement learning algorithms, including some of the newest and best performing in the academic literature.

Get started »
Reinforcement Learning

Train and tune models

Amazon SageMaker provides everything you need to train and tune models. You can easily manage different training runs to isolate and measure the impact of changing data sets, algorithm versions, and model parameters or take advantage of automatic model tuning. 

Organize, track, and evaluate training runs

Amazon SageMaker Experiments automatically captures training input parameters, configurations, and results, and stores them as ‘experiments’. You can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually.

Get started »
SageMaker Experiments

Detect and debug problems

Amazon SageMaker Debugger captures metrics in real-time so you can correct performance problems quickly before the model is deployed to production.

Learn more »
SageMaker Debugger

Managed spot training

Amazon SageMaker provides Managed Spot Training to help you reduce training costs by up to 90%. This capability uses Amazon EC2 Spot Instances, which run on spare AWS compute capacity. Training jobs run automatically when compute capacity becomes available and are made resilient to interruptions caused by changes in capacity, so you can save on cost when you have flexibility in when your training jobs run.

Get started »
Managed Spot Training
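
As a worked example of the savings figure, the sketch below compares the time a job ran with the time it was billed for. It assumes the two values are the training time and billable time that SageMaker reports for a training job; the function name is hypothetical and the numbers are made up for illustration.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved when only part of the training time is billed.

    Computed as (1 - billable / training) * 100.
    """
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical job: ran for one hour, billed for 15 minutes.
print(spot_savings_percent(training_seconds=3600, billable_seconds=900))  # 75.0
```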

Automatic model tuning

Amazon SageMaker can automatically tune your model by trying thousands of different combinations of algorithm parameters to arrive at the most accurate predictions the model is capable of producing, saving weeks of effort. Automatic model tuning uses machine learning to quickly tune your model to be as accurate as possible.

Get started »
Automatic Model Tuning
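
The idea of searching over parameter combinations can be sketched with plain random search. This is an illustration of the general technique only, not the service's tuning implementation; the objective function, parameter names, and ranges below are hypothetical stand-ins for a real training run and its validation metric.

```python
import random


def random_search(objective, search_space, n_trials=50, seed=0):
    """Sample parameter combinations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for validation accuracy.
    # It peaks at learning_rate=0.1 and dropout=0.3.
    return 1.0 - (params["learning_rate"] - 0.1) ** 2 - (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.001, 0.5), "dropout": (0.0, 0.9)}
best_params, best_score = random_search(toy_objective, space, n_trials=200)
print(best_params, round(best_score, 4))
```

In a real tuning job each trial is a full training run, which is why a managed service that runs trials in parallel and chooses the next combination intelligently can save so much time.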

Deploy models to production

Amazon SageMaker makes it easy to generate predictions by providing everything you need to deploy machine learning models in production and monitor model quality over time. 

Automatic workflows

Amazon SageMaker Pipelines helps you create, automate, and manage end-to-end ML workflows at scale using CI/CD practices. Once the workflows are created, they can be visualized and managed in SageMaker Studio. SageMaker Pipelines takes care of all the heavy lifting involved in managing dependencies between the steps of the ML workflow. You can re-run complete workflows at any time with updated data to keep your models accurate, and share workflows with other teams to collaborate on projects.

Learn more »
SageMaker Pipelines
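
To show what managing dependencies between steps means in practice, the sketch below resolves a valid execution order for a small workflow using the Python standard library (`graphlib`, Python 3.9 or later). It does not use the SageMaker Pipelines SDK, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "prepare_data": set(),
    "train": {"prepare_data"},
    "evaluate": {"train", "prepare_data"},
    "register_model": {"evaluate"},
    "deploy": {"register_model"},
}


def execution_order(steps):
    """Return the steps in an order where every dependency runs first."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(workflow)
print(order)
```

If the dependencies form a cycle, `static_order()` raises `graphlib.CycleError`, which is the kind of mistake a workflow service has to catch before anything runs.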

Continuously monitor models

Amazon SageMaker Model Monitor automatically detects model and concept drift and provides detailed alerts that help identify the source of the problem, so you can improve model quality over time. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in SageMaker Studio.

Learn more »
SageMaker Model Monitor
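
A very simple form of drift check compares live data against a baseline captured at training time. The sketch below flags a feature whose live mean has moved more than a chosen number of baseline standard deviations. It is an illustrative heuristic, not the method Model Monitor uses, and the threshold and data are hypothetical.

```python
from statistics import mean, stdev


def drift_score(baseline, live):
    """Absolute shift of the live mean, in units of baseline standard deviation."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has zero variance")
    return abs(mean(live) - mean(baseline)) / spread


def has_drifted(baseline, live, threshold=2.0):
    """True when the live mean is more than `threshold` baseline deviations away."""
    return drift_score(baseline, live) > threshold


# Hypothetical feature values seen at training time and in production.
baseline = [10, 12, 11, 9, 13, 10, 11, 12]
live_stable = [11, 10, 12, 11]
live_shifted = [18, 19, 17, 20]

print(has_drifted(baseline, live_stable))
print(has_drifted(baseline, live_shifted))
```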

Human review

Many machine learning applications require humans to review low confidence predictions to ensure the results are correct. Amazon Augmented AI provides built-in human review workflows for common machine learning use cases.

Get started »
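
The routing decision behind human review can be sketched as a confidence threshold: predictions at or above it are accepted automatically, and the rest are queued for a person. This is plain Python rather than the Amazon Augmented AI API; the record layout and the 0.8 threshold are assumptions for illustration.

```python
def route_predictions(predictions, confidence_threshold=0.8):
    """Split predictions into (auto_accepted, needs_review) by confidence."""
    auto_accepted, needs_review = [], []
    for prediction in predictions:
        if prediction["confidence"] >= confidence_threshold:
            auto_accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return auto_accepted, needs_review


# Hypothetical model outputs.
predictions = [
    {"id": "doc-1", "label": "invoice", "confidence": 0.95},
    {"id": "doc-2", "label": "receipt", "confidence": 0.55},
    {"id": "doc-3", "label": "invoice", "confidence": 0.80},
]

auto_accepted, needs_review = route_predictions(predictions)
print([p["id"] for p in auto_accepted])
print([p["id"] for p in needs_review])
```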

Batch transform

Amazon SageMaker Batch Transform eliminates the need to resize large datasets for batch processing jobs. Batch Transform allows you to run predictions on large or small batch datasets using a simple API. 

Get started »

Multi-model endpoints

Amazon SageMaker provides a scalable and cost-effective way to deploy large numbers of custom machine learning models. SageMaker multi-model endpoints let you deploy multiple models to a single endpoint with a single click and serve them all from a single serving container.

Get started »
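
When calling a multi-model endpoint, the request names which model should handle it. The sketch below only builds the request arguments so that it runs without AWS access. It assumes the boto3 `sagemaker-runtime` client's `invoke_endpoint` call with its `TargetModel` parameter; the endpoint name, model path, and payload format are hypothetical.

```python
import json


def build_invoke_request(endpoint_name, target_model, payload):
    """Build keyword arguments for invoking one model on a multi-model endpoint."""
    return {
        "EndpointName": endpoint_name,
        # Path of the model artifact to use, relative to the endpoint's model location.
        "TargetModel": target_model,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


request = build_invoke_request(
    endpoint_name="my-multi-model-endpoint",   # hypothetical endpoint name
    target_model="customer-a/model.tar.gz",    # hypothetical model artifact
    payload={"instances": [[1.0, 2.0]]},       # hypothetical input format
)
print(request)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**request)
```

Switching to a different model on the same endpoint is then just a matter of changing `target_model`.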

Resources for Amazon SageMaker for Data Scientists

A day in the life of a machine learning data scientist at JP Morgan Chase (34:41)