Amazon SageMaker for Data Scientists

Integrated development environment (IDE) for the ML lifecycle

250 hours per month of ml.t3.medium on Studio notebooks for the first 2 months with the AWS Free Tier

Access data from structured and unstructured data sources

Improve productivity with purpose-built tools

Fully managed Jupyter Notebooks with just a few clicks

How it works

Data science is the study of data to extract meaningful insights for business. It asks and answers questions like what happened, why it happened, and what will happen. Machine learning (ML) is essential for data science because ML makes it practical for machines to solve problems that traditional analytics cannot easily solve with rule-based logic. ML analyzes data and discovers patterns by learning from examples. Machines can then use the patterns to recognize unknown instances. Amazon SageMaker offers a broad set of ML capabilities used by tens of thousands of customers to access and analyze data, and build, train, and deploy high-quality ML models. Your data science teams can be up to 10x more productive using Amazon SageMaker.

Prepare

Prepare data for ML in minutes

With SageMaker Data Wrangler’s data selection tool, you can quickly select data from multiple data sources, such as Amazon Athena, Amazon Redshift, AWS Lake Formation, Amazon S3, and the Amazon SageMaker Feature Store. You can write queries for data sources and import data directly into SageMaker from various file formats. SageMaker Data Wrangler’s visualization templates and built-in data transforms then help you check that the prepared data will produce accurate ML models.

Learn more »

Low latency feature store

A fully managed repository to store, update, retrieve, and share machine learning features, SageMaker Feature Store serves the exact same features in batch for training and in real-time for inference so you don’t need to write code to keep features consistent. You can easily add new features, update existing ones, retrieve features in batches for training, and get the same features with single-digit millisecond latency for real-time inference.
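The consistency guarantee described above can be illustrated with a toy in-memory store. This is a minimal sketch, not the SageMaker Feature Store API: the class `TinyFeatureStore` and its methods are hypothetical names chosen for illustration. The idea it shows is that a single write path feeds both the batch (training) view and the real-time (inference) view, so the two cannot disagree.

```python
class TinyFeatureStore:
    """Toy in-memory stand-in for a feature store (hypothetical, not the SageMaker API)."""

    def __init__(self):
        self._offline = []  # append-only history, read in batches for training
        self._online = {}   # latest row per id, read one at a time for inference

    def put_record(self, record_id, features, event_time):
        # One write path updates both views, so no extra sync code is needed.
        row = {"id": record_id, "event_time": event_time, **features}
        self._offline.append(row)
        current = self._online.get(record_id)
        if current is None or event_time >= current["event_time"]:
            self._online[record_id] = row

    def get_record(self, record_id):
        """Online lookup: the latest features for a single id."""
        return self._online.get(record_id)

    def batch_latest(self):
        """Offline read: the latest row per id, rebuilt from the full history."""
        latest = {}
        for row in self._offline:
            seen = latest.get(row["id"])
            if seen is None or row["event_time"] >= seen["event_time"]:
                latest[row["id"]] = row
        return latest


store = TinyFeatureStore()
store.put_record("customer-1", {"orders_30d": 3}, event_time=1)
store.put_record("customer-1", {"orders_30d": 5}, event_time=2)
store.put_record("customer-2", {"orders_30d": 1}, event_time=1)

# Training and inference see the same values for the same id.
same = store.batch_latest()["customer-1"] == store.get_record("customer-1")
```

In a real deployment the offline and online views live in separate storage systems; the sketch only captures why sharing one ingestion path keeps them consistent.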

Learn more »

Scalable data preparation using notebooks

You can visually browse, discover, and connect to Apache Spark data processing environments running on Amazon EMR from your SageMaker Studio notebooks with a few clicks. Once connected, you can interactively query, explore, and visualize data, and run Spark jobs using the language of your choice (SQL, Python, or Scala) to build end-to-end data preparation and ML workflows.

Learn more »

Data labeling

Amazon SageMaker data labeling allows you to identify raw data, such as images, text files, and videos, and add informative labels to create high-quality training datasets for your machine learning models.

Learn more »

Build

One-click Jupyter Notebooks

Amazon SageMaker Studio Notebooks are one-click Jupyter Notebooks that can be spun up quickly. The underlying compute resources are fully elastic, so you can easily dial the available resources up or down, and the changes take place automatically in the background without interrupting your work. Notebooks can be shared with a single click; your colleagues get the exact same notebook, saved in the same place.

Learn more »

Built-in algorithms

Amazon SageMaker offers over 15 built-in algorithms available in pre-built container images that can be used to quickly train and run inference.

Get started »

Pre-built solutions and open-source models

Amazon SageMaker JumpStart helps you quickly get started with ML using pre-built solutions that can be deployed with just a few clicks. SageMaker JumpStart also supports one-click deployment and fine-tuning of more than 150 popular open-source models.

Get started »

Optimized for major frameworks

Amazon SageMaker is optimized for many popular deep learning frameworks such as TensorFlow, Apache MXNet, PyTorch, and more. Frameworks are kept up to date with the latest version and are optimized for performance on AWS. You don’t need to set up these frameworks manually and can use them within the built-in containers.

Get started »

Train

Detect bias and understand predictions

Amazon SageMaker Clarify provides data to improve model quality through bias detection during data preparation and after training. SageMaker Clarify also provides model explainability reports so stakeholders can see how and why models make predictions.
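To make bias detection during data preparation concrete, here is a minimal sketch of two pre-training measures of the kind a bias report contains: class imbalance between two groups, and the difference in the proportion of positive labels. The function names and the sample data are assumptions made for illustration; this is plain Python, not the SageMaker Clarify API.

```python
def class_imbalance(n_a, n_d):
    """(n_a - n_d) / (n_a + n_d): 0 means both groups are equally represented.

    n_a and n_d are the row counts of the two groups being compared.
    """
    return (n_a - n_d) / (n_a + n_d)


def diff_positive_proportions(labels_a, labels_d):
    """Share of positive labels in group a minus the share in group d.

    Labels are assumed to be 0/1, with 1 as the favorable outcome.
    """
    q_a = sum(labels_a) / len(labels_a)
    q_d = sum(labels_d) / len(labels_d)
    return q_a - q_d


# Hypothetical loan-approval labels for two groups of applicants.
group_a = [1, 1, 1, 0]
group_d = [1, 0]

ci = class_imbalance(len(group_a), len(group_d))
dpl = diff_positive_proportions(group_a, group_d)
```

Values near zero suggest the groups are represented and labeled similarly; values far from zero are a prompt to look more closely at the data before training.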

Learn more »

Organize, track, and evaluate training runs

Amazon SageMaker Experiments automatically captures training input parameters, configurations, and results, and stores them as ‘experiments’. You can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually.
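The workflow above (capture parameters and results, search by characteristics, compare runs) can be sketched in a few lines of plain Python. The `Run` and `ExperimentLog` names are hypothetical and exist only for this illustration; they are not part of the SageMaker Experiments API, which records this information for you.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)


class ExperimentLog:
    """Toy run tracker (hypothetical; stands in for a managed experiment store)."""

    def __init__(self):
        self.runs = []

    def log_run(self, name, params, metrics):
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def search(self, **param_filters):
        """Return runs whose parameters match every given key/value pair."""
        return [
            r for r in self.runs
            if all(r.params.get(k) == v for k, v in param_filters.items())
        ]

    def best(self, metric, higher_is_better=True):
        """Return the run with the best value of `metric`, or None if none logged it."""
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])


log = ExperimentLog()
log.log_run("run-1", {"lr": 0.1, "depth": 4}, {"accuracy": 0.81})
log.log_run("run-2", {"lr": 0.01, "depth": 4}, {"accuracy": 0.88})
log.log_run("run-3", {"lr": 0.01, "depth": 8}, {"accuracy": 0.86})

shallow_runs = log.search(depth=4)
winner = log.best("accuracy")
```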

Learn more »

Detect and debug problems

Amazon SageMaker Debugger captures metrics in real-time so you can correct performance problems quickly before the model is deployed to production.
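One kind of problem that can be caught from captured metrics is a training loss that has stopped improving. The sketch below is a simplified, hypothetical rule written in plain Python, not the SageMaker Debugger implementation; the `window` and `min_rel_improvement` parameters are assumptions chosen for illustration, and losses are assumed to be positive.

```python
def loss_not_decreasing(losses, window=3, min_rel_improvement=0.01):
    """Return the first step at which the loss has failed to improve on its
    best value by at least `min_rel_improvement` for `window` steps in a row,
    or None if training looks healthy.

    Assumes positive loss values.
    """
    best = float("inf")
    stale_steps = 0
    for step, loss in enumerate(losses):
        if loss < best * (1 - min_rel_improvement):
            best = loss
            stale_steps = 0
        else:
            stale_steps += 1
            if stale_steps >= window:
                return step
    return None


healthy = [1.0, 0.5, 0.25, 0.12]
stalled = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6]

healthy_flag = loss_not_decreasing(healthy)
stalled_flag = loss_not_decreasing(stalled)
```

Running a check like this while training is still in progress is what makes it possible to stop or fix a job early rather than after deployment.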

Learn more »

Deploy

Continuously monitor model

Amazon SageMaker Model Monitor automatically detects model and concept drift and provides detailed alerts that help identify the source of the problem so you can improve model quality over time. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in SageMaker Studio.
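As a rough illustration of what detecting drift means, the sketch below compares live values of one feature against a baseline captured at training time and flags a large shift in the mean. It is a deliberately simplified, hypothetical check in plain Python: the function names and the threshold of 2.0 baseline standard deviations are assumptions, and it is not how SageMaker Model Monitor computes its statistics.

```python
import statistics


def mean_shift(baseline, live):
    """Distance between the live mean and the baseline mean,
    measured in baseline standard deviations."""
    base_std = statistics.pstdev(baseline)
    if base_std == 0:
        raise ValueError("baseline has no variance; cannot scale the shift")
    return abs(statistics.fmean(live) - statistics.fmean(baseline)) / base_std


def has_drifted(baseline, live, threshold=2.0):
    """Flag drift when the live mean moves more than `threshold` baseline
    standard deviations away (threshold is an illustrative choice)."""
    return mean_shift(baseline, live) > threshold


# Hypothetical values of one input feature.
training_values = [10, 12, 11, 9, 10, 11, 9, 10]
stable_traffic = [10, 11, 10, 9]
shifted_traffic = [15, 16, 14, 15]

stable_alert = has_drifted(training_values, stable_traffic)
shifted_alert = has_drifted(training_values, shifted_traffic)
```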

Learn more »

Easy deployment options

Amazon SageMaker provides the broadest selection of machine learning (ML) infrastructure and model deployment options to meet the needs of your use case, whether real-time or batch, so you can easily deploy your ML models at scale. SageMaker supports the entire spectrum of inference requirements, from low latency (a few milliseconds) and high throughput (hundreds of thousands of inference requests per second) to long-running inference for use cases such as natural language processing (NLP) and computer vision (CV).

Learn more »

Getting started

Learn more about Amazon SageMaker Studio

Visit the webpage »

Learn machine learning

Start building »

Have more questions?

Explore resources »
