Amazon SageMaker for Data Scientists
Integrated development environment (IDE) for the ML lifecycle
250 hours per month of ml.t3.medium on Studio notebooks for the first 2 months with the AWS Free Tier
Access data from structured and unstructured data sources
Improve productivity with purpose-built tools
Fully managed Jupyter Notebooks with just a few clicks
How it works
Data science is the study of data to extract meaningful insights for business. It asks and answers questions such as what happened, why it happened, and what will happen. Machine learning (ML) is essential for data science because ML makes it practical for machines to solve problems that traditional analytics cannot easily solve with rule-based logic. ML analyzes data and discovers patterns by learning from examples. Machines can then use the patterns to recognize unknown instances. Amazon SageMaker offers a broad set of ML capabilities used by tens of thousands of customers to access and analyze data, and build, train, and deploy high-quality ML models. Your data science teams can be up to 10 times more productive using Amazon SageMaker.

Prepare
Prepare data for ML in minutes
With SageMaker Data Wrangler’s data selection tool, you can quickly select data from multiple data sources, such as Amazon Athena, Amazon Redshift, AWS Lake Formation, Amazon Simple Storage Service (Amazon S3), and Amazon SageMaker Feature Store. You can write queries for data sources and import data directly into SageMaker from various file formats. You can then use SageMaker Data Wrangler’s visualization templates and built-in data transforms to help ensure that the prepared data results in accurate ML models.

Low-latency feature store
SageMaker Feature Store is a fully managed repository to store, update, retrieve, and share ML features. It serves the exact same features in batch for training and in real time for inference, so you don’t need to write code to keep features consistent. You can add new features, update existing ones, retrieve features in batches for training, and get the same features with single-digit millisecond latency for real-time inference.
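
To illustrate the idea of serving the same features in batch and in real time, here is a minimal in-memory sketch. It is not the SageMaker Feature Store API: the class `ToyFeatureStore`, its method names, and the sample records are all hypothetical. The only point it shows is that a single write path can feed both a batch history and a latest-value lookup, so the two reads cannot disagree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Dict[str, float]


@dataclass
class ToyFeatureStore:
    """Hypothetical in-memory stand-in, NOT the SageMaker Feature Store API."""

    # Offline view: append-only history of every write, read in batch for training.
    offline: List[Tuple[str, float, Record]] = field(default_factory=list)
    # Online view: latest record per id, read one at a time for inference.
    online: Dict[str, Tuple[float, Record]] = field(default_factory=dict)

    def put_record(self, record_id: str, event_time: float, features: Record) -> None:
        """Single write path, so both views receive identical feature values."""
        self.offline.append((record_id, event_time, dict(features)))
        current = self.online.get(record_id)
        # Keep the online view at the newest event time, even if writes arrive late.
        if current is None or event_time >= current[0]:
            self.online[record_id] = (event_time, dict(features))

    def get_record(self, record_id: str) -> Record:
        """Real-time style lookup of the latest features for one id."""
        return dict(self.online[record_id][1])

    def batch_for_training(self) -> List[Record]:
        """Batch read of the full history, ordered by event time."""
        ordered = sorted(self.offline, key=lambda row: row[1])
        return [dict(features) for _, _, features in ordered]


# Made-up sample data for illustration only.
store = ToyFeatureStore()
store.put_record("customer-1", 1.0, {"orders_30d": 2.0})
store.put_record("customer-1", 2.0, {"orders_30d": 3.0})
store.put_record("customer-2", 1.5, {"orders_30d": 7.0})
```

In this sketch, `store.get_record("customer-1")` returns the most recent values, while `store.batch_for_training()` returns every historical write for model training.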

Scalable data preparation using notebooks
You can visually browse, discover, and connect to Apache Spark data processing environments running on Amazon EMR from your SageMaker Studio notebooks with a few clicks. Once connected, you can interactively query, explore, and visualize data, and run Spark jobs using the language of your choice (SQL, Python, or Scala) to build end-to-end data preparation and ML workflows.

Data labeling
Amazon SageMaker data labeling allows you to identify raw data, such as images, text files, and videos, and add informative labels to create high-quality training datasets for your ML models.

Build
One-click Jupyter notebooks
Amazon SageMaker Studio notebooks are one-click Jupyter notebooks that can be spun up quickly. The underlying compute resources are fully elastic, so you can easily dial up or down the available resources and the changes take place automatically in the background without interrupting your work. Notebooks can be shared with a single click, and your colleagues get the exact same notebook, saved in the same place.

Built-in algorithms
Amazon SageMaker offers over 15 built-in algorithms available in pre-built container images that can be used to quickly train and run inference.

Pre-built solutions and open-source models
Amazon SageMaker JumpStart helps you quickly get started with ML using pre-built solutions that can be deployed with just a few clicks. SageMaker JumpStart also supports one-click deployment and fine-tuning of more than 150 popular open-source models.

Optimized for major frameworks
Amazon SageMaker is optimized for many popular deep-learning frameworks, such as TensorFlow, Apache MXNet, and PyTorch. Frameworks are always up to date with the latest version and are optimized for performance on AWS. You don’t need to manually set up these frameworks and can use them within the built-in containers.

Train
Detect bias and understand predictions
Amazon SageMaker Clarify provides data to improve model quality through bias detection during data preparation and after training. SageMaker Clarify also provides model explainability reports so stakeholders can see how and why models make predictions.
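
As a rough illustration of what bias detection during data preparation can measure, the sketch below computes two simple dataset-level metrics in plain Python: class imbalance between two groups, and the difference in the proportion of positive labels between them. The function names, the 0/1 encoding of the group column, and the sample data are assumptions made for this example. It does not call SageMaker Clarify, and the formulas should be checked against the Clarify documentation before relying on them.

```python
from typing import Sequence


def _group_sizes(facet: Sequence[int]) -> tuple:
    """Count members of group a (facet == 1) and group d (facet == 0)."""
    n_a = sum(1 for f in facet if f == 1)
    n_d = sum(1 for f in facet if f == 0)
    if n_a == 0 or n_d == 0:
        raise ValueError("both groups need at least one member")
    return n_a, n_d


def class_imbalance(facet: Sequence[int]) -> float:
    """(n_a - n_d) / (n_a + n_d): 0 means balanced, +/-1 means one group only."""
    n_a, n_d = _group_sizes(facet)
    return (n_a - n_d) / (n_a + n_d)


def label_proportion_difference(facet: Sequence[int], labels: Sequence[int]) -> float:
    """q_a - q_d: gap between the groups' shares of positive (1) labels."""
    n_a, n_d = _group_sizes(facet)
    pos_a = sum(1 for f, y in zip(facet, labels) if f == 1 and y == 1)
    pos_d = sum(1 for f, y in zip(facet, labels) if f == 0 and y == 1)
    return pos_a / n_a - pos_d / n_d


# Made-up dataset: 8 rows in group a, 2 rows in group d.
facet = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]

ci = class_imbalance(facet)
dpl = label_proportion_difference(facet, labels)
```

Values near zero suggest the two groups are similarly represented and similarly labeled. Larger magnitudes are a signal to look more closely at the data before training.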

Organize, track, and evaluate training runs
Amazon SageMaker Experiments automatically captures training input parameters, configurations, and results, and stores them as "experiments". You can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually.
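
The workflow of capturing parameters and results, searching past runs, and comparing them can be sketched with a small in-memory log. Everything here (`Run`, `RunLog`, the parameter names, and the metric values) is hypothetical and exists only to make the idea concrete. It is not the SageMaker Experiments API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Run:
    """One training run: its input parameters and resulting metrics."""
    name: str
    params: Dict[str, Any]
    metrics: Dict[str, float] = field(default_factory=dict)


class RunLog:
    """Hypothetical in-memory experiment log, for illustration only."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def record(self, name: str, params: Dict[str, Any], metrics: Dict[str, float]) -> Run:
        """Store a copy of the run's parameters and results."""
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def search(self, **params: Any) -> List[Run]:
        """Find runs whose parameters match every given key/value pair."""
        return [
            run for run in self.runs
            if all(run.params.get(key) == value for key, value in params.items())
        ]

    def best(self, metric: str, higher_is_better: bool = True) -> Run:
        """Compare runs on one metric and return the best one."""
        candidates = [run for run in self.runs if metric in run.metrics]
        if not candidates:
            raise ValueError(f"no run reports metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda run: run.metrics[metric])


# Made-up runs for illustration.
log = RunLog()
log.record("run-1", {"lr": 0.1, "depth": 4}, {"accuracy": 0.81})
log.record("run-2", {"lr": 0.01, "depth": 4}, {"accuracy": 0.86})
log.record("run-3", {"lr": 0.01, "depth": 8}, {"accuracy": 0.84})
```

Here `log.search(lr=0.01)` returns the two runs trained with that learning rate, and `log.best("accuracy")` picks the run with the highest recorded accuracy.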

Detect and debug problems
Amazon SageMaker Debugger captures metrics in real time so you can correct performance problems quickly before the model is deployed to production.
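
One kind of problem that shows up in captured training metrics is a loss that stops improving. The following sketch shows a simple check of that kind in plain Python. The function name `loss_stalled`, the `patience` and `min_delta` parameters, and the sample loss values are assumptions for this example. It is not how SageMaker Debugger is implemented.

```python
from typing import Sequence


def loss_stalled(losses: Sequence[float], patience: int = 3, min_delta: float = 0.01) -> bool:
    """Return True if the last `patience` steps did not beat the earlier best
    loss by at least `min_delta`."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(losses) <= patience:
        # Not enough history yet to compare recent steps against earlier ones.
        return False
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta


# Made-up loss curves for illustration.
healthy = [1.0, 0.8, 0.6, 0.5, 0.4]
stuck = [1.0, 0.5, 0.5, 0.51, 0.5]

healthy_flag = loss_stalled(healthy)
stuck_flag = loss_stalled(stuck)
```

Running a check like this on each new metric value lets a training job be stopped or adjusted early, rather than after the full run completes.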

Deploy
Continuously monitor models
Amazon SageMaker Model Monitor automatically detects model and concept drift and provides detailed alerts that help identify the source of the problem so you can improve model quality over time. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in SageMaker Studio.
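
To give a sense of what drift detection involves, the sketch below compares live input data against statistics computed from a baseline dataset and flags features whose mean has shifted by more than a chosen number of baseline standard deviations. This is a deliberately simple stand-in technique, not the method SageMaker Model Monitor uses. The function names, the threshold of 3.0, and the sample rows are all assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Row = Dict[str, float]


def baseline_stats(rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Compute (mean, sample standard deviation) per feature from baseline rows."""
    return {
        name: (mean([row[name] for row in rows]), stdev([row[name] for row in rows]))
        for name in rows[0]
    }


def drifted_features(
    baseline: Dict[str, Tuple[float, float]],
    live_rows: List[Row],
    threshold: float = 3.0,
) -> List[str]:
    """Flag features whose live mean is more than `threshold` baseline
    standard deviations away from the baseline mean."""
    flagged = []
    for name, (mu, sigma) in baseline.items():
        live_mean = mean([row[name] for row in live_rows])
        # Skip constant baseline features to avoid dividing by zero.
        if sigma > 0 and abs(live_mean - mu) / sigma > threshold:
            flagged.append(name)
    return flagged


# Made-up baseline (training-time) and live (inference-time) data.
training_rows = [
    {"age": 30.0, "income": 50.0},
    {"age": 32.0, "income": 52.0},
    {"age": 34.0, "income": 48.0},
    {"age": 36.0, "income": 50.0},
]
live_rows = [
    {"age": 33.0, "income": 70.0},
    {"age": 35.0, "income": 72.0},
]

alerts = drifted_features(baseline_stats(training_rows), live_rows)
```

In this made-up data, the live `income` values sit far outside the baseline range while `age` stays close to it, so only `income` is flagged. Naming the feature that moved is what makes an alert useful for finding the source of the problem.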

Easy deployment options
Amazon SageMaker provides the broadest selection of ML infrastructure and model deployment options to meet the needs of your use case, whether real-time or batch, so you can easily deploy your ML models at scale. SageMaker supports the entire spectrum of inference requirements, ranging from low latency (a few milliseconds) and high throughput (hundreds of thousands of inference requests per second) to long-running inference for use cases such as natural language processing (NLP) and computer vision (CV).
