General

Q: What is Amazon SageMaker?

Amazon SageMaker is a fully managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models.

Q: What can I do with Amazon SageMaker?

Amazon SageMaker enables developers and scientists to build machine learning models for use in intelligent, predictive apps.

Q: How do I get started with Amazon SageMaker?

To get started with Amazon SageMaker, log in to the Amazon SageMaker console, launch a notebook instance with an example notebook, modify it to connect to your data sources, follow the example to build, train, and validate models, and then deploy the resulting model into production with just a few inputs.

Q: In which regions is Amazon SageMaker available?

For a list of the AWS Regions where Amazon SageMaker is supported, see the AWS Region Table for all AWS global infrastructure. For more information, see Regions and Endpoints in the AWS General Reference.

Q: Can I get a history of Amazon SageMaker API calls made on my account for security analysis and operational troubleshooting purposes?  

Yes. To receive a history of Amazon SageMaker API calls made on your account, turn on AWS CloudTrail in the AWS Management Console. Calls to InvokeEndpoint in Amazon SageMaker Runtime are not recorded or delivered.

Q: What is the service availability of Amazon SageMaker?

Amazon SageMaker is designed for high availability. There are no maintenance windows or scheduled downtimes. Amazon SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each AWS region to provide fault tolerance in the event of a server failure or Availability Zone outage.

Q: What security measures does Amazon SageMaker have?

Amazon SageMaker ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest. Requests to the Amazon SageMaker API and console are made over a secure (SSL) connection. You pass AWS Identity and Access Management roles to Amazon SageMaker to provide permissions to access resources on your behalf for training and deployment. You can use encrypted S3 buckets for model artifacts and data, as well as pass a KMS key to Amazon SageMaker notebooks, training jobs, and endpoints, to encrypt the attached ML storage volume.

Q: How does Amazon SageMaker secure my code?

Amazon SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.

Q: How am I charged for Amazon SageMaker?

You pay for ML compute, storage, and data processing resources you use for hosting the notebook, training the model, performing predictions, and logging the outputs. Amazon SageMaker allows you to select the number and type of instance used for the hosted notebook, training, and model hosting. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments.

Q: What if I have my own notebook, training, or hosting environment?

Amazon SageMaker provides a full end-to-end workflow, but you can continue to use your existing tools with Amazon SageMaker. You can easily transfer the results of each stage in and out of Amazon SageMaker as your business requirements dictate.

Hosted Jupyter notebooks

Q: What types of notebooks are supported?

Currently, Jupyter notebooks are supported.

Q: How do I persist notebook files when I stop my notebook instance?

You can persist your notebook files on the attached ML storage volume. The ML storage volume will be detached when the notebook instance is shut down and reattached when the notebook instance is relaunched. Items stored in memory will not be persisted.

Q: How do I increase the available resources in my notebook?

After saving your files and data on the attached ML storage volume, you can modify the notebook instance and select a larger instance type through the Amazon SageMaker console. The notebook instance restarts with more resources available and keeps the same notebook files and installed libraries.
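If you prefer to script the resize rather than use the console, a minimal sketch with the AWS SDK for Python (boto3) might look like the following. It assumes `client` is a boto3 SageMaker client (for example `boto3.client("sagemaker")`) and that the instance is stopped before its type is changed; the helper name, instance name, and instance type are placeholders.

```python
def resize_notebook_instance(client, name, new_instance_type):
    """Stop a notebook instance, change its instance type, and start it again."""
    client.stop_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

    # Files on the attached ML storage volume persist across the restart;
    # anything held only in memory does not.
    client.update_notebook_instance(NotebookInstanceName=name, InstanceType=new_instance_type)
    # The update briefly moves the instance out of the Stopped state; wait for it to settle.
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

    client.start_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)
```

For example: `resize_notebook_instance(boto3.client("sagemaker"), "my-notebook", "ml.t3.xlarge")`.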

Q: How can I train a model from an Amazon SageMaker notebook?

After launching an example notebook, you can customize it to fit your data source and schema, and then call the Amazon SageMaker API to create a training job. You can monitor the progress and completion of the training job through the Amazon SageMaker console or the AWS APIs.
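As a rough sketch of the request a customized notebook would send, the helpers below assemble the arguments for the CreateTrainingJob API and poll DescribeTrainingJob until the job finishes. The helper names, job name, image URI, role ARN, S3 paths, instance settings, and one-hour stopping condition are hypothetical placeholders, and `client` is assumed to be a boto3 SageMaker client.

```python
import time


def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble keyword arguments for CreateTrainingJob with a single S3 'train' channel."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def wait_for_training_job(client, job_name, poll_seconds=60):
    """Poll the job status until it reaches a terminal state, then return that state."""
    while True:
        status = client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in ("Completed", "Failed", "Stopped"):
            return status
        time.sleep(poll_seconds)
```

In a notebook, you would then call `client.create_training_job(**request)` followed by `wait_for_training_job(client, request["TrainingJobName"])`.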

Model Training

Q: What is Managed Spot Training?

Managed Spot Training with Amazon SageMaker lets you train your machine learning models on Amazon EC2 Spot instances, reducing the cost of training your models by up to 90%.

Q: How do I use Managed Spot Training?

You enable the Managed Spot Training option when you submit a training job, and you also specify how long you are willing to wait for Spot capacity. Amazon SageMaker then runs your job on Amazon EC2 Spot instances and manages the Spot capacity for you. You have full visibility into the status of your training job, both while it is running and while it is waiting for capacity.
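In the CreateTrainingJob API, Managed Spot Training is switched on with the `EnableManagedSpotTraining` flag, and the wait for Spot capacity is bounded by `StoppingCondition.MaxWaitTimeInSeconds`, which covers both waiting and running time and so must be at least `MaxRuntimeInSeconds`. The sketch below adds those fields (and an optional `CheckpointConfig`) to an existing request dictionary; the helper name and the S3 URI are hypothetical.

```python
def enable_managed_spot(request, max_run_seconds, max_wait_seconds, checkpoint_s3_uri=None):
    """Return a copy of a CreateTrainingJob request with Managed Spot Training enabled."""
    if max_wait_seconds < max_run_seconds:
        # The wait budget includes the run time, so it cannot be shorter than it.
        raise ValueError("max_wait_seconds must be >= max_run_seconds")
    spot_request = dict(request)  # shallow copy; the caller's dict is left untouched
    spot_request["EnableManagedSpotTraining"] = True
    spot_request["StoppingCondition"] = {
        "MaxRuntimeInSeconds": max_run_seconds,
        "MaxWaitTimeInSeconds": max_wait_seconds,
    }
    if checkpoint_s3_uri is not None:
        # SageMaker syncs LocalPath inside the container with this S3 location.
        spot_request["CheckpointConfig"] = {
            "S3Uri": checkpoint_s3_uri,
            "LocalPath": "/opt/ml/checkpoints",
        }
    return spot_request
```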

Q: When should I use Managed Spot Training?

Managed Spot Training is ideal when you have flexibility with your training runs and when you want to minimize the cost of your training jobs. With Managed Spot Training, you can reduce the cost of training your machine learning models by up to 90%.

Q: How does Managed Spot Training work?

Managed Spot Training uses Amazon EC2 Spot instances for training, and these instances can be preempted when AWS needs the capacity. As a result, Managed Spot Training jobs can run in small increments as capacity becomes available. Training jobs need not restart from scratch after an interruption, because Amazon SageMaker can resume them from the latest model checkpoint. The built-in frameworks and the built-in computer vision algorithms in Amazon SageMaker enable periodic checkpoints, and you can enable checkpoints for custom models.

Q: Do I need to periodically checkpoint with Managed Spot Training?

We recommend periodic checkpoints as a general best practice for long-running training jobs. Checkpoints prevent your Managed Spot Training jobs from restarting from scratch if Spot capacity is preempted: when checkpoints are enabled, Amazon SageMaker resumes your Managed Spot Training jobs from the last checkpoint.
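On the training-script side, checkpointing amounts to saving state periodically and loading the newest saved state at startup. The sketch below assumes checkpoints are written to a local directory that SageMaker syncs with S3 when `CheckpointConfig` is set (by default `/opt/ml/checkpoints`); the JSON file format, the `epoch-NNNNN.json` naming, and the toy "model" are hypothetical stand-ins for a real framework's save and load calls.

```python
import json
import os


def latest_checkpoint(checkpoint_dir):
    """Return the path of the newest checkpoint file, or None if there is none yet."""
    if not os.path.isdir(checkpoint_dir):
        return None
    # Zero-padded epoch numbers make lexicographic order match numeric order.
    names = sorted(n for n in os.listdir(checkpoint_dir)
                   if n.startswith("epoch-") and n.endswith(".json"))
    return os.path.join(checkpoint_dir, names[-1]) if names else None


def train(checkpoint_dir, num_epochs):
    """Run up to num_epochs, resuming after the last epoch a previous run checkpointed."""
    state = {"epoch": 0, "weight": 0.0}  # toy model state
    path = latest_checkpoint(checkpoint_dir)
    if path is not None:
        with open(path) as f:
            state = json.load(f)
    os.makedirs(checkpoint_dir, exist_ok=True)
    epochs_run = 0
    while state["epoch"] < num_epochs:
        state["weight"] += 0.1  # stand-in for one epoch of real training
        state["epoch"] += 1
        epochs_run += 1
        # Save after every epoch so an interruption loses at most one epoch of work.
        ckpt = os.path.join(checkpoint_dir, f"epoch-{state['epoch']:05d}.json")
        with open(ckpt, "w") as f:
            json.dump(state, f)
    return state, epochs_run
```

Calling `train("/opt/ml/checkpoints", 10)` after an interruption would skip the epochs already saved and run only the remaining ones.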

Q: How do you calculate the cost savings with Managed Spot Training jobs?

Once a Managed Spot Training job is completed, you can see the savings in the AWS Management Console. You can also calculate the savings as the percentage difference between the duration for which the training job ran and the duration for which you were billed.

Regardless of how many times your Managed Spot Training jobs are interrupted, you are charged only once for the amount of data downloaded.
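The calculation can also be scripted from the job description. DescribeTrainingJob reports `TrainingTimeInSeconds` and `BillableTimeInSeconds`; the sketch below (with a hypothetical helper name) turns them into a savings percentage and assumes you have already fetched that description, for example with a boto3 SageMaker client.

```python
def spot_savings_percent(training_time_seconds, billable_time_seconds):
    """Savings = (time the job ran - time billed) / time the job ran, as a percentage."""
    if training_time_seconds <= 0:
        raise ValueError("training_time_seconds must be positive")
    return (1 - billable_time_seconds / training_time_seconds) * 100


# Usage with a job description returned by describe_training_job:
# desc = client.describe_training_job(TrainingJobName="my-spot-job")
# spot_savings_percent(desc["TrainingTimeInSeconds"], desc["BillableTimeInSeconds"])
```

For example, a job that ran for 3,600 seconds but was billed for 1,000 seconds saved about 72%.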

Q: Which instances can I use with Managed Spot Training?

Managed Spot Training can be used with all instances supported in Amazon SageMaker.

Q: Which AWS regions are supported with Managed Spot Training?

Managed Spot Training is supported in all AWS regions where Amazon SageMaker is currently available.

Q: Are there limits to the size of the dataset I can use for training?

There are no fixed limits to the size of the dataset you can use for training models with Amazon SageMaker.

Q: What data sources can I easily pull into Amazon SageMaker?

You can specify the Amazon S3 location of your training data as part of creating a training job.

Q: What algorithms does Amazon SageMaker use to generate models?

Amazon SageMaker includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, latent Dirichlet allocation, gradient boosted trees, sequence-to-sequence, time series forecasting, word2vec, and image classification. Amazon SageMaker also provides optimized Apache MXNet, TensorFlow, Chainer, and PyTorch containers. In addition, Amazon SageMaker supports custom training algorithms provided through a Docker image that adheres to the documented specification.

Q: What is Automatic Model Tuning?

Most machine learning algorithms expose a variety of parameters that control how the underlying algorithm operates. Those parameters are generally referred to as hyperparameters and their values affect the quality of the trained models. Automatic model tuning is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.

Q: What models can be tuned with Automatic Model Tuning?

You can run automatic model tuning in Amazon SageMaker on top of any algorithm as long as it’s scientifically feasible, including built-in SageMaker algorithms, deep neural networks, or arbitrary algorithms you bring to Amazon SageMaker in the form of Docker images.

Q: Can I use Automatic Model Tuning outside of Amazon SageMaker?

Not at this time. The best model tuning performance and experience is within Amazon SageMaker.

Q: What is the underlying tuning algorithm?

Currently, our algorithm for tuning hyperparameters is a customized implementation of Bayesian Optimization. It aims to optimize a customer-specified objective metric throughout the tuning process. Specifically, it checks the objective metric of completed training jobs and uses that knowledge to infer the hyperparameter combination for the next training job.
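To make the idea concrete, the toy below runs generic Bayesian optimization on a single hyperparameter in [0, 1]: a Gaussian-process surrogate is refit to the scores of completed trials, and the next value is chosen by expected improvement. It is a textbook illustration in plain Python, not SageMaker's customized implementation; the kernel and its length scale, the candidate grid, the trial count, and the stand-in objective are all assumptions.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel: nearby hyperparameter values get correlated scores."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def cholesky(m):
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / low[j][j]
    return low


def forward(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def backward(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Condition a zero-mean Gaussian process on the completed trials."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    return low, backward(low, forward(low, ys))


def predict(xs, low, alpha, x):
    """Posterior mean and standard deviation of the score at hyperparameter value x."""
    k_star = [rbf(x, a) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = forward(low, k_star)
    var = max(rbf(x, x) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimize(objective, n_trials=15):
    """Maximize objective(x) over x in [0, 1], evaluating one trial at a time."""
    candidates = [i / 100 for i in range(101)]
    xs = [0.0, 1.0]  # two initial trials
    ys = [objective(x) for x in xs]
    for _ in range(n_trials):
        low, alpha = fit_gp(xs, ys)
        best = max(ys)
        untried = [c for c in candidates if c not in xs]
        next_x = max(untried, key=lambda c: expected_improvement(*predict(xs, low, alpha, c), best))
        xs.append(next_x)
        ys.append(objective(next_x))  # analogous to running one training job and reading its metric
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

Running `bayesian_optimize(lambda x: -(x - 0.3) ** 2)` should return a value close to 0.3, where each call to the objective plays the role of one completed training job.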

Q: Will you recommend specific hyperparameters for tuning?

No. How certain hyperparameters impact model performance depends on various factors, and it is hard to say definitively that one hyperparameter is more important than the others and therefore needs to be tuned. For built-in algorithms within Amazon SageMaker, we do call out whether or not a hyperparameter is tunable.

Q: How long does a hyperparameter tuning job take?

The length of time for a hyperparameter tuning job depends on multiple factors including the size of the data, the underlying algorithm, and the values of the hyperparameters. Additionally, customers can choose the number of simultaneous training jobs and total number of training jobs. All these choices affect how long a hyperparameter tuning job can last. 
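The two counts appear in the tuning job configuration as `MaxNumberOfTrainingJobs` and `MaxParallelTrainingJobs`. The sketch below builds a `HyperParameterTuningJobConfig` dictionary for the CreateHyperParameterTuningJob API and adds a rough wall-clock estimate; the helper names, hyperparameter names, and ranges are hypothetical, and the estimate assumes every training job takes the same time and ignores startup overhead.

```python
import math


def build_tuning_job_config(max_jobs, max_parallel_jobs, objective_metric):
    """Tuning configuration with hypothetical 'learning_rate' and 'num_layers' ranges."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {"Type": "Maximize", "MetricName": objective_metric},
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel_jobs,
        },
        "ParameterRanges": {
            # Range bounds are passed as strings in this API.
            "ContinuousParameterRanges": [
                {"Name": "learning_rate", "MinValue": "0.0001", "MaxValue": "0.1",
                 "ScalingType": "Logarithmic"}
            ],
            "IntegerParameterRanges": [
                {"Name": "num_layers", "MinValue": "2", "MaxValue": "8", "ScalingType": "Linear"}
            ],
        },
    }


def rough_wall_clock_minutes(max_jobs, max_parallel_jobs, minutes_per_job):
    """Jobs run in about ceil(total / parallel) waves when every job takes equally long."""
    return math.ceil(max_jobs / max_parallel_jobs) * minutes_per_job
```

For instance, 20 training jobs run 4 at a time, at about 30 minutes each, would take roughly 150 minutes.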

Q: Can I optimize multiple objectives simultaneously like a model to be both fast and accurate?

Not at this time. Right now, you need to either specify a single objective metric to optimize, or change your algorithm code to emit a new metric that is a weighted average of two or more useful metrics and have the tuning process optimize toward that objective metric.
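A minimal sketch of the weighted-metric workaround: the training code combines accuracy with a latency-based score and prints the result, and a metric definition's regular expression lets the tuning job read it from the logs. The metric name, the 0.8/0.2 weights, the 100 ms latency budget, and the helper names are illustrative assumptions.

```python
import re

# Hypothetical metric definition; the tuning job would extract the value with this regex.
METRIC_DEFINITIONS = [{"Name": "combined_score", "Regex": r"combined_score=([0-9.\-]+)"}]


def combined_score(accuracy, latency_ms, accuracy_weight=0.8, latency_budget_ms=100.0):
    """Weighted average of accuracy and a speed score scaled to [0, 1]."""
    speed = max(0.0, 1.0 - latency_ms / latency_budget_ms)  # 1.0 = instant, 0.0 = at/over budget
    return accuracy_weight * accuracy + (1 - accuracy_weight) * speed


def emit_metric(accuracy, latency_ms):
    """Print the combined metric in the format the regex above expects."""
    line = f"combined_score={combined_score(accuracy, latency_ms):.4f}"
    print(line)
    return line
```

For example, `emit_metric(0.9, 50)` prints `combined_score=0.8200`, which the tuning job would then maximize as its single objective.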

Q: How much does Automatic Model Tuning cost?

There is no charge for a hyperparameter tuning job itself. You are charged for the training jobs that the hyperparameter tuning job launches, based on model training pricing.

Q: What is reinforcement learning?

Reinforcement learning is a machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.

Q: Can I train reinforcement learning models in Amazon SageMaker?

Yes, you can train reinforcement learning models in Amazon SageMaker in addition to supervised and unsupervised learning models.

Q: How is reinforcement learning different from supervised learning?

Both supervised learning and reinforcement learning learn a mapping between inputs and outputs. In supervised learning, the feedback provided to the agent is the correct set of actions for performing a task. Reinforcement learning instead uses delayed feedback: the agent optimizes reward signals to reach a long-term goal through a sequence of actions.
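Delayed feedback is usually formalized as a discounted return: the value of acting at a step is the sum of later rewards, each scaled by a discount factor gamma between 0 and 1. The short sketch below (hypothetical helper name) computes it for an episode in which the only reward arrives at the final step; gamma = 0.9 is an arbitrary choice.

```python
def returns_to_go(rewards, gamma=0.9):
    """rewards[t] is the reward received after the action at step t.

    The return for step t is rewards[t] + gamma * (return for step t + 1),
    computed backwards from the last step.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))
```

For rewards `[0, 0, 0, 1]`, the returns are about 0.729, 0.81, 0.9, and 1.0, so earlier actions still receive credit for the eventual reward.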

Q: When should I use reinforcement learning?

The goal of supervised learning techniques is to find the right answer based on patterns in the training data, and the goal of unsupervised learning techniques is to find similarities and differences between data points. In contrast, the goal of reinforcement learning (RL) techniques is to learn how to achieve a desired outcome even when it is not clear how to accomplish that outcome. As a result, RL is better suited to intelligent applications in which an agent makes autonomous decisions, such as robotics, autonomous vehicles, HVAC, industrial control, and more.

Q: What type of environments can I use for training reinforcement learning models?

Amazon SageMaker RL supports a number of different environments for training reinforcement learning models. You can use AWS services such as AWS RoboMaker, open source environments or custom environments developed using OpenAI Gym interfaces, or commercial simulation environments such as MATLAB and Simulink.
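As an illustration of the Gym-style interface, the sketch below defines a tiny custom environment with the classic `reset()` and `step(action)` methods (where `step` returns observation, reward, done, info) and trains a tabular Q-learning agent on it. It does not import Gym; the corridor environment, its reward, and the learning parameters are hypothetical, and newer Gymnasium releases return a five-element tuple from `step` instead.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor following the classic Gym reset()/step() convention.

    The agent starts in state 0; action 1 moves right and action 0 moves left.
    The only reward (1.0) comes from reaching the last state, so feedback is delayed.
    """

    n_states = 5
    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random tie-breaking."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(env.n_actions) if q[state][a] == best])
            next_state, reward, done, _ = env.step(action)
            # Bootstrap from the next state's best value unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q
```

After training, the greedy policy in every non-terminal state should be to move right, even though a reward is only received at the end of the corridor.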

Q: Do I need to write my own RL agent algorithms to train reinforcement learning models?

No. Amazon SageMaker RL includes RL toolkits such as Coach and Ray RLlib that offer implementations of RL agent algorithms such as DQN, PPO, A3C, and many more.

Q: Can I bring my own RL libraries and algorithm implementation and run in Amazon SageMaker RL?

Yes, you can bring your own RL libraries and algorithm implementations in Docker containers and run them in Amazon SageMaker RL.

Q: Can I do distributed rollouts using Amazon SageMaker RL?

Yes. You can even select a heterogeneous cluster where the training can run on a GPU instance and the simulations can run on multiple CPU instances.

Q: What is Amazon SageMaker Neo?

Amazon SageMaker Neo is a new capability that enables machine learning models to train once and run anywhere in the cloud and at the edge. SageMaker Neo automatically optimizes models built with popular deep learning frameworks for deployment on multiple hardware platforms. Optimized models run up to two times faster and consume less than a tenth of the resources of typical machine learning models.

Q: How do I get started with Amazon SageMaker Neo?

To get started with Amazon SageMaker Neo, log in to the Amazon SageMaker console, choose a trained model, follow the example to compile it, and deploy the resulting model onto your target hardware platform.
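The compilation step corresponds to the CreateCompilationJob API. The sketch below assembles its arguments; the helper name, job name, role ARN, S3 paths, the MXNet input name `data`, the input shape, and the `ml_c5` target are placeholders to adapt to your own model.

```python
import json


def build_compilation_job_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                                  framework="MXNET", target_device="ml_c5",
                                  input_name="data", input_shape=(1, 3, 224, 224)):
    """Keyword arguments for CreateCompilationJob (a Neo compile for one target)."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,  # the trained model artifact, e.g. a model.tar.gz
            # Neo expects the input name(s) and shape(s) as a JSON string.
            "DataInputConfig": json.dumps({input_name: list(input_shape)}),
            "Framework": framework,
        },
        "OutputConfig": {"S3OutputLocation": output_s3_uri, "TargetDevice": target_device},
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }
```

With a boto3 SageMaker client, the request would be submitted with `client.create_compilation_job(**request)`.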

Q: What are the major components of Amazon SageMaker Neo?

Amazon SageMaker Neo contains two major components: a compiler and a runtime. First, the Neo compiler reads models exported by different frameworks. It then converts the framework-specific functions and operations into a framework-agnostic intermediate representation. Next, it performs a series of optimizations. Then, the compiler generates binary code for the optimized operations and writes it to a shared object library. The compiler also saves the model definition and parameters into separate files. During execution, the Neo runtime loads the artifacts generated by the compiler (the model definition, the parameters, and the shared object library) to run the model.

Q: Do I need to use SageMaker to train my model in order to use Neo to convert the model?

No. You can train models elsewhere and use Neo to optimize them for SageMaker ML instances or Greengrass-supported devices.

Q: Which models does SageMaker Neo support?

Currently, SageMaker Neo supports the most popular deep learning models that power computer vision applications and the most popular decision tree models used in Amazon SageMaker today. Neo optimizes the performance of AlexNet, ResNet, VGG, Inception, MobileNet, SqueezeNet, and DenseNet models trained in MXNet and TensorFlow, and classification and random cut forest models trained in XGBoost.

Q: Which platforms does SageMaker Neo support?

Currently, Neo supports SageMaker ML.C5, ML.C4, ML.M5, ML.M4, ML.P3, and ML.P2 instances; AWS DeepLens, Raspberry Pi, and Jetson TX1 and TX2 devices; and Greengrass devices based on Intel® Atom and Intel® Xeon CPUs, ARM Cortex-A CPUs, and NVIDIA Maxwell and Pascal GPUs.

Q: Do I need to use a specific version of a framework that is supported on the target hardware?

No. Developers can run models using the SageMaker Neo container without dependencies on the framework.

Q: How much does it cost to use SageMaker Neo?

You pay for the use of the SageMaker ML instance that runs inference using SageMaker Neo.

Q: In which AWS regions is SageMaker Neo available?

Currently, SageMaker Neo is available in N. Virginia, Oregon, Ohio, Ireland, Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Mumbai), Asia Pacific (Hong Kong), Canada (Central), EU (Frankfurt), EU (London), EU (Paris), EU (Stockholm), South America (Sao Paulo), and US West (N. California) AWS regions.

Q: What is Amazon SageMaker model tracking?

Amazon SageMaker model tracking enables you to quickly find and evaluate the most relevant model training runs from potentially hundreds or thousands of your Amazon SageMaker model training jobs. SageMaker Search is available through both the AWS Management Console and the AWS SDK APIs for Amazon SageMaker.

Q: How can I organize and track my model training runs?

With the model tracking capabilities, you can search and organize your model training runs by any training job property of your choice, such as training job creation time, training dataset URI, hyperparameter values, or any other training job metadata. A flexible way to organize and group related training jobs is to label them with tags. Searching on tags lets you quickly find the model training runs associated with a specific business project, research lab, or data science team, helping you meaningfully categorize and catalog your model training runs.
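Tag-based searches can also be issued programmatically through the Search API. The sketch below builds a request that finds training jobs with a given tag and sorts them by a final metric; the helper name, tag key, tag value, and metric name are hypothetical, and the property paths (`Tags.<key>`, `Metrics.<name>`) should be checked against the Search API reference.

```python
def build_search_request(tag_key, tag_value, metric_name, max_results=20):
    """Find training jobs carrying a tag and sort them by a final metric, best first."""
    return {
        "Resource": "TrainingJob",
        "SearchExpression": {
            "Filters": [
                {"Name": f"Tags.{tag_key}", "Operator": "Equals", "Value": tag_value}
            ]
        },
        "SortBy": f"Metrics.{metric_name}",
        "SortOrder": "Descending",
        "MaxResults": max_results,
    }
```

With a boto3 SageMaker client, `client.search(**request)["Results"]` returns the matching training jobs.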

Q: How do I create a training run leaderboard using the model tracking capabilities?

The model training jobs are presented to you in the AWS Management Console in a tabular format, similar to a leaderboard, with all the hyperparameters and model training metrics in sortable columns. You can click a column header to rank the leaderboard by the objective performance metric of your choice. You can also quickly compare and rank model training runs based on performance metrics such as training loss and validation accuracy, and use the leaderboard to pick “winning” models to deploy into production environments.
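The same ranking can be reproduced outside the console from training job descriptions, which list final metrics under `FinalMetricDataList`. The helper below is a hypothetical sketch that assumes you have already collected those descriptions, for example from DescribeTrainingJob or Search results.

```python
def leaderboard(training_jobs, metric_name, higher_is_better=True):
    """Rank (job name, metric value) pairs; jobs without the metric are left out."""
    rows = []
    for job in training_jobs:
        metrics = {m["MetricName"]: m["Value"] for m in job.get("FinalMetricDataList", [])}
        if metric_name in metrics:
            rows.append((job["TrainingJobName"], metrics[metric_name]))
    return sorted(rows, key=lambda row: row[1], reverse=higher_is_better)
```

Pass `higher_is_better=False` for metrics such as training loss, where smaller values rank first.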

Q: How do I trace model or endpoint lineage?

Navigate to “Endpoints” in the AWS Management Console for Amazon SageMaker and choose the endpoint from the list of all your deployed endpoints. Then scroll down to “Endpoint Configuration Settings” on the chosen endpoint page to see all the model versions deployed at the endpoint. Next to each model version, you can see a direct link to the model training job that created the model in the first place.
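The first part of this lookup can also be scripted: an endpoint names its endpoint configuration, and the configuration lists the model behind each production variant. The sketch assumes `client` is a boto3 SageMaker client and uses a hypothetical helper name; it stops at model names and does not attempt the link back to the training job.

```python
def models_behind_endpoint(client, endpoint_name):
    """Return (variant name, model name) pairs for the models an endpoint serves."""
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    config = client.describe_endpoint_config(EndpointConfigName=endpoint["EndpointConfigName"])
    return [(v["VariantName"], v["ModelName"]) for v in config["ProductionVariants"]]
```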

Model Deployment

Q: Can I access the infrastructure that Amazon SageMaker runs on?

No. Amazon SageMaker operates the compute infrastructure on your behalf, allowing it to perform health checks, apply security patches, and do other routine maintenance. You can also deploy the model artifacts from training with custom inference code in your own hosting environment.

Q: How do I scale the size and performance of an Amazon SageMaker model once in production?

Amazon SageMaker hosting automatically scales to the performance needed for your application using Application Auto Scaling. In addition, you can manually change the instance number and type without incurring downtime through modifying the endpoint configuration.
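Auto scaling for an endpoint variant is configured through the Application Auto Scaling API. The sketch below builds the two requests typically involved, registering the variant's instance count as a scalable target and attaching a target-tracking policy on invocations per instance; the helper name, endpoint name, variant name, capacities, and target value are placeholders.

```python
def build_autoscaling_requests(endpoint_name, variant_name, min_instances, max_instances,
                               invocations_per_instance):
    """Requests for register_scalable_target and put_scaling_policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add or remove instances to keep invocations per instance near this value.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return register, policy
```

With `aas = boto3.client("application-autoscaling")`, you would call `aas.register_scalable_target(**register)` and then `aas.put_scaling_policy(**policy)`.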

Q: How do I monitor my Amazon SageMaker production environment?

Amazon SageMaker emits performance metrics to Amazon CloudWatch Metrics so you can track metrics, set alarms, and automatically react to changes in production traffic. In addition, Amazon SageMaker writes logs to Amazon CloudWatch Logs to let you monitor and troubleshoot your production environment.
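For example, an alarm on model latency can be defined with the CloudWatch PutMetricAlarm API. The sketch below builds such a request for the `ModelLatency` metric in the `AWS/SageMaker` namespace, which is reported in microseconds; the helper name, endpoint and variant names, threshold, and evaluation window are illustrative assumptions.

```python
def build_latency_alarm(endpoint_name, variant_name, threshold_ms):
    """PutMetricAlarm request that fires when average model latency stays above threshold_ms."""
    return {
        "AlarmName": f"{endpoint_name}-high-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,  # one-minute datapoints
        "EvaluationPeriods": 5,  # five consecutive breaching minutes
        "Threshold": threshold_ms * 1000.0,  # ModelLatency is reported in microseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

You would submit it with `boto3.client("cloudwatch").put_metric_alarm(**alarm)`.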

Q: What kinds of models can be hosted with Amazon SageMaker?

Amazon SageMaker can host any model that adheres to the documented specification for inference Docker images. This includes models created from Amazon SageMaker model artifacts and inference code.

Q: How many concurrent real-time API requests does Amazon SageMaker support?

Amazon SageMaker is designed to scale to a large number of transactions per second. The precise number varies based on the deployed model and the number and type of instances to which the model is deployed.
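Because throughput depends on the model and the instances, one common approach is to load-test a single instance and size the fleet from that measurement. The arithmetic below is a hypothetical sizing helper; the 30% headroom is an assumption, not a SageMaker recommendation.

```python
import math


def instances_needed(peak_rps, rps_per_instance, headroom=0.3):
    """Instances required to serve peak_rps with spare capacity, never fewer than one."""
    return max(1, math.ceil(peak_rps * (1 + headroom) / rps_per_instance))
```

For example, a measured 150 requests per second per instance and a peak of 1,000 requests per second gives 9 instances with 30% headroom.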

Q: What is Batch Transform?

Batch Transform enables you to run predictions on large or small batches of data. There is no need to break the data set into multiple chunks or to manage real-time endpoints. With a simple API, you can request predictions for a large number of data records and transform the data quickly and easily.
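A batch transform is started with the CreateTransformJob API. The sketch below builds its arguments for CSV input split by line; the helper name, job name, model name, S3 paths, content type, and instance settings are placeholders.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Keyword arguments for CreateTransformJob over CSV records stored in S3."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}},
            "ContentType": "text/csv",
            "SplitType": "Line",  # SageMaker splits files into records line by line
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": instance_count},
    }
```

With a boto3 SageMaker client, `client.create_transform_job(**request)` starts the job, and the predictions are written under the output path.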
