Q: What is Amazon SageMaker?
Amazon SageMaker is a fully-managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models.
Q: What can I do with Amazon SageMaker?
Amazon SageMaker enables developers and scientists to build machine learning models for use in intelligent, predictive apps.
Q: How do I get started with Amazon SageMaker?
To get started with Amazon SageMaker, log in to the Amazon SageMaker console, launch a notebook instance with an example notebook, and modify it to connect to your data sources. Then follow the example to build, train, and validate models, and deploy the resulting model into production with just a few inputs.
Q: Can I get a history of Amazon SageMaker API calls made on my account for security analysis and operational troubleshooting purposes?
Yes. To receive a history of Amazon SageMaker API calls made on your account, you simply turn on AWS CloudTrail in the AWS Management Console. The following API calls in Amazon SageMaker Runtime are *not* recorded and delivered: InvokeEndpoint.
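As a rough illustration, recent Amazon SageMaker API calls can be listed through CloudTrail's event lookup. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the function names and the 24-hour window are hypothetical choices for the example, not part of the service.

```python
from datetime import datetime, timedelta, timezone


def build_lookup_request(hours_back=24, now=None):
    """Build keyword arguments for CloudTrail's lookup_events call."""
    end = now or datetime.now(timezone.utc)
    return {
        # Filter on the event source so only SageMaker API calls are returned.
        "LookupAttributes": [
            {"AttributeKey": "EventSource", "AttributeValue": "sagemaker.amazonaws.com"}
        ],
        "StartTime": end - timedelta(hours=hours_back),
        "EndTime": end,
    }


def list_sagemaker_api_calls(hours_back=24):
    """Return (time, event name, user) tuples for recent SageMaker API calls."""
    import boto3  # imported lazily so the request builder works without AWS access

    cloudtrail = boto3.client("cloudtrail")
    paginator = cloudtrail.get_paginator("lookup_events")
    calls = []
    for page in paginator.paginate(**build_lookup_request(hours_back)):
        for event in page["Events"]:
            calls.append((event["EventTime"], event["EventName"], event.get("Username")))
    return calls
```

As noted above, InvokeEndpoint calls will not appear in the results.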
Q: What is the service availability of Amazon SageMaker?
Amazon SageMaker is designed for high availability. There are no maintenance windows or scheduled downtimes. Amazon SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each AWS region to provide fault tolerance in the event of a server failure or Availability Zone outage.
Q: What security measures does Amazon SageMaker have?
Amazon SageMaker ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest. Requests to the Amazon SageMaker API and console are made over a secure (SSL) connection. You pass AWS Identity and Access Management roles to Amazon SageMaker to provide permissions to access resources on your behalf for training and deployment. You can use encrypted S3 buckets for model artifacts and data, as well as pass a KMS key to Amazon SageMaker notebooks, training jobs, and endpoints, to encrypt the attached ML storage volume.
Q: How does Amazon SageMaker secure my code?
Amazon SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.
Q: How am I charged for Amazon SageMaker?
You pay for ML compute, storage, and data processing resources you use for hosting the notebook, training the model, performing predictions, and logging the outputs. Amazon SageMaker allows you to select the number and type of instance used for the hosted notebook, training, and model hosting. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments.
Q: What if I have my own notebook, training, or hosting environment?
Amazon SageMaker provides a full end-to-end workflow, but you can continue to use your existing tools with Amazon SageMaker. You can easily transfer the results of each stage in and out of Amazon SageMaker as your business requirements dictate.
Hosted Jupyter notebooks
Q: What types of notebooks are supported?
Currently, Jupyter notebooks are supported.
Q: How do you persist notebook files when I stop my workspace?
You can persist your notebook files on the attached ML storage volume. The ML storage volume will be detached when the notebook instance is shut down and reattached when the notebook instance is relaunched. Items stored in memory will not be persisted.
Q: How do I increase the available resources in my notebook?
You can modify the notebook instance and select a larger profile through the Amazon SageMaker console, after saving your files and data on the attached ML storage volume. The notebook instance will be restarted with greater available resources, with the same notebook files and installed libraries.
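The same change can be made through the AWS APIs instead of the console. The sketch below assumes a boto3 SageMaker client is passed in and that the instance must be stopped before its type can be changed; the function name and the example instance type are illustrative.

```python
def resize_notebook_instance(sagemaker, name, new_instance_type):
    """Stop a notebook instance, change its instance type, and start it again.

    `sagemaker` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    """
    stopped = sagemaker.get_waiter("notebook_instance_stopped")

    # The instance type can only be changed while the instance is stopped.
    sagemaker.stop_notebook_instance(NotebookInstanceName=name)
    stopped.wait(NotebookInstanceName=name)

    sagemaker.update_notebook_instance(
        NotebookInstanceName=name, InstanceType=new_instance_type
    )
    # Assumption: the update is complete once the instance reports "Stopped" again.
    stopped.wait(NotebookInstanceName=name)

    sagemaker.start_notebook_instance(NotebookInstanceName=name)
    sagemaker.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)
```

For example, `resize_notebook_instance(boto3.client("sagemaker"), "my-notebook", "ml.m5.xlarge")`, where `my-notebook` is a placeholder name.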
Q: How can I train a model from an Amazon SageMaker notebook?
After launching an example notebook, you can customize the notebook to fit your data source and schema, and execute the AWS APIs for creating a training job. The progress or completion of the training job is available through the Amazon SageMaker console or AWS APIs.
Q: Are there limits to the size of the dataset I can use for training?
There are no fixed limits to the size of the dataset you can use for training models with Amazon SageMaker.
Q: What data sources can I easily pull into Amazon SageMaker?
You can specify the Amazon S3 location of your training data as part of creating a training job.
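For illustration, a training job request that points at training data in Amazon S3 might be assembled as follows. This is a sketch under assumptions: the channel name, instance type, volume size, and runtime limit are example values, and the image URI, role ARN, and S3 locations are supplied by you.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1):
    """Build keyword arguments for the SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # built-in or custom algorithm image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,              # IAM role SageMaker assumes on your behalf
        "InputDataConfig": [{
            "ChannelName": "train",       # example channel name
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,    # Amazon S3 location of the training data
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,         # example size of the ML storage volume
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def start_training(request):
    """Submit the job and return its current status (requires boto3 and credentials)."""
    import boto3

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_training_job(**request)
    job = sagemaker.describe_training_job(TrainingJobName=request["TrainingJobName"])
    return job["TrainingJobStatus"]
```

Keeping the request builder separate from the API call makes it easy to inspect the request in a notebook cell before submitting it.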
Q: What algorithms does Amazon SageMaker use to generate models?
Amazon SageMaker includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, latent Dirichlet allocation, gradient boosted trees, sequence2sequence, time series forecasting, word2vec, and image classification. Amazon SageMaker also provides optimized Apache MXNet, TensorFlow, Chainer, and PyTorch containers. In addition, Amazon SageMaker supports your custom training algorithms provided through a Docker image adhering to the documented specification.
Q: What is Automatic Model Tuning?
Most machine learning algorithms expose a variety of parameters that control how the underlying algorithm operates. Those parameters are generally referred to as hyperparameters and their values affect the quality of the trained models. Automatic model tuning is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.
Q: What models can be tuned with Automatic Model Tuning?
You can run automatic model tuning in Amazon SageMaker on top of any algorithm as long as it’s scientifically feasible, including built-in SageMaker algorithms, deep neural networks, or arbitrary algorithms you bring to Amazon SageMaker in the form of Docker images.
Q: Can I use Automatic Model Tuning outside of Amazon SageMaker?
Not at this time. The best model tuning performance and experience is within Amazon SageMaker.
Q: What is the underlying tuning algorithm?
Currently, our algorithm for tuning hyperparameters is a customized implementation of Bayesian Optimization. It aims to optimize a customer-specified objective metric throughout the tuning process. Specifically, it checks the objective metric of completed training jobs and uses that knowledge to infer the hyperparameter combination for the next training job.
Q: Will you recommend specific hyperparameters for tuning?
No. How certain hyperparameters impact the model performance depends on various factors and it is hard to definitively say one hyperparameter is more important than the others and thus needs to be tuned. For built-in algorithms within Amazon SageMaker, we do call out whether or not a hyperparameter is tunable.
Q: How long does a hyperparameter tuning job take?
The length of time for a hyperparameter tuning job depends on multiple factors including the size of the data, the underlying algorithm, and the values of the hyperparameters. Additionally, customers can choose the number of simultaneous training jobs and total number of training jobs. All these choices affect how long a hyperparameter tuning job can last.
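To make the effect of these choices concrete, the sketch below builds a tuning configuration with limits on total and simultaneous training jobs, and computes how many sequential rounds of jobs those limits imply. The hyperparameter name `learning_rate` and its range are hypothetical, and the round count is only a rough guide, since individual jobs vary in length.

```python
import math


def build_tuning_config(metric_name, max_jobs, max_parallel_jobs, maximize=True):
    """Build a HyperParameterTuningJobConfig for create_hyper_parameter_tuning_job."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize" if maximize else "Minimize",
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,           # total training jobs
            "MaxParallelTrainingJobs": max_parallel_jobs,  # simultaneous training jobs
        },
        "ParameterRanges": {
            # Hypothetical hyperparameter; range bounds are passed as strings.
            "ContinuousParameterRanges": [
                {"Name": "learning_rate", "MinValue": "0.001", "MaxValue": "0.1"}
            ],
        },
    }


def sequential_rounds(max_jobs, max_parallel_jobs):
    """Number of back-to-back batches needed to run all training jobs."""
    return math.ceil(max_jobs / max_parallel_jobs)
```

With 20 total jobs and 3 running at a time, `sequential_rounds(20, 3)` gives 7 rounds, so the tuning job lasts roughly seven times as long as a typical training job. Raising the parallelism shortens the tuning job but gives the tuning algorithm fewer completed results to learn from before it picks the next hyperparameters.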
Q: Can I optimize multiple objectives simultaneously like a model to be both fast and accurate?
Not at this time. You currently need to specify a single objective metric to optimize. Alternatively, you can change your algorithm code to emit a new metric that is a weighted average of two or more useful metrics, and have the tuning process optimize toward that combined metric.
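One way to build such a combined metric is sketched below. The weights, the latency budget, the metric name `combined_objective`, and the log format are all assumptions chosen for illustration; your algorithm code would compute its own inputs and log the value in whatever format your metric regex expects.

```python
import re


def combined_objective(accuracy, latency_ms, accuracy_weight=0.7,
                       latency_budget_ms=100.0):
    """Weighted average of accuracy and a speed score, both on a 0-to-1 scale."""
    # Map latency to a score: 1.0 is instantaneous, 0.0 is at or over the budget.
    speed_score = max(0.0, 1.0 - latency_ms / latency_budget_ms)
    return accuracy_weight * accuracy + (1.0 - accuracy_weight) * speed_score


def format_metric_line(value):
    """Log line the training code would print so the metric can be picked up."""
    return f"combined_objective={value:.4f}"


# Hypothetical metric definition whose regex captures the value from the log line.
METRIC_DEFINITION = {
    "Name": "combined_objective",
    "Regex": r"combined_objective=([0-9.]+)",
}


def extract_metric(log_line):
    """Apply the metric regex to a log line, as a local sanity check."""
    match = re.search(METRIC_DEFINITION["Regex"], log_line)
    return float(match.group(1)) if match else None
```

The weights encode how much accuracy you are willing to trade for speed, so it is worth checking a few known model results against the formula before starting a tuning job.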
Q: Can I access the infrastructure that Amazon SageMaker runs on?
No. Amazon SageMaker operates the compute infrastructure on your behalf, allowing it to perform health checks, apply security patches, and do other routine maintenance. You can also deploy the model artifacts from training with custom inference code in your own hosting environment.
Q: How do I scale the size and performance of an Amazon SageMaker model once in production?
Amazon SageMaker hosting automatically scales to the performance needed for your application using Application Auto Scaling. In addition, you can manually change the instance number and type without incurring downtime through modifying the endpoint configuration.
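As an illustration, a scaling policy for an endpoint variant might be registered with Application Auto Scaling as sketched below. The capacity limits, target value, and policy name are example choices, and the endpoint and variant names are placeholders you supply.

```python
def build_scaling_requests(endpoint_name, variant_name, min_instances,
                           max_instances, invocations_per_instance):
    """Build requests for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-{variant_name}-target-tracking",  # example name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Add or remove instances to hold this many invocations per instance.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
    return target, policy


def apply_scaling(target, policy):
    """Send both requests (requires boto3 and AWS credentials)."""
    import boto3

    autoscaling = boto3.client("application-autoscaling")
    autoscaling.register_scalable_target(**target)
    autoscaling.put_scaling_policy(**policy)
```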
Q: How do I monitor my Amazon SageMaker production environment?
Amazon SageMaker emits performance metrics to Amazon CloudWatch Metrics so you can track metrics, set alarms, and automatically react to changes in production traffic. In addition, Amazon SageMaker writes logs to Amazon CloudWatch Logs to let you monitor and troubleshoot your production environment.
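For example, an alarm on endpoint latency could be defined along these lines. This is a sketch that assumes the `ModelLatency` metric in the `AWS/SageMaker` namespace, reported in microseconds; the alarm name, threshold, and evaluation window are illustrative.

```python
def build_latency_alarm(endpoint_name, variant_name, threshold_ms,
                        evaluation_periods=3):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-high-latency",  # example name
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,                          # evaluate one-minute averages
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_ms * 1000.0,    # convert milliseconds to microseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(alarm):
    """Create the alarm (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**alarm)
```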
Q: What kinds of models can be hosted with Amazon SageMaker?
Amazon SageMaker can host any model that adheres to the documented specification for inference Docker images. This includes models created from Amazon SageMaker model artifacts and inference code.
Q: How many concurrent real-time API requests does Amazon SageMaker support?
Amazon SageMaker is designed to scale to a large number of transactions per second. The precise number varies based on the deployed model and the number and type of instances to which the model is deployed.
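A single real-time request can be sketched as follows. The example assumes the deployed model accepts CSV input and returns text, which depends entirely on your inference container; the helper names are hypothetical.

```python
def rows_to_csv(rows):
    """Serialize rows of feature values as CSV, one record per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def predict(runtime, endpoint_name, rows):
    """Send rows to an endpoint and return the raw response text.

    `runtime` is a boto3 client created with boto3.client("sagemaker-runtime").
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the model accepts CSV input
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")
```

Each call is one request against the endpoint, so the throughput you observe depends on the model and on the number and type of instances behind it.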
Q: What is Batch Transform?
Batch Transform enables you to run predictions on large or small batches of data. There is no need to break the dataset into multiple chunks or manage real-time endpoints. With a simple API, you can request predictions for a large number of data records and transform the data quickly and easily.
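A batch transform request might look like the sketch below. It assumes line-delimited CSV input and a model that already exists in Amazon SageMaker; the content type, split type, and instance settings are example values.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the SageMaker create_transform_job call."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # name of an existing SageMaker model
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
            }},
            "ContentType": "text/csv",  # assumption: CSV records
            "SplitType": "Line",        # treat each line as one record
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def start_transform(request):
    """Submit the job (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("sagemaker").create_transform_job(**request)
```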