Get Started with the Implementation Guide

6 Steps  |  30 Minutes


Q: What is Amazon Machine Learning (Amazon ML)?

Amazon ML is a service that lets you build predictive applications such as fraud detection, demand forecasting, and click prediction. Amazon ML uses machine learning algorithms to find patterns in your existing data and then applies those patterns to make predictions on new data as it becomes available. The AWS Management Console and API provide data and model visualization tools, as well as wizards that guide you through creating machine learning models, measuring their quality, and fine-tuning the predictions to match your application's requirements. Once a model is created, your application can get predictions through a simple API, without implementing custom prediction code or managing any infrastructure. Amazon ML is highly scalable: it can generate billions of predictions and serve them in real time at high throughput. There is no setup cost and you pay as you go, so you can start small and scale as your application grows.
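
As a rough illustration of that prediction API, the sketch below requests one real-time prediction through the AWS SDK for Python (boto3). It assumes the model already has a real-time endpoint. The model ID, endpoint URL, and record attributes are hypothetical, and the client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict, Mapping


def predict_one(client: Any, model_id: str, endpoint_url: str,
                record: Mapping[str, Any]) -> Dict[str, Any]:
    """Request a single real-time prediction from an Amazon ML model.

    `client` is expected to behave like boto3.client("machinelearning").
    """
    # The Predict API takes the record as a map of attribute names to
    # string values, so every value is converted with str().
    string_record = {name: str(value) for name, value in record.items()}
    response = client.predict(
        MLModelId=model_id,
        Record=string_record,
        PredictEndpoint=endpoint_url,
    )
    # A binary model's result includes fields such as predictedLabel and
    # predictedScores; a regression model returns predictedValue instead.
    return response["Prediction"]
```

With real credentials you would pass `boto3.client("machinelearning")` as `client` and read the endpoint URL from the `EndpointInfo` field returned by `get_ml_model(MLModelId=...)`.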

Q: What are some use cases for Amazon ML?

You can use Amazon ML to create a wide variety of predictive applications. For example, you can build applications that flag suspicious transactions, detect fraudulent orders, forecast demand, personalize content, predict user activity, filter reviews, monitor social media, analyze free text, and recommend items.

Q: What security measures does Amazon ML have?

Amazon ML encrypts ML models and other system artifacts both in transit and at rest. Requests to the Amazon ML API and console are made over a secure (SSL) connection. You can use AWS Identity and Access Management (IAM) to control which IAM users can perform specific Amazon ML actions on specific resources.
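
For example, a policy that lets an application call only the real-time Predict action on a single model might look like the dictionary built below. This is a sketch: the region, account ID, and model ID are placeholders, and you should confirm the action name and ARN format against the IAM section of the Amazon Machine Learning Developer Guide before attaching such a policy.

```python
import json
from typing import Any, Dict


def predict_only_policy(region: str, account_id: str, model_id: str) -> Dict[str, Any]:
    """Build an IAM policy document that allows machinelearning:Predict
    on exactly one Amazon ML model and grants nothing else."""
    model_arn = f"arn:aws:machinelearning:{region}:{account_id}:mlmodel/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["machinelearning:Predict"],
                "Resource": [model_arn],
            }
        ],
    }


# Placeholder values; substitute your own region, account ID, and model ID.
print(json.dumps(predict_only_policy("us-east-1", "123456789012", "ml-EXAMPLE"), indent=2))
```

Scoping the statement to one model ARN, rather than `*`, keeps a leaked credential from being used against other models in the account.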

Q: Where can I store my data with Amazon ML?

Amazon ML can read your data from three data stores: (a) one or more files in Amazon S3, as in this Project example; (b) the results of an Amazon Redshift query; or (c) the results of an Amazon Relational Database Service (Amazon RDS) query executed against a database running the MySQL engine. Data from other products can usually be exported as CSV files to Amazon S3, which makes it accessible to Amazon ML. For detailed instructions on configuring the permissions that let Amazon ML access the supported data stores, see the Amazon Machine Learning Developer Guide.
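
The S3 route can be sketched with boto3: serialize exported records as CSV, upload the file, and register an Amazon ML data source that points at it and at a schema file. The bucket, keys, column names, and data source ID are hypothetical, the schema file is assumed to exist already, and the clients are passed in so the functions can be tested without AWS.

```python
import csv
import io
from typing import Any, Iterable, List, Mapping


def rows_to_csv(rows: Iterable[Mapping[str, Any]], columns: List[str]) -> str:
    """Serialize exported records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def upload_csv(s3_client: Any, bucket: str, key: str, text: str) -> str:
    """Upload CSV text to S3 (e.g. boto3.client("s3")) and return its URI."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return f"s3://{bucket}/{key}"


def create_s3_datasource(ml_client: Any, datasource_id: str,
                         data_uri: str, schema_uri: str) -> str:
    """Register an Amazon ML data source backed by a CSV file in S3."""
    response = ml_client.create_data_source_from_s3(
        DataSourceId=datasource_id,
        DataSourceName=datasource_id,
        DataSpec={
            "DataLocationS3": data_uri,
            "DataSchemaLocationS3": schema_uri,
        },
        # Statistics must be computed for a data source used in training.
        ComputeStatistics=True,
    )
    return response["DataSourceId"]
```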

Q: I want to use this Project example with my own data. Are there limits to the size of the dataset I can use for training?

Amazon ML can train models on datasets up to 100 GB in size.
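
Before training on your own data, you could total the size of the files under an S3 prefix and compare it with that limit. This sketch assumes the data lives under a single bucket prefix and reads 100 GB as 100 * 1024**3 bytes; the unit interpretation is an assumption, so leave some headroom. The S3 client is passed in (for example `boto3.client("s3")`).

```python
from typing import Any

# The 100 GB limit stated above, read here as 100 * 1024**3 bytes
# (an assumption about the unit).
TRAINING_LIMIT_BYTES = 100 * 1024 ** 3


def s3_prefix_size(s3_client: Any, bucket: str, prefix: str) -> int:
    """Sum the sizes, in bytes, of all objects under an S3 prefix."""
    total = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


def fits_training_limit(total_bytes: int) -> bool:
    """Return True if a dataset of this size is within the limit."""
    return total_bytes <= TRAINING_LIMIT_BYTES
```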

Q: How do I tune my model if it isn’t giving the results I want?

The best way to improve a model’s quality is to train it on more, and higher-quality, data. Adding more observations, adding new types of information about each observation, and transforming your data to optimize the learning process are all effective ways to improve the model’s predictive accuracy.

Amazon ML also provides several parameters for tuning the learning process: (a) the target size of the model, (b) the number of passes to make over the data, and (c) the type and amount of regularization to apply to the model.

Finally, consider how your application interprets the predictions your model generates, so that they align with your business goals. For binary classification models, Amazon ML lets you adjust the score cut-off, the threshold that separates positive from negative predictions, so you can make an informed trade-off between the different kinds of mistakes a trained model can make. For example, some applications tolerate false positives well but cannot afford false negatives; lowering the cut-off in the Amazon ML console reduces false negatives at the cost of more false positives.
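
The sketch below illustrates both levers. `sgd_parameters` builds the string-valued map that boto3's `create_ml_model` accepts through its `Parameters` argument, and `highest_cutoff_within_fn_budget` picks a score cut-off for an application that can tolerate only a fixed number of false negatives. The helper names, scores, labels, and budget are hypothetical, and the sketch assumes a score at or above the cut-off counts as a positive prediction.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def sgd_parameters(max_model_bytes: int, max_passes: int,
                   l2_amount: float) -> Dict[str, str]:
    """Training parameters for create_ml_model(Parameters=...).

    Amazon ML expects every value in this map to be a string.
    """
    return {
        "sgd.maxMLModelSizeInBytes": str(max_model_bytes),  # target model size
        "sgd.maxPasses": str(max_passes),                    # passes over the data
        "sgd.l2RegularizationAmount": str(l2_amount),        # L2 regularization
    }


def error_counts(scores: Sequence[float], labels: Sequence[int],
                 cutoff: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) at a given cut-off."""
    false_pos = false_neg = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 0:
            false_pos += 1
        elif predicted == 0 and label == 1:
            false_neg += 1
    return false_pos, false_neg


def highest_cutoff_within_fn_budget(scores: Sequence[float], labels: Sequence[int],
                                    max_false_negatives: int,
                                    candidates: Iterable[float]) -> Optional[float]:
    """Highest candidate cut-off whose false negatives stay within budget.

    False negatives can only grow as the cut-off rises, while false
    positives can only shrink, so the highest qualifying cut-off is the
    one with the fewest false positives. Returns None if none qualifies.
    """
    best = None
    for cutoff in sorted(candidates):
        _, false_neg = error_counts(scores, labels, cutoff)
        if false_neg <= max_false_negatives:
            best = cutoff
    return best
```

You could then apply the chosen value with `update_ml_model(MLModelId=..., ScoreThreshold=cutoff)`. Searching from the bottom up keeps the code simple; since false negatives rise monotonically with the cut-off, a binary search would also work for long candidate lists.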

Q: What can I do with the predictive model I build with Amazon ML?

Once you have generated predictions, there are several ways to use the results. For example, you can load them into a spreadsheet to sort and filter by prediction score. You can also load them into a database such as Amazon RDS or Amazon Redshift to generate lists of qualified segments. Additionally, you can load the prediction scores into Amazon DynamoDB, a NoSQL database, so that an application can look them up in real time.
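
As a sketch of two of these options, the code below parses batch prediction output, ranks records by score to produce a segment list, and writes the scores to a DynamoDB table for real-time lookup. It assumes the output CSV carries a record ID column (hypothetically named `customer_id`) next to a `score` column; check the header of your actual batch prediction file. The table is passed in so the loader can be tested without AWS; in practice it would be `boto3.resource("dynamodb").Table(...)`.

```python
import csv
import io
from decimal import Decimal
from typing import Any, List, Tuple


def parse_scores(csv_text: str, id_column: str,
                 score_column: str = "score") -> List[Tuple[str, Decimal]]:
    """Read (record_id, score) pairs from batch prediction CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DynamoDB rejects Python floats, so scores are kept as Decimal.
    return [(row[id_column], Decimal(row[score_column])) for row in reader]


def top_segment(scored: List[Tuple[str, Decimal]], cutoff: Decimal) -> List[str]:
    """IDs with score >= cutoff, highest score first (a spreadsheet-style sort and filter)."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, score in ranked if score >= cutoff]


def load_scores(table: Any, scored: List[Tuple[str, Decimal]], key_name: str) -> int:
    """Write scores to a DynamoDB table keyed by record ID; return the count."""
    # batch_writer buffers put requests and sends them in batches.
    with table.batch_writer() as batch:
        for item_id, score in scored:
            batch.put_item(Item={key_name: item_id, "score": score})
    return len(scored)
```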
