Using AWS Marketplace for machine learning workloads

Organizations of all sizes are realizing that machine learning (ML) is more than a nice-to-have capability. It’s becoming a necessary differentiator that has the potential to impact almost every aspect of the business. From back-office optimizations to business forecasting and risk reduction, ML is critical for companies looking to innovate and remain relevant. One way for organizations to drive ML capabilities is to enable technical teams, not just data scientists, to experiment with and deploy ML in their applications. AWS Marketplace makes it easy to find, test, buy, and deploy software that runs on AWS. AWS Marketplace has an ML category that includes ML models and algorithms that can be deployed with Amazon SageMaker.

In this blog post, I will walk through how to use AWS Marketplace to solve a machine learning problem. I’ll also demonstrate how to use a combination of AWS services and third-party software available in AWS Marketplace to achieve a solution.

How AWS Marketplace supports your ML workloads

1. Problem formulation

Mapping a business problem to a machine learning problem

A good ML project starts with a well-defined business objective. Often, a business objective maps to multiple ML problems. For example, a goal of improving an auto insurance claim processing workflow might require a chatbot solution. It could also require multiple models for license plate identification; car make, model, and year identification; and fraud detection. Once you have mapped your business objective to one or more ML problems, you can then solve each problem individually. Before you do so, it is important to set success criteria, so that you can evaluate how well your solution performed relative to your goals.

Defining success criteria

Define your success criteria before attempting to solve your problem. Once you have solved the problem, you can evaluate the quality of your model against the criteria.

For example, success criteria for a model to recognize the make of a car might cover the following information:

- The list of car makes the model must recognize.
- Sample car images on which the following model metrics must be above a specific threshold:
  - Recall of X %
  - Precision of Y %
  - Accuracy of Z %

The metrics listed above are just examples. Metrics, as well as thresholds for them, vary based on the type of problem you want to solve. For more information, see Evaluating ML models.
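To make the idea of success criteria concrete, here is a minimal sketch that computes precision, recall, and accuracy for one car make and compares them with thresholds. The function names, the sample labels, and the threshold values are all assumptions for illustration; your own criteria would replace X, Y, and Z with numbers agreed on with the business.

```python
def evaluate_make_classifier(y_true, y_pred, target_make):
    """Return precision and recall for target_make, plus overall accuracy."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target_make and p == target_make)
    fp = sum(1 for t, p in pairs if t != target_make and p == target_make)
    fn = sum(1 for t, p in pairs if t == target_make and p != target_make)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        # Of the images predicted as target_make, how many really were?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the images that really were target_make, how many were found?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all images (any make) classified correctly.
        "accuracy": correct / len(pairs),
    }


def meets_criteria(metrics, thresholds):
    """True only if every metric named in thresholds reaches its minimum."""
    return all(metrics[name] >= minimum for name, minimum in thresholds.items())


# Hypothetical labels for five sample images.
y_true = ["audi", "bmw", "audi", "ford", "audi"]
y_pred = ["audi", "audi", "audi", "ford", "bmw"]

metrics = evaluate_make_classifier(y_true, y_pred, target_make="audi")
passed = meets_criteria(metrics, {"precision": 0.6, "recall": 0.6, "accuracy": 0.6})
```

Writing the thresholds down as data before training starts keeps the evaluation honest: the same check can be run against a pre-built model and a custom model alike.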

2. Solving the ML problem

After mapping your business objectives to ML problems and defining success criteria, you are ready to solve your ML problem. You have two options:

- Use a pre-built ML model solution.
- Build your own custom ML model.

Using pre-built solutions to solve the problem

Using pre-built software allows you to experiment quickly with low upfront investment. If you are looking to launch an ML-enabled product, pre-built models and algorithms help you get to market faster. However, these models may be less customized for your business, industry, or use case. The best approach is to combine multiple solutions to achieve your desired business outcome.

Pre-built models often work best if you’re facing these challenges:

- Lack of data scientists in the organization.
- Lack of good quality training data for solving the problem. Many commercial datasets are expensive and you might be working on a problem for which datasets may not be available.
- Time and resources required for training a good quality model.
- Sheer number of ML problems in the backlog.
- The problem you are trying to solve is generic. Some examples of problems in this category are speech to text, text to speech, and optical character recognition (OCR).
- You want to do rapid prototyping to try an idea before making a large investment. ML projects can be time- and resource-intensive. Because pre-trained ML models support a pay-as-you-go (PAYG) pricing model, you can deploy a model to evaluate whether an idea is worth investing in.

The AWS Marketplace for machine learning provides pre-built ML solutions for both AI services and Amazon SageMaker compatible pre-trained ML models. It has over 210 pre-trained models from over 38 sellers. Since these are pre-trained ML models, you don’t have to find training data or spend time tuning models. You can stand up an Amazon SageMaker endpoint from these models, or you can perform a batch transform and start performing predictions almost instantaneously.
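As a rough illustration of standing up an endpoint from a subscribed model, the sketch below builds the three requests that the boto3 SageMaker client expects (`create_model`, `create_endpoint_config`, `create_endpoint`). The ARNs, the endpoint name, and the instance type are placeholders, not values from this post; the instance type in particular must be one that the listing you subscribed to supports.

```python
def build_endpoint_requests(model_package_arn, role_arn, name,
                            instance_type="ml.m5.large", instance_count=1):
    """Build boto3 request payloads for hosting a Marketplace model package."""
    return {
        "create_model": {
            "ModelName": name,
            "ExecutionRoleArn": role_arn,
            # A subscribed model is referenced by its model package ARN
            # instead of a container image and model artifact.
            "PrimaryContainer": {"ModelPackageName": model_package_arn},
            # Marketplace model packages run with network isolation enabled.
            "EnableNetworkIsolation": True,
        },
        "create_endpoint_config": {
            "EndpointConfigName": name,
            "ProductionVariants": [{
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InitialInstanceCount": instance_count,
                "InstanceType": instance_type,
            }],
        },
        "create_endpoint": {
            "EndpointName": name,
            "EndpointConfigName": name,
        },
    }


def deploy(requests):
    """Send the requests to SageMaker. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("sagemaker")
    client.create_model(**requests["create_model"])
    client.create_endpoint_config(**requests["create_endpoint_config"])
    client.create_endpoint(**requests["create_endpoint"])


# Placeholder identifiers for illustration only.
requests = build_endpoint_requests(
    model_package_arn="arn:aws:sagemaker:us-east-1:123456789012:model-package/example",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    name="marketplace-model-demo",
)
```

Calling `deploy(requests)` would create billable resources, so remember to delete the endpoint when the experiment is over. A batch transform follows the same pattern, with a transform job in place of the endpoint configuration and endpoint.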

This re:MARS session offers a walkthrough of a sample application that uses pre-trained models to solve a business problem. In the video, I show how to automate auto insurance claim processing by quickly creating a claim-filing chatbot.

AWS Marketplace for machine learning contains two types of solutions:

- Domain-independent ML solutions – Domain-independent ML models are offered by sellers in AWS Marketplace. These are different from the managed, domain-independent artificial intelligence (AI) services offered by AWS, such as Amazon Rekognition and Amazon Lex, which are created and hosted by AWS. Some examples of domain-independent, pre-trained ML models are Demisto Phishing email classifier and Mphasis DeepInsights Email Intent.

- Domain-specific ML solutions – AWS Marketplace also contains a broad category of pre-trained ML models that solve domain-specific problems. These include LexisNexis US Legal Taxonomy – Level 1 and News taggers by Novetta. AWS also offers some managed AI services such as Amazon Comprehend Medical, which is an AI service for a specific domain.

Building a custom machine learning model to solve the problem

You may come across problems for which there are no pre-trained ML models or services that directly solve your ML problem. This is when you plan an ML project to build a custom ML model.

Here is an overview of the steps to train a custom ML model:

- Identify the data to help you solve the problem.
- Define the task type, such as a supervised, unsupervised, or reinforcement learning task.
- Label the dataset if the task is supervised and the dataset is not already labeled.
- Perform feature engineering to create a dataset that can be fed to a model training job.
- Train an ML model.

A. Identify and source data

For your ML problem, you must identify data to help you solve the problem. Once you have identified your data requirements, you may find that you do not have the necessary data. When data is not available, many customers use open source, publicly available data. Here are a few options where you can find data:

- AWS Data Exchange is a service that makes it easy for millions of AWS customers to securely find, subscribe to, and use third-party data in the cloud. AWS Data Exchange has over 1,000 products from over 80 data providers, including over 500 free products. For a complete list of data products available from AWS Data Exchange, see the product catalog. Once subscribed to a data product, you can use the AWS Data Exchange API to load data directly into Amazon S3 and then analyze it with a wide variety of AWS analytics and machine learning services.
- AWS Marketplace for machine learning: If you can’t find the data you need, check for a pre-trained ML model in the AWS Marketplace for machine learning that can solve the problem.
- Registry of Open Data on AWS: The Registry of Open Data on AWS makes it easy to find publicly available datasets through AWS services. However, for commercial, practical applications, you need high-quality data specific to your problem. It is common practice for organizations to source the data from a third-party vendor.
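For the AWS Data Exchange option above, loading subscribed data into Amazon S3 is done with an export job. The sketch below builds the request for the boto3 `dataexchange` client's `create_job` call and then starts the job. The data set, revision, and asset IDs, the bucket name, and the key prefix are hypothetical placeholders; you would take the real IDs from a product you are subscribed to.

```python
def build_export_job_request(data_set_id, revision_id, asset_ids, bucket, prefix):
    """Build a create_job request that exports assets of one revision to S3."""
    return {
        "Type": "EXPORT_ASSETS_TO_S3",
        "Details": {
            "ExportAssetsToS3": {
                "DataSetId": data_set_id,
                "RevisionId": revision_id,
                # One destination per asset; the key layout is an assumption.
                "AssetDestinations": [
                    {"AssetId": asset_id, "Bucket": bucket, "Key": f"{prefix}/{asset_id}"}
                    for asset_id in asset_ids
                ],
            }
        },
    }


def run_export(request):
    """Create and start the export job. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("dataexchange")
    job = client.create_job(**request)
    client.start_job(JobId=job["Id"])
    return job["Id"]


# Placeholder identifiers for illustration only.
export_request = build_export_job_request(
    data_set_id="example-data-set-id",
    revision_id="example-revision-id",
    asset_ids=["asset-1", "asset-2"],
    bucket="example-training-data-bucket",
    prefix="data-exchange/raw",
)
```

Once the job finishes, the exported objects sit in your bucket like any other S3 data and can feed the labeling and feature engineering steps that follow.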

B. Defining task type: supervised vs unsupervised

Next, decide whether you would like to solve the problem using supervised learning, unsupervised learning, or a combination of both. Unsupervised ML tasks such as clustering do not require labeled datasets. Supervised ML tasks such as classification or regression require labeled datasets. If your dataset is not labeled, the next logical step is to label the dataset.
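The decision described here can be expressed as a small check. This is only a sketch: the record layout, the `label` field name, and the step names returned are assumptions, and it covers just the supervised and unsupervised cases discussed in this section.

```python
def label_coverage(records, label_key="label"):
    """Fraction of records that already carry a non-empty label."""
    if not records:
        raise ValueError("records must not be empty")
    labeled = sum(1 for record in records if record.get(label_key) not in (None, ""))
    return labeled / len(records)


def next_step(records, task_type, label_key="label"):
    """Decide whether data labeling is needed before feature engineering."""
    if task_type == "unsupervised":
        # Clustering and similar tasks do not need labels.
        return "feature_engineering"
    if task_type != "supervised":
        raise ValueError("this sketch only handles 'supervised' and 'unsupervised'")
    # Classification and regression need every training record labeled.
    if label_coverage(records, label_key) < 1.0:
        return "label_data"
    return "feature_engineering"


# Hypothetical records: one labeled image and one unlabeled image.
sample_records = [
    {"image": "cars/001.jpg", "label": "audi"},
    {"image": "cars/002.jpg", "label": None},
]
```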

C. Data labeling

Data scientists are expensive resources, and data labeling may not be the best use of their time. These services and sellers can help:

- Amazon SageMaker Ground Truth (SageMaker Ground Truth) helps you build highly accurate ML training datasets quickly. SageMaker Ground Truth can lower data labeling costs for larger datasets via automated data labeling. Automated data labeling uses ML to learn to label data automatically, helping you save resources and money. SageMaker Ground Truth also lets you set up a private workforce so that your work team can help label the dataset. However, this may not always be feasible due to the sheer amount of data that needs to be labeled, and labeling data might not be the best use of your team’s time. To scale your labeling efforts, SageMaker Ground Truth also enables you to use the public workforce by offering an integration with Amazon Mechanical Turk (MTurk).
- Data labeling services in AWS Marketplace: You can also consider offloading data labeling to sellers listed in AWS Marketplace. SageMaker Ground Truth integrates seamlessly with data labeling services such as Data Labeling Services by Vivetic, Figure Eight Data Labeling Platform, and Data Labeling Services by iMerit. For a complete list of software products, see AWS Marketplace Data Labeling Services.

For information on how to use these labeling services with SageMaker Ground Truth, see Use two additional data labeling services for your Amazon SageMaker Ground Truth labeling jobs.
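Whichever workforce you choose, a SageMaker Ground Truth labeling job reads its items from an input manifest: a JSON Lines file in which each line points at one object through a `source-ref` key. The following sketch writes such a manifest for a list of images. The bucket and object names are placeholders, and the helper name is an assumption.

```python
import json


def build_input_manifest(s3_uris):
    """Return JSON Lines manifest text with one source-ref entry per object."""
    lines = []
    for uri in s3_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
        lines.append(json.dumps({"source-ref": uri}))
    # One JSON object per line, with a trailing newline.
    return "\n".join(lines) + "\n"


# Placeholder object locations for illustration only.
manifest_text = build_input_manifest([
    "s3://example-bucket/cars/001.jpg",
    "s3://example-bucket/cars/002.jpg",
])
```

You would upload the resulting text to Amazon S3 and reference its location when creating the labeling job.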

D. Data analysis and feature engineering

The goal of this step is to prepare data that can be used to train an ML model. In this step, you assemble your data, cleanse it, and perform feature engineering. You may need to pull data from multiple sources, including data warehouses, databases, or file structures. Here are services and software to help with this step:

- AWS services such as Amazon EMR, AWS Glue, Amazon Athena, and Amazon Redshift Spectrum
- Third-party products listed in AWS Marketplace from categories such as Business Intelligence, Big Data, Databases and Caching, and Analytics.
- With Amazon SageMaker:
  - You can also use an Amazon SageMaker notebook instance backed by an EMR cluster to analyze the data during the preliminary analysis phase. For information on how to connect a notebook instance to an EMR cluster, see Build Amazon SageMaker notebooks backed by Spark in Amazon EMR.
  - To store and access large datasets from your Amazon SageMaker notebook instance, you can mount an Amazon Elastic File System (EFS) to it. You can also share ML scripts in this manner. For more information, see Mount an EFS file system to an Amazon SageMaker notebook (with lifecycle configurations).
- Pre-trained ML models can also be used for identifying high-quality data. Using Background Noise classifier, you can identify which audio files are good candidates for becoming part of your training dataset.
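The last point in the list, using a pre-trained model to screen data, might look like the sketch below. The scoring function is a stand-in: a real classifier would be invoked through its endpoint, and its response format depends on the listing, so both `predict_noise_score` and the 0.2 cutoff are assumptions.

```python
def select_training_candidates(audio_files, predict_noise_score, max_noise_score=0.2):
    """Keep only the files whose predicted background-noise score is low enough."""
    kept = []
    for path in audio_files:
        if predict_noise_score(path) <= max_noise_score:
            kept.append(path)
    return kept


# Stand-in scores in place of calls to a deployed noise classifier.
fake_scores = {"call-001.wav": 0.05, "call-002.wav": 0.70, "call-003.wav": 0.18}

candidates = select_training_candidates(
    audio_files=list(fake_scores),
    predict_noise_score=fake_scores.get,
)
```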

Sometimes, adding synthetic features can also help you improve the efficiency of your model. For example, say you work for a fashion ecommerce company and are preparing data for creating customer recommendations. You have information about the interests of visitors to your website. You also have pictures of customers wearing purchased accessories. By using a pre-trained model like Cortexica Fashion Localization (CPU), you can identify fashion objects worn by customers to create recommendations for additional purchases.
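A minimal sketch of that idea follows: detections from a fashion localization model are turned into 0/1 columns and appended to each customer record. The shape of the detections (a mapping from customer ID to a list of item labels), the field names, and the vocabulary are assumptions; the actual output format depends on the model you subscribe to.

```python
def add_detected_item_features(customers, detections, vocabulary):
    """Append one 0/1 feature per vocabulary item to each customer record."""
    enriched = []
    for customer in customers:
        found = set(detections.get(customer["customer_id"], []))
        row = dict(customer)  # copy so the input records stay unchanged
        for item in vocabulary:
            row[f"wears_{item}"] = 1 if item in found else 0
        enriched.append(row)
    return enriched


# Hypothetical inputs for illustration only.
customers = [
    {"customer_id": "c1", "interest": "outdoor"},
    {"customer_id": "c2", "interest": "formal"},
]
detections = {"c1": ["hat", "sunglasses"]}  # nothing detected for c2
vocabulary = ["hat", "scarf", "sunglasses"]

features = add_detected_item_features(customers, detections, vocabulary)
```

The new columns can then be fed to the recommendation model alongside the interest data you already have.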

E. Training a machine learning model

Now you are ready to train your model. You have multiple options:

- Deep Learning AMI and Deep Learning Containers
  - Deep Learning AMI: AWS Marketplace offers a Deep Learning AMI that embeds all major frameworks. AWS Deep Learning AMIs are built and optimized for building, training, debugging, and serving deep learning models in Amazon Elastic Compute Cloud (Amazon EC2). They support popular frameworks such as TensorFlow, MXNet, PyTorch, Chainer, Keras, and more. For more information about the AWS Deep Learning AMIs, see this documentation.
  - Deep Learning containers: AWS also offers deep learning containers you can use with EC2, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker. For more information about deep learning containers, see this documentation.

- Amazon SageMaker: Amazon SageMaker provides developers and data scientists the ability to build, train, and deploy ML models quickly.
  - Frameworks: Amazon SageMaker supports major frameworks such as TensorFlow, Apache MXNet, Scikit-learn, PyTorch, and others. For more information on how to use frameworks with Amazon SageMaker in a managed way, see Use Machine Learning Frameworks with Amazon SageMaker.
  - Built-in algorithms: Amazon SageMaker has high-performance built-in algorithms for popular use cases. There is also a GitHub repository containing multiple sample notebooks that demonstrate how to apply ML and deep learning in Amazon SageMaker.
  - Purpose-built algorithms in AWS Marketplace: It is important to use the right algorithm for solving your ML problem. Some algorithms may not yet be readily available in popular frameworks. You can solve this problem in two ways: (1) bring your own algorithm or (2) use algorithms compatible with Amazon SageMaker found in AWS Marketplace. AWS Marketplace currently contains over 65 purpose-built algorithms you can use to solve ML problems. Some examples of third-party algorithms from AWS Marketplace are H2O.ai H2O-3 Automl Algorithm, Implicit BPR, and Intel® DAAL Decision Forest Regression.
  - Links:
    - For a complete list of algorithms, see algorithms compatible with Amazon SageMaker in AWS Marketplace.
    - For a deep dive on using an algorithm from AWS Marketplace with Amazon SageMaker, watch my tech talk on: Accelerate Machine Learning Projects with Hundreds of Algorithms and Models in AWS Marketplace.
    - For sample notebooks on how to use third-party algorithms available in AWS Marketplace, see this GitHub repository.

- AMI and SaaS software products in AWS Marketplace: AWS Marketplace also offers multiple AMIs and software products for end-to-end data science projects. TIBCO Data Science for AWS, Domino, KNIME Analytics Platform for AWS, Dataiku DSS, H2O.ai Driverless AI, and Databricks Unified Analytics Platform are some software products that help. For a complete list, see the Machine Learning category of solutions available in AWS Marketplace.

Many sellers have also listed SaaS software in AWS Marketplace. Popular choices include Moogsoft AIOps (US & Canada), Vanillatech Labs – Time Series Forecasting, and Language Scoring API.
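Returning to the purpose-built algorithms option above, training with a subscribed algorithm differs from other SageMaker training jobs mainly in how the algorithm is referenced: by its algorithm ARN instead of a container image. The sketch below builds a request for the boto3 SageMaker client's `create_training_job` call. All ARNs, S3 locations, the channel name, and the instance settings are placeholders; the channel names and supported instance types are defined by the listing you subscribe to.

```python
def build_training_job_request(job_name, algorithm_arn, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge"):
    """Build a create_training_job request for a Marketplace algorithm."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            # A subscribed algorithm is referenced by its ARN.
            "AlgorithmName": algorithm_arn,
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [{
            "ChannelName": "training",  # assumed; check the listing's channels
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": train_s3_uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Marketplace algorithms run with network isolation enabled.
        "EnableNetworkIsolation": True,
    }


# Placeholder identifiers for illustration only.
training_request = build_training_job_request(
    job_name="marketplace-algorithm-demo",
    algorithm_arn="arn:aws:sagemaker:us-east-1:123456789012:algorithm/example",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-training-data-bucket/features/train/",
    output_s3_uri="s3://example-training-data-bucket/models/",
)
```

With credentials configured, the request could be submitted with `boto3.client("sagemaker").create_training_job(**training_request)`. The sample notebooks linked above show complete, seller-specific versions of this flow.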

AWS Marketplace purchasing features

AWS Marketplace offers several features to help you purchase and use software available in AWS Marketplace.

- Pay-as-you-go: AWS Marketplace allows you to perform rapid prototyping by offering a PAYG option for certain product types.
- Seller Private Offers: For high-volume purchases requiring special provisions such as volume discounts or a custom End User License Agreement (EULA), AWS Marketplace offers Seller Private Offers.
- Enterprise Contract: Using third-party software from AWS Marketplace requires your organization to accept the seller’s EULA associated with their software listing. Enterprise Contract for AWS Marketplace enables you to procure software quickly and take the friction out of contract negotiations. Large enterprise customers can purchase software from AWS Marketplace using a standardized contract template under which participating sellers have agreed to offer their software products.
- Private Marketplace: Private Marketplace lets you control which products your users can procure from AWS Marketplace. Built on top of AWS Marketplace, it enables your IT administrators to create a customized digital catalog of approved sellers and software products.
- Coupa integration: AWS Marketplace integrates with procurement system software such as Coupa. For more information, see Procurement system integration. For a 23-minute overview, watch Centralize Invoicing and Spend Management for Software Procurement.

Conclusion

In this post, I showed the process for solving ML problems. The process includes identifying the data you need, defining the task type, labeling your dataset, performing feature engineering, and training your ML model. I also showed options and documentation for AWS services as well as AWS Marketplace software products that support each step. Finally, I mentioned purchasing features for AWS Marketplace to help simplify your procurement process.

Next steps

If you are interested in listing your software product, see Register as an AWS Marketplace Seller. If you have questions about implementing the solution described in this post, please contact AWS Support.

About the author

Kanchan Waikar is a Senior Solutions Architect at Amazon Web Services with the AWS Marketplace for machine learning group. She has over 13 years of experience building, architecting, and managing NLP and software development projects. She has a master’s degree in computer science (data science major), and she enjoys helping customers build solutions backed by AI/ML-based AWS services and partner solutions.
