
Overview
Cleanlab builds AI solutions to assess data quality in messy real-world applications. Mislabeled data is common in classification tasks, so we invented Confident Learning algorithms that automatically detect label errors in your dataset.
Label Inspector runs these algorithms to estimate which examples are likely mislabeled in any classification dataset. Simply provide the data (labels + features) for a classification task, and state-of-the-art ML models will be trained to score the quality of your labels and flag which ones are likely incorrect.
Label Inspector can identify mislabeled examples in any standard multi-class classification dataset (with text, numeric, or categorical features; missing values are allowed). It returns a CSV file with a row for each example in your dataset, stating whether the example appears mislabeled, how likely its label is to be correct, and an alternative suggested label.
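For intuition on how Confident Learning flags label errors, here is a minimal sketch using Cleanlab's open-source cleanlab package on a toy dataset. It illustrates the underlying idea only; Label Inspector itself runs this kind of analysis for you as a managed SageMaker algorithm, and the classifier and data below are placeholders.

```python
# Minimal Confident Learning sketch with the open-source cleanlab package (illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

# Toy data standing in for your (features, labels); real usage would load your own dataset.
X, labels = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)

# Out-of-sample predicted class probabilities from any classifier, via cross-validation.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels, cv=5, method="predict_proba"
)

# Indices of examples most likely to be mislabeled, most suspicious first.
issue_indices = find_label_issues(
    labels=labels, pred_probs=pred_probs, return_indices_ranked_by="self_confidence"
)
print(f"{len(issue_indices)} examples flagged as likely mislabeled")
```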
Documentation and examples: https://github.com/cleanlab/aws-marketplace/
Highlights
- Label Inspector works for any standard multi-class classification dataset (including features that are: text, numeric, or categorical — with missing values allowed). It trains state-of-the-art ML models to automatically detect which examples are mislabeled.
- Documentation and example usage notebooks for the latest version are available here: https://github.com/cleanlab/aws-marketplace/
- Label Inspector auto-trains a robust ML model to identify potential label errors in your dataset. After training completes, you can deploy the trained model to classify any new data you receive. If the new data has an accompanying labels column, Label Inspector will also identify potential label errors in it (see the sketch after this list).
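The following is a hedged sketch of that train-then-deploy workflow using the SageMaker Python SDK. The algorithm ARN, S3 URIs, and the "training" channel name are placeholders and assumptions; the GitHub repository linked above contains the authoritative example notebooks.

```python
# Sketch of training Label Inspector and running batch inference via the SageMaker Python SDK.
# The algorithm ARN, S3 URIs, and channel name below are placeholders, not real values.
import sagemaker
from sagemaker.algorithm import AlgorithmEstimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # or any IAM role ARN with SageMaker permissions

estimator = AlgorithmEstimator(
    algorithm_arn="arn:aws:sagemaker:<region>:<account>:algorithm/<label-inspector>",  # placeholder
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",  # recommended training instance
    sagemaker_session=session,
)

# Train on a CSV whose first column holds the labels (see Inputs below).
estimator.fit({"training": "s3://<your-bucket>/train.csv"})  # channel name assumed

# Deploy the trained model for batch inference on new data (labels column optional).
transformer = estimator.transformer(instance_count=1, instance_type="ml.m5.xlarge")
transformer.transform(
    "s3://<your-bucket>/new_data.csv", content_type="text/csv", split_type="Line"
)
transformer.wait()
```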
Details
Pricing
| Dimension | Description | Cost |
|---|---|---|
| ml.m5.xlarge Inference (Batch), recommended | Model inference on the ml.m5.xlarge instance type, batch mode | $5.00/host/hour |
| ml.m5.xlarge Training, recommended | Algorithm training on the ml.m5.xlarge instance type | $10.00/host/hour |
| ml.p2.xlarge Inference (Batch) | Model inference on the ml.p2.xlarge instance type, batch mode | $10.00/host/hour |
| ml.m4.4xlarge Inference (Batch) | Model inference on the ml.m4.4xlarge instance type, batch mode | $10.00/host/hour |
| ml.m5.4xlarge Inference (Batch) | Model inference on the ml.m5.4xlarge instance type, batch mode | $10.00/host/hour |
| ml.m5.12xlarge Inference (Batch) | Model inference on the ml.m5.12xlarge instance type, batch mode | $10.00/host/hour |
| ml.m4.16xlarge Inference (Batch) | Model inference on the ml.m4.16xlarge instance type, batch mode | $10.00/host/hour |
| ml.p2.16xlarge Inference (Batch) | Model inference on the ml.p2.16xlarge instance type, batch mode | $10.00/host/hour |
| ml.m5.2xlarge Inference (Batch) | Model inference on the ml.m5.2xlarge instance type, batch mode | $10.00/host/hour |
| ml.p3.16xlarge Inference (Batch) | Model inference on the ml.p3.16xlarge instance type, batch mode | $10.00/host/hour |
Vendor refund policy
We do not currently support refunds, but you can cancel your subscription to the service at any time.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker algorithm
An Amazon SageMaker algorithm is a machine learning model that requires your training data to make predictions. Use the included training algorithm to generate your unique model artifact. Then deploy the model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Automatically detect potential label errors in your tabular dataset. Deploy the model and find label errors in your test dataset as well.
Additional details
Inputs
- Summary
Your data should be in a CSV file whose first column contains the class labels; the remaining columns are treated as predictive features. The first line of the CSV file should be a header row containing the column names.
Ensure that the labels are categorical strings (discrete integers are also fine, but continuous numbers are not), as only binary and multi-class classification datasets are supported. The other columns of the data table may contain numeric, categorical, or text (arbitrary string) values; a small sketch of this layout follows the list below.
- Input MIME type
- text/csv
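As a concrete illustration of that layout, here is a small sketch that writes a CSV in the expected format; the column names and values are hypothetical.

```python
# Hypothetical example of preparing a training CSV: header row, class labels in the
# first column, and numeric / categorical / text feature columns after it.
import pandas as pd

df = pd.DataFrame({
    "label":  ["cat", "dog", "dog", "cat"],                        # categorical string labels
    "weight": [4.2, 11.5, None, 3.9],                              # numeric, missing value allowed
    "color":  ["black", "brown", "brown", "white"],                # categorical feature
    "notes":  ["short-haired", "barks a lot", "rescue", "indoor"]  # free-text feature
})
df.to_csv("train.csv", index=False)  # upload this file to S3 for the training job
```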
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| Labels and training features | The input data must contain each example's label in the first column and the feature values for this example in the other columns. Each row in the input file represents a single example. We will automatically train ML models to predict the label based on the feature values and also run Confident Learning algorithms to estimate label quality. Each column (i.e., feature) must be either numeric, categorical, or text (arbitrary string). Data with multiple text columns and missing values are supported. | Type: FreeText | Yes |
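Once a batch transform completes, the output CSV described in the Overview (mislabeled flag, label quality, suggested label) can be inspected like any other table. The column names below are assumptions for illustration only; consult the example notebooks in the GitHub repository for the actual output schema.

```python
# Hedged sketch of reviewing batch-transform output; column names are assumed, not documented here.
import pandas as pd

# SageMaker batch transform writes "<input filename>.out" to the configured output S3 prefix.
results = pd.read_csv("new_data.csv.out")

# Hypothetical columns: is_label_issue, label_quality_score, suggested_label.
flagged = results.sort_values("label_quality_score").head(20)
print(flagged[["is_label_issue", "label_quality_score", "suggested_label"]])
```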
Support
Vendor support
For questions or support, please email support@cleanlab.ai. Free trials and subscription plans are available; email us for more details.
Your email subject line must state that you are using Label Inspector in AWS Marketplace.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
A decent Large Language Model, if you are keen on tracking your responses on a score basis
CleanLab: Best ML Modules Optimizer
Powerful label-cleaning with a slight learning curve
What works well:
- Seamless pandas integration. Working directly on DataFrames makes it trivial to plug Cleanlab into existing preprocessing pipelines.
- Clear, example-driven docs. The step-by-step tutorials helped me get up and running in under an hour.
What could be better:
- Performance on very large datasets. Label-error detection can be slow without additional tuning or sampling.
It helps me:
- Catch mistakes early, before they poison training, so my models learn from clean, reliable data.
- Streamline data audits, turning hours of manual review into minutes of focused corrections.
- Boost final performance, since models trained on higher-quality labels consistently deliver better accuracy and robustness.
Overall, Cleanlab empowers me to maintain a trustworthy, production-ready dataset with far less effort, and to iterate on models faster and with greater confidence.