
Overview
Cleanlab builds AI solutions to assess data quality in messy real-world applications. Mislabeled data is common in classification tasks, so we invented Confident Learning algorithms that automatically detect label errors in your dataset.
Label Inspector runs these algorithms to estimate which examples are likely mislabeled in any classification dataset. Simply provide the data (labels + features) for a classification task, and state-of-the-art ML models will be trained to score the quality of your labels and flag which ones are likely incorrect.
Label Inspector can identify mislabeled examples in any standard multi-class classification dataset (with text, numeric, or categorical features; missing values are allowed). It returns a CSV file with a row for each example in your dataset, stating whether the example appears mislabeled, how likely its label is to be correct, and an alternative suggested label.
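For intuition on how Confident Learning flags label errors, here is a minimal sketch using Cleanlab's open-source cleanlab package on a toy dataset. It illustrates the underlying idea only; Label Inspector itself runs this kind of analysis for you as a managed SageMaker algorithm, and the classifier and data below are placeholders.

```python
# Minimal Confident Learning sketch with the open-source cleanlab package (illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

# Toy data standing in for your (features, labels); real usage would load your own dataset.
X, labels = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)

# Out-of-sample predicted class probabilities from any classifier, via cross-validation.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels, cv=5, method="predict_proba"
)

# Indices of examples most likely to be mislabeled, most suspicious first.
issue_indices = find_label_issues(
    labels=labels, pred_probs=pred_probs, return_indices_ranked_by="self_confidence"
)
print(f"{len(issue_indices)} examples flagged as likely mislabeled")
```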
Documentation and examples: https://github.com/cleanlab/aws-marketplace/
Highlights
- Label Inspector works for any standard multi-class classification dataset (including features that are: text, numeric, or categorical — with missing values allowed). It trains state-of-the-art ML models to automatically detect which examples are mislabeled.
- Documentation and example usage notebooks for the latest version are available here: https://github.com/cleanlab/aws-marketplace/
- Label Inspector auto-trains a robust ML model to identify potential label errors in your dataset. After training completes, you can deploy the trained model to classify any new data you receive. If the new data has an accompanying labels column, Label Inspector will also identify potential label errors in it (see the sketch after this list).
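The following is a hedged sketch of that train-then-deploy workflow using the SageMaker Python SDK. The algorithm ARN, S3 URIs, and the "training" channel name are placeholders and assumptions; the GitHub repository linked above contains the authoritative example notebooks.

```python
# Sketch of training Label Inspector and running batch inference via the SageMaker Python SDK.
# The algorithm ARN, S3 URIs, and channel name below are placeholders, not real values.
import sagemaker
from sagemaker.algorithm import AlgorithmEstimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # or any IAM role ARN with SageMaker permissions

estimator = AlgorithmEstimator(
    algorithm_arn="arn:aws:sagemaker:<region>:<account>:algorithm/<label-inspector>",  # placeholder
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",  # recommended training instance
    sagemaker_session=session,
)

# Train on a CSV whose first column holds the labels (see Inputs below).
estimator.fit({"training": "s3://<your-bucket>/train.csv"})  # channel name assumed

# Deploy the trained model for batch inference on new data (labels column optional).
transformer = estimator.transformer(instance_count=1, instance_type="ml.m5.xlarge")
transformer.transform(
    "s3://<your-bucket>/new_data.csv", content_type="text/csv", split_type="Line"
)
transformer.wait()
```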
Details
Pricing
| Dimension | Description | Cost |
|---|---|---|
| ml.m5.xlarge Inference (Batch), recommended | Model inference on the ml.m5.xlarge instance type, batch mode | $5.00/host/hour |
| ml.m5.xlarge Training, recommended | Algorithm training on the ml.m5.xlarge instance type | $10.00/host/hour |
| ml.p2.xlarge Inference (Batch) | Model inference on the ml.p2.xlarge instance type, batch mode | $10.00/host/hour |
| ml.m4.4xlarge Inference (Batch) | Model inference on the ml.m4.4xlarge instance type, batch mode | $10.00/host/hour |
| ml.m5.4xlarge Inference (Batch) | Model inference on the ml.m5.4xlarge instance type, batch mode | $10.00/host/hour |
| ml.m5.12xlarge Inference (Batch) | Model inference on the ml.m5.12xlarge instance type, batch mode | $10.00/host/hour |
| ml.m4.16xlarge Inference (Batch) | Model inference on the ml.m4.16xlarge instance type, batch mode | $10.00/host/hour |
| ml.p2.16xlarge Inference (Batch) | Model inference on the ml.p2.16xlarge instance type, batch mode | $10.00/host/hour |
| ml.m5.2xlarge Inference (Batch) | Model inference on the ml.m5.2xlarge instance type, batch mode | $10.00/host/hour |
| ml.p3.16xlarge Inference (Batch) | Model inference on the ml.p3.16xlarge instance type, batch mode | $10.00/host/hour |
Vendor refund policy
We do not currently support refunds, but you can cancel your subscription to the service at any time.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker algorithm
An Amazon SageMaker algorithm is a machine learning model that requires your training data to make predictions. Use the included training algorithm to generate your unique model artifact. Then deploy the model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Automatically detect potential label errors in your tabular dataset. Deploy the model and find label errors in your test dataset as well.
Additional details
Inputs
- Summary
Your data should be in a CSV file whose first column contains the class labels; the remaining columns are treated as predictive features. The first line of the CSV file should be a header row containing the column names.
Ensure that the labels are categorical strings (discrete integers are also fine, but continuous numbers are not), as only binary and multi-class classification datasets are supported. The other columns of the data table may contain numeric, categorical, or text (arbitrary string) values; a small sketch of this layout follows the list below.
- Input MIME type
- text/csv
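As a concrete illustration of that layout, here is a small sketch that writes a CSV in the expected format; the column names and values are hypothetical.

```python
# Hypothetical example of preparing a training CSV: header row, class labels in the
# first column, and numeric / categorical / text feature columns after it.
import pandas as pd

df = pd.DataFrame({
    "label":  ["cat", "dog", "dog", "cat"],                        # categorical string labels
    "weight": [4.2, 11.5, None, 3.9],                              # numeric, missing value allowed
    "color":  ["black", "brown", "brown", "white"],                # categorical feature
    "notes":  ["short-haired", "barks a lot", "rescue", "indoor"]  # free-text feature
})
df.to_csv("train.csv", index=False)  # upload this file to S3 for the training job
```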
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| Labels and training features | The input data must contain each example's label in the first column and the feature values for this example in the other columns. Each row in the input file represents a single example. We will automatically train ML models to predict the label based on the feature values and also run Confident Learning algorithms to estimate label quality. Each column (i.e., feature) must be either numeric, categorical, or text (arbitrary string). Data with multiple text columns and missing values are supported. | Type: FreeText | Yes |
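Once a batch transform completes, the output CSV described in the Overview (mislabeled flag, label quality, suggested label) can be inspected like any other table. The column names below are assumptions for illustration only; consult the example notebooks in the GitHub repository for the actual output schema.

```python
# Hedged sketch of reviewing batch-transform output; column names are assumed, not documented here.
import pandas as pd

# SageMaker batch transform writes "<input filename>.out" to the configured output S3 prefix.
results = pd.read_csv("new_data.csv.out")

# Hypothetical columns: is_label_issue, label_quality_score, suggested_label.
flagged = results.sort_values("label_quality_score").head(20)
print(flagged[["is_label_issue", "label_quality_score", "suggested_label"]])
```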
Support
Vendor support
For questions or support, please email support@cleanlab.ai. Free trials and subscription plans are available; email us for more details.
Your email subject line must state that you are using Label Inspector in AWS Marketplace.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
A decent Large Language Model, if you are keen on tracking your responses on a score basis
CleanLab: Best ML Modules Optimizer
Powerful label-cleaning with a slight learning curve
What works well:
- Seamless pandas integration. Working directly on DataFrames makes it trivial to plug Cleanlab into existing preprocessing pipelines.
- Clear, example-driven docs. The step-by-step tutorials helped me get up and running in under an hour.
What could be better:
- Performance on very large datasets. Label-error detection can be slow without additional tuning or sampling.
It helps me:
- Catch mistakes early, before they poison training, so my models learn from clean, reliable data.
- Streamline data audits, turning hours of manual review into minutes of focused corrections.
- Boost final performance, since models trained on higher-quality labels consistently deliver better accuracy and robustness.
Overall, Cleanlab empowers me to maintain a trustworthy, production-ready dataset with far less effort, and to iterate on models faster and with greater confidence.