
    Label Inspector

    Sold by: Cleanlab 
    Deployed on AWS
    Find label errors in any classification dataset (text and tabular/CSV datasets supported)

    Overview

Cleanlab builds AI solutions to assess data quality in messy real-world applications. Mislabeled data is common in classification, so we invented Confident Learning algorithms that automatically detect label errors in your dataset.

    Label Inspector runs these algorithms to estimate which examples are likely mislabeled in any classification dataset. Simply provide the data (labels + features) for a classification task, and state-of-the-art ML models will be trained to score the quality of your labels and flag which ones are likely incorrect.

Label Inspector can identify mislabeled examples in any standard multi-class classification dataset (with features that are text, numeric, or categorical, and missing values allowed). It returns a CSV file with a row for each example in your dataset, indicating whether the example appears mislabeled, how likely its given label is to be correct, and an alternative suggested label.

    Documentation and examples: https://github.com/cleanlab/aws-marketplace/ 
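As a hypothetical illustration of working with that result CSV, the sketch below reads it with pandas. The column names used here (is_label_issue, label_quality_score, suggested_label) are assumptions for illustration only; check the example notebooks in the GitHub repository above for the actual schema returned by your version.

```python
# Hypothetical sketch: inspect Label Inspector's output CSV with pandas.
# Column names below are assumed for illustration; verify them against the
# vendor's example notebooks before relying on them.
import pandas as pd

results = pd.read_csv("label_inspector_output.csv")

# Keep only the rows flagged as likely mislabeled (assuming a boolean column),
# sorted so the lowest-quality labels come first.
flagged = results[results["is_label_issue"]].sort_values("label_quality_score")
print(flagged[["label_quality_score", "suggested_label"]].head(20))
```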

    Highlights

    • Label Inspector works for any standard multi-class classification dataset (with features that are text, numeric, or categorical, and missing values allowed). It trains state-of-the-art ML models to automatically detect which examples are mislabeled.
    • Documentation and example usage notebooks for the latest version are available here: https://github.com/cleanlab/aws-marketplace/
    • Label Inspector auto-trains a robust ML model to identify potential label errors in your dataset. After training completes, you can deploy this trained model to classify any new data you receive. If your new data has an accompanying labels column, Label Inspector will also identify any potential label errors in the new data.

    Details

    Delivery method: Amazon SageMaker algorithm

    Latest version

    Deployed on AWS



    Pricing

    Label Inspector

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (22 dimensions; a selection is shown below)

    Dimension | Description | Cost
    ml.m5.xlarge Inference (Batch) (Recommended) | Model inference on the ml.m5.xlarge instance type, batch mode | $5.00/host/hour
    ml.m5.xlarge Training (Recommended) | Algorithm training on the ml.m5.xlarge instance type | $10.00/host/hour
    ml.p2.xlarge Inference (Batch) | Model inference on the ml.p2.xlarge instance type, batch mode | $10.00/host/hour
    ml.m4.4xlarge Inference (Batch) | Model inference on the ml.m4.4xlarge instance type, batch mode | $10.00/host/hour
    ml.m5.4xlarge Inference (Batch) | Model inference on the ml.m5.4xlarge instance type, batch mode | $10.00/host/hour
    ml.m5.12xlarge Inference (Batch) | Model inference on the ml.m5.12xlarge instance type, batch mode | $10.00/host/hour
    ml.m4.16xlarge Inference (Batch) | Model inference on the ml.m4.16xlarge instance type, batch mode | $10.00/host/hour
    ml.p2.16xlarge Inference (Batch) | Model inference on the ml.p2.16xlarge instance type, batch mode | $10.00/host/hour
    ml.m5.2xlarge Inference (Batch) | Model inference on the ml.m5.2xlarge instance type, batch mode | $10.00/host/hour
    ml.p3.16xlarge Inference (Batch) | Model inference on the ml.p3.16xlarge instance type, batch mode | $10.00/host/hour

    Vendor refund policy

    We do not currently support refunds, but you can cancel your subscription to the service at any time.


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Amazon SageMaker algorithm

    An Amazon SageMaker algorithm is a machine learning model that requires your training data to make predictions. Use the included training algorithm to generate your unique model artifact. Then deploy the model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

    Deploy the model on Amazon SageMaker AI using the following options (a usage sketch with the SageMaker Python SDK follows this list):
    • Training job: Before deploying the model, train it with your data using the algorithm training process. You're billed for software and SageMaker infrastructure costs only during training. Duration depends on the algorithm, instance type, and training data size. When training completes, the model artifacts are saved to your Amazon S3 bucket and are loaded into the model when you deploy it for real-time inference or batch processing. For more information, see Use an Algorithm to Run a Training Job.
    • Real-time inference: Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns the results in the API response. The endpoint runs continuously until you delete it. You're billed for software and SageMaker infrastructure costs while the endpoint runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.
    • Batch transform: Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns results to Amazon S3. When complete, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job. Duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.
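The steps above can be scripted with the SageMaker Python SDK. The sketch below is not the vendor's official notebook: the algorithm ARN, IAM role, S3 paths, and the "training" channel name are placeholders and assumptions, so copy the real ARN from your Marketplace subscription and follow the example notebooks linked earlier.

```python
# Minimal sketch: train the Marketplace algorithm, then score a dataset with a
# batch transform job. ARNs, S3 paths, and the channel name are placeholders.
import sagemaker
from sagemaker.algorithm import AlgorithmEstimator

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"  # placeholder IAM role

estimator = AlgorithmEstimator(
    algorithm_arn="arn:aws:sagemaker:us-east-1:111122223333:algorithm/label-inspector",  # placeholder ARN
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",  # recommended training instance in the pricing table
    sagemaker_session=session,
)

# Train on a labeled CSV in S3 (first column = label, header row required).
# The "training" channel name is an assumption; check the algorithm's input spec.
estimator.fit({"training": "s3://my-bucket/label-inspector/input/dataset.csv"})

# Batch transform: score a CSV offline; results are written back to S3.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",  # recommended batch-inference instance in the pricing table
    output_path="s3://my-bucket/label-inspector/output/",
)
transformer.transform("s3://my-bucket/label-inspector/input/dataset.csv", content_type="text/csv")
transformer.wait()
```

Billing for the software and SageMaker infrastructure applies only while the training and batch transform jobs run, as described above.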
    Version release notes

    Automatically detect potential label errors in your tabular dataset. Deploy the model and find label errors in your test dataset as well.

    Additional details

    Inputs

    Summary

    Your data should be in a CSV file where the first column contains the class labels (remaining columns will be treated as predictive features). The first line of the CSV file should be a header containing column names for your data.

    Ensure that the labels are categorical strings (discrete integers are also fine, but not continuous numbers), as only binary and multi-class classification datasets are supported. The other columns of the data table may contain numeric, categorical, or text (arbitrary string) values.
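As a concrete illustration, here is a minimal sketch of writing a dataset in this layout with pandas. The column names and values are made up for the example; only the structure (label in the first column, header row, mixed numeric/categorical/text features, missing values allowed) reflects the requirements above.

```python
# Minimal sketch: shape a dataset into the expected input CSV.
# Column names and values are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "label": ["spam", "not_spam", "spam"],                                # class label, first column
    "subject": ["Win money now", "Meeting at 3pm", "Free prize inside"],  # text feature
    "num_links": [5, 0, 3],                                               # numeric feature
    "sender_domain": ["promo.biz", "corp.com", None],                     # categorical feature; missing values allowed
})

# Write with a header row and without the pandas index so the label stays in column one.
df.to_csv("dataset.csv", index=False)
```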

    Input MIME type: text/csv
    Sample input: https://github.com/cleanlab/aws-marketplace/blob/main/label-inspector/data/input/dataset.csv

    Input data descriptions

    The following table describes supported input data fields for real-time inference and batch transform.

    Field name | Description | Constraints | Required
    Labels and training features | The input data must contain each example's label in the first column and the feature values for that example in the remaining columns. Each row in the input file represents a single example. We will automatically train ML models to predict the label from the feature values and also run Confident Learning algorithms to estimate label quality. Each column (i.e., feature) must be numeric, categorical, or text (arbitrary string). Data with multiple text columns and missing values are supported. | Type: FreeText | Yes

    Support

    Vendor support

    For questions/support, please email: support@cleanlab.ai. Free trials and subscription plans are available! Email us for more details.

    Your email subject line must state that you are using Label Inspector in AWS Marketplace.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.


    Customer reviews

    Ratings and reviews

    0 AWS reviews (0 ratings) | 13 external reviews
    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.
    Oil & Energy

    A decent Large Language Model, if we are keen on tracking our responses with scores

    Reviewed on Aug 14, 2025
    Review provided by G2
    What do you like best about the product?
    Its scoring mechanism for all the generated responses is a great feature among GPT tools.
    What do you dislike about the product?
    A context-mapping check would have been better, along with the scoring mechanism. And report generation would have been a great addition if it were included.
    What problems is the product solving and how is that benefiting you?
    Its real-time tracking and scoring mechanism, and its compatibility with a variety of LLMs, make it more useful.
    Ashish A.

    CleanLab: Best ML Modules Optimizer

    Reviewed on May 21, 2025
    Review provided by G2
    What do you like best about the product?
    The best part of Cleanlab is its AI models, which optimize any pretrained modules with a great level of efficiency. Another great part is its documentation; any type of user can use Cleanlab by reading it. And the TLM module is the best: it optimizes any LLM. Its API feature makes the integration part much easier.
    What do you dislike about the product?
    As of now I find it a bit hard to dislike such a great module. But still, talking about dislikes: it is expensive, and some small startups may not be able to afford it. Also, TLM doesn't do great with unstructured data.
    What problems is the product solving and how is that benefiting you?
    I work as a Data Manager in a company that works with US healthcare data. We train modules on healthcare datasets. Cleanlab helps us identify and flag incorrect labels. The modules we train sometimes misinterpret the inputs, and here Cleanlab plays a vital role. It optimizes our ML modules and also helps identify outliers. In general, Cleanlab helps us with optimizing our AI models.
    Ritesh S.

    Powerful label-cleaning with a slight learning curve

    Reviewed on May 16, 2025
    Review provided by G2
    What do you like best about the product?
    Accurate error detection. The ability to automatically spot mislabeled and low-confidence examples has saved me countless hours of manual review.

    Seamless pandas integration. Working directly on DataFrames makes it trivial to plug Cleanlab into existing preprocessing pipelines.

    Clear, example-driven docs. The step-by-step tutorials helped me get up and running in under an hour.
    What do you dislike about the product?
    Initial setup complexity. Installing all dependencies (and configuring environments) can feel a bit involved if you’re just experimenting.

    Performance on very large datasets. Label-error detection can be slow without additional tuning or sampling.
    What problems is the product solving and how is that benefiting you?
    Cleanlab tackles the hidden “label noise” in your datasets—mislabeled, ambiguous or low-confidence examples that quietly drag down model accuracy. By automatically flagging and ranking these problematic records (and even suggesting which labels to trust), Cleanlab lets me:

    Catch mistakes early, before they poison training, so my models learn from clean, reliable data.

    Streamline data audits, turning hours of manual review into minutes of focused corrections.

    Boost final performance, since models trained on higher-quality labels consistently deliver better accuracy and robustness.

    Overall, Cleanlab empowers me to maintain a trustworthy, production-ready dataset with far less effort—and to iterate on models faster and with greater confidence.
    Hemant R.

    Best and easy to use AI

    Reviewed on May 06, 2025
    Review provided by G2
    What do you like best about the product?
    Easy to use. Not much hardware setup is required, and the way it helps in refining data, including on the e-commerce side, is wonderful.
    What do you dislike about the product?
    Nothing as such that I can think of. I need to look more into the product before making any statement.
    What problems is the product solving and how is that benefiting you?
    We have a lot of customer data, but it's mostly messy and not linked properly; with Cleanlab, we get properly formatted data.
    nageen n.

    The AI tool that eases my job of cleaning data from raw to a smart dataset and helps our team

    Reviewed on Apr 19, 2025
    Review provided by G2
    What do you like best about the product?
    The time we spend on datasets significantly decreased after using Cleanlab. I would say it saves a lot of time.
    What do you dislike about the product?
    Sometimes it gets slow on large datasets, but we don't have those datasets very frequently. Still, there is a need for improvement.
    What problems is the product solving and how is that benefiting you?
    Our existing datasets were mostly cleaned manually, and sometimes with heuristics, so this is the best fit to solve our problem. Our time and human effort decreased by 70-80%.