
Overview
This solution helps you create large training datasets without spending weeks or months labeling them manually. It takes a weak supervision approach: labeling functions (LFs) built from regular-expression heuristics assign base labels to unlabeled training data. These labels are then refined with confident learning methods to produce clean labeled data. The output is a CSV file containing the text, the regex-based base labels, and the refined clean labels. The solution is useful for obtaining clean class labels for input text datasets automatically, with less manual effort.
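The label-refinement step can be illustrated with a simplified confident-learning recipe: compute a per-class confidence threshold, then flag rows where the model is confident about a class other than the base label. This is a minimal sketch, not the product's implementation. It assumes you already have out-of-sample predicted probabilities for each row (for example, from cross-validation with some classifier; the listing does not say which model or variant is used) and that base labels are integer class indices. The helper names `per_class_thresholds` and `suggest_clean_labels` are hypothetical.

```python
from typing import List, Sequence, Tuple


def per_class_thresholds(
    labels: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int
) -> List[float]:
    """Threshold t_j = mean predicted probability of class j over rows whose base label is j."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled rows gets threshold 1.0, so it is almost never "confident".
    return [sums[j] / counts[j] if counts[j] else 1.0 for j in range(n_classes)]


def suggest_clean_labels(
    labels: Sequence[int], probs: Sequence[Sequence[float]]
) -> Tuple[List[int], List[int]]:
    """Return (clean_labels, flagged_row_indices).

    A row is flagged when, among the classes whose predicted probability reaches
    that class's threshold, the most probable one differs from the base label.
    Flagged rows are relabeled here (a hypothetical choice); the original
    confident-learning method prunes them instead.
    """
    n_classes = len(probs[0])
    thresholds = per_class_thresholds(labels, probs, n_classes)
    clean: List[int] = []
    flagged: List[int] = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if confident:
            best = max(confident, key=lambda j: p[j])
            if best != y:
                flagged.append(i)
                clean.append(best)
                continue
        # No confident class, or the confident class agrees: keep the base label.
        clean.append(y)
    return clean, flagged


# Toy example: row 2 carries base label 0, but the model is confident it is class 1.
base = [0, 0, 0, 1, 1]
predicted = [[0.875, 0.125], [0.75, 0.25], [0.25, 0.75], [0.25, 0.75], [0.5, 0.5]]
print(suggest_clean_labels(base, predicted))  # ([0, 0, 1, 1, 1], [2])
```

Row 4 is left alone because neither class reaches its threshold, which is the conservative behavior this recipe is designed for.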
Highlights
- This solution takes a data-centric approach to obtaining better class labels, which matters for building downstream supervised models. Companies in domains such as e-commerce, marketing, and fintech can use it to automate labeling for unlabeled text classification problems, such as sentiment classification of product reviews, tweets and other social media posts, or financial news.
- The current solution only works with dataframes as input. Its output contains only the data points that the labeling functions labeled; data points that were not assigned any base label are excluded. For better results, we recommend up to 700 words per row.
- PACE-ML is Mphasis' framework and methodology for end-to-end machine learning development and deployment. PACE-ML helps organizations improve the quality and reliability of machine learning solutions in production and helps automate, scale, and monitor them. Need customized machine learning and deep learning solutions? Get in touch!
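Before uploading, you may want to check rows against the 700-word recommendation. This is a minimal sketch that assumes "word" means a whitespace-separated token (the listing does not define it) and that truncating long rows is acceptable for your data; both helper names are hypothetical.

```python
from typing import List, Sequence


def rows_over_word_limit(texts: Sequence[str], limit: int = 700) -> List[int]:
    """Return indices of rows whose whitespace-separated word count exceeds `limit`."""
    return [i for i, text in enumerate(texts) if len(text.split()) > limit]


def truncate_to_word_limit(text: str, limit: int = 700) -> str:
    """Keep only the first `limit` words (runs of whitespace collapse to single spaces)."""
    return " ".join(text.split()[:limit])


rows = ["short product review", "word " * 701]
print(rows_over_word_limit(rows))  # [1]
print(len(truncate_to_word_limit(rows[1]).split()))  # 700
```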
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.m5.xlarge Inference (Batch) Recommended | Model inference on the ml.m5.xlarge instance type, batch mode | $0.00 |
| ml.m5.xlarge Inference (Real-Time) Recommended | Model inference on the ml.m5.xlarge instance type, real-time mode | $0.00 |
| ml.m5.xlarge Training Recommended | Algorithm training on the ml.m5.xlarge instance type | $10.00 |
| ml.m4.4xlarge Inference (Batch) | Model inference on the ml.m4.4xlarge instance type, batch mode | $0.00 |
| ml.m5.4xlarge Inference (Batch) | Model inference on the ml.m5.4xlarge instance type, batch mode | $0.00 |
| ml.m4.16xlarge Inference (Batch) | Model inference on the ml.m4.16xlarge instance type, batch mode | $0.00 |
| ml.m5.2xlarge Inference (Batch) | Model inference on the ml.m5.2xlarge instance type, batch mode | $0.00 |
| ml.p3.16xlarge Inference (Batch) | Model inference on the ml.p3.16xlarge instance type, batch mode | $0.00 |
| ml.m4.2xlarge Inference (Batch) | Model inference on the ml.m4.2xlarge instance type, batch mode | $0.00 |
| ml.c5.2xlarge Inference (Batch) | Model inference on the ml.c5.2xlarge instance type, batch mode | $0.00 |
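Prices are quoted per host per hour, so the software charge for a job scales with both the number of instances and the run time. A back-of-the-envelope sketch, assuming the software fee is simply rate × hosts × hours with no minimum or rounding (the listing does not state billing granularity) and that the charge for the underlying SageMaker instances is billed separately and not included here. The mode keys and function name are hypothetical; only a few rows of the table are copied.

```python
# Software price per host per hour in USD, copied from a few rows of the pricing table.
# Assumption: excludes the separate charge for the underlying SageMaker instances.
PRICE_PER_HOST_HOUR_USD = {
    ("ml.m5.xlarge", "training"): 10.00,
    ("ml.m5.xlarge", "inference-batch"): 0.00,
    ("ml.m5.xlarge", "inference-real-time"): 0.00,
    ("ml.m5.4xlarge", "inference-batch"): 0.00,
}


def estimate_software_cost(instance_type: str, mode: str, hosts: int, hours: float) -> float:
    """Software cost = price per host-hour x number of hosts x hours (no rounding applied)."""
    rate = PRICE_PER_HOST_HOUR_USD[(instance_type, mode)]
    return rate * hosts * hours


# A 2.5-hour training job on one ml.m5.xlarge host:
print(estimate_software_cost("ml.m5.xlarge", "training", hosts=1, hours=2.5))  # 25.0
```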
Vendor refund policy
Currently we do not support refunds, but you can cancel your subscription to the service at any time.
Delivery details
Amazon SageMaker algorithm
An Amazon SageMaker algorithm is a machine learning model that requires your training data to make predictions. Use the included training algorithm to generate your unique model artifact. Then deploy the model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
This is version 3.1
Additional details
Inputs
- Summary
input_zip.zip contains an input_zip folder, which in turn contains:
- dataset.csv: the data to which automatic data labeling will be applied.
- pattern.json: a JSON file with the following parameters:
  - column name: the name of the column in dataset.csv to which the data labeling algorithm will be applied.
  - class: a dictionary mapping each class name to the regex pattern used to assign rows to that class.
- Limitations for input type
- The input must satisfy all of the following:
  1. The input must be a zip file named input_zip.zip.
  2. input_zip.zip must contain an input_zip folder.
  3. The input_zip folder must contain two files: a CSV file named "dataset.csv" and a JSON file named "pattern.json".
  4. The current solution only works with dataframes as input.
- Input MIME type
- text/csv, application/zip, application/gzip, text/plain
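The packaging rules above can be checked locally before upload. This sketch uses only the Python standard library. It assumes that plain file entries under an `input_zip/` prefix satisfy the "folder" requirement (the listing does not say whether an explicit directory entry is needed) and that dataset.csv is UTF-8; the function names are hypothetical.

```python
import csv
import io
import json
import zipfile
from typing import List

DATASET_PATH = "input_zip/dataset.csv"
PATTERN_PATH = "input_zip/pattern.json"


def build_input_zip(dataset_csv: str, pattern: dict) -> bytes:
    """Pack dataset.csv and pattern.json into an input_zip/ folder inside a zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(DATASET_PATH, dataset_csv)
        zf.writestr(PATTERN_PATH, json.dumps(pattern))
    return buf.getvalue()


def check_input_zip(filename: str, data: bytes) -> List[str]:
    """Return the problems found against the listed input limitations (empty list if none)."""
    problems: List[str] = []
    if filename != "input_zip.zip":
        problems.append("archive must be named input_zip.zip")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        missing = [p for p in (DATASET_PATH, PATTERN_PATH) if p not in names]
        problems.extend(f"missing {p}" for p in missing)
        if missing:
            return problems
        pattern = json.loads(zf.read(PATTERN_PATH))
        # The first CSV row is treated as the header.
        header = next(csv.reader(io.StringIO(zf.read(DATASET_PATH).decode("utf-8"))), [])
    column = pattern.get("column name")
    if column is None:
        problems.append("pattern.json has no 'column name' key")
    elif column not in header:
        problems.append(f"column {column!r} not found in dataset.csv")
    if "class" not in pattern:
        problems.append("pattern.json has no 'class' key")
    return problems


data = build_input_zip(
    "column_name\nvisit https://example.com\n",
    {"column name": "column_name", "class": {"external_link": {"pattern": "(?:(?:https?|ftp))+"}}},
)
print(check_input_zip("input_zip.zip", data))  # []
```

To upload, write the returned bytes to a file named input_zip.zip.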
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| dataset.csv | The input data to which the data labeling algorithm will be applied. | Type: FreeText. Limitations: for better results, we recommend up to 700 words per row. | Yes |
| pattern.json | "column name": the name of the column in dataset.csv to which the data labeling algorithm will be applied. | Type: FreeText | Yes |
| pattern.json | "class": a dictionary whose keys are the class names among which the data will be divided (for example, external_link and SPAM_CHECK in the sample below) and whose values hold each class's regex pattern under a "pattern" key. | Type: FreeText | Yes |

Sample pattern.json:

    {"column name": "column_name", "class": {"external_link": {"pattern": "(?:(?:https?|ftp))+"}, "SPAM_CHECK": {"pattern": "(?:(?:check?))+"}}}

In this sample, a row is labeled with class "external_link" if the "external_link" regex pattern matches the text in that row's "column_name" column.
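The labeling-function step described in this table can be sketched in a few lines of Python. The sketch reads the sample pattern.json, builds one regex labeling function per class, and keeps only rows that receive a base label. The listing does not say how the product resolves rows that match several classes, or whether matching is case-sensitive; this sketch assumes case-sensitive `re.search` and abstains on conflicts, and all function names are hypothetical.

```python
import csv
import io
import json
import re
from typing import Callable, Dict, List, Optional

LabelingFunction = Callable[[str], Optional[str]]


def make_labeling_functions(pattern_config: dict) -> List[LabelingFunction]:
    """One labeling function per class: vote for the class if its regex matches, else abstain (None)."""
    lfs: List[LabelingFunction] = []
    for class_name, spec in pattern_config["class"].items():
        regex = re.compile(spec["pattern"])

        # Default arguments bind the current regex/class_name (avoids late binding in the loop).
        def lf(text: str, regex=regex, class_name=class_name) -> Optional[str]:
            return class_name if regex.search(text) else None

        lfs.append(lf)
    return lfs


def base_label(text: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Assumed rule: label a row only when exactly one class matches; otherwise abstain."""
    votes = {vote for vote in (lf(text) for lf in lfs) if vote is not None}
    return votes.pop() if len(votes) == 1 else None


def label_dataset(csv_text: str, pattern_config: dict) -> List[Dict[str, str]]:
    """Apply the labeling functions to the configured column; keep only labeled rows."""
    column = pattern_config["column name"]
    lfs = make_labeling_functions(pattern_config)
    labeled = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = base_label(row[column], lfs)
        if label is not None:
            labeled.append({"text": row[column], "base_label": label})
    return labeled


config = json.loads(
    '{"column name": "column_name", "class": {"external_link": {"pattern": "(?:(?:https?|ftp))+"},'
    ' "SPAM_CHECK": {"pattern": "(?:(?:check?))+"}}}'
)
dataset = (
    "column_name\n"
    "visit https://example.com\n"
    "please check this out\n"
    "nothing to see\n"
    "check https://example.com\n"
)
for row in label_dataset(dataset, config):
    print(row)
```

Here the third row matches no class and the last row matches both, so both are left out of the result, in line with the note that unlabeled data points are excluded from the output.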
Resources
Vendor resources
Support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully use the products and features provided by Amazon Web Services.
