
Overview
The Provectus Document Intelligence solution simplifies and accelerates data extraction and data entry. Using state-of-the-art machine learning models, it automatically extracts unstructured data from scanned documents and converts it into searchable, reusable formats. The solution is designed for Robotic Process Automation (RPA) in healthcare, banking, financial services, insurance, and other industries that need to automate document processing quickly.
Highlights
- Replace inefficient, time-consuming manual document processing with an intelligent, AI-powered data extraction solution. Reduce the time needed to handle paper and scanned documents, eliminate unnecessary costs, and improve employee productivity and satisfaction. Automated information extraction opens up a wide array of opportunities, from streamlining business processes to enabling real-time data analysis that surfaces actionable insights and supports decision-making.
- Train, test, and tune Provectus Document Intelligence: Data Extraction models on your own dataset of documents. It’s simple, comprehensive, and fun!
- Need an intelligent document processing solution to accelerate and automate manual data entry? Reach us at hello@provectus.com
Details
Pricing
Vendor refund policy
This product is offered for free. If you have any questions, please contact us.
Delivery details
Amazon SageMaker algorithm
An Amazon SageMaker algorithm is a machine learning model that requires your training data to make predictions. Use the included training algorithm to generate your unique model artifact. Then deploy the model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
First release of the model.
Additional details
Inputs
Training
Training requires two separate datasets: training and validation. The training dataset is used to fit the model; the validation dataset is used to prevent overfitting.
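The listing does not say how to produce the two datasets. As a minimal sketch, assuming all labeled examples start out in one list, a shuffled split could look like the following. The function name `split_examples`, the 20% validation share, and the fixed seed are illustrative assumptions, not part of the product.

```python
import random


def split_examples(examples, validation_fraction=0.2, seed=0):
    """Shuffle a list of examples and return (training, validation) lists.

    Hypothetical helper: the 80/20 default is an assumption, not a vendor
    recommendation.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")

    shuffled = list(examples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)

    # Keep at least one example in each split.
    n_val = int(len(shuffled) * validation_fraction)
    n_val = min(max(n_val, 1), len(shuffled) - 1)
    return shuffled[n_val:], shuffled[:n_val]
```

Each returned list can then be written out as its own application/json file for the corresponding dataset.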
Training Input data
Supported MIME Content Types:
application/json
Example input for a training job:
[
  {
    "context": "This agreement is made by and between Vulcan Materials CO, based at PSC 2758, Box 6740, APO AA 97024, and Adolor Corp, based at 60755 Green Terrace Suite 037, West Ginastad, OH 54388, and becomes effective on 2026-03-26. With this agreement, Adolor Corp agrees to perform services for Vulcan Materials CO for the project tentatively titled \"Manufactor toy cars\" on the following terms and conditions.",
    "qas": [
      {
        "id": "0",
        "is_impossible": false,
        "question": "first party",
        "answers": [
          { "text": "Vulcan Materials CO", "answer_start": 38 }
        ]
      }
    ]
  }
]
Input data consists of a list of training examples, each with "context" and "qas" fields. "context" is a string of text containing the entities to be extracted. "qas" is a list of entries, each pairing a "question" (the entity name) with "answers" (the entity text and its position in the context). The "is_impossible" field can be used to add negative samples to the dataset, that is, samples that help the algorithm recognize when an entity is not present in the context. NB: the "id" field in "qas" entries is a global identifier of a sample, unique across all contexts rather than within a single context.
The algorithm does not require any manual preprocessing. Tokenization is performed by the algorithm.
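The format described above can be assembled programmatically. The sketch below is illustrative only: `make_qa` and `validate_dataset` are hypothetical helper names, "answer_start" is assumed to be a zero-based character offset (which matches the sample, where "Vulcan Materials CO" starts at offset 38), and an empty "answers" list for negative samples is an assumption borrowed from the similar SQuAD 2.0 layout.

```python
def make_qa(qa_id, question, context, answer_text=None):
    """Build one "qas" entry; answer_text=None yields a negative sample."""
    if answer_text is None:
        # Assumption: negative samples carry an empty "answers" list.
        return {"id": str(qa_id), "is_impossible": True,
                "question": question, "answers": []}

    start = context.find(answer_text)  # first occurrence, zero-based
    if start < 0:
        raise ValueError(f"{answer_text!r} not found in context")
    return {"id": str(qa_id), "is_impossible": False, "question": question,
            "answers": [{"text": answer_text, "answer_start": start}]}


def validate_dataset(examples):
    """Return a list of problems: duplicate ids or mismatched offsets."""
    problems = []
    seen_ids = set()
    for index, example in enumerate(examples):
        context = example["context"]
        for qa in example["qas"]:
            # Ids must be unique across all contexts, not per context.
            if qa["id"] in seen_ids:
                problems.append(f"duplicate id {qa['id']!r} in example {index}")
            seen_ids.add(qa["id"])
            for answer in qa.get("answers", []):
                start = answer["answer_start"]
                end = start + len(answer["text"])
                if context[start:end] != answer["text"]:
                    problems.append(
                        f"offset mismatch for id {qa['id']!r} in example {index}")
    return problems
```

Running `validate_dataset` before starting a training job is a cheap way to catch reused ids and stale offsets after the context text has been edited.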
Inference
At inference time, you must provide test data in the same format as for training. However, the "answers" field in "qas" is not required.
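If labeled data already exists in the training format, an inference payload can be derived from it by dropping the labels. This is a minimal sketch; `to_inference_input` is a hypothetical helper, and it assumes the labels live under the "answers" key as in the training sample.

```python
import copy


def to_inference_input(examples):
    """Return a deep copy of the examples with "answers" removed from each
    "qas" entry. The input list is left unmodified."""
    stripped = copy.deepcopy(examples)
    for example in stripped:
        for qa in example["qas"]:
            qa.pop("answers", None)  # no error if the key is already absent
    return stripped
```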
Inference Input data
Supported MIME Content Types:
application/json
Example input for an inference job or endpoint:
[
  {
    "context": "This agreement is made by and between Vulcan Materials CO, based at PSC 2758, Box 6740, APO AA 97024, and Adolor Corp, based at 60755 Green Terrace Suite 037, West Ginastad, OH 54388, and becomes effective on 2026-03-26. With this agreement, Adolor Corp agrees to perform services for Vulcan Materials CO for the project tentatively titled \"Manufactor toy cars\" on the following terms and conditions.",
    "qas": [
      {
        "id": "0",
        "is_impossible": false,
        "question": "first party"
      }
    ]
  }
]
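For real-time inference, a payload in this format can be sent to a deployed endpoint. The sketch below assumes an endpoint has already been created from the trained model; `query_endpoint` and the endpoint name are hypothetical, and the response is assumed to be a JSON body, in line with the statement that output uses the same format as input. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
import json


def query_endpoint(runtime_client, endpoint_name, examples):
    """Send examples to a SageMaker endpoint and return the decoded JSON.

    runtime_client is expected to behave like
    boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # the only MIME type the listing names
        Body=json.dumps(examples),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())


# Usage (requires boto3 and AWS credentials; the endpoint name is made up):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = query_endpoint(runtime, "my-doc-intelligence-endpoint", examples)
```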
Output
The output format is the same as the input format: the algorithm adds the "answer" field to every question in the test dataset. NB: an empty string as an answer indicates that the entity was not found in the given context.
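Reading the results back could be sketched as follows. The listing does not show a sample output, so this assumes that "answer" is a plain string attached to each "qas" entry; `collect_answers` is a hypothetical helper, and the empty-string rule is mapped to `None` for convenience.

```python
def collect_answers(output):
    """Map each global question id to its extracted text, or None when the
    entity was not found (empty or missing "answer")."""
    results = {}
    for example in output:
        for qa in example["qas"]:
            answer = qa.get("answer", "")
            results[qa["id"]] = answer if answer else None
    return results
```

Because ids are global across contexts, a single flat dictionary is enough to hold every result without collisions.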
- Input MIME type
- application/json
Resources
Vendor resources
Support
Vendor support
Check out the JSON inference sample: https://www.notion.so/Social-distance-JSON-inference-format-504312267db14b7296f9a59873220057
We'd love to tailor our solution to improve the data accuracy for your environment and use case. Reach out to us at hello@provectus.com to talk, or visit our website.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.