Overview

Ingest and preprocess complex natural language data from any document, file type, or layout with Unstructured.
Under the hood, the Unstructured engine breaks a document into its constituent parts and identifies the document's structure, such as its headers, tables, and body text. Unstructured provides several preprocessing strategies, each suited to different document types and requirements. Choosing the right strategy improves element-classification accuracy and extraction efficiency, which matters most for image-based files and documents with complex layouts.
Click on Continue to Subscribe to start using Unstructured for your data preprocessing needs.
We are constantly improving our products and love feedback.
Highlights
- Transforms all your data for downstream analytics. Next-generation vision transformer for image, PDF, and table extraction
- Enhanced models for table extraction, document hierarchy, and element classification. Chunks your data for LLM applications
- Compatible with any embedding model, vector database, and LLM framework. API client libraries in multiple languages (e.g., Python, JavaScript)
Pricing
- ...
| Instance type | Cost per hour (USD) |
|---|---|
| m4.xlarge (recommended) | $4.40 |
| t2.micro | $1.10 |
| t3.micro | $2.20 |
| m4.4xlarge | $17.60 |
| r5.24xlarge | $105.60 |
| r5.large | $2.20 |
| r5.8xlarge | $35.20 |
| m5n.16xlarge | $70.40 |
| r3.xlarge | $4.40 |
| i3.2xlarge | $8.80 |
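To turn the hourly rates above into a budget, multiply rate x hours x instance count. The Python sketch below does that arithmetic, assuming the listed rate is the only hourly charge (this listing does not say whether EC2 infrastructure is billed on top). The `estimate_cost` helper is hypothetical and not part of any AWS or Unstructured tool.

```python
# Hourly rates (USD) copied from the pricing table above.
HOURLY_RATE_USD = {
    "m4.xlarge": 4.40,
    "t2.micro": 1.10,
    "t3.micro": 2.20,
    "m4.4xlarge": 17.60,
    "r5.24xlarge": 105.60,
    "r5.large": 2.20,
    "r5.8xlarge": 35.20,
    "m5n.16xlarge": 70.40,
    "r3.xlarge": 4.40,
    "i3.2xlarge": 8.80,
}


def estimate_cost(instance_type: str, hours: float, instances: int = 1) -> float:
    """Hypothetical helper: rate x hours x instance count, rounded to cents.

    Assumes the table rate is the full hourly charge.
    """
    if hours < 0 or instances < 0:
        raise ValueError("hours and instances must be non-negative")
    rate = HOURLY_RATE_USD[instance_type]  # KeyError for types not in the table
    return round(rate * hours * instances, 2)


# The recommended m4.xlarge running around the clock for a 30-day month (720 hours).
print(estimate_cost("m4.xlarge", 24 * 30))  # 3168.0
```

Rounding to cents avoids floating-point noise such as `3168.0000000000005` in the displayed total.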
Vendor refund policy
We do not currently support refunds, but you can cancel at any time.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
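For readers who launch the AMI directly rather than through the stack launch mentioned in the usage instructions, here is a hedged Python sketch using boto3's EC2 `run_instances` call. The AMI ID and key pair name are hypothetical placeholders; use the AMI ID from your own subscription. The parameter builder runs without boto3; `launch` needs boto3 installed and AWS credentials configured.

```python
from typing import Any, Dict


def build_launch_params(ami_id: str, key_name: str,
                        instance_type: str = "m4.xlarge") -> Dict[str, Any]:
    """Keyword arguments for a single-instance EC2 RunInstances request."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # the listing's recommended type by default
        "KeyName": key_name,            # key pair you will later use for SSH
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(params: Dict[str, Any], region: str) -> str:
    """Launch the instance and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_launch_params works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


# Hypothetical values: substitute your subscribed AMI ID and your own key pair.
params = build_launch_params("ami-0123456789abcdef0", "my-key-pair")
print(params)
```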
Version release notes
1.0.72
- Updated the contextual chunking window size from 15k to 100k.
Additional details
Usage instructions
To connect to the operating system, use SSH with the username rocky and the same SSH key pair you supplied during stack launch. For more details, see the Unstructured API deployment guide: https://docs.unstructured.io/api-reference/api-services/aws
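A minimal Python sketch of the connection step, assuming a standard OpenSSH client on your machine. Only the username rocky comes from the instructions above; the host address (from a documentation-only IP range) and the key file name are hypothetical placeholders.

```python
import os
import shlex
from typing import List


def ssh_command(host: str, key_path: str, user: str = "rocky") -> List[str]:
    """Build an ssh argument list; the default user comes from the usage instructions."""
    # Expand a leading "~" ourselves, because subprocess does not go through a shell.
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{host}"]


cmd = ssh_command("203.0.113.10", "my-key-pair.pem")
print(shlex.join(cmd))  # ssh -i my-key-pair.pem rocky@203.0.113.10
# To actually connect: subprocess.run(cmd, check=True)
```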
Resources
Vendor resources
Support
Vendor support
Please allow 24 hours for a response. For support, join us in our Slack workspace.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Standard contract
Customer reviews
Fast document parsing has boosted culture insights and now improves HR policy intelligence
What is our primary use case?
We are a culture operating system that analyzes organizational culture, and we have an AI bot that joins calls to create structured culture intelligence reports. On the HR side, the human resources department generates HR policies, PDFs, and performance documents. To ingest that data, we use Unstructured to build a vector database from these unstructured documents.
If an HR manager wants to use HR policies, HR documents, and performance data in Instill, they can upload their documents, and we use Unstructured to convert those PDFs into a vector database.
What is most valuable?
We now use Unstructured every day. It is useful whenever we want to apply AI to a PDF or a similar document and get answers: we use Unstructured to convert the document into a vector database, which makes retrieval-augmented generation (RAG) and other AI processes easy.
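The retrieval side of that workflow can be sketched in a few lines. This is a toy illustration, not Unstructured's or the reviewer's implementation: a lowercase bag-of-words count vector stands in for a real embedding model, a Python list stands in for the vector database, and the chunks are hypothetical text of the kind a parser might extract from HR policy PDFs. Retrieval ranks chunks by cosine similarity to the query.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: List[str]) -> List[Tuple[str, Dict[str, int]]]:
    """Stand-in for a vector database: each chunk stored with its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: List[Tuple[str, Dict[str, int]]], query: str, k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks standing in for text extracted from HR policy PDFs.
chunks = [
    "Employees accrue 20 days of paid vacation per year.",
    "Performance reviews take place every six months.",
    "Remote work requires manager approval.",
]
index = build_index(chunks)
print(retrieve(index, "How many vacation days do employees get?"))
```

In a real RAG pipeline, the retrieved chunks would be passed to an LLM as context for answering the question.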
Document parsing stands out: ingestion in Unstructured is very fast, 20 to 40% faster than other products on the market. When HR uploads a PDF to our platform, we use Unstructured to ingest the data.
Faster document ingestion has led to higher-quality AI answers, which improved customer satisfaction and our NPS. NPS has improved by at least 10 to 15 points since we started using Unstructured, not only for data ingestion but also for retrieving data when we use AI or RAG.
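For context on what an NPS "point" is: under the standard Net Promoter Score definition, respondents rate 0 to 10, promoters score 9 or 10, detractors score 0 to 6, and NPS is the percentage of promoters minus the percentage of detractors, giving a scale from -100 to 100. The Python sketch below applies that formula; the survey samples are hypothetical and are not data from this review.

```python
from typing import Iterable


def net_promoter_score(ratings: Iterable[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100..100 scale."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be on the 0-10 scale")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)  # 7s and 8s are passives
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey samples, not data from the review.
before = [10, 9, 8, 7, 6, 5, 9, 10, 4, 8]
after = [10, 9, 9, 8, 7, 6, 9, 10, 4, 5]
print(net_promoter_score(before), net_promoter_score(after))  # 10.0 20.0
```

Here the score moves from 10 to 20, a 10-point improvement.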
What needs improvement?
Cost needs to be factored in for scaling use cases: we do not control how many documents users will upload, so usage is variable and we cannot set a threshold.
For how long have I used the solution?
I have been working in my field for the last eight years.
What do I think about the stability of the solution?
The output from Unstructured is very accurate and highly reliable. We have not faced any issues, and uptime is consistent.
Which other solutions did I evaluate?
I advise researching other vector database search options, because Pinecone is also good, but you need to understand your use case.
What other advice do I have?
Features and usability are fine, and it is one of the best products available.
I chose a rating of 10 out of 10 because the team is very focused on doing what they do with the best quality and speed, and they do not stray outside their scope. They claim faster processing and faster conversion of unstructured data into a vector database, and they deliver that with very high speed and quality.
The governance and security around Unstructured's AI capabilities are good; we have SOC 2 and other compliance certificates from Unstructured. I give this product a rating of 10 out of 10.