    Unstructured API

    Deployed on AWS
    AWS Free Tier
    Unstructured extracts and transforms data for use with every major vector database and LLM framework
    Rating: 5 out of 5

    Overview

    Ingest and preprocess complex natural language data from any document, file type, or layout with Unstructured.

    Under the hood, the Unstructured engine breaks a document into its constituent parts and identifies the document's structure, such as its headers, tables, and body text. Unstructured offers several preprocessing strategies, each suited to different document types and requirements. Choosing the right strategy improves element-classification accuracy and extraction efficiency, which is especially important for image-based files and layout-heavy documents.
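
    The element-based output described above can be illustrated with a short sketch. It assumes the JSON shape that Unstructured's partitioning commonly returns (a list of elements, each with `type`, `element_id`, `text`, and `metadata` fields); the sample elements are invented for illustration, and `group_by_title` and `count_types` are hypothetical helpers, not part of any Unstructured library. The sketch skips page furniture such as `Header` elements and attaches each remaining element to the most recent `Title`, which is one simple way to recover section structure.

```python
import json
from collections import defaultdict

# Invented sample data in the assumed shape of Unstructured's element JSON.
SAMPLE_ELEMENTS = json.loads("""
[
  {"type": "Header", "element_id": "e1", "text": "Example Handbook", "metadata": {"page_number": 1}},
  {"type": "Title", "element_id": "e2", "text": "Section One", "metadata": {"page_number": 1}},
  {"type": "NarrativeText", "element_id": "e3", "text": "First paragraph of body text.", "metadata": {"page_number": 1}},
  {"type": "Table", "element_id": "e4", "text": "Col A Col B 1 2", "metadata": {"page_number": 1}},
  {"type": "Title", "element_id": "e5", "text": "Section Two", "metadata": {"page_number": 2}},
  {"type": "NarrativeText", "element_id": "e6", "text": "Second paragraph of body text.", "metadata": {"page_number": 2}}
]
""")

# Element types treated here as page furniture rather than body content (assumption).
SKIP_TYPES = {"Header", "Footer", "PageNumber"}


def group_by_title(elements):
    """Hypothetical helper: map each Title's text to the (type, text) pairs that follow it."""
    sections = defaultdict(list)
    current = "(untitled)"
    for el in elements:
        if el["type"] in SKIP_TYPES:
            continue
        if el["type"] == "Title":
            current = el["text"]
            continue
        sections[current].append((el["type"], el["text"]))
    return dict(sections)


def count_types(elements):
    """Hypothetical helper: count how many elements of each type were found."""
    counts = defaultdict(int)
    for el in elements:
        counts[el["type"]] += 1
    return dict(counts)


if __name__ == "__main__":
    for title, body in group_by_title(SAMPLE_ELEMENTS).items():
        print(title)
        for kind, text in body:
            print(f"  [{kind}] {text}")
```

    Running the file prints each section title followed by its body elements, with the page header dropped.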

    Click on Continue to Subscribe to start using Unstructured for your data preprocessing needs.

    We are constantly improving our products and love feedback.

    Highlights

    • Transforms all your data for downstream analytics, using a next-generation vision transformer for image, PDF, and table extraction.
    • Enhanced models for table extraction, document hierarchy, and element classification; chunks your data for LLM applications.
    • Compatible with any embedding model, vector database, and LLM framework, with API client libraries in multiple languages (e.g., Python, JavaScript).
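
    The chunking highlight above can be illustrated with a simplified sketch. Unstructured's own chunking strategies (such as `basic` and `by_title`) accept a `max_characters` limit; `chunk_texts` below is a hypothetical, standalone function written under that assumption, not Unstructured's implementation, and it ignores refinements such as section boundaries, overlap, and table handling. It greedily packs consecutive element texts into chunks no longer than the limit and hard-splits any single text that exceeds it.

```python
from typing import List


def chunk_texts(texts: List[str], max_characters: int = 500) -> List[str]:
    """Hypothetical helper: greedily pack texts into chunks of at most max_characters.

    Consecutive texts are joined with a blank line while the result still fits;
    any single text longer than the limit is cut into fixed-size slices first.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    chunks: List[str] = []
    current = ""
    for text in texts:
        if not text:
            continue  # empty elements contribute nothing
        # Hard-split oversized texts so every piece fits on its own.
        pieces = [text[i:i + max_characters] for i in range(0, len(text), max_characters)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_characters:
                current = candidate
            else:
                # The current chunk is full; start a new one with this piece.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    paragraphs = ["a" * 10, "b" * 10, "c" * 10]
    for chunk in chunk_texts(paragraphs, max_characters=25):
        print(repr(chunk))
```

    With `max_characters=25`, the three 10-character texts yield two chunks: the first two joined by a blank line (22 characters) and the third on its own.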

    Details

    Delivery method

    Delivery option
    UnstructuredAPI
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    Other Linux 9

    Features and programs

    Trust Center

    Access real-time vendor security and compliance information through their Trust Center powered by Drata or Vanta. Review certifications and security standards before purchase.

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Unstructured API

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.
    If you are an AWS Free Tier customer with a free plan, you are eligible to subscribe to this offer. You can use free credits to cover the cost of eligible AWS infrastructure. See AWS Free Tier for more details. If you created an AWS account before July 15th, 2025, and qualify for the Legacy AWS Free Tier, Amazon EC2 charges for Micro instances are free for up to 750 hours per month. See Legacy AWS Free Tier for more details.

    Usage costs (121)

    • ...
    Dimension                  Cost/hour
    m4.xlarge (Recommended)    $4.40
    t2.micro                   $1.10
    t3.micro                   $2.20
    m4.4xlarge                 $17.60
    r5.24xlarge                $105.60
    r5.large                   $2.20
    r5.8xlarge                 $35.20
    m5n.16xlarge               $70.40
    r3.xlarge                  $4.40
    i3.2xlarge                 $8.80
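
    As a worked example of usage-based pricing, the software charge for an instance is its hourly rate multiplied by the hours it runs. The sketch below copies a few rates from the table and keeps amounts in integer cents to avoid floating-point rounding; `software_cost_cents` and `format_dollars` are hypothetical helpers. The ten sample rates are also consistent with a flat $1.10 per vCPU-hour (for example, m4.xlarge has 4 vCPUs and 4 × $1.10 = $4.40), but that is an observation from these rows, not a documented pricing rule. AWS infrastructure charges are billed separately and are not included.

```python
# Hourly software rates copied from the usage-cost table above, in cents.
# AWS infrastructure charges (EC2, storage, data transfer) are excluded.
HOURLY_RATE_CENTS = {
    "t2.micro": 110,
    "t3.micro": 220,
    "r5.large": 220,
    "m4.xlarge": 440,
    "i3.2xlarge": 880,
    "m4.4xlarge": 1760,
}


def software_cost_cents(instance_type: str, hours: int) -> int:
    """Hypothetical helper: software charge in cents for one instance running `hours` hours."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return HOURLY_RATE_CENTS[instance_type] * hours


def format_dollars(cents: int) -> str:
    """Format a non-negative number of cents as dollars, e.g. 316800 -> '$3,168.00'."""
    return f"${cents // 100:,}.{cents % 100:02d}"


if __name__ == "__main__":
    # A 30-day month of continuous use is 24 * 30 = 720 hours.
    for itype in ("t2.micro", "m4.xlarge", "m4.4xlarge"):
        print(itype, format_dollars(software_cost_cents(itype, 24 * 30)))
```

    For the three types in the demo, a 720-hour month works out to $792.00, $3,168.00, and $12,672.00 respectively, before any AWS infrastructure charges.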

    Vendor refund policy

    We do not currently support refunds, but you can cancel at any time.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Version release notes

    1.0.72

    • Updated the contextual chunking window size from 15k to 100k.

    Additional details

    Usage instructions

    To connect to the operating system, use SSH with the username rocky and the same SSH key pair you supplied when launching the stack. For more details, see the Unstructured API deployment guide: https://docs.unstructured.io/api-reference/api-services/aws

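    Once the stack is running, documents are usually sent to the API over HTTP rather than through SSH. The sketch below only builds (and does not send) a multipart request using Python's standard library. It assumes the `POST /general/v0/general` endpoint with a `files` form field and an optional `strategy` field, as exposed by Unstructured's open-source API server; `BASE_URL` is a hypothetical placeholder and `build_partition_request` is a hypothetical helper, so take the exact URL and port for your deployment from the deployment guide.

```python
import urllib.request
import uuid

# Hypothetical placeholder; replace with the address of your own deployment.
BASE_URL = "http://unstructured.example.internal"
# Endpoint path assumed from Unstructured's open-source API server; verify for your stack.
PARTITION_PATH = "/general/v0/general"
STRATEGIES = {"auto", "fast", "hi_res", "ocr_only"}


def build_partition_request(filename: str, content: bytes, strategy: str = "auto") -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a multipart/form-data POST for one file."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    boundary = uuid.uuid4().hex
    parts = [
        # Form field selecting the preprocessing strategy.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="strategy"\r\n\r\n'
         f"{strategy}\r\n").encode(),
        # The document itself, sent under the "files" field.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
         f"Content-Type: application/octet-stream\r\n\r\n").encode() + content + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        BASE_URL + PARTITION_PATH,
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_partition_request("policy.pdf", b"%PDF-1.7 placeholder bytes", strategy="hi_res")
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # urllib.request.urlopen(req) would send it; the server is expected to reply with element JSON.
```

    Building the body by hand keeps the sketch dependency-free; in practice, the official client libraries mentioned in the highlights are likely the more convenient option.
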
    Resources

    Vendor resources

    Support

    Vendor support

    Please allow 24 hours for a response. You can also join our Slack workspace for support.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully use the products and features provided by Amazon Web Services.

    Product comparison

    Updated weekly
    By Unstructured
    By Hyperscience

    Accolades

    Top 10 in Software Development, Data Analysis
    Top 10 in Text/OCR
    Top 10 in Handwriting Recognition

    Customer reviews

    Sentiment is AI generated from actual customer reviews on AWS and G2.
    Reviews: 1 review
    Functionality: Insufficient data
    Ease of use: Insufficient data
    Customer service: Insufficient data
    Cost effectiveness: Insufficient data

    Overview

    AI generated from product descriptions
    Document Structure Analysis
    Breaks documents into constituent parts and identifies structural elements including headers, tables, and body text
    Vision Transformer Technology
    Utilizes next-generation vision transformer models for extraction from images, PDFs, and tables
    Multi-Format Data Ingestion
    Ingests and preprocesses complex natural language data from any document type and file layout
    Vector Database and LLM Integration
    Compatible with any embedding model, vector database, and LLM framework, with API client libraries in multiple languages including Python and JavaScript
    Adaptive Preprocessing Strategies
    Provides diverse preprocessing strategies tailored to different document types with enhanced models for table extraction, document hierarchy, and element classification
    Production-Grade Runtime Environment
    Pre-built and optimized Python runtime for document processing tasks with rapid deployment capabilities
    Zero-Shot Document Templates
    Accelerated processing blueprints for common document types including forms with handwriting, paystubs, bank statements, and invoices without requiring training data
    Comprehensive Document Intelligence
    Built-in capabilities for document classification, extraction, transcription, complex table handling, and Human-in-the-Loop interfaces with native QA support
    High-Throughput Processing Architecture
    Massively parallel processing engineered for high-throughput, low-latency performance at scale with intelligent cost orchestration across CPU and GPU resources
    Model-Agnostic Pipeline Framework
    Support for any model type or custom third-party integration within a single pipeline with centralized versioning, auditability, and full telemetry tracking on all inputs, outputs, and automated decisions
    Handwritten and Printed Text Recognition
    Achieves 98% accuracy in extracting printed and handwritten text, numbers, and checkboxes from low-quality faxed and scanned documents using machine learning trained on over 1 billion documents from finance, insurance, and healthcare industries.
    Document Classification and Matching
    Automatically classifies and identifies pages by matching them to user-defined form templates using image recognition algorithms, capable of handling degraded documents including those with stains, incorrect orientation, and poor quality scans.
    Signature and Barcode Detection
    Detects signatures, QR codes, barcodes, and performs address validation with specialized readers available for advanced use cases.
    Template-Based Form Field Extraction
    Uses blank forms as templates to locate and extract individual form fields for digitization without requiring coding or machine learning expertise through a web-based developer interface.
    Data Transformation and Human-in-the-Loop Review
    Applies data transformations to ensure output consistency and readiness for RPA ingestion, with optional human-in-the-loop review interface for data validation and business logic application, integrated via API or Blue Prism connector.

    Contract

    Standard contract: No (for both compared products)

    Customer reviews

    Ratings and reviews

    5 out of 5 (1 rating)
    5 star: 100%
    4 star: 0%
    3 star: 0%
    2 star: 0%
    1 star: 0%
    1 AWS review
    reviewer2846073

    Fast document parsing has boosted culture insights and now improves HR policy intelligence

    Reviewed on Jun 04, 2026
    Review from a verified AWS customer

    What is our primary use case?

    We are a culture operating system that analyzes organizational culture, and we have an AI bot that joins calls to create structured culture intelligence reports. On the HR side, a company's human resources department produces HR policies, PDFs, and performance documents. To digest that data, we use Unstructured to build a vector database from these unstructured documents.

    If an HR manager wants to use HR policies, HR documents, and performance data in Instill, they can upload their documents, and we use Unstructured to convert those PDFs into a vector database.

    What is most valuable?

    We now use Unstructured every day. It is useful whenever we want to get answers from, or apply AI to, a PDF or a similar document: we use Unstructured to convert it into a vector database, which makes retrieval-augmented generation (RAG) and other AI processes easy.

    Document parsing stands out: ingestion in Unstructured is very fast, 20 to 40% faster than other products available in the industry. When HR uploads a PDF to our platform, we use Unstructured to ingest the data.

    Faster document ingestion, together with higher-quality AI answers, has improved customer satisfaction. Our NPS has improved by at least 10 to 15 points since we started using Unstructured, not only for data ingestion but also for retrieving data when we use AI or RAG.

    What needs improvement?

    Cost needs to be factored in for scaling use cases. We have no control over how many documents users will upload, so usage is variable and we cannot set a threshold.

    For how long have I used the solution?

    I have been working in my field for the last eight years.

    What do I think about the stability of the solution?

    Output from Unstructured is accurate and highly reliable. We have not faced any issues, and uptime is consistent.

    Which other solutions did I evaluate?

    I advise researching other vector database options as well, because Pinecone is also good, but you need to understand your use case.

    What other advice do I have?

    Features and usability are fine, and it is one of the best products available.

    I chose a rating of 10 out of 10 because they are very focused on doing what they do with the best quality and speed, and they do not stray outside their scope. They claim faster processing and faster conversion of unstructured data into a vector database, and they deliver that with very good speed and quality.

    The governance and security regarding Unstructured's AI capabilities are good, as we have SOC 2 and other compliance certificates from Unstructured. I give this product a rating of 10 out of 10.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
