Overview
Upstage Document Parse Enhanced is an all-in-one document intelligence API that transforms diverse file formats into structured HTML/Markdown. It combines the robust OCR and layout analysis of our Standard model with proprietary Vision Language Models (VLMs) for deep understanding.
Universal Document Support & VLM Precision:
- **Any Format, One Structure**: Seamlessly processes PDFs, scanned images (JPEG, PNG, TIFF), and digital documents. Whether it's a blurry scan or a digital report, it unifies them into clean, structured HTML.
- **VLM-Powered Insight**: Beyond reading text, it uses VLMs to "see" and interpret complex elements. It reconstructs borderless/nested tables, merges multi-page tables, and extracts numerical data from charts.
- **Visual Description**: It automatically generates natural language captions for figures and diagrams, ensuring no visual context is lost in digitization.
Designed for Agentic RAG: Prioritizes semantic depth and data fidelity over raw speed (2x-10x processing time vs. Standard), delivering the rich context LLMs need.
Highlights
- **Format-Agnostic Parsing**: Handles everything from native PDFs to complex image scans. It identifies layouts (paragraphs, headers) and reconstructs reading order across formats. The Enhanced VLM engine then dives deep to merge split tables and interpret charts, ensuring consistent output quality regardless of the input file type.
- **Agentic RAG Ready**: By converting visual data (charts, figures) into text and merging fragmented tables, it preserves the full context often lost in standard OCR. This "semantically dense" output allows LLMs to reason over complex financial or scientific data effectively. The trade-off is longer processing time in exchange for higher data fidelity.
- **Key Tasks**: PDF & Image to HTML, VLM Document Understanding, Multi-page Table Merging, Chart-to-Data Conversion, Figure Captioning, Scanned Document OCR, Layout Analysis
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.m5.12xlarge Inference (Batch), recommended | Model inference on the ml.m5.12xlarge instance type, batch mode | $0.00 |
| ml.g6.12xlarge Inference (Real-Time), recommended | Model inference on the ml.g6.12xlarge instance type, real-time mode | $12.00 |
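As a rough illustration of how the per-host hourly rate adds up, the sketch below multiplies the listed rate by host count and running hours. The function name and the lookup keys are hypothetical, and the figure covers only the listed software price. It assumes AWS infrastructure charges for the instances are billed separately.

```python
# Hourly software rates copied from the pricing table above (USD per host per hour).
SOFTWARE_RATE_USD = {
    ("ml.m5.12xlarge", "batch"): 0.00,
    ("ml.g6.12xlarge", "real-time"): 12.00,
}


def estimate_software_cost(instance_type: str, mode: str, hosts: int, hours: float) -> float:
    """Return the listed software cost in USD (hypothetical helper).

    Assumption: AWS infrastructure charges are billed separately and are not included.
    """
    if hosts < 0 or hours < 0:
        raise ValueError("hosts and hours must be non-negative")
    rate = SOFTWARE_RATE_USD[(instance_type, mode)]
    return round(rate * hosts * hours, 2)


if __name__ == "__main__":
    # Two real-time hosts running for eight hours each.
    print(estimate_software_cost("ml.g6.12xlarge", "real-time", hosts=2, hours=8))
```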
Vendor refund policy
We do not support any refunds currently.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
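A minimal sketch of creating a real-time endpoint from a subscribed model package with boto3 might look like the following. The ARNs, the resource name, and the helper functions are placeholders chosen for illustration, not values from this listing. The sketch also assumes the package requires network isolation, which is common for Marketplace packages.

```python
def build_deploy_requests(model_package_arn: str, role_arn: str, name: str,
                          instance_type: str = "ml.g6.12xlarge"):
    """Build keyword arguments for the three SageMaker calls (hypothetical helper)."""
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        # Assumption: the Marketplace package requires network isolation.
        "EnableNetworkIsolation": True,
    }
    endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return model, endpoint_config, endpoint


def deploy(requests, region: str) -> None:
    """Send the requests to SageMaker; needs boto3 and valid AWS credentials."""
    import boto3  # third-party; imported lazily so the builder works without it

    model, endpoint_config, endpoint = requests
    sagemaker = boto3.client("sagemaker", region_name=region)
    sagemaker.create_model(**model)
    sagemaker.create_endpoint_config(**endpoint_config)
    sagemaker.create_endpoint(**endpoint)
```

Keeping the request construction separate from the AWS calls makes it easy to inspect what would be sent before anything is created.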
Version release notes
Initial release of Document Parse Enhanced.
Document Parse Enhanced is an advanced mode of the existing Document Parse, significantly improving accuracy and result consistency for documents rich in visual elements such as tables, charts, diagrams, and checkboxes. It is designed to efficiently process large volumes of documents in enterprise environments—including finance, manufacturing, and the public sector—and to be readily integrated into real-world document-driven workflows.
Additional details
Inputs
- Summary
Provide input data as multipart form data.
To activate Enhanced Mode, set the mode parameter to "enhanced".
The mode parameter is a string value that determines the parsing mode. The following options are available:
• standard: Use for text-focused documents with simple tables.
• enhanced: Use for documents containing complex tables, images, charts, and other advanced visual elements.
• auto: Automatically classifies each page as either standard or enhanced and processes it accordingly.
Default: "standard"
Allowed values: "standard" | "enhanced" | "auto"
- Input MIME type
multipart/form-data
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| mode | String value that determines the parsing mode. standard: use for text-focused documents with simple tables. enhanced: use for documents containing complex tables, images, charts, and other advanced visual elements. auto: automatically classifies each page as either standard or enhanced and processes it accordingly. Default: "standard". Allowed values: "standard", "enhanced", "auto". | - | No |
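The request described above can be sketched with the Python standard library alone. This is a minimal illustration, not the vendor's client. The file field name `document`, the helper names, and the endpoint name are assumptions that do not appear in this listing; only the `mode` field and its allowed values come from the table.

```python
import uuid

ALLOWED_MODES = ("standard", "enhanced", "auto")


def build_multipart(file_bytes: bytes, filename: str, mode: str = "standard",
                    file_field: str = "document"):
    """Return (body, content_type) for a multipart/form-data request.

    Assumption: the file field is named "document"; check the vendor documentation.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="mode"',
        b"",
        mode.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def invoke(endpoint_name: str, body: bytes, content_type: str, region: str) -> bytes:
    """Send the body to a deployed endpoint; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so build_multipart works without it

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=body
    )
    return response["Body"].read()
```

A call such as `build_multipart(pdf_bytes, "report.pdf", mode="enhanced")` produces the body and the matching `Content-Type` header value, which can then be passed to `invoke` together with the name of a deployed endpoint.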
Support
Vendor support
Contact us for model, usage and enterprise integration inquiries.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
