Overview
This 30B parameter vision-language model delivers production-grade optical character recognition across diverse document types. Powered by a Mixture-of-Experts architecture that activates only 3B parameters per token, the model achieves strong OCR performance while maintaining computational efficiency. It extracts text from forms, invoices, receipts, medical records, legal documents, and complex structured layouts, achieving 88% accuracy on OCRBench evaluations.
With specialized training in form understanding, it achieves a Character Error Rate of 14.7 on the FUNSD benchmark, making it well suited to automated document processing pipelines.
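The listing does not say how the Character Error Rate was computed, nor whether 14.7 is a percentage. As a rough illustration only, CER is commonly defined as the character-level edit distance between the OCR output and the reference text, divided by the reference length. The sketch below assumes that definition; the function names are hypothetical and are not part of the model or its tooling.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, one dropped character in a seven-character reference gives a CER of 1/7, or about 14.3 when expressed as a percentage.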
The 32K context window enables processing of multi-page documents and batch operations in a single inference pass.
Optimized for high-throughput production environments, it processes thousands of documents efficiently while maintaining consistent accuracy across diverse document formats including tables, multi-column layouts, and mixed-content documents.
Production Advantages:
- Real-time inference suitable for automated workflows
- Consistent performance across diverse document types
- Optimized for integration with document management systems
- Balances accuracy and speed for enterprise-scale deployments
- Ideal for high-volume document processing pipelines
Highlights
- OCR Performance
  - Achieves 88% accuracy on OCRBench evaluations
  - Demonstrates a Character Error Rate of 14.7 on FUNSD form understanding
  - Handles 20+ languages with consistent accuracy
  - Robust text extraction from receipts, invoices, forms, and business documents
  - Excellent performance on complex layouts and structured documents
- Technical Specifications
  - 30B total parameters with 3B active per token (MoE architecture)
  - Maximum context length: 32K tokens
  - Image resolution: up to 8MP/4K (3840 × 2160)
  - Fast inference through efficient architecture design
  - Supports batch processing for high-volume workflows
- Document Understanding
  - Strong performance on charts and data visualizations
  - Excellent table extraction and structure preservation
  - Reliable text extraction from complex multi-column layouts
  - Handles documents with varying quality and orientations
  - Effective processing of mixed-content documents
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.12xlarge Inference (Batch), recommended | Model inference on the ml.g5.12xlarge instance type, batch mode | $9.98 |
| ml.g5.12xlarge Inference (Real-Time), recommended | Model inference on the ml.g5.12xlarge instance type, real-time mode | $9.98 |
Vendor refund policy
No refunds are possible.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
This 30B parameter vision-language model delivers production-grade optical character recognition across diverse document types. Powered by a Mixture-of-Experts architecture that activates only 3B parameters per token, the model achieves strong OCR performance while maintaining computational efficiency. It extracts text from forms, invoices, receipts, medical records, legal documents, and complex structured layouts, achieving 88% accuracy on OCRBench evaluations.
Additional details
Inputs
- Summary
1. Chat Completion
Example Payload:
{
  "model": "/opt/ml/model",
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What should I do if I have a fever and body aches?"}
  ],
  "max_tokens": 1024,
  "temperature": 0.6
}
For additional parameters, see ChatCompletionRequest and the OpenAI Chat API.
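A chat completion payload like the one above could be sent to a deployed real-time endpoint roughly as follows. This is a minimal sketch, not the vendor's client code: it assumes a boto3 `sagemaker-runtime` client is passed in, the endpoint name is a placeholder you would replace with your own, and the helper names `build_chat_request` and `invoke_chat` are hypothetical. The response is returned as parsed JSON without assuming its exact shape.

```python
import json


def build_chat_request(user_text, system_text=None, max_tokens=1024, temperature=0.6):
    """Build a chat completion payload in the format shown in this listing."""
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": "/opt/ml/model",  # fixed model location, per the listing
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def invoke_chat(runtime_client, endpoint_name, request):
    """Send the payload as application/json and return the parsed JSON response.

    runtime_client is expected to be boto3.client("sagemaker-runtime");
    endpoint_name is whatever name you gave your own endpoint.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(request),
    )
    return json.loads(response["Body"].read())
```

Passing the client in as an argument keeps the helpers free of AWS credentials and makes them easy to exercise with a stub before deploying anything.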
2. Text Completion
Single Prompt Example:
{
  "model": "/opt/ml/model",
  "prompt": "How can I maintain good kidney health?",
  "max_tokens": 512,
  "temperature": 0.6
}
Multiple Prompts Example:
{
  "model": "/opt/ml/model",
  "prompt": [
    "How can I maintain good kidney health?",
    "What are the best practices for kidney care?"
  ],
  "max_tokens": 512,
  "temperature": 0.6
}
Reference: CompletionRequest and the OpenAI Completions API.
3. Image + Text Inference
The model supports both online (direct URL) and offline (base64-encoded) image inputs.
Online Image Example:
{
  "model": "/opt/ml/model",
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What does this medical image show?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.1
}
Offline Image Example (Base64):
{
  "model": "/opt/ml/model",
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What does this medical image show?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.1
}
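For the offline case, the `data:image/jpeg;base64,...` value has to be produced from a local file. The sketch below shows one way to do that with the Python standard library and to wrap the result in the message structure shown above. The helper names are hypothetical, and the OCR-style question in the usage note is only an illustrative prompt, not one taken from the listing.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_request(image_url, question, max_tokens=2048, temperature=0.1):
    """Build an image + text payload; image_url may be an https URL or a data URL."""
    return {
        "model": "/opt/ml/model",  # fixed model location, per the listing
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
```

A call such as `build_image_request(image_to_data_url("invoice.jpg"), "Extract all text from this document.")` would then yield a payload ready to serialize as JSON. Base64 inflates the file size by roughly a third, so large scans may be better served by URL where that is an option.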
Important Notes:
- Streaming Responses: Add "stream": true to your request payload to enable streaming
- Model Path Requirement: Always set "model": "/opt/ml/model" (SageMaker's fixed model location)
- Input MIME type: application/json
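The listing says that `"stream": true` enables streaming but does not describe the wire format. The sketch below assumes the stream follows the OpenAI-style server-sent events convention, that is, `data: {json}` lines ending with `data: [DONE]`, with incremental text under `choices[].delta.content`. That format is an assumption to verify against your endpoint, and the function names are hypothetical. Chunks are buffered because a line may be split across two parts of the stream.

```python
import json


def iter_sse_payloads(byte_chunks):
    """Yield parsed JSON objects from 'data: ...' lines, buffering partial lines."""
    buffer = b""
    for chunk in byte_chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if not text.startswith("data:"):
                continue  # skip blank separator lines and other fields
            data = text[len("data:"):].strip()
            if data == "[DONE]":  # assumed end-of-stream sentinel
                return
            yield json.loads(data)


def collect_text(byte_chunks):
    """Concatenate the incremental text pieces from an assumed OpenAI-style stream."""
    parts = []
    for event in iter_sse_payloads(byte_chunks):
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

With boto3, the byte chunks would typically come from `invoke_endpoint_with_response_stream`, whose response `Body` yields events carrying the raw bytes under `PayloadPart` and then `Bytes`.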
Support
Vendor support
For any assistance, please reach out to support@johnsnowlabs.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
