Automate document processing and extract accurate insights
Companies spend significant time and effort preprocessing documents, manually or digitally, to make them usable in their applications. Because documents come in different formats, types, and layouts, this process is time-consuming, error-prone, and costly. Teams might not have the machine learning (ML) expertise to automate intelligent document processing, but they want a simple, efficient solution that scales with business requirements and provides accurate results.
Amazon Comprehend helps you automate document processing with no prior ML experience required. Use its classification and extraction capabilities to rapidly process a variety of document types and accurately extract insights that inform your business decisions. Use its capabilities for detecting and protecting sensitive data to help meet compliance requirements.
Benefits
Faster time to insights
Quickly and accurately process and extract insights from a variety of document types.
Build to your requirements
Create custom models specific to your domain, industry, or business requirements, with classes and entities you define.
Support all skill levels
Automate your document processing pipeline and manage models at scale, with no ML experience required.
Protect privacy
Discover and protect personally identifiable information in your documents to help meet privacy and compliance standards.
Features
Document types support
Use a single API to process both plain-text and semi-structured documents, whether they are digital or scanned. Both on-demand and batch processing are supported for document types such as PDF, Word (DOCX), JPEG, TIFF, PNG, and UTF-8 plain text.
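As a rough illustration of the single-API idea, the sketch below sends a file to a custom classifier endpoint through the `ClassifyDocument` operation, passing plain text as `Text` and PDF, Word, or image files as raw `Bytes`. It assumes a boto3 Comprehend client (`boto3.client("comprehend")`) and an already deployed classifier endpoint. `classify_file` and `READER_CONFIG` are hypothetical names, and the `DocumentReaderConfig` values reflect our reading of the API reference, so verify them before relying on this.

```python
from pathlib import Path
from typing import Any

# Assumed reader settings: ask the service to extract text from PDFs and
# images with its default Textract-backed action.
READER_CONFIG = {
    "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
    "DocumentReadMode": "SERVICE_DEFAULT",
}


def classify_file(client: Any, endpoint_arn: str, path: str) -> list[tuple[str, float]]:
    """Classify one file with a custom classifier endpoint (hypothetical helper).

    `client` is assumed to be a boto3 Comprehend client. Plain-text files are
    sent as Text; every other file is sent as raw Bytes plus a reader config.
    Returns (class name, score) pairs, highest score first.
    """
    file_path = Path(path)
    if file_path.suffix.lower() == ".txt":
        response = client.classify_document(
            EndpointArn=endpoint_arn,
            Text=file_path.read_text(encoding="utf-8"),
        )
    else:
        response = client.classify_document(
            EndpointArn=endpoint_arn,
            Bytes=file_path.read_bytes(),
            DocumentReaderConfig=READER_CONFIG,
        )
    # Multi-class classifiers report their predictions under "Classes".
    classes = response.get("Classes", [])
    return sorted(
        ((c["Name"], c["Score"]) for c in classes),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.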
Model customization
Build custom models that accurately identify your domain-specific document categories (such as W-2 forms and auto and home insurance claims) and terminology (such as names, acronyms, product codes, and order types) for your use case or industry. The ML models and endpoints are dedicated to you: they are built with your data, and only you can create and access them.
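Custom models are trained from labeled data you supply. The sketch below renders two such inputs as CSV: a multi-class classifier training file (one document per row, label first, no header) and an entity list for a custom entity recognizer (a `Text,Type` header followed by one term per row). These layouts reflect our understanding of the Amazon Comprehend training-data formats and should be checked against the current documentation; the helper names and sample labels are hypothetical.

```python
import csv
import io


def classifier_training_csv(examples: list[tuple[str, str]]) -> str:
    """Render (label, document_text) pairs as multi-class training CSV.

    Assumed layout: label in the first column, document text in the second,
    one document per row, no header row.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label, text in examples:
        # Collapse runs of whitespace, including newlines, so each
        # document stays on a single row.
        writer.writerow([label, " ".join(text.split())])
    return buf.getvalue()


def entity_list_csv(entries: list[tuple[str, str]]) -> str:
    """Render (term, entity_type) pairs as an entity list with a Text,Type header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    writer.writerows(entries)
    return buf.getvalue()
```

The resulting files would then be uploaded to Amazon S3 and referenced when you create the classifier or entity recognizer.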
Accuracy control
Improve document processing outcomes with a combination of optical character recognition (OCR) and natural language processing (NLP). Use additional datasets at training time to increase the accuracy of classification and entity recognition.
A single step to insights and model management
Quickly train, deploy, retrain, and manage your models with out-of-the-box model management capabilities. Access insights from your models with single-step inference.
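The train-then-deploy flow can be sketched with two boto3 calls: `describe_document_classifier` to poll training status and `create_endpoint` to stand up a real-time endpoint for single-step inference. This is a minimal sketch that assumes a boto3 Comprehend client and a classifier whose training has already been started; the helper names are hypothetical, and the set of terminal status values is an assumption to check against the API reference.

```python
import time
from typing import Any, Callable

# Assumed statuses after which training will not change again.
TERMINAL_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING", "IN_ERROR", "STOPPED"}
# Assumed statuses in which the model can be deployed.
USABLE_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING"}


def wait_for_classifier(
    client: Any,
    classifier_arn: str,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a document classifier until training reaches a terminal status."""
    while True:
        response = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )
        status = response["DocumentClassifierProperties"]["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)


def deploy_classifier(
    client: Any,
    classifier_arn: str,
    endpoint_name: str,
    inference_units: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a real-time endpoint once the classifier is usable; return its ARN."""
    status = wait_for_classifier(client, classifier_arn, sleep=sleep)
    if status not in USABLE_STATUSES:
        raise RuntimeError(f"classifier {classifier_arn} ended in status {status}")
    response = client.create_endpoint(
        EndpointName=endpoint_name,
        ModelArn=classifier_arn,
        DesiredInferenceUnits=inference_units,
    )
    return response["EndpointArn"]
```

The injectable `sleep` parameter lets the polling loop be exercised without real waits.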
Multi-language support
Expand the reach of your use case by processing documents across the multiple languages that Amazon Comprehend supports, reducing the need for translations.
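One way to process documents in several languages without translating them first is to detect the dominant language and pass it to the analysis call. The sketch assumes a boto3 Comprehend client and uses the built-in `detect_dominant_language` and `detect_entities` operations. `SUPPORTED_ENTITY_LANGUAGES` is an assumed snapshot of the languages that built-in entity detection accepts, and the helper name is hypothetical.

```python
from typing import Any

# Assumed snapshot; check the Amazon Comprehend documentation for the
# languages that built-in entity detection currently supports.
SUPPORTED_ENTITY_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "ar", "hi", "ja", "ko", "zh", "zh-TW",
}


def entities_in_detected_language(client: Any, text: str) -> tuple[str, list[tuple[str, str]]]:
    """Detect the dominant language, then extract entities in that language."""
    languages = client.detect_dominant_language(Text=text)["Languages"]
    # Pick the language code with the highest confidence score.
    best = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]
    if best not in SUPPORTED_ENTITY_LANGUAGES:
        raise ValueError(f"entity detection not assumed to support {best!r}")
    entities = client.detect_entities(Text=text, LanguageCode=best)["Entities"]
    return best, [(e["Text"], e["Type"]) for e in entities]
```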
Personally identifiable information (PII) detection
Use the Amazon Comprehend PII redaction capabilities to help automate the discovery and redaction of PII in your documents at scale.
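The sketch below calls `detect_pii_entities` for a single document and masks each returned span locally with its entity type. Client-side masking is swapped in here for illustration; Amazon Comprehend also offers redaction through asynchronous PII detection jobs. The sketch assumes a boto3 Comprehend client, non-overlapping spans, and a hypothetical default score threshold of 0.5; the helper names are hypothetical.

```python
from typing import Any


def redact_pii(text: str, entities: list[dict], min_score: float = 0.5) -> str:
    """Replace each detected PII span scoring at least min_score with [TYPE].

    Offsets are treated as Python string indices, since Comprehend reports
    them as character (code point) offsets. Spans are assumed not to overlap.
    """
    redacted = text
    # Work right to left so replacing one span does not shift the offsets
    # of spans that come before it.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] < min_score:
            continue
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:begin] + f"[{entity['Type']}]" + redacted[end:]
    return redacted


def detect_and_redact(client: Any, text: str, language_code: str = "en") -> str:
    """Call DetectPiiEntities through a boto3 Comprehend client and redact locally."""
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])
```

For example, with `NAME` and `PHONE` spans covering "Jane" and "555-0100", the text "Call Jane at 555-0100." becomes "Call [NAME] at [PHONE]."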
Use cases
Insurance claim forms
Classify medical bills and claim forms and extract critical information, such as policy details and medical codes, to provide accurate insights for claims processing.
Mortgage applications
Extract entities from income statements, identity verification documents, and other loan application documents for credit evaluation and underwriting.
Legal contracts
Automate legal contract processing, classify and triage high-risk documents, and extract insights such as case numbers, trademarks, and clauses to inform negotiations.
Tax documents
Classify and extract insights from bills, contracts, W-2 forms, bank statements, and invoices for tax provisioning and filing.
Resources
Documentation
Get started with Amazon Comprehend for intelligent document processing.