Posted On: Apr 19, 2023

Amazon Comprehend announced that its Document Classification APIs now use the layout of a document, in addition to its text, to provide higher accuracy.

Amazon Comprehend is a Natural Language Processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. At re:Invent 2022, Comprehend simplified document classification by adding support for inference on common document types. At that time, customers could not train custom document classification models on PDF, Word, or image files using layout data. Now, using the same document classification APIs, customers can train custom classification models on PDF documents, Microsoft Word files, and images, so that the models use layout information and classify more accurately. This higher accuracy benefits scenarios such as insurance claims and mortgage document classification. Customers can use the new capability for both asynchronous processing and real-time use cases.
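As a rough illustration of the training step described above, the sketch below builds a request for the `CreateDocumentClassifier` API through boto3. It is a minimal sketch, not an official sample: the helper names `build_classifier_request` and `submit`, the default region, and the specific reader settings are assumptions made for illustration, and the parameter names should be checked against the current Comprehend API reference before use.

```python
def build_classifier_request(name, role_arn, annotations_s3_uri, documents_s3_uri):
    """Build keyword arguments for a CreateDocumentClassifier call.

    Hypothetical helper: the argument names are illustrative, and the
    values below are assumptions about a typical native-document setup.
    """
    return {
        "DocumentClassifierName": name,
        "DataAccessRoleArn": role_arn,
        # The announcement states that layout support covers English documents.
        "LanguageCode": "en",
        "Mode": "MULTI_CLASS",
        "InputDataConfig": {
            # Assumed setting for training on PDF, Word, and image files.
            "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
            # CSV that maps each training document to its class label.
            "S3Uri": annotations_s3_uri,
            # S3 prefix that holds the training documents themselves.
            "Documents": {"S3Uri": documents_s3_uri},
            # Assumed text-extraction settings for scanned or image input.
            "DocumentReaderConfig": {
                "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
                "DocumentReadMode": "SERVICE_DEFAULT",
            },
        },
    }


def submit(request, region="us-east-1"):
    """Send the request to Comprehend and return the new classifier's ARN."""
    import boto3  # imported here so the request builder runs without AWS access

    client = boto3.client("comprehend", region_name=region)
    return client.create_document_classifier(**request)["DocumentClassifierArn"]
```

Keeping the request builder separate from the network call makes it possible to inspect or log the request before submitting it. Calling `submit` requires AWS credentials and an IAM role that Comprehend can assume to read the training data.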

Customers can process English-language documents with layout information support. These capabilities are available in all AWS Regions where Amazon Comprehend is available.

To learn more and get started, visit the Amazon Comprehend Intelligent Document Processing page, the AWS Blog, and our documentation.