Posted On: Mar 30, 2022
Amazon Textract is a machine learning service that makes it easy to extract text and data from virtually any document. We continuously improve the underlying machine learning models based on customer feedback to provide even better accuracy. Today, we are pleased to announce several quality enhancements to both our Tables and checkbox detection features.
The latest Tables model supports detecting merged cells and identifying column headers. Specifically, for a document processed with the AnalyzeDocument Tables feature, you can now detect merged cells through the "Type": "MERGED_CELL" identifier and identify the cells that make up the column header through the "EntityTypes": ["COLUMN_HEADER"] identifier. In addition, starting today, Textract more accurately detects outer table boundaries, row and column boundaries, and table content. Customers can now expect higher accuracy with less postprocessing when extracting tables from a wide variety of document types, including those found in lending, insurance, financial services, legal, healthcare, energy, and the public sector.
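As a rough illustration of how the two identifiers above might be read from a response, the following Python sketch walks a list of blocks and collects merged cells and column-header cells. It makes several assumptions that are not stated in this announcement: that the response carries a list of block dictionaries with "Id", "BlockType", and "Relationships" keys, that "Type": "MERGED_CELL" appears as a relationship type on the TABLE block, and that "EntityTypes" sits on individual CELL blocks. The function name and the sample data are hypothetical and hand-written, not real service output.

```python
def summarize_table_cells(blocks):
    """Collect IDs of merged cells and column-header cells.

    Assumes each block is a dict shaped like the hand-written sample below;
    check the developer guide for the actual response structure.
    """
    merged_ids = []
    header_ids = []
    for block in blocks:
        if block.get("BlockType") == "TABLE":
            # Assumption: merged cells are linked from the table through a
            # relationship whose "Type" is "MERGED_CELL".
            for rel in block.get("Relationships") or []:
                if rel.get("Type") == "MERGED_CELL":
                    merged_ids.extend(rel.get("Ids", []))
        elif block.get("BlockType") == "CELL":
            # Assumption: header cells carry "EntityTypes": ["COLUMN_HEADER"].
            if "COLUMN_HEADER" in (block.get("EntityTypes") or []):
                header_ids.append(block["Id"])
    return {"merged_cells": merged_ids, "column_headers": header_ids}


# Hypothetical, hand-written sample standing in for a real response.
sample_blocks = [
    {
        "Id": "t1",
        "BlockType": "TABLE",
        "Relationships": [
            {"Type": "CHILD", "Ids": ["c1", "c2", "c3"]},
            {"Type": "MERGED_CELL", "Ids": ["m1"]},
        ],
    },
    {"Id": "c1", "BlockType": "CELL", "EntityTypes": ["COLUMN_HEADER"]},
    {"Id": "c2", "BlockType": "CELL", "EntityTypes": ["COLUMN_HEADER"]},
    {"Id": "c3", "BlockType": "CELL"},
    {"Id": "m1", "BlockType": "MERGED_CELL"},
]

summary = summarize_table_cells(sample_blocks)
print(summary)
```

In practice the block list would come from an AnalyzeDocument call with the Tables feature enabled rather than from a literal; the sketch uses inline data only so that it runs without credentials.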
Finally, we have improved the checkbox detection capabilities within the Forms model. With this improvement, you can now use Amazon Textract to more accurately detect whether handwritten checkboxes within form fields are selected or not selected.
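To show how selected and not-selected checkboxes might be tallied after a Forms analysis, here is a minimal Python sketch. It assumes, beyond what this announcement states, that checkboxes are returned as blocks with "BlockType": "SELECTION_ELEMENT" and a "SelectionStatus" of "SELECTED" or "NOT_SELECTED". The function name and the sample blocks are hypothetical.

```python
from collections import Counter


def count_checkbox_states(blocks):
    """Tally checkbox blocks by selection status.

    Assumes checkbox blocks look like the hand-written sample below.
    """
    return Counter(
        block.get("SelectionStatus", "UNKNOWN")
        for block in blocks
        if block.get("BlockType") == "SELECTION_ELEMENT"
    )


# Hypothetical, hand-written sample standing in for a real response.
sample_form_blocks = [
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    {"Id": "s2", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
    {"Id": "s3", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    {"Id": "w1", "BlockType": "WORD", "Text": "Yes"},
]

checkbox_counts = count_checkbox_states(sample_form_blocks)
print(dict(checkbox_counts))
```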
To get started, log on to the Amazon Textract console to try out the latest Tables and checkbox detection features. To learn more about Textract capabilities, please visit the Amazon Textract website, developer guide, or resources page.