Amazon Textract is now PCI DSS certified and extracts even more data from tables and forms

Posted on: Dec 18, 2019

Amazon Textract is a machine learning service that makes it quick and easy to extract text and structured data, such as tables and forms, from documents using our DetectDocumentText or AnalyzeDocument APIs, without requiring any custom configuration or templates. One advantage of a managed service like Amazon Textract is that customers benefit from continuous improvement over time. Today, we are pleased to announce that Amazon Textract is now PCI DSS certified. This means that you can now use Amazon Textract for workloads that are subject to the Payment Card Industry Data Security Standard (PCI DSS), such as those that handle cardholder data (CHD) or sensitive authentication data (SAD). Also starting today, AWS has launched a set of quality enhancements that make Amazon Textract more accurate for its tables and forms features.

First, our tables model now works better with complex table structures that contain split cells and merged cells, which make it difficult to align cell values to the correct column header or row header. Next, Amazon Textract is now better at identifying rows and columns for cells with wrapped text (text that spans multiple lines), even in tables without explicit boundaries. It more accurately distinguishes a cell whose content spans multiple lines from a new row that has no explicit boundary. Finally, we have improved the forms model to give more accurate results for key-value pair identification. These benefits apply to many types of documents, but are especially pronounced for documents where tables and key-value pairs are present on the same page. Amazon Textract now correctly identifies key-value pairs embedded within a table.
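To make the tables and forms output described above more concrete, here is a minimal Python sketch of how key-value pairs and table cells might be read from an AnalyzeDocument response. It assumes the block fields documented for Textract (`BlockType`, `EntityTypes`, `Relationships`, `RowIndex`, `ColumnIndex`). The helper names and the sample blocks are hypothetical and hand-written for illustration; they are not real service output, and merged cells (`RowSpan`, `ColumnSpan`) are not handled.

```python
from collections import defaultdict


def _text(block, by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for b in blocks:
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            value = ""
            for rel in b.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    value = " ".join(_text(by_id[i], by_id) for i in rel["Ids"])
            pairs[_text(b, by_id)] = value
    return pairs


def table_rows(blocks):
    """Return each TABLE as a list of rows, ordered by RowIndex and ColumnIndex."""
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for b in blocks:
        if b["BlockType"] != "TABLE":
            continue
        grid = defaultdict(dict)
        for rel in b.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for cell_id in rel["Ids"]:
                    cell = by_id[cell_id]
                    if cell["BlockType"] == "CELL":
                        grid[cell["RowIndex"]][cell["ColumnIndex"]] = _text(cell, by_id)
        tables.append([[grid[r][c] for c in sorted(grid[r])] for r in sorted(grid)])
    return tables


def analyze_file(path):
    """Call AnalyzeDocument on a local image; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the parsing helpers work without it

    client = boto3.client("textract")
    with open(path, "rb") as f:
        response = client.analyze_document(
            Document={"Bytes": f.read()}, FeatureTypes=["TABLES", "FORMS"]
        )
    return response["Blocks"]


def _word(i, text):
    return {"Id": i, "BlockType": "WORD", "Text": text}


def _cell(i, row, col, word_ids):
    return {"Id": i, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


# Hypothetical blocks: one key-value pair and a 2x2 table with a wrapped header cell.
SAMPLE_BLOCKS = [
    _word("w1", "Name:"), _word("w2", "Jane"), _word("w3", "Doe"),
    _word("w4", "Item"), _word("w5", "Unit"), _word("w6", "price"),
    _word("w7", "Pen"), _word("w8", "1.50"),
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, ["w4"]), _cell("c2", 1, 2, ["w5", "w6"]),
    _cell("c3", 2, 1, ["w7"]), _cell("c4", 2, 2, ["w8"]),
]

if __name__ == "__main__":
    print(key_value_pairs(SAMPLE_BLOCKS))
    print(table_rows(SAMPLE_BLOCKS))
```

With a real document, the same helpers could be applied to the list returned by `analyze_file`. Because the wrapped header text arrives as several WORD children of one CELL, joining the children keeps it in a single cell rather than splitting it across rows.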

You can learn more about these updates here