AWS Machine Learning Blog

Category: Amazon Textract

Bring structure to diverse documents with Amazon Textract and transformer-based models on Amazon SageMaker

From application forms to identity documents, recent utility bills, and bank statements, many business processes today still rely on exchanging and analyzing human-readable documents, particularly in industries like financial services and law. In this post, we show how you can use Amazon SageMaker, an end-to-end platform for machine learning (ML), to automate especially challenging document analysis […]

Read More

AWS is redefining how companies process documents in a digital world

Think about the last time you opened a bank account, applied for insurance, or refinanced your home. It was probably done on paper. A mortgage packet alone can run to more than 100 pages. What do you do with all that paper? For many companies across a variety of industries, including financial […]

Read More

Announcing specialized support for extracting data from invoices and receipts using Amazon Textract

Receipts and invoices are documents that are critical to small and medium businesses (SMBs), startups, and enterprises for managing their accounts payable processes. These types of documents are difficult to process at scale because they follow no set design rules, yet any individual customer encounters thousands of distinct types of these documents. In this post, […]

Read More

TC Energy builds an intelligent document processing workflow to process over 20 million images with Amazon AI

This is a guest post authored by Paul Ngo, US Gas Technical and Operational Services Data Team Lead at TC Energy. TC Energy operates a network of pipelines, including 57,900 miles of natural gas and 3,000 miles of oil and liquid pipelines, throughout North America. TC Energy enables a stable network of natural gas and […]

Read More

Improve newspaper digitalization efficacy with a generic document segmentation tool using Amazon Textract

We are living in a digital age. Information that used to be spread by printouts is disseminated at unforeseen speeds through digital formats. In parallel to the inventions of new types of media, an increasing number of archives and libraries are trying to create digital repositories with new technologies. Digitization allows for preservation by creating […]

Read More

Segment paragraphs and detect insights with Amazon Textract and Amazon Comprehend

Many companies extract data from scanned documents containing tables and forms, such as PDFs. Some examples are audit documents, tax documents, whitepapers, or customer review documents. For customer reviews, you might be extracting text such as product reviews, movie reviews, or feedback. Further understanding of the individual and overall sentiment of the user base from […]

Read More

Intelligent governance of document processing pipelines for regulated industries

Processing large documents like PDFs and static images is a cornerstone of today’s highly regulated industries. From healthcare information like doctor-patient visits and bills of health, to financial documents like loan applications, tax filings, research reports, and regulatory filings, these documents are integral to how these industries conduct business. The mechanisms by which these documents […]

Read More

PDF document pre-processing with Amazon Textract: Visuals detection and removal

Amazon Textract is a fully managed machine learning (ML) service that automatically extracts printed text, handwriting, and other data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Amazon Textract can detect text in a variety of documents, including financial reports, medical records, […]

Read More

Process documents containing handwritten tabular content using Amazon Textract and Amazon A2I

Even in this digital age where more and more companies are moving to the cloud and using machine learning (ML) or technology to improve business processes, we still see a vast number of companies reach out and ask about processing documents, especially documents with handwriting. We see employment forms, time cards, and financial applications with […]

Read More

This month in AWS Machine Learning: January edition

Hello and welcome to our first “This month in AWS Machine Learning” of 2021! Every day there is something new going on in the world of AWS Machine Learning, from launches to new use cases to interactive trainings. We’re packaging some of the not-to-miss information from the ML Blog and beyond for easy perusing each […]

Read More