AWS Machine Learning Blog

Category: Amazon Textract


How United Airlines built a cost-efficient Optical Character Recognition active learning pipeline

In this post, we discuss how United Airlines, in collaboration with the Amazon Machine Learning Solutions Lab, built an active learning framework on AWS to automate the processing of passenger documents. “In order to deliver the best flying experience for our passengers and make our internal business process as efficient as possible, we have developed […]

Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets

Multi-modal data is a valuable component of the financial industry, encompassing market, economic, customer, news and social media, and risk data. Financial organizations generate, collect, and use this data to gain insights into financial operations, make better decisions, and improve performance. However, there are challenges associated with multi-modal data due to the complexity and lack […]

Implement smart document search index with Amazon Textract and Amazon OpenSearch

In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents. Whether you’re in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.


Enhancing AWS intelligent document processing with generative AI

Data classification, extraction, and analysis can be challenging for organizations that deal with large volumes of documents. Traditional document processing solutions are manual, expensive, error prone, and difficult to scale. AWS intelligent document processing (IDP), with AI services such as Amazon Textract, allows you to take advantage of industry-leading machine learning (ML) technology to quickly and […]

Retain original PDF formatting to view translated documents with Amazon Textract, Amazon Translate, and PDFBox

Companies across various industries create, scan, and store large volumes of PDF documents. In many cases, the content is text-heavy and is written in a language other than the one its readers need, so it must be translated. To address this, you need an automated solution to extract the contents within these PDFs and translate them quickly and cost-efficiently. Many businesses have diverse […]

Announcing enhanced table extractions with Amazon Textract

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract has a Tables feature within the AnalyzeDocument API that offers the ability to automatically extract tabular structures from any document. In this post, we discuss the improvements made to the Tables feature and […]
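The Tables feature returns tables as a flat list of linked blocks rather than as ready-made grids. Below is a minimal sketch, assuming the documented AnalyzeDocument response shape: `TABLE` blocks point to `CELL` blocks through `CHILD` relationships, each cell carries 1-based `RowIndex` and `ColumnIndex`, and cells point to `WORD` blocks. It shows one way to rebuild each table as a grid of strings. The helper names `extract_tables` and `_child_ids` are hypothetical. The sketch also ignores merged cells, table titles and footers, and selection elements.

```python
from typing import Any


def _child_ids(block: dict[str, Any]) -> list[str]:
    """Collect the Ids listed under a block's CHILD relationships."""
    ids: list[str] = []
    for rel in block.get("Relationships", []):
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids", []))
    return ids


def extract_tables(response: dict[str, Any]) -> list[list[list[str]]]:
    """Rebuild each TABLE block as a row-major grid of cell text.

    Hypothetical helper: a spanning cell's text is placed only in its
    top-left position; MERGED_CELL, TABLE_TITLE, and TABLE_FOOTER are ignored.
    """
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    tables: list[list[list[str]]] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "TABLE":
            continue
        # Keep only CELL children of the table.
        cells = [blocks[i] for i in _child_ids(block)
                 if blocks.get(i, {}).get("BlockType") == "CELL"]
        if not cells:
            tables.append([])
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            # Join the cell's WORD children into one string.
            words = [blocks[w]["Text"] for w in _child_ids(cell)
                     if blocks.get(w, {}).get("BlockType") == "WORD"]
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables
```

To use it, pass in the dictionary returned by boto3's Textract client. For example, pass the result of `analyze_document(Document={"Bytes": data}, FeatureTypes=["TABLES"])`. Doing the grouping offline like this keeps the parsing step testable without calling the service.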

Dialogue-guided intelligent document processing with foundation models on Amazon SageMaker JumpStart

Intelligent document processing (IDP) is a technology that automates the processing of high volumes of unstructured data, including text, images, and videos. IDP offers a significant improvement over manual methods and legacy optical character recognition (OCR) systems by addressing challenges such as cost, errors, low accuracy, and limited scalability, ultimately leading to better outcomes for […]

Introducing Amazon Textract Bulk Document Uploader for enhanced evaluation and analysis

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. To make it simpler to evaluate the capabilities of Amazon Textract, we have launched a new Bulk Document Uploader feature on the Amazon Textract console that enables you to quickly process your own set of […]

Transform, analyze, and discover insights from unstructured healthcare data using Amazon HealthLake

Healthcare data is complex and siloed, and exists in various formats. An estimated 80% of data within organizations is considered to be unstructured or “dark” data that is locked inside text, emails, PDFs, and scanned documents. This data is difficult to interpret or analyze programmatically and limits how organizations can derive insights from it and […]

Build end-to-end document processing pipelines with Amazon Textract IDP CDK Constructs

September 2023: This post was reviewed and updated. Intelligent document processing (IDP) with AWS helps automate information extraction from documents of different types and formats, quickly and with high accuracy, without the need for machine learning (ML) skills. Faster information extraction with high accuracy can help you make quality business decisions on time, while reducing […]