AWS Architecture Blog

Category: Amazon Comprehend

AI-based intelligent document processing engine

Optimizing data with automated intelligent document processing solutions

Many organizations struggle to effectively manage and derive insights from the large amount of unstructured data locked in emails, PDFs, images, scanned documents, and more. The variety of formats, document layouts, and text makes it difficult for any standard Optical Character Recognition (OCR) to extract key insights from these data sources. To help organizations overcome […]

Figure 1. Automated form data extraction architecture

Automate your Data Extraction for Oil Well Data with Amazon Textract

Traditionally, many businesses archive physical formats of their business documents. These can be invoices, sales memos, purchase orders, vendor-related documents, and inventory documents. As more and more businesses are moving towards digitizing their business processes, it is becoming challenging to effectively manage these documents and perform business analytics on them. For example, in the Oil […]

Figure 2. Extending the solution

Scale Up Language Detection with Amazon Comprehend and S3 Batch Operations

Organizations have been collecting text data for years. Text data can help you intelligently address a range of challenges, from customer experience to analytics. These mixed language, unstructured datasets can contain a wealth of information within business documents, emails, and webpages. If you’re able to process and interpret it, this information can provide insight that […]

Top 5

Top 5: Featured Architecture Content for September

The AWS Architecture Center provides new and notable reference architecture diagrams, vetted architecture solutions, AWS Well-Architected best practices, whitepapers, and more. This blog post features some of our best picks from the new and newly updated content we released in the past month. 1. AWS Best Practices for DDoS Resiliency Prioritizing the availability and responsiveness […]

Field Notes: How to Prepare Large Text Files for Processing with Amazon Translate and Amazon Comprehend

Biopharmaceutical manufacturing is a highly regulated industry where deviation documents are used to optimize manufacturing processes. Deviation documents in biopharmaceutical manufacturing processes are geographically diverse, spanning multiple countries and languages. The document corpus is complex, with additional requirements for complete encryption. Therefore, to reduce downtime and increase process efficiency, it is critical to automate the […]

Figure 1. Architecture of document processing workflow

Automate Document Processing in Logistics using AI

Multi-modal transportation is one of the biggest developments in the logistics industry. There has been a successful collaboration across different transportation partners in supply chain freight forwarding for many decades. But there’s still a considerable overhead of paperwork processing for each leg of the trip. Tens of billions of documents are processed in ocean freight […]

High-level design for an AWS lake house implementation

Benefits of Modernizing On-premises Analytics with an AWS Lake House

Organizational analytics systems have shifted from running in the background of IT systems to being critical to an organization’s health. Analytics systems help businesses make better decisions, but they tend to be complex and are often not agile enough to scale quickly. To help with this, customers upgrade their traditional on-premises online analytic processing (OLAP) […]

Architecture showing how to build a Scalable Real-Time Newsfeed Watchlist Using Amazon Comprehend

Field Notes: Building a Scalable Real-Time Newsfeed Watchlist Using Amazon Comprehend

One of the challenges businesses have is to constantly monitor information via media outlets and be alerted when a key interest is picked up, such as individual, product, or company information. One way to do this is to scan media and news feeds against a company watchlist. The list may contain personal names, organizations or […]

How to redact confidential information in your ML pipeline

Integrating Redaction of FinServ Data into a Machine Learning Pipeline

Financial companies process hundreds of thousands of documents every day. These include loan and mortgage statements that contain large amounts of confidential customer information. Data privacy requires that sensitive data be redacted to protect the customer and the institution. Redacting digital and physical documents is time-consuming and labor-intensive. The accidental or inadvertent release of personal information […]