Guidance for Enhanced Document Search Using Content and Metadata Enrichment on AWS
Overview
How it works
Enhance search experiences by adding metadata to your document with custom document enrichment using Amazon Kendra.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon Kendra uses Amazon CloudWatch Logs to provide insight into the operation of data sources. It logs process details for documents as they are indexed, along with errors from the data source that occur during indexing. You can use CloudWatch Logs to monitor, store, and access the log files. With minimal user intervention, CloudWatch can apply anomaly detection to continuously analyze system and application metrics, determine normal baselines, and surface anomalies. The AWS CloudFormation template can be easily modified and extended to integrate changes.
Security
The CloudFormation infrastructure as code (IaC) automation deploys resources to the AWS Cloud securely. This reduces the risk of human error related to manual configuration or management.
Lambda functions are configured through AWS Identity and Access Management (IAM) with least-privilege access, limiting access to just the required Amazon S3 data buckets.
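As an illustration of the least-privilege idea above, here is a minimal sketch of an IAM policy document that grants a Lambda role access to named S3 buckets only. The function name `build_bucket_policy`, the bucket name, and the choice of actions are assumptions made for this example; the Guidance's own CloudFormation template may scope permissions differently.

```python
import json


def build_bucket_policy(bucket_names, allow_write=False):
    """Build an IAM policy document scoped to the named S3 buckets only.

    Hypothetical helper for illustration; not taken from the Guidance's template.
    """
    object_actions = ["s3:GetObject"]
    if allow_write:
        object_actions.append("s3:PutObject")

    statements = []
    for name in bucket_names:
        # Object-level actions apply to keys inside the bucket.
        statements.append({
            "Effect": "Allow",
            "Action": list(object_actions),
            "Resource": f"arn:aws:s3:::{name}/*",
        })
        # Listing applies to the bucket itself, not to its objects.
        statements.append({
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{name}",
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # "example-documents-bucket" is a placeholder bucket name.
    policy = build_bucket_policy(["example-documents-bucket"], allow_write=True)
    print(json.dumps(policy, indent=2))
```

Because every `Resource` names a specific bucket, the role cannot read or write any other bucket in the account.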
Reliability
Amazon Kendra Enterprise Edition is highly available by default within a Region and can handle Availability Zone failures. Lambda runs in multiple Availability Zones so that it remains available to process events if a single zone has a service interruption.
For pre-extraction, Lambda is configured to run for a maximum of 5 minutes, so text extraction from each audio or video file must complete within 5 minutes. For post-extraction, Lambda is configured to run for a maximum of 1 minute, so Amazon Comprehend must detect entities from the text within that time.
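The time limits described above mean that a function waiting on a text extraction job has to watch its own remaining time. The following is a rough sketch of one way to do that, using the `get_remaining_time_in_millis()` method that the Lambda Python runtime exposes on the context object. The names `poll_until_done`, `get_status`, and `TimeBudgetExceeded`, as well as the margin and interval values, are hypothetical and are not taken from the Guidance's code.

```python
import time


class TimeBudgetExceeded(Exception):
    """Raised when the job is still running and the Lambda timeout is near."""


def poll_until_done(get_status, context, safety_margin_ms=10_000,
                    poll_interval_s=5.0, sleep=time.sleep):
    """Poll a job until it finishes or the function is about to time out.

    get_status: callable returning a status string such as "IN_PROGRESS",
                "COMPLETED", or "FAILED" (for example, from a transcription job).
    context:    the Lambda context object passed to the handler.
    """
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        # Stop early if another sleep would eat into the safety margin.
        remaining_ms = context.get_remaining_time_in_millis()
        if remaining_ms <= safety_margin_ms + poll_interval_s * 1000:
            raise TimeBudgetExceeded(
                f"job still {status} with {remaining_ms} ms remaining"
            )
        sleep(poll_interval_s)
```

Raising a dedicated exception before the hard timeout gives the function a chance to log which document failed, rather than being terminated without a message.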
Amazon Kendra is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon Kendra. CloudTrail captures all API calls from Amazon Kendra as events, including calls from the Amazon Kendra console and from code calls to the Amazon Kendra APIs.
Performance Efficiency
The services used in this Guidance are purpose-built for their tasks. Amazon Transcribe creates transcriptions of audio and video files. Amazon Textract extracts text from scanned image documents. Amazon Comprehend detects entities within the text.
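To make the entity-to-metadata step concrete, the sketch below converts a list of entities in the shape returned by Amazon Comprehend `DetectEntities` (each with `Type`, `Text`, and `Score`) into metadata updates for Amazon Kendra. The output shape (`name` plus `value.stringListValue`) reflects the custom document enrichment Lambda contract as we understand it and should be checked against the Amazon Kendra documentation. The function name, the score threshold, the per-type cap, and the use of the entity type as the attribute name are all assumptions for this example.

```python
from collections import defaultdict


def entities_to_metadata_updates(entities, min_score=0.8, max_values_per_type=10):
    """Group detected entities by type and emit one metadata update per type.

    entities: list of dicts with "Type", "Text", and "Score" keys.
    Returns a list sorted by attribute name so the output is deterministic.
    """
    grouped = defaultdict(list)
    for entity in entities:
        # Skip low-confidence detections (threshold is an assumed value).
        if entity["Score"] < min_score:
            continue
        text = entity["Text"].strip()
        if not text:
            continue
        values = grouped[entity["Type"]]
        # Deduplicate and cap the number of values kept per attribute.
        if text not in values and len(values) < max_values_per_type:
            values.append(text)

    return [
        {"name": entity_type, "value": {"stringListValue": values}}
        for entity_type, values in sorted(grouped.items())
    ]


if __name__ == "__main__":
    sample = [
        {"Type": "PERSON", "Text": "Jane Doe", "Score": 0.99},
        {"Type": "PERSON", "Text": "Jane Doe", "Score": 0.95},
        {"Type": "LOCATION", "Text": "Seattle", "Score": 0.50},
        {"Type": "ORGANIZATION", "Text": "AnyCompany", "Score": 0.90},
    ]
    print(entities_to_metadata_updates(sample))
```

In this sample, the duplicate person is collapsed and the low-scoring location is dropped, so only the ORGANIZATION and PERSON attributes are emitted.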
With CloudFormation IaC, this Guidance can be deployed to any supported AWS Region close to the user base to decrease latency and improve performance.
The code runs in Lambda functions, which provide serverless compute without infrastructure to manage. The functions automatically scale in and out to meet changes in demand.
Cost Optimization
Serverless services such as Lambda, Amazon Textract, Amazon Comprehend, and Amazon Transcribe use a pay-as-you-go pricing model based on usage. Because they are serverless, these services also scale based on demand.
AWS Budgets can help you plan budgets for cost and usage. Lambda can be used with Compute Savings Plans to reduce cost.
Sustainability
The Lambda execution environment shuts down after the application logic has run. This saves on infrastructure use and cost.