
Guidance for Real-Time Text Search Using Amazon OpenSearch Service

Overview

This Guidance shows how to integrate Amazon DynamoDB with Amazon OpenSearch Service for real-time search. Most applications should use Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service. For applications whose requirements do not align with zero-ETL integration, this Guidance demonstrates how to perform an initial load of data from DynamoDB into OpenSearch Service through parallel functions and how to replicate new data into OpenSearch Service. By keeping data in both places, you can direct each query to the database best suited to it: DynamoDB serves fixed access patterns that require performance and scalability, and OpenSearch Service serves access patterns that require flexible searching and filtering.

How it works

This architecture diagram shows how to load and stream data from an Amazon DynamoDB table to Amazon OpenSearch Service to support real-time, open-ended searching and filtering.

Get Started

Deploy this Guidance

Sample code

Use sample code to deploy this Guidance in your AWS account

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

AWS Cloud Development Kit (AWS CDK) defines the infrastructure for the solution as code, helping you achieve consistent deployment. AWS Lambda divides the work into small units, each responsible for a different application function. These single-task functions reduce human error and support small incremental changes that are easier to reverse if they fail.

Read the Operational Excellence whitepaper

Where applicable, this Guidance launches services in private Amazon Virtual Private Cloud (Amazon VPC) networks rather than public ones. Private networking through Amazon VPC supports security at all layers by letting you control how data is accessed. Additionally, the use of single-purpose, least-privilege AWS Identity and Access Management (IAM) policies helps you prevent permission changes from having broader, unanticipated consequences and reduces the risk of users mishandling sensitive data. AWS Secrets Manager generates and securely stores admin secrets, so users do not need to store credentials in code or environment variables, where they are at risk of exposure.

Read the Security whitepaper

Amazon SQS provides an automatic retry mechanism if a portion of the import fails, helping you quickly recover from failures. As the system of record, DynamoDB uses point-in-time recovery for continuous backup, enabling recovery to any second within the last 35 days. OpenSearch Service helps you prevent drift between the two databases by using the “create” operation for initial data loading, preventing older data from overwriting newer data. OpenSearch Service is set to use a single-node cluster, but you can change this to a multi–Availability Zone cluster to maintain availability in production.
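
To illustrate why the "create" operation guards against drift, here is a minimal sketch of building a bulk request body and reading its response. It is not the Guidance's own code: the function names and the index name are hypothetical, and the sketch assumes the standard OpenSearch bulk format, in which each document is preceded by an action line and a "create" on an existing document ID is rejected with HTTP status 409 rather than overwriting it.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_bulk_create_body(index: str, docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build a newline-delimited bulk body that uses "create" for every document.

    `index` and the (doc_id, document) pairs are illustrative inputs.
    """
    lines = []
    for doc_id, doc in docs:
        # Action line: "create" fails if a document with this ID already exists.
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def summarize_bulk_create(response: Dict[str, Any]) -> Dict[str, int]:
    """Count created, already-present (409), and failed items in a bulk response."""
    counts = {"created": 0, "already_present": 0, "failed": 0}
    for item in response.get("items", []):
        status = item["create"]["status"]
        if status == 201:
            counts["created"] += 1
        elif status == 409:
            # A newer copy was already written (for example by live replication);
            # the initial load leaves it untouched.
            counts["already_present"] += 1
        else:
            counts["failed"] += 1
    return counts
```

In this sketch, a 409 is treated as a normal outcome rather than an error, because it means the document arrived by another path first. Only the remaining failures would need to be retried.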

Read the Reliability whitepaper

Lambda enables you to parallelize workloads: reads from DynamoDB go through segmented parallel scans split across multiple Lambda function invocations. This parallelization enables significantly higher throughput than a single thread could manage.
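
The segmented scan described above can be sketched as follows. This is a hedged illustration, not the Guidance's implementation: the function names and table name are hypothetical, and `client` is assumed to be any object with a `scan` method that accepts the DynamoDB Scan parameters `TableName`, `Segment`, `TotalSegments`, and `ExclusiveStartKey` (for example, a boto3 DynamoDB client).

```python
from typing import Any, Dict, Iterator, List


def segment_requests(table_name: str, total_segments: int) -> List[Dict[str, Any]]:
    """Build one Scan request per segment; each could be handed to its own worker."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {"TableName": table_name, "Segment": segment, "TotalSegments": total_segments}
        for segment in range(total_segments)
    ]


def scan_segment(client: Any, request: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every item in one segment, following LastEvaluatedKey pagination."""
    params = dict(request)  # copy so the caller's request is not mutated
    while True:
        page = client.scan(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return  # no more pages in this segment
        params["ExclusiveStartKey"] = last_key
```

Each request dictionary is independent of the others, so one function invocation per segment can scan its slice of the table without coordinating with the rest.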

Read the Performance Efficiency whitepaper

Lambda reads DynamoDB items together in a batch rather than as individual GetItem requests. As a result, this Guidance consumes fewer read capacity units. By lowering the amount of work spent on tasks like initializing connections, the use of batches reduces compute time and the number of Lambda invocations, lowering your compute costs. Additionally, OpenSearch Service batch operations are efficient, helping you reduce the overall cost of compute resources.
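
A rough arithmetic sketch shows where the read capacity saving comes from. It assumes strongly consistent reads, that a GetItem request rounds each item up to the next 4 KB, and that a scan rounds the cumulative size of the items it reads up to the next 4 KB. The item sizes and function names are illustrative, and actual consumption depends on your table and read settings.

```python
import math
from typing import Sequence

READ_UNIT_BYTES = 4096  # one read capacity unit covers up to 4 KB (strongly consistent)


def get_item_rcus(item_sizes: Sequence[int]) -> int:
    """Estimate RCUs when each item is fetched individually and rounded up on its own."""
    return sum(max(1, math.ceil(size / READ_UNIT_BYTES)) for size in item_sizes)


def scan_page_rcus(item_sizes: Sequence[int]) -> int:
    """Estimate RCUs when the same items are read together and rounded up once."""
    return math.ceil(sum(item_sizes) / READ_UNIT_BYTES)


# Illustrative workload: 100 items of 400 bytes each.
sizes = [400] * 100
print(get_item_rcus(sizes))   # 100
print(scan_page_rcus(sizes))  # 10
```

Under these assumptions, small items benefit most, because individual reads pay the 4 KB minimum for every item.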

Read the Cost Optimization whitepaper

Lambda only invokes functions when data needs to be moved into OpenSearch Service and does not run while idle. This helps you maximize your utilization of compute resources. Additionally, as a serverless, managed service, DynamoDB helps reduce inefficiencies and decrease the total power consumed by your workloads.

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.