This Guidance shows you how to integrate Amazon DynamoDB with Amazon OpenSearch Service for real-time search. Most applications should use the Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service. For applications with requirements that do not align with zero-ETL integration, this Guidance demonstrates how to perform an initial load of data from DynamoDB into OpenSearch Service through parallel AWS Lambda functions and how to replicate new data into OpenSearch Service as it is written. By keeping data in both places, you can target queries to the database best suited to your requirements: DynamoDB powers fixed access patterns that require performance and scalability, while OpenSearch Service powers access patterns that require flexible searching and filtering.
Architecture Diagram
Initial Load
Step 1
To process existing data, an AWS Lambda function is invoked to describe the Amazon DynamoDB table and split it into segments based on the returned item count. The function writes one message to an Amazon Simple Queue Service (Amazon SQS) queue for each segment.
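The following is a minimal sketch of such a splitter function in Python, assuming hypothetical TABLE_NAME and QUEUE_URL environment variables and an assumed per-segment item budget; the deployed Guidance may size and name things differently:

```python
import json
import os

import boto3

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")

TABLE_NAME = os.environ["TABLE_NAME"]  # hypothetical variable names
QUEUE_URL = os.environ["QUEUE_URL"]
ITEMS_PER_SEGMENT = 10_000             # assumed sizing heuristic

def handler(event, context):
    # DescribeTable returns an approximate ItemCount, refreshed roughly
    # every six hours, which is accurate enough for sizing segments.
    item_count = dynamodb.describe_table(TableName=TABLE_NAME)["Table"]["ItemCount"]
    total_segments = max(1, (item_count + ITEMS_PER_SEGMENT - 1) // ITEMS_PER_SEGMENT)

    # One SQS message per segment; each message fans out to a worker invocation.
    for segment in range(total_segments):
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"segment": segment, "total_segments": total_segments}),
        )
    return {"total_segments": total_segments}
```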
Step 2
Amazon SQS acts as an event source for Lambda, invoking the worker function for each batch of messages in the queue so that segments of the DynamoDB table are processed in parallel.
Step 3
The Lambda function uses a parallel scan to read the segment of the DynamoDB table listed in the source event from Amazon SQS.
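A sketch of that worker, assuming the message shape produced by the splitter above; index_batch is the hypothetical bulk writer shown in Step 4:

```python
import json
import os

import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = os.environ["TABLE_NAME"]  # hypothetical variable name

def handler(event, context):
    # Each SQS record carries one segment assignment from the splitter function.
    for record in event["Records"]:
        body = json.loads(record["body"])
        scan_kwargs = {
            "TableName": TABLE_NAME,
            "Segment": body["segment"],
            "TotalSegments": body["total_segments"],
        }
        # Page through the segment; a single Scan call returns at most 1 MB.
        while True:
            page = dynamodb.scan(**scan_kwargs)
            index_batch(page["Items"])  # hand off to the bulk writer (Step 4)
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```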
Step 4
The function then writes the data retrieved from DynamoDB into Amazon OpenSearch Service in batches through the bulk-create operation.
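A minimal sketch of that bulk writer using the opensearch-py client; the endpoint, credentials, index name, and the string partition key named pk are all assumptions for illustration:

```python
from opensearchpy import OpenSearch, helpers

# Hypothetical endpoint and credentials; the deployed Guidance keeps the
# real admin secret in AWS Secrets Manager.
client = OpenSearch(
    hosts=[{"host": "search-example.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "example-password"),
    use_ssl=True,
)

INDEX_NAME = "table-data"  # assumed index name

def index_batch(items):
    # "create" fails on documents that already exist instead of overwriting
    # them, so anything written by the streaming path is never replaced by
    # the older data read during the initial scan.
    actions = (
        {
            "_op_type": "create",
            "_index": INDEX_NAME,
            "_id": item["pk"]["S"],  # assumes a string partition key named pk
            "_source": item,
        }
        for item in items
    )
    helpers.bulk(client, actions, raise_on_error=False)
```

Using "create" rather than "index" here is what backs the drift-prevention point under the Reliability pillar below.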
Streaming Changes
Step 5
Items inserted or updated in DynamoDB are captured as item-level changes by DynamoDB Streams.
Step 6
DynamoDB Streams sends the item-level modifications captured from DynamoDB to the Lambda streaming-update function.
Step 7
The Lambda function writes that data in batches to OpenSearch Service through the bulk index operation. Track ingested documents with the SearchableDocuments metric in Amazon CloudWatch.
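A sketch of that streaming-update handler, under the same assumptions as the bulk writer above (hypothetical client setup, a string partition key named pk, and a stream view type that includes NEW_IMAGE):

```python
from opensearchpy import OpenSearch, helpers

# Same hypothetical client setup as in the initial load.
client = OpenSearch(
    hosts=[{"host": "search-example.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "example-password"),
    use_ssl=True,
)

INDEX_NAME = "table-data"  # assumed index name

def handler(event, context):
    # Each record is one item-level change captured by DynamoDB Streams.
    actions = []
    for record in event["Records"]:
        doc_id = record["dynamodb"]["Keys"]["pk"]["S"]  # assumed key name
        if record["eventName"] in ("INSERT", "MODIFY"):
            actions.append({
                "_op_type": "index",  # "index" upserts, so newer data wins
                "_index": INDEX_NAME,
                "_id": doc_id,
                "_source": record["dynamodb"]["NewImage"],
            })
        elif record["eventName"] == "REMOVE":
            actions.append({"_op_type": "delete", "_index": INDEX_NAME, "_id": doc_id})
    helpers.bulk(client, actions, raise_on_error=False)
```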
Get Started
Deploy this Guidance
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
AWS Cloud Development Kit (AWS CDK) defines the infrastructure for the solution as code, helping you achieve consistent deployments. Lambda divides the workload into smaller units, each handled by a function responsible for a single task. These single-task functions reduce human error and support small, incremental changes that are easier to reverse if they fail.
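For illustration, a minimal AWS CDK (Python) sketch of the queue-to-worker wiring from Steps 1 and 2; the construct names, runtime, and asset path are assumptions, not the Guidance's actual code:

```python
from aws_cdk import Stack, aws_lambda as lambda_, aws_sqs as sqs
from aws_cdk.aws_lambda_event_sources import SqsEventSource
from constructs import Construct

class InitialLoadStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Queue of table-segment assignments produced by the splitter function.
        segment_queue = sqs.Queue(self, "SegmentQueue")

        # Worker function that scans one DynamoDB segment per message.
        worker = lambda_.Function(
            self, "SegmentWorker",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="worker.handler",
            code=lambda_.Code.from_asset("lambda"),
        )

        # The SQS event source drives the parallel invocations in Step 2.
        worker.add_event_source(SqsEventSource(segment_queue, batch_size=1))
```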
Security
Where applicable, this Guidance launches services in private Amazon Virtual Private Cloud (Amazon VPC) networks rather than public. Private networking through Amazon VPC supports security at all layers by letting you control how data is accessed. Additionally, the use of single-purpose, least-privilege AWS Identity and Access Management (IAM) policies helps you prevent permission changes from having broader, unanticipated consequences and reduces the risk of users mishandling sensitive data. AWS Secrets Manager generates and securely stores admin secrets, preventing users from storing credentials in code or environment variables where they are at risk of exposure.
Reliability
Amazon SQS provides an automatic retry mechanism if a portion of the import fails, helping you quickly recover from failures. As the system of record, DynamoDB uses point-in-time recovery for continuous backup, enabling recovery to any second within the last 35 days. Using the "create" operation for the initial data load prevents drift between the two databases, because older scan data can never overwrite newer data already written by the streaming path. OpenSearch Service is set to use a single-node cluster, but you can change this to a multi-Availability Zone cluster to maintain availability in production.
Performance Efficiency
Lambda enables you to parallelize workloads: reads from DynamoDB go through segmented parallel scans split across multiple Lambda function invocations. This parallelization enables significantly higher throughput than a single thread could manage.
Cost Optimization
Lambda reads DynamoDB items together in a batch rather than as individual GetItem requests, so this Guidance consumes fewer read capacity units. Batching also reduces overhead such as connection initialization, which cuts compute time and the number of Lambda invocations and therefore your compute costs. Additionally, OpenSearch Service batch operations are efficient, helping you reduce the overall cost of compute resources.
Sustainability
Lambda only invokes functions when data needs to be moved into OpenSearch Service and does not run while idle. This helps you maximize your utilization of compute resources. Additionally, as a serverless, managed service, DynamoDB helps reduce inefficiencies and decrease the total power consumed by your workloads.
Related Content
Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service is now available
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.