AWS Database Blog
Achieve near real-time analytics with Amazon DynamoDB and zero-ETL for Amazon OpenSearch Service
Following Rockset’s acquisition by OpenAI, the Rockset service will be deprecated by the end of 2024. Customers who use Rockset to run analytics and search workloads on their Amazon DynamoDB data have asked for guidance on maintaining these capabilities when they migrate off Rockset. Amazon OpenSearch Service is used by tens of thousands of AWS customers to perform search and analysis on application and operational data. AWS recently launched support for zero-ETL integration between DynamoDB and OpenSearch Service to sync data in near real time. This integration is a fully managed, no-code solution that replicates your data without requiring you to build and operate your own extract, transform, and load (ETL) pipelines.
In this post, we explore how to transition effectively from Rockset to OpenSearch Service for your DynamoDB use case. To illustrate this integration, we consider a real-world example of a gaming company that tracks user interactions, such as in-game purchases and player scores, using DynamoDB. This data needs to be analyzed in real time to provide insights into user behavior, detect anomalies, and personalize the gaming experience.
DynamoDB
DynamoDB is a fully managed NoSQL database service known for its low-latency performance and seamless scalability. It efficiently handles virtually any amount of structured and semi-structured data, making it ideal for applications requiring consistent, single-digit millisecond response times.
OpenSearch Service
OpenSearch Service is a managed service that makes it easy for you to perform interactive log analytics, real-time search and application monitoring, website search, semantic search, fuzzy full-text search, and more. OpenSearch Service allows you to deploy, operate, and scale OpenSearch clusters and OpenSearch Serverless collections in the AWS Cloud. OpenSearch Dashboards is an integrated visualization tool that makes it easy for users to explore their data in real time.
DynamoDB zero-ETL with OpenSearch Service
DynamoDB zero-ETL integration with OpenSearch Service lets you perform a search on your DynamoDB data by automatically replicating and transforming it without custom code or infrastructure. This zero-ETL integration uses Amazon OpenSearch Ingestion to synchronize the data between DynamoDB and OpenSearch Service.
OpenSearch Ingestion is a fully managed, serverless data collector that delivers real-time log, trace, and event data to OpenSearch Service domains and OpenSearch Serverless collections. OpenSearch Ingestion is powered by the open-source data collector Data Prepper. You simply configure your data producers to send data to OpenSearch Ingestion, which automatically delivers and transforms the data as required to the OpenSearch domain or collection that you specify.
The zero-ETL integration uses a DynamoDB export to Amazon Simple Storage Service (Amazon S3) to create an initial snapshot to load into OpenSearch Service, and uses Amazon DynamoDB Streams to replicate subsequent changes in near real time, making sure your data is up to date and ready for search and analysis. The integration doesn’t consume read or write throughput on your table, so it doesn’t impact your production traffic. You can create a pipeline without an initial snapshot by excluding the export settings, or take only a snapshot with no ongoing updates by excluding the stream settings.
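Whether the pipeline takes a snapshot, streams changes, or both is decided by the `export` and `stream` settings of the pipeline's DynamoDB source. The following Python sketch assembles that source section as a dictionary. The key names (`tables`, `table_arn`, `export`, `s3_bucket`, `s3_region`, `s3_prefix`, `stream`, `start_position`, `acknowledgments`) reflect our reading of the Data Prepper DynamoDB source documentation and should be checked against the blueprint in the console; the function name, ARNs, and bucket are hypothetical placeholders.

```python
import json


def build_dynamodb_source(table_arn, role_arn, region,
                          export_bucket=None, export_prefix="",
                          stream_start="LATEST"):
    """Return a Data Prepper `dynamodb` source section as a dict (sketch).

    export_bucket=None  -> no initial snapshot (stream-only pipeline)
    stream_start=None   -> snapshot only, no ongoing updates
    """
    if export_bucket is None and stream_start is None:
        raise ValueError("configure export settings, stream settings, or both")

    table = {"table_arn": table_arn}
    if export_bucket is not None:
        # Initial load from a DynamoDB export to S3 (needs PITR on the table).
        table["export"] = {
            "s3_bucket": export_bucket,
            "s3_region": region,
            "s3_prefix": export_prefix,
        }
    if stream_start is not None:
        # Ongoing change data capture from DynamoDB Streams.
        table["stream"] = {"start_position": stream_start}

    return {
        "dynamodb": {
            "acknowledgments": True,
            "tables": [table],
            # Role the pipeline assumes to read the table, stream, and export.
            "aws": {"sts_role_arn": role_arn, "region": region},
        }
    }


if __name__ == "__main__":
    source = build_dynamodb_source(
        "arn:aws:dynamodb:us-east-1:111122223333:table/GameEvents",  # placeholder
        "arn:aws:iam::111122223333:role/game-events-pipeline-role",  # placeholder
        "us-east-1",
        export_bucket="my-ddb-export-bucket",  # placeholder
        export_prefix="game-events/",
    )
    print(json.dumps(source, indent=2))
```

Passing `stream_start=None` gives a snapshot-only source, and omitting `export_bucket` gives a stream-only source.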
You must turn on point-in-time recovery (PITR) on the table to use the export, and enable DynamoDB Streams with the NEW_AND_OLD_IMAGES option for ongoing replication.
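Both prerequisites can also be turned on programmatically. The sketch below assumes a boto3 DynamoDB client (`boto3.client("dynamodb")`) and uses its `update_continuous_backups`, `describe_table`, and `update_table` operations; the helper name is hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
def prepare_table_for_zero_etl(dynamodb_client, table_name):
    """Enable PITR and a NEW_AND_OLD_IMAGES stream on `table_name` (sketch).

    Returns True if the stream settings were changed, False if the table
    already had a NEW_AND_OLD_IMAGES stream.
    """
    # PITR is required for the DynamoDB export that seeds the initial snapshot.
    dynamodb_client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )

    table = dynamodb_client.describe_table(TableName=table_name)["Table"]
    spec = table.get("StreamSpecification", {})
    if spec.get("StreamEnabled"):
        if spec.get("StreamViewType") == "NEW_AND_OLD_IMAGES":
            return False  # nothing to change
        # As we understand it, changing the view type means disabling the
        # existing stream first, which could disrupt other consumers, so this
        # sketch leaves that decision to a human.
        raise ValueError(
            f"{table_name} already has a {spec.get('StreamViewType')} stream"
        )

    # Ongoing replication reads both the old and new item images.
    dynamodb_client.update_table(
        TableName=table_name,
        StreamSpecification={
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    )
    return True
```

For example, `prepare_table_for_zero_etl(boto3.client("dynamodb"), "GameEvents")`, where `GameEvents` is a placeholder table name.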
Together, the zero-ETL integration and OpenSearch Service provide simple, scalable, near real-time data processing and indexing, along with SQL-based query capabilities on the continuously updated data in OpenSearch.
Solution overview
The following diagram illustrates the solution architecture.
Here’s how these services work together:
- Data ingestion – Zero-ETL loads the initial snapshot from a DynamoDB export to Amazon S3 into OpenSearch Service. It then uses DynamoDB Streams to replicate further changes in near real time and indexes them into OpenSearch Service. With this automated process, your data is consistently kept up to date and ready for search and analysis.
- Real-time querying – OpenSearch Service offers powerful query capabilities that enable you to perform complex searches and aggregations on your data. Whether you need to analyze trends, detect anomalies, or perform search queries to return relevant results for your application, OpenSearch Service provides the tools you need.
- Visualization – With OpenSearch Dashboards, you can create interactive and real-time visualizations of your data. You can customize your dashboards to display key metrics and insights, providing a comprehensive view of your data at a glance.
For our gaming company use case, this integration allows the company to monitor player activities in real time, identify popular in-game items, and optimize in-game economies dynamically based on player behavior. This real-time insight can significantly enhance user engagement and satisfaction.
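As an illustration of the kind of query this enables, the following Python sketch builds a Query DSL request body that returns the most purchased in-game items over the last hour. The field names `event_type`, `item_id`, and `event_time` are hypothetical attributes of the gaming table; `item_id` and `event_type` are assumed to be mapped as `keyword` and `event_time` as `date`, which the `terms` aggregation and the date-math `range` filter need.

```python
import json


def popular_items_query(window="now-1h", top_n=10):
    """Query DSL body: top purchased items within a time window (sketch)."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "query": {
            "bool": {
                # Filters don't affect scoring and can be cached.
                "filter": [
                    {"term": {"event_type": "purchase"}},
                    {"range": {"event_time": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            # One bucket per item_id, sorted by document count (descending).
            "top_items": {"terms": {"field": "item_id", "size": top_n}}
        },
    }


if __name__ == "__main__":
    # Send this body to GET /<your-index>/_search with any OpenSearch client.
    print(json.dumps(popular_items_query(), indent=2))
```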
The integration of these services offers the following benefits:
- Real-time insights – You can achieve near real-time analytics with minimal latency, allowing timely and informed decision-making
- Reduced complexity – Zero-ETL eliminates the need for complex ETL pipelines, simplifying data integration and reducing operational overhead
- Scalable and reliable – DynamoDB and OpenSearch Service are fully managed and scalable, providing high availability and reliability for your applications
Configure zero-ETL integration with OpenSearch Service
To set up an integration between DynamoDB and OpenSearch Service, complete the following steps:
- On the DynamoDB console, choose Integrations in the navigation pane.
- Select the DynamoDB table you want to synchronize, then choose Create.
- Provide a unique pipeline name and configure the pipeline capacity and compute resources to automatically scale your pipeline based on the current ingestion workload.
- Input the minimum and maximum Ingestion OpenSearch Compute Units (OCUs). In this example, we use the default pipeline capacity settings of a minimum of 2 Ingestion OCUs and a maximum of 4 Ingestion OCUs.
An OCU is the basic unit of measure for data ingestion. Each OCU is a combination of approximately 8 GB of memory and 2 vCPUs. OpenSearch Ingestion supports up to 96 OCUs per pipeline and automatically scales up and down based on your ingest workload demand. As a rule of thumb, a single OCU can handle around 1,000 write capacity units (WCUs) on the DynamoDB table.
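The rule of thumb above can be turned into a starting point for the capacity settings. The following Python sketch divides the table's write throughput by roughly 1,000 WCUs per OCU, rounds up, and clamps the result between a floor and the 96-OCU limit. The function names and the floor of 1 OCU are our assumptions; treat the output as an initial estimate to validate against the pipeline's actual metrics.

```python
import math


def estimate_ingestion_ocus(table_wcu, wcu_per_ocu=1000, floor=1, ceiling=96):
    """Rough OCU estimate from DynamoDB write throughput (sketch).

    Assumes ~1,000 WCUs per OCU and a 96-OCU per-pipeline limit.
    """
    if table_wcu < 0:
        raise ValueError("table_wcu must be non-negative")
    needed = math.ceil(table_wcu / wcu_per_ocu)
    return max(floor, min(needed, ceiling))


def ocu_resources(ocus):
    """Approximate memory (GB) and vCPUs for a number of OCUs."""
    return {"memory_gb": ocus * 8, "vcpus": ocus * 2}


if __name__ == "__main__":
    for wcu in (500, 2500, 40000, 150000):
        ocus = estimate_ingestion_ocus(wcu)
        print(wcu, "WCU ->", ocus, "OCU(s)", ocu_resources(ocus))
```

By the same heuristic, the example's default maximum of 4 OCUs corresponds to roughly 4,000 WCUs of table writes.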
The code in the Pipeline configuration section defaults to the DynamoDB blueprint. For tables that follow a single-table design, you can instead use the Zero-ETL with DynamoDB single table template, which conditionally routes table partitions with different schemas to different OpenSearch indexes.
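To show what conditional routing looks like, the following Python sketch generates the routing portion of a pipeline for a single-table design in which each entity type is identified by a partition key prefix. It assumes the partition key attribute is named `PK` (hypothetical), uses Data Prepper's pipeline-level `route` list and the sink-level `routes` option, and writes conditions with the `=~` regular-expression operator from the Data Prepper expression syntax. It omits the other sink settings that the console template fills in, so compare it with that template before use.

```python
import json


def single_table_routing(entities, hosts, role_arn, region, key_attr="PK"):
    """Build `route` and `sink` sections for a single-table design (sketch).

    entities: {route_name: (partition_key_prefix, index_name)}
    Prefixes are assumed to contain no regex metacharacters.
    """
    routes, sinks = [], []
    for route_name, (prefix, index_name) in entities.items():
        # Condition: the partition key starts with this entity's prefix.
        routes.append({route_name: f'/{key_attr} =~ "{prefix}.*"'})
        # One OpenSearch sink per entity type, receiving only its route.
        sinks.append({
            "opensearch": {
                "hosts": hosts,
                "index": index_name,
                "routes": [route_name],
                "aws": {"sts_role_arn": role_arn, "region": region},
            }
        })
    return {"route": routes, "sink": sinks}


if __name__ == "__main__":
    cfg = single_table_routing(
        {"players": ("PLAYER#", "players"), "purchases": ("PURCHASE#", "purchases")},
        ["https://search-example.us-east-1.es.amazonaws.com"],  # placeholder
        "arn:aws:iam::111122223333:role/game-events-pipeline-role",  # placeholder
        "us-east-1",
    )
    print(json.dumps(cfg, indent=2))
```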
The following image displays the Pipeline configuration, pre-configured with the DynamoDB template.
You can also specify index mapping templates to make sure your DynamoDB fields are mapped to the correct fields in your OpenSearch Service indexes.
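For example, an index template for the gaming data might pin the types of the fields the application queries. The following Python sketch produces such a template as a JSON string. The field names are hypothetical, and storing `event_time` as epoch milliseconds is an assumption about how the application writes timestamps. As we understand the OpenSearch sink options, a string like this can be supplied through `template_content` with `template_type` set to `index-template`; verify both names against the Data Prepper documentation.

```python
import json


def game_events_index_template(index_pattern="game-events*"):
    """Return an OpenSearch composable index template as a JSON string (sketch)."""
    template = {
        "index_patterns": [index_pattern],
        "template": {
            "mappings": {
                "properties": {
                    # keyword: exact matching, terms aggregations, sorting
                    "player_id": {"type": "keyword"},
                    "item_id": {"type": "keyword"},
                    "event_type": {"type": "keyword"},
                    # numeric types so range queries and metrics work on numbers
                    "score": {"type": "long"},
                    "price": {"type": "double"},
                    # assumed to be written as epoch milliseconds
                    "event_time": {"type": "date", "format": "epoch_millis"},
                }
            }
        },
    }
    return json.dumps(template)


if __name__ == "__main__":
    print(game_events_index_template())
```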
For a comprehensive overview of configuration settings for the pipeline, refer to the OpenSearch Data Prepper documentation. You must set up AWS Identity and Access Management (IAM) roles for the pipeline. For instructions, refer to Configure the pipeline role.
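As a rough guide to what that role involves, the following Python sketch generates a trust policy for the OpenSearch Ingestion service principal (`osis-pipelines.amazonaws.com`) and a permissions policy covering the table export, the stream, the export bucket, and the destination domain. The action list is based on our reading of the documented pipeline role for DynamoDB sources; all names are placeholders, and the Configure the pipeline role documentation remains the authority. The domain's own access policy (or fine-grained access control role mapping) must also allow this role.

```python
import json


def pipeline_role_policies(account_id, region, table_name, bucket, prefix, domain_name):
    """Return (trust_policy, permissions_policy) dicts for a pipeline role (sketch)."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    domain_arn = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}"

    # Lets OpenSearch Ingestion assume the role.
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "osis-pipelines.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    def stmt(sid, actions, resources):
        return {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": resources}

    permissions = {
        "Version": "2012-10-17",
        "Statement": [
            # Start the point-in-time export used for the initial snapshot.
            stmt("RunExport", ["dynamodb:DescribeTable",
                               "dynamodb:DescribeContinuousBackups",
                               "dynamodb:ExportTableToPointInTime"], [table_arn]),
            stmt("CheckExport", ["dynamodb:DescribeExport"], [f"{table_arn}/export/*"]),
            # Read ongoing changes from DynamoDB Streams.
            stmt("ReadStream", ["dynamodb:DescribeStream",
                                "dynamodb:GetRecords",
                                "dynamodb:GetShardIterator"], [f"{table_arn}/stream/*"]),
            # Write and read the export files in S3.
            stmt("ExportBucket", ["s3:GetObject", "s3:AbortMultipartUpload",
                                  "s3:PutObject", "s3:PutObjectAcl"],
                 [f"arn:aws:s3:::{bucket}/{prefix}*"]),
            # Describe and write to the destination domain.
            stmt("WriteToDomain", ["es:DescribeDomain", "es:ESHttp*"],
                 [domain_arn, f"{domain_arn}/*"]),
        ],
    }
    return trust, permissions


if __name__ == "__main__":
    trust, perms = pipeline_role_policies(
        "111122223333", "us-east-1", "GameEvents",
        "my-ddb-export-bucket", "game-events/", "game-analytics")  # placeholders
    print(json.dumps(trust, indent=2))
    print(json.dumps(perms, indent=2))
```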
After a few minutes, your OpenSearch Ingestion pipeline becomes active, as shown in the following screenshot.
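If you prefer to script pipeline creation rather than use the console, the following Python sketch calls the OpenSearch Ingestion API through a boto3 `osis` client (`create_pipeline` and `get_pipeline`) and polls until the pipeline reports `ACTIVE`. The helper name, polling interval, and timeout are our own choices, and the console's integration flow may set options this sketch doesn't; the client and `sleep` function are injected so the logic can be tested offline.

```python
import time

FAILED_STATES = {"CREATE_FAILED", "UPDATE_FAILED", "START_FAILED"}


def create_pipeline_and_wait(osis_client, name, config_yaml, min_units=2,
                             max_units=4, poll_seconds=30,
                             timeout_seconds=1800, sleep=time.sleep):
    """Create an OpenSearch Ingestion pipeline and wait until it is ACTIVE (sketch)."""
    osis_client.create_pipeline(
        PipelineName=name,
        MinUnits=min_units,  # the example's 2-OCU minimum
        MaxUnits=max_units,  # and 4-OCU maximum
        PipelineConfigurationBody=config_yaml,
    )
    waited = 0
    while True:
        status = osis_client.get_pipeline(PipelineName=name)["Pipeline"]["Status"]
        if status == "ACTIVE":
            return status
        if status in FAILED_STATES:
            raise RuntimeError(f"pipeline {name} entered {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"pipeline {name} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A call might look like `create_pipeline_and_wait(boto3.client("osis"), "game-events-pipeline", open("pipeline.yaml").read())`, where the pipeline name and file are placeholders.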
For a more detailed walkthrough on setting up and configuring zero-ETL for your DynamoDB and OpenSearch Service integration, refer to Getting started with Amazon OpenSearch Ingestion.
Migrate from a DynamoDB integration with Rockset to OpenSearch
Because the zero-ETL integration handles data replication, migrating a DynamoDB integration from Rockset to OpenSearch Service is mostly a matter of moving your queries. Applications that previously relied on Rockset for real-time analytics need to be reconfigured to run their analytics against OpenSearch Service.
Both platforms offer robust analytics capabilities. OpenSearch provides a search language known as Query DSL, a flexible and powerful JSON-based interface that gives you granular control when running searches and complex analytical queries. Additionally, you can use the SQL plugin in OpenSearch to run search and analytical queries in SQL and PPL (Piped Processing Language). The SQL plugin supports several response formats, such as JDBC, CSV, and JSON, and also provides an _explain endpoint that translates your queries into OpenSearch Query DSL or helps you troubleshoot errors.
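To make this concrete, the following Python sketch builds the HTTP requests for a SQL query and for its `_explain` translation. It uses the SQL plugin's `/_plugins/_sql` and `/_plugins/_sql/_explain` endpoints with the `format` query parameter; the index name `game-events` and its fields are hypothetical, and you would send the requests with the HTTP or OpenSearch client of your choice, signed for your domain.

```python
import json


def sql_request(query, response_format="jdbc", explain=False):
    """Return (method, path, body) for the OpenSearch SQL plugin (sketch)."""
    body = json.dumps({"query": query})
    if explain:
        # Shows the Query DSL the plugin generates for this SQL statement.
        return "POST", "/_plugins/_sql/_explain", body
    # response_format: for example "jdbc" (default), "csv", or "json"
    return "POST", f"/_plugins/_sql?format={response_format}", body


TOP_ITEMS_SQL = (
    "SELECT item_id, COUNT(*) AS purchases "
    "FROM `game-events` "
    "WHERE event_type = 'purchase' "
    "GROUP BY item_id "
    "ORDER BY purchases DESC "
    "LIMIT 10"
)

if __name__ == "__main__":
    print(sql_request(TOP_ITEMS_SQL, response_format="csv"))
    print(sql_request(TOP_ITEMS_SQL, explain=True))
```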
This flexibility makes sure that even as you transition from Rockset to OpenSearch, you can continue to use familiar SQL queries while taking advantage of OpenSearch’s powerful analytics capabilities. For more detailed information on Query DSL and to explore its capabilities, see the OpenSearch Query DSL documentation.
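As an example of moving a query from SQL to Query DSL, consider a leaderboard that a Rockset application might express as `SELECT player_id, MAX(score) ... GROUP BY player_id ORDER BY MAX(score) DESC LIMIT 10`. The following Python sketch builds a Query DSL body that expresses the same logic with a `terms` aggregation ordered by a `max` sub-aggregation; `player_id` (mapped as `keyword`) and `score` are hypothetical field names.

```python
import json


def leaderboard_query(top_n=10):
    """Query DSL body for the top players by best score (sketch)."""
    return {
        "size": 0,  # aggregation results only
        "aggs": {
            "players": {
                # GROUP BY player_id ... ORDER BY MAX(score) DESC LIMIT top_n
                "terms": {
                    "field": "player_id",
                    "size": top_n,
                    "order": {"best_score": "desc"},
                },
                # MAX(score) per player bucket
                "aggs": {"best_score": {"max": {"field": "score"}}},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(leaderboard_query(), indent=2))
```

Note that `terms` aggregations over multiple shards are approximate by default; raising `shard_size` trades some extra work for more accurate top-N results.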
Fully serverless architecture
If you prefer a fully serverless architecture, you can deliver the pipeline output to an Amazon OpenSearch Serverless collection instead of a provisioned OpenSearch Service domain.
By taking advantage of the serverless capabilities of both DynamoDB and OpenSearch Service, you can build a fully managed, scalable, and cost-effective real-time analytics platform that allows your team to concentrate on delivering business value rather than managing infrastructure.
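In pipeline terms, the main difference is the sink configuration. The following Python sketch builds an OpenSearch sink for either a managed domain or a serverless collection; it assumes the Data Prepper OpenSearch sink's `aws.serverless` flag, and the endpoints, index, and role are placeholders. A serverless collection also needs network and data access policies that allow the pipeline role, which this sketch doesn't create.

```python
import json


def opensearch_sink(endpoint, index, role_arn, region, serverless=False):
    """Return a Data Prepper `opensearch` sink entry (sketch)."""
    aws = {"sts_role_arn": role_arn, "region": region}
    if serverless:
        # Tells the sink it is writing to an OpenSearch Serverless collection.
        aws["serverless"] = True
    return {"opensearch": {"hosts": [endpoint], "index": index, "aws": aws}}


if __name__ == "__main__":
    role = "arn:aws:iam::111122223333:role/game-events-pipeline-role"  # placeholder
    domain_sink = opensearch_sink(
        "https://search-game-analytics.us-east-1.es.amazonaws.com",  # placeholder
        "game-events", role, "us-east-1")
    collection_sink = opensearch_sink(
        "https://abc123.us-east-1.aoss.amazonaws.com",  # placeholder
        "game-events", role, "us-east-1", serverless=True)
    print(json.dumps([domain_sink, collection_sink], indent=2))
```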
Conclusion
Whether you’re migrating from Rockset or building a new application, the combination of DynamoDB, zero-ETL, and Amazon OpenSearch Service gives you the tools to build a scalable, cost-effective, and high-performance real-time analytics platform. To get started, refer to DynamoDB zero-ETL integration with Amazon OpenSearch Service and the AWS re:Invent 2023 video Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service. We also recommend trying out the workshop Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service, which provides a detailed walkthrough on setting up the pipeline and taking advantage of the full capabilities of this integration.
About the authors
Lee Hannigan is a Sr. DynamoDB Specialist Solutions Architect based in Donegal, Ireland. He brings a wealth of expertise in distributed systems, backed by a strong foundation in big data and analytics technologies. In his role, Lee excels in assisting customers with the design, evaluation, and optimization of their workloads using the capabilities of Amazon DynamoDB.
Praveen Kadipikonda is a Senior OpenSearch Specialist Solutions Architect at AWS based out of Dallas. He helps customers build efficient, performant, and scalable analytic solutions.