AWS Big Data Blog
Analyze more demanding as well as larger time series workloads with Amazon OpenSearch Serverless
In today’s data-driven landscape, managing and analyzing vast amounts of data, especially logs, is crucial for organizations to derive insights and make informed decisions. However, handling this data efficiently presents a significant challenge, prompting organizations to seek scalable solutions without the complexity of infrastructure management.
Amazon OpenSearch Serverless lets you run OpenSearch in the AWS Cloud, without worrying about scaling infrastructure. With OpenSearch Serverless, you can ingest, analyze, and visualize your time-series data. Without the need for infrastructure provisioning, OpenSearch Serverless simplifies data management and enables you to derive actionable insights from extensive repositories.
We recently announced a new capacity level of 10TB for Time-series data per account per Region, which includes one or more indexes within a collection. With the support for larger datasets, you can unlock valuable operational insights and make data-driven decisions to troubleshoot application downtime, improve system performance, or identify fraudulent activities.
In this post, we discuss this new capability and how you can analyze larger time series datasets with OpenSearch Serverless.
10TB Time-series data size support in OpenSearch Serverless
The compute capacity for data ingestion and search or query in OpenSearch Serverless is measured in OpenSearch Compute Units (OCUs). These OCUs are shared among various collections, each containing one or more indexes within the account. To accommodate larger datasets, OpenSearch Serverless now supports up to 200 OCUs per account per AWS Region, each for indexing and search respectively, doubling from the previous limit of 100. You configure the maximum OCU limits on search and indexing independently to manage costs. You can also monitor real-time OCU usage with Amazon CloudWatch metrics to gain a better perspective on your workload’s resource consumption.
Dealing with larger data and analysis needs more memory and CPU. With 10TB data size support, OpenSearch Serverless is introducing vertical scaling up to eight times of 1-OCU systems. For example, the OpenSearch Serverless will deploy a larger system equivalent of eight 1-OCU systems. The system will use hybrid of horizontal and vertical scaling to address the needs of the workloads. There are improvements to shard reallocation algorithm to reduce the shard movement during heat remediation, vertical scaling, or routine deployment.
In our internal testing for 10TB Time-series data, we set the Max OCU to 48 for Search and 48 for Indexing. We set the data retention for 5 days using data lifecycle policies, and set the deployment type to “Enable redundancy” making sure the data is replicated across Availability Zones . This will lead to 12_24 hours of data in hot storage (OCU disk memory) and the rest in Amazon Simple Service (Amazon S3) storage. We observed the average ingestion achieved was 2.3 TiB per day with an average ingestion performance of 49.15 GiB per OCU per day, reaching a max of 52.47 GiB per OCU per day and a minimum of 32.69 Gib per OCU per day in our testing. The performance depends on several aspects, like document size, mapping, and other parameters, which may or may not have a variation for your workload.
Set max OCU to 200
You can start using our expanded capacity today by setting your OCU limits for indexing and search to 200. You can still set the limits to less than 200 to maintain a maximum cost during high traffic spikes. You only pay for the resources consumed, not for the max OCU configuration.
Ingest the data
You can use the load generation scripts shared in the following workshop, or you can use your own application or data generator to create a load. You can run multiple instances of these scripts to generate a burst in indexing requests. As shown in the following screenshot, we tested with an index, sending approximately 10 TB of data. We used our load generator script to send the traffic to a single index, retaining data for 5 days, and used a data life cycle policy to delete data older than 5 days.
Auto scaling in OpenSearch Serverless with new vertical scaling.
Before this release, OpenSearch Serverless auto-scaled by horizontally adding the same-size capacity to handle increases in traffic or load. With the new feature of vertical scaling to a larger size capacity, it can optimize the workload by providing a more powerful compute unit. The system will intelligently decide whether horizontal scaling or vertical scaling is more price-performance optimal. Vertical scaling also improves auto-scaling responsiveness, because vertical scaling helps to reach the optimal capacity faster compared to the incremental steps taken through horizontal scaling. Overall, vertical scaling has significantly improved the response time for auto_scaling.
Conclusion
We encourage you to take advantage of the 10TB index support and put it to the test! Migrate your data, explore the improved throughput, and take advantage of the enhanced scaling capabilities. Our goal is to deliver a seamless and efficient experience that aligns with your requirements.
To get started, refer to Log analytics the easy way with Amazon OpenSearch Serverless. To get hands-on experience with OpenSearch Serverless, follow the Getting started with Amazon OpenSearch Serverless workshop, which has a step-by-step guide for configuring and setting up an OpenSearch Serverless collection.
If you have feedback about this post, share it in the comments section. If you have questions about this post, start a new thread on the Amazon OpenSearch Service forum or contact AWS Support.
About the authors
Satish Nandi is a Senior Product Manager with Amazon OpenSearch Service. He is focused on OpenSearch Serverless and has years of experience in networking, security and ML/AI. He holds a Bachelor’s degree in Computer Science and an MBA in Entrepreneurship. In his free time, he likes to fly airplanes, hang gliders and ride his motorcycle.
Michelle Xue is Sr. Software Development Manager working on Amazon OpenSearch Serverless. She works closely with customers to help them onboard OpenSearch Serverless and incorporates customer’s feedback into their Serverless roadmap. Outside of work, she enjoys hiking and playing tennis.
Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.