AWS Big Data Blog
Migrate your indexes to Amazon OpenSearch Serverless with Logstash
We recently announced the general availability of Amazon OpenSearch Serverless, a new option for Amazon OpenSearch Service that makes it easy to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. With OpenSearch Serverless, you get the same interactive millisecond response times as OpenSearch Service with the simplicity of a serverless environment.
In this post, you’ll learn how to migrate your existing indexes from an OpenSearch Service domain to a serverless collection using Logstash.
With OpenSearch domains, you get dedicated, secure clusters configured and optimized for your workloads in minutes. You have full control over the configuration of compute, memory, and storage resources in clusters to optimize cost and performance for your applications. OpenSearch Serverless provides an even simpler way to run search and analytics workloads—without ever having to think about clusters. You simply create a collection and a group of indexes, and can start ingesting and querying the data.
Solution overview
Logstash is open-source software that provides ETL (extract, transform, and load) for your data. You can configure Logstash to connect to a source and a destination via input and output plugins. In between, you configure filters that can transform your data. This post walks you through the steps you need to set up Logstash to connect an OpenSearch Service domain (input) to an OpenSearch Serverless collection (output).
You set the source and destination plugins in Logstash’s config file. The config file has sections for Input, Filter, and Output. Once configured, Logstash sends a request to the OpenSearch Service domain and reads the data according to the query you put in the input section. After data is read from OpenSearch Service, you can optionally send it to the next stage, Filter, for transformations such as adding or removing a field from the input data or updating a field with different values. In this example, you won’t use the Filter plugin. Next is the Output plugin. The open-source version of Logstash (Logstash OSS) provides a convenient way to use the bulk API to upload data to your collections. OpenSearch Serverless supports the logstash-output-opensearch output plugin, which supports AWS Identity and Access Management (IAM) credentials for data access control.
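As a quick orientation, here is a minimal skeleton of how those three sections are laid out in a pipeline config (the complete migration config appears later in this post):

```
input {
  # source plugin (for example, logstash-input-opensearch) and its query go here
}

filter {
  # optional transformations; not used in this example
}

output {
  # destination plugin (for example, logstash-output-opensearch) goes here
}
```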
The following diagram illustrates our solution workflow.
Prerequisites
Before getting started, make sure you have completed the following prerequisites:
- Note down your OpenSearch Service domain’s ARN, user name, and password.
- Create an OpenSearch Serverless collection. If you’re new to OpenSearch Serverless, refer to Log analytics the easy way with Amazon OpenSearch Serverless for details on how to set up your collection.
Set up Logstash and the input and output plugins for OpenSearch
Complete the following steps to set up Logstash and your plugins:
- Download logstash-oss-with-opensearch-output-plugin. (This example uses the distro for macos-x64. For other distros, refer to the artifacts.)
- Extract the downloaded tarball (see the commands after this list).
- Update the logstash-output-opensearch plugin to the latest version.
- Install the logstash-input-opensearch plugin.
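The following commands are a sketch of the last three steps. The tarball name and version are placeholders for whichever artifact you downloaded; the logstash-plugin update and install subcommands ship with the Logstash distribution.

```
# Extract the downloaded tarball (file name is a placeholder; use the version you downloaded)
tar -zxvf logstash-oss-with-opensearch-output-plugin-8.4.0-macos-x64.tar.gz
cd logstash-8.4.0

# Update the logstash-output-opensearch plugin to the latest version
bin/logstash-plugin update logstash-output-opensearch

# Install the logstash-input-opensearch plugin
bin/logstash-plugin install logstash-input-opensearch
```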
Test the plugin
Let’s get into action and see how the plugins work. The following config file retrieves data from the movies index in your OpenSearch Service domain and indexes that data in your OpenSearch Serverless collection with the same index name, movies.
Create a new file and add the following content, then save the file as opensearch-serverless-migration.conf. Provide the values for the OpenSearch Service domain endpoint under HOST, USERNAME, and PASSWORD in the input section, and the OpenSearch Serverless collection endpoint details under HOST along with REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY in the output section.
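The following is a sketch of what that config might look like. HOST, USERNAME, PASSWORD, REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY are placeholders, and the output settings (auth_type with type aws_iam and service_name aoss) reflect the logstash-output-opensearch plugin’s documented IAM support for OpenSearch Serverless; check the plugin documentation for the exact options your version supports.

```
input {
  opensearch {
    hosts    => ["https://HOST:443"]           # OpenSearch Service domain endpoint
    user     => "USERNAME"
    password => "PASSWORD"
    index    => "movies"
    query    => '{ "query": { "match_all": {} } }'
  }
}

output {
  opensearch {
    hosts => ["https://HOST:443"]              # OpenSearch Serverless collection endpoint
    index => "movies"
    auth_type => {
      type                  => 'aws_iam'
      aws_access_key_id     => 'AWS_ACCESS_KEY_ID'
      aws_secret_access_key => 'AWS_SECRET_ACCESS_KEY'
      region                => 'REGION'
      service_name          => 'aoss'          # required when the destination is OpenSearch Serverless
    }
  }
}
```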
You can specify a query in the input section of the preceding config. The match_all query matches all data in the movies index. You can change the query if you want to select a subset of the data (see the example after this paragraph). You can also use the query to parallelize the data transfer by running multiple Logstash processes with configs that specify different data slices. You can also parallelize by running Logstash processes against multiple indexes if you have them.
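For example, assuming a hypothetical numeric year field in the movies documents, a range query in the input section would migrate only a subset of the index:

```
query => '{ "query": { "range": { "year": { "gte": 2000 } } } }'
```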
Start Logstash
Use the following command to start Logstash:
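Assuming the config file from the previous section is saved as opensearch-serverless-migration.conf in your Logstash directory, the command looks like this:

```
bin/logstash -f opensearch-serverless-migration.conf
```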
After you run the command, Logstash retrieves the data from the source index in your OpenSearch Service domain and writes it to the destination index in your OpenSearch Serverless collection. When the data transfer is complete, Logstash shuts down.
Verify the data in OpenSearch Serverless
You can verify that Logstash copied all your data by comparing the document count in your domain and your collection. Run the following query either from the Dev tools tab, or with curl, postman, or a similar HTTP client. The following query helps you search all documents from the movies index and returns the top documents along with the count. By default, OpenSearch will return the document count up to a maximum of 10,000. Adding the track_total_hits flag helps you get the exact count of documents if the document count exceeds 10,000.
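The following is a sketch of that query in Dev Tools request format (if you use curl or postman instead, send the same JSON body to the _search endpoint of your collection):

```
GET movies/_search
{
  "query": { "match_all": {} },
  "track_total_hits": true
}
```

Run the same query against both your OpenSearch Service domain and your OpenSearch Serverless collection, and compare the hits.total.value fields in the two responses.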
Conclusion
In this post, you migrated data from your OpenSearch Service domain to your OpenSearch Serverless collection using Logstash’s OpenSearch input and output plugins.
Stay tuned for a series of posts focusing on the various options available for you to build effective log analytics and search solutions using OpenSearch Serverless. You can also refer to the Getting started with Amazon OpenSearch Serverless workshop to learn more about OpenSearch Serverless.
If you have feedback about this post, submit it in the comments section. If you have questions about this post, start a new thread on the Amazon OpenSearch Service forum or contact AWS Support.
About the authors
Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.
Jon Handler (@_searchgeek) is a Sr. Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with the CloudSearch and Elasticsearch teams, providing help and guidance to a broad range of customers who have search workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale, eCommerce search engine.