AWS Big Data Blog
Introducing Terraform support for Amazon OpenSearch Ingestion
Today, we are launching Terraform support for Amazon OpenSearch Ingestion. Terraform is an infrastructure as code (IaC) tool that helps you build, deploy, and manage cloud resources efficiently. OpenSearch Ingestion is a fully managed, serverless data collector that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. In this post, we explain how you can use Terraform to deploy OpenSearch Ingestion pipelines. As an example, we use an HTTP source as input and an index in an Amazon OpenSearch Service domain as output.
Solution overview
The steps in this post deploy a publicly accessible OpenSearch Ingestion pipeline with Terraform, along with the supporting resources that the pipeline needs to ingest data into Amazon OpenSearch Service. They implement the tutorial Ingesting data into a domain using Amazon OpenSearch Ingestion, with Terraform in place of manual setup.
We create the following resources with Terraform:
- Amazon OpenSearch Service domain (publicly accessible)
- AWS Identity and Access Management (IAM) role for the OpenSearch Ingestion pipeline
- Amazon CloudWatch log group
- OpenSearch Ingestion pipeline
The pipeline that you create exposes an HTTP source as input and uses an OpenSearch sink to save batches of events to the domain.
Prerequisites
To follow the steps in this post, you need the following:
- An active AWS account.
- Terraform installed on your local machine. For more information, see Install Terraform.
- The IAM permissions required to create the AWS resources using Terraform.
- awscurl for sending HTTPS requests from the command line with AWS Signature Version 4 (SigV4) authentication. For instructions on installing this tool, see the GitHub repo.
Create a directory
In Terraform, infrastructure is managed as code, organized into a project. A Terraform project contains various Terraform configuration files, such as main.tf, provider.tf, variables.tf, and output.tf. Create a directory on the server or machine that you use to connect to AWS services through the AWS Command Line Interface (AWS CLI), and then change to that directory.
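As a minimal sketch of this step, the following commands create a project directory and change to it. The directory name osis-pipeline-terraform is a hypothetical choice; any name works.

```shell
# Hypothetical project directory name; pick any name you like.
mkdir -p osis-pipeline-terraform

# Work from inside the project directory for the remaining steps.
cd osis-pipeline-terraform
```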
Create the Terraform configuration
Create a file named main.tf to define the AWS resources.
In main.tf, enter the configuration for the resources listed in the solution overview, and save your file.
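To illustrate the shape of such a configuration, the following is a partial sketch that writes a shortened main.tf covering only the pipeline and its output. It is not the complete configuration for this post: the domain, IAM role, and log group are represented by hypothetical input variables (domain_endpoint, pipeline_role_arn, log_group_name) instead of being created as resources, and the Region, pipeline name, ingestion path, and index name are all assumptions chosen for the example.

```shell
# Write an illustrative main.tf. The quoted 'EOF' stops the shell from
# expanding ${...}, so the interpolations are left for Terraform.
cat > main.tf <<'EOF'
# Hypothetical inputs standing in for the domain, IAM role, and log group
# that a full configuration would create as resources.
variable "region" {
  type    = string
  default = "us-east-1" # assumed Region
}

variable "domain_endpoint" {
  type        = string
  description = "OpenSearch Service domain endpoint, without https://"
}

variable "pipeline_role_arn" {
  type        = string
  description = "ARN of the IAM role that the pipeline assumes"
}

variable "log_group_name" {
  type        = string
  description = "CloudWatch log group for pipeline logs"
}

provider "aws" {
  region = var.region
}

resource "aws_osis_pipeline" "example" {
  pipeline_name = "example-pipeline" # hypothetical name
  min_units     = 1
  max_units     = 1

  # Pipeline definition: HTTP source as input, OpenSearch sink as output.
  pipeline_configuration_body = <<-EOT
    version: "2"
    log-pipeline:
      source:
        http:
          path: "/test_ingestion_path"
      sink:
        - opensearch:
            hosts: ["https://${var.domain_endpoint}"]
            index: "application_logs"
            aws:
              sts_role_arn: "${var.pipeline_role_arn}"
              region: "${var.region}"
  EOT

  log_publishing_options {
    is_logging_enabled = true
    cloudwatch_log_destination {
      log_group = var.log_group_name
    }
  }
}

# The pipeline exposes a set of ingestion endpoints; output the first one.
output "ingest_endpoint_url" {
  value = tolist(aws_osis_pipeline.example.ingest_endpoint_urls)[0]
}
EOF
```

With this shortened form, Terraform prompts for the three variables that have no default. A complete configuration would instead define the domain, role, and log group as resources and reference their attributes directly.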
Create the resources
Initialize the directory by running terraform init.
Review the plan by running terraform plan to see what resources will be created.
Apply the configuration by running terraform apply, and answer yes to run the plan.
The process might take around 7–10 minutes to complete.
Test the pipeline
After you create the resources, you should see the ingest_endpoint_url output displayed. Copy this value and export it as an environment variable.
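For example, assuming a hypothetical variable name of OSIS_PIPELINE_ENDPOINT_URL, the export might look like the following. The value shown is a placeholder to replace with your own output.

```shell
# Placeholder value: paste the ingest_endpoint_url output shown by Terraform.
# OSIS_PIPELINE_ENDPOINT_URL is a hypothetical variable name.
export OSIS_PIPELINE_ENDPOINT_URL="<INGEST ENDPOINT URL>"
```

If you prefer not to copy the value by hand, terraform output -raw ingest_endpoint_url prints the same output from the project directory.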
Send a sample log with awscurl. Replace the profile with your appropriate AWS profile for credentials.
You should receive a 200 OK as a response.
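The request can be sketched as a small shell helper around awscurl. The function name send_sample_log, the environment variables it reads, the default Region, the ingestion path, and the sample payload are all assumptions made for illustration; the path must match the one configured in your pipeline's HTTP source.

```shell
# Hypothetical helper: signs a POST with SigV4 for the osis service and
# sends one made-up log event to the pipeline's HTTP source.
send_sample_log() {
  # Fail early if the endpoint has not been exported yet.
  : "${OSIS_PIPELINE_ENDPOINT_URL:?export the ingest endpoint first}"

  # Accept the endpoint with or without a leading https://.
  local host="${OSIS_PIPELINE_ENDPOINT_URL#https://}"
  local region="${AWS_REGION:-us-east-1}"            # assumed Region
  local profile="${AWS_PROFILE:-default}"            # your AWS profile
  local path="${INGEST_PATH:-/test_ingestion_path}"  # assumed source path

  awscurl --service osis --region "$region" --profile "$profile" \
    -X POST \
    -H "Content-Type: application/json" \
    -d '[{"time":"2014-08-11T11:40:13+00:00","status":"404","message":"sample log line"}]' \
    "https://${host}${path}"
}
```

After exporting the endpoint, call send_sample_log to send the event.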
To verify that the data was ingested through the OpenSearch Ingestion pipeline and saved in the OpenSearch Service domain, navigate to the domain and get its domain endpoint. Then query the domain, replacing <OPENSEARCH ENDPOINT URL> with that endpoint.
The response should include the sample log event among its search hits.
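A query along these lines can be sketched with awscurl as well. The helper name search_ingested_logs, the default Region, and the index name application_logs are assumptions; use the index that your pipeline's sink writes to.

```shell
# Hypothetical helper: signs a GET with SigV4 for the es service and runs
# a search against the index that the pipeline writes to.
search_ingested_logs() {
  local region="${AWS_REGION:-us-east-1}"        # assumed Region
  local index="${INDEX_NAME:-application_logs}"  # assumed index name

  # Replace the placeholder with your domain endpoint before running.
  awscurl --service es --region "$region" \
    -X GET \
    "https://<OPENSEARCH ENDPOINT URL>/${index}/_search"
}
```

Call search_ingested_logs after the sample log has been sent. The caller's credentials must be allowed by the domain's access policy for the query to succeed.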
Clean up
To destroy the resources you created, run terraform destroy and answer yes when prompted.
The process might take around 30–35 minutes to complete.
Conclusion
In this post, we showed how you can use Terraform to deploy OpenSearch Ingestion pipelines. AWS offers various resources for you to quickly start building pipelines using OpenSearch Ingestion and use Terraform to deploy them. You can use various built-in pipeline integrations to quickly ingest data from Amazon DynamoDB, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Security Lake, Fluent Bit, and many more. OpenSearch Ingestion blueprints allow you to build data pipelines with minimal configuration changes and manage them with ease using Terraform. To learn more, check out the Terraform documentation for Amazon OpenSearch Ingestion.
About the Authors
Rahul Sharma is a Technical Account Manager at Amazon Web Services. He is passionate about the data technologies that help leverage data as a strategic asset and is based out of New York City, New York.
Farhan Angullia is a Cloud Application Architect at AWS Professional Services, based in Singapore. He primarily focuses on modern applications with microservice software patterns, and advocates for implementing robust CI/CD practices to optimize the software delivery lifecycle for customers. He enjoys contributing to the open source Terraform ecosystem in his spare time.
Arjun Nambiar is a Product Manager with Amazon OpenSearch Service. He focuses on ingestion technologies that enable ingesting data from a wide variety of sources into Amazon OpenSearch Service at scale. Arjun is interested in large-scale distributed systems and cloud-native technologies and is based out of Seattle, Washington.
Muthu Pitchaimani is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.