AWS Database Blog

Elasticsearch tutorial: a quick start guide

Elasticsearch has REST API operations for everything—including its indexing capabilities. Besides the REST API, there are AWS SDKs for the most popular development languages. In this guide, we use the REST API so that you can learn about the underlying technology in a language-agnostic way.

Indexing is the core of Elasticsearch. It’s what allows you to perform blazing-fast searches across terabytes of data. But you can’t search data that doesn’t exist. So, in this post, I go over how to create indexes, put data into Elasticsearch, and then search with Elasticsearch using Amazon Elasticsearch Service.

Create an Amazon Elasticsearch Service domain

If you haven’t already done so, be sure to sign up for an AWS account. You can try Amazon Elasticsearch Service using the free tier for the first 12 months when you sign up with a new account, and getting started with Amazon Elasticsearch Service is pretty straightforward.

When your account is ready, create an Amazon Elasticsearch Service domain (a cluster plus its configuration). Provisioning takes about 15 minutes. To get one going, follow the steps in Creating and Configuring Amazon Elasticsearch Service Domains.

There are only a few basic steps to getting an Amazon Elasticsearch Service domain up and running:

  1. Define your domain
  2. Configure your cluster
  3. Set up access
  4. Review

After completing those four steps, you’ll be up and running, and ready to continue this guide. I encourage you to set up a domain now if you haven’t yet. Then launch Kibana so that you can follow along. Kibana is available via a link in your domain overview. To access it, you need to set up the appropriate permissions. To use Amazon Cognito for granting access, see Amazon Cognito Authentication for Kibana.

I’ve gone ahead and given my domain open access because it’s only for demo purposes, and I will tear it down after I’m done with the samples. For anything beyond demo purposes, you definitely need to secure your access points when you do any work with Elasticsearch and Kibana. For the highest level of security, I recommend that you put your domain inside a virtual private cloud (VPC).

After you have an Amazon Elasticsearch Service domain set up, you can get started by putting some data into Amazon Elasticsearch Service. Let’s look at that next.

How to get data into Amazon Elasticsearch Service

In Elasticsearch, data is put into an index as a JSON document. You could explicitly create an index, but there’s no real need for that. Amazon Elasticsearch Service creates an index around the first document you add. This makes it possible to put a document into an index without knowing whether it exists.

Let’s begin the tutorial by putting a document into an index.

Putting a document into an index

The HTTP verb for creating a new resource is PUT, which is what you use to create a new document and index in Amazon Elasticsearch Service. You can use any HTTP tool, such as Postman, curl, or the dev console in Kibana.

Whichever tool you use, make the HTTP call as follows to create an index with a new document:

PUT /vegetables/_doc/1
{
  "name":"carrot",
  "color":"orange"
}

The preceding example assumes that you’re using the dev console in Kibana. If you’re using a different tool, adjust accordingly by providing the full URL and credentials, if necessary. Any way you call it, that endpoint creates an index named vegetables and puts a single document into the index with an ID of 1.
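Outside Kibana, the same call can be made from any HTTP client. As a rough sketch using Python's standard library (the endpoint is a hypothetical placeholder, and auth/signing is omitted), the request could be built like this:

```python
import json
import urllib.request

# Hypothetical endpoint -- substitute your domain's endpoint from the console.
ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"

def build_put_document(index, doc_id, doc):
    """Build (but don't send) a PUT request that indexes one document."""
    return urllib.request.Request(
        url=f"{ENDPOINT}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_put_document("vegetables", 1, {"name": "carrot", "color": "orange"})
# urllib.request.urlopen(req) would send it; add whatever auth your domain requires.
```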

The _doc part is a bit of a legacy that will soon go away completely. It represents the type of the document. In earlier versions, you could have multiple types of documents in the same index. You could have a food index with types like veggies, desserts, and tacos—each with a different structure. Unfortunately, having multiple types in one index hinders search performance, so types are being phased out of Elasticsearch. It’s better to have an index for each type, like this: /veggies/_doc, /desserts/_doc, and /tacos/_doc.

If you would rather have Amazon Elasticsearch Service generate an ID for you, as some other JSON document stores do, it can.

Auto-generated IDs

It’s simple to have Amazon Elasticsearch Service generate an ID for your documents. All you have to do is use a POST instead of a PUT.

POST /veggies/_doc
{
  "name":"beet",
  "color":"red",
  "classification":"root"
}

This call creates an index named veggies and adds the document to the index. It also generates an ID for the document. You might have noticed that you don’t provide anything after _doc in the URL. Normally, an ID would go there. Because you’re creating a document with a generated ID, you don’t provide one yet. That’s reserved for something else—updates.

Updating a document with a POST

Use an HTTP POST with the identifier to update an existing document.

You can create a document with the ID 42, as follows:

POST /veggies/_doc/42
{
  "name":"sugar-beet",
  "color":"red",
  "classification":"bark"
}

Then you use that ID to update the document, like this:

POST /veggies/_doc/42
{
  "name":"sugar-beet",
  "color":"red",
  "classification":"root"
}

This command updates the document with the new classification value “root”. When you try to update a document that does not exist, Amazon Elasticsearch Service creates the document.

Let’s recap the commands so far:

  • PUT creates a document with a specified ID.
  • POST updates the document with the specified ID.
  • POST also creates a document with an auto-generated ID when you don’t provide one.
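Those rules are easy to encode. Here’s a small sketch of the convention (a hypothetical helper, not part of any SDK):

```python
def doc_request(index, doc_id=None, update=False):
    """Return the (HTTP verb, path) pair for a document operation,
    following the recap above: PUT creates with an explicit ID,
    POST updates an existing ID or creates with an auto-generated one."""
    if doc_id is None:
        return ("POST", f"/{index}/_doc")        # auto-generated ID
    verb = "POST" if update else "PUT"           # POST updates, PUT creates
    return (verb, f"/{index}/_doc/{doc_id}")
```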

Now that you understand the basics, we can look at how to get a bunch of data in all at once using the bulk API.

Bulk actions

Using the _bulk API operation, you can perform many actions on one or more indexes in one call. Performing several create, update, and delete actions in a single call speeds up your operations. Here’s the basic formula:

POST /_bulk
<action_meta>\n
<action_data>\n
<action_meta>\n
<action_data>\n

Each action takes two lines of JSON. First, you provide the action description or metadata. Then, on the next line, you have the data. Each part and action is separated by a newline (\n). An action description for an insert might look like the following:

{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "7" } }

And the next line of data might look like this:

{ "name":"kale", "color":"green", "classification":"leafy-green" }

Taken together, the meta and the data represent a single action in a bulk operation. You can send many operations in one call, like the following:

POST /_bulk
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "7" } }
{ "name":"kale", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "8" } }
{ "name":"spinach", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "9" } }
{ "name":"arugula", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "10" } }
{ "name":"endive", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "11" } }
{ "name":"lettuce", "color":"green", "classification":"leafy-green" }
{ "delete" : { "_index" : "vegetables", "_type" : "_doc", "_id" : "1" } }

Notice that the last action is a delete. There’s no data line following a delete action. And because the URL doesn’t specify an index (though it can), each action’s metadata names its own index, so a single bulk call can act on any index in the domain.
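Because each action is just a metadata line plus an optional data line, the bulk body is straightforward to assemble programmatically. A minimal sketch (hypothetical helper, not an official client):

```python
import json

def build_bulk_body(actions):
    """Assemble an NDJSON _bulk body from (meta, data) pairs.
    Pass data=None for actions like delete that carry no document."""
    lines = []
    for meta, data in actions:
        lines.append(json.dumps(meta))
        if data is not None:
            lines.append(json.dumps(data))
    return "\n".join(lines) + "\n"   # the body must end with a newline

body = build_bulk_body([
    ({"create": {"_index": "veggies", "_type": "_doc", "_id": "7"}},
     {"name": "kale", "color": "green", "classification": "leafy-green"}),
    ({"delete": {"_index": "vegetables", "_type": "_doc", "_id": "1"}}, None),
])
```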

Okay, now that you know how to put data into Amazon Elasticsearch Service, let’s move on to searching.

How to search with Amazon Elasticsearch Service

Searching is the main event when it comes to Elasticsearch! Having a lot of data is great, but what good does it do until you actually put it to use? And what better way to start using your data than to search for specific values?

Are you looking for all the root vegetables? Do you need a count of all leafy greens? How about the number of errors logged per hour? The answers all start with an index search.

Let’s take a look at a basic search. Then you can move on to some more advanced searching.

Basic searches

Your basic search looks like the following:

GET /veggies/_search?q=name:l*

This example should bring back a JSON response with the lettuce document.

Advanced searches

You can do some advanced searching by providing the query options as JSON in the request body. Try the following:

GET /veggies/_search
{
  "query": {
    "term": {
      "name": "lettuce"
    }
  }
}

This example should also bring back a JSON response with the lettuce document.

You can do more with this type of query. Let’s try sorting. But first, you need to prep the index. You need to re-create the index because dynamic mapping mapped the string fields as text, a type that can’t be sorted by default. Delete and re-create the index as follows:

DELETE /veggies

PUT /veggies
{ 
  "mappings": { 
    "_doc": { 
      "properties": { 
        "name": { 
          "type": "keyword" 
        }, 
        "color": { 
          "type": "keyword"
        },
        "classification": {
          "type": "keyword"
        }
      }
    }
  }
}
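Since every field here gets the same keyword type, the mappings body can be generated rather than written by hand. A sketch (hypothetical helper; it keeps the pre-7.x _doc type level used in the example above):

```python
def keyword_mappings(fields, doc_type="_doc"):
    """Build a mappings body that maps each named field as a sortable
    keyword, matching the hand-written PUT /veggies body above."""
    props = {field: {"type": "keyword"} for field in fields}
    return {"mappings": {doc_type: {"properties": props}}}

body = keyword_mappings(["name", "color", "classification"])
```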

Then repopulate the index:

POST /_bulk
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "7"  } }
{ "name":"kale", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "8" } }
{ "name":"spinach", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "9" } }
{ "name":"arugula", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "10" } }
{ "name":"endive", "color":"green", "classification":"leafy-green" }
{ "create" : { "_index" : "veggies", "_type" : "_doc", "_id" : "11" } }
{ "name":"lettuce", "color":"green", "classification":"leafy-green" }
{ "delete" : { "_index" : "vegetables", "_type" : "_doc", "_id" : "1" } }

And now, you can search with a sort like this:

GET /veggies/_search
{
  "query" : {
    "term": { "color": "green" }
  },
  "sort" : [
      "classification"
  ]
}

Here, we just added an ascending sort by the classification.
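Building these query bodies in code keeps the JSON consistent. A small sketch covering both the plain term query and the sorted variant (a hypothetical helper, not a client library API):

```python
import json

def term_search(field, value, sort_by=None):
    """Build the JSON body for a term query, with an optional
    ascending sort like the example above."""
    body = {"query": {"term": {field: value}}}
    if sort_by is not None:
        body["sort"] = [sort_by]
    return json.dumps(body)

print(term_search("color", "green", sort_by="classification"))
```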

Now that you know how to search, let’s look at a few ways to get your data that flows through AWS services into your Amazon Elasticsearch Service domains.

How to put bulk and streaming data into Amazon Elasticsearch Service

Now that you know how to search your data, you probably want to try working with massive amounts of your own data. I’m sure you can think of many uses for searching and aggregating your own data. Think of your logs and all the events that occur in your system. Do you have event logs? Event streams? What about data coming in from IoT devices?

This section covers different ways to load streaming data into Amazon Elasticsearch Service. After the data is in, you can start pulling together valuable insights using the search and query APIs that you have already learned about.

We already covered the bulk API, but there’s another way to get data into your Amazon Elasticsearch Service domain: you can connect a stream data source to it. Here’s how that works.

Stream data connections

When you’re running on AWS, you can use your existing data pipelines to feed data into Amazon Elasticsearch Service. There’s a basic pattern for connecting Amazon S3, Amazon Kinesis Data Streams, and Amazon DynamoDB. You use an AWS Lambda function to connect to the source and put the data into Amazon Elasticsearch Service.

Kinesis Data Firehose, Amazon CloudWatch, and AWS IoT have more integrated solutions. Amazon Elasticsearch Service is a destination for these three streams. For example, you would use a rule action to send IoT stream data to an Amazon Elasticsearch Service domain.
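As a rough sketch of the Lambda pattern for Kinesis Data Streams (a hypothetical handler; record decoding follows the Kinesis event format, and the signed HTTP call to the domain is omitted):

```python
import base64
import json

def records_to_bulk(event, index="logs"):
    """Turn a Kinesis event's records into an NDJSON _bulk body.
    Actually sending it to the domain (with SigV4 auth) is left out."""
    lines = []
    for record in event.get("Records", []):
        doc = json.loads(base64.b64decode(record["kinesis"]["data"]))
        lines.append(json.dumps({"index": {"_index": index, "_type": "_doc"}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

def handler(event, context):
    body = records_to_bulk(event)
    # POST body to https://<your-domain-endpoint>/_bulk with
    # Content-Type: application/x-ndjson and signed credentials.
    return {"actions": body.count("\n") // 2}
```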

Conclusion

Whether you’re running your own Elasticsearch clusters or using Amazon Elasticsearch Service domains, you can easily learn how to use the REST API to upload data and perform searches. Your eventual goal should be to get data streams into Elasticsearch, where you can perform interesting analyses. And Kibana gives you some tools to create data visualizations directly from your Elasticsearch data.

There are tons of possibilities waiting for you. You should definitely take a look at what you can do next!


About the Author

Kartavya Jain is a Sr. Product Marketing Manager at Amazon Web Services. He is a hands-on marketing professional who believes in delivering value to customers and the field through results-driven, content-rich marketing.