Log analytics has long been valuable for application and infrastructure monitoring, root-cause analysis, security analytics, and more. The ELK stack is one of the most commonly used open-source log analytics solutions, but while it’s great at the job, it comes with operational overhead. Amazon Elasticsearch Service is a managed option for your ELK stack that’s easy to get started with: in only a few clicks, you can have a fully featured cluster up and ready to index your server logs. Once the service is ready, the next step is getting your logs and application data into the cluster for indexing and search. Most systems use the ‘L’ in the ELK stack for this, which stands for Logstash.

Logstash collects, processes, and forwards data. While it’s most often associated with Elasticsearch, it supports plugins with a variety of capabilities.

In this quick start guide, we’ll install Logstash and configure it to ingest a log and publish it to a pipeline. We’ll start out with a basic example and then finish up by posting the data to the Amazon Elasticsearch Service. This tutorial assumes you’re comfortable with the Linux command line.

Running Logstash on an EC2 Instance

We’re going to install Logstash on an Amazon Elastic Compute Cloud (EC2) instance running a standard Amazon Linux AMI. The easiest way to add software to an Amazon Linux instance is with YUM, and Elastic publishes a package that manages the system dependencies for you. Logstash is a Java application: it requires Java 8 and is not compatible with Java 9 or 10.

The first step to installing Logstash from YUM is to retrieve Elastic’s public key.
[user]$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Next, create a logstash.repo file in /etc/yum.repos.d/ with the following contents:
[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Now your repository is ready for use. Install Logstash with this command:
[user]$ sudo yum install logstash


YUM will retrieve the current version for you. Right now, that’s 6.4.0.

Before you start, you need to make two changes to the current user’s environment. First, you need to add your current user to the logstash group so it can write to the application’s directories for caching messages. The usermod command will do this for you.
[user]$ sudo usermod -a -G logstash ec2-user

Next, if you’re running this tutorial on a micro instance, you may have memory problems. Modify your .bashrc and add this line:
export LS_JAVA_OPTS="-Xms500m -Xmx500m -XX:ParallelGCThreads=1"

This caps the JVM heap at a more modest 500 MB and limits garbage collection to a single thread.

Finally, log out and then log back in to allow the group change to take effect.

Understanding pipeline events

Logstash processes data with event pipelines. A pipeline consists of three stages: inputs, filters, and outputs.

Inputs generate events. They’re produced by one of many Logstash plugins. For example, an event can be a line from a file or a message from a source, such as syslog or Redis.

Filters, which are also provided by plugins, process events. You can configure a filter to structure, change, or drop events. Filters can be tied to conditional expressions and even combined.

Outputs route the events to their final destination. We’re all familiar with Logstash routing events to Elasticsearch, but there are plugins for Amazon CloudWatch, Kafka, Pager Duty, JDBC, and many other destinations.
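Taken together, the three stages correspond to the three top-level sections of a Logstash pipeline configuration file. Here’s a generic sketch (the comments are placeholders, not required plugins):

```
input {
  # where events come from, e.g. stdin, file, syslog
}
filter {
  # optional processing, e.g. grok
}
output {
  # where events go, e.g. stdout, elasticsearch
}
```

The filter section is optional; the simplest possible pipeline has only an input and an output, which is exactly where we’ll start.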

One quick note: this tutorial assumes you’re a beginner. With that, let’s get started.

A basic pipeline

Let’s start by creating the most straightforward pipeline we can. First, create an empty directory called settings to hold your pipeline configuration files.
[user]$ mkdir settings

Now, you need to create a configuration file with a pipeline in it. Create logstash_simple.conf in settings and add this text to it:
input {
  stdin {}
}
output {
  stdout {}
}

Let’s run Logstash.
[user]$ /usr/share/logstash/bin/logstash -f settings/logstash_simple.conf

After a few moments and several lines of log messages, Logstash will print this to the terminal:
The stdin plugin is now waiting for input:

There may be other messages after that one, but as soon as you see this, you can start the test.

Now test your pipeline by entering “Foo!” into the terminal and pressing Enter. Logstash accepts your message as an event and then sends it back to the terminal!


Parsing logs

So, you can see how easy it is to create a pipeline. Let’s create one that reads a log file from a web server.

First, you need to install the web server and start it.
[user]$ sudo yum install httpd

YUM will ask to install several packages. Say yes. Next, start the service.
[user]$ sudo service httpd start

Last, set the permissions on the httpd logs directory so Logstash can read it. We usually create users and set things up more securely, but this will do for now.
[user]$ sudo chmod 755 /var/log/httpd

Now, open another shell and verify that Apache is working with Wget.
[user]$ wget http://localhost/

Apache is running and complaining about access. That’s good enough for what we need.

So, take a quick look at the web access log file.
[user]$ tail /var/log/httpd/access_log

There are the requests in the log. Now, let’s point Logstash at our weblogs. Create a new configuration file named logstash.conf in the settings directory.
input {
  file {
    path => "/var/log/httpd/access_log"
    start_position => "beginning"
  }
}
output {
  stdout {}
}

And run Logstash with this configuration file.
[user]$ /usr/share/logstash/bin/logstash -f settings/logstash.conf

After a few moments, Logstash will start to process the access log. Switch to the other shell and use Wget to generate a few more requests.


Logstash is processing the weblogs!

We used the Logstash file plugin to watch the file.
input {
  file {
    path => "/var/log/httpd/access_log"
    start_position => "beginning"
  }
}

And we pointed it at the web access log. The start_position parameter tells the plugin to start processing from the beginning of the file. We could also use end (the default), and it would start from the end of the file instead, only picking up new entries.

Let’s take a look at the output from Logstash. This log message…
 - - [10/Sep/2018:00:03:20 +0000] "GET / HTTP/1.1" 403 3630 "-" "Wget/1.14 (linux-gnu)"

…was transformed into this:
{
    "@version" => "1",
    "message" => " - - [10/Sep/2018:00:03:20 +0000] \"GET / HTTP/1.1\" 403 3630 \"-\" \"Wget/1.14 (linux-gnu)\"",
    "@timestamp" => 2018-09-10T00:16:21.559Z,
    "path" => "/var/log/httpd/access_log",
    "host" => "ip-172-16-0-155.ec2.internal"
}

We have a handful of fields and a single line with the message in it. What if we want to index our events in parts so we can group them in searches?

Let’s use filters to parse this data before we send it to Elasticsearch.

Filtering the logs

First, we’ll add a filter to our pipeline. Edit the logstash.conf file so it looks like this:
input {
  file {
    path => "/var/log/httpd/access_log*"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{HTTPD_COMMONLOG}" }
  }
}
output {
  stdout {}
}

You’re using the Grok plugin to process the httpd log messages. Restart Logstash and wait for it to log that it’s ready. Then, make another web request. You need to generate a new event since the default behavior is not to process the same message twice. Now, look at the new output for an access log message.
{
    "timestamp" => "10/Sep/2018:00:23:57 +0000",
    "@timestamp" => 2018-09-10T00:23:57.653Z,
    "ident" => "-",
    "path" => "/var/log/httpd/access_log",
    "host" => "ip-172-16-0-155.ec2.internal",
    "auth" => "-",
    "httpversion" => "1.1",
    "bytes" => "3630",
    "request" => "/",
    "@version" => "1",
    "message" => " - - [10/Sep/2018:00:23:57 +0000] \"GET / HTTP/1.1\" 403 3630 \"-\" \"Wget/1.14 (linux-gnu)\"",
    "verb" => "GET",
    "clientip" => "",
    "response" => "403"
}

You have a field for every entry in the log message.

Grok’s primary role is to process input messages and give them structure. In this case, it took a line of text and created an object with a named field for each part of the log entry. The plugin uses patterns to match text in messages. A pattern looks like this: %{SYNTAX:SEMANTIC}. The syntax is the value to match, and the semantic is the name to associate with it.

A syntax can be a data type, such as NUMBER for a numeral or IPORHOST for an IP address or hostname. You used one of Logstash’s core patterns: since processing weblogs is a common task, Logstash defines HTTPD_COMMONLOG for Apache’s access log format.
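For example, a grok filter for a simpler message, say a line like GET /index.html 403, would name each match explicitly. This is a hypothetical fragment just to illustrate the %{SYNTAX:SEMANTIC} form; WORD, NOTSPACE, and NUMBER are all core patterns:

```
filter {
  grok {
    match => { "message" => "%{WORD:verb} %{NOTSPACE:request} %{NUMBER:response}" }
  }
}
```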

If you look at the core pattern definitions for httpd, you can see a list of definitions that demonstrate how patterns are defined and built from one another.

# Log formats
HTTPD_COMMONLOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

HTTPDERROR_DATE is built from the DAY, MONTH, and MONTHDAY patterns, and HTTPD_COMBINEDLOG builds on the HTTPD_COMMONLOG pattern.
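To see how these patterns boil down to regular expressions, here’s a rough Python analogue of HTTPD_COMMONLOG using named groups. Each group plays the part of a %{SYNTAX:SEMANTIC} pair. This is a simplified sketch, not the real grok pattern definitions, and the sample IP address is made up:

```python
import re

# Simplified stand-in for HTTPD_COMMONLOG: each named group corresponds
# to a %{SYNTAX:SEMANTIC} pair, e.g. (?P<response>\d+) ~ %{NUMBER:response}.
COMMONLOG = re.compile(
    r'(?P<clientip>\S*) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d+) (?P<bytes>\d+|-)'
)

line = '203.0.113.9 - - [10/Sep/2018:00:23:57 +0000] "GET / HTTP/1.1" 403 3630'
event = COMMONLOG.match(line).groupdict()
print(event['verb'], event['response'])  # GET 403
```

Grok composes its patterns the same way, by substituting sub-patterns like NUMBER and HTTPDATE into larger expressions and capturing each piece under its semantic name.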

We have a fully processed log message. Let’s publish it to Elasticsearch!

Output pipeline

Amazon’s Elasticsearch Service requires an output plugin that supports AWS’s permissions system. So, we need to install that first.
[user]$ sudo -E /usr/share/logstash/bin/logstash-plugin install logstash-output-amazon_es

The -E flag tells sudo to preserve the environment, so the Java settings we added are passed through to the Logstash plugin tool.

If you haven’t already created an Elasticsearch domain, do that now. Make sure it’s in the same VPC as your EC2 instance.

Now you need to get a set of AWS access keys that can publish to Elasticsearch. There are several ways to configure the plugin. We’ll use a user with access keys.

Go to the user section of the AWS console. Click the add user button. Give the user a name and set the access type to programmatic access.


Now, click the next button on the bottom of the page.

Click attach existing policies directly. Then, use the filter policies search box to find Amazon’s existing AmazonESFullAccess policy. This policy will allow Logstash to create indexes and add records. In production, we would create a custom policy giving the user the access it needs and nothing more.


Then, click next and review the account settings.


Click next again, and we have a user.


Copy the access and secret keys from this page. Now, we can configure Logstash.

Add the amazon_es section to the output section of your config. Leave the stdout section in so you can see what’s going on.
output {
  stdout {}
  amazon_es {
    hosts => ["search-logstash2-gqa3z66kfuvuyk2btbcpckdp5i.us-east-1.es.amazonaws.com"]
    region => "us-east-1"
    aws_access_key_id => 'ACCESS_KEY'
    aws_secret_access_key => 'SECRET_KEY'
    index => "access-logs-%{+YYYY.MM.dd}"
  }
}

We’ve added the keys, set our AWS region, and told Logstash to publish to an index named access-logs- followed by the current date.
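The %{+YYYY.MM.dd} portion is a date-format reference that Logstash expands from each event’s @timestamp, so every day’s events land in their own index. In Python terms, the generated name looks roughly like this (a sketch of the naming scheme, not Logstash’s actual code):

```python
from datetime import datetime, timezone

def index_name(event_time):
    # Mirrors the "access-logs-%{+YYYY.MM.dd}" index setting: one index per day.
    return event_time.strftime("access-logs-%Y.%m.%d")

print(index_name(datetime(2018, 9, 10, tzinfo=timezone.utc)))  # access-logs-2018.09.10
```

Daily indexes like this make it easy to expire old log data by simply deleting the indexes for dates you no longer need.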

Restart the Logstash daemon again.

Now, when Logstash says it’s ready, make a few more web requests. After Logstash logs them to the terminal, check the indexes on your Elasticsearch console. We see that Elasticsearch created the index, and it contains the fields defined in our log messages.


Using Logstash to monitor web access

We installed Logstash from scratch on a new EC2 instance. We configured it to read from standard input and log to standard output. Then we pointed it at web access log files, set a log filter, and finally published web access logs to the Amazon Elasticsearch Service.

Based on this tutorial, you can see how easy it is to use Logstash with the Amazon Elasticsearch Service to monitor your system logs. Get started today!
