AWS DevOps & Developer Productivity Blog
From ELK Stack to EKK: Aggregating and Analyzing Apache Logs with Amazon Elasticsearch Service, Amazon Kinesis, and Kibana
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
By Pubali Sen, Shankar Ramachandran
Log aggregation is critical to your operational infrastructure. A reliable, secure, and scalable log aggregation solution makes all the difference during a crunch-time debugging session.
In this post, we explore an alternative to the popular log aggregation solution, the ELK stack (Elasticsearch, Logstash, and Kibana): the EKK stack (Amazon Elasticsearch Service, Amazon Kinesis, and Kibana). The EKK solution eliminates the undifferentiated heavy lifting of deploying, managing, and scaling your log aggregation solution. With the EKK stack, you can focus on analyzing logs and debugging your application, instead of managing and scaling the system that aggregates the logs.
In this blog post, we describe how to use an EKK stack to monitor Apache logs. Let’s look at the components of the EKK solution.
Amazon Elasticsearch Service is a popular search and analytics engine that provides real-time application monitoring and log and clickstream analytics. For this post, you will store and index Apache logs in Amazon ES. As a managed service, Amazon ES is easy to deploy, operate, and scale in the AWS Cloud. Using a managed service also eliminates administrative overhead, like patch management, failure detection, node replacement, backing up, and monitoring. Because Amazon ES includes built-in integration with Kibana, it eliminates installing and configuring that platform. This simplifies your process further. For more information about Amazon ES, see the Amazon Elasticsearch Service detail page.
Amazon Kinesis Agent is an easy-to-install standalone Java software application that collects and sends data. The agent continuously monitors the Apache log file and ships new data to the delivery stream. This agent is also responsible for file rotation, checkpointing, retrying upon failures, and delivering the log data reliably and in a timely manner. For more information, see Writing to Amazon Kinesis Firehose Using Amazon Kinesis Agent or Amazon Kinesis Agent in GitHub.
Amazon Kinesis Firehose provides the easiest way to load streaming data into AWS. In this post, Firehose helps you capture and automatically load the streaming log data to Amazon ES and back it up in Amazon Simple Storage Service (Amazon S3). For more information, see the Amazon Kinesis Firehose detail page.
You’ll provision an EKK stack by using an AWS CloudFormation template. The template provisions an Apache web server and sends the Apache access logs to an Amazon ES cluster using Amazon Kinesis Agent and Firehose. You’ll back up the logs to an S3 bucket. To see the logs, you’ll leverage the Amazon ES Kibana endpoint.
By using the template, you can quickly complete the following tasks:
· Provision an Amazon ES cluster.
· Provision an Amazon Elastic Compute Cloud (Amazon EC2) instance.
· Install Apache HTTP Server version 2.4.23.
· Install the Amazon Kinesis Agent on the web server.
· Provision an Elastic Load Balancing load balancer.
· Create the Amazon ES index and the associated log mappings.
· Create an Amazon Kinesis Firehose delivery stream.
· Create all AWS Identity and Access Management (IAM) roles and policies. For example, the Firehose delivery stream backs up the Apache logs to an S3 bucket. This requires that the Firehose delivery stream be associated with a role that gives it permission to upload the logs to the correct S3 bucket.
· Configure Amazon CloudWatch Logs log streams and log groups for the Firehose delivery stream. This helps you to troubleshoot when the log events don’t reach their destination.
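To make the IAM item above more concrete, the following Python sketch builds the two policy documents that a Firehose delivery role typically needs for S3 backup: a trust policy that lets the Firehose service assume the role, and a permissions policy scoped to one bucket. This is an illustration, not the policy from the template: the bucket name `my-ekk-log-backup` and the function names are hypothetical, and the exact action list in the template may differ.

```python
import json


def firehose_trust_policy():
    """Trust policy that lets the Firehose service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "firehose.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def firehose_s3_backup_policy(bucket_name):
    """Permissions policy that allows uploads to a single S3 bucket.

    bucket_name is a placeholder; use the bucket from the prerequisites.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:PutObject",
                ],
                # Bucket-level actions use the bucket ARN,
                # object-level actions use the "/*" ARN.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-ekk-log-backup" is a hypothetical bucket name.
    print(json.dumps(firehose_trust_policy(), indent=2))
    print(json.dumps(firehose_s3_backup_policy("my-ekk-log-backup"), indent=2))
```

Scoping the resource to one bucket, rather than `*`, is what the post means by the "correct S3 bucket".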
EKK Stack Architecture
The following architecture diagram shows how an EKK stack works.
Prerequisites
To build the EKK stack, you must have the following:
· An Amazon EC2 key pair in the US West (Oregon) Region. If you don’t have one, create one.
· An S3 bucket in the US West (Oregon) Region. If you don’t have one, create one.
· A default VPC in the US West (Oregon) Region. If you have deleted the default VPC, request one.
· Administrator-level permissions in IAM to enable Amazon ES and Amazon S3 to receive the log data from the EC2 instance through Firehose.
Getting Started
Begin by launching the AWS CloudFormation template to create the stack.
1. In the AWS CloudFormation console, choose to launch the AWS CloudFormation template. Make sure that you are in the US West (Oregon) Region.
Note: If you want to download the template to your computer and then upload it to AWS CloudFormation, you can do so from this Amazon S3 bucket. Save the template to a location on your computer that’s easy to remember.
2. Choose Next.
3. On the Specify Details page, provide the following:
a) Stack Name: A name for your stack.
b) InstanceType: Select the instance family for the EC2 instance hosting the web server.
c) KeyName: Select the Amazon EC2 key pair in the US West (Oregon) Region.
d) SSHLocation: The IP address range that can be used to connect to the EC2 instance by using SSH. Accept the default, 0.0.0.0/0.
e) WebserverPort: The TCP/IP port of the web server. Accept the default, 80.
4. Choose Next.
5. On the Options page, optionally specify tags for your AWS CloudFormation template, and then choose Next.
6. On the Review page, review your template details. Select the Acknowledgement checkbox, and then choose Create to create the stack.
It takes about 10-15 minutes to create the entire stack.
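If you prefer to script the stack creation instead of using the console, the parameters from step 3 can be written in the JSON list format that the AWS CLI accepts for `aws cloudformation create-stack --parameters`. The following Python sketch builds that list and does some basic validation first. The parameter keys come from the Specify Details page above; the instance type `t2.micro` and key pair name `my-oregon-key` are hypothetical values, and the function name is ours.

```python
import ipaddress
import json


def build_stack_parameters(instance_type, key_name,
                           ssh_location="0.0.0.0/0", webserver_port=80):
    """Return template parameters in the AWS CLI list-of-dicts format."""
    # Raises ValueError if ssh_location is not a valid CIDR range.
    ipaddress.ip_network(ssh_location, strict=False)

    port = int(webserver_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"WebserverPort out of range: {port}")

    values = {
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SSHLocation": ssh_location,
        "WebserverPort": str(port),
    }
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute your own instance type and key pair.
    params = build_stack_parameters("t2.micro", "my-oregon-key")
    print(json.dumps(params, indent=2))
```

You could save the printed JSON to a file and pass it to the CLI with `--parameters file://params.json`, together with the template location and the IAM capabilities acknowledgement.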
Configure the Amazon Kinesis Agent
After AWS CloudFormation has created the stack, configure the Amazon Kinesis Agent.
1. In the AWS CloudFormation console, choose the Resources tab to find the Firehose delivery stream name. Record this value because you need it to configure the agent in step 4.
2. On the Outputs tab, find and record the public IP address of the web server. You need it to connect to the web server using SSH to configure the agent. For instructions on how to connect to an EC2 instance using SSH, see Connecting to Your Linux Instance Using SSH.
3. On the web server’s command line, run the following command:
sudo vi /etc/aws-kinesis/agent.json
This command opens the configuration file, agent.json, as follows.
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "firehose.us-west-2.amazonaws.com",
  "awsAccessKeyId": "",
  "awsSecretAccessKey": "",
  "flows": [
    {
      "filePattern": "/var/log/httpd/access_log",
      "deliveryStream": "",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMMONAPACHELOG"
        }
      ]
    }
  ]
}
4. For the deliveryStream key, type the value of the KinesisFirehoseDeliveryName that you retrieved from the stack’s Resources tab. After you type the value, save the agent.json file and exit the editor.
5. Run the following command on the web server’s command line to restart the agent so that it picks up the new configuration:
sudo service aws-kinesis-agent restart
6. In the AWS CloudFormation console, choose the Resources tab, and note the name of the Amazon ES cluster that corresponds to the logical ID ESDomain.
7. Go to the AWS Management Console, and choose Amazon Elasticsearch Service. Under My Domains, you can see the Amazon ES domain that the AWS CloudFormation template created.
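As an alternative to editing agent.json by hand in step 4, the deliveryStream value can be set programmatically. The following Python sketch shows the idea on the configuration text from step 3. It is an illustration only: the stream name `my-ekk-delivery-stream` and the function name are hypothetical, and writing the result back to /etc/aws-kinesis/agent.json would require root privileges followed by the agent restart from step 5.

```python
import json

# Configuration as shown in step 3, with an empty deliveryStream.
SAMPLE_AGENT_CONFIG = """
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "firehose.us-west-2.amazonaws.com",
  "awsAccessKeyId": "",
  "awsSecretAccessKey": "",
  "flows": [
    {
      "filePattern": "/var/log/httpd/access_log",
      "deliveryStream": "",
      "dataProcessingOptions": [
        {"optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG"}
      ]
    }
  ]
}
"""


def set_delivery_stream(config_text, stream_name, flow_index=0):
    """Return the agent config text with deliveryStream set on one flow."""
    if not stream_name:
        raise ValueError("stream_name must not be empty")
    config = json.loads(config_text)
    # Only the chosen flow is changed; every other key is left as-is.
    config["flows"][flow_index]["deliveryStream"] = stream_name
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Hypothetical name; use the value from the stack's Resources tab.
    print(set_delivery_stream(SAMPLE_AGENT_CONFIG, "my-ekk-delivery-stream"))
```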
Configure Kibana and View Your Apache Logs
Amazon ES provides a default installation of Kibana with every Amazon ES domain. You can find the Kibana endpoint on your domain dashboard in the Amazon ES console.
1. In the Amazon ES console, choose the Kibana endpoint.
2. In Kibana, for Index name or pattern, type logmonitor. This is the name of the Amazon ES index that the AWS CloudFormation template created for the web server access logs. The health checks from Elastic Load Balancing generate access logs on the web server, which flow through the EKK pipeline to Kibana for discovery and visualization.
3. In Time-field name, select datetime.
4. On the Kibana console, choose the Discover tab to see the Apache logs.
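The documents that appear on the Discover tab are JSON records that the agent builds from raw access log lines, because agent.json sets the LOGTOJSON option with the COMMONAPACHELOG format. The following Python sketch approximates that conversion so you can see where fields such as datetime come from. It is not the agent's actual implementation: the regular expression is our own, the field names follow the agent's documented output for this format as we understand it, and the sample log line is made up.

```python
import json
import re

# Approximation of the Apache Common Log Format:
# host ident authuser [datetime] "request" response bytes
COMMON_LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+)$'
)


def common_log_to_record(line):
    """Convert one access log line to a dict, or None if it doesn't match."""
    match = COMMON_LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # Apache writes "-" for missing values; represent those as None.
    return {
        name: (None if value == "-" else value)
        for name, value in match.groupdict().items()
    }


if __name__ == "__main__":
    # Hypothetical log line for illustration.
    sample = ('192.168.0.10 - - [10/Oct/2016:13:55:36 -0700] '
              '"GET /index.html HTTP/1.1" 200 2326')
    print(json.dumps(common_log_to_record(sample)))
```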
Use Kibana to visualize the log data by creating bar charts, line and scatter plots, histograms, pie charts, etc.
Pie chart of IP addresses accessing the web server in the last 30 days
Bar chart of IP addresses accessing the web server in the last 5 minutes
You can graph information about HTTP responses, bytes, or IP addresses to gain meaningful insights from the Apache logs. Kibana also lets you build dashboards by combining graphs.
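Behind a chart such as the IP address bar chart, Kibana issues an aggregation query against the index. The following Python sketch builds a comparable Elasticsearch query body: a terms aggregation over client addresses, restricted to a recent time window. Treat it as an assumption-laden example. We assume the client address is stored in a field named host and the timestamp in datetime, and that host is mapped so that terms aggregations work on it; the mappings in the template may differ.

```python
import json


def top_clients_query(minutes=5, top_n=10,
                      ip_field="host", time_field="datetime"):
    """Build a query body that counts requests per client address."""
    return {
        # size 0: return only the aggregation, not individual documents.
        "size": 0,
        "query": {
            "range": {time_field: {"gte": f"now-{minutes}m"}}
        },
        "aggs": {
            "top_clients": {
                "terms": {"field": ip_field, "size": top_n}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_clients_query(), indent=2))
```

You could send a body like this to the `_search` endpoint of the logmonitor index to get the same counts outside Kibana.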
Monitor Your Log Aggregator
To monitor the Firehose delivery stream, navigate to the Firehose console. Choose the stream, and then choose the Monitoring tab to see the Amazon CloudWatch metrics for the stream.
When log delivery fails, the Amazon S3 and Amazon ES logs help you troubleshoot. For example, the following screenshot shows logs when delivery to an Amazon ES destination fails because the date mapping on the index was not in line with the ingest log.
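A date mapping failure like this one usually means that the timestamp string in the log record does not match the format that the index expects. The following Python sketch shows one way to check timestamps before they are shipped. It assumes the index expects the standard Apache access log layout (for example, 10/Oct/2016:13:55:36 -0700); the actual date format in the template's index mapping is not shown in this post, so adjust the format string to match your mapping.

```python
from datetime import datetime

# Assumed layout of the Apache access log timestamp:
# day/month-abbreviation/year:hour:minute:second UTC-offset
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_datetime(value):
    """Return a timezone-aware datetime, or None if the layout differs."""
    try:
        return datetime.strptime(value, APACHE_TIME_FORMAT)
    except ValueError:
        return None


if __name__ == "__main__":
    # Hypothetical timestamps: the first matches the layout, the second doesn't.
    for candidate in ("10/Oct/2016:13:55:36 -0700", "2016-10-10T13:55:36Z"):
        status = "ok" if parse_apache_datetime(candidate) else "mismatch"
        print(f"{candidate}: {status}")
```

Note that `%b` depends on the process locale, so this check assumes English month abbreviations.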
Conclusion
In this post, we showed how to ship Apache logs to Kibana by using Amazon Kinesis Agent, Amazon ES, and Firehose. It’s worth pointing out that Firehose automatically scales up or down based on the rate at which your application generates logs. To learn more about scaling Amazon ES clusters, see the Amazon Elasticsearch Service Developer Guide.
Managed services like Amazon ES and Amazon Kinesis Firehose simplify provisioning and managing a log aggregation system. The ability to run SQL queries against your streaming log data using Amazon Kinesis Analytics further strengthens the case for using an EKK stack. The AWS CloudFormation template used in this post is available to extend and build your own EKK stack.