How do I process CloudFront logs with Amazon ES?

Last updated: 2019-05-22

I want to build custom reports using the access logs from my Amazon CloudFront distribution. How can I process CloudFront logs with Amazon Elasticsearch Service (Amazon ES) so that I can create custom reports?

Short Description

Follow these steps to process CloudFront logs using Amazon ES:

1.    Create an Amazon ES domain and an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region.

2.    Configure your CloudFront distribution to store access logs in the Amazon S3 bucket.

3.    Configure an Amazon Elastic Compute Cloud (Amazon EC2) instance to use Logstash to process the CloudFront logs and push them to the Amazon ES domain.

Resolution

Create an Amazon ES domain and an Amazon S3 bucket in the same AWS Region

To process CloudFront logs using Amazon ES, first create these resources in the same AWS Region:

An Amazon ES domain
An Amazon S3 bucket
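
If you prefer the AWS CLI, commands similar to the following create both resources. The bucket name, domain name, Region, and domain sizing shown here are placeholder examples:

# Create the S3 bucket that will receive the CloudFront access logs.
aws s3 mb s3://my-cloudfront-logs --region us-east-1

# Create a small Amazon ES domain running Elasticsearch 5.5.
aws es create-elasticsearch-domain \
  --domain-name my-es-domain \
  --elasticsearch-version 5.5 \
  --elasticsearch-cluster-config InstanceType=t2.small.elasticsearch,InstanceCount=1 \
  --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=10 \
  --region us-east-1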

Configure your CloudFront distribution to store access logs in the Amazon S3 bucket

1.    Open the CloudFront console.

2.    Select your CloudFront distribution, and then choose Distribution Settings.

3.    In the General tab, choose Edit, and then do the following:
For Logging, select On.
For Bucket for Logs, select the S3 bucket that's in the same AWS Region as your Amazon ES domain.
For Log Prefix, enter a prefix for the names of the logs.

4.    Choose Yes, Edit.

Note: It might take up to 24 hours for log files to be delivered to the S3 bucket.
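
To confirm that log files are arriving, you can list the bucket contents with the AWS CLI. The bucket name and prefix here are placeholders:

aws s3 ls s3://my-cloudfront-logs/my-log-prefix/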

Configure an Amazon EC2 instance to use Logstash to process the CloudFront logs and then push them to the Amazon ES domain

1.    Launch an Amazon EC2 instance.
Note: This instance must use an AWS Identity and Access Management (IAM) role that has access to Amazon S3 (GET object) and Amazon ES (PUT document).
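
A minimal identity policy for that role might look like the following. The bucket name, account ID, and domain name are placeholders, and s3:ListBucket is included because the Logstash S3 input plugin lists objects before it reads them:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-cloudfront-logs",
        "arn:aws:s3:::my-cloudfront-logs/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["es:ESHttpPut", "es:ESHttpPost"],
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-es-domain/*"
    }
  ]
}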

2.    Connect to the instance using SSH.

3.    Install Java 8 on the instance. Logstash 5.5 requires Java 8 to run.
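
For example, on Amazon Linux you can install OpenJDK 8 from the default repositories:

sudo yum install -y java-1.8.0-openjdk
java -version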

4.    Run this command to download Logstash 5.5.0 to the instance:

wget https://artifacts.elastic.co/downloads/logstash/logstash-5.5.0.tar.gz

5.    Run this command to extract the Logstash archive:

tar xvf logstash-5.5.0.tar.gz

6.    Run this command to install the Logstash plugin for Amazon ES:

cd logstash-5.5.0
bin/logstash-plugin install logstash-output-amazon_es

7.    Create a JSON-formatted file to serve as the template for the Logstash output. The template file can be similar to the following:
Note: Be sure to modify the template according to your reporting requirements.

cloudfront.template.json

{
  "template": "cloudfront-logs-*",
  "mappings": {
    "logs": {
      "_source": {
        "enabled": false
      },
      "_all": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "mapping": {
              "index": "not_analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ],
      "properties": {
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "float"
            },
            "longitude": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}
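
Note: JSON files can't contain comments, so keep the template file strictly JSON. You can validate the file before using it (Python, which is preinstalled on Amazon Linux, includes a simple checker):

python -m json.tool cloudfront.template.json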

8.    Create a Logstash configuration file to define the S3 bucket with CloudFront logs as the input, and the Amazon ES domain as the output. The configuration file can be similar to the following:

cloudfront.conf

#cloudfront.conf
input {
  s3 {
    bucket => "{CLOUDFRONT_LOG_BUCKET}"
    prefix => "{CLOUDFRONT_LOG_KEY_PREFIX}"
    region => "{BUCKET_REGION_NAME}"
  }
}


filter {
  grok {
    # Parse the tab-separated fields of the CloudFront access log format.
    match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t(?<x_edge_location>\b[\w\-]+\b)\t(?:%{NUMBER:sc_bytes:int}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status:int}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t%{GREEDYDATA:cs_uri_query}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes:int}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
  }

  mutate {
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }

  date {
    # DATE_EU captures the two-digit-year portion of the log date (for
    # example, 19-05-22 from 2019-05-22), so yy-MM-dd is the matching format.
    match => [ "listener_timestamp", "yy-MM-dd HH:mm:ss" ]
    target => "@timestamp"
  }

  geoip {
    source => "c_ip"
  }

  useragent {
    source => "User_Agent"
    target => "useragent"
  }

  mutate {
    # The s3 input adds cloudfront_version and cloudfront_fields when it
    # reads CloudFront logs; remove them along with the scratch fields.
    remove_field => ["date", "time", "listener_timestamp", "cloudfront_version", "message", "cloudfront_fields", "User_Agent"]
  }
}

output {
  amazon_es {
    hosts => ["{AMAZON_ES_DOMAIN_ENDPOINT}"]
    region => "{AMAZON_ES_DOMAIN_REGION_NAME}"
    index => "cloudfront-logs-%{+YYYY.MM.dd}"
    template => "/path-to-file/cloudfront.template.json"
  }
}

9.    Use a text editor such as vi to edit the following values in your configuration file:
For bucket, enter the name of the S3 bucket that stores the CloudFront logs.
For prefix, enter the prefix that you specified as the Log Prefix when you enabled logging on your CloudFront distribution.
For region, enter the AWS Region of the S3 bucket and Amazon ES domain.
For hosts, enter the endpoint of your Amazon ES domain.
For template, enter the path to the template file that you created.
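
For example, with hypothetical values filled in, the input and output sections might look like this:

input {
  s3 {
    bucket => "my-cloudfront-logs"
    prefix => "my-log-prefix/"
    region => "us-east-1"
  }
}

output {
  amazon_es {
    hosts => ["search-my-es-domain-abc123xyz.us-east-1.es.amazonaws.com"]
    region => "us-east-1"
    index => "cloudfront-logs-%{+YYYY.MM.dd}"
    template => "/home/ec2-user/cloudfront.template.json"
  }
}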

10.    Save the changes that you made to the configuration file.

11.    Run Logstash with the -f option, and specify the configuration file that you created. For more information, see Command-Line Flags on the Elastic website.
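
For example, from the logstash-5.5.0 directory, assuming that cloudfront.conf is saved there:

bin/logstash -f cloudfront.conf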

After you complete these steps, Logstash publishes documents to the Amazon ES domain that you specified. To check that documents are published successfully, open your Amazon ES domain from the Amazon ES console, and then check the Indices view.
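
You can also check from the command line by querying the domain's _cat/indices API. The endpoint here is a placeholder, and this assumes that your domain's access policy allows the request:

curl -XGET 'https://search-my-es-domain-abc123xyz.us-east-1.es.amazonaws.com/_cat/indices?v'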

You can now use Kibana to create custom reports and visualizations for your logs. For more information, see Kibana and Logstash.

Note: Be sure that you have an access policy that allows Kibana to access the logs stored in your Amazon ES domain.
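
For example, an IP-based access policy similar to the following allows unsigned requests, including Kibana traffic, from a specific address range. The account ID, domain name, and CIDR range are placeholders; scope them as narrowly as possible:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-es-domain/*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": "203.0.113.0/24" }
      }
    }
  ]
}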

