LogAnalyzer for Amazon CloudFront
Analyze your Amazon CloudFront Logs using Amazon Elastic MapReduce.
Submitted By: Jai@AWS
Created On: May 04, 2009
Note: This tutorial assumes you are using Hadoop 1.0.3.
Amazon CloudFront is a web service that delivers your content using a global network of edge locations. Amazon CloudFront can be configured to collect access logs by updating the distribution configuration (https://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/HowToUpdateDistribution.html).
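The linked guide describes the original procedure for turning on logging. One possible present-day alternative is the boto3 SDK, which exposes `get_distribution_config` and `update_distribution` on its CloudFront client. The minimal sketch below assumes boto3 and AWS credentials are available. The helper names `with_access_logging` and `enable_access_logging`, along with any bucket and prefix you pass in, are hypothetical and used only for illustration.

```python
import copy


def with_access_logging(distribution_config, bucket, prefix=""):
    """Return a copy of a CloudFront DistributionConfig dict with access logging on.

    `distribution_config` is assumed to be the 'DistributionConfig' value returned by
    boto3's get_distribution_config(); `bucket` is the S3 bucket domain name,
    e.g. 'example-log-bucket.s3.amazonaws.com' (hypothetical).
    """
    updated = copy.deepcopy(distribution_config)  # leave the caller's dict untouched
    updated["Logging"] = {
        "Enabled": True,
        "IncludeCookies": False,
        "Bucket": bucket,
        "Prefix": prefix,
    }
    return updated


def enable_access_logging(distribution_id, bucket, prefix=""):
    """Read, modify and write back a distribution config (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helper above works without boto3

    client = boto3.client("cloudfront")
    response = client.get_distribution_config(Id=distribution_id)
    new_config = with_access_logging(response["DistributionConfig"], bucket, prefix)
    # The ETag from the read is passed as IfMatch so concurrent edits are rejected.
    client.update_distribution(
        DistributionConfig=new_config, Id=distribution_id, IfMatch=response["ETag"]
    )
```

Keeping the dict manipulation in a pure function makes it easy to inspect the new configuration before anything is written back to CloudFront.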
Using Amazon Elastic MapReduce and the LogAnalyzer application, you can generate usage reports containing total traffic volume, object popularity, and a breakdown of traffic by client IP and edge location. Reports are formatted as tab-delimited text files and delivered to the Amazon S3 bucket that you specify.
Amazon CloudFront's access logs provide detailed information about requests made for your content delivered through Amazon CloudFront, AWS's content delivery service. The LogAnalyzer for Amazon CloudFront analyzes the service's raw log files to produce a series of reports that answer business questions commonly asked by content owners.
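As a rough picture of what a raw log line contains, the parser below assumes the standard tab-separated layout of CloudFront web-distribution logs. Under that layout, the leading fields are date, time, edge location, bytes served, client IP, method, host, URI stem and HTTP status, and lines starting with `#` are headers. The `LogRecord` and `parse_line` names are hypothetical. Check the field order against the `#Fields` header of your own logs before relying on it.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """Leading fields of one CloudFront access log line (assumed layout)."""
    date: str
    time: str
    edge_location: str
    sc_bytes: int
    client_ip: str
    method: str
    host: str
    uri: str
    status: int


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one raw log line; return None for headers, blanks or malformed lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None  # '#Version' / '#Fields' headers carry no request
    parts = line.split("\t")
    if len(parts) < len(LogRecord._fields):
        return None  # too few columns to be a request line
    date, time, edge, sc_bytes, ip, method, host, uri, status = parts[:9]
    try:
        return LogRecord(date, time, edge, int(sc_bytes), ip, method, host, uri, int(status))
    except ValueError:
        return None  # non-numeric byte count or status code
```

Trailing fields such as referrer and user agent are ignored, since none of the reports described here need them.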
Source Location | https://github.com/awslabs/emr-sample-apps/tree/master/cloudfront |
Sample Dataset Location | elasticmapreduce/samples/cloudfront/input |
Source License | Apache License, Version 2.0 and GPL Version 2.0 |
Running the Analyzer
The application can be run in two ways: from the AWS Console (https://console.aws.amazon.com/) using the Amazon Elastic MapReduce tab, or with the Amazon Elastic MapReduce Ruby Client.
To run the application from the console, click the "Create New JobFlow" button, select Sample Applications, and choose CloudFront LogAnalyzer (Custom Jar). Click "Continue". In the Jar Arguments textbox replace
If you already have the Ruby Client installed, you can generate reports by running
./elastic-mapreduce --create --jar s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar --args "-input,s3n://elasticmapreduce/samples/cloudfront/input,-output,s3n:///cloudfront/log-reports"
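The `--args` value in the command above is a single comma-joined list of jar arguments. The sketch below builds the same invocation as an argument list, for example to pass to `subprocess.run`. It assumes that the Ruby Client splits `--args` on commas, as the comma-joined example suggests, so an input or output URI must not contain a comma. The function name `logprocessor_command` and any output bucket you supply are hypothetical.

```python
def logprocessor_command(input_uri: str, output_uri: str) -> list[str]:
    """Build the elastic-mapreduce invocation for the LogAnalyzer jar as an argv list."""
    for uri in (input_uri, output_uri):
        if "," in uri:
            # --args is assumed to be split on commas, so a comma would break the URI apart
            raise ValueError(f"URI must not contain a comma: {uri}")
    jar_args = ",".join(["-input", input_uri, "-output", output_uri])
    return [
        "./elastic-mapreduce", "--create",
        "--jar", "s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar",
        "--args", jar_args,
    ]
```

Building an argument list rather than a shell string avoids quoting problems if the command is launched from a script, e.g. `subprocess.run(logprocessor_command(src, dst), check=True)` with the client in the current directory.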
In this command replace
Reports Generated
This sample application produces four sets of reports based on Amazon CloudFront access logs. The Overall Volume Report displays the total amount of traffic delivered by CloudFront over whatever period you specify. The Object Popularity Report shows how many times each of your objects is requested. The Client IP Report shows the traffic from each client IP that made a request for your content. The Edge Location Report shows the total amount of traffic delivered through each edge location. Each report measures traffic in three ways: the total number of requests, the total number of bytes transferred, and the number of requests broken down by HTTP response code.
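The four reports described above share one pattern. Each groups requests by a key (nothing, object, client IP or edge location) and computes the same three measures. The plain-Python sketch below illustrates that pattern over already-parsed records. It is not the LogAnalyzer's Cascading implementation. The record tuple layout, the `REPORT_KEYS` names and the tab-delimited column order are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical record layout: (edge_location, sc_bytes, client_ip, uri, status).
Record = Tuple[str, int, str, str, int]


def build_report(records: Iterable[Record], key: Callable[[Record], Hashable]) -> Dict:
    """Group records by `key`; count requests, sum bytes, count requests per status."""
    report = defaultdict(lambda: {"requests": 0, "bytes": 0, "by_status": Counter()})
    for rec in records:
        row = report[key(rec)]
        row["requests"] += 1
        row["bytes"] += rec[1]
        row["by_status"][rec[4]] += 1
    return dict(report)


# The reports differ only in their grouping key.
REPORT_KEYS = {
    "overall": lambda r: "total",
    "object_popularity": lambda r: r[3],
    "client_ip": lambda r: r[2],
    "edge_location": lambda r: r[0],
}


def build_all_reports(records: Iterable[Record]) -> Dict:
    records = list(records)  # materialize once; each report iterates over it again
    return {name: build_report(records, key) for name, key in REPORT_KEYS.items()}


def to_tsv(report: Dict) -> str:
    """Render one report as tab-delimited text: key, requests, bytes, status:count list."""
    lines = []
    for k in sorted(report, key=str):
        row = report[k]
        statuses = ",".join(f"{code}:{n}" for code, n in sorted(row["by_status"].items()))
        lines.append(f"{k}\t{row['requests']}\t{row['bytes']}\t{statuses}")
    return "\n".join(lines)
```

Because every report is the same aggregation with a different key, adding a new report in this sketch means adding one entry to `REPORT_KEYS`.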
Customizing the Application
The LogAnalyzer is implemented using Cascading (https://www.cascading.org) and is an example of how to construct an Amazon Elastic MapReduce application. To customize the reports generated by the LogAnalyzer, download the source code from this page, then follow the instructions in the README for building the application and uploading it to Amazon S3 for use with Amazon Elastic MapReduce.
How to Run this Application | You can run this application using the AWS Management Console or Command Line Tools |
Sample Input Parameters | -input s3n://elasticmapreduce/samples/cloudfront/input -output s3n:// |
Further Reading | https://aws.amazon.com/cloudfront |