LogAnalyzer for Amazon CloudFront

Analyze your Amazon CloudFront logs using Amazon Elastic MapReduce.

Details

Submitted By: Jai@AWS
Created On: May 4, 2009 2:39 PM GMT
Last Updated: March 20, 2014 9:39 PM GMT

Note: This tutorial assumes you are using Hadoop 1.0.3.

Amazon CloudFront is a web service that delivers your content using a global network of edge locations. Amazon CloudFront can be configured to collect access logs by updating the distribution configuration (http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/HowToUpdateDistribution.html).

Using Amazon Elastic MapReduce and the LogAnalyzer application, you can generate usage reports containing total traffic volume, object popularity, and a breakdown of traffic by client IP address and by edge location. Reports are formatted as tab-delimited text files and delivered to the Amazon S3 bucket that you specify.

Amazon CloudFront's access logs provide detailed information about requests made for your content delivered through Amazon CloudFront, AWS's content delivery service. The LogAnalyzer for Amazon CloudFront analyzes the service's raw log files to produce a series of reports that answer business questions commonly asked by content owners.
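To show what the raw input looks like, here is a minimal Java sketch, not taken from the LogAnalyzer source, that parses one line of a CloudFront access log into the fields the reports need. It assumes the original tab-delimited web distribution log format, in which header lines start with # and each record begins with date, time, x-edge-location, sc-bytes, c-ip, cs-method, cs(Host), cs-uri-stem, sc-status, cs(Referer), cs(User-Agent), and cs-uri-query. The class names LogRecord and LogLineParser and the sample values are hypothetical.

```java
import java.util.Optional;

/** Hypothetical holder for the log fields that the four reports use. */
final class LogRecord {
    final String date;          // date, e.g. 2009-05-04 (UTC)
    final String time;          // time, e.g. 14:39:00 (UTC)
    final String edgeLocation;  // x-edge-location
    final long bytes;           // sc-bytes
    final String clientIp;      // c-ip
    final String uriStem;       // cs-uri-stem, i.e. the requested object
    final int status;           // sc-status

    LogRecord(String date, String time, String edgeLocation, long bytes,
              String clientIp, String uriStem, int status) {
        this.date = date;
        this.time = time;
        this.edgeLocation = edgeLocation;
        this.bytes = bytes;
        this.clientIp = clientIp;
        this.uriStem = uriStem;
        this.status = status;
    }
}

final class LogLineParser {
    // Assumed 0-based column positions in the tab-delimited record.
    private static final int DATE = 0, TIME = 1, EDGE = 2, BYTES = 3,
            CLIENT_IP = 4, URI_STEM = 7, STATUS = 8;

    /** Returns empty for header lines (#Version, #Fields) and for malformed lines. */
    static Optional<LogRecord> parse(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return Optional.empty();
        }
        String[] f = line.split("\t");
        if (f.length <= STATUS) {
            return Optional.empty(); // too few columns to reach sc-status
        }
        try {
            return Optional.of(new LogRecord(f[DATE], f[TIME], f[EDGE],
                    Long.parseLong(f[BYTES]), f[CLIENT_IP], f[URI_STEM],
                    Integer.parseInt(f[STATUS])));
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. "-" in a numeric column
        }
    }

    public static void main(String[] args) {
        // Made-up sample record using documentation-only IP and host values.
        String line = String.join("\t", "2009-05-04", "14:39:00", "IAD2", "2048",
                "192.0.2.10", "GET", "d111111abcdef8.cloudfront.net",
                "/images/logo.png", "200", "-", "Mozilla/5.0", "-");
        parse(line).ifPresent(r -> System.out.println(
                r.edgeLocation + " " + r.uriStem + " " + r.bytes + " " + r.status));
    }
}
```

Running main prints IAD2 /images/logo.png 2048 200. Returning Optional.empty() for header and malformed lines lets a caller skip them instead of failing the whole job on one bad record.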

Source Location on Amazon S3: elasticmapreduce/samples/cloudfront/code/cloudfront-loganalyzer.tgz
Compiled JAR Location: elasticmapreduce/samples/cloudfront/logprocessor.jar
Sample Dataset Location: elasticmapreduce/samples/cloudfront/input
Source License: Apache License, Version 2.0 and GPL Version 2.0

Running the Analyzer

You can run the application in two ways: from the AWS Management Console (http://console.aws.amazon.com/) using the Amazon Elastic MapReduce tab, or with the Amazon Elastic MapReduce Ruby Client.

To run the application using the console:
1. Click the "Create New JobFlow" button, select Sample Applications, and choose CloudFront LogAnalyzer (Custom Jar). Click "Continue".
2. In the Jar Arguments text box, replace <yourbucket> with the name of the Amazon S3 bucket in which you want the generated reports to be placed. Make sure that the output path doesn't already exist in your S3 bucket; otherwise, your job will fail. Click "Continue".
3. Choose the number of instances to use, and then click "Continue".
4. Review your parameters and click "Create Job Flow" to launch the application.

After the job flow has finished, your reports should be available in the Amazon S3 bucket that you provided.

If you already have the Ruby Client installed, you can generate reports by running:

./elastic-mapreduce --create \
  --jar s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar \
  --args "-input,s3n://elasticmapreduce/samples/cloudfront/input,-output,s3n://<yourbucket>/cloudfront/log-reports"

In this command, replace <yourbucket> with the name of the Amazon S3 bucket in which you want the generated reports to be placed. Make sure that the output path doesn't already exist in your S3 bucket; otherwise, your job will fail.
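The --args option takes a single comma-separated string, and the Ruby Client passes each element to the JAR as a separate argument, so every flag and every value is its own element. If you want to pass more of the application's parameters (for example -timeBucket and 300), it can help to assemble the string programmatically. The following Java sketch does that; the class name ArgsBuilder is hypothetical, and rejecting values that contain commas is a precaution of this sketch, since such a value would be split into two arguments.

```java
import java.util.Arrays;
import java.util.List;

final class ArgsBuilder {
    /**
     * Joins JAR arguments into the comma-separated form used by --args.
     * A value containing a comma would be split in two, so it is rejected here.
     */
    static String toArgsOption(List<String> jarArgs) {
        for (String a : jarArgs) {
            if (a.contains(",")) {
                throw new IllegalArgumentException("argument contains a comma: " + a);
            }
        }
        return String.join(",", jarArgs);
    }

    public static void main(String[] args) {
        List<String> jarArgs = Arrays.asList(
                "-input", "s3n://elasticmapreduce/samples/cloudfront/input",
                "-output", "s3n://<yourbucket>/cloudfront/log-reports");
        // Prints the --args value used in the command above.
        System.out.println("--args \"" + toArgsOption(jarArgs) + "\"");
    }
}
```

Running main prints the same --args value as the sample command, which makes it easy to check the result before appending further parameters.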

Reports Generated

This sample application produces four sets of reports based on Amazon CloudFront access logs. The Overall Volume Report displays the total amount of traffic delivered by CloudFront over the period you specify. The Object Popularity Report shows how many times each of your objects is requested. The Client IP Report shows the traffic from each client IP address that made a request for your content. The Edge Location Report shows the total amount of traffic delivered through each edge location. Each report measures traffic in three ways: the total number of requests, the total number of bytes transferred, and the number of requests broken down by HTTP response code.
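Each report is essentially a group-and-sum over log records. The following plain-Java sketch, which is not the Cascading code LogAnalyzer actually uses, keeps the three measures described in the previous paragraph (request count, bytes transferred, and requests per HTTP status code) for each key. The key would be the edge location, the client IP, or the object path, depending on the report. The class names and the tab-delimited output layout are assumptions for illustration, not the real report format.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-key totals: requests, bytes, and requests per HTTP status code. */
final class TrafficTotals {
    long requests;
    long bytes;
    final Map<Integer, Long> requestsByStatus = new TreeMap<>();

    void add(long responseBytes, int status) {
        requests++;
        bytes += responseBytes;
        requestsByStatus.merge(status, 1L, Long::sum);
    }
}

final class ReportAggregator {
    // Key is whatever the report groups on: edge location, client IP, object path, ...
    private final Map<String, TrafficTotals> totals = new TreeMap<>();

    void record(String key, long responseBytes, int status) {
        totals.computeIfAbsent(key, k -> new TrafficTotals()).add(responseBytes, status);
    }

    /** One tab-delimited line per key: key, requests, bytes, then status=count pairs. */
    String toTsv() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, TrafficTotals> e : totals.entrySet()) {
            TrafficTotals t = e.getValue();
            sb.append(e.getKey()).append('\t').append(t.requests).append('\t').append(t.bytes);
            for (Map.Entry<Integer, Long> s : t.requestsByStatus.entrySet()) {
                sb.append('\t').append(s.getKey()).append('=').append(s.getValue());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ReportAggregator byEdge = new ReportAggregator(); // made-up edge location codes
        byEdge.record("IAD2", 2048, 200);
        byEdge.record("IAD2", 0, 304);
        byEdge.record("LHR3", 512, 200);
        System.out.print(byEdge.toTsv());
    }
}
```

Running main prints one line for IAD2 (2 requests, 2048 bytes, 200=1, 304=1) and one for LHR3 (1 request, 512 bytes, 200=1). For the Overall Volume Report, a constant key or a time-bucket key would play the same role. In a MapReduce job the same logic is split across phases: the map side emits (key, bytes, status) for each record and the reduce side sums them per key, which is what lets the work spread across a cluster.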

Customizing the Application

The LogAnalyzer is implemented using Cascading (http://www.cascading.org) and is an example of how to construct an Amazon Elastic MapReduce application. To customize the reports generated by the LogAnalyzer, download the source code from the Amazon S3 source location listed above. Follow the instructions in the README to build the application and upload it to Amazon S3 for use with Amazon Elastic MapReduce.

How to Run this Application: You can run this application using the AWS Management Console or the command line tools.
Sample Input Parameters:
  -input s3n://elasticmapreduce/samples/cloudfront/input
  -output s3n://<yourbucket>/<output prefix>
  -start any
  -end <Current date in YYYY-MM-dd-HH format>
  -timeBucket 300
  -overallVolumeReport
  -objectPopularityReport
  -clientIPReport
  -edgeLocationReport
Further Reading: http://aws.amazon.com/cloudfront
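The -start, -end, and -timeBucket parameters suggest that records are filtered to a time window and grouped into fixed-width intervals. The following Java sketch shows one way to do that under stated assumptions: that -timeBucket is a width in seconds (so 300 means five-minute buckets), that -start any means no lower bound, that the window's end is exclusive, and that log timestamps are in UTC, as CloudFront writes them. TimeWindow is a hypothetical name; check the LogAnalyzer source for the actual semantics.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class TimeWindow {
    // "uuuu" is the calendar year; in Java patterns, uppercase "YYYY" means week-based year.
    private static final DateTimeFormatter HOUR_FORMAT = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd-HH")
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .toFormatter();

    private final LocalDateTime start; // null = no lower bound (assumed meaning of "any")
    private final LocalDateTime end;   // null = no upper bound
    private final long bucketSeconds;  // assumed unit of -timeBucket

    TimeWindow(String startArg, String endArg, long bucketSeconds) {
        if (bucketSeconds <= 0) {
            throw new IllegalArgumentException("bucket width must be positive");
        }
        this.start = parseBound(startArg);
        this.end = parseBound(endArg);
        this.bucketSeconds = bucketSeconds;
    }

    private static LocalDateTime parseBound(String value) {
        return "any".equals(value) ? null : LocalDateTime.parse(value, HOUR_FORMAT);
    }

    /** True if the UTC timestamp is in [start, end); the end is treated as exclusive here. */
    boolean contains(LocalDateTime ts) {
        return (start == null || !ts.isBefore(start)) && (end == null || ts.isBefore(end));
    }

    /** Start of the fixed-width bucket containing ts, e.g. 14:37:12 falls in the 14:35:00 bucket for 300 s. */
    LocalDateTime bucketStart(LocalDateTime ts) {
        long epoch = ts.toEpochSecond(ZoneOffset.UTC);
        long floored = epoch - Math.floorMod(epoch, bucketSeconds);
        return LocalDateTime.ofEpochSecond(floored, 0, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        TimeWindow w = new TimeWindow("any", "2009-05-05-00", 300);
        // CloudFront log records carry separate date and time fields, both in UTC.
        LocalDateTime ts = LocalDateTime.parse("2009-05-04" + "T" + "14:37:12");
        System.out.println(w.contains(ts) + " " + w.bucketStart(ts));
    }
}
```

Running main prints true 2009-05-04T14:35 (LocalDateTime omits zero seconds in its string form). Flooring the epoch second keeps every bucket aligned to midnight UTC whenever the width divides 86,400 evenly, as 300 does.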

Comments

Great, kind of wish it could aggregate referrers
This is quite a good example of what exactly mapreduce is good at (and certainly a gateway drug for people who haven't used elastic map reduce much, like myself). I do wonder about why this doesn't support a 'top referrers' statistic? Referrers are included in CloudFront logs, but maybe that's a recent addition. Anyway, it would be appreciated, and in the meantime I'll see if I can hack it in. Thanks for the great code!
devseed on October 23, 2009 2:55 PM GMT
Mac OSX Java version problem
If you try to compile this on OS X and get the error "class file has wrong version XX.0, should be 49.0", it's because you are compiling with an older Java version, in my case 1.5; LogAnalyzer needs 1.6. To fix the problem: 1. Make sure the latest version of Java is installed; it's generally installed with Software Update, but can be downloaded from Apple.com. 2. In Terminal, change to /System/Library/Frameworks/JavaVM/Versions and run the command "sudo ln -fhsv 1.6.0 CurrentJDK". 3. Compile using "ant jar".
cmune on July 11, 2009 3:53 PM GMT
©2014, Amazon Web Services, Inc. or its affiliates. All rights reserved.