Amazon CloudFront Request Logging
We just added a handy new request logging feature to Amazon CloudFront. One line is written to a log file each time an object is accessed via CloudFront. You can feed this data (object name, access point, time of access, and so forth) into reporting tools to generate usage reports.
Once enabled for a particular distribution (there’s complete information in the Developer Guide), CloudFront will deposit new log files into the S3 bucket of your choice every hour or so. The logs are generated in the W3C Extended Log File Format, making it possible for you to process them using existing server log analysis tools. As you can see from the screen shot at right, the log includes a field for the edge location. You can use this field to analyze and understand the geographic distribution of your user base.
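To make the format a little more concrete, here is a minimal Python sketch that reads W3C Extended Log File Format lines and counts requests per edge location. It takes the column names from the `#Fields:` directive instead of hard-coding them. The sample lines, the edge location codes, the tab-separated layout, and the `x-edge-location` field name are assumptions made for illustration; check the Developer Guide for the actual field list.

```python
from collections import Counter


def parse_w3c_extended(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Version: 1.0" or "#Fields: date time ..."
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        # Assumption: entries are tab-separated.
        values = line.split("\t")
        if len(values) != len(fields):
            continue  # skip entries that do not match the declared fields
        yield dict(zip(fields, values))


def requests_by_edge_location(records, field="x-edge-location"):
    """Count requests per edge location; the field name is an assumption."""
    return Counter(r[field] for r in records if field in r)


# Made-up sample data, not real CloudFront output.
SAMPLE = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-uri-stem sc-status",
    "2009-05-07\t01:13:11\tSEA4\t2326\t192.0.2.10\t/images/logo.png\t200",
    "2009-05-07\t01:13:12\tSEA4\t1024\t192.0.2.11\t/index.html\t200",
    "2009-05-07\t01:14:02\tDFW3\t512\t198.51.100.7\t/index.html\t200",
]

if __name__ == "__main__":
    counts = requests_by_edge_location(parse_w3c_extended(SAMPLE))
    for location, count in counts.most_common():
        print(location, count)
```

In practice you would pass in the lines of a log file downloaded from your S3 bucket instead of `SAMPLE`.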
The free CloudBerry S3 Explorer makes it easy for you to control CloudFront logging. You need only enable the feature, enter the desired prefix for the log files, and choose an S3 bucket. This very handy program has an impressive feature list and should be of value to anyone using S3 and/or CloudFront. Andy from CloudBerry Lab also sent a picture of their new reporting feature, made possible by the CloudFront logs:
Good Data has created an on-demand service to analyze and report on CloudFront performance and traffic. The new service takes advantage of their ability to load, analyze, and report on large amounts of data. You can access your logs using a pre-built dashboard of reports, do ad hoc analysis, and correlate the results with other internal or external data. You can embed the finished reports and charts in your corporate portal or wiki if you’d like. There’s a free trial and an introductory video too.
We have also published a really cool code sample. This sample uses Amazon Elastic MapReduce and the Cascading framework to show how to process the new CloudFront log files in a scalable fashion. The sample reads any number of log files from a given S3 bucket, processes the files which represent a specified date range, and generates up to four report families: Client IP addresses, Object Popularity, Overall Volume, and Edge Locations. The reports are written to another S3 bucket. Here’s one of the Edge Location reports, request counts by edge location:
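For small log volumes, the four report families can be approximated on a single machine. The sketch below is plain Python, not the Elastic MapReduce and Cascading sample itself, and it does not reproduce that sample's output format. It filters already-parsed records to a date range and tallies client IP addresses, object popularity, overall volume, and edge locations. The record layout, the field names (`date`, `c-ip`, `cs-uri-stem`, `sc-bytes`, `x-edge-location`), the contents of each report, and the sample values are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def in_date_range(records, start, end):
    """Keep records whose 'date' field (YYYY-MM-DD) falls within [start, end]."""
    for r in records:
        if start <= date.fromisoformat(r["date"]) <= end:
            yield r


def build_reports(records):
    """Tally the four report families over an iterable of record dicts."""
    reports = {
        "client_ips": Counter(),
        "object_popularity": Counter(),
        "edge_locations": Counter(),
        "overall_volume": {"requests": 0, "bytes": 0},
    }
    for r in records:
        reports["client_ips"][r["c-ip"]] += 1
        reports["object_popularity"][r["cs-uri-stem"]] += 1
        reports["edge_locations"][r["x-edge-location"]] += 1
        reports["overall_volume"]["requests"] += 1
        reports["overall_volume"]["bytes"] += int(r["sc-bytes"])
    return reports


# Made-up records, standing in for parsed log entries.
RECORDS = [
    {"date": "2009-05-06", "x-edge-location": "SEA4", "sc-bytes": "2326",
     "c-ip": "192.0.2.10", "cs-uri-stem": "/images/logo.png"},
    {"date": "2009-05-07", "x-edge-location": "SEA4", "sc-bytes": "1024",
     "c-ip": "192.0.2.10", "cs-uri-stem": "/images/logo.png"},
    {"date": "2009-05-07", "x-edge-location": "DFW3", "sc-bytes": "512",
     "c-ip": "198.51.100.7", "cs-uri-stem": "/index.html"},
    {"date": "2009-05-09", "x-edge-location": "DFW3", "sc-bytes": "512",
     "c-ip": "198.51.100.7", "cs-uri-stem": "/index.html"},
]

if __name__ == "__main__":
    selected = in_date_range(RECORDS, date(2009, 5, 7), date(2009, 5, 8))
    for name, report in build_reports(selected).items():
        print(name, dict(report))
```

A single-process tally like this only makes sense while the logs fit comfortably on one machine; the point of the MapReduce sample is to run the same kind of aggregation across any number of log files.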
I’m confident that additional tools and analysis support will emerge before too long. When they do, I’ll cover them in a new post, and we’ll add them to the CloudFront Solutions Catalog.
Update: BucketExplorer has also added support for logging. Per their announcement, they have also added support for creating and managing distributions, the ability to map CNAMEs to distributions, batch-mode ACL updates (very handy for large S3 buckets), automated bucket backup, and lots more.