- Total bytes transferred per hour
- A list of the top 50 IP addresses by traffic per hour
- A list of the top 50 external referrers
- The top 50 search terms in referrals from Bing and Google
You can modify the Pig script to generate additional information.
| Location of Pig script | s3://elasticmapreduce/samples/pig-apache/do-reports.pig |
| Sample data set | s3://elasticmapreduce/samples/pig-apache/input |
| Source license | Apache License, Version 2.0 |
Running the Pig Sample Using AWS Management Console
To run the application using the AWS Management Console:
- Navigate to the AWS Management Console and sign in.
- Click Create New Job Flow, select Sample Applications, and choose Apache Log Reports (Pig Script).
- Click Continue.
- In the Output Locations field, replace <yourbucket> with the name of the Amazon S3 bucket into which you want to place the generated reports.
If you don't have a bucket, use a tool such as Firefox S3 Organizer to create one. Make sure the output path doesn't already exist in your Amazon S3 bucket; if it does, the job flow will fail.
- On the next page, choose the number of Amazon EC2 instances to use.
- Review the parameters and click Create Job Flow to launch the job flow.
When the job flow finishes, your reports should be available in the Amazon S3 bucket you specified.
Running the Pig Sample Using the Command Line Client
If you have the Amazon Elastic MapReduce Command Line Client installed, you can generate the reports using the following commands. Make sure to change mybucket in the output path to be the name of a bucket you own. Also, make sure the output path doesn't already exist. If it does, the script will fail.
$ INPUT_PATH=s3://elasticmapreduce/samples/pig-apache/input
$ OUTPUT_PATH=s3://mybucket/pig-apache/output
$ PIG_SCRIPT=s3://elasticmapreduce/samples/pig-apache/do-reports.pig
$ ./elastic-mapreduce --create --pig-script \
    --args "-p,INPUT=$INPUT_PATH,-p,OUTPUT=$OUTPUT_PATH,$PIG_SCRIPT"
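Because the job fails when the output path already exists, one defensive pattern (not part of the sample itself) is to append a timestamp to the output path so every run writes to a fresh location. The bucket name below is a placeholder:

```shell
# Sketch: build a unique output path per run so a leftover path from an
# earlier run can't cause the job flow to fail. "mybucket" is a placeholder.
OUTPUT_PATH="s3://mybucket/pig-apache/output-$(date +%Y%m%d-%H%M%S)"
echo "$OUTPUT_PATH"
```

You would then pass $OUTPUT_PATH to --args exactly as in the command above.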
Alternatively, you could start a development job flow and then add steps to the job flow to execute Pig scripts.
$ INPUT_PATH=s3://elasticmapreduce/samples/pig-apache/input
$ OUTPUT_PATH=s3://mybucket/pig-apache/output
$ PIG_SCRIPT=s3://elasticmapreduce/samples/pig-apache/do-reports.pig
$ ./elastic-mapreduce --create --alive
Created jobflow j-A1212121212
$ ./elastic-mapreduce --jobflow j-A1212121212 --pig-script \
    --args "-p,INPUT=$INPUT_PATH,-p,OUTPUT=$OUTPUT_PATH,$PIG_SCRIPT"
This way, you can execute more than one Pig script in a single job flow. If you start a job flow with --alive, you must terminate it yourself when you're finished.
$ ./elastic-mapreduce --jobflow j-A1212121212 --terminate
Customizing this Pig Script
To customize the Pig script:
- Download the script from do-reports.pig.
- Modify the script and save a copy in your Amazon S3 bucket.
- Run the script using either of the procedures above, replacing the Pig script location with the location of your modified script in Amazon S3.
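Putting the steps above together: the launch command for a customized script differs only in the script location. The bucket and key names below are hypothetical placeholders; the echoed string is what you would hand to --args:

```shell
# Hypothetical locations -- substitute your own bucket and script key.
INPUT_PATH=s3://elasticmapreduce/samples/pig-apache/input
OUTPUT_PATH=s3://mybucket/pig-apache/output
PIG_SCRIPT=s3://mybucket/scripts/my-reports.pig   # your modified copy
# Argument string for: ./elastic-mapreduce --create --pig-script --args "$ARGS"
ARGS="-p,INPUT=$INPUT_PATH,-p,OUTPUT=$OUTPUT_PATH,$PIG_SCRIPT"
echo "$ARGS"
```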