AWS Blog

Pig Latin – High Level Data Processing with Elastic MapReduce

Amazon Elastic MapReduce now includes support for the Pig Latin programming language.

Pig Latin, the language of the Apache Software Foundation's Pig project, is a SQL-like data transformation language. You can use it to run complex processes on large-scale compute clusters without having to spend time learning the MapReduce paradigm. Pig Latin programs are built around efficient high-level data types such as bags, tuples, maps, and fields, and around operations such as LOAD, FOREACH, and FILTER.
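To make the data model a little more concrete, here is a rough Python analogy (not Pig Latin itself, and not how Pig executes anything on a cluster). It treats a bag as a list of tuples and mimics what LOAD, FILTER, and FOREACH do to it. The record layout (user, url, bytes), the sample lines, and the helper names are all hypothetical and chosen only for illustration; the comments show the Pig Latin statement each step loosely corresponds to.

```python
from typing import Callable, Iterable, List, Tuple

# A Pig "tuple" of three "fields": (user, url, bytes). Hypothetical layout.
Row = Tuple[str, str, int]


def load(lines: Iterable[str]) -> List[Row]:
    """Loosely mirrors:
    logs = LOAD 'input' AS (user:chararray, url:chararray, bytes:int);
    Parses tab-separated lines into a "bag" (here, a list) of tuples.
    """
    bag = []
    for line in lines:
        user, url, size = line.rstrip("\n").split("\t")
        bag.append((user, url, int(size)))
    return bag


def filter_rows(bag: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Loosely mirrors: big = FILTER logs BY bytes > 1024;"""
    return [row for row in bag if predicate(row)]


def foreach_generate(bag: List[Row], projection: Callable[[Row], tuple]) -> List[tuple]:
    """Loosely mirrors: pairs = FOREACH big GENERATE user, bytes;"""
    return [projection(row) for row in bag]


# Made-up sample input standing in for a file that LOAD would read.
SAMPLE_LINES = [
    "alice\t/index.html\t2048",
    "bob\t/about.html\t512",
    "carol\t/data.csv\t4096",
]

logs = load(SAMPLE_LINES)
big = filter_rows(logs, lambda row: row[2] > 1024)
pairs = foreach_generate(big, lambda row: (row[0], row[2]))
```

In real Pig Latin each of these steps is a single declarative statement, and Pig takes care of turning the sequence into MapReduce jobs; the Python version only shows the shape of the data at each step.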

We support two distinct usage models for Pig Latin (both of which are covered in our Pig tutorial):

In Interactive mode, you can run Pig queries on an Elastic MapReduce cluster of any size by setting up an SSH connection to the master node and running Grunt (the interactive Pig shell). You can solve your entire problem this way, or you can write and debug your scripts for eventual use in Batch mode.

In Batch mode, you can launch multiple EC2 instances running Elastic MapReduce and specify your Pig Latin script, your input data, and the desired destination for your output data.

— Jeff;