AWS Big Data Blog

Apache Tez Now Available with Amazon EMR

Moataz Anany is a Solutions Architect with AWS

Amazon EMR has added Apache Tez version 0.8.3 as a supported application in release 4.7.0. Tez is an extensible framework for building batch and interactive data processing applications on top of Hadoop YARN. By processing data flows and computations as Directed Acyclic Graphs (DAGs), Tez provides a more flexible and efficient execution engine than MapReduce.
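
To make the DAG idea concrete, here is a minimal Python sketch of a query plan expressed as a graph of stages. It is an illustration only, not how Tez is implemented: the stage names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real scheduler. It shows that independent stages can be grouped into waves that run together, instead of being forced into a fixed chain of separate map and reduce jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical stages of a query that joins three tables.
# Each key maps to the set of stages it depends on.
QUERY_DAG = {
    "scan_orders": set(),
    "scan_customers": set(),
    "scan_items": set(),
    "join_orders_customers": {"scan_orders", "scan_customers"},
    "join_items": {"join_orders_customers", "scan_items"},
    "aggregate": {"join_items"},
}


def execution_waves(graph):
    """Group stages into waves; every stage in a wave has all its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(QUERY_DAG), start=1):
        print(number, wave)
```

In this sketch the three scans form the first wave, and each join starts as soon as its inputs are complete.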

Both Apache Hive and Apache Pig users can benefit from using Tez. For example, Hive users will see performance gains with queries involving multiple JOIN clauses. For a list of optimizations, see Hive on Tez. Ultimately, Tez offers Hive users a more interactive querying experience with a relatively simple change in Hive’s configuration. Check out the Apache Tez topic in the EMR documentation for further details.
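
As a rough sketch of what that configuration change can look like, the Python snippet below builds an EMR configuration object that sets the Hive property `hive.execution.engine` to `tez` through the `hive-site` classification. The function name is hypothetical, and you should confirm the exact settings for your release in the EMR documentation.

```python
import json


def hive_on_tez_configuration():
    """Return an EMR configuration list that asks Hive to run on Tez.

    Assumption: the cluster is configured through the "hive-site"
    classification; check the EMR documentation for your release.
    """
    return [
        {
            "Classification": "hive-site",
            "Properties": {"hive.execution.engine": "tez"},
        }
    ]


if __name__ == "__main__":
    # Print JSON that could be saved to a file and supplied at cluster creation.
    print(json.dumps(hive_on_tez_configuration(), indent=2))
```

For a single session, the same property can typically be set from the Hive prompt with `SET hive.execution.engine=tez;`.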

We’ve also added the Tez UI: a web application that displays both live and historical views of your Tez applications. It is installed with Tez on EMR and is accessible at the following URL on the master node of your EMR cluster:

http://<master-node-dns-name>:8080/tez-ui
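
If you script against several clusters, a small helper like the one below can build that address. This is a minimal sketch: the function name and the example host name are hypothetical, and the port and path simply mirror the URL above. Depending on your security group settings, you may also need an SSH tunnel or proxy to reach the master node from your workstation.

```python
def tez_ui_url(master_dns_name: str, port: int = 8080) -> str:
    """Build the Tez UI address for a given EMR master node DNS name."""
    host = master_dns_name.strip()
    if not host:
        raise ValueError("master_dns_name must not be empty")
    return f"http://{host}:{port}/tez-ui"


if __name__ == "__main__":
    # Hypothetical master node DNS name, used only for illustration.
    print(tez_ui_url("ec2-203-0-113-10.compute-1.amazonaws.com"))
```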

Below is a screenshot from the Tez UI showing the example Hive query from the EMR documentation referenced above.

Start using Tez today to speed up your Hive queries and Pig scripts on EMR!

If you have questions or suggestions, please leave a comment below.
