Posted On: Aug 2, 2016

Use the latest versions of all 16 supported open-source applications with Amazon EMR release 5.0, including upgrades such as Apache Spark 2.0, Apache Hive 2.1, Presto 0.150, Apache Zeppelin 0.6.1 (Snapshot), Pig 0.16, and Hue 3.10. Apache Tez, an optimized execution framework, has replaced Apache Hadoop MapReduce as the default execution engine for Hive and Pig. Applications now use the Java Development Kit 8 (JDK 8) as their runtime environment, and Spark is now compiled with Scala 2.11. All applications that were sandbox applications on Amazon EMR release 4.x are now generally available (GA). Additionally, you can now use enhanced debugging functionality for EMR steps to quickly find errors in logs and highlight possible common root causes.

Spark 2.0 became generally available from the Apache Software Foundation last week, and you can now take advantage of Spark’s new performance enhancements, better SQL support, the Structured Streaming API, and better SparkR support. Hive 2.1 has improved support for the Apache Parquet file format, various performance optimizations, and increased SQL support. For information on how Hive 2.1 differs from Hive 1.0 on Amazon EMR, see the Amazon EMR documentation. Zeppelin 0.6.1 (Snapshot) now has authentication and authorization support for notebooks, and Hue 3.10 has many UI improvements, including a notebook interface and an updated Apache Oozie workflow editor for visually creating complex workflows.

You can create an Amazon EMR cluster with release 5.0 by choosing release label “emr-5.0.0” from the AWS Management Console, AWS CLI, or SDK. You can specify the set of applications to install on your cluster, and previous sandbox applications are now specified without the “-sandbox” suffix. Enhanced debugging information is available in the console or directly in the step description, and it is automatically enabled for clusters with release 5.0. Please visit the Amazon EMR documentation for more information about release 5.0, Spark 2.0, Hive 2.1, Presto 0.150, Tez 0.8.4, Zeppelin 0.6.1 (Snapshot), Hue 3.10, and enhanced debugging. You can also join our live webinar, Introducing Amazon EMR Release 5.0, at 9 AM PDT on Tuesday, August 23, for more details about release 5.0.
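
As a rough illustration of the SDK path, the following Python sketch builds a cluster request with the release label “emr-5.0.0” and drops the “-sandbox” suffix from application names carried over from a 4.x configuration. It assumes the AWS SDK for Python (boto3) and its EMR `run_job_flow` call. The cluster name, instance types, instance count, region, and default role names are placeholder assumptions, and the helper functions are hypothetical names introduced only for this example.

```python
def strip_sandbox_suffix(name):
    """Return an application name without a trailing "-sandbox" suffix.

    Hypothetical helper: release 5.0 expects names such as "Zeppelin"
    where a 4.x configuration may have used "Zeppelin-Sandbox".
    """
    suffix = "-sandbox"
    if name.lower().endswith(suffix):
        return name[: -len(suffix)]
    return name


def build_cluster_request(applications, release_label="emr-5.0.0"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The name, instance settings, and roles below are assumptions;
    replace them with values that match your account.
    """
    return {
        "Name": "emr-5-0-example",  # assumed cluster name
        "ReleaseLabel": release_label,
        "Applications": [
            {"Name": strip_sandbox_suffix(app)} for app in applications
        ],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed instance type
            "SlaveInstanceType": "m3.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region_name="us-east-1"):
    """Submit the request to Amazon EMR and return the new cluster ID.

    Requires boto3 and AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so the request can be built without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Calling `build_cluster_request(["Spark", "Hive", "Zeppelin-Sandbox"])` yields a request whose application list names Spark, Hive, and Zeppelin; passing that request to `launch_cluster` would start a cluster, which incurs AWS charges.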