Updates to Apache Spark and Apache Hive in Amazon EMR release 5.8.0

Posted on: Aug 10, 2017

You can now use new versions of Apache Spark (2.2.0), Apache Hive (2.3.0), and Apache Flink (1.3.1) on Amazon EMR release 5.8.0. Spark 2.2.0 resolves over 1,100 tickets and brings the general availability of Structured Streaming, new machine learning algorithms in MLlib, and improvements to Spark’s cost-based optimizer. Hive 2.3.0 and Flink 1.3.1 contain bug fixes and improvements. Additionally, you can now store workflow files for Apache Oozie in Amazon S3, and the AWS SDK used by applications on your cluster has been upgraded to version 1.11.160.

You can create an Amazon EMR cluster with release 5.8.0 by choosing the release label “emr-5.8.0” in the AWS Management Console, AWS CLI, or SDK. When you create the cluster, select Spark, Hive, Flink, Oozie, or HBase to install those applications on it. Please visit the Amazon EMR documentation for more information about release 5.8.0, Spark 2.2.0, Hive 2.3.0, Flink 1.3.1, HBase 1.3.1, and Oozie 4.3.0.
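
As a rough illustration of the SDK path, the sketch below builds a request for the EMR RunJobFlow API using the AWS SDK for Python (boto3). The cluster name, instance type, instance count, region, and the helper names `build_cluster_request` and `launch_cluster` are assumptions made for this example and are not part of the announcement. The role names are the EMR default roles, which are assumed to already exist in your account.

```python
from typing import Any, Dict, List


def build_cluster_request(
    name: str,
    applications: List[str],
    release_label: str = "emr-5.8.0",
    instance_type: str = "m4.large",  # assumption: pick a type that fits your workload
    instance_count: int = 3,          # assumption: one master node plus two core nodes
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow API (hypothetical helper)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Each application to install is passed as {"Name": ...}.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after startup instead of terminating
            # as soon as there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the EMR default roles have been created in this account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster ID (hypothetical helper)."""
    import boto3  # imported here so the request can be built without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


# Build the request only; nothing is sent to AWS at this point.
request = build_cluster_request(
    "emr-5.8.0-demo",
    ["Spark", "Hive", "Flink", "Oozie", "HBase"],
)
```

Calling `launch_cluster(request)` would start billable resources, so it is left out of the example. It requires AWS credentials and the default EMR roles to be configured.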

Amazon EMR release 5.8.0 is available in all regions where Amazon EMR is supported.