Posted On: Dec 22, 2017
You can now use Apache Spark 2.2.1, Apache Hive 2.3.2, and Amazon SageMaker integration with Apache Spark on Amazon EMR release 5.11.0. Spark 2.2.1 and Hive 2.3.2 include various bug fixes and improvements. Amazon SageMaker Spark is an open-source Spark library for Amazon SageMaker, a fully managed service that lets you build, train, and deploy machine learning models at scale. The library lets you interleave Spark stages with stages that interact with Amazon SageMaker in your Spark ML Pipelines, so you can train models in Amazon SageMaker on Spark DataFrames using Amazon-provided ML algorithms such as K-Means clustering or XGBoost.
You can create an Amazon EMR cluster with release 5.11.0 by choosing release label “emr-5.11.0” from the AWS Management Console, AWS CLI, or SDK. You can select Spark and Hive to install these applications on your cluster. The Amazon SageMaker Spark library is automatically included when you install Spark. Please visit the Amazon EMR documentation for more information about release 5.11.0, Spark 2.2.1, Hive 2.3.2, and using Amazon SageMaker with Spark.
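As a rough illustration of the SDK route, the sketch below builds a request for the boto3 EMR `run_job_flow` call with release label "emr-5.11.0" and the Spark and Hive applications selected. The cluster name, instance types, instance count, region, and the default role names `EMR_DefaultRole` and `EMR_EC2_DefaultRole` are assumptions made for this example, not values from this announcement; adjust them for your account.

```python
def build_cluster_request(name):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call.

    Everything except the release label and the application names is an
    illustrative assumption.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.11.0",
        # Installing Spark also brings in the Amazon SageMaker Spark library.
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m4.large",   # assumed instance type
            "SlaveInstanceType": "m4.large",    # assumed instance type
            "InstanceCount": 3,                 # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumed default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",       # assumed default service role
    }


def launch_cluster(name, region_name="us-east-1"):
    """Submit the request to Amazon EMR; needs boto3 and AWS credentials."""
    import boto3  # imported here so the request can be built without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**build_cluster_request(name))
    return response["JobFlowId"]


if __name__ == "__main__":
    # Print the request only; call launch_cluster() to actually create a cluster.
    print(build_cluster_request("spark-hive-demo"))
```

Running the script only prints the request. Calling `launch_cluster("spark-hive-demo")` would start a billable cluster, so it is left as an explicit step.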
Amazon EMR release 5.11.0 is available in all supported regions for Amazon EMR.