Support for TensorFlow and S3 Select with Spark on Amazon EMR release 5.17.0

Posted on: Sep 20, 2018

You can now use TensorFlow 1.9.0, the popular machine learning and deep learning framework, and S3 Select with Apache Spark on Amazon EMR release 5.17.0. TensorFlow libraries can be combined with big data processing engines such as Spark on EMR to speed up model training by parallelizing the tuning of training parameters. The trained model can then be broadcast to all nodes of the cluster to perform distributed inference on data sets that are too large to process on a single node. TensorFlow on EMR is packaged with TensorBoard, a visualization tool that helps you visualize and debug the flow of the tensor graph in real time, understand the effects of your design choices, and further optimize your model. TensorFlow builds on EMR vary by the instance type you use for your cluster.
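As a rough illustration of the parallel tuning idea described above, the following Python sketch evaluates each combination of training parameters in its own Spark task and keeps the best one. It is a minimal sketch under stated assumptions, not EMR's or TensorFlow's own tuning API: the helper names `build_param_grid` and `tune_in_parallel` are hypothetical, `sc` is assumed to be an existing SparkContext, and `train_and_score` stands in for whatever function trains a model with the given parameters and returns a validation loss (lower is better).

```python
import itertools


def build_param_grid(learning_rates, batch_sizes):
    """Return every combination of the given training parameters."""
    return [
        {"learning_rate": lr, "batch_size": bs}
        for lr, bs in itertools.product(learning_rates, batch_sizes)
    ]


def tune_in_parallel(sc, param_grid, train_and_score):
    """Score each parameter set in a separate Spark task.

    `sc` is assumed to be an existing SparkContext, and `train_and_score`
    a user-supplied function (hypothetical here) that trains a model with
    the given parameters and returns a loss. `param_grid` must not be empty.
    Returns the (params, loss) pair with the lowest loss.
    """
    results = (
        sc.parallelize(param_grid, len(param_grid))  # one partition per candidate
        .map(lambda params: (params, train_and_score(params)))
        .collect()
    )
    return min(results, key=lambda pair: pair[1])
```

On a cluster, the model retrained with the winning parameters could then be shared with the executors, for example through `sc.broadcast`, so that each task scores its own partition of the data.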

With EMR release 5.17.0, you can use S3 Select with Spark. This feature allows your Spark application to retrieve only a subset of data from a large object in S3. This improves performance by reducing the amount of data that needs to be transferred to and processed by the EMR cluster. Additionally, with this release, you can configure JupyterHub on EMR to save and persist notebooks directly to S3. This release also includes upgraded versions of several applications: Apache Flink 1.5.2, Apache HBase 1.4.6, and Presto 0.206.
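To make the S3 Select integration more concrete, here is a minimal PySpark-style sketch of reading a CSV object through it. It assumes the `s3selectCSV` data source format and the `header`, `delimiter`, and `compression` options as described in the Amazon EMR documentation; check the documentation for the exact set supported by your release. The helper names, the bucket path, and the schema in the usage note are hypothetical, and `spark` is assumed to be an existing SparkSession on an EMR 5.17.0 cluster.

```python
def s3_select_csv_options(header=True, delimiter=",", compression="none"):
    """Build reader options for the S3 Select CSV data source.

    Option names are assumed from the EMR S3 Select documentation.
    """
    return {
        "header": "true" if header else "false",
        "delimiter": delimiter,
        "compression": compression,
    }


def read_csv_with_s3_select(spark, path, schema_ddl, **option_overrides):
    """Return a DataFrame that reads `path` through S3 Select.

    `spark` is assumed to be an existing SparkSession; `schema_ddl` is a
    DDL-formatted schema string such as "order_id INT, price DOUBLE".
    """
    options = s3_select_csv_options(**option_overrides)
    return (
        spark.read.format("s3selectCSV")  # data source name assumed from EMR docs
        .schema(schema_ddl)
        .options(**options)
        .load(path)
    )
```

A call might look like `read_csv_with_s3_select(spark, "s3://example-bucket/orders/", "order_id INT, price DOUBLE").where("price > 100")`. The intent is that the filter and column selection are pushed down to S3, so less data reaches the cluster.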

You can create an Amazon EMR cluster with release 5.17.0 by choosing the release label “emr-5.17.0” from the AWS Management Console, AWS CLI, or SDK. You can select TensorFlow, Flink, HBase, and Presto to install these applications when you launch your EMR cluster. Please visit the Amazon EMR documentation for more information about EMR release 5.17.0, TensorFlow 1.9.0, S3 Select with Spark, Flink 1.5.2, HBase 1.4.6, and Presto 0.206.
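For the SDK route, a cluster request might be assembled roughly as follows with the AWS SDK for Python. This is a sketch under stated assumptions: `emr_client` is assumed to be a client created with `boto3.client("emr")`, the helper names are hypothetical, and the instance type, instance count, and the default role names `EMR_EC2_DefaultRole` and `EMR_DefaultRole` are illustrative values that may not match your account.

```python
def build_cluster_request(name, instance_type="m5.xlarge", instance_count=3):
    """Build keyword arguments for the EMR `run_job_flow` API call.

    Instance settings and role names are illustrative assumptions.
    """
    applications = ("Spark", "TensorFlow", "Flink", "HBase", "Presto")
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.17.0",
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(emr_client, name):
    """Start the cluster and return its cluster (job flow) ID.

    `emr_client` is assumed to come from boto3.client("emr").
    """
    response = emr_client.run_job_flow(**build_cluster_request(name))
    return response["JobFlowId"]
```

Keeping the request builder separate from the call makes it easy to inspect or adjust the application list before any cluster is started.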

Amazon EMR release 5.17.0 is now available in all supported regions for Amazon EMR.

You can stay up to date on EMR releases by subscribing to the feed for EMR release notes. Use the RSS icon at the top of the EMR Release Guide to link the feed URL directly to your favorite feed reader.