Hive 0.13.1 now available on Amazon Elastic MapReduce

Posted on: Sep 4, 2014

We have added support for Hive 0.13.1 to Amazon Elastic MapReduce (Amazon EMR). Hive 0.13.1 is the latest release of Hive, a popular application for running SQL queries on Hadoop for ad hoc analysis or extract-transform-load (ETL) workflows.

Besides accessing on-cluster HDFS, you can use Hive to query data directly in Amazon S3, Amazon DynamoDB, and Amazon Kinesis. This version includes several notable improvements in performance and SQL syntax support, such as vectorized query execution that processes rows in thousand-row blocks, fast plan serialization, support for the DECIMAL and CHAR data types, and subqueries for the IN, NOT IN, EXISTS, and NOT EXISTS operators. We have included an optimization to the Hive windowing functions that greatly increases their scalability. We have also backported four features from the upcoming Hive 0.14 release that enhance Parquet support. To learn about Hive 0.13.1 on EMR, please see our documentation.
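As a rough illustration of the SQL features listed above, the following Python sketch assembles a small HiveQL script that enables vectorized execution, declares CHAR and DECIMAL columns, and filters with an IN subquery. The table and column names (`orders`, `active_regions`, `region`, `amount`) and the `build_script` helper are hypothetical and chosen only for this example. Whether a given query is actually vectorized depends on the storage format and the data types involved, so treat this as a sketch rather than a tuned configuration.

```python
# Hypothetical tables and columns; only the HiveQL features themselves
# (vectorization, CHAR, DECIMAL, IN subquery) come from the announcement.
HIVEQL_STATEMENTS = [
    # Ask Hive to process rows in batches where the query plan allows it.
    "SET hive.vectorized.execution.enabled = true",
    # CHAR and DECIMAL(precision, scale) column types.
    "CREATE TABLE IF NOT EXISTS orders ("
    "order_id BIGINT, region CHAR(2), amount DECIMAL(10,2)) STORED AS ORC",
    "CREATE TABLE IF NOT EXISTS active_regions (region CHAR(2)) STORED AS ORC",
    # A subquery on the right-hand side of IN in the WHERE clause.
    "SELECT o.region, SUM(o.amount) AS total FROM orders o "
    "WHERE o.region IN (SELECT r.region FROM active_regions r) "
    "GROUP BY o.region",
]


def build_script(statements):
    """Join statements into one script, terminating each with a semicolon."""
    return "\n".join(s.rstrip(";") + ";" for s in statements) + "\n"


if __name__ == "__main__":
    # Print the script so it can be saved to a .q file and passed to Hive.
    print(build_script(HIVEQL_STATEMENTS))
```

The printed script could be saved to a file and run with the Hive command-line client on the cluster's master node.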

You can launch an Amazon EMR cluster with Hive 0.13.1 from the AWS Management Console, the AWS CLI, or an AWS SDK by selecting AMI 3.2.0 and installing Hive. AMI 3.2.0 includes Hadoop 2.4 and incorporates several bug fixes and enhancements. You can learn more about using Hive on Amazon EMR here, and about AMI 3.2.0 here.