Apache HBase 1.2 Now Available on Amazon EMR for Realtime Access to Massive Datasets

Posted on: Apr 21, 2016

You can now use Apache HBase 1.2 on Amazon EMR release 4.6.0. Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is an open-source, non-relational, versioned database that runs on top of the Hadoop Distributed File System (HDFS), and it is built for random, strictly consistent, realtime access to tables with billions of rows and millions of columns. It integrates tightly with Apache Hadoop, Apache Hive, and Apache Pig, so you can easily combine massively parallel analytics with fast data access. Apache HBase's data model, throughput, and fault tolerance are a good match for workloads in ad tech, web analytics, financial services, applications using time-series data, and more.

You can create an Amazon EMR cluster with HBase 1.2 by choosing release label “emr-4.6.0” in the AWS Management Console, AWS CLI, or SDK and specifying HBase as an application. Note that HBase RegionServers, which manage and serve data in HBase, are installed only on Amazon EMR Core Nodes (not Task Nodes) because they must be colocated with HDFS DataNodes. For more information about HBase on Amazon EMR, see the Amazon EMR documentation.
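
As an illustration of the SDK route, the following is a minimal sketch in Python using boto3's EMR `run_job_flow` call. The release label and the HBase application come from this announcement; everything else is an assumption made for the example: the cluster name, the instance type and counts, the region, and the role names `EMR_EC2_DefaultRole` and `EMR_DefaultRole` (the EMR default roles, which must already exist in your account). The helper names `build_hbase_cluster_request` and `launch` are hypothetical.

```python
def build_hbase_cluster_request(name="hbase-demo", core_nodes=2,
                                instance_type="m3.xlarge"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The name, node count, and instance type are illustrative defaults.
    """
    # RegionServers run only on core nodes, so at least one is needed.
    if core_nodes < 1:
        raise ValueError("HBase needs at least one core node")
    return {
        "Name": name,
        "ReleaseLabel": "emr-4.6.0",          # release that ships HBase 1.2
        "Applications": [{"Name": "HBase"}],  # install HBase on the cluster
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                # Core nodes host HDFS DataNodes and HBase RegionServers.
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            # Keep the cluster running so HBase stays available.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(region="us-east-1"):
    """Start the cluster and return its ID (requires AWS credentials)."""
    import boto3  # imported here so building the request needs no SDK

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_hbase_cluster_request())
    return response["JobFlowId"]
```

Calling `launch()` starts a billable cluster, so the request is built separately and can be inspected first. Scaling HBase capacity in this sketch means raising `core_nodes`, since adding Task Nodes would not add RegionServers.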