Apache HBase on Amazon EMR
Why Apache HBase on EMR?
Features and benefits
Durability
to Amazon S3.
Performance
and Amazon EBS volumes, so you can customize the hardware of your cluster to optimize for cost and performance.
Integration
e for more details about Amazon EMR features.
Tooling
using the EMR File System. Separating your cluster’s storage and compute by using Amazon S3 as the data store provides several advantages over on-cluster HDFS. You can reduce costs by sizing your cluster for its compute requirements rather than for HDFS data storage, get the availability and durability of S3 storage, scale compute nodes without affecting the underlying storage, and terminate your cluster to save costs and restore it quickly later. You can also create and configure a read-replica cluster in another Amazon EC2 Availability Zone that provides read-only access to the same data as the primary cluster, so your data stays readable even if the primary cluster becomes unavailable.
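As a rough illustration of the setup described above, the sketch below builds the configuration objects that an EMR cluster would be launched with to store HBase data in S3, with an optional flag for a read-replica cluster. The helper name hbase_on_s3_configurations and the bucket path are hypothetical. The classification and property names (hbase-site, hbase.rootdir, hbase.emr.storageMode, hbase.emr.readreplica.enabled) are given as best recalled from the Amazon EMR Release Guide and should be checked against the guide for your release.

```python
import json


def hbase_on_s3_configurations(root_dir, read_replica=False):
    """Build EMR configuration objects that point HBase at an S3 root directory.

    Hypothetical helper for illustration only; it builds plain dictionaries
    and does not call any AWS API.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")

    # Assumed property names; verify against the EMR Release Guide.
    hbase_properties = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # A read-replica cluster points at the same root directory as the primary.
        hbase_properties["hbase.emr.readreplica.enabled"] = "true"

    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase", "Properties": hbase_properties},
    ]


if __name__ == "__main__":
    # The bucket name below is a placeholder, not a real location.
    primary = hbase_on_s3_configurations("s3://amzn-s3-demo-bucket/hbase-root")
    print(json.dumps(primary, indent=2))
```

The resulting JSON could be supplied as the cluster's configurations at launch. Calling the helper with read_replica=True and the same root directory would describe the replica cluster.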
Customer success with HBase and EMR
FINRA customer success
FINRA uses Amazon EMR to run Apache HBase on Amazon S3 to quickly access trillions of trade records and cut costs by more than 60%.

Monster customer success
Monster uses Apache HBase on Amazon EMR to store clickstream and advertising campaign data and run SQL queries with Apache Hive.
