Amazon EMR

Apache HBase on Amazon EMR

Why Apache HBase on EMR?

Amazon EMR natively supports Apache HBase to give you real-time access to tables that can scale to billions of rows and millions of columns. Amazon EMR combines the benefits of open source Apache HBase, a column-oriented data store for distributed systems, with the durability, performance, integration, and tooling capabilities of Amazon EMR. You get strongly consistent reads and writes, and you can query petabytes of data within milliseconds to power mission-critical workloads in financial services, ad tech, web analytics, and applications that use time-series data. Your existing Apache HBase applications will work on Amazon EMR without any code changes. Learn more about Apache HBase on Amazon EMR.

Features and benefits

to Amazon S3.

and Amazon EBS volumes, so you can customize the hardware of your cluster to optimize for cost and performance.

e for more details about Amazon EMR features.

using the EMR File System. Separating your cluster’s storage from its compute nodes by using Amazon S3 as the data store provides several advantages over on-cluster HDFS. You can save costs by sizing your cluster for your compute requirements instead of HDFS data storage, get the availability and durability of S3 storage, scale compute nodes without affecting the underlying storage, and terminate your cluster to save costs and quickly restore it later. You can also create and configure a read-replica cluster in another Amazon EC2 Availability Zone that provides read-only access to the same data as the primary cluster, so the data remains readable even if the primary cluster becomes unavailable.
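As a rough illustration of the S3-backed setup and the read-replica cluster described above, the sketch below builds the configuration classifications that such a cluster might be launched with. The helper name hbase_on_s3_configurations and the bucket path are hypothetical. The property names (hbase.rootdir, hbase.emr.storageMode, hbase.emr.readreplica.enabled) are the ones recalled from the Amazon EMR documentation for HBase on S3, so check them against the release you are using before relying on them.

```python
import json


def hbase_on_s3_configurations(root_dir, read_replica=False):
    """Build EMR configuration classifications for HBase with an S3 root directory.

    Hypothetical helper for illustration only. `root_dir` is the S3 location
    that holds the HBase data; `read_replica=True` adds the setting assumed
    to mark the cluster as a read-only replica of the same root directory.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")

    # Settings in the "hbase" classification select S3 as the storage mode.
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # Assumed property name for a read-replica cluster.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"

    return [
        # Point HBase at the shared S3 location.
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


if __name__ == "__main__":
    # Placeholder bucket and prefix; substitute your own.
    root = "s3://example-bucket/example-hbase-store"
    print(json.dumps(hbase_on_s3_configurations(root), indent=2))
    print(json.dumps(hbase_on_s3_configurations(root, read_replica=True), indent=2))
```

Both clusters point at the same root directory; only the replica carries the extra read-replica property. The resulting list is in the shape that EMR accepts as a configurations JSON document when a cluster is created.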