Setting up Read Replica Clusters with HBase on Amazon S3
Many customers have taken advantage of the numerous benefits of running Apache HBase on Amazon S3 for data storage, including lower costs, data durability, and easier scalability. Customers such as FINRA have lowered their costs by 60% by moving to an HBase on S3 architecture, and they gain the operational benefits that come with decoupling storage from compute and using S3 as the storage layer. HBase on S3 lets you launch a cluster and immediately start querying data in S3, rather than going through a lengthy snapshot restore process.
With the launch of Amazon EMR 5.7.0, you can take the high availability and durability of HBase on S3 one step further, to the cluster level: you can now start multiple HBase read-only clusters that connect to the same HBase root directory in S3. This ensures that your data is always reachable through read replica clusters, and lets you run your clusters across multiple Availability Zones.
In this post, I guide you through setting up read replica clusters with HBase on S3.
HBase Overview
Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is an open-source, non-relational, versioned database that runs on top of the Hadoop Distributed File System (HDFS). It is built for random, strictly consistent, real-time access to tables with billions of rows and millions of columns. It integrates tightly with Apache Hadoop, Apache Hive, and Apache Pig, so you can easily combine massively parallel analytics with fast data access. HBase’s data model, throughput, and fault tolerance are a good match for workloads in ad tech, web analytics, financial services, applications using time-series data, and many more.
Table structure in HBase, like many NoSQL technologies, should be directly influenced by the queries and access patterns of the data. Query performance varies drastically based on the way the cluster has to process and return the data.
HBase on S3
To use HBase on S3 read replicas, you must first be using HBase on S3. For those unfamiliar with HBase on S3 architecture, this section conveys some of the basics.
By using S3 as a data store for HBase, you can separate your cluster’s storage and compute nodes. This enables you to cut costs by sizing your cluster for your compute requirements. You don’t have to pay to store your entire dataset with 3x replication in the on-cluster Hadoop Distributed File System (HDFS).
Amazon EMR configures HBase on S3 to cache data in memory and on disk in your cluster to improve read performance from S3. You can quickly and easily scale compute nodes up or down without impacting your underlying storage, or terminate your cluster to cut costs and quickly restore it in another Availability Zone.
HBase with support for S3 is available on EMR release 5.2.0 and later. To use S3 as a data store, configure the storage mode and specify a root directory in your HBase configuration. It’s also recommended to enable EMRFS consistent view. For more information, see Apache HBase on Amazon S3.
Use cases for HBase on S3 read replica clusters
Using HBase on S3 allows your data to be stored safely and durably. It persists the data off-cluster, which eliminates the danger of losing persisted writes when the cluster is terminated. However, there can be situations where you want to make sure that your data in HBase is highly available, even in the rare event of a cluster or Availability Zone failure. Another case is when you want multiple clusters to be able to access the same root directory in S3. If you have a primary cluster that comes under heavy load during bulk loads, writes, and compactions, this feature allows you to create secondary clusters that offload and separate the read load from the write load, ensuring that you meet your read SLAs while optimizing for cost and performance.
The following diagram shows HBase on S3 without read replicas. In this scenario, events such as cluster failure or Availability Zone failure render users unable to access data on HBase.
The HBase root directory, including HFiles and metadata, resides in S3.
Prior to EMR 5.7.0, multiple clusters could not be pointed to the same root directory. For architectures requiring high availability, you needed to create duplicate data on S3.
With the launch of EMR 5.7.0, multiple HBase read-only clusters can connect to the same HBase root directory in S3, keeping your data reachable through read replica clusters even when those clusters run in different Availability Zones.
Here are a few example architectures that use HBase on S3 with read replicas, showing before and after for possible downtime events.
HBase read replica in the same Availability Zone – Resilient to primary cluster failure.
HBase read replica in different Availability Zones – Resilient to Availability Zone failure.
Another way to think about HBase on S3 read replicas is as a means of right-sizing your clusters for their respective loads. For example, if you anticipate a light read load but still want high availability, you can build your read replica clusters from fewer or smaller instance types.
Another example is a bulk load scenario, where you scale your write cluster up to peak capacity for the duration of the bulk load and back down to a minimal size after it completes. While the write cluster scales, your read replicas can maintain a read-specific sizing, or even a heterogeneous mixture of instance types.
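Resizing an EMR cluster for such a workflow is a single CLI call per step. Here is a minimal sketch; the instance group ID and the instance counts are placeholders for your own values:

```bash
# Scale the write cluster's task instance group up before the bulk load.
# ig-XXXXXXXXXXXXX is a placeholder for your instance group ID.
aws emr modify-instance-groups \
  --instance-groups InstanceGroupId=ig-XXXXXXXXXXXXX,InstanceCount=10

# Scale it back down once the bulk load completes.
aws emr modify-instance-groups \
  --instance-groups InstanceGroupId=ig-XXXXXXXXXXXXX,InstanceCount=3
```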
Walkthrough
Use the following steps to set up your HBase on S3 read replica environment. This functionality is only available with EMR 5.7.0 or later.
Create an EMR cluster with HBase on S3
Sample Configuration JSON:
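A minimal sketch of that configuration follows; the bucket path s3://your-bucket/hbase-root is a placeholder. The hbase.emr.storageMode property switches HBase storage to S3, and hbase.rootdir points at the root directory that the clusters share:

```json
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "s3://your-bucket/hbase-root"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  }
]
```

For a read replica cluster, also set "hbase.emr.readreplica.enabled": "true" under the hbase classification, and point hbase.rootdir at the same S3 location as the primary. You can then pass the file to the AWS CLI when creating the cluster; in this sketch, the cluster name, instance settings, and key name are placeholders:

```bash
# Launch an EMR 5.7.0 cluster running HBase with the configuration above.
aws emr create-cluster \
  --name "hbase-s3-primary" \
  --release-label emr-5.7.0 \
  --applications Name=HBase \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=your-key \
  --use-default-roles \
  --configurations file://hbase-s3.json
```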
Add data to the primary
When loading a system with HBase read replicas, you must make sure that all writes from the primary cluster are flushed as HFiles to S3. The read replica clusters read from these HFiles and cannot see any new writes that haven’t yet been flushed from the MemStore. To ensure that the read replica clusters read the most recent data, follow these steps (a shell sketch follows the list):
- Insert data as usual into the HBase primary cluster (BulkLoading is preferred for large amounts of data).
- Ensure that data is flushed (using the flush command) to S3. Note: This is only relevant if not using BulkLoading.
- Wait for any region splits or merges to complete to ensure that the hbase:meta table is in a consistent state.
- If any regions have changed (splits, merges) or any table metadata has been modified (tables added/removed), run the refresh_meta command on the read replica cluster.
- On the read replica cluster, run the refresh_hfiles command for the updated table(s).
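Here is a minimal shell sketch of those steps, assuming a hypothetical table named 'mytable' with a column family 'cf':

```bash
# On the primary cluster: write a record and flush it to S3 as HFiles.
hbase shell <<'EOF'
put 'mytable', 'row1', 'cf:col1', 'value1'
flush 'mytable'
EOF

# On the read replica cluster: refresh the meta table (only needed if
# regions or table metadata changed), then pick up the new HFiles.
hbase shell <<'EOF'
refresh_meta
refresh_hfiles 'mytable'
EOF
```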
Read data from a replica
You can now read data as you would normally from your clusters.
[Screenshot: reading from the primary cluster]
[Screenshot: reading from the read replica cluster]
As you can see, both clusters return the same values.
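In the HBase shell, those reads might look like the following sketch, continuing with the hypothetical 'mytable'; running the same commands on the primary and on the replica should return identical cells:

```bash
# Works unchanged on both the primary and the read replica cluster.
hbase shell <<'EOF'
get 'mytable', 'row1'
scan 'mytable', { LIMIT => 10 }
EOF
```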
Keeping your read replicas consistent
To keep your read replica cluster consistent, follow these general guidelines:
On the read replica:
Run ‘refresh_hfiles’ when:
- Records are added/modified for a table.
Run ‘refresh_meta’ when:
- Regions have changed (splits, merges) or any table metadata has been modified (tables added/removed).
On the primary cluster:
If compactions are enabled, run major compactions deliberately and refresh the read replicas afterward to avoid inconsistencies (minor compactions are handled automatically).
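As a sketch of that flow, again using the hypothetical 'mytable':

```bash
# On the primary cluster: trigger a major compaction for the table.
echo "major_compact 'mytable'" | hbase shell

# On each read replica, after the compaction finishes: reload HFiles.
echo "refresh_hfiles 'mytable'" | hbase shell
```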
Related properties and commands
HBase properties:
| Config | Default | Explanation |
| --- | --- | --- |
| hbase.meta.table.suffix | "" (empty) | Adds a suffix to the meta table name: value='test' -> 'hbase:meta_test' |
| hbase.global.readonly.enabled | false | Puts the entire cluster into read-only mode |
| hbase.meta.startup.refresh | false | Syncs the meta table with the backing storage. Used to pick up new tables or regions. |
Reminder: These properties are set automatically if hbase.emr.readreplica.enabled is set to true.
HBase commands:
| Command | Description |
| --- | --- |
| refresh_hfiles <table_name> | Refreshes HFiles from disk. Used to pick up new edits on a read replica. |
| clear_block_cache <table_name> | Clears the block cache for the specified table. |
| refresh_meta | Syncs the meta table with the backing storage. Used to pick up new tables/regions. |
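Of these, clear_block_cache is the only command not shown earlier. As a sketch, you might run it on a read replica after a large refresh to evict stale cached blocks for the hypothetical 'mytable':

```bash
# Evict cached blocks for a table on the read replica.
echo "clear_block_cache 'mytable'" | hbase shell
```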
Summary
You can now create highly available read replica clusters for your existing HBase clusters backed by S3. You retain read access to your HBase data in the case of cluster failure, I/O-intensive compactions, and even the unlikely event of an Availability Zone failure. For more information, see Using HBase on S3 Read Replica Clusters.
I encourage you to give the HBase on S3 read replica feature a try!
Next Steps
Take your skills to the next level. Learn tips for migrating to Apache HBase on S3 from HDFS.