Migrate to Apache HBase on Amazon S3 on Amazon EMR: Guidelines and Best Practices
This blog post provides guidance and best practices on how to migrate from Apache HBase on HDFS to Apache HBase on Amazon S3 on Amazon EMR.
Apache HBase on Amazon S3 on Amazon EMR
Amazon EMR version 5.2.0 and later lets you run Apache HBase on Amazon S3. By using Amazon S3 as the data store for Apache HBase, you can separate your cluster's storage from its compute nodes. This saves costs because you size your cluster for your compute requirements only, instead of paying to store your entire dataset with 3x replication in the on-cluster HDFS.
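As a rough illustration of what enabling this looks like, the sketch below builds the two configuration classifications that an EMR cluster uses to point the HBase root directory at Amazon S3 and switch the storage mode to S3. The helper name hbase_on_s3_configurations and the bucket path s3://example-bucket/hbase-root are placeholders chosen for this example, not values from the whitepaper. The resulting list is in the shape typically passed as the cluster's configurations when it is created.

```python
import json


def hbase_on_s3_configurations(root_dir: str) -> list:
    """Build EMR configuration objects that store HBase data in Amazon S3.

    root_dir is a hypothetical S3 location for the HBase root directory.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")
    return [
        {
            # Tells HBase where its root directory lives.
            "Classification": "hbase-site",
            "Properties": {"hbase.rootdir": root_dir},
        },
        {
            # Switches the EMR HBase storage mode from HDFS to S3.
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
    ]


if __name__ == "__main__":
    # Placeholder bucket and prefix; replace with your own location.
    configs = hbase_on_s3_configurations("s3://example-bucket/hbase-root")
    print(json.dumps(configs, indent=2))
```

The printed JSON can be saved to a file and supplied when the cluster is launched. Tuning properties for HBase and EMRFS, which the whitepaper covers, would be added as further entries in the same list.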
Many customers have taken advantage of the benefits of running Apache HBase on Amazon S3 for data storage. These benefits include lower costs, data durability, and more efficient scalability. Customers such as the Financial Industry Regulatory Authority (FINRA) have lowered their costs by 60% by moving to an Apache HBase on Amazon S3 architecture. They have also experienced operational benefits that come with decoupling storage from compute and using Amazon S3 as the storage layer.
Whitepaper on Migrating to Apache HBase on Amazon S3 on Amazon EMR
This whitepaper walks you through the stages of a migration. It also helps you determine when to choose Apache HBase on Amazon S3 on Amazon EMR, plan for platform security, tune Apache HBase and EMRFS to support your application SLA, identify options to migrate and restore your data, and manage your cluster in production.
For more information, see Migrating to Apache HBase on Amazon S3 on Amazon EMR.
If you found this post useful, be sure to check out Setting up Read Replica Clusters with HBase on Amazon S3 and Tips for Migrating to Apache HBase on Amazon S3 from HDFS.
About the Author
Francisco Oliveira is a Senior Big Data Engineer with AWS Professional Services. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.