Amazon EMR natively supports Apache HBase, giving you real-time access to tables that can scale to billions of rows and millions of columns. Amazon EMR combines the benefits of open source Apache HBase, a column-oriented data store for distributed systems, with the durability, performance, integration, and tooling capabilities of Amazon EMR. You get strongly consistent reads and writes, and you can get query results over petabytes of data within milliseconds. This supports mission-critical workloads in financial services, ad tech, web analytics, and applications that use time-series data. Your existing Apache HBase applications work on Amazon EMR without code changes. Learn more about Apache HBase on Amazon EMR.
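The paragraph above does not spell out how HBase organizes "billions of rows and millions of columns." As a rough mental model, an HBase table behaves like a map sorted by row key. Each cell is addressed by row and a `family:qualifier` column and holds timestamped versions. The toy Python class below sketches that layout. `ToyHBaseTable`, its methods, and the `sensor#date` row-key pattern are all hypothetical illustrations, not the HBase client API. `max_versions` only loosely mirrors a column family's VERSIONS setting.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyHBaseTable:
    """Hypothetical in-memory model of HBase's logical layout (not the real API):
    rows sorted by key, cells addressed by "family:qualifier", timestamped versions."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._row_keys: List[bytes] = []  # kept sorted, like HBase row keys
        # row -> column -> list of (timestamp, value), newest first
        self._cells: Dict[bytes, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: bytes, column: str, ts: int, value: bytes) -> None:
        if row not in self._cells:
            insort(self._row_keys, row)  # new rows keep the key order sorted
            self._cells[row] = defaultdict(list)
        versions = self._cells[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first
        del versions[self.max_versions:]  # discard versions beyond the limit

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self._cells.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: bytes, stop: bytes) -> List[bytes]:
        """Return row keys in [start, stop), in sorted order (a range scan)."""
        i = bisect_left(self._row_keys, start)
        j = bisect_left(self._row_keys, stop)
        return self._row_keys[i:j]


if __name__ == "__main__":
    t = ToyHBaseTable(max_versions=2)
    t.put(b"sensor1#20240101", "m:temp", 1, b"20.5")
    t.put(b"sensor1#20240101", "m:temp", 2, b"21.0")
    t.put(b"sensor2#20240101", "m:temp", 1, b"19.0")
    print(t.get(b"sensor1#20240101", "m:temp"))
    print(t.scan(b"sensor1", b"sensor2"))
```

Because rows are sorted by key, prefixing time-series row keys with an entity ID (as in the hypothetical `sensor1#...` keys) keeps one entity's readings adjacent. A single range scan can then read them.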
Features and benefits
Durability
Amazon EMR lets you use Amazon S3 as a data store for Apache HBase through the EMR File System (EMRFS). Using Amazon S3 as a data store decouples compute from storage and provides several advantages over the on-cluster Hadoop Distributed File System (HDFS) from Apache Hadoop. You can save costs by sizing your cluster for your compute requirements instead of your HDFS storage requirements, while getting the availability and durability of Amazon S3 for your data. You can scale compute nodes without affecting the underlying storage, terminate your cluster when your job finishes to save costs, and quickly restore the cluster when you need it again. You can also create and configure a read-replica cluster in another Amazon EC2 Availability Zone. The read replica gets read-only access to the same data, so reads remain available even if the primary cluster becomes unavailable. Amazon EMR also persists Apache HBase data files (HFiles) to Amazon S3.
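Switching HBase to Amazon S3 storage, and marking a cluster as a read replica, is done with EMR configuration classifications. The Python sketch below builds that configuration list. It assumes the `hbase-site`/`hbase.rootdir` and `hbase`/`hbase.emr.storageMode` and `hbase.emr.readreplica.enabled` properties described in the Amazon EMR documentation for HBase on S3. Verify them against your EMR release. The function name and the bucket path are hypothetical.

```python
import json
from typing import Any, Dict, List


def hbase_on_s3_configurations(rootdir: str, read_replica: bool = False) -> List[Dict[str, Any]]:
    """Build EMR configuration classifications for HBase on Amazon S3 (hypothetical helper).

    rootdir is the S3 location holding the HBase root directory. A read-replica
    cluster must point at the same rootdir as the primary cluster.
    """
    if not rootdir.startswith("s3://"):
        raise ValueError("HBase on S3 expects an s3:// root directory")
    hbase_props = {"hbase.emr.storageMode": "s3"}  # store HBase data in S3 via EMRFS
    if read_replica:
        # Assumed property name from the EMR docs; marks this cluster as read-only.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"
    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": rootdir}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name.
    replica = hbase_on_s3_configurations("s3://my-bucket/hbase", read_replica=True)
    print(json.dumps(replica, indent=2))
```

The resulting list has the shape that `aws emr create-cluster --configurations` (as a JSON file) and boto3's `run_job_flow(Configurations=...)` accept. The primary and the replica differ only in the read-replica flag and in where the cluster is placed.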
Performance
Apache HBase is designed to maintain performance while scaling out to hundreds of nodes, supporting random access to billions of rows and millions of columns. It uses Amazon S3 (through EMRFS) or the Hadoop Distributed File System (HDFS) as a fault-tolerant data store. Amazon EMR supports a wide variety of instance types and Amazon EBS volumes, so you can customize your cluster's hardware to balance cost and performance.
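HBase keeps random access fast as it scales out by partitioning each table into regions. Each region is a contiguous range of sorted row keys served by one RegionServer, so a read only has to find the single region that owns its key. The sketch below shows that lookup as a binary search over region start keys. The real client resolves region locations through the `hbase:meta` table and caches them. The function name and split points here are hypothetical.

```python
from bisect import bisect_right
from typing import List


def locate_region(region_start_keys: List[bytes], row_key: bytes) -> int:
    """Return the index of the region whose [start, next_start) range holds row_key.

    region_start_keys must be sorted and begin with b"" (the first region's start
    key is empty, so every row key falls into some region).
    """
    if not region_start_keys or region_start_keys[0] != b"":
        raise ValueError("first region must start at the empty key")
    # The owning region is the last one whose start key is <= row_key.
    return bisect_right(region_start_keys, row_key) - 1


if __name__ == "__main__":
    # Hypothetical split points for a table pre-split into four regions.
    starts = [b"", b"g", b"n", b"t"]
    for key in [b"apple", b"grape", b"melon", b"zucchini"]:
        print(key, "-> region", locate_region(starts, key))
```

Because lookup cost grows only logarithmically with the number of regions, adding nodes (and therefore regions) spreads load without slowing individual key lookups much.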
Integration
You can launch a fully configured Amazon EMR cluster running Apache HBase and other Apache Hadoop and Apache Spark ecosystem applications in minutes. Amazon EMR automatically replaces poorly performing nodes, and you can resize your cluster to meet your requirements. You can manage tables and browse data in Apache HBase using the Hue UI, and you can back up and restore tables to Amazon S3 using EMRFS and Hadoop MapReduce. Apache HBase on Amazon EMR can also use Amazon EMR's authorization, Kerberos authentication, and encryption features.
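One standard way to back up an HBase table to Amazon S3 with MapReduce is to take a snapshot and then copy it with HBase's `ExportSnapshot` tool, which runs a MapReduce job. The Python sketch below only assembles the command line. It assumes you take the snapshot first, for example with `snapshot 'trades', 'trades-snap'` in the HBase shell. The helper name, snapshot name, and bucket are hypothetical, and the mapper count is an illustrative default.

```python
import shlex
from typing import List


def export_snapshot_cmd(snapshot: str, s3_uri: str, mappers: int = 4) -> List[str]:
    """Build argv for HBase's ExportSnapshot tool (hypothetical helper).

    ExportSnapshot copies a snapshot's files to another file system with a
    MapReduce job; an s3:// destination is written through EMRFS on Amazon EMR.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// destination")
    if mappers < 1:
        raise ValueError("mappers must be positive")
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", s3_uri,
        "-mappers", str(mappers),  # number of parallel map tasks doing the copy
    ]


if __name__ == "__main__":
    argv = export_snapshot_cmd("trades-snap", "s3://my-bucket/hbase-backups")
    print(shlex.join(argv))
    # On a cluster node you could then run: subprocess.run(argv, check=True)
```

Restoring reverses the direction (`ExportSnapshot` also accepts `-copy-from`). After the copy, the HBase shell's `restore_snapshot` or `clone_snapshot` command materializes the table.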
Tooling
Amazon EMR lets you use Amazon S3 as a data store for Apache HBase through the EMR File System. Separating your cluster's storage from its compute nodes by using Amazon S3 as a data store provides several advantages over on-cluster HDFS. You can save costs by sizing your cluster for your compute requirements instead of HDFS data storage, get the availability and durability of Amazon S3 storage, scale compute nodes without affecting the underlying storage, and terminate your cluster to save costs and then quickly restore it. You can also create and configure a read-replica cluster in another Amazon EC2 Availability Zone. The read replica provides read-only access to the same data as the primary cluster, so reads remain available even if the primary cluster becomes unavailable.
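The cost argument for "sizing your cluster for your compute requirements instead of HDFS data storage" is simple arithmetic. On HDFS, every block is stored several times on cluster disks, so large datasets can force you to add nodes you do not need for compute. The sketch below compares the two cases. It assumes a replication factor of 3 (the Apache Hadoop default for `dfs.replication`). It ignores disk needed for write-ahead logs, caches, and free-space headroom. All inputs are hypothetical.

```python
import math


def core_nodes_needed(compute_nodes: int, data_tb: float, disk_tb_per_node: float,
                      replication: int = 3, on_s3: bool = False) -> int:
    """Core nodes required when data lives in HDFS versus in Amazon S3 (simplified)."""
    if on_s3:
        # Storage lives in S3, so only compute requirements size the cluster.
        return compute_nodes
    # On HDFS each block is stored `replication` times across cluster disks.
    storage_nodes = math.ceil(data_tb * replication / disk_tb_per_node)
    return max(compute_nodes, storage_nodes)


if __name__ == "__main__":
    # Hypothetical workload: 5 nodes of compute, 100 TB of data, 8 TB of disk per node.
    print("HDFS:", core_nodes_needed(5, 100, 8))          # 300 TB raw / 8 TB -> 38 nodes
    print("S3:  ", core_nodes_needed(5, 100, 8, on_s3=True))  # compute alone -> 5 nodes
```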
Customer success with HBase and EMR
FINRA customer success
FINRA uses Amazon EMR to run Apache HBase on Amazon S3, giving it fast access to trillions of trade records while saving more than 60% in costs.