Amazon Elastic MapReduce (Amazon EMR) makes it easy to provision and manage Hadoop in the AWS Cloud. Hadoop is available in multiple distributions, and Amazon EMR lets you choose between the Amazon Distribution and the MapR Distribution for Hadoop.
MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. MapR brings unprecedented dependability, ease-of-use and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified Big Data platform. MapR is used across financial services, retail, media, healthcare, manufacturing, telecommunications and government organizations as well as by leading Fortune 100 and Web 2.0 companies. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures.
Enhanced ease-of-use and reliability for Apache HBase applications
Instant Recovery: MapR M7 delivers database high availability. The system automatically recovers from any node failure within seconds, allowing the application to continue operating with no impact.
Zero HBase Administration: MapR M7 allows users to utilize tables without running any separate services, such as RegionServers. In addition, M7 eliminates compactions and provides seamless region splits, so the administrator does not need to run these operations manually.
Continuous Low Latency: MapR M7 provides consistent low latency by avoiding the garbage collections and compactions that affect performance. Low disk I/O, coupled with a smaller disk footprint, makes database operations on disk fast and predictable.
Full Data Protection with Snapshots: M7 delivers full data protection for HBase. Snapshots enable point-in-time recovery of tables to protect against user or application errors. M7 expands snapshots to include all data - both files and tables. HBase tables can be read directly from snapshots and recovered directly without the downtime required to restore HBase in other distributions.
Business Continuity with Mirroring: Mirroring allows users to automatically replicate differential data in real-time across clusters. This could be employed to create disaster recovery solutions for databases or leveraged to provide read-only access to data from multiple locations. Because M7 does not require RegionServers to be reconstructed, databases can be brought up instantly on the mirrored site if the active site goes down.
NFS: MapR provides random read/write access and a standard NFS interface so that users can mount the cluster and leverage standard file-based applications with Hadoop, including Linux utilities, file browsers and non-Java applications. When using MapR on Amazon EMR, the NFS interface is pre-mounted at /mapr.
ODBC: MapR provides an ODBC driver for Hive that conforms to the standard ODBC 3.52 specification, enabling users to utilize any BI tool or SQL query builder with Hadoop. MicroStrategy, Tableau, Excel, Toad and many other commercial and open source tools are supported.
Deployment: Amazon EMR with MapR fully automates the provisioning, installation and configuration of the cluster, which can be launched via the AWS Management Console, CLI or API.
MapR Control System (MCS): MapR provides end-to-end monitoring and management for Hadoop, including hardware, storage, MapReduce and other components in the distribution.
CLI and REST API: All MCS capabilities are also exposed through the CLI and REST API. This enables users to obtain cluster information and perform operations programmatically. It also allows integration with third-party and custom monitoring/management systems.
File System High Availability: MapR provides a no-NameNode architecture that can tolerate multiple simultaneous failures with automatic failover and fallback. The metadata is distributed and replicated, just like the data. With no NameNode, there is no practical limit to how many files can be stored, and also no dependency on any external NAS.
MapReduce High Availability: MapR provides JobTracker HA, with automatic failover and fallback. If the active JobTracker fails, it is automatically started on a different node, and all jobs and tasks continue to run with no interruption.
Data Protection: MapR provides snapshots for point-in-time recovery, enabling users to recover from user and application errors. MapR uses redirect-on-write technology, so only changed blocks are snapshotted, avoiding any impact on performance. Note that snapshots are guaranteed to be consistent, so all applications are supported.
Disaster Recovery: MapR provides mirroring between clusters, enabling disaster recovery across availability zones, as well as hybrid deployments involving both on-premises and EMR clusters. For hybrid deployments, all MapR-based Hadoop distributions are supported, including EMC Greenplum MR and the Cisco UCS appliance. Note that only changed blocks are transferred, and all data is automatically compressed.
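Because the NFS interface described above exposes the cluster as an ordinary file system (pre-mounted at /mapr on Amazon EMR), a non-Java program can read cluster data with standard file I/O. The following Python sketch illustrates the idea; the function name and the example directory are hypothetical and are not part of MapR or Amazon EMR.

```python
from pathlib import Path


def count_lines(directory, pattern="*.txt"):
    """Count the lines in every file under `directory` that matches `pattern`.

    Uses only ordinary file operations, so it works on any mounted
    file system, including a cluster exposed through NFS.
    """
    total = 0
    for path in sorted(Path(directory).glob(pattern)):
        with path.open("r", encoding="utf-8") as handle:
            total += sum(1 for _ in handle)
    return total


# On a cluster node, the directory would be somewhere under the NFS
# mount point, for example (hypothetical path):
#   count_lines("/mapr/my-cluster/user/hadoop/output")
```

The same approach applies to Linux utilities and other file-based tools, since they see the mount as a regular directory tree.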
Compression and performance
Compression: MapR automatically and transparently compresses all data that is not already compressed. This reduces disk and network I/O and increases performance. There is no need to manually compress files or modify applications to handle compression. Random reads and writes are also efficient because files remain splittable and only the necessary blocks are decompressed.
Performance: MapR features an advanced architecture that provides higher efficiency and parallelism, while reducing disk and network I/O. MapR holds world records for performance.
The M7 Edition is a complete distribution for Apache Hadoop that delivers ease of use, dependability and performance advantages for NoSQL and Hadoop applications. M7 has removed the trade-offs organizations face when looking to deploy a NoSQL solution. M7 provides scale, strong consistency, reliability and continuous low latency with an architecture that does not require compactions or background consistency checks.
The M5 Edition is also a complete distribution for Apache Hadoop that delivers enterprise-grade features for all file operations on Hadoop. Features include mirroring, snapshots, NFS HA, data placement control, and many more, which the most demanding mission-critical environments will welcome.
The M3 Edition is the free version of our complete distribution for Hadoop. The M3 edition delivers a fully random read-write capable platform that supports industry-standard interfaces (e.g., NFS, ODBC), and provides management, compression and performance advantages.
With Elastic MapReduce you only pay for what you use.
Your cost will depend on the number and type of Amazon EC2 Instances in your job flow and the amount of time it is running. Pricing for Elastic MapReduce with MapR is in addition to pricing for EC2 and S3.
Pricing for Amazon EC2, Amazon EMR, and MapR
You are charged from the time the job flow begins processing until it is terminated. Partial hours are rounded up.
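As a rough illustration of the billing rule above, the Python sketch below rounds the runtime up to whole hours and adds the Elastic MapReduce with MapR charge on top of the EC2 charge. The hourly rates are hypothetical placeholders, not actual prices, and the sketch assumes every instance runs for the full duration of the job flow.

```python
import math


def billable_hours(runtime_hours):
    """Round a job flow's runtime up to whole hours (partial hours are rounded up)."""
    return math.ceil(runtime_hours)


def job_flow_cost(instance_count, runtime_hours, ec2_hourly_rate, emr_mapr_hourly_rate):
    """Estimate the compute cost of a job flow.

    The EMR with MapR rate is charged in addition to the EC2 rate.
    Amazon S3 and Amazon SimpleDB charges are not included.
    """
    hours = billable_hours(runtime_hours)
    return instance_count * hours * (ec2_hourly_rate + emr_mapr_hourly_rate)


# Hypothetical rates: 10 instances running for 2 hours 15 minutes
# are billed for 3 hours each.
print(job_flow_cost(10, 2.25, ec2_hourly_rate=0.50, emr_mapr_hourly_rate=0.25))  # 22.5
```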
Save Money with Reserved and Spot Instances
The Amazon EC2 prices above are for On-Demand Instances. On-Demand Instances are the most expensive but give you the most flexibility. EC2 also offers Reserved Instances and Spot Instances.
Reserved Instances give you the option to make a low, one-time payment for each instance you want to reserve and in turn receive a significant discount on the hourly charge for that instance. There are three Reserved Instance types (Light, Medium, and Heavy Utilization Reserved Instances) that enable you to balance the amount you pay upfront with your effective hourly price.
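One way to compare the trade-off between the upfront payment and the hourly charge is to spread the one-time payment over the hours you expect to use. The Python sketch below uses a simplified model with hypothetical prices. It assumes the discounted hourly rate is charged only for hours actually used, which may not hold for every Reserved Instance type.

```python
def effective_hourly_price(upfront_payment, discounted_hourly_rate, hours_used):
    """Spread the one-time payment over the hours used and add the hourly rate."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return upfront_payment / hours_used + discounted_hourly_rate


# Hypothetical prices: a 100.00 upfront payment and a 0.25 hourly rate.
print(effective_hourly_price(100.0, 0.25, hours_used=400))  # 0.5
print(effective_hourly_price(100.0, 0.25, hours_used=800))  # 0.375
```

In this model, the more hours the instance is used, the lower the effective hourly price, so heavier usage justifies a larger upfront payment.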
Spot Instances enable you to bid for unused Amazon EC2 capacity. Instances are charged the Spot Price, which is set by Amazon EC2 and fluctuates periodically depending on the supply of and demand for Spot Instance capacity. To use Spot Instances, you specify the maximum price you are willing to pay per instance hour. If your maximum price bid exceeds the current Spot Price, your request is fulfilled and your instances will run until either you choose to terminate them or the Spot Price increases above your maximum price (whichever is sooner).
To view more information and current prices for Reserved Instances and Spot Instances, see the Amazon EC2 pricing page.
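The Spot Instance rule described above can be sketched as a small simulation in Python. The hour-by-hour price list is hypothetical, and the model is simplified: it treats the Spot Price as constant within each hour and assumes the request stays open until it is fulfilled.

```python
def simulate_spot_request(max_price, hourly_spot_prices):
    """Return the Spot Prices charged for each hour the instance runs.

    The request is fulfilled once the maximum price exceeds the Spot Price.
    The instance then runs until the Spot Price rises above the maximum price.
    """
    charged = []
    running = False
    for spot_price in hourly_spot_prices:
        if not running:
            if max_price > spot_price:
                running = True  # request fulfilled
            else:
                continue  # still waiting for the price to drop
        elif spot_price > max_price:
            break  # Spot Price rose above the maximum price
        charged.append(spot_price)  # charged the Spot Price, not the bid
    return charged


# Hypothetical prices with a maximum price of 0.10 per instance hour.
prices = [0.12, 0.08, 0.09, 0.10, 0.11, 0.07]
print(simulate_spot_request(0.10, prices))  # [0.08, 0.09, 0.1]
```

Note that the instance is charged the Spot Price for each hour, which can be lower than the maximum price specified in the request.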
Other Pricing Details
Amazon S3 is billed separately. (Many customers store their input and output data in S3; others store all of the data locally on HDFS.) Currently it costs $668 per month to store 10 TB of data in S3 with reduced redundancy. The more data you store, the lower the monthly price per GB.
Amazon SimpleDB is also billed separately. (This applies only if you enable debugging for your job flow.)