AWS News Blog

New Whitepaper: Amazon Elastic MapReduce (EMR) Best Practices

Amazon Elastic MapReduce (EMR) accelerates big data analytics. It provides instant scalability and elasticity, letting you focus on analytics instead of infrastructure for your data-intensive projects. Whether you are indexing large data sets, analyzing massive amounts of scientific data, or processing clickstream logs, EMR simplifies running Hadoop and related big data applications on AWS.

Analyzing massive amounts of data involves more than processing and computing. You also have to decide how to collect and aggregate the data, how to move it to the cloud or point your data source there, how to compress it, and finally how to process it faster and more cost-effectively.

In that regard, we are very excited to release the Best Practices For Amazon EMR whitepaper today. This whitepaper highlights best practices for moving data to AWS and for collecting, aggregating, and compressing that data, and it discusses common architectural patterns for setting up and configuring Amazon EMR clusters for faster processing. We also discuss several performance and cost optimization techniques so you can process and analyze massive amounts of data reliably, at high throughput and low cost.

As always, we would love to get your feedback. Please use the comments below to tell us how we can improve our products, features, and documentation. Thanks!

– Jinesh and Parviz

Modified 2/10/2021 – In an effort to ensure a great experience, expired links in this post have been updated or removed from the original post.
Jeff Barr

Jeff Barr is Chief Evangelist for AWS. He started this blog in 2004 and has been writing posts just about non-stop ever since.