Amazon Elastic MapReduce

New – Auto Scaling for EMR Clusters

The Amazon EMR team is cranking out new features at an impressive pace (guess they have lots of worker nodes)! So far this quarter they have added all of these features: September – Data Encryption for Apache Spark, Tez, and Hadoop MapReduce. September – Open-sourced EMR-DynamoDB Connector for Apache Hive. November – Stream Processing at […]

Snowball HDFS Import

Update (June 2019) – This feature is no longer available. If you are running MapReduce jobs on premises and storing data in HDFS (the Hadoop Distributed File System), you can now copy that data directly from HDFS to an AWS Snowball without using an intermediary staging file. Because HDFS is often used for Big Data […]

Amazon EMR Update – Apache Spark 1.5.2, Ganglia, Presto, Zeppelin, and Oozie

My colleague Jon Fritz wrote the guest post below to introduce you to the newest version of Amazon EMR. — Jeff; Today we are announcing Amazon EMR release 4.2.0, which adds support for Apache Spark 1.5.2, Ganglia 3.6 for Apache Hadoop and Spark monitoring, and new sandbox releases for Presto (0.125), Apache Zeppelin (0.5.5), and […]

Elastic MapReduce Release 4.0.0 With Updated Applications Now Available

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence […]

AWS GovCloud (US) Update – AWS Key Management Service Now Available

The AWS Key Management Service (AWS KMS) provides you with seamless, centralized control over your encryption keys. As I noted when we launched the service (see my post, New AWS Key Management Service, for more information), this service gives you a new option for data protection and relieves you of many of the more onerous […]

Amazon EMR Now Supports Amazon S3 Client-Side Encryption

Many AWS customers use Amazon EMR to process huge amounts of data. Built around Hadoop, EMR allows these customers to build highly scalable processing systems that can quickly and efficiently digest raw data and turn it into actionable business intelligence. EMR File System (EMRFS) enables Amazon EMR clusters to operate directly on data in Amazon […]

Resource Groups and Tagging for AWS

For many years, AWS customers have used tags to organize their EC2 resources (instances, images, load balancers, security groups, and so forth), RDS resources (DB instances, option groups, and more), VPC resources (gateways, option sets, network ACLs, subnets, and the like), Route 53 health checks, and S3 buckets. Tags are used to label, collect, and […]

Hue – A Web User Interface for Analyzing Data With Elastic MapReduce

Hue is an open source web user interface for Hadoop. Hue allows technical and non-technical users to take advantage of Hive, Pig, and many of the other tools that are part of the Hadoop and EMR ecosystem. You can think of Hue as the primary user interface to Amazon EMR and the AWS Management Console […]

AWS News Blog