AWS News Blog

Tag: Amazon Elastic MapReduce

New – Auto Scaling for EMR Clusters

The team is cranking out new features at an impressive pace (guess they have lots of worker nodes)! So far this quarter they have added all of these features: September – Data Encryption for Apache Spark, Tez, and Hadoop MapReduce. September – Open-sourced EMR-DynamoDB Connector for Apache Hive. November – Stream Processing at Scale with […]


Snowball HDFS Import

If you are running MapReduce jobs on premises and storing data in HDFS (the Hadoop Distributed File System), you can now copy that data directly from HDFS to an AWS Snowball without using an intermediary staging file. Because HDFS is often used for Big Data workloads, this can greatly simplify the process of importing large amounts of […]


Amazon EMR Now Supports Amazon S3 Client-Side Encryption

Many AWS customers use Amazon EMR to process huge amounts of data. Built around Hadoop, EMR allows these customers to build highly scalable processing systems that can quickly and efficiently digest raw data and turn it into actionable business intelligence. EMR File System (EMRFS) enables Amazon EMR clusters to operate directly on data in Amazon S3, making it […]


Resource Groups and Tagging for AWS

For many years, AWS customers have used tags to organize their EC2 resources (instances, images, load balancers, security groups, and so forth), RDS resources (DB instances, option groups, and more), VPC resources (gateways, option sets, network ACLs, subnets, and the like), Route 53 health checks, and S3 buckets. Tags are used to label, collect, and […]


New AWS Quick Start – Cloudera Enterprise Data Hub

date: 2014-10-15 2:03:16 PM The new Quick Start Reference Deployment Guide for Cloudera Enterprise Data Hub does exactly what the title suggests! The comprehensive (20-page) guide includes the architectural considerations and configuration steps that will help you to launch the new Cloudera Director and an associated Cloudera Enterprise Data Hub (EDH) in a matter […]


Consistent View for Elastic MapReduce’s File System

Many AWS developers are using Amazon EMR (a managed Hadoop service) to quickly and cost-effectively build applications that process vast amounts of data. The EMR File System (EMRFS) allows AWS customers to use Amazon S3 as a durable and cost-effective data store that is independent of the memory and compute resources of any particular cluster. It also allows multiple […]
