AWS News Blog

Tag: Amazon Elastic MapReduce

New – Auto Scaling for EMR Clusters

The Amazon EMR team is cranking out new features at an impressive pace (guess they have lots of worker nodes)! So far this quarter they have added all of these features:
September – Data Encryption for Apache Spark, Tez, and Hadoop MapReduce.
September – Open-sourced EMR-DynamoDB Connector for Apache Hive.
November – Stream Processing at […]

Snowball HDFS Import

Update (June 2019) – This feature is no longer available. If you are running MapReduce jobs on premises and storing data in HDFS (the Hadoop Distributed File System), you can now copy that data directly from HDFS to an AWS Snowball without using an intermediary staging file. Because HDFS is often used for Big Data […]

Amazon EMR Now Supports Amazon S3 Client-Side Encryption

Many AWS customers use Amazon EMR to process huge amounts of data. Built around Hadoop, EMR allows these customers to build highly scalable processing systems that can quickly and efficiently digest raw data and turn it into actionable business intelligence. EMR File System (EMRFS) enables Amazon EMR clusters to operate directly on data in Amazon […]

Resource Groups and Tagging for AWS

For many years, AWS customers have used tags to organize their EC2 resources (instances, images, load balancers, security groups, and so forth), RDS resources (DB instances, option groups, and more), VPC resources (gateways, option sets, network ACLs, subnets, and the like), Route 53 health checks, and S3 buckets. Tags are used to label, collect, and […]