AWS News Blog

Amazon EMR Now Supports Amazon S3 Client-Side Encryption

Many AWS customers use Amazon EMR to process huge amounts of data.  Built around Hadoop, EMR allows these customers to build highly scalable processing systems that can quickly and efficiently digest raw data and turn it into actionable business intelligence.

EMR File System (EMRFS) enables Amazon EMR clusters to operate directly on data in Amazon Simple Storage Service (Amazon S3), making it easy for customers to work with input and output files in S3. Until now, EMRFS supported unencrypted and server-side encrypted objects in S3

Support for Amazon S3 Client-Side Encryption in the EMRFS
Today we’re adding support for client-side encrypted objects in S3, enabling you to use your own keys. The EMRFS S3 client-side encryption uses the same envelope encryption method found in the generic S3 Encryption Client, allowing you to use Amazon EMR to easily process data uploaded to S3 using that client. This feature does not, however, encrypt data stored in HDFS on the local disks of your Amazon EMR cluster or data in transit between your cluster nodes.

The encryption is transparent to the applications running on the EMR cluster.

You can store your keys in the AWS Key Management Service (AWS KMS) or provide custom logic to access keys in on-premises HSMs or other customer key management systems. Amazon EMR can use an Encryption Materials Provider that you supply, so you can store your keys in any location where Amazon EMR can use them.

Enabling Encryption From the Console
You can enable this new feature from the EMR Console like this:

Based on the option that you select, the console will prompt you for additional information. For example, if you choose to use the Key Management Service, you can choose the desired one from the menu (you can also enter the ARN of an AWS KMS key if the key is owned by another AWS account):

Custom Key Management With the EMRFS
You can create a custom Encryption Materials Provider class to provide keys to the EMRFS using user defined logic. The EMRFS will pass information from the S3 object metadata to the provider to inform which key to retrieve for decryption. Your code must contain the information about how to retrieve the keys, and the EMRFS will use the key that the provider presents. When you specify the custom encryption materials provider option, all you need to do is give the Amazon S3 location of your provider, and Amazon EMR will automatically add the provider to the cluster and use it with the EMRFS.

Available Now
This feature is available now and you can start using it today. You will need to use the latest EMR AMI (version 3.6.0 or later).

Jeff;

Jeff Barr

Jeff Barr

Jeff Barr is Chief Evangelist for AWS. He started this blog in 2004 and has been writing posts just about non-stop ever since.