Amazon EMR now supports data encryption for Apache Spark, Tez, and Hadoop MapReduce

Posted on: Sep 21, 2016

You can now enable encryption for data at rest and in transit for Apache Spark, Apache Tez, and Apache Hadoop MapReduce on Amazon EMR. For encryption at rest, you can encrypt data stored in Amazon S3 through the EMR File System (EMRFS), as well as data stored on your Amazon EMR cluster, both in the local file system on each node and in the Hadoop Distributed File System (HDFS). For encryption in transit, Amazon EMR enables the open-source encryption features of Apache Spark, Apache Tez, and Apache Hadoop MapReduce.

You configure encryption for each supported component with an Amazon EMR security configuration, which specifies the keys and certificates to use for encryption on your cluster. Security configurations are named AWS resources that the Amazon EMR service stores for you.
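
As a rough illustration of what a security configuration might contain, the following Python sketch builds one as a JSON document. The key names reflect the security configuration format described in the Amazon EMR documentation and should be verified there before use. The function name `build_security_configuration`, the KMS key ARN, and the S3 certificate path are hypothetical placeholders, not values from this announcement.

```python
import json


def build_security_configuration(kms_key_arn, tls_certs_s3_uri):
    """Return a dict describing at-rest and in-transit encryption settings.

    Assumption: key names follow the EMR security configuration JSON format;
    check them against the current Amazon EMR documentation.
    """
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data written to Amazon S3 through EMRFS.
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Data on the cluster: local file system on each node, and HDFS.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # TLS certificates packaged in a zip file stored in Amazon S3.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_certs_s3_uri,
                },
            },
        }
    }


# Placeholder values for illustration only.
config = build_security_configuration(
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    tls_certs_s3_uri="s3://example-bucket/certs/my-certs.zip",
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Building the document as a dict and serializing it once keeps the at-rest and in-transit sections easy to toggle independently before the configuration is stored.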

You can create a security configuration from the security configuration page in the Amazon EMR console, with the AWS Command Line Interface (CLI), or with the AWS SDK through the Amazon EMR API. After creating a security configuration, you can specify it when creating an Amazon EMR cluster. You can use AWS Key Management Service (KMS) or custom key management infrastructure to supply encryption keys, and you can use TLS certificates stored in Amazon S3 for in-transit encryption. Security configurations are supported on Amazon EMR releases 5.0.0 and 4.8.0. See the Amazon EMR documentation for more information about security configurations, encryption at rest for each storage layer, and the in-transit encryption mechanisms for each supported engine.
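
The create-then-reference workflow through the AWS SDK could look roughly like the Python sketch below, assuming the boto3 EMR client and its `create_security_configuration` and `run_job_flow` operations. The helper names, cluster name, instance types, instance count, and IAM role names are hypothetical placeholders; the client is passed in as a parameter so the helpers can be exercised without AWS credentials.

```python
import json


def create_security_configuration(emr_client, name, config):
    """Store a security configuration in Amazon EMR under the given name.

    `config` is a dict in the EMR security configuration JSON format.
    """
    return emr_client.create_security_configuration(
        Name=name,
        SecurityConfiguration=json.dumps(config),
    )


def launch_encrypted_cluster(emr_client, cluster_name, security_configuration_name):
    """Launch a cluster that references a stored security configuration by name."""
    return emr_client.run_job_flow(
        Name=cluster_name,
        ReleaseLabel="emr-5.0.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            # Placeholder instance settings; adjust to your workload.
            "MasterInstanceType": "m4.large",
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Placeholder role names; use the roles defined in your account.
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        SecurityConfiguration=security_configuration_name,
    )


# Example usage (requires boto3 and AWS credentials; not run here):
#   import boto3
#   emr = boto3.client("emr")
#   create_security_configuration(emr, "my-security-config", my_config_dict)
#   launch_encrypted_cluster(emr, "my-encrypted-cluster", "my-security-config")
```

Because the security configuration is a named resource, the same name can be reused across cluster launches without resending the keys and certificate locations each time.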