Posted On: Mar 25, 2015
You can now use Amazon EMR clusters to process data in Amazon S3 that you previously encrypted using client-side encryption. This functionality has been added to the EMR File System (EMRFS), which Amazon EMR clusters use to read from and write to Amazon S3 securely, consistently, and with high performance. When writing to Amazon S3, EMRFS now supports encrypting objects with Amazon S3 client-side encryption in addition to Amazon S3 server-side encryption.
EMRFS support for Amazon S3 client-side encryption enables you to use Amazon EMR to process objects in Amazon S3 that are encrypted with keys stored in the AWS Key Management Service (KMS), on-premises hardware security modules (HSMs), or other key vendor systems. When reading from or writing to Amazon S3, EMRFS encrypts and decrypts objects using the same envelope encryption method as the Amazon S3 encryption client, so EMRFS can seamlessly decrypt objects you uploaded to Amazon S3 with that client, and that client can decrypt objects written by EMRFS. Please note that EMRFS support for Amazon S3 client-side encryption covers only data in Amazon S3; it does not encrypt data written to the Hadoop Distributed File System (HDFS) on the local disks of your Amazon EMR cluster.
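For readers unfamiliar with envelope encryption, the following is a minimal Python sketch of the general idea, not the EMRFS or Amazon S3 encryption client implementation. Each object is encrypted with its own freshly generated data key, and that data key is in turn encrypted ("wrapped") with a master key and kept next to the encrypted object. All function names here are hypothetical, the master key is a local random value standing in for a key held in AWS KMS or an HSM, and the cipher is a toy SHA-256 keystream used only so the example runs with the standard library; a real client would use AES.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    ASSUMPTION: this stands in for AES so the sketch needs only the
    standard library. Do not use it to protect real data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def _seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate; returns nonce + tag + ciphertext."""
    nonce = secrets.token_bytes(16)
    ciphertext = _keystream_xor(key, nonce, plaintext)
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + tag + ciphertext


def _open(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt a blob produced by _seal."""
    nonce, tag, ciphertext = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or corrupted data")
    return _keystream_xor(key, nonce, ciphertext)


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt one object under a fresh data key, then wrap that key."""
    data_key = secrets.token_bytes(32)  # new data key for every object
    return {
        "wrapped_key": _seal(master_key, data_key),  # stored with the object
        "body": _seal(data_key, plaintext),          # the encrypted object
    }


def envelope_decrypt(master_key: bytes, envelope: dict) -> bytes:
    """Unwrap the data key with the master key, then decrypt the object."""
    data_key = _open(master_key, envelope["wrapped_key"])
    return _open(data_key, envelope["body"])


# Usage: the master key here is a local stand-in for a key in KMS or an HSM.
master_key = secrets.token_bytes(32)
envelope = envelope_encrypt(master_key, b"2015-03-25 sample log line\n")
restored = envelope_decrypt(master_key, envelope)
```

Because only the small wrapped key depends on the master key, any reader that can reach the same master key and follows the same envelope layout can decrypt the object, which is why sharing one envelope method lets two clients read each other's objects.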
EMRFS support for Amazon S3 client-side encryption is transparent to the applications running on your cluster. When enabling this feature, you can configure EMRFS to use an AWS KMS key, identified by its alias or Amazon Resource Name (ARN), or provide custom logic that EMRFS uses to obtain keys held in your own key management system. You can launch an Amazon EMR cluster with Amazon S3 client-side encryption from the AWS Management Console, AWS CLI, or SDK by selecting AMI 3.6.0 or later. To learn more about using Amazon S3 client-side encryption in EMRFS with AWS KMS keys and custom key providers, see the Amazon EMR documentation.