How to Use Xplenty with AWS KMS to Provide Field-Level Encryption in ETL Data Processing
By Mark Smallcombe, CTO at Xplenty
Data privacy regulations such as HIPAA, GDPR, and CCPA are forcing companies to review how they handle and protect their customers’ data.
Enterprises often choose to mask, remove, or encrypt sensitive data in the extract, transform, and load (ETL) step to minimize the risk of that data being stored, logged, exposed, or breached in their data lake or data warehouse.
Xplenty’s ETL and ELT platform allows customers to quickly and easily prepare their data for analytics using a simple-to-use data integration cloud service. Xplenty’s drag-and-drop interface enables data integration, processing, and preparation without installing, deploying, or maintaining any software.
Xplenty’s global service uses AWS Key Management Service (AWS KMS) and operates in the North Virginia, Oregon, Ireland, Tokyo, Singapore, and Sydney AWS Regions. AWS KMS makes it easy to create and control the keys used to encrypt or digitally sign your data.
In this post, I will describe how Xplenty leverages AWS Encryption SDK and a customer’s AWS KMS to encrypt sensitive data during an ETL process.
We’ll also explore how this gives Xplenty’s customers granular control of the encryption and decryption process using their own AWS KMS key policy. This helps you meet industry compliance standards like GDPR.
Xplenty is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Data & Analytics Competency. Xplenty provides a complete toolkit for building data pipelines, and customers use the package designer to implement a variety of ETL use cases, from simple replication to complex data preparation.
The best solution to protect sensitive data is to remove, hash, or anonymize the data fields before they’re loaded into a data warehouse or data lake, but that’s not always a suitable business solution.
Personally Identifiable Information (PII) and Protected Health Information (PHI) are often needed by business applications. However, this data must be strongly protected in transit, at rest, and in the customer’s application.
The AWS Encryption SDK uses envelope encryption: the data key (provided by the customer’s AWS KMS) that encrypts the data is itself encrypted and stored alongside the encrypted data in a single encrypted message, so it is available for later decryption.
Figure 1 – Symmetric Key envelope encryption.
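To make the envelope idea concrete, here is a minimal sketch that uses only the JDK. It is not the AWS Encryption SDK and does not call AWS KMS: a locally generated AES key stands in for the customer’s KMS key, and the class name `EnvelopeSketch` and the simple message layout are assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Illustration only: a local AES key plays the role of the KMS key. */
class EnvelopeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    /** Generates a random 256-bit AES key. */
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** AES-GCM encrypts the input and returns IV || ciphertext-with-tag. */
    private static byte[] seal(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
    }

    /** Reverses seal(): reads the IV from the front, then decrypts and verifies the rest. */
    private static byte[] open(SecretKey key, byte[] sealed) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
        return cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
    }

    /** Message layout: 4-byte wrapped-key length || wrapped data key || encrypted field. */
    static byte[] encrypt(SecretKey masterKey, String field) {
        try {
            SecretKey dataKey = newAesKey();                        // fresh data key per message
            byte[] wrappedKey = seal(masterKey, dataKey.getEncoded()); // data key encrypted under master key
            byte[] sealedField = seal(dataKey, field.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(4 + wrappedKey.length + sealedField.length)
                    .putInt(wrappedKey.length).put(wrappedKey).put(sealedField).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Unwraps the data key with the master key, then decrypts the field with it. */
    static String decrypt(SecretKey masterKey, byte[] message) {
        try {
            ByteBuffer buffer = ByteBuffer.wrap(message);
            byte[] wrappedKey = new byte[buffer.getInt()];
            buffer.get(wrappedKey);
            byte[] sealedField = new byte[buffer.remaining()];
            buffer.get(sealedField);
            SecretKey dataKey = new SecretKeySpec(open(masterKey, wrappedKey), "AES");
            return new String(open(dataKey, sealedField), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey masterKey = newAesKey();
        byte[] message = encrypt(masterKey, "jane@example.com");
        System.out.println(decrypt(masterKey, message));
    }
}
```

The point of the layout is that the encrypted message carries everything needed for decryption except the master key, which in the real setup never leaves AWS KMS.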
Why AWS Key Management Service (AWS KMS)?
Secure encryption key management is complicated and prone to security vulnerabilities. AWS KMS enables Xplenty to give customers full control of encryption keys, their rotation, and their logging whilst maintaining very high, proven security (FIPS 140-2).
With AWS KMS and Xplenty, customers can run their ETL jobs and encrypt sensitive data, all without managing encryption keys or exchanging secrets. This makes the end-to-end solution a secure and seamless process for the customer.
Setting Up AWS KMS
First, you need to create a new customer master key (CMK) and give Xplenty’s AWS account permission to use it. Here’s a screenshot of a customer’s AWS KMS console showing how you can give Xplenty’s account permission to use the key.
Figure 2 – Customer’s console specifying Xplenty’s AWS account ID for AWS KMS access.
The following example fragment of a customer’s KMS key policy gives you full control of Xplenty’s permission to encrypt and decrypt fields of data. For example, removing “kms:Decrypt” from the key policy actions lets Xplenty encrypt data but never decrypt it.
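As a rough sketch of what such a key policy statement could look like: the account ID `111122223333` and the `Sid` are placeholders rather than Xplenty’s real values, and the action list is an assumption based on the AWS KMS operations that envelope encryption typically relies on (generating a data key to encrypt, decrypting it to decrypt).

```json
{
  "Sid": "AllowVendorAccountToUseKey",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::111122223333:root"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:GenerateDataKey",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
```

Dropping `kms:Decrypt` from the `Action` list in a statement like this would yield the encrypt-only setup.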
Calling Xplenty’s Encrypt Function
Inside Xplenty’s ETL package, you can encrypt data by passing the string to the encrypt function with the AWS KMS key ARN (and the optional encryption context and AES encryption strength).
This returns the encrypted message containing the ciphertext and encrypted data key.
Figure 3 – Using Xplenty’s Encrypt function to protect an email address.
Encrypt Code Overview
Xplenty’s transformation platform uses the Java AWS Encryption SDK, and to be able to call the customer’s AWS KMS, we must first include the following dependencies:
- aws-java-sdk-kms: classes to communicate with AWS KMS.
- aws-encryption-sdk-java: AWS Encryption SDK for Java.
- aws-java-sdk-core: classes to interact with AWS.
The following example Java code shows how to include the AWS Encryption SDK, call AWS KMS, and then encrypt a piece of data.
Calling Xplenty’s Decrypt Function
Inside Xplenty’s ETL package, you can also decrypt data by passing the encrypted message to the decrypt function with the AWS KMS key ARN (and the optional encryption context that was used for the encryption).
This returns the decrypted message.
Figure 4 – Using Xplenty’s Decrypt function to decrypt an email address.
Decrypt Code Overview
The decrypt function setup mirrors the encrypt function, but there’s an additional step of validating the encryption context. Note the format of the encryption data structure returned by the AWS Encryption SDK.
Here’s how to check the encryption context and then call the decrypt function:
Here’s how the encryption context is validated:
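A minimal, JDK-only sketch of such a check follows. It assumes the encryption context is available as a `Map<String, String>` after decryption, and the class and method names are hypothetical. It tests that every expected pair is present rather than requiring the two maps to be equal, because the AWS Encryption SDK can add its own entries to the context (for example `aws-crypto-public-key` when a signing algorithm suite is used).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: compares an expected encryption context with the one bound to a message. */
class EncryptionContextCheck {

    /**
     * Returns true if every expected key/value pair appears in the actual context.
     * Extra entries in the actual context are tolerated.
     */
    static boolean matches(Map<String, String> expected, Map<String, String> actual) {
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            if (!entry.getValue().equals(actual.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> expected = new HashMap<>();
        expected.put("field", "email");

        // Stand-in for the context returned alongside the decrypted data.
        Map<String, String> fromMessage = new HashMap<>(expected);
        fromMessage.put("aws-crypto-public-key", "example-value");

        if (!matches(expected, fromMessage)) {
            throw new IllegalStateException("Encryption context mismatch; discard the plaintext");
        }
        System.out.println("Encryption context verified");
    }
}
```

If the check fails, the decrypted value should be discarded rather than passed on, since the message was not encrypted under the context the caller expected.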
KMS Data Key Caching
Here’s an overview of the data key caching function:
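Data key caching reuses one data key across several messages so that not every encrypted field needs a call to AWS KMS, and the reuse is bounded by limits such as a maximum key age and a maximum number of messages per key. The sketch below illustrates that idea with the JDK only. It is not the SDK’s caching implementation: the class name, the cache identifier format, and the `Supplier` standing in for the AWS KMS call are all assumptions.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Illustration only: reuses a data key until it is too old or has been used too often. */
class DataKeyCacheSketch {

    private static final class Entry {
        final byte[] dataKey;
        final long createdAtMillis;
        int uses;

        Entry(byte[] dataKey, long createdAtMillis) {
            this.dataKey = dataKey;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long maxAgeMillis;
    private final int maxUses;

    DataKeyCacheSketch(long maxAgeMillis, int maxUses) {
        this.maxAgeMillis = maxAgeMillis;
        this.maxUses = maxUses;
    }

    /**
     * Returns the cached data key for cacheId, or asks the generator
     * (a stand-in for a call to AWS KMS) for a new one when the cached
     * key is missing, too old, or has reached its usage limit.
     */
    synchronized byte[] get(String cacheId, long nowMillis, Supplier<byte[]> generator) {
        Entry entry = entries.get(cacheId);
        boolean needsNewKey = entry == null
                || nowMillis - entry.createdAtMillis >= maxAgeMillis
                || entry.uses >= maxUses;
        if (needsNewKey) {
            entry = new Entry(generator.get(), nowMillis);
            entries.put(cacheId, entry);
        }
        entry.uses++;
        return entry.dataKey;
    }

    public static void main(String[] args) {
        int[] kmsCalls = {0};
        Supplier<byte[]> fakeKms = () -> {
            kmsCalls[0]++;
            byte[] key = new byte[32];
            new SecureRandom().nextBytes(key);
            return key;
        };

        // Allow each data key to protect at most 1,000 fields or live for 60 seconds.
        DataKeyCacheSketch cache = new DataKeyCacheSketch(60_000L, 1_000);
        for (int row = 0; row < 5_000; row++) {
            cache.get("example-key-arn|field=email", System.currentTimeMillis(), fakeKms);
        }
        System.out.println("rows=5000 kmsCalls=" + kmsCalls[0]);
    }
}
```

Tighter limits mean more calls to AWS KMS but fewer fields protected by any single data key, so the thresholds are a trade-off between cost, latency, and exposure.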
In this post, I have shown how a customer gives a vendor permission to call their AWS KMS and how you can restrict that permission to the minimum required (the principle of least privilege). I have also provided working Java code snippets that encrypt and decrypt data using the AWS Encryption SDK and AWS KMS to help you get started.
Xplenty’s platform helps data scientists and business users quickly create their data pipelines without coding. AWS Key Management Service (AWS KMS) helps Xplenty’s customers further secure their ETL data processing without giving up any security control.
Field-level encryption protects a customer’s PII and PHI data right at the source, ensuring this sensitive data is encrypted before loading into data lakes, data warehouses, and other internal systems.
The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.
Xplenty – APN Partner Spotlight
Xplenty is an AWS Data & Analytics Competency Partner. Xplenty is SOC 2 certified and offers an easy-to-use ETL and ELT data processing platform that connects to AWS data sources and data destinations.