AWS Partner Network (APN) Blog

How to Use Xplenty with AWS KMS to Provide Field-Level Encryption in ETL Data Processing

By Mark Smallcombe, CTO at Xplenty

Data privacy regulations such as HIPAA, GDPR, and CCPA are forcing companies to review how they handle and protect their customers’ data.

Enterprises often choose to mask, remove, or encrypt sensitive data in the extract, transform, and load (ETL) step to minimize the risk of sensitive data becoming stored, logged, accessible, or breached from their data lake or data warehouse.

Xplenty’s ETL and ELT platform allows customers to quickly and easily prepare their data for analytics using a simple-to-use data integration cloud service. Xplenty’s drag-and-drop interface enables data integration, processing, and preparation without installing, deploying, or maintaining any software.

Xplenty’s global service uses AWS Key Management Service (AWS KMS) and operates in the Northern Virginia, Oregon, Ireland, Tokyo, Singapore, and Sydney AWS Regions. AWS KMS makes it easy to create and control the keys used to encrypt or digitally sign your data.

In this post, I will describe how Xplenty leverages AWS Encryption SDK and a customer’s AWS KMS to encrypt sensitive data during an ETL process.

We’ll also explore how this gives Xplenty’s customers granular control of the encryption and decryption process using their own AWS KMS key policy. This helps you meet industry compliance standards like GDPR.

Xplenty is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Data & Analytics Competency. Xplenty provides a complete toolkit for building data pipelines, and customers use the package designer to implement a variety of ETL use cases, from simple replication to complex data preparation.

Why Encryption?

The best solution to protect sensitive data is to remove, hash, or anonymize the data fields before they’re loaded into a data warehouse or data lake, but that’s not always a suitable business solution.

Personally identifiable information (PII) and protected health information (PHI) are often needed by business applications. However, this data must be strongly protected in transit, at rest, and in the customer’s application.

The AWS Encryption SDK uses envelope encryption: the data is encrypted with a data key provided by the customer’s AWS KMS, and a copy of that data key, itself encrypted under the customer’s master key, is stored with the encrypted data in a single encrypted message for later decryption.

Figure 1 – Symmetric Key envelope encryption.
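The envelope pattern in Figure 1 can be sketched with nothing but the JDK. This is a minimal illustration, not Xplenty’s or the AWS Encryption SDK’s implementation: a local AES key stands in for the KMS master key (which in reality never leaves AWS KMS), and the class name EnvelopeSketch and its message layout are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Illustrative only: a local AES key stands in for the KMS master key. */
final class EnvelopeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Hypothetical message layout: the wrapped data key travels with the ciphertext. */
    static final class Envelope {
        final byte[] keyIv;
        final byte[] wrappedDataKey;
        final byte[] dataIv;
        final byte[] ciphertext;

        Envelope(byte[] keyIv, byte[] wrappedDataKey, byte[] dataIv, byte[] ciphertext) {
            this.keyIv = keyIv;
            this.wrappedDataKey = wrappedDataKey;
            this.dataIv = dataIv;
            this.ciphertext = ciphertext;
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] newIv() {
        byte[] iv = new byte[12]; // 96-bit IV, the usual size for AES-GCM
        RANDOM.nextBytes(iv);
        return iv;
    }

    private static byte[] gcm(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Envelope encrypt(SecretKey masterKey, String plaintext) {
        // 1. generate a fresh data key and encrypt the field with it
        SecretKey dataKey = newAesKey();
        byte[] dataIv = newIv();
        byte[] ciphertext = gcm(Cipher.ENCRYPT_MODE, dataKey, dataIv,
                plaintext.getBytes(StandardCharsets.UTF_8));

        // 2. encrypt ("wrap") the data key under the master key
        byte[] keyIv = newIv();
        byte[] wrappedDataKey = gcm(Cipher.ENCRYPT_MODE, masterKey, keyIv, dataKey.getEncoded());

        // 3. store the wrapped data key alongside the ciphertext
        return new Envelope(keyIv, wrappedDataKey, dataIv, ciphertext);
    }

    static String decrypt(SecretKey masterKey, Envelope envelope) {
        // unwrap the data key first, then decrypt the field with it
        byte[] rawDataKey = gcm(Cipher.DECRYPT_MODE, masterKey, envelope.keyIv, envelope.wrappedDataKey);
        SecretKey dataKey = new SecretKeySpec(rawDataKey, "AES");
        byte[] plaintext = gcm(Cipher.DECRYPT_MODE, dataKey, envelope.dataIv, envelope.ciphertext);
        return new String(plaintext, StandardCharsets.UTF_8);
    }
}
```

Only the holder of the master key can unwrap the data key, so the encrypted message can be stored or moved freely. With AWS KMS, the wrap and unwrap steps happen inside the service, under the customer’s key policy.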

Why AWS Key Management Service (AWS KMS)?

Secure encryption key management is complicated and prone to security vulnerabilities. AWS KMS enables Xplenty to give customers full control of encryption keys, their rotation, and their logging whilst maintaining very high, proven security (FIPS 140-2).

Xplenty customers can create a new AWS KMS customer master key (CMK) and give Xplenty access to it, identified by its key ARN, for all their encryption and decryption on Xplenty.

With AWS KMS and Xplenty, customers can run their ETL jobs and encrypt sensitive data, all without managing encryption keys or exchanging secrets. This makes the end-to-end solution a secure and seamless process for the customer.

Setting Up AWS KMS

First, you need to create a new customer master key and give Xplenty’s AWS account permission to use it. Here’s a screenshot of a customer’s AWS KMS console showing how to grant Xplenty’s account that permission.

Figure 2 – Customer’s console specifying Xplenty’s AWS account ID for AWS KMS access.

The following example fragment of a customer’s KMS key policy shows how you keep full control of Xplenty’s permission to encrypt and decrypt fields of data. For example, removing “kms:Decrypt” from the key policy actions lets Xplenty encrypt data but never decrypt it.

...
{
    "Sid": "Allow use of the key",
    "Effect": "Allow",
    "Principal": {"AWS": [
    "arn:aws:iam::111122223333:user/KMSUser",
    "arn:aws:iam::111122223333:role/KMSRole",
    "arn:aws:iam::444455556666:root"
    ]},
    "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
    ],
    "Resource": "*"
}
...

Calling Xplenty’s Encrypt Function

Inside Xplenty’s ETL package, you can encrypt data by passing the string and the AWS KMS key ARN to the encrypt function (along with an optional encryption context and AES encryption strength).

This returns the encrypted message containing the ciphertext and encrypted data key.

Figure 3 – Using Xplenty’s Encrypt function to protect an email address.

Encrypt Code Overview

Xplenty’s transformation platform uses the AWS Encryption SDK for Java. To call the customer’s AWS KMS, we must first include the following dependencies:

  • aws-java-sdk-kms: classes to communicate with AWS KMS.
  • aws-encryption-sdk-java: AWS Encryption SDK for Java.
  • aws-java-sdk-core: classes to interact with AWS.

The following example Java code shows how to include the AWS Encryption SDK, call AWS KMS, and then encrypt a piece of data.

import com.amazonaws.encryptionsdk.AwsCrypto;
import com.amazonaws.encryptionsdk.CryptoAlgorithm;
import com.amazonaws.encryptionsdk.CryptoResult;
import com.amazonaws.encryptionsdk.MasterKeyProvider;
import com.amazonaws.encryptionsdk.kms.KmsMasterKeyProvider;
...

public class Encrypt extends Crypto {
    private static final int DATA_INDEX = 0;
    private static final int KEY_ARN_INDEX = 1;

    @Override
    public String exec(Tuple input) throws IOException {
        // validate the function calls 
        checkEncryptionAvailability();
        checkInputValidity(input);

        /* extract the input arguments: 
           data to encrypt, KMS keyARN, encryption context and AES strength */
        String dataToEncrypt = input.get(DATA_INDEX).toString();
        String keyArn = input.get(KEY_ARN_INDEX).toString();
        Map<String, String> encryptionContext = getEncryptionContext(input);
        String aesStrength = getAesStrength(input);

        // configure the SDK client and the KMS master key provider
        AwsCrypto crypto = new AwsCrypto();
        crypto.setEncryptionAlgorithm(getEncryptionAlgorithm(aesStrength));
        MasterKeyProvider keyProvider = new KmsMasterKeyProvider(getCredentials(), keyArn);

        // encrypt the data with the data encryption key
        CryptoResult<String, ?> encryptionResult = crypto.encryptString(
            getCachingCryptoMaterialsManager(keyProvider), dataToEncrypt, encryptionContext);

        // return the encrypted message containing the ciphertext and the encrypted data key
        return encryptionResult.getResult();
    }
    
...

Calling Xplenty’s Decrypt Function

Inside Xplenty’s ETL package, you can also decrypt data by passing the encrypted message and the AWS KMS key ARN to the decrypt function (along with the optional encryption context that was used for the encryption).

This returns the decrypted message.

Figure 4 – Using Xplenty’s Decrypt function to decrypt an email address.
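Why must the same encryption context be supplied on decrypt? AWS describes the encryption context as additional authenticated data (AAD): it is not secret, but it is cryptographically bound to the ciphertext. The following JDK-only sketch illustrates that binding with AES-GCM. The class ContextBindingSketch and its key=value serialization are hypothetical and far simpler than the format the AWS Encryption SDK uses.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: binds a key-value context to a ciphertext as AES-GCM AAD. */
final class ContextBindingSketch {

    /** Hypothetical canonical form (sorted key=value pairs); not the SDK's serialization. */
    static byte[] serialize(Map<String, String> context) {
        StringBuilder builder = new StringBuilder();
        for (Map.Entry<String, String> entry : new TreeMap<>(context).entrySet()) {
            builder.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return builder.toString().getBytes(StandardCharsets.UTF_8);
    }

    static byte[] encrypt(byte[] key, byte[] iv, String plaintext, Map<String, String> context) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plaintext.getBytes(StandardCharsets.UTF_8), context);
    }

    static String decrypt(byte[] key, byte[] iv, byte[] ciphertext, Map<String, String> context) {
        return new String(run(Cipher.DECRYPT_MODE, key, iv, ciphertext, context), StandardCharsets.UTF_8);
    }

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] input, Map<String, String> context) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
            // the context is authenticated but not encrypted
            cipher.updateAAD(serialize(context));
            return cipher.doFinal(input);
        } catch (AEADBadTagException e) {
            // the authentication tag no longer verifies: wrong context, key, or tampered data
            throw new IllegalStateException("Encryption context does not match.", e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, decrypting with a context such as purpose=billing fails when the data was encrypted with purpose=email, even though the key and ciphertext are correct.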

Decrypt Code Overview

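The decrypt function setup mirrors the encrypt function, but there’s an additional step of validating the encryption context. Note the format of the encryption context returned by the AWS Encryption SDK: when the algorithm suite signs the message, the SDK adds a public key entry to the context, so the returned context can contain one more entry than the context that was supplied. This is why the validation code compares the two sizes with an offset of one.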

Here’s how to check the encryption context and then call the decrypt function:

...

    CryptoResult<String, ?> decryptionResult = crypto.decryptString(
        getCachingCryptoMaterialsManager(keyProvider), dataToDecrypt);

    // confirm that the encryption context is valid
    validateContext(encryptionContext, decryptionResult.getEncryptionContext());

    return decryptionResult.getResult();
}

Here’s how the encryption context is validated:

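private void validateContext(Map<String, String> currentContext,
                             Map<String, String> encryptedContext) throws IllegalStateException {
    // the context stored in the message is expected to hold one extra entry added
    // by the SDK (the signing public key, assuming a signing algorithm suite), hence "- 1"
    if ((encryptedContext.size() - 1) != currentContext.size()) {
        throw new IllegalStateException("Encryption context does not match.");
    }

    // every key-value pair supplied by the caller must match the stored context
    for (final Map.Entry<String, String> entry : currentContext.entrySet()) {
        if (!entry.getValue().equals(encryptedContext.get(entry.getKey()))) {
            throw new IllegalStateException("Encryption context does not match.");
        }
    }
}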
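To see the validation logic in action without calling AWS KMS, the following self-contained sketch copies the method into a hypothetical ContextValidationDemo class and feeds it hand-built maps. The reserved key name aws-crypto-public-key is the entry the AWS Encryption SDK adds for signed messages; the placeholder value and the purpose=email context are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: exercises the context validation logic with hand-built maps. */
final class ContextValidationDemo {

    static void validateContext(Map<String, String> currentContext,
                                Map<String, String> encryptedContext) {
        // the stored context is expected to carry one extra SDK-added entry
        if ((encryptedContext.size() - 1) != currentContext.size()) {
            throw new IllegalStateException("Encryption context does not match.");
        }
        for (final Map.Entry<String, String> entry : currentContext.entrySet()) {
            if (!entry.getValue().equals(encryptedContext.get(entry.getKey()))) {
                throw new IllegalStateException("Encryption context does not match.");
            }
        }
    }

    public static void main(String[] args) {
        // context supplied by the caller of the decrypt function
        Map<String, String> supplied = new HashMap<>();
        supplied.put("purpose", "email");

        // context as it would come back from a signed message (placeholder key value)
        Map<String, String> stored = new HashMap<>(supplied);
        stored.put("aws-crypto-public-key", "<public key placeholder>");

        validateContext(supplied, stored); // passes silently

        supplied.put("purpose", "billing");
        try {
            validateContext(supplied, stored);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

One consequence of the size check is that a message encrypted with a non-signing algorithm suite, which carries no public key entry, would be rejected even when the supplied context is correct.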

KMS Data Key Caching

Xplenty uses the data key caching feature of the AWS Encryption SDK to minimize calls to the customer’s AWS KMS, leading to improved performance and reduced AWS KMS costs.

Here’s an overview of the data key caching function:

protected CryptoMaterialsManager getCachingCryptoMaterialsManager(MasterKeyProvider keyProvider) {
    // reuse the shared caching manager if it has already been built
    if (CACHING_CRYPTO_MATERIALS_MANAGER != null) {
        return CACHING_CRYPTO_MATERIALS_MANAGER;
    }

    // otherwise build one backed by a local in-memory cache of bounded capacity,
    // whose cached data keys expire after DATAKEY_CACHE_AGE days
    CACHING_CRYPTO_MATERIALS_MANAGER = CachingCryptoMaterialsManager.newBuilder()
            .withMasterKeyProvider(keyProvider)
            .withCache(new LocalCryptoMaterialsCache(CRYPTO_MATERIALS_CACHE_CAPACITY))
            .withMaxAge(DATAKEY_CACHE_AGE, TimeUnit.DAYS)
            .build();

    return CACHING_CRYPTO_MATERIALS_MANAGER;
}
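To make the effect of caching concrete, here is a simplified stand-in for a capacity-bounded cache with a maximum age. It is a sketch under stated assumptions, not the SDK’s LocalCryptoMaterialsCache: the class DataKeyCacheSketch, its string cache key, and the loader that simulates a KMS call are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative only: a least-recently-used cache whose entries expire after a maximum age. */
final class DataKeyCacheSketch<V> {

    private static final class CachedKey<V> {
        final V value;
        final long createdAtMillis;

        CachedKey(V value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final long maxAgeMillis;
    private final LongSupplier clock;
    private final Map<String, CachedKey<V>> entries;
    private int loads; // how many times the loader (the simulated KMS call) ran

    DataKeyCacheSketch(final int capacity, long maxAgeMillis, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        // access-ordered LinkedHashMap evicts the least recently used entry when full
        this.entries = new LinkedHashMap<String, CachedKey<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedKey<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized V get(String cacheKey, Supplier<V> loader) {
        long now = clock.getAsLong();
        CachedKey<V> cached = entries.get(cacheKey);
        if (cached != null && now - cached.createdAtMillis < maxAgeMillis) {
            return cached.value; // cache hit: no call to the key service
        }
        loads++;
        V fresh = loader.get(); // cache miss or expired entry: fetch a new data key
        entries.put(cacheKey, new CachedKey<>(fresh, now));
        return fresh;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

Encrypting many fields under the same key ARN and encryption context then triggers one loader call per expiry window instead of one per field. The trade-off is that a longer maximum age reuses the same data key for more data, so the age and capacity should be chosen with the customer’s security requirements in mind.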

In this post, I have shown how a customer gives a vendor permission to call their AWS KMS and how to restrict that permission to the minimum required (principle of least privilege). I have also provided working Java code snippets to encrypt and decrypt data using the AWS Encryption SDK and AWS KMS to help you get started.

Summary

Xplenty’s platform helps data scientists and business users quickly create their data pipelines without coding. AWS Key Management Service (AWS KMS) helps Xplenty’s customers further secure their ETL data processing without giving up any security control.

Field-level encryption protects a customer’s PII and PHI data right at the source, ensuring this sensitive data is encrypted before loading into data lakes, data warehouses, and other internal systems.

Visit the Xplenty website to view the technical documentation and sign up for a two-week free trial. Xplenty is also available on AWS Marketplace.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.

Xplenty – APN Partner Spotlight

Xplenty is an AWS Data & Analytics Competency Partner. Xplenty is SOC 2 certified and offers an easy-to-use ETL and ELT data processing platform that connects to AWS data sources and data destinations.
