AWS Database Blog

Introducing Client-Side Field Level Encryption and MongoDB 5.0 API compatibility in Amazon DocumentDB

Amazon DocumentDB (with MongoDB compatibility) is a scalable, highly durable, and fully managed database service for operating mission-critical MongoDB-compatible JSON-based workloads. On March 2, 2023, Amazon DocumentDB launched support for Client-Side Field Level Encryption (CSFLE), MongoDB 5.0 API compatibility, new aggregation operators, and other enhancements.

In this post, we summarize what’s new in Amazon DocumentDB and show an example of how to encrypt sensitive data in your application with CSFLE.

What’s new in Amazon DocumentDB 5.0?

Amazon DocumentDB 5.0 offers the following enhancements:

  • Client-side field level encryption – With the support for CSFLE, you can now selectively encrypt sensitive data in-application using AWS Key Management Service (AWS KMS) before it is sent to the database. This is in addition to the existing features available for encrypting data at rest and in transit.
  • New operators – We have added support for two new aggregation operators, $dateAdd and $dateSubtract, and have updated behaviors of already supported operators to be compatible with MongoDB 5.0 API. These new operators are available in Amazon DocumentDB 5.0 and Elastic Clusters. For more information, see Supported MongoDB APIs, Operations, and Data Types.
  • Index enhancements – You now have the ability to use indexes with the $elemMatch operator. As a result, queries with $elemMatch will now result in index scans, providing better performance for queries involving arrays.
  • Storage limit increase – We have increased the volume storage limit to 128 TiB from the previous limit of 64 TiB. This increase is applicable to all Amazon DocumentDB instance-based clusters (including 3.6 and 4.0 clusters) and Elastic Clusters. Now, each shard in Amazon DocumentDB Elastic Clusters will have a maximum storage capacity of 128 TiB. You pay only for the storage and I/O that your Amazon DocumentDB cluster consumes, and you don’t need to provision these resources in advance. For existing clusters, the storage limit increase gets applied automatically and requires no action to be taken.
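
As a rough illustration of the new operators and of a query that can benefit from the $elemMatch index support, the following sketch builds an aggregation pipeline and a query filter as plain Python dictionaries, in the shape PyMongo accepts. The field names (createdAt, scores), the output field names, and the function names are assumptions made up for this example and do not come from a real schema.

```python
# Hypothetical field names (createdAt, scores) are used for illustration only.

def build_date_window_pipeline(days):
    """Return a pipeline that derives two dates from the createdAt field."""
    return [
        {
            "$addFields": {
                # $dateAdd: createdAt plus the given number of days.
                "windowEnd": {
                    "$dateAdd": {"startDate": "$createdAt", "unit": "day", "amount": days}
                },
                # $dateSubtract: createdAt minus one hour.
                "windowStart": {
                    "$dateSubtract": {"startDate": "$createdAt", "unit": "hour", "amount": 1}
                },
            }
        }
    ]


def build_score_filter(min_score, max_score):
    """Return a filter that matches documents where a single element of the
    scores array falls inside the half-open range [min_score, max_score)."""
    return {"scores": {"$elemMatch": {"$gte": min_score, "$lt": max_score}}}


# Usage with an existing PyMongo collection object (not created here):
#   collection.create_index("scores")
#   list(collection.aggregate(build_date_window_pipeline(7)))
#   list(collection.find(build_score_filter(80, 90)))
```

Whether a given $elemMatch query actually uses an index depends on the indexes you define, so it is worth confirming with the query plan on your own cluster.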

For the full list of changes, see the release notes.

Getting started with CSFLE

Applications that handle sensitive data, such as personally identifiable information (PII), can now encrypt only the sensitive fields before storing them in Amazon DocumentDB. This improves their security posture by protecting data from breaches and unauthorized access, and helps them comply with privacy and regulatory requirements.

For the full list of operations that are and aren’t supported on encrypted fields, see Client-side field level encryption.

Solution overview

Configuring and using CSFLE in Amazon DocumentDB comprises four steps:

  1. Create a customer managed key (CMK) using AWS KMS.
  2. Create an AWS Identity and Access Management (IAM) policy and associate it with the user.
  3. Generate a data encryption key (DEK).
  4. Perform read and write operations.

Prerequisites

To implement this solution, you need:

  • A TLS-enabled Amazon DocumentDB 5.0 cluster (instance-based or elastic). We recommend using TLS to encrypt data in transit as a security best practice. You can use an existing cluster or create a new one.
  • An AWS Secrets Manager secret where Amazon DocumentDB credentials are stored. Using Secrets Manager to manage your database credentials is another recommended best practice. For more information on using Secrets Manager with Amazon DocumentDB, see How Amazon DocumentDB (with MongoDB compatibility) uses AWS Secrets Manager.
  • An IAM user. You can use an existing IAM user or create a new user. For this post, we use an existing cluster and IAM user democsfle.

Create a CMK using AWS KMS

To create your encryption key, follow these steps:

  1. On the AWS KMS console, choose Customer managed keys in the navigation pane.
  2. Choose Create key.
  3. For Key type, select Symmetric.
  4. For Key usage, select Encrypt and decrypt.
  5. Choose Next.
  6. Enter an alias, such as csflecmk, and an optional description.
  7. Choose Next.
  8. For Key administrators, select the IAM users and roles that can administer the key.
  9. Choose Next.
  10. Optionally, select the IAM users and roles that can use the key in cryptographic operations.
  11. Choose Next.
  12. On the Review and create page, review the choices you made and choose Finish.
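
If you prefer to script this step, the console actions above can be approximated with the AWS SDK for Python. The following is a minimal sketch, not the method used in this post: the helper name create_cmk is hypothetical, it relies on the default key policy instead of the key administrator and key user selections made in the console, and it expects an object that behaves like boto3.client("kms").

```python
def create_cmk(kms_client, alias_name, description=""):
    """Create a symmetric encrypt/decrypt KMS key and give it an alias.

    kms_client is expected to behave like boto3.client("kms").
    Returns the ARN of the new key.
    """
    response = kms_client.create_key(
        Description=description,
        KeyUsage="ENCRYPT_DECRYPT",
        KeySpec="SYMMETRIC_DEFAULT",
    )
    metadata = response["KeyMetadata"]

    # KMS alias names must start with the "alias/" prefix.
    if not alias_name.startswith("alias/"):
        alias_name = "alias/" + alias_name
    kms_client.create_alias(AliasName=alias_name, TargetKeyId=metadata["KeyId"])

    return metadata["Arn"]


# Usage (requires boto3 and AWS credentials that are allowed to create keys):
#   import boto3
#   key_arn = create_cmk(boto3.client("kms", region_name="<aws region>"), "csflecmk")
```

The returned ARN is the value that the IAM policy and the Python script later in this post refer to as the ARN of the CMK.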

Create an IAM policy and associate it with an IAM user

You need an IAM policy that allows the IAM user democsfle to use the key created in the previous step. For this post, we name the policy csfledemopolicy, and specify it to allow encrypt and decrypt actions for the key.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. On the JSON tab, enter the following policy (provide the ARN of the CMK you created):
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "csfldemov1",
                "Effect": "Allow",
                "Action": [
                    "kms:Encrypt",
                    "kms:Decrypt"
                ],
                "Resource": "<ARN of Customer Master Key>"
            }
        ]
    } 
  4. Choose Next: Tags.
  5. Optionally, add any tags to your policy.
  6. Choose Next: Review.
  7. Enter a name and optional description.
  8. Review the policy and choose Create policy.

This completes the IAM policy creation. The next step is to add the IAM policy csfledemopolicy to the IAM user democsfle.

  1. Choose Users in the navigation pane.
  2. Search for and choose the user democsfle.
  3. In the Permissions policies section, choose Add permissions.
  4. For Permissions options, select Attach policies directly.
  5. Search for and select the policy csfledemopolicy.
  6. Choose Next.
  7. Review the permission summary and choose Add permissions.
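
The policy creation and attachment steps can also be scripted. The sketch below builds the same policy document shown earlier and attaches it to a user; the helper names are hypothetical, and iam_client is assumed to behave like boto3.client("iam").

```python
import json


def build_csfle_policy(key_arn):
    """Return the policy document from this post for the given CMK ARN."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "csfldemov1",
                "Effect": "Allow",
                "Action": ["kms:Encrypt", "kms:Decrypt"],
                "Resource": key_arn,
            }
        ],
    }


def create_and_attach_policy(iam_client, user_name, policy_name, key_arn):
    """Create the managed policy and attach it to the IAM user.

    iam_client is expected to behave like boto3.client("iam").
    Returns the ARN of the new policy.
    """
    response = iam_client.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_csfle_policy(key_arn)),
    )
    policy_arn = response["Policy"]["Arn"]
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    return policy_arn


# Usage (requires boto3 and IAM administration permissions):
#   import boto3
#   create_and_attach_policy(boto3.client("iam"), "democsfle",
#                            "csfledemopolicy", "<ARN of Customer Master Key>")
```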

Note: You may also need to provide required permissions to the IAM user to read the secret containing Amazon DocumentDB credentials from Secrets Manager.
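
As one possible way to grant that access, the following sketch builds an extra policy statement that allows reading a single secret. The Sid value and the helper name are hypothetical; secretsmanager:GetSecretValue is the action the script in this post relies on when it retrieves the credentials.

```python
def build_secret_read_statement(secret_arn):
    """Return a policy statement that allows reading one secret.

    The Sid is a made-up label for this example.
    """
    return {
        "Sid": "csfledemosecretread",
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue"],
        "Resource": secret_arn,
    }


# Usage: append the statement to the "Statement" list of a policy document
# before creating or updating the policy, for example:
#   policy["Statement"].append(build_secret_read_statement("<ARN of the secret>"))
```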

Generate a data encryption key

We use a DEK to encrypt data in-application before it is sent to the database. The DEK is stored in a key vault collection of your choice in Amazon DocumentDB. DEKs are encrypted with a CMK. You can generate multiple DEKs to encrypt multiple fields. It’s important to note that you can’t decrypt your encrypted data without the DEK that was used to encrypt it.

For this task, you create the Python script democsfle.py with functions to retrieve credentials and generate a DEK. The following table lists the attributes to use in the functions.

Type | Attribute | Description
AWS KMS key details | kmsKeyArn | The ARN of the CMK that we created.
AWS KMS key details | awsRegion | The Region in which the CMK is available.
DEK details | key_vault_namespace | The namespace (<database>.<collection>) of the key vault in which the DEK is stored in Amazon DocumentDB. For this post, we use encr.dekKeys.
DEK details | keyAltName | The name of the DEK that the script generates and stores as a document in key_vault_namespace. This value must be unique within key_vault_namespace. For this post, we use demo_encr_email.
Amazon DocumentDB credentials | secret_name | The name of the secret in Secrets Manager where Amazon DocumentDB credentials are stored.
Amazon DocumentDB credentials | tlsCAFile | The certificate authority (CA) bundle used to verify the cluster’s certificate when connecting over TLS. For this post, this is rds-combined-ca-bundle.pem. You can download this file and place it in the same directory as the script. For more information, see Connecting with TLS Enabled.
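
The script in this post keeps these attributes as module-level variables. If you would rather not hard-code them, one option is to read them from environment variables. The sketch below is an illustration only: the CsfleConfig class, the load_config helper, and the CSFLE_* variable names are all made up for this example, and the defaults are the values used in this post.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CsfleConfig:
    kms_key_arn: str
    aws_region: str
    key_vault_namespace: str
    key_alt_name: str
    secret_name: str
    tls_ca_file: str

    def key_vault_parts(self):
        """Split '<database>.<collection>' into its two parts."""
        db_name, _, coll_name = self.key_vault_namespace.partition(".")
        if not db_name or not coll_name:
            raise ValueError("key_vault_namespace must be <database>.<collection>")
        return db_name, coll_name


def load_config(env=None):
    """Build the configuration from a mapping (defaults to os.environ).

    The CSFLE_* names are hypothetical; pick whatever fits your deployment.
    """
    env = os.environ if env is None else env
    return CsfleConfig(
        kms_key_arn=env["CSFLE_KMS_KEY_ARN"],
        aws_region=env["CSFLE_AWS_REGION"],
        key_vault_namespace=env.get("CSFLE_KEY_VAULT_NAMESPACE", "encr.dekKeys"),
        key_alt_name=env.get("CSFLE_KEY_ALT_NAME", "demo_encr_email"),
        secret_name=env["CSFLE_SECRET_NAME"],
        tls_ca_file=env.get("CSFLE_TLS_CA_FILE", "rds-combined-ca-bundle.pem"),
    )
```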

Complete the following steps to create functions to retrieve database credentials from Secrets Manager and generate a DEK:

  1. In your preferred text editor, create the file democsfle.py and enter the following code:
    import boto3
    import json
    import base64
    import pymongo
    from botocore.exceptions import ClientError
    from bson import json_util
    from pymongo.encryption import (Algorithm,
                                    ClientEncryption)
    from pprint import pprint
    
    awsRegion = "<aws region>"
    secret_name = "<Name of the Secret where database credentials are stored>"
    
    # Function to retrieve credentials. 
    
    def get_secret():
    
        session = boto3.session.Session()
        client = session.client(service_name="secretsmanager", region_name=awsRegion)
        try:
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
        except ClientError as e:
            print("Failed to retrieve secret {} because: {}".format(secret_name, e))
            # Re-raise: the script cannot continue without the credentials.
            raise
    
        if "SecretString" in get_secret_value_response:
            secret = get_secret_value_response["SecretString"]
        else:
            # Fall back to the binary form of the secret.
            secret = base64.b64decode(get_secret_value_response["SecretBinary"])
    
        # The secret is a JSON document that holds username, password, and host.
        secret = json.loads(secret)
        # Add the IAM user's access keys, used later as the AWS KMS provider credentials.
        credentials = session.get_credentials()
        secret["access_key"] = credentials.access_key
        secret["secret_key"] = credentials.secret_key
    
        return secret

Note: You may need to configure your AWS CLI for the IAM user.
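
The functions in this post read the username, password, and host keys from the secret. A small validation step can surface a malformed secret early instead of failing later with a less obvious error. The following is a minimal sketch; the helper name parse_db_secret is hypothetical.

```python
import json

# Keys that the functions in this post read from the secret.
REQUIRED_KEYS = ("username", "password", "host")


def parse_db_secret(secret_string):
    """Parse the secret JSON and check that the expected keys are present."""
    secret = json.loads(secret_string)
    missing = [key for key in REQUIRED_KEYS if key not in secret]
    if missing:
        raise KeyError("Secret is missing keys: " + ", ".join(missing))
    return secret


# Usage inside get_secret(), in place of the plain json.loads call:
#   secret = parse_db_secret(get_secret_value_response["SecretString"])
```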

  2. Next, you need a function to generate a DEK. Create a function named generate_keys() and append it to the same file (democsfle.py):
    # Function to create DEK and store it in KeyVault. 
    
    kmsKeyArn = "<ARN of Customer Master Key created in previous step>"
    keyAltName = "demo_encr_email"
    key_vault_namespace = "encr.dekKeys"
        
    def generate_keys():
    
        # Retrieve the secret once and reuse it for every value.
        secret = get_secret()
        docDBAppAdminUsername = secret["username"]
        docDBAppAdminPassword = secret["password"]
        docDBClusterEndpoint = secret["host"]
        awsUserAccessKeyId = secret["access_key"]
        awsUserSecretAccessKey = secret["secret_key"]
    
        kms_providers = {
            "aws": {
                "accessKeyId": awsUserAccessKeyId,
                "secretAccessKey": awsUserSecretAccessKey,
            }
        }
    
        client = pymongo.MongoClient(
            docDBClusterEndpoint,
            username=docDBAppAdminUsername,
            password=docDBAppAdminPassword,
            tls=True,
            tlsCAFile="rds-combined-ca-bundle.pem",
            retryWrites=False,
        )
    
        key_vault_db_name, key_vault_coll_name = key_vault_namespace.split(".", 1)
    
        # Set up the key vault (key_vault_namespace) for this example.
        key_vault = client[key_vault_db_name][key_vault_coll_name]
        # Ensure that two data keys cannot share the same keyAltName.
    
        key_vault.create_index("keyAltNames", unique=True)
    
        client_encryption = ClientEncryption(
            kms_providers, key_vault_namespace, client, client.codec_options
        )
    
        # Create a new data key, encrypted with the CMK.
        m_key = {"region": awsRegion, "key": kmsKeyArn}
    
        data_key_id = client_encryption.create_data_key(
            "aws", master_key=m_key, key_alt_names=[keyAltName]
        )
    
        doc = key_vault.find_one(
            {"keyAltNames": keyAltName},
            {"_id": 1, "keyAltNames": 1, "creationDate": 1, "updateDate": 1},
        )
    
        print(" DEK Name: ", doc["keyAltNames"], "\n", "Created date:", doc["creationDate"])
    
        client_encryption.close()
        client.close()

Perform read and write operations

In this task, you create a function that inserts a document with both encrypted and unencrypted fields into a collection, and then reads the data back using the DEK created by the function generate_keys(). In addition to the attributes used in the previous functions, this function takes a user namespace (<database>.<collection>) in which to store the user data. We use gamesDB.users as the namespace where the user data is stored.

Append the following function to perform read and write operations to democsfle.py:

userNamespace = "gamesDB.users"
    
def read_write():

    # Retrieve the secret once and reuse it for every value.
    secret = get_secret()
    docDBAppAdminUsername = secret["username"]
    docDBAppAdminPassword = secret["password"]
    docDBClusterEndpoint = secret["host"]
    awsUserAccessKeyId = secret["access_key"]
    awsUserSecretAccessKey = secret["secret_key"]

    kms_providers = {
        "aws": {
            "accessKeyId": awsUserAccessKeyId,
            "secretAccessKey": awsUserSecretAccessKey,
        }
    }

    client = pymongo.MongoClient(
        docDBClusterEndpoint,
        username=docDBAppAdminUsername,
        password=docDBAppAdminPassword,
        tls=True,
        tlsCAFile="rds-combined-ca-bundle.pem",
        retryWrites=False,
    )

    key_vault_db_name, key_vault_coll_name = key_vault_namespace.split(".", 1)

    user_db, user_coll = userNamespace.split(".", 1)

    userCollection = client[user_db][user_coll]

    client_encryption = ClientEncryption(
        kms_providers, key_vault_namespace, client, userCollection.codec_options
    )

    # Explicitly encrypt a field:
    encrypted_email_field = client_encryption.encrypt(
        "Jane_Doe@example.com",
        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
        key_alt_name=keyAltName,
    )

    encrypted_dob_field = client_encryption.encrypt(
        "1990-01-01",
        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
        key_alt_name=keyAltName,
    )

    userCollection.insert_one(
        {
            "gamerTag": "Jane_Doe",
            "encrypted_email": encrypted_email_field,
            "encrypted_dob": encrypted_dob_field,
            "favorite_friends": ["Akua Mansa", "Carlos Salazar", "Nikki Wolf"],
        }
    )

    doc = userCollection.find_one({"gamerTag": "Jane_Doe"})

    print("Encrypted Document: ")
    pprint(doc)
    # Explicitly decrypt the field:
    doc["encrypted_email"] = client_encryption.decrypt(doc["encrypted_email"])
    doc["encrypted_dob"] = client_encryption.decrypt(doc["encrypted_dob"])

    print("\n Decrypted document:")
    pprint(doc)
    # Cleanup resources.
    client_encryption.close()
    client.close()

def main():
    user_choice = input("Enter your choice GenerateKeys or PerformReadWrite : ").strip()
    if user_choice == "GenerateKeys":
        generate_keys()
    elif user_choice == "PerformReadWrite":
        read_write()
    else:
        print("Unrecognized choice: {}".format(user_choice))
        
if __name__ == "__main__":
    main()

This completes the creation of the script democsfle.py. The next step is to run the script to see it in action.

  1. Generate the DEK with the following command:
    $ python3 democsfle.py 
    Enter your choice GenerateKeys or PerformReadWrite : GenerateKeys


  2. Perform read and write operations:
    $ python3 democsfle.py 
    Enter your choice GenerateKeys or PerformReadWrite : PerformReadWrite
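
In read_write(), the email field is encrypted with the deterministic algorithm and the date of birth with the random algorithm. Deterministic encryption produces the same ciphertext for the same input and key, which is what makes an equality lookup on the encrypted value possible. The following sketch shows such a lookup, assuming equality matches on deterministically encrypted fields are supported for your cluster (check the list of supported operations). The helper name find_user_by_email is hypothetical, and the algorithm is passed as the string value of PyMongo's Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic constant.

```python
# String value of Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic in PyMongo.
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"


def find_user_by_email(collection, client_encryption, email, key_alt_name):
    """Find a user by email by encrypting the search value first.

    collection and client_encryption are the same kinds of objects that
    read_write() creates (a PyMongo collection and a ClientEncryption).
    Returns the document with the email decrypted, or None if no match.
    """
    encrypted_email = client_encryption.encrypt(
        email, DETERMINISTIC, key_alt_name=key_alt_name
    )
    doc = collection.find_one({"encrypted_email": encrypted_email})
    if doc is None:
        return None
    doc["encrypted_email"] = client_encryption.decrypt(doc["encrypted_email"])
    return doc


# Usage inside read_write(), after the insert:
#   find_user_by_email(userCollection, client_encryption,
#                      "Jane_Doe@example.com", keyAltName)
```

The same approach would not work for encrypted_dob, because the random algorithm yields a different ciphertext on every call.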


Clean up

If you created a new Amazon DocumentDB cluster, you can stop the cluster or delete the cluster. If you created a new IAM user, you can deactivate or delete the user if you’re not using that user elsewhere.

Summary

In this post, we introduced the new features in Amazon DocumentDB 5.0 and walked through an example of using client-side field level encryption in your application. For more information about recent launches and blog posts, see Amazon DocumentDB (with MongoDB compatibility) resources.


About the authors

Kaarthiik Thota is a Senior DocumentDB Specialist Solutions Architect at AWS based out of London. He is passionate about database technologies and enjoys helping customers solve problems and modernize applications leveraging NoSQL databases. Before joining AWS, he worked extensively with relational databases, NoSQL databases, and Business Intelligence technologies for more than 14 years.