How to Tokenize and De-Identify Your Data in Amazon RDS with Baffle
By Harold Byun, VP Products at Baffle
By Jani Syed, Principal Solutions Architect at AWS
Amazon Web Services (AWS) provides solutions to enhance the security posture of its infrastructure and services, which in turn helps customers fulfill their portion of the AWS Shared Responsibility Model.
Under that model, customers are responsible for the security and privacy of the data they store on AWS.
Some customers also require additional data-centric protection to meet more stringent privacy regulations. These requirements may stem from:
- On-premises data residency requirements, where data cannot arrive in AWS in plaintext.
- Key ownership and control requirements, where keys cannot be owned, stored, or accessed by AWS. This includes Bring Your Own Key (BYOK) and Hold Your Own Key (HYOK) scenarios.
- Data privacy regulations whose compliance mandates require that data values be tokenized, encrypted, or otherwise de-identified.
It’s worth noting that these data-centric requirements cannot be met with conventional encryption at rest or Transparent Data Encryption (TDE). Those methods decrypt data transparently for any authenticated database session, so they don’t obscure or protect the data values themselves; they mitigate physical disk theft rather than data leaks and attacks over the wire.
Other data-centric tokenization or encryption solutions require application code changes and key management integration that can be challenging to implement at scale and are often applied inconsistently, leaving gaps in the security model.
Baffle Data Protection Services (DPS) on AWS provides a data-centric protection layer that allows customers to tokenize, encrypt, and mask data in Amazon Relational Database Service (Amazon RDS) at the column or row level, without application code changes, while supporting a BYOK or HYOK model.
In this post, we will review the architecture for Baffle DPS and how it performs its tokenization and masking functions. Then, we’ll walk you through how to launch and test Baffle DPS from an AWS CloudFormation template with Amazon RDS databases to encrypt data at the column level.
Overview of Solution Components
Baffle DPS consists of three major components:
- Baffle Manager – Administrative console for configuration and management of data protection policies.
- Baffle Shield – TCP reverse proxy that performs encryption, decryption, tokenization, masking, and access control functions.
- Key Virtualization Layer – Interface layer for generating and retrieving encryption keys from key management stores and secrets vaults via industry standard protocols.
Baffle DPS can also run in a headless mode for tighter integration with CI/CD pipelines or DevOps-oriented deployments.
This solution integrates with multiple AWS services including AWS CloudHSM, AWS Key Management Service (AWS KMS), AWS Database Migration Service (AWS DMS), Amazon Simple Storage Service (Amazon S3), Amazon Elastic Container Service (Amazon ECS), AWS Fargate, and others.
How it Works
Normally, applications establish a connection to a database via a database client and driver. With Baffle Shield, a transparent data protection layer sits in front of an Amazon RDS database instance, and the application tier (clients, microservices, API calls) connects to that layer instead of connecting directly to the RDS instance.
This transparent layer presents the original data schema to the application tier so any tokenization or masking operation is, effectively, invisible to the client.
The Baffle Shield can be deployed via Amazon Elastic Compute Cloud (Amazon EC2) instances, Docker images, Kubernetes pods, and AWS Fargate.
For high availability, the solution can run behind Elastic Load Balancing with Auto Scaling groups. Alternatively, Kubernetes pods or AWS Fargate can run multiple Shield instances with scaling and redundancy policies.
Once instances are running, traffic is routed through the Baffle Shield by updating the application's connection string to point to the Baffle Shield IP address or hostname. Applications continue to authenticate against the database as before, and data in unprotected columns passes through in cleartext at wire speed.
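As a minimal sketch of that connection-string change, the Python snippet below builds connection settings for either the RDS endpoint or the Baffle Shield. The `Endpoint` type, the helper functions, the URL format, and the hostnames are hypothetical; 8444 is the Shield's default listener port from the walkthrough later in this post, and the database ports are the engines' standard defaults. Only the host and port change, while the database name and user stay the same, which is why the application code itself needs no modification.

```python
from dataclasses import dataclass, replace

# Standard default listener ports for the engines the demo template supports.
DEFAULT_DB_PORTS = {"mysql": 3306, "postgresql": 5432, "sqlserver": 1433}

# Default port the Baffle Shield listens on for application connections.
SHIELD_DEFAULT_PORT = 8444


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical connection settings an application hands to its driver."""
    engine: str
    host: str
    port: int
    database: str
    user: str


def rds_endpoint(engine: str, host: str, database: str, user: str) -> Endpoint:
    """Direct connection to the RDS instance on the engine's default port."""
    return Endpoint(engine, host, DEFAULT_DB_PORTS[engine], database, user)


def route_through_shield(direct: Endpoint, shield_host: str,
                         shield_port: int = SHIELD_DEFAULT_PORT) -> Endpoint:
    """Same logical connection, pointed at the Shield: only host and port change."""
    return replace(direct, host=shield_host, port=shield_port)


def to_url(ep: Endpoint) -> str:
    """Render an illustrative connection URL (no password embedded)."""
    return f"{ep.engine}://{ep.user}@{ep.host}:{ep.port}/{ep.database}"


direct = rds_endpoint("mysql", "demo-db.example.internal", "demo", "app_user")
shielded = route_through_shield(direct, "10.0.1.25")
print(to_url(direct))    # mysql://app_user@demo-db.example.internal:3306/demo
print(to_url(shielded))  # mysql://app_user@10.0.1.25:8444/demo
```

In a real deployment the Shield address would typically be a load balancer DNS name rather than a single instance IP, matching the high-availability options described above.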
To perform tokenization or encryption of data, you use Baffle Manager to define a data protection policy. The policy creates a privacy schema that maps RDS database columns to a given encryption key and protection mode.
This mapping gets propagated to the Baffle Shield, which performs data encryption and decryption for a given column based on the privacy schema. This provides the Baffle Shield with knowledge of encrypted or tokenized columns while always presenting the original data schema to the application tier.
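To make the privacy schema concrete, here is a minimal Python sketch of a column-to-key mapping and of how a proxy could use it to restore cleartext in result rows before returning them to the client. The `ColumnPolicy` structure, the mode labels, the key IDs, and the pluggable `decrypt` callable are hypothetical and do not reflect Baffle's actual policy format; the usage example substitutes a toy token vault for real decryption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Protection modes listed in this post; these string labels are illustrative only.
MODES = {"aes-256", "fpe", "dynamic-mask", "role-mask"}


@dataclass(frozen=True)
class ColumnPolicy:
    key_id: str  # key identifier resolved through the key virtualization layer
    mode: str    # protection mode, one of MODES

    def __post_init__(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"unknown protection mode: {self.mode}")


# Privacy schema: (database, table, column) -> policy. Unlisted columns stay cleartext.
PrivacySchema = Dict[Tuple[str, str, str], ColumnPolicy]
Row = Dict[str, Optional[str]]
Decryptor = Callable[[ColumnPolicy, str], str]


def restore_row(schema: PrivacySchema, database: str, table: str,
                row: Row, decrypt: Decryptor) -> Row:
    """Decrypt protected columns and pass all others through unchanged.

    Column names and order are kept, so the client sees the original schema.
    """
    restored: Row = {}
    for column, value in row.items():
        policy = schema.get((database, table, column))
        if policy is None or value is None:
            restored[column] = value  # unprotected column or SQL NULL
        else:
            restored[column] = decrypt(policy, value)
    return restored


# Usage with a toy token vault standing in for real decryption.
schema: PrivacySchema = {("demo", "customers", "ssn"): ColumnPolicy("key-01", "aes-256")}
vault = {"tok_7f3a": "123-45-6789"}
row = {"id": "1", "name": "Ana", "ssn": "tok_7f3a"}
print(restore_row(schema, "demo", "customers", row, lambda policy, v: vault[v]))
# {'id': '1', 'name': 'Ana', 'ssn': '123-45-6789'}
```

Keeping the decryption step pluggable mirrors the split described above: the schema says which key and mode apply to a column, while the key virtualization layer supplies the actual keys.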
Baffle supports the following data protection modes for RDS databases:
- NIST standard AES-256 encryption for field or row-level protection
- Format Preserving Encryption (FPE) tokenization
- Dynamic Data Masking
- Role-Based Data Masking
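The Python sketch below illustrates two of these modes. `toy_fpe_encrypt` and `toy_fpe_decrypt` use a small keyed Feistel network over decimal digits, so a 16-digit card number encrypts to another 16-digit string and decrypts back. It borrows the alternating-split structure and 10-round count of NIST FF1 but is a simplified teaching toy, not FF1 or FF3-1, not a description of Baffle's FPE implementation, and not safe for real data. `mask_for_role` shows role-based masking with a hypothetical set of privileged roles; all function and role names are assumptions.

```python
import hashlib
import hmac
from typing import Tuple

ROUNDS = 10  # FF1 also uses 10 rounds; everything else here is simplified


def _round_value(key: bytes, rnd: int, half: str) -> int:
    """Keyed round function: HMAC-SHA256 over the round index and one half."""
    digest = hmac.new(key, bytes([rnd]) + half.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def _split(digits: str) -> Tuple[int, int]:
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("expected a string of at least two decimal digits")
    u = len(digits) // 2
    return u, len(digits) - u


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    """Map a digit string to another digit string of the same length (toy, not FF1)."""
    u, v = _split(digits)
    a, b = digits[:u], digits[u:]
    for rnd in range(ROUNDS):
        m = u if rnd % 2 == 0 else v
        c = (int(a) + _round_value(key, rnd, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    """Invert toy_fpe_encrypt by running the rounds backwards."""
    u, v = _split(digits)
    a, b = digits[:u], digits[u:]
    for rnd in reversed(range(ROUNDS)):
        m = u if rnd % 2 == 0 else v
        c = (int(b) - _round_value(key, rnd, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b


PRIVILEGED_ROLES = {"fraud_analyst", "dba_breakglass"}  # hypothetical roles


def mask_for_role(value: str, role: str, visible: int = 4) -> str:
    """Cleartext for privileged roles; otherwise mask all but the last `visible` chars.

    Values no longer than `visible` are masked completely.
    """
    if role in PRIVILEGED_ROLES:
        return value
    if len(value) <= visible:
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


key = b"demo-key-not-for-production"
token = toy_fpe_encrypt(key, "4111111111111111")
print(len(token), token.isdigit())             # 16 True
print(toy_fpe_decrypt(key, token))             # 4111111111111111
print(mask_for_role("4111111111111111", "support_agent"))  # ************1111
```

Because the ciphertext keeps the column's length and character set, it fits the existing column type, which is the practical appeal of FPE for tokenizing data in place.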
Key management integration is handled by the key virtualization layer, which supports industry-standard protocols such as Key Management Interoperability Protocol (KMIP) 1.1 and higher, PKCS #11, and REST to integrate with enterprise key management solutions, hardware security modules (HSMs), and cloud key vaults.
This integration layer allows customers to encrypt data in Amazon RDS while enabling a BYOK or HYOK model.
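One way to picture a key virtualization layer is a single interface in front of interchangeable key backends. The Python sketch below defines a hypothetical `KeyProvider` protocol and an in-memory `LocalKeyProvider`, loosely analogous to the LOCAL keystore type used in the walkthrough; a backend speaking KMIP, PKCS #11, or a REST API would implement the same two methods. The class and method names are assumptions rather than Baffle's API, and a real local keystore would persist keys encrypted instead of holding them in process memory.

```python
import secrets
from typing import Dict, Protocol


class KeyProvider(Protocol):
    """Hypothetical interface the proxy calls; backends differ only behind it."""

    def create_key(self, key_id: str) -> None: ...

    def get_key(self, key_id: str) -> bytes: ...


class LocalKeyProvider:
    """In-memory stand-in for a LOCAL keystore holding 256-bit data keys."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def create_key(self, key_id: str) -> None:
        if key_id in self._keys:
            raise KeyError(f"key already exists: {key_id}")
        self._keys[key_id] = secrets.token_bytes(32)  # 32 bytes = AES-256 key size

    def get_key(self, key_id: str) -> bytes:
        return self._keys[key_id]  # raises KeyError for unknown key IDs


def key_for_column(provider: KeyProvider, key_id: str) -> bytes:
    """Callers depend only on the protocol, so the backend can be swapped."""
    return provider.get_key(key_id)


provider = LocalKeyProvider()
provider.create_key("key-01")
print(len(key_for_column(provider, "key-01")))  # 32
```

Under this design, moving from a local keystore to a customer-held HSM (BYOK or HYOK) changes which provider is constructed, not the code that encrypts columns.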
Deploying Baffle DPS with Amazon RDS
To deploy a test setup of DPS, use the Baffle CloudFormation Demo Template.
This template will provision the following resources:
- One (1) Baffle Manager on Amazon EC2.
- One (1) Baffle Shield on EC2.
- One (1) Amazon RDS instance running MySQL, PostgreSQL, or Microsoft SQL Server with a snapshot of demo data.
- One (1) virtual private cloud (VPC) with a gateway and route table.
- One (1) security group with no external access. You will need to modify the security group to allow external access to the admin console.
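Because the template's security group starts with no external access, you need an ingress rule for your own IP address before you can reach the admin console. The Python sketch below builds one entry in the `IpPermissions` shape accepted by the Amazon EC2 `AuthorizeSecurityGroupIngress` API. The helper name and its defaults are hypothetical, and port 443 assumes the console is served on the standard HTTPS port.

```python
import ipaddress
from typing import Any, Dict


def admin_console_ingress(client_ip: str, port: int = 443,
                          description: str = "Baffle Manager admin console") -> Dict[str, Any]:
    """Build one EC2 IpPermissions entry allowing TCP on `port` from a single IPv4 address."""
    addr = ipaddress.ip_address(client_ip)  # raises ValueError for malformed input
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 (IPv6 uses Ipv6Ranges)")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{addr}/32", "Description": description}],
    }


rule = admin_console_ingress("203.0.113.7")
print(rule["IpRanges"][0]["CidrIp"])  # 203.0.113.7/32
# With boto3 (not imported here), the rule could be applied roughly as:
#   ec2.authorize_security_group_ingress(GroupId="sg-0123456789abcdef0", IpPermissions=[rule])
# where the security group ID is a placeholder for the one the template created.
```

The unlock step also uses SSH, so a second rule built with `port=22` from the same address would be needed for that access.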
After launching the CloudFormation template, follow the steps below to configure the Baffle Manager console and perform data encryption for your database. The default credentials for the databases are included in the CloudFormation template as a comment.
Upon completing these steps, you’ll be able to access the RDS instance directly to observe data encrypted at the field level. Accessing the instance through the Baffle Shield will seamlessly decrypt the data for the client application.
You may also download the Getting Started with Baffle Data Protection Services Guide.
Steps to Configure DPS
- Unlock Baffle Manager. Navigate to https://baffle_manager_ip_address. You will need to modify the template's security group to allow access from your IP address. You’ll receive certificate warnings on first access; click Proceed to reach the unlock screen.
- To unlock the Baffle Manager, access the system via SSH. Use “baffle” as the username and the key file specified in the CloudFormation template. Enter the following command to retrieve the unlock code:
sudo more /opt/baffle/baffle-manager/initpass
- Paste the retrieved code into the web browser password prompt and click Continue.
- Configure system settings. You will be prompted for hostname and domain settings. All system users must log in with an email address in this domain, which serves as their username.
- Configure email settings. This allows Baffle Manager to send notifications and password reset emails. Enter the SMTP server to use, as well as the credentials used to authenticate to it. For test purposes, you may select Skip.
- Create admin account. You are then prompted to create the initial Baffle Manager administrator account. This account is used to configure the subsequent components, such as the key management store, data store connections, and Baffle Shields.
- Configure credential keystore. This screen establishes an encrypted credential store for the system access credentials and access keys that Baffle Manager and Baffle Shield use.
Select LOCAL for Keystore Type. For Secret Key, enter any random string; it is used to generate a key that encrypts the Keystore Config Password. For Config Password, enter a secure password or passphrase to protect the keystore itself.
- Install SSL certificate. This step installs an SSL certificate to secure access to the Baffle Manager web interface. Upload the certificate and key file issued for your organization by your certificate authority (CA) to enable SSL for the Baffle Manager console.
- This completes the initial setup process and brings you to the login page. Enter the credentials you created for the admin account.
- Configure a keystore. After logging into Baffle Manager, click the key icon on the left navigation panel. If this is the first time you are enrolling a keystore, only the “baffle_credential_store” created during initial setup will be present.
- Click on the +KEYSTORE button in the top right corner to add a new Keystore.
Enter the Keystore Name and Description. For this test, select LOCAL as the Keystore Type and enter a Secret Key to generate a random key for the Keystore. Click Add Keystore when complete.
- Connect an Amazon RDS database. Click on the database icon on the left navigation panel to display a list of configured data stores.
- Click on the +DATABASE button to add a data store. Enter a database name and description.
Specify the database type. Then, enter the hostname or IP address and port of the database. For default ports (for example, 3306 for MySQL, 5432 for PostgreSQL, and 1433 for Microsoft SQL Server), refer to the documentation for your database platform.
Enter the database user credentials. MySQL account privileges are specified in an appendix in the Getting Started with Baffle Data Protection Services Guide.
Select Use SSL to enable an SSL/TLS connection to the database. You will need the Amazon RDS CA certificate.
Click Add Database when completed.
- Connect a Baffle Shield to Baffle Manager. Click on the shield icon on the left navigation panel to display a list of connected Baffle Shields. Click on the +BAFFLE SHIELD button in the upper right hand corner.
Select Automated Deployment for Deployment Model.
Enter the username “centos” to access the Baffle Shield EC2 instance.
Enter the private IP address of the Baffle Shield from the CloudFormation template resources.
Enter a port number the Baffle Shield will use to listen for application connections. The default port is 8444.
Select Use SSL if the data store connection uses SSL.
Select Use SSH Key and upload the private key for the key pair you selected when you set up the Shield instance.
Click Add Baffle Shield to complete the process. The new Shield will be added to the list of configured Baffle Shields.
- Add an application to create a data protection policy. Click on the Applications icon in the left navigation panel. The defined data protection policies are displayed as Applications. Click on +APPLICATION.
- Enroll application. Enter a name and description.
Choose the Baffle Shield from the drop-down that was configured in the previous section.
Select the data store which you will encrypt.
Select the keystore (LOCAL) to be used as a source for data encryption keys.
Specify the operational mode for the Baffle Shield. Leave Workload Capture off unless you are profiling an application.
Specify column level for the encryption method. Click Enroll Application.
- Define the data protection policy. Click on the application configured in the steps above. A sidebar will display information about the application. Click on the ENCRYPT button to define the policy.
- Select fields for encryption. A data schema navigator will open for the configured data store. Select the database, table, and columns you would like to encrypt. Click Next to proceed.
- Encrypt data. Select Parallel Processing and then Encrypt. This will begin encrypting the data in the database.
- You will see a migration in progress on the Applications listing page.
Upon completion of the migration, connect to the Amazon RDS database directly from a database client utility (DBeaver is a free, open source option) and run a SELECT query against the encrypted columns. You will see the values in encrypted form rather than cleartext.
Finally, connect to the Baffle Shield on port 8444 using the same database client utility. Running the same SELECT returns the data in decrypted form.
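This before-and-after check can also be scripted. The Python sketch below runs the same query over two DB-API 2.0 connections, one opened directly against RDS and one through the Shield on port 8444, and counts per column how many rows differ. The `compare_views` helper is hypothetical, and in practice you would open both connections with your engine's driver; the example substitutes two in-memory SQLite databases purely so the sketch runs without AWS resources.

```python
import sqlite3
from typing import Any, Dict, List, Sequence


def fetch_rows(conn: Any, query: str) -> List[Sequence[Any]]:
    """Run a query on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


def compare_views(direct_conn: Any, shield_conn: Any, query: str,
                  columns: Sequence[str]) -> Dict[str, int]:
    """Count, per column, how many rows differ between the direct and Shield views.

    Protected columns should differ (ciphertext vs. cleartext); others should not.
    The query should use ORDER BY so rows line up between the two connections.
    """
    direct = fetch_rows(direct_conn, query)
    shielded = fetch_rows(shield_conn, query)
    if len(direct) != len(shielded):
        raise ValueError("row counts differ; the connections see different data")
    diffs = {name: 0 for name in columns}
    for d_row, s_row in zip(direct, shielded):
        for i, name in enumerate(columns):
            if d_row[i] != s_row[i]:
                diffs[name] += 1
    return diffs


if __name__ == "__main__":
    # Stand-ins: in practice, one connection targets RDS and the other the Shield.
    direct, shield = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for conn in (direct, shield):
        conn.execute("CREATE TABLE customers (id INTEGER, ssn TEXT)")
    direct.execute("INSERT INTO customers VALUES (1, 'Zx9Qk2Lm0Pw')")
    shield.execute("INSERT INTO customers VALUES (1, '123-45-6789')")
    query = "SELECT id, ssn FROM customers ORDER BY id"
    print(compare_views(direct, shield, query, ["id", "ssn"]))  # {'id': 0, 'ssn': 1}
```

A result where only the columns selected for encryption show differences is a quick signal that the privacy schema was applied as intended.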
In this post, we have explained how you can establish a data protection service layer for data stored in Amazon RDS databases.
Without any code changes to the application tier, data can be easily encrypted in a data-centric manner to ensure compliance with data privacy regulations and help reduce the risk of inadvertently exposing sensitive data.
Baffle – AWS Partner Spotlight
Baffle is an AWS Select Technology Partner that provides advanced data protection for “lift and shift” application migrations to AWS.