AWS Partner Network (APN) Blog

How to Mask Sensitive Data on AWS Using DataMasque

By Preet Sawhney, Security Risk and Compliance Consultant – AWS Professional Services
By Pankaj Miglani, Sr. Partner Solutions Architect, Security – AWS

Amazon Web Services (AWS) customers handle a wide range of data types, including their own customers’ personal information, which must always be protected.

Organizations are entrusted by their customers to protect their personal data, and far-reaching privacy regulations like HIPAA, PCI-DSS, IRAP, SOX, GDPR, and others mandate the need to control and limit access to sensitive data.

Yet, to better serve customers, developers, testers, analysts, and third parties demand access to high-quality, production-like databases for their non-production environments.

As data is copied from production to non-production databases for testing and other in-house initiatives, sensitive data proliferates through the organization, expanding the security and compliance footprint and ultimately increasing the likelihood of a data breach.

With cyber-attacks getting more sophisticated by the day, the risk of fraud and data breach is at an all-time high. It’s imperative that AWS customers manage their data safely to meet regulatory compliance, maintain and improve security posture, and deliver on business efficiency and brand reputation objectives.

As security experts working at AWS, we have identified DataMasque as an effective and proven data masking solution to meet the requirements described above.

DataMasque is an AWS Partner that removes sensitive data from the databases and replaces it with realistic and functional masked values that enable effective development, testing, and analytics.

This empowers AWS customers to deploy securely in testing or non-production environments, mitigating the risk of a data breach and allowing relevant teams to focus on developing and testing their applications without hindrance.

Business Value Proposition

The business goals of a successful data masking solution are to protect customers and end users while empowering the data: removing its sensitivity so that it can be used to its full potential.

Protect Customers

  • Eliminate the possibility of exposing personally identifiable information to unintended parties.
  • Protect data integrity.
  • Assure customers their personal data is not used for unconsented purposes.

Protect End Users

  • Give users confidence their data is protected.
  • Provide the tools, guidance, and support that align with the responsibility.
  • Provide safeguards and fail safes to protect users.

Empower the Data

  • By removing sensitivity, the data can be used to its full potential.
  • Give a wider set of internal teams access to data for meaningful use cases without the risk of a data breach.
  • Make data shareable with third parties and external analytical teams to provide value-added services.

Use Cases

Many organizations, such as banks and financial organizations, analytics and research companies, service providers, large retail chains, and medical and healthcare providers, typically find databases to be the epicenter of the risks that can lead to data breaches.

Such organizations need to mask sensitive data for the aforementioned reasons.

PII Discovery

The first step in addressing the risk of personally identifiable information (PII) exposure is discovering PII across databases. DataMasque provides PII discovery functionality that:

  • Discovers sensitive data by performing a metadata keyword search with custom keyword support.
  • Provides the capability to exclude specific column names during schema/sensitive data discovery based on certain keywords.
  • Provides proactive and ongoing protection by identifying new sensitive information.
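To make the idea of a metadata keyword search concrete, here is a minimal Python sketch. It is not DataMasque's implementation: the default keyword list, the `discover_sensitive_columns` function, and the sample schema are all assumptions made for illustration. It scans column names only (metadata, never row data), accepts custom keywords, and skips columns that match an exclusion keyword.

```python
import re

# Hypothetical default keywords; a real deployment would tune this list.
DEFAULT_KEYWORDS = {"name", "email", "phone", "ssn", "dob", "address"}


def discover_sensitive_columns(schema, custom_keywords=(), exclude_keywords=()):
    """Return (table, column) pairs whose column name contains a sensitive keyword.

    schema: dict mapping table name -> list of column names (metadata only).
    """
    keywords = DEFAULT_KEYWORDS | {k.lower() for k in custom_keywords}
    excluded = {k.lower() for k in exclude_keywords}
    matches = []
    for table, columns in schema.items():
        for column in columns:
            # Split names such as "first_name" or "billing-address" into tokens.
            tokens = set(re.split(r"[^a-z0-9]+", column.lower()))
            if tokens & excluded:
                continue  # column skipped by an exclusion keyword
            if tokens & keywords:
                matches.append((table, column))
    return matches


# Illustrative schema; table and column names are made up.
schema = {
    "customers": ["id", "first_name", "email", "loyalty_tier", "file_name"],
    "orders": ["id", "customer_id", "shipping_address", "passport_no"],
}
found = discover_sensitive_columns(
    schema, custom_keywords=["passport"], exclude_keywords=["file"]
)
print(found)
```

In this sketch `file_name` is skipped because of the `file` exclusion, while `passport_no` is flagged only because `passport` was supplied as a custom keyword. Rerunning the same scan whenever the schema changes is one simple way to approximate the ongoing discovery of new sensitive columns.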

Masking in Databases

This post focuses on the following scenarios:

  • Masking within AWS.
  • On-premises-to-AWS migration.
  • Migration from a third-party public cloud to AWS.

Solution Overview

Irrespective of the applicable use case, the DataMasque solution performs the following:

  • Mask personal information in a copy of the production database within the production/secure zone.
  • Use the masked data to deploy a database in a non-production environment.
  • Maintain referential integrity and data consistency after data masking.

The following figure illustrates what DataMasque does:


Figure 1 – Logical illustration.

On Premises and Third-Party Cloud to AWS Migration

For customers migrating their applications to AWS, migration of non-production databases is a critical task, more so if the databases have unmasked PII in them.

PII exposure is a major risk in this scenario, and the most efficient way to mitigate it is to mask the database before you migrate it.

Since there could be multiple tools and processes deployed in such a migration, here’s a technology-agnostic view of the sequence of events:

  • Clone the production database with unmasked PII within the same environment.
  • Run the masking process using the deployed DataMasque instance in the production environment.
  • Copy or clone the masked database so it is ready for use in the non-production environment.
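The sequence above can be sketched end to end with Python's built-in `sqlite3` module standing in for the real database engine. This is only an illustration of the ordering (clone, mask the clone, hand off the masked copy); the table, the sample row, and the placeholder masking rule are assumptions, and a real migration would use the engine's own clone tooling and a DataMasque masking run instead.

```python
import sqlite3


def clone_database(source):
    """Step 1: clone the production database inside the same environment."""
    clone = sqlite3.connect(":memory:")
    source.backup(clone)  # copies every table and row into the clone
    return clone


def mask_customers(conn):
    """Step 2: overwrite PII columns in the clone (placeholder masking rule)."""
    conn.execute(
        "UPDATE customers SET name = 'Customer ' || id, "
        "email = 'user' || id || '@example.com'"
    )
    conn.commit()


# Stand-in "production" database holding unmasked PII (made-up data).
production = sqlite3.connect(":memory:")
production.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
production.execute(
    "INSERT INTO customers VALUES (1, 'Ada Lovelace', 'ada@corp.test')"
)
production.commit()

staging = clone_database(production)
mask_customers(staging)

# Step 3: only the masked clone is handed to the non-production environment.
masked_rows = staging.execute("SELECT id, name, email FROM customers").fetchall()
original_rows = production.execute("SELECT id, name, email FROM customers").fetchall()
print(masked_rows)
```

The point of the ordering is that masking always runs against the clone, so production data is never modified and unmasked PII never leaves the secure zone.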


Figure 2 – Workflow diagram.

Database Masking Within AWS

Customers who want to restructure their AWS environment or further modernize their apps need to deploy databases in new AWS accounts or virtual private clouds (VPCs). If the existing databases contain unmasked PII, that PII is at risk of exposure during the move.

DataMasque’s AWS-centered solution is built around a single AWS CloudFormation template that:

  • Takes unmasked encrypted Amazon Relational Database Service (Amazon RDS) production snapshots.
  • Calls the DataMasque API to mask the databases.
  • Outputs encrypted and masked RDS snapshots safe to be used in non-production.

To accelerate data delivery for non-production use cases, the DataMasque solution has been split into two components: AWS Step Functions and AWS Service Catalog.

AWS Step Functions

The AWS Step Functions component contains the CloudFormation templates (Amazon RDS and Amazon Aurora) to create masked snapshots, and it covers the automation steps (blue box) in the diagram below.


Figure 3 – Deployment architecture.
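As a rough illustration of how such automation steps can be chained, the sketch below builds a small state machine definition in Amazon States Language as a Python dictionary. It is not the template shipped with DataMasque: the state names, the Lambda function names, the account ID, and the Region are all placeholders assumed for this example, and each task stands for a step described above (restore the production snapshot, run the masking job, create the masked snapshot).

```python
import json

# Hypothetical Lambda ARNs; the account ID, Region, and function names are placeholders.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

definition = {
    "Comment": "Illustrative masking pipeline (not the DataMasque template)",
    "StartAt": "RestoreProductionSnapshot",
    "States": {
        "RestoreProductionSnapshot": {
            "Type": "Task",
            "Resource": ARN_PREFIX + "restore-snapshot",
            "Next": "RunMaskingJob",
        },
        "RunMaskingJob": {
            "Type": "Task",
            "Resource": ARN_PREFIX + "run-masking",
            # Retry the masking call a couple of times before failing the run.
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
            "Next": "CreateMaskedSnapshot",
        },
        "CreateMaskedSnapshot": {
            "Type": "Task",
            "Resource": ARN_PREFIX + "create-masked-snapshot",
            "End": True,
        },
    },
}


def execution_order(defn):
    """Walk the Next pointers from StartAt until a state marked End."""
    order, name = [], defn["StartAt"]
    while True:
        order.append(name)
        state = defn["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


print(json.dumps(definition, indent=2))
print(execution_order(definition))
```

The JSON printed by this sketch has the shape a Step Functions state machine definition takes; the `execution_order` helper is only a local sanity check that the states chain from start to end.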

AWS Service Catalog

The AWS Service Catalog component contains the CloudFormation templates to allow users to provision non-prod RDS and Aurora instances from available masked snapshots, and it covers the self-service steps (purple box) in the diagram above.

The salient features of this solution are:

  • Fully automated: API-first architecture and seamless integration with the existing CI/CD tooling in use.
  • Referential integrity: Supports masking of primary keys and unique keys, and automatically maintains referential integrity of foreign keys.
  • Data consistency: Provides consistency across all occurrences of an information “type” masked using the same algorithm across tables, databases, and database engines.
  • Ongoing protection: Proactively scans for new databases and tables that may contain sensitive data.
  • Irreversible masking: Uses cryptographically secure SHA-512 salted hash.
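The link between a salted hash, data consistency, and referential integrity can be shown with a short Python sketch. How DataMasque applies SHA-512 internally is not described here, so the `mask_value` function, the salt, the token length, and the sample tables are assumptions. The idea illustrated is that hashing the same value with the same salt always yields the same token, so a masked foreign key still matches its masked primary key.

```python
import hashlib


def mask_value(value, salt, length=12):
    """Derive a repeatable, one-way token from a value and a secret salt."""
    digest = hashlib.sha512(salt + value.encode("utf-8")).hexdigest()
    # Truncation keeps tokens short for display; it raises collision risk.
    return digest[:length]


salt = b"example-salt-keep-secret"  # placeholder; generate and store securely

# Made-up rows: orders.customer_id references customers.customer_id.
customers = [{"customer_id": "C-1001", "email": "ada@corp.test"}]
orders = [{"order_id": 1, "customer_id": "C-1001"}]

masked_customers = [
    {
        "customer_id": mask_value(c["customer_id"], salt),
        "email": mask_value(c["email"], salt) + "@example.com",
    }
    for c in customers
]
masked_orders = [
    {"order_id": o["order_id"], "customer_id": mask_value(o["customer_id"], salt)}
    for o in orders
]

# The masked foreign key still matches the masked primary key.
print(masked_orders[0]["customer_id"] == masked_customers[0]["customer_id"])
```

Because the salt is secret and the hash is one-way, the original value cannot be read back from the token, while reusing the same salt across tables and databases keeps every occurrence of a value consistent.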


Data is growing at staggering rates, challenging organizations to adopt new ways to protect sensitive data while making it available for business advantage.

Data masking is an integral part of an organization’s data security strategy, and the need for it is only going to get bigger with time.

DataMasque uses powerful, efficient, and best-of-breed techniques to protect personal data, solving some of the biggest organizational and security challenges:

  • Protect customers and grow your business.
  • Enhance data breach protection.
  • Streamline data privacy compliance.
  • Maintain data sovereignty in the cloud.

DataMasque is available in AWS Marketplace. You can also learn more on the DataMasque website.

