Maintaining Control of PII Hosted on AWS with Hold Your Own Key (HYOK) Security

By Les McMonagle, Chief Security Strategist at SecuPi
By Jani Syed, Principal Solutions Architect at AWS

One of the biggest challenges in moving to the cloud for any organization that collects and processes personally identifiable information (PII) is the fundamental change to the trust model.

In traditional on-premises data repositories and applications, the organizations that collect and protect PII are in complete control of all aspects of data security and privacy.

When they migrate their workloads to Amazon Web Services (AWS), they only maintain control of user identities, applications used to access data, and the data itself. Everything else is handled by AWS.

AWS customers are confident the physical and network security controls of AWS facilities meet or exceed their own data center requirements. However, they find it difficult to get used to the new trust model a cloud migration often demands.

SecuPi, an AWS Select Technology Partner, minimizes changes to the trust model and reduces the risk associated with digital transformations.

The fine-grained, purpose-specific or attribute-based access control, data protection, user behavior analytics (UBA), and privacy compliance capabilities of the SecuPi Platform enable cloud migrations by allowing organizations to remain in full control over their data hosted in the cloud.

This post will demonstrate how organizations collecting and processing sensitive or regulated PII can reduce barriers to cloud adoption and satisfy the trust model requirements of even the most conservative and risk-averse companies.

Hold Your Own Key (HYOK)

When migrating to AWS, customers must manage many aspects of data security contractually, but the privacy of their customer data remains their responsibility per the AWS Shared Responsibility Model.

Figure 1 – Clear delineation of responsibility for AWS Cloud hosting.

AWS provides a wide range of data security features and functionality, from secure network connectivity to at-rest encryption of data. When customers first insisted they must control the encryption keys used to protect their data, AWS and various security vendors offered bring your own key (BYOK) or manage your own key options to provide basic masking or access controls.

Unfortunately, those approaches are not satisfactory trust models for many organizations because the keys are still stored and used in the cloud. Solutions based on user-defined functions (UDFs) are also slower, because they run inside the data layer, and less secure, because decryption happens in the cloud.

On the other hand, a hold your own key (HYOK) approach, where the keys remain on-premises, eliminates this barrier to AWS adoption.

Solution Overview

As shown in Figure 2 below, sensitive columns are encrypted before they are loaded to AWS; for example, on Kafka Producers, ETL Servers, or Amazon Simple Storage Service (Amazon S3) files.

They are then re-identified by the relevant policy enforcement point on the consuming applications on-premises for authorized users only, on a “need-to-know” basis without code changes.

Figure 2 – SecuPi Platform architecture for AWS.

HYOK enables companies to anonymize their data on-premises prior to transferring to AWS by encrypting just the data elements that associate the data records to specific individuals (data subjects).

Records are only re-identified at run-time, on-premises, for authorized users or applications. The identical anonymization policy (rule) is applied, regardless of the underlying data store or method for loading or consuming the data:

AWS Glue, Amazon Relational Database Service (Amazon RDS) and Amazon Redshift.
Apache Kafka.
Amazon EMR, Amazon Personalize, Amazon Forecast, and Amazon SageMaker.
Microsoft PowerBI, Tableau, and Jupyter.

The encryption keys and full control over all PII remain in-house, and only anonymized data is stored on AWS. Risk is dramatically reduced for all parties involved.

The number of data elements involved to achieve this anonymization or at least pseudonymization depends on the data set. Just enough of the PII fields need to be protected by encryption or tokenization to open the door to more rapid and broader AWS adoption.

Most data analytics can be performed on this anonymized data in the cloud when referential integrity is maintained for these encrypted or tokenized fields. For example, if each Social Security Number (SSN) or equivalent unique national ID number is replaced by an equally unique value, the protected values can be used as the Primary/Foreign Key pairs for table joins, or as unique secondary identifiers.
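The join behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not SecuPi's implementation: it uses a keyed HMAC as a stand-in for deterministic tokenization (HMAC is one-way, whereas the platform's encryption or tokenization is reversible for authorized users), and the key, table contents, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in an HYOK setup it would never leave the premises.
SSN_KEY = b"demo-key-kept-on-premises"


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic keyed token for value (one-way stand-in)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def protect(rows):
    """Replace the SSN in every row with its token before loading to the cloud."""
    return [{**row, "ssn": tokenize(row["ssn"], SSN_KEY)} for row in rows]


customers = [
    {"ssn": "123-45-6789", "segment": "retail"},
    {"ssn": "987-65-4321", "segment": "premium"},
]
orders = [
    {"ssn": "123-45-6789", "amount": 40},
    {"ssn": "123-45-6789", "amount": 60},
    {"ssn": "987-65-4321", "amount": 25},
]

protected_customers = protect(customers)
protected_orders = protect(orders)

# Join the two protected tables on the token, exactly as one would on the raw SSN.
segment_by_token = {c["ssn"]: c["segment"] for c in protected_customers}
totals = {}
for order in protected_orders:
    segment = segment_by_token[order["ssn"]]
    totals[segment] = totals.get(segment, 0) + order["amount"]
```

Because equal inputs always produce equal tokens under the same key, the aggregate per segment is the same as it would be on the unprotected data, while no raw SSN appears in either protected table.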

AWS Customers Remain in Control

With this approach, data analytics access requirements are fully satisfied, PII remains fully protected while in the cloud, and an acceptable trust model is maintained. This lowers the risk for AWS customers, for AWS and, most importantly, for the data subjects themselves, who are the ultimate owners of PII.

AWS security layers operate independently of, and in addition to, the more granular and specific controls provided by SecuPi that span hybrid cloud and on-premises environments.

The SecuPi Platform extends an AWS customer’s span of control over access to their PII, and reduces liability for AWS by dramatically lowering the sensitivity of the data hosted at AWS. With SecuPi, AWS is no longer classified as a Data Processor under General Data Protection Regulation (GDPR), further reducing audit and compliance costs.

SecuPi goes far beyond discovery of sensitive data to providing full life-cycle management of data usage, access, accountability, audit trail, and retention of PII when data subjects exercise their Right of Erasure (RoE) or Right to be Forgotten (RTBF).

This alleviates AWS customer concerns over cloud data center operations personnel or external bad actors having any possible ability to re-identify data hosted on AWS. Everybody wins: the AWS customer, AWS, the privacy regulators, and the data subjects themselves.

How SecuPi Enables AWS Cloud Migrations Involving PII

Cloud migrations often require a single, centrally managed, policy-based, HYOK-enabled, enterprise-class solution. SecuPi’s Policy Management Server is hosted on-premises or in a customer’s virtual private cloud (VPC). This is where policies are created, and then distributed to the various SecuPi policy enforcement points.

Enforcement points are installed on-premises or in the cloud, and act independently to enforce the relevant policies and return data access activity logs to the central policy server.

The enforcement points provide:

Internal data security policy enforcement and compliance.
Privacy compliance (RoE, Breach Notification, Consent and Preference Management).
Fine-grained Purpose or Attribute-Based Access Control (PBAC/ABAC).
Column- and row-level dynamic masking.
Column-level encryption/tokenization.
Result set filtering and blocking based on end-user context or conditions.

SecuPi’s Policy Management Server installed on-premises or in an Amazon VPC provides:

Policy configuration management.
Tamper-proof audit trail of all access to sensitive or regulated data.
Market-leading advanced user behavior analytics (UBA).
Full accountability.
Database activity monitoring (DAM).
Alerting on anomalous data access activity.
Data loss prevention (DLP) features.
Granular policy administration and separation of duties.

SecuPi uses Docker containers or Kubernetes for application container orchestration, and infrastructure as code (IaC) for automated software installations. Access to the Docker or Kubernetes cluster is strictly controlled using Amazon VPC features, including security groups and Network Access Control Lists (ACL).

Customers can utilize Amazon RDS, Amazon Redshift, Amazon EMR, Spark, or Kafka for persistent storage. AWS Key Management Service (KMS) can be leveraged to manage the keys used to provide the coarse-grained encryption of data at rest in the cloud, while an on-premises hardware security module (HSM) or KMS is leveraged for HYOK and fine-grained, column- or row-level protection.

AWS also provides encrypted database snapshots copied to a separate AWS Region for disaster recovery using serverless AWS Lambda functions (also fully supported by SecuPi).

All of this is transparent to the end users, the applications used to access the data, and the data repositories hosting or storing the data. SecuPi does not require any changes to the data layer: no user-defined functions (UDFs) are needed for on-premises relational database management systems (RDBMS), Amazon Redshift, Amazon RDS, or Amazon EMR.

No API calls or code changes are required on the applications used to access the data, whether these are hosted on-premises or in the cloud.

Enforcement Points, Rules, Policies, Mappings, and Dynamic Views

SecuPi enforcement points are implemented using different methods depending on the specific use case:

Application overlays deployed on-premises instrumenting the applications used to access the sensitive or regulated data.
Smart driver wrappers that intercept ODBC/JDBC/CLI/Python connections to the data layer.
In-line gateways that intercept all access to the data layer (whether in Snowflake, Amazon RDS, or others).

Rules are then enforced that can modify SQL queries to be fully compliant, encrypt or decrypt individual data elements, or dynamically mask results returned to the user.

For example, the encryption keys in Figure 3, which are generated and managed by SecuPi, are associated with an application called MyApp. This loads data to AWS, and is configured for anonymizing records on ingestion by encrypting selected identifiable fields or columns (rewriter policy).

Figure 3 – Associating keys with data ingestion policy and data rewriter.

Different keys are typically used for different data elements, but normally the same key is used for the same field (SSN, for example) across all applications to maintain referential integrity between data repositories on-premises and in the cloud.
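The key-per-field convention can be illustrated with a short Python sketch. The key table, field names, and helper below are hypothetical, and a keyed HMAC again stands in for the platform's deterministic encryption; the point is only to show why sharing one key per field keeps values comparable across systems while separate keys keep fields unlinkable.

```python
import hashlib
import hmac

# Hypothetical per-field keys, shared by every application that handles the field.
FIELD_KEYS = {
    "ssn": b"demo-ssn-key",
    "email": b"demo-email-key",
}


def protect_field(field: str, value: str) -> str:
    """Protect value with the key assigned to field (deterministic, one-way stand-in)."""
    key = FIELD_KEYS[field]
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Two separate applications protect the same SSN with the shared SSN key.
on_prem_token = protect_field("ssn", "123-45-6789")
cloud_loader_token = protect_field("ssn", "123-45-6789")

# The same string protected under a different field's key yields an unrelated value.
cross_field_token = protect_field("email", "123-45-6789")
```

The two SSN tokens match, so records can still be correlated between on-premises and cloud repositories, while the token produced under the email key cannot be matched against them.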

As shown in Figure 4, a comprehensive data transformation policy is configured that associates specific keys and data formats to specific data elements. This policy can be automatically applied to all data loading or data ingestion processes (NiFi or Talend, for example), regardless of the method used to load the data (Amazon S3 or direct) or the source system where the data originates.

Figure 4 – Finalizing data ingestion / transformation with encryption policy configuration.

The next step is to define data consumption policies that control when protected columns are decrypted. Dynamic masking is applied based on user context (who, application used, physical location, time of day) for all sensitive data columns, including those encrypted at rest.
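As a rough illustration of context-based masking, the Python sketch below returns an email address in the clear only when every condition of a made-up rule holds, and masks it otherwise. The context attributes mirror the ones listed above (who, application, location, time of day), but the rule itself, the role and application names, and the masking format are assumptions for the example, and decryption of the stored value is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    user: str
    role: str
    application: str
    location: str
    hour: int  # local hour of day, 0-23


def mask_email(email: str) -> str:
    """Keep the first character and the domain, mask the rest of the local part."""
    local, _, domain = email.partition("@")
    if not domain:
        return "*" * len(email)
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def view_email(email: str, ctx: AccessContext) -> str:
    """Return the clear value only when the hypothetical rule is satisfied."""
    allowed = (
        ctx.role == "support_agent"
        and ctx.application == "crm"
        and ctx.location == "on-premises"
        and 8 <= ctx.hour < 18
    )
    return email if allowed else mask_email(email)


agent = AccessContext("alice", "support_agent", "crm", "on-premises", 10)
analyst = AccessContext("bob", "analyst", "notebook", "cloud", 10)

clear_value = view_email("jane.doe@example.com", agent)
masked_value = view_email("jane.doe@example.com", analyst)
```

The same stored value yields different results for the two callers, which is the essence of applying masking at query time rather than baking it into the data.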

Figure 5 shows a resource access (data consumption) policy that controls who can use or view the contents of the email column in the Customers table.

Figure 5 – Configuring email column resource access (data consumption) policy.

These resource access policies essentially generate dynamic views consistently applied across all data repositories on-premises or on AWS.

It is prohibitively expensive to attempt to recreate the same consistent view layer security controls in a new data storage platform using static views, API calls to external functions, data layer user-defined functions, or joins to security tables.

Dynamic views require no application code changes and no physical or logical data model changes, and they have no performance impact on the data layers hosted on-premises or in AWS. They negate the need to create, test, and implement complex view layer security controls using proprietary methods on each separate data repository.

In Figure 6, a data rewriter policy for the Email column specifies the key used to decrypt (when required), plus any dynamic masking rules to apply, based on the actual end-user identity and context, not just the shared application ID.

Figure 6 – Configuring a data rewriter policy for the email address column.

Format mappings can be configured when required to handle differences in how column contents are stored or presented, such as different data formats for numeric values (VarChar, Float, Decimal, Integer) or Date data types.

Finally, role mappings can be configured to associate role memberships—managed in external LDAP directories or other authoritative sources for user attribute information—to specific SecuPi data access rights or policies.
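Conceptually, a role mapping is a lookup from directory group membership to an access level per column. The Python sketch below shows one way to model that; the group names, column names, access levels, and the "most permissive grant wins, default deny" resolution rule are all assumptions chosen for the example rather than documented SecuPi behavior.

```python
# Hypothetical mapping from LDAP group DNs to per-column access levels.
ROLE_MAPPINGS = {
    "cn=fraud-analysts,ou=groups,dc=example,dc=com": {
        "customers.email": "clear",
        "customers.ssn": "masked",
    },
    "cn=marketing,ou=groups,dc=example,dc=com": {
        "customers.email": "masked",
    },
}

# Ordering used to pick the most permissive level granted by any group.
RANK = {"denied": 0, "masked": 1, "clear": 2}


def effective_access(groups, column: str) -> str:
    """Resolve the access level for column given a user's group memberships."""
    best = "denied"  # default deny when no group grants anything
    for group in groups:
        level = ROLE_MAPPINGS.get(group, {}).get(column, "denied")
        if RANK[level] > RANK[best]:
            best = level
    return best


marketing_only = ["cn=marketing,ou=groups,dc=example,dc=com"]
both_groups = marketing_only + ["cn=fraud-analysts,ou=groups,dc=example,dc=com"]
```

Keeping the mapping outside application code means a change in directory membership changes what a user sees without touching the applications or the data stores.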

This privacy-by-design approach ensures least privilege, and automatically controls access on a need-to-know basis. PII is protected as far upstream in the data flow as possible, and remains protected as far downstream as possible.

Re-identification of data occurs at runtime, just before authorized users or business processes need to consume it. In other words, you can anonymize sensitive data elements, PII, or even intellectual property (IP), prior to loading it to AWS.

You would apply the anonymization on-premises, as far upstream as possible, and only re-identify back on-premises, as far downstream as possible, at runtime for authorized users.
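The de-identify-upstream, re-identify-downstream flow can be sketched with a simple vault-based tokenizer in Python. This is an illustrative model only: the `TokenVault` class, the `tok_` prefix, and the field list are hypothetical, the vault is an in-memory dictionary standing in for an on-premises store, and a real deployment would rely on keys held in an HSM or KMS rather than a lookup table.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a token store that never leaves the premises."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return a stable random token for value, creating one if needed."""
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str, authorized: bool) -> str:
        """Return the original value for authorized callers, else the token."""
        if not authorized:
            return token
        return self._value_by_token[token]


PII_FIELDS = ("name", "ssn")
vault = TokenVault()
record = {"name": "Jane Doe", "ssn": "123-45-6789", "balance": 1200}

# Upstream, on-premises: tokenize identifying fields before the record is loaded.
cloud_record = {
    k: vault.tokenize(v) if k in PII_FIELDS else v for k, v in record.items()
}

# Downstream, on-premises: re-identify at runtime for an authorized user.
restored = {
    k: vault.detokenize(v, authorized=True) if k in PII_FIELDS else v
    for k, v in cloud_record.items()
}
```

Only `cloud_record` would ever be stored on AWS; without access to the vault, the tokens carry no information about the original values, and an unauthorized caller simply gets the token back.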

Sensitive PII is never accessible or exposed in the cloud, even when using Jupyter, Amazon SageMaker, or Amazon Personalize machine learning algorithms for analyzing the data.

For example, the following command runs the SecuPi JAR file, specifying the policy used to anonymize the records along with the input and output files.

java -jar secupi-boot-processor-929d3af1-8625-4c5d-876a-677cea773144.jar --policy policyName --in filename --out s3://bucket/outfilename

The policy includes formatting information, field separator character(s), data type, encryption keys to use, and other details to support record de-identification (when posting to AWS) and re-identification at run-time for authorized users.

SecuPi’s advanced user behavior analytics sends alerts whenever defined threshold limits are exceeded (query risk score, row count, VIP customer access). Threshold alerts in SecuPi UBA ensure automated monitoring. They can be configured to proactively block excessive or inappropriate access to sensitive data, even protecting against stolen credentials and compromised identities.
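Threshold-based alerting of this kind can be modeled as a set of checks applied to each data access event. The Python sketch below is a simplified illustration: the event fields echo the examples above (query risk score, row count, VIP customer access), but the threshold values, the field names, and the rule that two or more exceeded limits trigger a block are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    user: str
    row_count: int
    risk_score: float
    touches_vip: bool


# Hypothetical limits; real values would come from policy configuration.
THRESHOLDS = {"row_count": 10_000, "risk_score": 0.8}


def evaluate(event: QueryEvent) -> list:
    """Return the names of every threshold the event exceeds, in a fixed order."""
    reasons = []
    if event.row_count > THRESHOLDS["row_count"]:
        reasons.append("row_count")
    if event.risk_score > THRESHOLDS["risk_score"]:
        reasons.append("risk_score")
    if event.touches_vip:
        reasons.append("vip_access")
    return reasons


def should_block(event: QueryEvent) -> bool:
    """Assumed rule: alert on any exceeded limit, block when two or more are exceeded."""
    return len(evaluate(event)) >= 2


routine = QueryEvent("alice", row_count=50, risk_score=0.1, touches_vip=False)
bulk_export = QueryEvent("bob", row_count=50_000, risk_score=0.9, touches_vip=False)
```

Evaluating the actual behavior of each request, rather than only the credentials presented, is what allows this style of control to catch misuse of a valid but stolen account.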

SecuPi protects and controls access to data in the same way, whether on-premises or in Amazon RDS, Amazon Redshift, Amazon EMR, or other cloud-hosted data repositories, including load files and other transient data stored in Amazon S3 buckets.

Conclusion

The Hold Your Own Key (HYOK) approach and SecuPi’s platform accelerate cloud migrations because they satisfy the trust model needs of AWS customers. They reduce risk for all parties involved, while simultaneously lowering costs associated with data layer platform changes and migrating data.

AWS customers can simultaneously leverage AWS-provided infrastructure and data protection capabilities to support the hosting and processing of regulated data. File-level encryption of Amazon S3 files, network traffic encryption, and other data security features add further layers of security.

SecuPi’s HYOK approach satisfies even the most stringent trust model requirements, lowers risk, lowers costs, and simplifies compliance with a wide range of data privacy regulations and internationally accepted privacy principles.

SecuPi – AWS Partner Spotlight

SecuPi is an AWS Select Technology Partner that minimizes changes to the trust model and reduces the risk associated with digital transformations.
