AWS Partner Network (APN) Blog

Protecting and Managing Sensitive Customer Data with Skyflow and Cloud Storage Security

By Ashok Mahajan, Sr. Partner Solutions Architect, Startups – AWS
By Ed Casmer, CTO – Cloud Storage Security
By Gokhul Srinivasan, Sr. Partner Solutions Architect, Startups – AWS
By Sean Falconer, Head of Marketing – Skyflow

Securing personally identifiable information (PII) while maintaining compliance can be a daunting task for organizations. Despite best intentions, PII often finds itself scattered across various repositories such as databases, data warehouses, log files, and backups. This makes the maintenance of robust security and compliance measures an uphill battle.

File management only adds to the complexity, requiring stringent security measures, strict access controls, and compliance-oriented storage practices. The risk of data loss and malware threats further intensifies when organizations receive files from external sources such as customers. Organizations must scan such external files for viruses and malware before processing them to mitigate potential threats.

To minimize risk and de-scope existing upstream and downstream systems, organizations use Skyflow, which is available in AWS Marketplace. Skyflow Data Privacy Vault delivers security, compliance, and data residency for your Amazon Web Services (AWS) workloads.

Skyflow, an AWS Partner, uses Cloud Storage Security (CSS) to automatically and asynchronously scan uploaded files for malicious code and malware. CSS is an AWS Specialization Partner with the Security Competency, and it helps to further protect your infrastructure and ease the burden of sensitive file management.

In this post, we’ll show how to secure PII using Skyflow Data Privacy Vault and add malware protection using Cloud Storage Security on AWS.

Skyflow Data Privacy Vault

Skyflow is a software-as-a-service (SaaS) offering that supports multi-tenant and single-tenant deployment models. Skyflow Data Privacy Vault isolates, protects, and governs access to sensitive customer data, which is transformed by the vault into opaque tokens that serve as references to this data. The non-sensitive tokens can be safely stored in any application storage systems or used in data warehouses.

A Skyflow vault can keep sensitive data in a specific geographic location, and tightly controls access to this data. Other systems only have access to non-sensitive tokenized data.

In the example below, a phone number (555-1212) is collected by a frontend application. This phone number, along with any other PII, is transformed by the vault, which is isolated outside of your company’s existing infrastructure.

Any downstream services (such as a database) store only the token representation of the data (e.g. ABC123), and are removed from the scope of compliance. The token representation can preserve formatting as needed and be consistently generated to not break analytics and machine learning (ML) workflows.

Figure 1 – Reducing compliance and security scope with a data privacy vault.
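
The tokenization idea can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not Skyflow’s implementation: the `ToyVault` class and its methods are hypothetical names, and the keyed-hash scheme is a simple stand-in chosen only to show tokens that are consistent and preserve the format of the original value.

```python
import hashlib
import hmac
import secrets


class ToyVault:
    """In-memory stand-in for a data privacy vault (illustration only)."""

    def __init__(self) -> None:
        # Secret key used to derive tokens; it never leaves the vault.
        self._key = secrets.token_bytes(32)
        # Token -> original value. A real vault would encrypt this store.
        self._records: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return a consistent, format-preserving token for a value.

        Digits are replaced with digits derived from a keyed hash and
        separators are kept, so "555-1212" still looks like a phone number.
        A toy scheme like this can collide; it is not production-grade.
        """
        digest = hmac.new(self._key, value.encode(), hashlib.sha256).digest()
        out = []
        used = 0
        for ch in value:
            if ch.isdigit():
                out.append(str(digest[used % len(digest)] % 10))
                used += 1
            else:
                out.append(ch)
        token = "".join(out)
        self._records[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only the vault can do this."""
        return self._records[token]


vault = ToyVault()
token = vault.tokenize("555-1212")
# Downstream systems store only `token`, never the phone number itself.
```

Because the same input always yields the same token within one vault, joins and group-by operations on the tokenized column keep working in downstream analytics.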

A data privacy vault serves as core infrastructure for PII, and Skyflow Data Privacy Vault provides this core infrastructure as a service which includes compute, storage, and network. The core architectural block is simplified to an API call, and Skyflow uses polymorphic encryption which combines multiple forms of encryption to secure PII and make it usable. This allows you to perform operations over fully encrypted data.

You can build any PII-specific workload on a Skyflow vault for data sharing, analytics, and encrypted operations. For example, you can find all records with the same area code without decrypting the data, or calculate the average income of your customers, without exposing yourself, your employees, or your infrastructure to PII.

Working with a Skyflow Vault

While a data privacy vault isn’t a database, Skyflow Data Privacy Vault was designed to have some similar properties. For example, a Skyflow vault supports a schema that can consist of tables, columns, and rows (see image below).

Figure 2 – Vault schema with four tables.

The vault is specially designed to support the full lifecycle of sensitive data, and it understands the structure of PII and its uses. For example, a Skyflow vault understands a social security number as a data type, not simply a string. This means the vault natively supports use cases like showing only the last four digits of a social security number based on the roles and policies you set up, or securely sharing the full social security number with a third-party identity verification vendor.

The vault not only transforms sensitive data into non-sensitive data, it also tightly controls access to sensitive data through a zero-trust model in which no user account or process has access to data unless explicit access control policies grant it. These policies are built from the bottom up, granting access to specific columns and rows of PII. This allows you to control who sees what, when, where, for how long, and in what format.
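
As a rough illustration of deny-by-default, column-level policies with redaction, consider the Python sketch below. The `Policy` and `Redaction` types, the role names, and the `read_column` helper are all hypothetical; they are not Skyflow’s policy language, only a minimal model of the behavior described above.

```python
from dataclasses import dataclass
from enum import Enum


class Redaction(Enum):
    PLAIN_TEXT = "plain_text"
    MASKED = "masked"


@dataclass(frozen=True)
class Policy:
    """Grants one role access to one column, in one format."""
    role: str
    column: str
    redaction: Redaction


def mask_ssn(ssn: str) -> str:
    """Hide every digit except the last four, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def read_column(policies: list[Policy], role: str, column: str, value: str) -> str:
    """Zero-trust read: deny unless a policy explicitly grants access."""
    for policy in policies:
        if policy.role == role and policy.column == column:
            if policy.redaction is Redaction.MASKED:
                return mask_ssn(value)
            return value
    raise PermissionError(f"role {role!r} has no policy for column {column!r}")


policies = [
    Policy("support_agent", "ssn", Redaction.MASKED),
    Policy("identity_verifier", "ssn", Redaction.PLAIN_TEXT),
]
masked = read_column(policies, "support_agent", "ssn", "123-45-6789")
```

A role with no matching policy gets an error rather than data, which is the essential property of the zero-trust model.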

To store, manage, and retrieve data with Skyflow, you can use APIs directly or software development kits (SDKs). Skyflow supports both frontend and backend SDKs. Which SDK you use depends on your needs and where you choose to integrate.

To learn more about the Skyflow SDKs and APIs, check out the documentation.

Solution Overview

To demonstrate secure file storage and management through Skyflow, let’s look at how this solution de-scopes both the frontend and backend applications from touching the sensitive documents.

In addition to a Skyflow vault, the solution uses the following components:

  • Amazon API Gateway as the backend API entry point for passing non-sensitive data downstream.
  • AWS Lambda to receive non-sensitive data and store it safely in Amazon DynamoDB.
  • AWS Secrets Manager for secure storage and management of the Skyflow vault service account key.
  • Amazon DynamoDB to save the associated skyflow_id returned by the vault after secure file storage.
  • Cloud Storage Security to automatically scan the file behind the scenes for viruses and other potential threats.

The following architecture diagram illustrates the file upload flow with Skyflow, the AWS services mentioned above, and CSS.

Figure 3 – Example of file upload processed through Skyflow and CSS.

  1. Skyflow JavaScript SDK is used by the frontend application to first retrieve an auth bearer token for Skyflow API authentication. To retrieve the auth bearer token, the frontend makes an API call through Amazon API Gateway, which passes the request to AWS Lambda.
  2. A Lambda function requests the service account key from AWS Secrets Manager.
  3. AWS Secrets Manager returns the service account key.
  4. Lambda uses the Skyflow server-side SDK to sign a JSON Web Token (JWT) and request an auth bearer token.
  5. Skyflow vault authenticates the JWT and returns an auth bearer token.
  6. The auth bearer token is passed back to the frontend application.
  7. Skyflow Elements uploads the sensitive file to the Skyflow vault.
  8. Cloud Storage Security asynchronously starts a virus scan of the file.
  9. Skyflow ID for the secured file is returned to the frontend application.
  10. Skyflow ID is posted to the API Gateway and passed through to another Lambda function.
  11. AWS Lambda stores the Skyflow ID in Amazon DynamoDB.
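
The AWS-side pieces of this flow (steps 2–3 and 10–11) might look roughly like the Python sketch below. It is a minimal sketch under several assumptions: the secret is stored as a JSON string, the DynamoDB table is keyed on a hypothetical `user_id` attribute, the table name comes from a hypothetical `TABLE_NAME` environment variable, and the request body carries `user_id` and `skyflow_id` fields. The Skyflow SDK calls for signing the JWT are deliberately left out.

```python
import json
import os


def get_service_account_key(secrets_client, secret_id: str) -> dict:
    """Steps 2-3: fetch the service account key from AWS Secrets Manager.

    Assumes the key was stored as a JSON string under `secret_id`.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


def store_skyflow_id(table, user_id: str, skyflow_id: str) -> None:
    """Step 11: persist only the non-sensitive Skyflow ID in DynamoDB."""
    table.put_item(Item={"user_id": user_id, "skyflow_id": skyflow_id})


def lambda_handler(event, context):
    """Steps 10-11: handle an API Gateway proxy event carrying a Skyflow ID."""
    import boto3  # bundled with the AWS Lambda Python runtime

    body = json.loads(event["body"])
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    store_skyflow_id(table, body["user_id"], body["skyflow_id"])
    return {"statusCode": 200, "body": json.dumps({"stored": True})}
```

Passing the Secrets Manager client and the table in as arguments keeps the helpers easy to exercise with test doubles, and the sensitive file itself never appears in any of this code.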

Access Control

To control access to the customer’s vault, policies are created in Skyflow to allow programmatic writes into the vault table for client records.

Read and update access needs to be restricted to the single record owned by the currently logged-in user. Skyflow customers can use an authentication service like Auth0, and the customer application knows who the user is based on the Auth0 token.

The Skyflow vault must respect the identity of the user and restrict access based on this identity. To support this requirement, customers use Skyflow’s context-aware authorization.

Context-Aware Authorization

Programmatic access to Skyflow APIs is controlled through a service account created within your Skyflow account. The service account’s roles, and the policies attached to those roles, decide the level of access a service account has to a vault. The creation of Skyflow roles, policies, and service accounts is controlled programmatically through Skyflow’s management APIs or through Skyflow Studio, Skyflow’s web-based vault administration portal (see image below).

Figure 4 – Example of creating a policy from Skyflow Studio.

Context-aware authorization lets your backend insert an additional claim for end user context into the JWT assertion. You can use any string that uniquely identifies the end user, such as the token provided by Auth0 after a client successfully logs in.

After the additional claim is added, the vault verifies the request and returns a bearer token with the context identifier. The diagram in Figure 5 below illustrates authentication with contextual information for the Skyflow customer and data retrieval.

  1. Frontend app requests a bearer token from the backend app server.
  2. Backend retrieves context information (the Auth0 token in the case of this Skyflow customer) for the end user.
  3. Backend authenticates the identity of the service account with the vault.
  4. Vault returns authentication confirmation as a context-aware bearer token to the backend server.
  5. App backend forwards the context-aware bearer token to the frontend.
  6. App detokenizes customer data to display plaintext and masked data to the application user based on the applied access control policies.

Figure 5 – Context-aware authorization flow diagram using Auth0 token for context.

Using the returned bearer token with the context restriction, the frontend customer application is able to retrieve the PII and files owned by the currently logged-in user, and only that user (Step 6).

Further, the time-to-live (TTL) on the bearer token can be controlled, so the token can be set to live only long enough to retrieve the record for the client.
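
To make the extra claim concrete, here is a hedged Python sketch of how a backend might assemble the claims for the JWT before signing it. The claim names (`iss`, `key`, `aud`, `sub`, `exp`, `ctx`) and the credential field names (`clientID`, `keyID`, `tokenURI`) are assumptions for illustration and should be checked against the Skyflow documentation; signing is left to the server-side SDK or a JWT library. Note that `exp` here bounds the lifetime of the signed assertion; how the bearer token’s TTL is configured is not specified in this post.

```python
import time
from typing import Optional


def build_assertion_claims(
    credentials: dict,
    end_user_context: str,
    ttl_seconds: int = 60,
    now: Optional[int] = None,
) -> dict:
    """Assemble JWT claims that carry an end-user context identifier.

    `credentials` is the parsed service account key. All field and claim
    names below are assumptions made for this sketch.
    """
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": credentials["clientID"],
        "key": credentials["keyID"],
        "aud": credentials["tokenURI"],
        "sub": credentials["clientID"],
        # Short expiry: long enough to exchange for a bearer token.
        "exp": issued_at + ttl_seconds,
        # Any string that uniquely identifies the end user, for example
        # an identifier taken from the Auth0 token after login.
        "ctx": end_user_context,
    }


claims = build_assertion_claims(
    {"clientID": "client-1", "keyID": "key-1", "tokenURI": "https://example.invalid/token"},
    end_user_context="auth0|user-123",
    ttl_seconds=60,
    now=1_700_000_000,
)
```

Policies in the vault can then compare the context value against a column such as an owner identifier, so the resulting bearer token only reaches that user’s rows.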

Securing PII and Files from the Application Frontend

When collecting and managing sensitive data like files containing PII, it’s best practice to take the entire application infrastructure out of security and compliance scope including the frontend.

Skyflow Elements provides a secure way to collect and reveal sensitive data including files. It offers several benefits, including complete programmatic isolation from your frontend applications, end-to-end encryption, tokenization, and the ability to customize the look and feel of the data collection form.

When users interact with Skyflow Elements, various components work together to collect and reveal sensitive data. Here’s how it works:

  • When a user enters sensitive data into collect elements, the client-side SDK sends the data to your vault and receives tokens that represent the data.
  • When you need to reveal the data to a user, the client-side SDK sends the tokens to your vault, receives the data, and displays the data in reveal elements.

After uploading a file, Skyflow automatically scans the file for viruses leveraging the CSS integration within the vault. You can retrieve the status of a scan using the Get Status Scan API.

If the file doesn’t contain a virus, a status of SCAN_CLEAN is returned and the file is available for downloading or in-page retrieval. Otherwise, a status of SCAN_INFECTED is returned and the file is moved into quarantine.
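
Because the scan runs asynchronously, a caller typically polls until one of the two final statuses appears. The Python sketch below shows one way to do that. The `fetch_status` callable is a hypothetical wrapper around the scan status API (its request shape is not covered here), and any status other than SCAN_CLEAN or SCAN_INFECTED is treated as "still scanning", which is an assumption.

```python
import time
from typing import Callable

SCAN_CLEAN = "SCAN_CLEAN"
SCAN_INFECTED = "SCAN_INFECTED"


def wait_for_scan(
    fetch_status: Callable[[], str],
    attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the scan finishes.

    Returns True if the file is clean and False if it is infected.
    Raises TimeoutError if no final status arrives within `attempts` polls.
    """
    for attempt in range(attempts):
        status = fetch_status()
        if status == SCAN_CLEAN:
            return True
        if status == SCAN_INFECTED:
            return False
        # Any other status is assumed to mean the scan is still running.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"scan did not finish after {attempts} attempts")
```

An application would only offer the file for download or in-page retrieval when this returns True, and would surface an error to the user when it returns False.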

To reveal an uploaded file, the file is embedded into the web frontend as an iframe so the file never touches the customer’s servers.

Skyflow enables a business to offload the security, privacy, and compliance responsibilities of sensitive file and PII handling so it can focus resources on its core business.

Summary

In this post, we discussed the challenges businesses face with managing sensitive customer data. We reviewed how to secure personally identifiable information (PII) using Skyflow Data Privacy Vault and add malware protection using Cloud Storage Security (CSS) on AWS.

We also showed how Skyflow Data Privacy Vault can securely collect, manage, and use sensitive data. Skyflow integrates with CSS to support automatic virus and malware detection and protection for files.

To learn more, contact Skyflow or try out Skyflow in AWS Marketplace. For additional information regarding Cloud Storage Security, check out CSS in AWS Marketplace.