This Guidance shows how to build a serverless tokenization framework that replaces sensitive data with unique, formatted identifiers known as "tokens." Frontend and backend applications can use these tokens in place of the original data. The framework generates tokens, stores client-side encrypted sensitive data in a token vault, and retrieves the original sensitive data when necessary, and it protects the tokenization and de-tokenization APIs with multiple layers of security. With this serverless approach, organizations can strengthen data security while reducing the cost and overhead of managing and scaling resources for tokenizing customers' sensitive data. It also lowers the cost of meeting compliance requirements, such as those set by the Payment Card Industry Data Security Standard (PCI DSS), while safeguarding personally identifiable information (PII).
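To make the tokenize and de-tokenize round trip concrete, here is a minimal sketch in Python. It is not the Guidance's implementation: the in-memory dictionary stands in for the token vault, the client-side encryption step is omitted, and the token format (random digits that keep the last four digits of the input) is an assumption chosen only for illustration. The names `tokenize`, `detokenize`, and `_vault` are hypothetical.

```python
import secrets

# Hypothetical in-memory stand-in for the token vault. In the Guidance the
# vault stores client-side encrypted data; encryption is omitted in this sketch.
_vault: dict[str, str] = {}


def tokenize(card_number: str) -> str:
    """Return a random token of the same length that keeps the last four digits."""
    digits = card_number.replace(" ", "")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError("expected at least five digits")
    while True:
        random_part = "".join(
            secrets.choice("0123456789") for _ in range(len(digits) - 4)
        )
        token = random_part + digits[-4:]
        # Retry on the rare collision with an existing token or the original value.
        if token not in _vault and token != digits:
            break
    _vault[token] = digits
    return token


def detokenize(token: str) -> str:
    """Look up the original value; raises KeyError for an unknown token."""
    return _vault[token]


token = tokenize("4111 1111 1111 1111")
original = detokenize(token)
```

Downstream systems would store and pass around only `token`; the original value is recoverable only through the vault lookup.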
Please note: see the Disclaimer at the end of this page.
Architecture Diagram

Step 1
The customer-facing application authenticates with Amazon Cognito and obtains an authorization token to access the tokenization APIs.
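As a rough illustration of this step, the sketch below builds (but does not send) a request to a tokenization API that carries the token issued by Amazon Cognito. The endpoint URL, the JSON payload shape, and the function name `build_tokenize_request` are assumptions for illustration. The header name, and whether a `Bearer ` prefix is expected, depend on how the API's authorizer is configured.

```python
import json
import urllib.request


def build_tokenize_request(
    api_url: str, id_token: str, sensitive_value: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Cognito-issued token."""
    # Hypothetical payload shape; the real API contract may differ.
    body = json.dumps({"value": sensitive_value}).encode("utf-8")
    return urllib.request.Request(
        url=api_url,
        data=body,
        method="POST",
        headers={
            "Authorization": id_token,  # token obtained from Amazon Cognito
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
request = build_tokenize_request(
    "https://example.invalid/tokenize", "<cognito-token>", "4111111111111111"
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```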

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
AWS X-Ray and Amazon CloudWatch Logs enable visualization and logging of tokenization transactions across API Gateway, Lambda functions, and Lambda layers. By visualizing traces and collecting logs, users can more easily troubleshoot performance bottlenecks or identify failures.
Moreover, the AWS Database Encryption software development kit (SDK) for DynamoDB provides APIs for encryption, decryption, and key management, reducing overhead compared to manual service integrations and cryptographic implementations.
Lastly, the included AWS CloudFormation template automates provisioning of required resources, streamlining deployment to support users with quick experimentation, and reducing the overhead of manually configuring services.
Security
The services selected for this Guidance work in tandem to secure API access, protect network access to sensitive data, enable fine-grained access control, manage encryption keys to reduce risk, and enforce mutual TLS. Specifically, AWS WAF filters incoming traffic so that only legitimate requests reach the tokenization APIs, helping to mitigate distributed denial of service (DDoS) attacks. Amazon VPC endpoints and AWS PrivateLink control network-level access to the DynamoDB tables that store sensitive data and keys. AWS IAM Access Analyzer provides insights for fine-tuning access permissions. AWS KMS manages the encryption keys used by the tokenization Lambda function. Amazon Cognito handles user authentication and authorization for the tokenization APIs. Lastly, the Database Encryption SDK for DynamoDB generates secure data encryption keys from AWS KMS and stores encrypted data in DynamoDB.
Reliability
API Gateway API keys help rate-limit the APIs for each API client and set burst limits to manage transactions per second. AWS KMS has a requests-per-second quota on cryptographic operations, and API throttling keeps requests from exceeding that quota. Lambda makes the tokenization APIs highly scalable to meet the fluctuating demands of tokenizing sensitive data, while the AWS Serverless Application Model (AWS SAM) simplifies the deployment of new code versions and automation templates.
Furthermore, the use of private subnets deployed across multiple Availability Zones (AZs), Regional services with built-in resilience and high availability, multi-AZ Amazon VPC endpoints, and Amazon DynamoDB global tables provides enhanced reliability and availability. AWS SAM also provides a higher-level abstraction on top of CloudFormation to define Lambda functions and enable local unit testing. Collectively, these services provide the framework to help ensure workloads perform their intended functions correctly and consistently, while also enabling quick recovery from failures.
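The rate and burst limits described above can be pictured with a token bucket, which is the algorithm the API Gateway documentation names for its throttling. The sketch below is a simplified local model, not API Gateway's implementation. The class name `TokenBucket` and the injected `now` timestamps (used so the example is deterministic) are assumptions for illustration.

```python
class TokenBucket:
    """Token bucket: `rate` tokens are added per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would be throttled (HTTP 429 in API Gateway)


bucket = TokenBucket(rate=2.0, burst=3)
results = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.0, 0.5, 1.5)]
print(results)  # [True, True, True, False, True, True]
```

With a burst of 3, the first three simultaneous requests pass and the fourth is rejected. Later requests pass again once the steady rate of 2 tokens per second has refilled the bucket.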
Performance Efficiency
API Gateway and Lambda enable near real-time, synchronous, event-driven communication between the client (UI) and the server. Both API Gateway and the Lambda function can handle thousands of requests per second, so sensitive data is tokenized in real time when a user submits information on a web page.
Furthermore, DynamoDB stores unstructured information at scale with single-digit millisecond latency, providing a low-latency database layer for the encrypted sensitive information and the generated tokens.
Cost Optimization
The Lambda function allows memory and CPU requirements to be optimized for price and performance using the AWS Lambda Power Tuning tool. Users can also select the Amazon DynamoDB Standard-Infrequent Access (Standard-IA) storage class for workloads that require long-term storage of infrequently accessed data, thereby optimizing storage costs. Both Lambda and DynamoDB provide on-demand and provisioned capacity options to cater to various price and performance scenarios.
Lastly, PrivateLink reduces data transfer costs by keeping network traffic within the AWS network and avoiding charges for a Network Address Translation (NAT) gateway.
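The memory tuning trade-off mentioned above comes down to simple arithmetic: Lambda compute cost scales with configured memory multiplied by billed duration, so a larger memory size can be cheaper if it shortens the run enough. The sketch below works through that calculation. The duration measurements are hypothetical inputs of the kind a tuning run would produce, the price per GB-second is an illustrative value to be checked against current pricing, and per-request charges are ignored.

```python
def cost_per_invocation(
    memory_mb: int, duration_ms: float, price_per_gb_second: float
) -> float:
    """Compute cost = memory (GB) x billed duration (s) x price per GB-second."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gb_second


def cheapest(measurements: dict[int, float], price_per_gb_second: float) -> int:
    """Return the memory size (MB) with the lowest per-invocation compute cost."""
    return min(
        measurements,
        key=lambda mb: cost_per_invocation(mb, measurements[mb], price_per_gb_second),
    )


# Hypothetical measurements: memory size (MB) -> average duration (ms).
measurements = {128: 900.0, 256: 400.0, 512: 220.0, 1024: 120.0}
PRICE = 0.0000166667  # illustrative price per GB-second; check current pricing
best = cheapest(measurements, PRICE)
print(best)  # 256
```

In this made-up data set, doubling memory from 128 MB to 256 MB more than halves the duration, so 256 MB is the cheapest setting. Beyond that point the speedup no longer offsets the higher memory price.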
Sustainability
Lambda, API Gateway, and DynamoDB are serverless services that scale dynamically to match the demands of the tokenization and de-tokenization APIs. Because the compute and storage layers scale with incoming traffic, resources are used only when they are needed, which reduces the energy required to run the underlying servers and the overall environmental impact.
Related Content

Building a serverless tokenization solution to mask sensitive data
How to use tokenization to improve data security and reduce audit scope
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.