This Guidance shows how to implement an accurate, resilient, serverless, and event-driven payroll processing system designed for exactly-once processing and robust failure handling. If you have transactional requirements when writing data to your systems of record, you have likely relied on features inherent to your relational database system. However, when you move to an asynchronous model in the cloud, many of the approaches your architects and developers depend on might not be available. This Guidance addresses data-consistency challenges with a transactional, or 'saga', pattern that performs rollbacks and compensating actions when failures occur during the multi-step payroll processing workflow.
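The saga pattern described above can be sketched as a list of forward actions, each paired with a compensating action that undoes it. The step names below are illustrative, not taken from this Guidance:

```python
# Minimal saga sketch: run payroll steps in order; if one fails,
# run the already-completed steps' compensating actions in reverse.

def run_saga(steps):
    """steps: list of (name, action, compensate) tuples.
    Returns (True, []) on success, or
    (False, [names of steps that were compensated]) on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            # Roll back in reverse order of completion.
            rolled_back = []
            for done_name, comp in reversed(completed):
                comp()
                rolled_back.append(done_name)
            return False, rolled_back
    return True, []
```

In the architecture below, the same idea is realized with Step Functions states and status records in DynamoDB rather than an in-process loop, but the ordering guarantee is the same: compensation runs only for steps that actually completed, newest first.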
Please note: [Disclaimer]
Architecture Diagram

[Architecture diagram description]
Step 1
A scheduled event invokes the payroll process.
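A scheduled trigger like this is typically an EventBridge rule with a cron expression. The sketch below builds the parameters you might pass to boto3's `events.put_rule`; the rule name and schedule (payday on the 1st and 15th) are illustrative assumptions:

```python
# Sketch of the EventBridge schedule that starts the payroll run.
# Rule name and cron expression are illustrative assumptions;
# pass the returned dict to boto3's events.put_rule in practice.

def payroll_schedule_rule(cron_expression, rule_name="payroll-kickoff"):
    """Build put_rule parameters for a scheduled payroll trigger."""
    return {
        "Name": rule_name,
        # e.g. 09:00 UTC on the 1st and 15th of every month
        "ScheduleExpression": cron_expression,
        "State": "ENABLED",
    }

rule = payroll_schedule_rule("cron(0 9 1,15 * ? *)")
```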
Step 2
The GetEmployers AWS Lambda function uses Amazon Relational Database Service (Amazon RDS) Proxy as a connection pool to reach downstream relational databases at scale.
Step 3
The GetEmployers Lambda function gets employer information from an Amazon Aurora database, as this use case relies on a relational data model common for payroll systems.
Step 4
The GetEmployers Lambda function stores all the employer information in an Amazon Simple Storage Service (Amazon S3) bucket.
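The staged S3 object is what the distributed map will later read, so a predictable key layout helps. A minimal sketch of building the `put_object` parameters, with an assumed bucket name and per-run key scheme:

```python
import json

# Sketch of how the GetEmployers function might stage employer rows
# in S3 for the Step Functions distributed map to read. The bucket
# and key layout are illustrative assumptions; pass the returned
# dict to boto3's s3.put_object.

def employers_object(employers, bucket="payroll-staging", run_id="2024-01-15"):
    """Serialize employer records as a JSON array under a per-run key."""
    return {
        "Bucket": bucket,
        "Key": f"runs/{run_id}/employers.json",
        "Body": json.dumps(employers).encode("utf-8"),
        "ContentType": "application/json",
    }
```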
Step 5
The Amazon Simple Queue Service (Amazon SQS) dead-letter queue (DLQ) stores messages that failed processing so they can be reviewed and reprocessed later.
Step 6
The Redrive Lambda function, run by a scheduled Amazon EventBridge cron event, processes failed messages from the Amazon SQS DLQ using ad-hoc API calls.
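The Redrive function's triage decision can be sketched as a pure function over each failed message. The error codes, field names, and retry threshold below are assumptions, not part of this Guidance:

```python
# Sketch of the Redrive function's triage logic: decide per failed
# message whether to retry it via an ad-hoc API call or leave it
# for manual review. Error codes and the retry cap are assumptions.

RETRYABLE = {"Timeout", "ThrottlingError", "ServiceUnavailable"}

def triage_dlq_message(message, max_retries=3):
    """Return 'retry' for transient errors under the retry limit,
    otherwise 'manual-review'."""
    error = message.get("error_code", "")
    attempts = int(message.get("retry_count", 0))
    if error in RETRYABLE and attempts < max_retries:
        return "retry"
    return "manual-review"
```

Splitting transient from permanent failures this way keeps automatic redrives from endlessly replaying messages that can only be fixed by a human.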
Step 7
The GetEmployers Lambda function invokes AWS Step Functions after the completion of steps 2–5. This standard Step Functions workflow uses two distributed maps: the Parent-Distributed Map processes multiple employers in parallel, while the Child-Distributed Map processes the employees of each employer in parallel.
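The nested map structure can be sketched in Amazon States Language, here expressed as a Python dict. State names and S3 locations are illustrative, and the real workflow has more states per employee than this skeleton shows:

```python
# Skeleton of the parent (per-employer) and child (per-employee)
# distributed maps. State names, keys, and the single Task state
# per employee are illustrative assumptions.

def payroll_state_machine(bucket):
    """Build a minimal nested-distributed-map definition."""
    child_map = {  # one iteration per employee
        "Type": "Map",
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {"InputType": "JSON"},
            "Parameters": {"Bucket": bucket, "Key.$": "$.employeesKey"},
        },
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "ProcessEmployeePayroll",
            "States": {
                "ProcessEmployeePayroll": {
                    "Type": "Task",
                    # Send to SQS and pause until the task token comes back
                    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
                    "End": True,
                }
            },
        },
        "End": True,
    }
    parent_map = {  # one iteration per employer
        "Type": "Map",
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {"InputType": "JSON"},
            "Parameters": {"Bucket": bucket, "Key": "employers.json"},
        },
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "GetEmployees",
            "States": {
                "GetEmployees": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Next": "PerEmployeeMap",
                },
                "PerEmployeeMap": child_map,
            },
        },
        "End": True,
    }
    return {"StartAt": "PerEmployerMap", "States": {"PerEmployerMap": parent_map}}
```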
Step 8
The Parent-Distributed Map state reads employer information from an Amazon S3 object and writes a ‘Begin’ status for each employer into the EmployerStatus Amazon DynamoDB table.
Step 9
The GetEmployees Lambda function retrieves employee data for each employer from Aurora using Amazon RDS Proxy, storing the results as JSON objects in an Amazon S3 bucket.
Step 10
The Child-Distributed Map processes individual employee payrolls in parallel. It reads employee information from the Amazon S3 object stored in step 9 and begins by writing a ‘Begin’ status to the EmployeeStatus DynamoDB table.
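Because a map iteration can be retried, the status write should be idempotent. One common approach, sketched below with an assumed key schema and attribute names, is a conditional `put_item` so a retry becomes a no-op instead of resetting a record that has already advanced:

```python
# Sketch of an idempotent 'Begin' status write to the EmployeeStatus
# table. Attribute names and the table's key schema are assumptions;
# pass the dict to boto3's dynamodb client. On a retry, a
# ConditionalCheckFailedException means the record already exists
# and can safely be ignored.

def begin_status_item(run_id, employee_id):
    """Build put_item parameters for a conditional 'Begin' write."""
    return {
        "TableName": "EmployeeStatus",
        "Item": {
            "run_id": {"S": run_id},
            "employee_id": {"S": employee_id},
            "status": {"S": "Begin"},
        },
        # Only succeed if this record does not exist yet.
        "ConditionExpression": "attribute_not_exists(employee_id)",
    }
```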
Step 11
Each employee’s payroll is processed through a series of Amazon SQS queues, which then invoke Lambda functions using a task token. The task-token approach gives the workflow the flexibility to call the downstream systems asynchronously.
In this step, the responses from downstream systems (like the TimeEntry, PayrollCalc, TaxCalc, and RetirementCalc Lambda functions and their corresponding APIs) are returned to corresponding Amazon SQS queues using the task token. This completes the task and launches the next step in the Step Functions workflow.
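A downstream worker resumes the workflow by returning the task token it received. The message shape and field names below are assumptions; the real functions (like TaxCalc) call their corresponding APIs before reporting back:

```python
import json

# Sketch of how a downstream worker reports back to Step Functions.
# The SQS message body is assumed to carry the TaskToken that the
# sendMessage.waitForTaskToken integration injected.

def build_callback(sqs_record, result):
    """Extract the task token from an SQS record and build the
    send_task_success parameters (pass to boto3's stepfunctions
    client to resume the paused workflow step)."""
    body = json.loads(sqs_record["body"])
    return {
        "taskToken": body["TaskToken"],
        "output": json.dumps(result),
    }
```

On failure, the worker would instead call `send_task_failure` with the same token, which is what routes a message toward the DLQ path described later.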
Step 12
The Employer completion Amazon SQS queue invokes the Employer validation Lambda function, which validates that data for all the employees of an employer was processed successfully.
Step 13
If some employees’ data was not processed successfully, the Employer validation Lambda function sends the failed messages to the Amazon SQS process DLQ.
Step 14
If all the employees’ payrolls were calculated correctly for the employer, the Lambda function sends the message to the Amazon SQS PushPayment queue to process the payment through an external payment system.
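The routing across steps 12–14 reduces to an all-or-nothing check per employer. A sketch, with illustrative status values, queue names, and message shapes:

```python
# Sketch of the Employer validation routing (steps 12-14): if every
# employee completed, send one message to the PushPayment queue;
# otherwise send the failed employees to the process DLQ. Status
# values and queue names are illustrative assumptions.

def route_employer(employer_id, employee_statuses):
    """employee_statuses: dict of employee_id -> status string.
    Returns (queue_name, message)."""
    failed = sorted(e for e, s in employee_statuses.items() if s != "Complete")
    if failed:
        return "process-dlq", {"employer_id": employer_id,
                               "failed_employees": failed}
    return "push-payment", {"employer_id": employer_id}
```

Gating the payment on every employee succeeding is what keeps a partially processed employer from reaching the external payment system.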
Step 15
Once the payment is successful, the MoneyMovement Lambda function sends the completion notification to users through Amazon Simple Notification Service (Amazon SNS).
Step 16
The messages previously sent to the Amazon SQS process DLQ are processed via ad-hoc API calls using Amazon API Gateway, depending on the type of error and resolution. AWS WAF protects those calls against common web exploits and bots that can affect availability, compromise security, or consume excessive resources.
AWS Shield, a managed distributed denial of service (DDoS) protection service, safeguards applications running on AWS.
Get Started

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
Event-driven systems are asynchronous in nature and use decoupled microservices that can scale and fail independently. This Guidance is designed to handle planned and unplanned events. For example, EventBridge and API Gateway facilitate both scheduled requests for planning activities and one-time requests for unplanned activities. Additionally, the use of serverless services enables microservices to scale dynamically as the business scales. Amazon SQS, Lambda, DynamoDB, and Step Functions (which provide distributed map and lifecycle hooks) can scale while maintaining a durable orchestrated workflow. The system maintains idempotency, allowing repeated processing of the same operation without unintended side effects.
-
Security
This Guidance uses multiple layers of security to protect data as it moves across systems and from public to private resources. It encrypts network traffic using TLS, and API Gateway uses authentication and authorization services to protect backend services from untrusted sources. Additionally, Shield provides DDoS protection, and AWS WAF provides conditional access policies that control access to protected content.
-
Reliability
In distributed systems, each service can scale independently based on variable incoming requests or events, but this can lead to unpredictable volumes of traffic at individual services because each component accepts, completes, and hands off work at a different rate. This Guidance uses Amazon SQS to decouple services within a distributed system and provide a durable queue, which buffers downstream services from spikes in volume until they are able to process the work. DynamoDB integrates and scales alongside serverless systems and provides system idempotency by tracking the state of work as it moves through the distributed system.
-
Performance Efficiency
This Guidance uses serverless services, removing the need for you to manually provision servers to handle peak volume. Serverless technologies like Step Functions, Amazon SQS, Lambda, EventBridge, and API Gateway scale alongside request volume, often within seconds, and efficiently scale up and down to protect your solution against the under- or overprovisioning of resources.
-
Cost Optimization
Serverless technologies like Lambda, Amazon SQS, and API Gateway use a pay-for-use billing model, and they elastically scale resources up and down alongside events. This removes the operational overhead of capacity management and patching and optimizes resource allocation to match the load. EventBridge and Lambda also enable an event-driven architecture that removes the need to keep resources continuously allocated to poll or track work as it passes between services. Additionally, event-driven components are loosely coupled, promoting greater flexibility and extensibility of applications and thereby improving your developers’ operational efficiency.
-
Sustainability
Step Functions, Amazon SQS, Lambda, EventBridge, API Gateway, and DynamoDB dynamically scale alongside requests for optimal provisioning based on workload volume. As a result, you don’t need to overprovision services to meet peak demand. Backed by loosely coupled microservices that scale independently, this event-driven system can allocate resources to the specific service that needs to process the event. Additionally, the AWS US East and West Regions use 100 percent renewable energy to power their compute resources.
Related Content

Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.