This Guidance shows how you can automate background check reporting and auditing using AWS artificial intelligence and analytics services and immutable ledger technology. Background checks are essential for every business to assess hiring risks and comply with industry standards, like Service Organization Control Type 2 (SOC2). However, manual auditing processes can be redundant, error-prone, inefficient at scale, time-consuming, and often have no precise mechanism to track the history of when data is updated. By using this Guidance, you can set up a solution that automates tasks, increases quality and efficiency, and tracks history and data lineage, helping you meet compliance requirements while building a strong brand reputation.
[Architecture diagram description]
Risk and compliance teams receive background check reports from direct sources or third parties and upload them to Amazon Simple Storage Service (Amazon S3). The reports reach Amazon S3 either through frontend applications or by being moved automatically from Network File System (NFS) or Server Message Block (SMB) storage using AWS DataSync or AWS Storage Gateway.
The report extraction stage uses an AWS Lambda function to read files from Amazon S3. Amazon Textract extracts data from the report files, and the extracted data is sent to an Amazon Simple Queue Service (Amazon SQS) queue. Amazon SQS supports dead-letter queues (DLQs), which other queues can target for messages that are not processed successfully.
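The extraction stage can be sketched as follows. This is a minimal illustration, not the Guidance's sample code: it assumes single-page documents that the synchronous `analyze_document` operation accepts, and the `QUEUE_URL` environment variable, the message layout, and the helper names are hypothetical. The AWS clients are passed in as arguments so that the logic can be exercised without AWS access.

```python
import json
from urllib.parse import unquote_plus


def extract_lines(textract_response):
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def handle_s3_event(event, textract, sqs, queue_url):
    """Run Textract on each uploaded object and queue the extracted text.

    `textract` and `sqs` are boto3 clients (or test doubles exposing the
    same methods). Returns the number of messages sent.
    """
    sent = 0
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])

        response = textract.analyze_document(
            Document={"S3Object": {"Bucket": bucket, "Name": key}},
            FeatureTypes=["FORMS", "TABLES"],
        )
        # Hypothetical message layout: source location plus extracted lines.
        body = {"bucket": bucket, "key": key, "lines": extract_lines(response)}
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(body))
        sent += 1
    return sent


def lambda_handler(event, context):
    """Lambda entry point; QUEUE_URL is an assumed environment variable."""
    import os
    import boto3  # imported lazily so the helpers above run without AWS

    return handle_s3_event(
        event, boto3.client("textract"), boto3.client("sqs"), os.environ["QUEUE_URL"]
    )
```

Keeping the clients as parameters is a design choice for testability; a deployed function would typically create them once, outside the handler.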
In the storage stage, a Lambda function reads the messages from Amazon SQS and stores the message payload in Amazon Quantum Ledger Database (Amazon QLDB). Amazon QLDB maintains the entire history of data changes on individual data records in an immutable fashion for full traceability. Any messages that are not processed successfully are moved to the DLQ.
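A storage-stage handler might look like the sketch below. It assumes the `driver` argument behaves like the `QldbDriver` class from the Amazon QLDB driver for Python (`pyqldb`), that a ledger table with the hypothetical name `BackgroundChecks` already exists, and that the SQS event source mapping has `ReportBatchItemFailures` enabled so that only failed messages are retried and eventually moved to the DLQ.

```python
import json


def store_records(event, driver, table="BackgroundChecks"):
    """Insert each SQS message body into a QLDB table.

    `driver` is expected to expose execute_lambda() like pyqldb's QldbDriver.
    `table` is a hypothetical, trusted table name (never user input).
    Returns a partial-batch response listing the messages that failed.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            document = json.loads(record["body"])
            # Bind the document as a default argument so each lambda keeps
            # its own value; the driver passes the transaction executor.
            driver.execute_lambda(
                lambda txn, doc=document: txn.execute_statement(
                    f"INSERT INTO {table} ?", doc
                )
            )
        except Exception:
            # Report the message as failed so SQS redelivers it; after the
            # queue's maxReceiveCount it is moved to the DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def lambda_handler(event, context):
    """Lambda entry point; LEDGER_NAME is an assumed environment variable."""
    import os
    from pyqldb.driver.qldb_driver import QldbDriver  # lazy import

    return store_records(event, QldbDriver(ledger_name=os.environ["LEDGER_NAME"]))
```

Returning the failed message IDs, rather than raising, keeps one bad message from forcing the whole batch to be reprocessed.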
In the notification stage, a Lambda function validates messages on the DLQ and sends email notifications using Amazon Simple Notification Service (Amazon SNS).
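As a rough sketch of the notification stage, the function below inspects dead-lettered messages delivered in an SQS event and publishes one summary to an SNS topic, which delivers it to email subscribers. The validation rules, the `key` field, and the subject wording are assumptions made for illustration; the Guidance does not specify them.

```python
import json


def describe_failure(record):
    """Return a one-line description of a dead-lettered SQS record."""
    message_id = record["messageId"]
    try:
        body = json.loads(record["body"])
    except ValueError:
        return f"{message_id}: body is not valid JSON"
    # 'key' is the hypothetical field naming the source report in S3.
    if not isinstance(body, dict) or "key" not in body:
        return f"{message_id}: body has no 'key' field"
    return f"{message_id}: could not store report {body['key']}"


def notify_failures(event, sns, topic_arn):
    """Publish a summary of all failed messages; returns how many were reported.

    `sns` is a boto3 SNS client (or a test double with a publish method).
    """
    lines = [describe_failure(record) for record in event.get("Records", [])]
    if not lines:
        return 0
    sns.publish(
        TopicArn=topic_arn,
        Subject=f"{len(lines)} background check message(s) failed",
        Message="\n".join(lines),
    )
    return len(lines)
```

Sending one message per batch instead of one per failure keeps the number of emails manageable when many reports fail at once.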
An AWS Glue workflow transforms the Amazon Ion extract from Amazon QLDB into partitioned Apache Parquet files. An AWS Glue crawler reads and catalogs the Parquet-formatted version, making it available to Amazon Athena.
Athena views that are created on partitioned Parquet data provide data enrichment for the Amazon QuickSight dashboard.
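To illustrate how such a view could be created programmatically, the sketch below builds a `CREATE OR REPLACE VIEW` statement and submits it through the Athena `start_query_execution` API. The column names (`candidate_id`, `check_type`, `status`, `year`, `month`), the derived `is_clear` column, and the database, table, and output location are all hypothetical; the actual schema depends on what the AWS Glue crawler catalogs.

```python
def build_view_sql(view, table):
    """Return a CREATE OR REPLACE VIEW statement with one derived column.

    All column names are hypothetical placeholders for the cataloged schema.
    """
    return (
        f'CREATE OR REPLACE VIEW "{view}" AS\n'
        "SELECT candidate_id, check_type, status,\n"
        "       status = 'CLEAR' AS is_clear,\n"
        "       year, month\n"
        f'FROM "{table}"'
    )


def create_view(athena, database, view, table, output_location):
    """Submit the view definition to Athena and return the query execution ID.

    `athena` is a boto3 Athena client (or a test double). The call is
    asynchronous: the returned ID can be polled with get_query_execution.
    """
    response = athena.start_query_execution(
        QueryString=build_view_sql(view, table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Because `year` and `month` are assumed to be partition columns, dashboard queries that filter on them scan only the matching Parquet partitions.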
QuickSight uses the Athena views to build business intelligence dashboards on top of the background check reports.
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Lambda helps reduce operational overhead because there is no need to provision, scale, or manage servers. Additionally, versioning support lets you publish multiple versions of a function and roll back to previous versions if needed. Step Functions provides resilience, and its state machines are automatically distributed across Availability Zones (AZs) for high availability. It also reduces operational overhead by managing state, checkpoints, and restarts for you, and you can easily change Step Functions workflows without writing code. EventBridge manages all the underlying infrastructure needed to deliver events at scale, so you don’t need to build custom event solutions. CloudWatch provides integrated monitoring for events and workflows, providing visibility into service performance and issues.
AWS Identity and Access Management (IAM) policies govern the permissions for specific AWS services and their associated API actions, applying least-privilege controls so that each service has only the permissions it needs to operate. AWS Key Management Service (AWS KMS) lets you encrypt the data at rest, and the use of the recommended TLS 1.3 keeps data encrypted in transit during the request flow. The Amazon QLDB Federal Information Processing Standard (FIPS) endpoint uses a TLS library that complies with FIPS 140-2. This endpoint helps security-sensitive ledger applications meet the strict encryption, compliance, and regulatory requirements of businesses that interact with state governments.
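As an example of what least privilege can mean in practice, the sketch below generates an identity-based policy for a function that only reads reports, calls Amazon Textract, and sends results to one queue. The statement IDs, bucket name, and queue ARN are hypothetical, and the policy is illustrative rather than the one shipped with the Guidance; logging and AWS KMS permissions are omitted for brevity.

```python
import json


def extraction_policy(bucket, queue_arn):
    """Build a least-privilege policy document for an extraction function.

    Grants only: read objects in one bucket, analyze documents with
    Textract, and send messages to one queue.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadReports",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "RunTextract",
                "Effect": "Allow",
                "Action": "textract:AnalyzeDocument",
                # Assumption: Textract actions are not scoped to individual
                # resources, so the resource is left as "*".
                "Resource": "*",
            },
            {
                "Sid": "QueueResults",
                "Effect": "Allow",
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and queue ARN, for illustration only.
    policy = extraction_policy(
        "example-reports-bucket",
        "arn:aws:sqs:us-east-1:123456789012:extracted-reports",
    )
    print(json.dumps(policy, indent=2))
```

Each action is named explicitly instead of using a wildcard such as `s3:*`, so a compromised function cannot delete reports or read from other queues.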
This Guidance stores source data in Amazon S3, an object storage service that offers 99.999999999% (11 nines) data durability. Amazon QLDB is fully managed, serverless, highly available, and helps you store extracted data and all data changes. Its ledger is deployed across multiple AZs, with multiple copies in each AZ, and a ledger write is acknowledged only after being written to durable storage in multiple AZs to provide strong durability of your data. DataSync provides reliable data transfer capabilities between on-premises and cloud storage, keeping data in sync across locations and automatically resuming data transfer jobs from points of failure. Lambda automatically scales to handle thousands of concurrent invocations without any capacity planning, and for asynchronous invocations it retries a failed event (by default, up to three attempts in total) before the event is discarded, improving reliability.
Amazon Textract accelerates workflows that rely on information from documents and supports both synchronous and asynchronous operations. It lets you quickly extract structured data, such as fields, values, and tables, from documents at scale using machine learning models. Amazon SNS and Amazon SQS support push-based and pull-based communication, respectively, to optimize for different use cases. The push model of Amazon SNS lets you send time-critical notifications, and the pull model of Amazon SQS lets you decouple sending and receiving components through asynchronous messaging.
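For multi-page reports, the asynchronous Textract operations are the usual choice. The sketch below starts an analysis job, polls until it finishes, and follows `NextToken` to gather every page of results. Polling is used only to keep the example short (Textract can instead publish a completion notification to an SNS topic), and the function name, polling interval, and attempt limit are assumptions.

```python
import time


def analyze_async(textract, bucket, key, poll_seconds=5.0, max_polls=60,
                  sleep=time.sleep):
    """Run an asynchronous Textract analysis and return all result blocks.

    `textract` is a boto3 Textract client (or a test double). `sleep` is
    injectable so the polling loop can be tested without waiting.
    """
    job_id = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
    )["JobId"]

    for _ in range(max_polls):
        page = textract.get_document_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            sleep(poll_seconds)
            continue
        if status != "SUCCEEDED":
            raise RuntimeError(f"Textract job {job_id} ended with status {status}")

        # Results are paginated; keep fetching while a NextToken is returned.
        blocks = list(page.get("Blocks", []))
        while page.get("NextToken"):
            page = textract.get_document_analysis(
                JobId=job_id, NextToken=page["NextToken"]
            )
            blocks.extend(page.get("Blocks", []))
        return blocks

    raise TimeoutError(f"Textract job {job_id} did not finish in time")
```

A function with a short timeout would normally start the job and return, leaving a second function to collect the results when the completion notification arrives.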
This Guidance provides an event-driven architecture that uses serverless services and managed services, helping you optimize costs without under-provisioning or over-provisioning. AWS serverless services let you pay only for what you use, and there are no minimum fees or mandatory service usage. Additionally, Storage Gateway does not charge for data transferred in from gateway applications.
AWS serverless services provision the minimum number of resources needed and follow an event-driven architecture to accomplish tasks, helping you minimize your overall resource consumption and maximize your resource utilization, leading to improved sustainability. Additionally, Amazon Textract only runs when handling API requests, and DataSync lets you move your data to sustainable and cost-efficient cloud storage, reducing the overall footprint of your data storage.
A detailed guide is provided for you to experiment with and use within your AWS account. It walks through each stage of building the Guidance, including deployment, usage, and cleanup.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.