AWS Partner Network (APN) Blog

Automating the Know Your Customer Process Using Capgemini’s AI-Powered Solution on AWS

By Arpna Gupta, Solution Director – Capgemini
By Kashik Sharma, Data Scientist – Capgemini
By Prateek Agrawal, Partner Solutions Architect – AWS
By Vikas Nambiar, Partner Solutions Architect – AWS


Financial institutions use “Know Your Customer” (KYC) as the process of identifying and verifying a customer’s identity prior to providing any financial service. The KYC process is driven by regulatory requirements as a measure against financial crimes and illicit activities.

As part of this process, financial organizations are required to validate the authenticity of the documents a customer furnishes, and to establish the identity of the user requesting a new account, a loan, or another financial service offered by the organization.

The KYC process entails multiple challenges in gathering and validating customers’ financial and personal data, including name, address, date of birth, and identity cards to support this data. These challenges include data quality and accuracy issues, and most importantly the ability to ensure documents submitted are not fraudulent in nature.

Fake or fraudulent documentation has been one of the top types of fraud experienced by the banking industry over the last few years, and it is a growing concern for financial institutions. While fraudsters exploit technology to circumvent financial systems, technology can also be an effective tool to identify and detect these risks.

In this post, we will illustrate how Capgemini’s “Know Your Customer” validation solution helps institutions automate the validation of KYC identity documents, the extraction of the information they contain, and forgery detection using artificial intelligence (AI) with Amazon SageMaker and Amazon Textract.

The solution gives customers an extensible, automated way to validate government-issued documents, while reducing the overall time and manual intervention required to onboard customers.

Capgemini is an AWS Premier Tier Services Partner and Managed Services Provider (MSP) that is at the forefront of innovation to address a breadth of client opportunities across cloud, digital, and platforms.

KYC Validator Solution Architecture

Equipped with deep learning capabilities, Capgemini’s KYC validation solution can detect forged documents, providing a mechanism to validate a customer’s true identity and mitigating risk of financial crimes. Leveraging Amazon Textract, it enables users to extract information present in the KYC identity documents.

Capgemini’s KYC validator provides the following capabilities:

  • Document pre-processing
  • Extraction of information from KYC document
  • KYC document classification
  • Forgery detection
  • REST API interfaces for integration to end users and systems


Figure 1 – Capgemini KYC solution capabilities.

The KYC validator solution is implemented using AWS services, following the high-level architecture shown below.


Figure 2 – Capgemini KYC validator solution architecture.

The diagram in Figure 2 shows how an end user or system can invoke the solution through its REST API endpoint on Amazon API Gateway. When the API is invoked, Amazon API Gateway triggers an AWS Lambda function that executes the subsequent stages of information extraction and fake detection.
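
As a rough illustration of this flow, the sketch below shows how a Lambda handler behind an API Gateway proxy integration might decode the submitted document and run the two stages concurrently. The stage functions, the `document` field name, and the response shape are assumptions made for this example, not details of Capgemini’s implementation.

```python
import base64
import json
from concurrent.futures import ThreadPoolExecutor


def extract_information(document: bytes) -> dict:
    """Placeholder for the information extraction stage (hypothetical)."""
    return {"document_type": "unknown", "fields": {}}


def detect_forgery(document: bytes) -> dict:
    """Placeholder for the fake detection stage (hypothetical)."""
    return {"is_forged": False}


def decode_document(event: dict) -> bytes:
    """Read a base64-encoded document from an API Gateway proxy event.

    Assumes the request body is JSON with a hypothetical "document" field.
    """
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)
    return base64.b64decode(payload["document"])


def lambda_handler(event, context, extract=extract_information, detect=detect_forgery):
    try:
        document = decode_document(event)
    except (KeyError, TypeError, ValueError):
        # Missing field, wrong payload type, malformed JSON, or bad base64.
        return {"statusCode": 400, "body": json.dumps({"error": "invalid request"})}

    # Submit both stages at once, then wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        extraction = pool.submit(extract, document)
        detection = pool.submit(detect, document)
        result = {
            "extraction": extraction.result(),
            "detection": detection.result(),
        }

    # API Gateway proxy integrations expect a status code and a string body.
    return {"statusCode": 200, "body": json.dumps(result)}
```

Passing the stage functions in as parameters keeps the handler easy to test locally. In a deployed setup, each stage could equally be a separate Lambda function or a call to a model endpoint.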

Information Extraction Stage

The information extraction stage extracts all of the pertinent text from the identity document and then classifies the document. The steps involved are shown in Figure 3.


Figure 3 – Steps within the information extraction stage.

The following AWS services are used in the stage’s implementation:

  • Amazon API Gateway: A managed service for creating, maintaining, and securing RESTful and WebSocket APIs. Capgemini’s solution uses the service to host the REST APIs that act as entry points for customer systems or users that need a KYC validation performed.
  • AWS Lambda: This provides the solution with a serverless, event-driven compute capability to run the following code modules as Lambda functions. These functions are invoked by Amazon API Gateway when a user or system invokes the API hosted on API Gateway.
    • KYC validator function: When invoked by Amazon API Gateway, the KYC validator Lambda function passes the identity document to the information extraction and fake detection stages in parallel.
    • Text extraction function: This Lambda function uses Amazon Textract to extract text data from the given identity document. The extracted text is then passed as input to the document classification function for further processing.
    • Document classification function: This Lambda function applies regex-based logic to the extracted text to classify the document into one of the predefined KYC identity document types. For Indian identity documents, the KYC validator supports driving license, voter ID card, permanent account number (PAN) card, and Aadhaar card. The document classification is then passed to the next module, which extracts and displays the details relevant to that document type.
    • Information extraction function: Identity cards differ in format and content depending on the issuing country and state. Based on the document type, this Lambda function uses templates and position-based heuristics to extract the relevant information from the document, including for document types whose field placement varies widely by issuing state. In India, for example, driving licenses issued by two different states can have entirely different templates and formats (as shown in Figure 4 below). Figure 4 also shows different formats of PAN cards from two different states.
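
The text extraction and document classification functions described above can be sketched roughly as follows. The Textract call (`detect_document_text`) and the `Blocks` / `LINE` response structure are part of the real API; the regular expressions and the document type labels are illustrative assumptions and would need to be far more thorough in practice.

```python
import re
from typing import List

# Illustrative patterns only (assumed for this sketch); real documents
# need broader, well-tested rules.
DOCUMENT_PATTERNS = {
    "pan_card": re.compile(r"income\s+tax\s+department|permanent\s+account\s+number", re.I),
    "aadhaar_card": re.compile(r"aadhaar|unique\s+identification\s+authority", re.I),
    "voter_id": re.compile(r"election\s+commission\s+of\s+india", re.I),
    "driving_license": re.compile(r"driving\s+licen[cs]e", re.I),
}


def extract_lines(textract_client, image_bytes: bytes) -> List[str]:
    """Call Textract's DetectDocumentText and return the detected lines."""
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def classify_document(lines: List[str]) -> str:
    """Return the first document type whose pattern appears in the text."""
    text = " ".join(lines)
    for doc_type, pattern in DOCUMENT_PATTERNS.items():
        if pattern.search(text):
            return doc_type
    return "unknown"
```

In a Lambda function, `textract_client` would typically be created with `boto3.client("textract")`. Taking it as a parameter lets the logic be exercised with a stub.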


Figure 4 – Driving license cards (top) and PAN cards (bottom) from different states in India.
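
Template- and position-based extraction of the kind described above might look like the sketch below. The templates, field names, and the "value on the line after its label" heuristic are hypothetical. The only document facts relied on are that a PAN is five letters, four digits, and one letter, and that an Aadhaar number has 12 digits, usually printed in groups of four.

```python
import re
from typing import Dict, List, Optional, Pattern

PAN_NUMBER = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAAR_NUMBER = re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")
DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# Hypothetical templates: a field is located either by a pattern anywhere
# in the text, or by its position relative to a label line.
TEMPLATES = {
    "pan_card": {
        "id_number": {"pattern": PAN_NUMBER},
        "date_of_birth": {"pattern": DATE},
        "name": {"after_label": re.compile(r"^name$", re.I)},
    },
    "aadhaar_card": {
        "id_number": {"pattern": AADHAAR_NUMBER},
        "date_of_birth": {"pattern": DATE},
    },
}


def first_match(pattern: Pattern, lines: List[str]) -> Optional[str]:
    """Return the first substring matching the pattern in any line."""
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(0)
    return None


def line_after(label: Pattern, lines: List[str]) -> Optional[str]:
    """Position heuristic: the value is the line directly below its label."""
    for i, line in enumerate(lines[:-1]):
        if label.search(line.strip()):
            return lines[i + 1].strip()
    return None


def extract_fields(doc_type: str, lines: List[str]) -> Dict[str, Optional[str]]:
    """Apply the template for doc_type; unknown types yield no fields."""
    fields = {}
    for name, rule in TEMPLATES.get(doc_type, {}).items():
        if "pattern" in rule:
            fields[name] = first_match(rule["pattern"], lines)
        else:
            fields[name] = line_after(rule["after_label"], lines)
    return fields
```

Keeping one template per document type (and, where layouts diverge, per issuing state) means new formats can be supported by adding data rather than changing the extraction logic.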

Fake Detection Stage

To assess the authenticity of identity documents, a deep learning model is trained on Amazon SageMaker using a dataset of pre-processed documents. At inference time, each submitted document is pre-processed and the model classifies it as forged or original.

Once trained, the model is exposed through an Amazon SageMaker endpoint, which systems call to predict the authenticity of the supplied document.
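
Calling such an endpoint from a Lambda function could look like the following minimal sketch. `invoke_endpoint` is the real SageMaker Runtime operation; the content type, the `forged_probability` field in the model output, and the 0.5 threshold are assumptions, since the post does not describe the model's input and output format.

```python
import json


def predict_forgery(runtime_client, endpoint_name: str, image_bytes: bytes,
                    threshold: float = 0.5) -> dict:
    """Send a pre-processed document image to a SageMaker endpoint."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",  # assumed input format
        Body=image_bytes,
    )
    # Assumed model output: JSON such as {"forged_probability": 0.93}.
    payload = json.loads(response["Body"].read())
    score = float(payload["forged_probability"])
    label = "forged" if score >= threshold else "original"
    return {"label": label, "score": score}
```

The `runtime_client` argument would typically be created with `boto3.client("sagemaker-runtime")`.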

The results from the previous two stages are combined and returned in the response to the requesting entity, which completes the KYC validation process.


Figure 5 – Response sample of the original document.

Conclusion

In this post, you have seen how Capgemini’s “Know Your Customer” (KYC) solution validates artefacts by extracting information from identification documents and then validating their originality using AWS managed and serverless services.

The solution leverages a serverless, event-based architecture that supports parallel executions, ensuring cost is kept to a minimum while reducing the time required to onboard customers at scale.

Furthermore, the KYC solution is built to be consumed in batch and real-time mode, with capabilities to expand its coverage to different country and regional requirements. It can also be extended to make data available with customer content to fill web forms.

Reach out to Capgemini and learn how you can implement this solution and enable KYC documentation authenticity checks to reduce complexity and manual errors associated with traditional “Know Your Customer” validation processes.


