AWS Machine Learning Blog

Automate document validation and fraud detection in the mortgage underwriting process using AWS AI services: Part 1

In this three-part series, we present a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case.

This solution rides on a more significant global wave of increasing mortgage fraud, which is worsening as more people present fraudulent proofs to qualify for loans. Data suggests high-risk and suspected fraudulent mortgage activity is on the rise, noting a 52% increase in suspected fraudulent mortgage applications since 2013. (Source: Equifax)

Part 1 of this series discusses the most common challenges associated with the manual lending process. We provide concrete guidance on addressing this issue with AWS AI and ML services to detect document tampering, identify and categorize patterns for fraudulent scenarios, and integrate with business-defined rules while minimizing human expertise for fraud detection.

In Part 2, we demonstrate how to train and host a computer vision model for tampering detection and localization on Amazon SageMaker. In Part 3, we show how to automate detecting fraud in mortgage documents with an ML model and business-defined rules using Amazon Fraud Detector.

Challenges associated with the manual lending process

Organizations in the lending and mortgage industry receive thousands of applications, ranging from new mortgage applications to refinancing an existing mortgage. These documents are increasingly susceptible to document fraud as fraudsters attempt to exploit the system and qualify for mortgages in several illegal ways. To be eligible for a mortgage, the applicant must provide the lender with documents verifying their employment, assets, and debts. Changing borrowing rules and interest rates can drastically alter an applicant’s credit affordability. Fraudsters range from blundering novices to near-perfect masters when creating fraudulent loan application documents. Fraudulent paperwork includes but is not limited to altering or falsifying paystubs, inflating information about income, misrepresenting job status, and forging letters of employment and other key mortgage underwriting documents. These fraud attempts can be challenging for mortgage lenders to capture.

The significant challenges associated with the manual lending process include but not limited to:

  • The necessity for a borrower to visit the branch
  • Operational overhead
  • Data entry errors
  • Automation and time to resolution

Finally, the underwriting process, or the analysis of creditworthiness and the loan decision, takes additional time if done manually. Again, the manual consumer lending process has some advantages, such as approving a loan that requires human judgment. The solution will provide automation and risk mitigation in mortgage underwriting which will help reduce time and cost as compared to the manual process.

Solution overview

Document validation is a critical type of input for mortgage fraud decisions. Understanding the risk profile of the supporting mortgage documents and driving insights from this data can significantly improve risk decisions and is central to any underwriter’s fraud management strategy.

The following diagram represents each stage in a mortgage document fraud detection pipeline. We walk through each of these stages and how they aid towards underwriting accuracy (initiated with capturing documents to classify and extract required content), detecting tampered documents, and finally using an ML model to detect potential fraud classified according to business-driven rules.

Conceptual Architecture

In the following sections, we discuss the stages of the process in detail.

Document classification

With intelligent document processing (IDP), we can automatically process financial documents using AWS AI services such as Amazon Textract and Amazon Comprehend.

Additionally, we can use the Amazon Textract Analyze Lending API in processing mortgage documents. Analyze Lending uses pre-trained ML models to automatically extract, classify, and validate information in mortgage-related documents with high speed and accuracy while reducing human error. As depicted in the following figure, Analyze Lending receives a loan document and then splits it into pages, classifying them according to the type of document. The document pages are then automatically routed to Amazon Textract text processing operations for accurate data extraction and analysis.

Amazon Textract Analyze Lending API

The Analyze Lending API offers the following benefits:

  • Automated end-to-end processing of mortgage packages
  • Pre-trained ML models across a variety of document types in a mortgage application package
  • Ability to scale on demand and reduce reliance on human reviewers
  • Improved decision-making and significantly lower operating costs

Tampering detection

We use a computer vision model deployed on SageMaker for our end-to-end image forgery detection and localization solution, which means it takes a testing image as input and predicts pixel-level forgery likelihood as output.

Most research studies focus on four image forgery techniques: splicing, copy-move, removal, and enhancement. Both splicing and copy-move involve adding image content to the target (forged) image. However, the added content is obtained from a different image in splicing. In copy-move, it’s from the target image. Removal, or inpainting, removes a selected image region (for example, hiding an object) and fills the space with new pixel values estimated from the background. Finally, image enhancement is a vast collection of local manipulations, such as sharpening, brightness, and adjustment.

Depending on the characteristics of the forgery, different clues can be used as the foundation for detection and localization. These clues include JPEG compression artifacts, edge inconsistencies, noise patterns, color consistency, visual similarity, EXIF consistency, and camera model. However, real-life forgeries are more complex and often use a sequence of manipulations to hide the forgery. Most existing methods focus on image-level detection, whether or not an image is forged, and not on localizing or highlighting a forged area of the document image to aid the underwriter in making informed decisions.

We walk through the implementation details of training and hosting a computer vision model for tampering detection and localization on SageMaker in Part 2 of this series. The conceptual CNN-based architecture of the model is depicted in the following diagram. The model extracts image manipulation trace features for a testing image and identifies anomalous regions by assessing how different a local feature is from its reference features. It detects forged pixels by identifying local anomalous features as a predicted mask of the testing image.

Computer vision tampering detection

Fraud detection

We use Amazon Fraud Detector, a fully managed AI service, to automate the generation, evaluation, and detection of fraudulent activities. This is achieved by generating fraud predictions based on data extracted from the mortgage documents against ML fraud models trained with the customer’s historical (fraud) data. You can use the prediction to trigger business rules in relation to underwriting decisions.

Amazon Fraud Detector Process

Defining the fraud prediction logic involves the following components:

  • Event types – Define the structure of the event
  • Models – Define the algorithm and data requirements for predicting fraud
  • Variables – Represent a data element associated with the fraud detection event
  • Rules – Tell Amazon Fraud Detector how to interpret the variable values during fraud prediction
  • Outcomes – The results generated from a fraud prediction
  • Detector version – Contains fraud prediction logic for the fraud detection event

The following diagram illustrates the architecture of this component.

Amazon Fraud Detector Detailed Process

After you deploy your model, you may evaluate its performance scores and metrics based on the prediction explanations. This helps identify top risk indicators and analyze fraud patterns across the data.

Third-party validation

We integrate the solution with third-party providers (via API) to validate the extracted information from the documents, such as personal and employment information. This is particularly useful to cross-validate details in addition to document tampering detection and fraud detection based on the historical pattern of applications.

The following architecture diagram illustrates a batch-oriented fraud detection pipeline in mortgage application processing using various AWS services.

Fraud Detection End to End Architecture

The workflow includes the following steps:

  1. The user uploads the scanned documents into Amazon Simple Storage Service (Amazon S3).
  2. The upload triggers an AWS Lambda function (Invoke Document Analysis) that calls the Amazon Textract API for text extraction. Additionally, we can use the Amazon Textract Analyze Lending API to automatically extract, classify, and validate information.
  3. On completion of text extraction, a notification is sent via Amazon Simple Notification Service (Amazon SNS).
  4. The notification triggers a Lambda function (Get Document Analysis), which invokes Amazon Comprehend for custom document classification.
  5. Document analysis results that have a low confidence score to are routed to human reviewers using Amazon Augmented AI (Amazon A2I).
  6. Output from Amazon Textract and Amazon Comprehend is aggregated using a Lambda function (Analyze & Classify Document).
  7. A SageMaker inference endpoint is called for a fraud prediction mask of the input documents.
  8. Amazon Fraud Detector is called for a fraud prediction score using the data extracted from the mortgage documents.
  9. The results from Amazon Fraud Detector and the SageMaker inference endpoint are aggregated into the loan origination application.
  10. The status of the document processing job is tracked in Amazon DynamoDB.


This post walked through an automated solution to detect document tampering and fraud in the mortgage underwriting process using Amazon Fraud Detector and other Amazon AI and ML services. This solution allows you to detect fraudulent attempts closer to the time of fraud occurrence and helps underwriters with an effective decision-making process. The flexibility of the implementation allows you to define business-driven rules to classify and capture the fraudulent attempts customized to specific business needs.

In Part 2 of this series, we provide the implementation details for detecting document tampering using SageMaker. In Part 3, we demonstrate how to implement the solution on Amazon Fraud Detector.

About the authors

Anup Ravindranath
is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada working with Financial Services organizations. He helps customers to transform their businesses and innovate on cloud.

Vinnie Saini is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada. She has been helping Financial Services customers transform on cloud, with AI and ML driven solutions laid on strong foundational pillars of Architectural Excellence.