AWS News Blog
Classifying and Extracting Mortgage Loan Data with Amazon Textract
Mortgage loan applications, at least in the United States, comprise around 500 or more pages of diverse documents. In order for applications to be reviewed, all these documents need to be classified, and the data on each form extracted. This isn’t as easy as it might sound! Besides different data structures in each document, the same data element may have different names on different documents—for example, SSN, or Social Security Number, or Tax ID. These three all refer to the same data.
Today, a new Analyze Lending API, for analyzing and classifying the documents contained in mortgage loan application packages, and extracting the data they contain, is available for Amazon Textract. The new API was created in response to requests from major lenders in the industry to help them process applications faster and reduce errors, which improves the end-customer experience and lower operating costs.
Until now, classification and extraction of data from mortgage loan application packages have been human-intensive tasks, although some lenders have used a hybrid approach, using technology such as Amazon Textract. However, customers told us that they needed even greater workflow automation to speed up automation efforts and reduce human error so that their staff could focus on higher-value tasks.
The new API also provides additional value-add services. It’s able to perform signature detection in terms of which documents have signatures and which don’t. It also provides a summary output of the documents in a mortgage application package and identifies select important documents such as bank statements and 1003 forms that would normally be present. The new workflow is powered by a collection of machine learning (ML) models. When a mortgage application package is uploaded, the workflow classifies the documents in the package before routing them to the right ML model, based on their classification, for data extraction.
Test-Driving the New Analyze Lending API
Although the new API is intended for lenders to incorporate into their business process workflows and applications, anyone can actually try it using the Amazon Textract console. This enables you to see how the API classifies documents and extracts the data elements they contain. If you’re interested in the application of machine learning and artificial intelligence, this may be of interest to you even if you’re not processing a mortgage application package.
I start by opening the Amazon Textract console, expanding Analyze Lending in the navigation panel, and then selecting Demo. The demo console immediately analyzes a set of synthetic test files, and outputs the results shown below (you can always restart the demo by clicking the Reset demo button). I get a summary of the analysis results and a document carousel for each of the documents in the package. The demo console also has a handy help panel containing (among other things) a summary of terminology related to the documents.
In the carousel I can see that one document has a signature badge, indicating a signature was detected, but, before taking a look, if I scroll the carousel, I can see that one document was labeled Unclassified:
Returning in the carousel to the document marked with a signature badge, I can see that it’s a check. Signature detection is usually a highly manual process so having the document analysis automatically mark when one is detected is a significant time saver.
Payslips are another document type that customers have told us can be difficult and time-consuming to handle. Selecting the detected payslip in the carousel shows the data extracted from it.
The synthetic data in the demo console provides an overview of how the API is able to analyze, classify, and extract data from the documents in a mortgage application package. However, I can also use my own documents. To do this in the demo console, I click the Upload package button and provide a single file, up to 5 MB, and 10 pages maximum for testing in the demo console, containing documents to analyze. Outside use in the demo console, the API supports documents with up to 3000 pages.
The results, for both the synthetic and your own data, can be downloaded by clicking the Download results button. This provides a .zip file containing four files—two are the raw JSON responses from the API. The other two are CSV-format files containing the summary (
summary.csv) and the extracted data (
extractions.csv). Both files are in key-value format.
The contents of the summary data file, for the synthetic test data, are below.
'DocumentName,'FirstPage,'LastPage "'Payslips","'1","'1" "'Checks","'2","'2" "'Identity document","'3","'3" "'1099 DIV","'4","'4" "'Bank statement","'5","'5" "'W2","'6","'6" "'Unclassified","'7","'7"
Below is an example of the data contained in the extractions file.
'key,'value "'PAY PERIOD END DATE","'7/18/2008" "'PAY DATE","'7/25/2008" "'BORROWER NAME","'JOHN STILES" "'BORROWER ADDRESS","'101 MAIN STREET ANYTOWN, USA 12345" "'COMPANY NAME","'ANY COMPANY CORP." "'COMPANY ADDRESS","'475 ANY AVENUE ANYTOWN, USA 10101" "'FEDERAL FILING STATUS","'Married" "'STATE FILING STATUS","'2" "'CURRENT GROSS PAY","'$ 452.43" "'YTD GROSS PAY","'23,526.80" "'CURRENT NET PAY","'$ 291.90" "'REGULAR HOURLY RATE","'10.00" "'HOLIDAY HOURLY RATE","'10.00" "'WARNINGS MESSAGES NOTES","'EFFECTIVE THIS PAY PERIOD YOUR REGULAR HOURLY RATE HAS BEEN CHANGED FROM $8.00 TO $10.00 PER HOUR." "'CURRENT REGULAR PAY","'320" ...
Try the Analyze Lending API Yourself
The new API is available in all Regions where Amazon Textract is offered but do be aware that the workflow and processing are focused on US-centric documents. Pricing for the new API is the same as for the existing table, form, and queries. You can find more details on the service pricing page. Finally, you can read more on the API in the Developer Guide.
Explore the new Analyze Lending API for yourself today in the Amazon Textract console!— Steve