Extracting, analyzing, and interpreting information from Medicaid forms with AWS

What if paper forms could be processed at the same speed as digital forms? What if their contents could be automatically entered in the same database as the digital forms? Medicaid agencies could analyze data in near real time and drive actionable insights on a single dashboard. Whether a provider submits claims electronically or on paper, the claim could be adjudicated using the same process and analyzed the same way, saving both time and money.

By using artificial intelligence (AI) and machine learning (ML) services from Amazon Web Services (AWS), Medicaid agencies can create this streamlined solution. AWS makes adopting solutions like this even simpler by providing no-code and low-code serverless services.

In this walkthrough, learn how to extract, analyze, and interpret relevant information from paper-based Medicaid claims forms. This solution incorporates a dashboard that provides visuals and insights from data extracted from both electronic and paper-based forms. As providers upload paper-based forms into the system, the dashboard reflects these inputs in near real-time.

This walkthrough uses a Medicaid claims form as the example form to be processed, but this solution is applicable to any paper-based form, including Medicaid eligibility forms. By using the solution in this walkthrough, users can automate the processing of those paper-based forms, reducing the time spent on clerical work like data entry.

Solution overview: Extracting, analyzing, and interpreting information from Medicaid forms with AWS

This walkthrough outlines how to build a solution to extract, analyze, and interpret information input into paper-based forms using AWS. This solution is built around two key services: Amazon Textract, which supports the data extraction, and Amazon QuickSight, which enables the data interpretation.

Amazon Textract is an ML service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables, reading and processing a wide range of documents without any manual intervention. Users can automate document processing and act on the extracted information.
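
For illustration only, the following minimal Python (boto3) sketch calls the synchronous Amazon Textract API against a single scanned page. The bucket and object names are placeholders and are not part of the deployed solution, which uses the asynchronous API described below.

```python
import boto3

textract = boto3.client("textract")

# Placeholder bucket and object names -- replace with your own.
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "my-example-bucket", "Name": "claim-form.png"}},
    FeatureTypes=["FORMS", "TABLES"],
)

# The response is a list of Block objects; form key-value pairs and table
# cells appear as KEY_VALUE_SET and CELL block types, plain text as LINE.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
```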

Amazon QuickSight is a serverless, cloud-based business intelligence (BI) service that brings data insights to teams and end-users through ML-powered dashboards and data visualizations, which can be accessed via QuickSight or embedded in apps and portals that users access.

Figure 1 illustrates the architecture of the solution, described in more detail below:

Figure 1. The architecture overview of AWS services involved in the workflow, described in detail in the following section.

  1. A case worker uploads a scanned image of the paper-based claim to an Amazon Simple Storage Service (Amazon S3) bucket. This triggers an Amazon Simple Notification Service (Amazon SNS) message and invokes an AWS Lambda function.
  2. The Lambda function calls Amazon Textract asynchronously. This is especially useful for processing multi-page documents, which may take time, and avoids timeout issues. The response to the asynchronous operation is a job identifier (JobId). The Amazon Textract asynchronous API is configured to send a notification to Amazon SNS when the documents finish processing.
  3. The notification to Amazon SNS triggers a Lambda function that uses the JobId to retrieve the processed content from Amazon Textract and saves the results to another Amazon S3 bucket (see the sketch after this list).
  4. AWS Glue catalogs the data and Amazon Athena queries the extracted data.
  5. Amazon QuickSight displays a dashboard with the visuals and insights derived from the extracted data in near real time.
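
As a rough illustration of steps 2 and 3, the sketch below shows how a Lambda handler might start an asynchronous Amazon Textract job with an SNS notification channel, and how a second handler might page through the results by JobId. The ARNs, handler names, and direct S3 trigger are assumptions made for brevity; the actual Lambda code is provisioned by the CloudFormation templates in the next section.

```python
import json
import boto3

textract = boto3.client("textract")

# Placeholder ARNs -- the real topic and role are created by the CloudFormation templates.
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:textract-demo-topic"
TEXTRACT_ROLE_ARN = "arn:aws:iam::111122223333:role/textract-demo-sns-role"


def start_analysis_handler(event, context):
    """Step 2: start an asynchronous Textract job for the uploaded claim.

    Assumes a direct S3 trigger for simplicity; in the deployed solution the
    notification flows through Amazon SNS, so the event parsing differs slightly.
    """
    record = event["Records"][0]["s3"]
    response = textract.start_document_analysis(
        DocumentLocation={
            "S3Object": {
                "Bucket": record["bucket"]["name"],
                "Name": record["object"]["key"],
            }
        },
        FeatureTypes=["FORMS", "TABLES"],
        NotificationChannel={
            "SNSTopicArn": SNS_TOPIC_ARN,
            "RoleArn": TEXTRACT_ROLE_ARN,
        },
    )
    return {"JobId": response["JobId"]}


def get_results_handler(event, context):
    """Step 3: triggered by the SNS completion message; collect all result pages."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    job_id = message["JobId"]

    blocks, next_token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract.get_document_analysis(**kwargs)
        blocks.extend(page["Blocks"])
        next_token = page.get("NextToken")
        if not next_token:
            break
    # In the deployed solution, the blocks are written to the output S3 bucket.
    return {"BlockCount": len(blocks)}
```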

Solution deployment

For this walkthrough, you must have the following prerequisites:

1. An AWS account

2. AWS console access to launch AWS CloudFormation templates

3. Permission to create and modify AWS Lambda functions

4. An Amazon QuickSight account (set up before running the AWS CloudFormation templates)

5. Understanding of the following AWS services:

a. AWS Lake Formation

b. AWS Lambda

c. AWS Identity and Access Management (IAM)

d. Amazon QuickSight

e. Amazon Textract

f. Amazon DynamoDB

g. AWS Glue

h. Amazon Athena

Section 1: Deploy AWS CloudFormation templates

Set up the customer-managed key

First, create a customer managed key with the following steps:

  1. Navigate to the GitHub repo for the template.
  2. Choose “Raw.” Open the context (right-click) menu and then choose Save as. Save the file on your local machine as “textract-demo-cmk.yaml.”
  3. In the AWS Management Console, navigate to the CloudFormation dashboard and create a CloudFormation stack using the saved file (or create the stack from the command line, as in the sketch after this list).
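
If you prefer the command line to the console, a roughly equivalent boto3 sketch is shown below. The stack name is arbitrary, and the IAM capability is included only in case the template creates roles.

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Use the template file saved in step 2.
with open("textract-demo-cmk.yaml") as template:
    cloudformation.create_stack(
        StackName="textract-demo-cmk",
        TemplateBody=template.read(),
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )

# Block until the stack reaches CREATE_COMPLETE before moving on.
waiter = cloudformation.get_waiter("stack_create_complete")
waiter.wait(StackName="textract-demo-cmk")
```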

Figure 2. AWS CloudFormation stack successfully created. This creates an AWS KMS customer managed key.

  4. Once the stack creation process completes, check the customer managed key in the AWS Key Management Service (AWS KMS) dashboard within the console.

Figure 3. The customer managed key appears within the Customer managed keys section within the AWS KMS dashboard.

Set up data ingestion and data extraction workflow

Use the following CloudFormation template to launch the necessary AWS Lambda functions, Amazon DynamoDB tables, Amazon S3 buckets, Amazon SNS topic, and required AWS IAM roles to create the data ingestion and extraction workflow:

  1. Navigate to the GitHub repo for the template.
  2. Choose “Raw.” Open the context (right-click) menu and then choose Save as. Save the file on your local machine as “textract-demo-process-documents.yaml.”
  3. Navigate to the CloudFormation dashboard within the AWS console and create a CloudFormation stack using the saved file.

Figure 4. CloudFormation stack successfully created. This creates Lambda functions, Amazon DynamoDB tables, Amazon S3 buckets, Amazon SNS topics, and required IAM roles.

  4. Once the stack creation process completes, you can check the Lambda functions, Amazon S3 buckets, Amazon DynamoDB tables, Amazon SNS topics, and required IAM roles in the respective service dashboards.

Figure 5. AWS Lambda functions listed in the Lambda console.

Figure 6. Amazon S3 buckets listed in the Amazon S3 console.

Figure 7. Amazon DynamoDB tables listed in the Amazon DynamoDB console.

Figure 8. Amazon SNS topics listed in the Amazon SNS console.

Set up data analytics

Then, create the data analytics workflow using the following CloudFormation template, which launches the AWS Glue resources and Athena sample queries.

  1. Navigate to the GitHub repo for the template.
  2. Choose “Raw.” Open the context (right-click) menu and then choose Save as. Save the file on your local machine as “textract-demo-analytics.yaml”.
  3. Navigate to the CloudFormation dashboard within the AWS console and create a CloudFormation stack using the saved file.

Figure 9. AWS CloudFormation stack successfully created. This creates AWS Glue resources and Athena sample queries.

  4. Once the stack creation process completes, you can check the AWS Glue and Athena resources in their respective service dashboards.

Figure 10. Database in AWS Glue console.

Figure 11. Crawlers in AWS Glue console.

Figure 12. Tables in AWS Glue console.

Figure 13. Tables in Athena console.

  5. In the Athena dashboard, select "MyCustomWorkGroup" from the workgroup drop-down menu in the top navigation bar. Then, choose the Saved Queries tab and choose the query ID, which loads the query in the Editor tab.

Figure 14. Saved Queries in Athena console.

Figure 15. Saved Queries in Athena Query Editor tab.

  6. Select Run to launch the query. This creates a new view in Athena. (The sketch after Figure 16 shows how to run the same query programmatically.)

Figure 16. Successful view creation in Athena console.
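
For reference, the same view query can also be launched programmatically with boto3. The query string below is a placeholder; the actual CREATE VIEW statement comes from the saved query in the workgroup.

```python
import boto3

athena = boto3.client("athena")

# Placeholder query -- the real view definition is loaded from the saved query.
# Assumes the workgroup already defines a query result location.
response = athena.start_query_execution(
    QueryString="SELECT * FROM documenttextract.document_view LIMIT 10",
    WorkGroup="MyCustomWorkGroup",
)
print("Started query:", response["QueryExecutionId"])
```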

Section 2: Set up visualization workflow with Amazon QuickSight

Now, set up Amazon QuickSight to create the visualization and interpretation component of the solution.

  1. Set up QuickSight:

a. Navigate to the Amazon QuickSight dashboard in the console and sign up for QuickSight if it is not already enabled.

b. You only need a standard license to support this solution.

c. On the Create your QuickSight account page, make sure that access and auto-discovery is enabled for AWS Identity and Access Management (IAM), Amazon Athena, and the Amazon S3 bucket for Athena query results.

2. Create a dataset in QuickSight:

a. In the top navigation bar, select the person-shaped Account icon and verify that your QuickSight console is connected to the same AWS Region as your deployment.

b. Select Datasets in the left navigation area, and choose New dataset.

c. Select Athena and then enter a data source name; choose "MyCustomWorkGroup" from the Athena workgroup drop-down options. Select Create data source.

d. In the “Choose your table” window, select AwsDataCatalog as the catalog and select documenttextract as the database. Select document_view as the table that you can visualize. Choose Select.

Figure 17. Datasource for dataset selection screen in QuickSight.

e. In the “Finish dataset creation” window, choose Directly query your data and then select Visualize.

Figure 18. Choosing to directly query the data source.

  3. Create an analysis sheet:

a. Navigate to the QuickSight main page by selecting QuickSight in the top navigation bar.

b. In the left navigation bar, select Analysis.

c. Select New analysis.

d. Choose “document_view” as the dataset.

e. Select Use in analysis.

f. In the “New sheet” window, select Create.

Figure 19. Initiating analysis board creation.

  4. Prepare the dashboard:

a. The columns from the dataset are now available in the left-side pane. These include columns such as "charge" and "city," which match the fields extracted from the paper form.

Figure 20. Analysis sheet in QuickSight.

b. Drag and drop the fields as needed into the sheet to prepare the dashboard. You can customize this dashboard based on your needs.

c. For this walkthrough, set up a graph that shows total charges by date of service. Drag and drop "dateofservice" to the Y axis and "totalcharge" to Value.

Figure 21. Charges by date of service.

d. Publish the analysis as a dashboard by selecting the share button in the top navigation bar and then selecting Publish dashboard.

Figure 22. The share button to publish the dashboard.

Figure 23. Publish the analysis as a dashboard.

Section 3: Test the solution

Now, test what you deployed.

  1. Upload the document that you want to use to the textract-demo-docupload-us-east-1-XXXXXXXXXX Amazon S3 bucket. This walkthrough uses this sample Medicaid claim document. Refer to how to upload an object to an Amazon S3 bucket if needed, or upload from a script as in the sketch after Figure 24.

Figure 24. The document uploaded to the Amazon S3 bucket.
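
As referenced in step 1, a minimal boto3 sketch for uploading the scanned claim from a script is shown below. The local file name is a placeholder, and the bucket suffix is specific to your deployment.

```python
import boto3

s3 = boto3.client("s3")

# Replace the Xs with the suffix generated for your deployment,
# and the local file name with your scanned claim document.
bucket = "textract-demo-docupload-us-east-1-XXXXXXXXXX"
s3.upload_file("sample-medicaid-claim.png", bucket, "sample-medicaid-claim.png")
```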

  2. The Lambda function (textract-demo-functions-TextractCallback) reads the response data, key values, and table values from Amazon Textract and uploads them to an Amazon S3 bucket (textract-demo-output-XXXXXXXXXXXX).
  3. Verify the Amazon Textract output data (e.g., keyvalues, tablesvalues) for the documents in the Amazon S3 bucket (textract-demo-output-XXXXXXXXXXXX). Refer to how to access an object from the Amazon S3 console if needed, or list the bucket contents from a script as in the sketch after Figure 26.

Figure 25. Output data from Amazon Textract is uploaded to the Amazon S3 bucket: Key values.

Figure 26. Output data from Amazon Textract is uploaded to the Amazon S3 bucket: Table values.
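
As referenced in step 3, you can also confirm the Textract output from a script by listing the output bucket. The bucket suffix below is specific to your deployment.

```python
import boto3

s3 = boto3.client("s3")

# Replace the Xs with the suffix generated for your deployment.
output_bucket = "textract-demo-output-XXXXXXXXXXXX"
listing = s3.list_objects_v2(Bucket=output_bucket)
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])
```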

  4. Log in to the Athena console and launch the view you created to see the data extracted from the document.

Figure 27. Query results from Amazon Athena.

  5. Refresh the dashboard to see data in near real time.

Figure 28. QuickSight dashboard featuring the insights from the extracted data from the document.

Clean up

This is an entirely serverless solution, so costs are directly related to usage. The persistent cost of maintaining this application when it is not in use comes from the user data stored in Amazon DynamoDB and Amazon S3.

To avoid incurring future charges, delete the resources you created in this walkthrough. To deprovision all resources:

  1. In the console, navigate to the AWS CloudFormation dashboard and select each stack created in the deployment section of this post.
  2. Choose Delete near the top of the stack details page to begin deleting the stack. This process takes approximately five minutes.

NOTE: This will NOT delete the contents of the Amazon DynamoDB table and Amazon S3 bucket. Amazon S3 bucket lifecycle configuration can be used to optimize the cost of storage, as in the sketch below.
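
As one example of the lifecycle optimization mentioned in the note, the following sketch expires objects in the output bucket after 30 days. The bucket name and retention period are assumptions, not settings created by the templates.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: expire extracted output after 30 days to limit storage cost.
s3.put_bucket_lifecycle_configuration(
    Bucket="textract-demo-output-XXXXXXXXXXXX",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-extracted-output",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```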

Conclusion

This walkthrough outlines how to create a solution that extracts relevant information from a paper-based Medicaid claims form, then analyzes and helps interpret the extracted information through a dashboard in near real time.

Using a solution like this one, a state Medicaid agency can run analytics on data extracted from paper-based forms in near real time, without manual intervention to key in the data. All the data from the forms drives the insights on a dashboard, so the Medicaid agency can adjudicate and analyze claims in the same way regardless of whether the provider submitted them electronically or on paper. This can save hours of staff time, as data is extracted automatically and staff don't have to manually enter data for each form.

AWS is ready to support Medicaid agencies as they transform to meet Public Health Emergency (PHE) unwinding efforts. Contact the AWS Public Sector Team to learn more.

Health and human services (HHS) agencies across the country are using the power of AWS to unlock their data, improve citizen experience, and deliver better outcomes. See more Health and Human Services Cloud Resources here. Learn more about how governments use AWS to innovate for their constituents, design engaging constituent experiences, and more at the AWS Cloud for State and Local Governments hub.


Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.


Vignesh Srinivasan

Vignesh is a senior solutions architect at Amazon Web Services (AWS). He previously worked with the Centers for Medicare & Medicaid Services (CMS), including helping to implement the Federal Health Exchange as part of the Affordable Care Act. He was also on the team that fixed healthcare.gov and successfully migrated the system to AWS. He has a master’s degree from Rochester Institute of Technology and an MBA from the University of Maryland.

Venkata Kampana

Venkata is a senior solutions architect in the Amazon Web Services (AWS) Health and Human Services team and is based in Sacramento, Calif. In this role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.