AWS Big Data Blog

How healthcare organizations can analyze and create insights using price transparency data

In recent years, there has been a growing emphasis on price transparency in the healthcare industry. Under the Transparency in Coverage (TCR) rule, hospitals and payors to publish their pricing data in a machine-readable format. With this move, patients can compare prices between different hospitals and make informed healthcare decisions. For more information, refer to Delivering Consumer-friendly Healthcare Transparency in Coverage On AWS.

The data in the machine-readable files can provide valuable insights to understand the true cost of healthcare services and compare prices and quality across hospitals. The availability of machine-readable files opens up new possibilities for data analytics, allowing organizations to analyze large amounts of pricing data. Using machine learning (ML) and data visualization tools, these datasets can be transformed into actionable insights that can inform decision-making.

In this post, we explain how healthcare organizations can use AWS services to ingest, analyze, and generate insights from the price transparency data created by hospitals. We use sample data from three different hospitals, analyze the data, and create comparative trends and insights from the data.

Solution overview

As part of the Centers for Medicare and Medicaid Services (CMS) mandate, all hospitals now have their machine-readable file containing the pricing data. As hospitals generate this data, they can use their organization data or ingest data from other hospitals to derive analytics and competitive comparison. This comparison can help hospitals do the following:

  • Derive a price baseline for all medical services and perform gap analysis
  • Analyze pricing trends and identify services where competitors don’t participate
  • Evaluate and identify the services where cost difference is above a specific threshold

The size of the machine-readable files from hospitals is smaller than those generated by the payors. This is due to the complexity of the JSON structure, contracts, and the risk evaluation process on the payor side. Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The analysis of the machine-readable files from payors requires advanced computational capabilities due to the complexity and the interrelationship in the JSON file.


As a prerequisite, evaluate the hospitals for which the pricing analysis will be performed and identify the machine-readable files for analysis. Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Create separate folders for each hospital inside the S3 bucket.

Architecture overview

The architecture uses AWS serverless technology for the implementation. The serverless architecture features auto scaling, high availability, and a pay-as-you-go billing model to increase agility and optimize costs. The architecture approach is split into a data intake layer, a data analysis layer, and a data visualization layer.

The architecture contains three independent stages:

  • File ingestion – Hospitals negotiate their contract and pricing with the payors one time a year with periodical revisions on a quarterly or monthly basis. The data ingestion process copies the machine-readable files from the hospitals, validates the data, and keeps the validated files available for analysis.
  • Data analysis – In this stage, the files are transformed using AWS Glue and stored in the AWS Glue Data Catalog. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, ML, and application development. Then you can use Amazon Athena V3 to query the tables in the Data Catalog.
  • Data visualizationAmazon QuickSight is a cloud-powered business analytics service that makes it straightforward to build visualizations, perform ad hoc analysis, and quickly get business insights from the pricing data. This stage uses QuickSight to visually analyze the data in the machine-readable file using Athena queries.

File ingestion

The file ingestion process works as defined in the following figure. The architecture uses AWS Lambda, a serverless, event-driven compute service that lets you run code without provisioning or managing servers.

TCR Intake Architecture

The following flow defines the process to ingest and analyze the data:

  1. Copy the machine-readable files from the hospitals into the respective raw data S3 bucket.
  2. The file upload to the S3 bucket triggers an S3 event, which invokes a format Lambda function.
  3. The Lambda function triggers a notification when it identifies issues in the file.
  4. The Lambda function ingests the file, transforms the data, and stores the clean file in a new clean data S3 bucket.

Organizations can create new Lambda functions depending on the difference in the file formats.

Data analysis

The file intake and data analysis processes are independent of each other. Whereas the file intake happens on a scheduled or periodical basis, the data analysis happens regularly based on the business operation needs. The architecture for the data analysis is shown in the following figure.

TCR Data Analysis

This stage uses an AWS Glue crawler, the AWS Glue Data Catalog, and Athena v3 to analyze the data from the machine-readable files.

  1. An AWS Glue crawler scans the clean data in the S3 bucket and creates or updates the tables in the AWS Glue Data Catalog. The crawler can run on demand or on a schedule, and can crawl multiple machine-readable files in a single run.
  2. The Data Catalog now contains references to the machine-readable data. The Data Catalog contains the table definition, which contains metadata about the data in the machine-readable file. The tables are written to a database, which acts as a container.
  3. Use the Data Catalog and transform the hospital price transparency data.
  4. When the data is available in the Data Catalog, you can develop the analytics query using Athena. Athena is a serverless, interactive analytics service that provides a simplified, flexible way to analyze petabytes of data using SQL queries.
  5. Any failure during the process will be captured in the Amazon CloudWatch logs, which can be used for troubleshooting and analysis. The Data Catalog needs to be refreshed only when there is a change in the machine-readable file structure or a new machine-readable file is uploaded to the clean S3 bucket. When the crawler runs periodically, it automatically identifies the changes and updates the Data Catalog.

Data visualization

When the data analysis is complete and queries are developed using Athena, we can visually analyze the results and gain insights using QuickSight. As shown in the following figure, once the data ingestion and data analysis are complete, the queries are built using Athena.

TCR Visualization

In this stage, we use QuickSight to create datasets using the Athena queries, build visualizations, and deploy dashboards for visual analysis and insights.

Create a QuickSight dataset

Complete the following steps to create a QuickSight dataset:

  1. On the QuickSight console, choose Manage data.
  2. On the Datasets page, choose New data set.
  3. In the Create a Data Set page, choose the connection profile icon for the existing Athena data source that you want to use.
  4. Choose Create data set.
  5. On the Choose your table page, choose Use custom SQL and enter the Athena query.

After the dataset is created, you can add visualizations and analyze the data from the machine-readable file. With the QuickSight dashboard, organizations can easily perform price comparisons across different hospitals, identify high-cost services, and find other price outliers. In addition, you can use ML in QuickSight to gain ML-driven insights, detect pricing anomalies, and create forecasts based on historical files.

The following figure shows an illustrative QuickSight dashboard with insights comparing the machine-readable files from three different hospitals. With these visuals, you compare the pricing data across hospitals, create price benchmarks, determine cost-effective hospitals, and identify opportunities for competitive advantage.
Quicksight dashboard

Performance, operational, and cost considerations

The solution recommends QuickSight Enterprise for visualization and insights. For QuickSight dashboards, the Athena query results can be stored within the SPICE database for better performance.

The approach uses Athena V3, which offers performance improvements, reliability enhancements, and newer features. Using the Athena query result reuse feature enables caching and query result reuse. When multiple identical queries are run with the query result reuse option, repeat queries run up to five times faster, giving you increased productivity for interactive data analysis. Because you don’t scan the data, you get improved performance at a lower cost.


Hospitals create the machine-readable files on a monthly basis. This approach uses a serverless architecture that keeps the cost low and takes away the challenge of maintenance overhead. The analysis can begin with the machine-readable files for a few hospitals, and they can add new hospitals as they scale. The following example helps understand the cost for different hospital based on the data size:

  • A typical hospital with 100 GB storage/month, querying 20 GB data with 2 authors and 5 readers, costs around $2,500/year

AWS offers you a pay-as-you-go approach for pricing for the vast majority of our cloud services. With AWS you pay only for the individual services you need, for as long as you use them, and without requiring long-term contracts or complex licensing.

TCR Monthly cost


This post illustrated how to collect and analyze hospital-created price transparency data and generate insights using AWS services. This type of analysis and the visualizations provide the framework to analyze the machine-readable files. Hospitals, payors, brokers, underwriters, and other healthcare stakeholders can use this architecture to analyze and draw insights from pricing data published by hospitals of their choice. Our AWS teams can assist you to identify the correct strategy by offering thought leadership and prescriptive technical support for price transparency analysis.

Contact your AWS account team for more help on design and to explore private pricing. If you don’t have a contact with AWS yet, please reach out to be connected with an AWS representative.

About the Authors

Gokhul Srinivasan is a Senior Partner Solutions Architect leading AWS Healthcare and Life Sciences (HCLS) Global Startup Partners. Gokhul has over 19 years of Healthcare experience helping organizations with digital transformation, platform modernization, and deliver business outcomes.

Laks Sundararajan is a seasoned Enterprise Architect helping companies reset, transform and modernize their IT, digital, cloud, data and insight strategies. A proven leader with significant expertise around Generative AI, Digital, Cloud and Data/Analytics Transformation, Laks is a Sr. Solutions Architect with Healthcare and Life Sciences (HCLS).

Anil Chinnam Anil Chinnam is a Solutions Architect in the Digital Native Business Segment at Amazon Web Services(AWS). He enjoys working with customers to understand their challenges and solve them by creating innovative solutions using AWS services. Outside of work, Anil enjoys being a father, swimming and traveling.