AWS Machine Learning Blog

How Cortica used Amazon HealthLake to get deeper insights to improve patient care

This is a guest post by Ernesto DiMarino, who is Head of Enterprise Applications and Data at Cortica.

Cortica is on a mission to revolutionize healthcare for children with autism and other neurodevelopmental differences. Cortica was founded to fix the fragmented journey families typically navigate while seeking diagnoses and therapies for their children. To bring their vision to life, Cortica seamlessly blends neurology, research-based therapies, and technology into comprehensive care programs for the children they serve. This coordinated approach leads to best-in-class member satisfaction and empowers families to achieve long-lasting, transformative results.

In this post, we discuss how Cortica used Amazon HealthLake to create a data analytics hub to store a patient’s medical history, medication history, behavioral assessments, lab reports, and genetic variants in the Fast Healthcare Interoperability Resources (FHIR) standard format. They create a composite view of the patient’s health journey and apply advanced analytics to understand trends in patient progression with Cortica’s treatment approach.

Unifying our data

The challenges faced by Cortica’s team of three data engineers are no different from those of any other healthcare enterprise. Cortica has two EHRs (electronic health records), six specialties, 420 providers, and a few home-grown data-capturing questionnaires, one of which has 842 questions. With multiple vendors providing systems and data solutions, Cortica finds itself in an all-too-common situation in the healthcare industry: volumes of data in multiple formats, and complexity in matching patients from system to system. Cortica looked to solve some of this complexity by setting up a data lake on AWS.

Cortica’s team imported all data into an Amazon Simple Storage Service (Amazon S3) data lake using Python extract, transform, and load (ETL) jobs orchestrated with Apache Airflow. Additionally, they maintain a Kimball model star schema for financial and operational analytics. The data lake holds a respectable 16 terabytes of data. Most files delivered to the data lake are in CSV, PDF, and Parquet formats, all of which the data lake is well equipped to manage. However, the data lake solution is only part of the story. To truly derive value from the data, Cortica needed a standardized model to deal with the healthcare languages and vocabularies, as well as the many industry standardized code sets.

Deriving deeper value from data

Although the data lake and star schema data model work well for some financial and operational analytics, the Cortica team found that it was challenging to dive deeper into the data for meaningful insights to share with patients and their caregivers. Some of the questions they wanted to answer included:

  • How can Cortica present to caregivers a composite view of the patient’s healthcare journey with Cortica?
  • How can they show that patients are getting better over time using data from standardized assessments, medical notes, and goals tracking data?
  • How do patients with specific comorbidities progress to their goals compared to patients without comorbidities?
  • Can Cortica show how patients have better outcomes through the unique multispecialty approach?
  • Can Cortica partner with industry researchers sharing de-identified data to help further treatment for autism and other neurodevelopmental differences?

Before implementing the data lake, staff would read through PDFs, Excel, and vendor systems to create Excel files to capture the data points of interest. Interrogating the EHRs and manually transcribing documents and notes into a large spreadsheet for analysis would take months of work. This process wasn’t scalable and made it difficult to reproduce analytics and insights.

With the data lake, Cortica found that they still lacked the ability to quickly access the volumes of data, as well as join the various datasets together to perform complex analyses. Because healthcare data is so driven by medical terminologies, they needed a solution that could help unify data from different healthcare fields to present a clear patient journey through the different specialties Cortica offers. To quickly derive this deeper value, they chose Amazon HealthLake to help provide this added layer of meaning to the data.

Cortica’s solution

Cortica adopted Amazon HealthLake to help standardize data and scale insights. By implementing the FHIR standard, Amazon HealthLake provided a faster way to standardize data, with a far less complex maintenance pathway. They were able to quickly load a basic set of resources into Amazon HealthLake. This allowed the team to create a proof of concept (POC) for starting to answer the bigger set of questions focused on their patient population. In a 3-day process, they were able to develop a POC for understanding their patients’ journey from the perspective of their behavior therapy goals and medical comorbidities. Two of those 3 days were spent fine-tuning the queries in Amazon QuickSight and making visualizations of the data. From a data-to-visual perspective, the data was ready in hours, not months. The following diagram illustrates their pipeline.

Getting to insights faster

Cortica was able to quickly see, across their patient population, the length of time it took for patients to attain their goals. The team could then break it down by age-phenotype (a designated age grouping for comparing Cortica’s population). They saw the grouping of patients that were meeting their goals in 4-, 6-, 9-, and 12-month intervals. They further sliced and diced the visuals by layering in a variety of categories such as goal status. Until now, staff and clinicians were only able to look at an individual’s data rather than population data, so they couldn’t get these types of insights. The manual clinician chart abstraction process for this goal analysis would have taken months to complete.
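To make the interval grouping concrete, the following is a minimal Python sketch of how goals could be bucketed into the 4-, 6-, 9-, and 12-month intervals and counted per age grouping. It is not Cortica’s actual query: the function names, the tuple layout, and the group labels are hypothetical, and the sketch assumes each goal has a known start date and met date.

```python
from collections import Counter
from datetime import date

# Interval upper bounds in months, taken from the post (4, 6, 9, 12).
INTERVALS = (4, 6, 9, 12)


def months_elapsed(start: date, end: date) -> int:
    """Months from start to end, with any partial month rounded up."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day > start.day:
        months += 1
    return months


def attainment_bucket(start: date, met: date) -> str:
    """Label the smallest interval within which the goal was met."""
    elapsed = months_elapsed(start, met)
    for limit in INTERVALS:
        if elapsed <= limit:
            return f"<= {limit} months"
    return f"> {INTERVALS[-1]} months"


def bucket_counts(goals):
    """Count goals per (age_group, bucket).

    `goals` is an iterable of (age_group, start_date, met_date) tuples;
    this layout is a hypothetical stand-in for the real goal-tracking data.
    """
    return Counter(
        (group, attainment_bucket(start, met)) for group, start, met in goals
    )
```

A call such as `bucket_counts([("group A", date(2023, 1, 15), date(2023, 4, 10))])` returns a counter keyed by age grouping and interval label, which is the shape a bar chart per age grouping would need.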

The following charts show two visualizations of their goals.

As a fast follow to this POC, Cortica wanted to see how medical comorbidities impacted goal attainment. The specific medical comorbidities of interest were seizures, constipation, and sleep disturbances, because these are commonly found within this patient population. Data for the FHIR Condition resource was loaded into the pipeline, and the team was able to identify cohorts by comorbidities and quickly visualize the information. In a few minutes, they had visualizations running, and could see the impact that these comorbidities had on goal attainment (see the following example diagram).
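As an illustration of cohort identification, the following Python sketch groups patients by comorbidity from a list of FHIR Condition resources. It relies on the standard `Condition.subject.reference` and `Condition.code.coding` elements. The post does not say which code system or codes Cortica matched on, so the ICD-10-CM prefixes and the function names below are assumptions for illustration only.

```python
from collections import defaultdict

# FHIR system URI for ICD-10-CM codes.
ICD10_CM = "http://hl7.org/fhir/sid/icd-10-cm"

# Assumed, illustrative mapping of comorbidity to ICD-10-CM code prefixes.
# A real cohort definition would come from clinical review.
COMORBIDITY_PREFIXES = {
    "seizures": ("G40", "R56"),
    "constipation": ("K59.0",),
    "sleep disturbances": ("G47",),
}


def comorbidities_for(condition: dict) -> set:
    """Return the comorbidity names matched by one Condition resource."""
    found = set()
    for coding in condition.get("code", {}).get("coding", []):
        if coding.get("system") != ICD10_CM:
            continue
        code = coding.get("code", "")
        for name, prefixes in COMORBIDITY_PREFIXES.items():
            if code.startswith(prefixes):
                found.add(name)
    return found


def build_cohorts(conditions) -> dict:
    """Map each comorbidity name to the set of patient references that have it."""
    cohorts = defaultdict(set)
    for condition in conditions:
        if condition.get("resourceType") != "Condition":
            continue
        patient = condition.get("subject", {}).get("reference")
        if not patient:
            continue
        for name in comorbidities_for(condition):
            cohorts[name].add(patient)
    return dict(cohorts)
```

Joining these cohorts to the goal data by patient reference is what lets goal attainment be compared for patients with and without a given comorbidity.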

With Amazon HealthLake, the Cortica team can spend more time analyzing and understanding data patterns rather than figuring out where data comes from, formatting it, and joining it into a usable state. The value that Amazon brings to any healthcare organization is the ability to quickly move data, conform data, and start visualizing. With FHIR as the data model, a small non-technical team can ask an organization’s integration team to provide a flat file feed of the FHIR resources of interest to an S3 bucket. This data is easily loaded into Amazon HealthLake data stores via the AWS Command Line Interface (AWS CLI), AWS Management Console, or API. Next, they can make the data queryable with SQL through Amazon Athena and use QuickSight for visualization. Both clinical and non-technical teams can use this solution to start deriving value from data locked within medical records systems.
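For teams that prefer the API route, the following Python sketch shows what starting an import from S3 might look like with the AWS SDK for Python (Boto3) and its HealthLake `start_fhir_import_job` operation. The data store ID, bucket paths, KMS key, and IAM role are placeholders you would supply, and the helper names are hypothetical. This is a sketch under those assumptions rather than Cortica’s actual loading code.

```python
import uuid


def build_import_request(datastore_id, input_s3_uri, output_s3_uri,
                         kms_key_id, role_arn, job_name="fhir-import"):
    """Build the keyword arguments for HealthLake's start_fhir_import_job."""
    for uri in (input_s3_uri, output_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return {
        "JobName": job_name,
        "DatastoreId": datastore_id,
        # S3 location of the FHIR resource files to import.
        "InputDataConfig": {"S3Uri": input_s3_uri},
        # S3 location where HealthLake writes the job's output logs.
        "JobOutputDataConfig": {
            "S3Configuration": {"S3Uri": output_s3_uri, "KmsKeyId": kms_key_id}
        },
        # IAM role that HealthLake assumes to read and write the buckets.
        "DataAccessRoleArn": role_arn,
        # Idempotency token so a retried call does not start a second job.
        "ClientToken": str(uuid.uuid4()),
    }


def start_import(request: dict) -> str:
    """Start the import job and return its job ID (requires AWS credentials)."""
    import boto3  # imported here so the request builder works without the SDK

    client = boto3.client("healthlake")
    response = client.start_fhir_import_job(**request)
    return response["JobId"]
```

Keeping the request builder separate from the call makes it possible to review or log exactly what will be sent before any job is started.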


The tools available through AWS such as Amazon HealthLake, Amazon SageMaker, Athena, Amazon Comprehend Medical, and QuickSight are speeding up the ability to learn more about the patient population Cortica cares for in an actionable timeframe. Analysis that took months to complete can now be completed in days, and in some cases hours. AWS tools can enhance analysis by adding layers of richness to the data in minutes and provide different views of the same analysis. Furthermore, analysis that required chart abstraction can now be done through automated data pipelines, processing hundreds or thousands of documents to derive insights from notes, which were previously only available to a few clinicians.

Cortica is entering a new era of data analytics, one in which the data pipeline and process don’t require data engineers and technical staff. What is unknown can be learned from the data, ultimately bringing Cortica closer to its mission of revolutionizing the pediatric healthcare space and empowering families to achieve long-lasting, transformative results.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

About the Authors

Ernesto DiMarino is Head of Enterprise Applications and Data at Cortica.

Satadal Bhattacharjee is Sr Manager, Product Management, who leads products at AWS Health AI. He works backwards from healthcare customers to help them make sense of their data by developing services such as Amazon HealthLake and Amazon Comprehend Medical.