
Overview
This data pack is a synthetic healthcare dataset comprising 2,500 patients, each with 1 year of longitudinal history.
Using a Monte Carlo simulation technique, each synthetic record is modeled to emulate clinically relevant treatment scenarios. During generation, synthetic patients progress through a series of healthcare encounters. These encounters and their events are used to generate the dataset, which consists of healthcare data messages across several HL7 messaging standards.
These records are highly realistic and even include gaps in information, as a patient record in a real-world healthcare ecosystem would.
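To make the Monte Carlo idea concrete, here is a minimal Python sketch of a patient moving through encounter states month by month. The state names, transition probabilities, and the `simulate_patient` function are all hypothetical and chosen only for illustration; they are not the vendor's actual model.

```python
import random

# Hypothetical states and transition probabilities, for illustration only.
# Each state maps to a list of (next_state, probability) pairs.
TRANSITIONS = {
    "healthy": [("healthy", 0.85), ("outpatient_visit", 0.12), ("admitted", 0.03)],
    "outpatient_visit": [("healthy", 0.90), ("admitted", 0.10)],
    "admitted": [("discharged", 1.0)],
    "discharged": [("healthy", 1.0)],
}


def simulate_patient(months=12, seed=None):
    """Randomly walk one synthetic patient through the states, one step per month."""
    rng = random.Random(seed)  # a seeded generator makes a run reproducible
    state = "healthy"
    history = []
    for _ in range(months):
        next_states, weights = zip(*TRANSITIONS[state])
        # Draw the next state according to the transition weights.
        state = rng.choices(next_states, weights=weights, k=1)[0]
        history.append(state)
    return history


if __name__ == "__main__":
    print(simulate_patient(months=12, seed=7))
```

In a generator of this kind, each visited state would typically trigger one or more output messages, for example an admission state producing an ADT message.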
Common conditions that may be contained in the data pack:
• Appendicitis
• Cancer
• Covid
• Deep venous thrombosis
• Diabetes
• Food Insecurity (SDOH)
• Hypertension
• Osteoporosis
• Pregnancy
• Pulmonary embolism
• STIs
• Zika
HL7 message standards that may be included in the output for a synthetic patient record:
ADT
Admission, Discharge, Transfer (ADT) messages are used to communicate patient demographics, visit information and patient state at a healthcare facility.
This synthetic data set contains synthetic Admit, Discharge and Transfer (ADT) messages in HL7 messaging standard version 2.6 with the following event types:
• A01 - Admit / visit notification
• A03 - Discharge/end visit
• A04 - Register a patient
Message count in data set:
3,719 total A01
3,719 total A03
16,006 total A04
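As a rough illustration of how ADT messages might be tallied by event type, the Python sketch below reads the message type (MSH-9) and version (MSH-12) from the MSH segment of an HL7 v2 message. The sample messages are invented for this example and are not taken from the data pack; the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical sample messages; segment content is illustrative only.
# HL7 v2 segments are separated by carriage returns and fields by "|".
SAMPLE_MESSAGES = [
    "MSH|^~\\&|SENDER|FACILITY|RECEIVER|FACILITY|20240101120000||ADT^A01^ADT_A01|MSG0001|P|2.6\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "MSH|^~\\&|SENDER|FACILITY|RECEIVER|FACILITY|20240103090000||ADT^A03^ADT_A03|MSG0002|P|2.6\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "MSH|^~\\&|SENDER|FACILITY|RECEIVER|FACILITY|20240201080000||ADT^A04^ADT_A01|MSG0003|P|2.6\r"
    "PID|1||67890^^^HOSP^MR||ROE^JOHN",
]


def message_type(message):
    """Return (message code, trigger event, version) from the MSH segment."""
    msh = message.split("\r")[0]
    fields = msh.split("|")
    if fields[0] != "MSH":
        raise ValueError("message does not start with an MSH segment")
    # MSH-1 is the field separator itself, so after splitting on "|"
    # MSH-9 sits at index 8 and MSH-12 at index 11.
    components = fields[8].split("^")
    code = components[0]
    trigger = components[1] if len(components) > 1 else ""
    version = fields[11] if len(fields) > 11 else ""
    return code, trigger, version


def count_triggers(messages):
    """Count messages per trigger event (for example A01, A03, A04)."""
    return Counter(message_type(m)[1] for m in messages)


if __name__ == "__main__":
    print(message_type(SAMPLE_MESSAGES[0]))
    print(count_triggers(SAMPLE_MESSAGES))
```

For production use, a dedicated HL7 v2 parsing library would handle escape sequences and custom delimiters, which this sketch ignores.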
VXU
Unsolicited Vaccination Record Update (VXU) messages are used to send and receive patients' vaccination information.
This synthetic data set contains synthetic Unsolicited Vaccination Record Update (VXU) messages in HL7 messaging standard version 2.5.1 with an event type of V04.
Message count in data set: 5,652
ORU
ORU messages are unsolicited transmissions of observations. They are designed to contain information about a patient's clinical observations and are used to transmit patients' laboratory results to other systems.
This synthetic data set contains synthetic Observation Result (ORU) messages in HL7 messaging standard version 2.5.1 with an event type of R01.
Message count in data set: 2,349
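The following Python sketch shows one way laboratory results could be pulled from the OBX segments of an ORU^R01 message. The sample message and the `observations` function are hypothetical and written for illustration; real messages in the data pack may carry more fields and repetitions.

```python
# Hypothetical sample ORU^R01 message; values are illustrative only.
SAMPLE_ORU = "\r".join([
    "MSH|^~\\&|LAB|FACILITY|EHR|FACILITY|20240101120000||ORU^R01^ORU_R01|MSG0002|P|2.5.1",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "OBR|1|||24323-8^Comprehensive metabolic panel^LN",
    "OBX|1|NM|2345-7^Glucose^LN||95|mg/dL|70-99|N|||F",
    "OBX|2|NM|2160-0^Creatinine^LN||0.9|mg/dL|0.6-1.3|N|||F",
])


def observations(message):
    """Return one dict per OBX segment with its code, name, value and units."""
    results = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX" or len(fields) < 7:
            continue
        # OBX-3 is the observation identifier (code^text^coding system),
        # OBX-5 the observed value, OBX-6 the units.
        identifier = fields[3].split("^")
        results.append({
            "code": identifier[0],
            "name": identifier[1] if len(identifier) > 1 else "",
            "value": fields[5],
            "units": fields[6],
        })
    return results


if __name__ == "__main__":
    for obs in observations(SAMPLE_ORU):
        print(obs)
```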
CCD
Continuity of Care Documents (CCDs) are an XML-based markup standard built using HL7 Clinical Document Architecture (CDA) elements. CCDs carry summary information about the patient within the broader context of the personal health record.
Current data fields in CCDs:
• Patient demographics
• Medications
• Allergies
• Encounters
• Problem lists
• Diagnosis
• Lab results
• Immunization
• Social History
Message count in data set: 19,725
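Because CCDs are XML, the sections listed above can be discovered with a standard XML parser. The Python sketch below lists section titles from a heavily simplified, hypothetical CCD; real documents contain many more required elements, and the exact section titles in the data pack are an assumption here.

```python
import xml.etree.ElementTree as ET

# CDA documents use the HL7 v3 XML namespace.
NS = {"cda": "urn:hl7-org:v3"}

# Hypothetical, heavily simplified CCD used only for illustration.
SAMPLE_CCD = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody>
      <component><section><title>Medications</title></section></component>
      <component><section><title>Allergies</title></section></component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def section_titles(xml_text):
    """Return the titles of the top-level sections in a CDA structured body."""
    root = ET.fromstring(xml_text)
    path = "cda:component/cda:structuredBody/cda:component/cda:section"
    titles = []
    for section in root.findall(path, NS):
        title = section.find("cda:title", NS)
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


if __name__ == "__main__":
    print(section_titles(SAMPLE_CCD))
```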
FHIR
Fast Healthcare Interoperability Resources (FHIR) is a modern standard for exchanging healthcare information electronically. FHIR leverages web standards like HTTP, RESTful APIs, and JSON to enable seamless communication between different healthcare systems, applications, and devices.
FHIR facilitates interoperability by providing a framework for representing and exchanging clinical data in a structured, standardized format, allowing healthcare stakeholders to easily access and share patient information across disparate systems, leading to improved care coordination, streamlined workflows, and enhanced patient outcomes.
The synthetic patient records generated by our statistical population health model generator are output as FHIR R4 resources in JSON format.
Message count in data set: 10,038 bundles containing an average of 100 FHIR resources in each bundle (~1,003,800 total FHIR resources)
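A FHIR bundle is a JSON object whose `entry` array wraps individual resources, so counting resources per type takes only a few lines. The Python sketch below uses a tiny hypothetical bundle (not taken from the data pack) and also reproduces the estimated total from the counts above.

```python
import json
from collections import Counter

# Hypothetical minimal bundle used only for illustration.
SAMPLE_BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Encounter", "id": "e1"}},
    {"resource": {"resourceType": "Encounter", "id": "e2"}},
    {"resource": {"resourceType": "Observation", "id": "o1"}}
  ]
}
"""

# Estimate from the listing: 10,038 bundles at an average of 100 resources each.
ESTIMATED_TOTAL_RESOURCES = 10_038 * 100


def resource_type_counts(bundle):
    """Count the resources in a bundle, grouped by resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


if __name__ == "__main__":
    bundle = json.loads(SAMPLE_BUNDLE_JSON)
    print(resource_type_counts(bundle))
    print(ESTIMATED_TOTAL_RESOURCES)
```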
Pricing
| Dimension | Description | Cost/12 months |
|---|---|---|
| Product Access | Dimension that grants access to the product for subscribers. | $5,999.00 |
Vendor refund policy
Refunds are not offered for this product.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Additional details
You will receive access to the following data sets.
| Data set name | Type | Historical revisions | Future revisions | Sensitive information | Data dictionaries | Data samples |
|---|---|---|---|---|---|---|
| Synthetic Data Pack: 2500 patients with 1 year of longitudinal data messages. | | Last 1 revisions | All future revisions | Not included | Not included | |

