Artificial Intelligence

Clario streamlines clinical trial software configurations using Amazon Bedrock

This post was co-written with Kim Nguyen and Shyam Banuprakash from Clario.

Clario is a leading provider of endpoint data solutions for systematic collection, management, and analysis of specific, predefined outcomes (endpoints) to evaluate a treatment’s safety and effectiveness in the clinical trials industry, generating high-quality clinical evidence for life sciences companies seeking to bring new therapies to patients. Since Clario’s founding more than 50 years ago, the company’s endpoint data solutions have supported clinical trials more than 30,000 times with over 700 regulatory approvals across more than 100 countries.

This post builds upon our previous post discussing how Clario developed an AI solution powered by Amazon Bedrock to accelerate clinical trials. Since then, Clario has further enhanced their AI capabilities, focusing on innovative solutions that streamline the generation of software configurations and artifacts for clinical trials while delivering high-quality clinical evidence.

Business challenge

In clinical trials, designing and customizing various software systems configurations to manage and optimize the different stages of a clinical trial efficiently is critical. These configurations can range from basic study setup to more advanced features like data collection customization and integration with other systems. Clario uses data from multiple sources to build specific software configurations for clinical trials. The traditional workflow involved manual extraction of necessary data from individual forms. These forms contained vital information about exams, visits, conditions, and interventions. Additionally, the process required the need to incorporate study-related information such as study plans, participation criteria, sponsors, collaborators, and standardized exam protocols from multiple enterprise data providers.

The manual nature of this process created several challenges:

  • Manual data extraction – Team members manually review PDF documents to extract structured data.
  • Transcript challenges – The manual transfer of data from source forms into configuration documents presents opportunities for improvement, particularly in reducing transcription inconsistencies and enhancing standardization.
  • Version control challenges – When studies required iterations or updates, maintaining consistency between documents and systems became increasingly complicated.
  • Fragmented information flow – Data existed in disconnected silos, including PDFs, study detail database records, and other standalone documents.
  • Software build timelines – The configuration process directly impacted the timeline for generating the necessary software builds.

For clinical trials where timing is essential and accuracy is non-negotiable, Clario has implemented rigorous quality control measures to minimize the risks associated with manual processes. While these efforts are substantial, they underscore a business challenge of ensuring precision and consistency across complex study configurations.

Solution overview

To address the business challenge, Clario developed a generative AI-powered solution that Clario refers to as the Clario’s Genie AI Service on AWS. This solution uses the capabilities of large language models (LLMs), specifically Anthropic’s Claude 3.7 Sonnet on Amazon Bedrock. The process is orchestrated using Amazon Elastic Container Service (Amazon ECS) to transform how Clario handled software configuration for clinical trials.

Clario’s approach uses a custom data parser using Amazon Bedrock to automatically structure information from PDF transmittal forms into validated tables. The Genie AI Service centralizes data from multiple sources, including transmittal forms, study details, standard exam protocols, and additional configuration parameters. An interactive review dashboard helps stakeholders verify AI-extracted information and make necessary corrections before finalizing the validated configuration. Post-validation, the system automatically generates a Software Configuration Specification (SCS) document as a comprehensive record of the software configuration. The process culminates with generative AI-powered XML generation, which is then released into Clario’s proprietary medical imaging software for study builds, creating an end-to-end solution that drastically reduces manual effort while improving accuracy in clinical trial software configurations.

The Genie AI Service architecture consists of several interconnected components that work together in a clear workflow sequence, as illustrated in the following diagram.

AWS architecture diagram showing clinical data workflow between corporate data center and AWS Cloud services

The workflow consists of the following steps:

  1. Initiate the study and collect data.
  2. Extract the data using Amazon Bedrock.
  3. Review and validate the AI-generated output.
  4. Generate essential documentation and code artifacts.

In the following sections, we discuss the workflow steps in more detail.

Study initiation and data collection

The workflow begins with gathering essential study information through multiple integrated steps:

  • Study code lookup – Users begin by entering a study code that uniquely identifies the clinical trial.
  • API integration with study database – The study lookup operation makes an API call to fetch study details such as such as study plan, participation criteria, sponsors, collaborators, and more from the study database, establishing the foundation for the configuration.
  • Transmittal form processing – Users upload transmittal forms containing study parameters such as information about exams, visits, conditions, and interventions to the Genie AI Service using the web UI through a secure AWS Direct Connect network.
  • Data structuring – The system organizes information into key categories:
    • Visit information (scheduling, procedures)
    • Exam specifications (protocols, requirements)
    • Study-specific custom fields (vitals, dosing information, and so on)

Data extraction

The solution uses Anthropic’s Claude Sonnet on Amazon Bedrock through API calls to perform the following actions:

  • Parse and extract structured data from transmittal forms
  • Identify key fields and tables within the documents
  • Organize the information into standardized formats
  • Apply domain-specific rules to properly categorize clinical trial visits
  • Extract and validate demographic fields while maintaining proper data types and formats
  • Handle specialized formatting rules for medical imaging parameters
  • Manage document-specific adaptations (such as different processing for phantom vs. subject scans)

Review and validation

The solution provides a comprehensive review interface for stakeholders to validate and refine the AI-generated configurations through the following steps:

  • Interactive review process – Reviewers access the Genie AI Service interface to perform the following actions:
    • Examine the AI-generated output
    • Make corrections or adjustments to the data as necessary
    • Add comments and highlight adjustments made as a feedback mechanism
    • Validate the configuration accuracy
  • Data storage – Reviewed and approved software configurations are saved to Clario’s Genie Database, creating a central, authoritative, auditable source of configuration data

Document and code generation

After the configuration data is validated, the solution automates the creation of essential documentation and code artifacts through a structured workflow:

  • SCS document creation – Reviewers access the Genie AI Service interface to finalize the software configurations by generating an SCS document using the validated data.
  • XML generation workflow – After the SCS document is finalized, the workflow completes the following steps:
    • The workflow fetches the configuration details from the Genie database.
    • The SCSXMLConverter, an internal microservice of the Genie AI Service, processes both SCS document and study configurations. This microservice invokes Anthropic’s Claude 3.7 Sonnet through API calls to generate a standardized SCS XML file.
    • Validation checks are performed on the generated XML to make sure it meets the structural and content requirements of Clario’s clinical study software.
    • The final XML output is created for use in the software build process with detailed logs of the conversion process.

Benefits and results

The solution enhanced data extraction quality while providing teams with a streamlined dashboard that accelerates the validation process.

By implementing consistent extraction logic and minimizing manual data entry, the solution has reduced potential transcription errors. Additionally, built-in validation safeguards now help identify potential issues early in the process, preventing problems from propagating downstream.

The solution has also transformed how teams collaborate. By providing centralized review capabilities and giving cross-functional teams access to the same solution, communication has become more transparent and efficient. The standardized workflows have created clearer channels for information sharing and decision-making.

From an operational perspective, the new approach offers greater scalability across studies while supporting iterations as studies evolve. This standardization has laid a strong foundation for expanding these capabilities to other operational areas within the organization.

Importantly, the solution maintains strong compliance and auditability through complete audit trails and reproducible processes. Key outcomes include:

  • Study configuration execution time has been reduced while improving overall quality
  • Teams can focus more on value-added activities like study design optimization.

Lessons learned

Clario’s journey to transform software configuration through generative AI has taught them valuable lessons that will inform future initiatives.

Generative AI implementation insights

The following key learnings emerged specifically around working with generative AI technology:

  • Prompt engineering is foundational – Few-shot prompting with domain knowledge is essential. The team discovered that providing detailed examples and explicit business rules in the prompts was necessary for success. Rather than simple instructions, Clario’s prompts include comprehensive business logic, edge case handling, and exact output formatting requirements to guide the AI’s understanding of clinical trial configurations.
  • Prompt engineering requires iteration – The quality of data extraction depends heavily on well-crafted prompts that encode domain expertise. Clario’s team spent significant time refining these prompts through multiple iterations and testing different approaches to capture complex business rules about visit sequencing, demographic requirements, and field formatting.
  • Human oversight within a validation workflow – Although generative AI dramatically accelerates extraction, human review remains necessary within a structured validation workflow. The Genie AI Service interface was specifically designed to highlight potential inconsistencies and provide convenient editing capabilities for reviewers to apply their expertise efficiently.

Integration challenges

Some important challenges surfaced during system integration:

  • Two-system synchronization – One of the biggest challenges has been verifying that changes made in the SCS documents are reflected in the solution. This bidirectional integration is still being refined.
  • System transition strategy – Moving from the proof-of-concept scripts to fully integrated solution functionality requires careful planning to avoid disruption.

Process adaptation

The team identified the following key factors for successful process change:

  • Phased Implementation – Clario rolled out the solution in stages, beginning with pilot teams who could validate functionality and serve as internal advocates to help teams transition from familiar document-centric workflows to the new solution.
  • Workflow optimization is iterative – The initial workflow design has evolved based on user feedback and real-world usage patterns.
  • Training requirements – Even with an intuitive interface, proper training makes sure users can take full advantage of the solution’s capabilities.

Technical considerations

Implementation revealed several important technical aspects to consider:

  • Data formatting variability – Transmittal forms vary significantly across different therapeutic areas (oncology, neurology, and so on) and even between studies within the same area. This variability creates challenges when the AI model encounters form structures or terminology it hasn’t seen before. Clario’s prompt engineering requires continuous iteration as they discover new patterns and edge cases in transmittal forms, creating a feedback loop where human experts identify missed or misinterpreted data points that inform future prompt refinements.
  • Performance optimization – Processing times for larger documents required optimization to maintain a smooth user experience.
  • Error handling robustness – Building resilient error handling into the generative AI processing flow was essential for production reliability.

Strategic insights

The project yielded valuable strategic lessons that will inform future initiatives:

  • Start with well-defined use cases – Beginning with the software configuration process gave Clario a concrete, high-value target for demonstrating generative AI benefits.
  • Build for extensibility – Designing the architecture with future expansion in mind has positioned them well for extending these capabilities to other areas.
  • Measure concrete outcomes – Tracking specific metrics like processing time and error rates has helped quantify the return on the generative AI investment.

These lessons have been invaluable for refining the current solution and informing the approach to future generative AI implementations across the organization.

Conclusion

The transformation of the software configuration process through generative AI represents more than just a technical achievement for Clario—it reflects a fundamental shift in how the company approaches data processing and knowledge work in clinical trials. By combining the pattern recognition and processing power of LLMs available in Amazon Bedrock with human expertise for validation and decision-making, Clario created a hybrid workflow that delivers the best of both worlds, orchestrated through Amazon ECS for reliable, scalable execution.

The success of this initiative demonstrates how generative AI on AWS is a practical tool that can deliver tangible benefits. By focusing on specific, well-defined processes with clear pain points, Clario has implemented the solution Genie AI Service powered by Amazon Bedrock in a way that creates immediate value while establishing a foundation for broader transformation.

For organizations considering similar transformations, the experience highlights the importance of starting with concrete use cases, building for human-AI collaboration and maintaining a focus on measurable business outcomes. With these principles in mind, generative AI can become a genuine catalyst for organizational evolution.


About the authors

Kim Nguyen serves as the Sr Director of Data Science at Clario, where he leads a team of data scientists in developing innovative AI/ML solutions for the healthcare and clinical trials industry. With over a decade of experience in clinical data management and analytics, Kim has established himself as an expert in transforming complex life sciences data into actionable insights that drive business outcomes. His career journey includes leadership roles at Clario and Gilead Sciences, where he consistently pioneered data automation and standardization initiatives across multiple functional teams. Kim holds a Master’s degree in Data Science and Engineering from UC San Diego and a Bachelor’s degree from the University of California, Berkeley, providing him with the technical foundation to excel in developing predictive models and data-driven strategies. Based in San Diego, California, he leverages his expertise to drive forward-thinking approaches to data science in the clinical research space.

Shyam Banuprakash serves as the Senior Vice President of Data Science and Delivery at Clario, where he leads complex analytics programs and develops innovative data solutions for the medical imaging sector. With nearly 12 years of progressive experience at Clario, he has demonstrated exceptional leadership in data-driven decision making and business process improvement. His expertise extends beyond his primary role, as he contributes his knowledge as an Advisory Board Member for both Modal and UC Irvine’s Customer Experience Program. Shyam holds a Master of Advanced Study in Data Science and Engineering from UC San Diego, complemented by specialized training from MIT in data science and big data analytics. His career exemplifies the powerful intersection of healthcare, technology, and data science, positioning him as a thought leader in leveraging analytics to transform clinical research and medical imaging.

Praveen Haranahalli is a Senior Solutions Architect at Amazon Web Services (AWS), where he architects secure, scalable cloud solutions and provides strategic guidance to diverse enterprise customers. With nearly two decades of IT experience including over a decade specializing in cloud computing, Praveen has delivered transformative implementations across multiple industries. As a trusted technical advisor, Praveen partners with customers to implement robust DevSecOps pipelines, establish comprehensive security guardrails, and develop innovative AI/ML solutions. He is passionate about solving complex business challenges through cutting-edge cloud architectures and empowering organizations to achieve successful digital transformations powered by artificial intelligence and machine learning.