Adaptive ML and CCS accelerate patient support with Meta Llama and AWS

Adaptive ML, a company that develops reinforcement learning software for enterprise AI, set out to help CCS, a leading provider of clinical solutions and home-delivered medical supplies, improve response times and reliability across its patient service operations for people managing chronic conditions. The Adaptive ML team tested an AI agent designed to execute real operational tasks across internal systems using Llama models from Meta on Amazon Web Services (AWS). A proof of concept demonstrated a faster and more efficient approach to enterprise AI support workflows that reduced response latency by more than 90 percent.

Improving patient support response times

Healthcare organizations that support patients with chronic conditions must respond quickly and reliably to requests about supplies, shipments, and care management, even during peak-demand surges. When patients rely on devices such as continuous glucose monitors or insulin pumps, delays in resolving issues can disrupt treatment and create operational strain for support teams. CCS provides patient support services that help individuals manage ongoing care needs, including the logistics and coordination required to maintain critical medical supplies. These interactions often require agents to access multiple internal systems to retrieve information, check orders, or guide patients through next steps. As support volumes fluctuate, AI is becoming an increasingly important tool for improving response times.

To address this challenge, CCS began exploring how AI agents could help streamline patient support workflows. The goal was to enable automated systems that could interact directly with enterprise tools, retrieve information from internal systems, and complete operational tasks on behalf of support teams. Achieving this requires more than conversational AI. Enterprise support agents must reliably execute function calls, allowing models to invoke APIs across systems such as CRMs, knowledge bases, and order management platforms. If those calls fail due to incorrect parameters or malformed outputs, the workflow stops. The request must then be handed off to a human agent, increasing wait times and operational overhead. Traditional approaches often rely on large proprietary models accessed through external APIs. While capable generalists, these models can introduce latency and limit control over training or optimization for specialized enterprise workflows. Adaptive ML partnered with CCS to explore a different approach: using reinforcement learning–optimized open models to power reliable AI agents that could operate quickly and efficiently in real-world healthcare support environments.
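In practice, function calling means the model emits structured JSON naming a tool and its arguments, which the application validates and dispatches; a malformed call is exactly the failure mode that forces a human handoff. A minimal sketch of that dispatch-or-handoff pattern (the tool names and payloads here are hypothetical illustrations, not CCS's actual systems):

```python
import json

# Hypothetical registry of enterprise tools the agent may invoke.
TOOLS = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def dispatch(model_output: str):
    """Parse a model's function call and route it to the matching tool.

    If the JSON is malformed or names an unknown tool, the request is
    handed off to a human agent rather than failing silently.
    """
    try:
        call = json.loads(model_output)
        tool = TOOLS[call["name"]]
        return tool(**call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"handoff": "human_agent"}

# A well-formed call completes the workflow; a malformed one triggers handoff.
ok = dispatch('{"name": "get_order_status", "arguments": {"order_id": "A123"}}')
bad = dispatch('{"name": "get_order_status", "arguments": {')
```

The try/except boundary is the key design point: every parsing or routing failure converges on a single, predictable fallback path.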

Deploying a specialized AI agent architecture

Adaptive ML implemented the proof of concept using Adaptive Engine, a reinforcement learning operations (RLOps) platform designed to help enterprises train, evaluate, and deploy specialized language models. For the CCS use case, Adaptive ML selected the Meta Llama 3.2 3B model, a compact open-source model well suited to real-time enterprise applications. Smaller models offer significant advantages for operational workflows: faster inference times, lower infrastructure requirements, and the ability to iterate quickly during development. “As soon as we tested Llama models, the latency difference was dramatic,” said Olivier Cruchant, co-founder of Adaptive ML. “With a compact model you can respond in near real time, which is exactly what you need for patient support interactions.”

These enterprise AI agents require a high level of function-calling accuracy to interact reliably with business systems. To meet this requirement, Adaptive ML applied reinforcement learning–based fine-tuning through Adaptive Engine. The process trained the Llama model to reliably generate the structured outputs required to interact with enterprise APIs and business systems. The system was deployed on Amazon Elastic Compute Cloud (Amazon EC2) p5.48xlarge instances equipped with NVIDIA H100 GPUs, providing the compute resources needed to run the model efficiently.
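Reinforcement-learning fine-tuning for function calling generally relies on a reward signal that scores whether a generated call is parseable and matches the expected tool and arguments. A simplified sketch of such a reward function (the schema and scoring weights are illustrative assumptions, not Adaptive Engine's actual implementation):

```python
import json

def function_call_reward(output: str, expected_tool: str, required_args: set) -> float:
    """Score one model output: 0 for unparseable JSON, partial credit for
    naming the right tool, full credit when all required arguments appear."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # a malformed call halts the workflow, so reward nothing
    if call.get("name") != expected_tool:
        return 0.2
    if required_args.issubset(call.get("arguments", {}).keys()):
        return 1.0
    return 0.5

score = function_call_reward(
    '{"name": "check_shipment", "arguments": {"patient_id": "p1", "order_id": "o9"}}',
    expected_tool="check_shipment",
    required_args={"patient_id", "order_id"},
)
```

During training, rewards like this are computed over batches of sampled outputs and used to update the policy, steering the model toward outputs that downstream systems can always parse.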

Adaptive ML also used Amazon EC2 Capacity Blocks, which allow GPU resources to be reserved for specific time windows. This enabled the team to secure GPU availability for benchmarking and testing while maintaining flexibility in provisioning. “Being able to reserve capacity for a specific window was extremely helpful,” Olivier said. “It allowed us to run large-scale benchmarks with confidence that the infrastructure would be available.” AWS infrastructure also helped reduce system latency by placing both compute resources and supporting databases within the same availability zone. From CCS’s perspective, integration remained straightforward. Adaptive ML hosted the model environment on AWS and exposed it through an HTTPS API endpoint, allowing CCS applications to call the AI agent directly without major architectural changes.
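Because the model environment is exposed as an HTTPS API endpoint, a client application can integrate with nothing more than a JSON POST request. A sketch using Python's standard library (the endpoint URL and payload shape are hypothetical, not CCS's actual interface):

```python
import json
import urllib.request

def build_agent_request(endpoint: str, message: str) -> urllib.request.Request:
    """Construct the HTTPS POST a client application would send to the agent."""
    payload = json.dumps({"input": message}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request would be one more line:
#   response = urllib.request.urlopen(req, timeout=5)
req = build_agent_request(
    "https://agent.example.com/v1/respond",
    "Where is my glucose monitor shipment?",
)
```

This is what keeps the integration "straightforward" from the client's side: the AI agent looks like any other internal HTTP service, with no model-specific SDK required.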

Demonstrating scalable healthcare AI performance

The proof of concept demonstrated that a specialized, compact model could deliver enterprise-grade performance for AI-powered patient support workflows. The system achieved a client-side inference latency of approximately 230 milliseconds, representing more than a 90 percent reduction compared to a proprietary model baseline. This end-to-end response time includes the full request cycle, while model inference latency averaged roughly 160 milliseconds on the server side. That meant the AI agent could respond quickly even during multi-step workflows. “For real-time workflows, latency is everything,” Olivier said. “When responses come back in a few hundred milliseconds instead of several seconds, the experience becomes usable for both patients and support teams.”
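Client-side figures like the ~230 ms above are typically measured around the full request cycle with a monotonic clock, so that the gap to server-side inference time (~160 ms here) isolates network and serialization overhead. A minimal sketch of that measurement (the timed call is a stand-in, not the actual agent):

```python
import time

def timed_call(fn, *args):
    """Return (result, elapsed_ms) for one end-to-end request.

    perf_counter is monotonic, so the measurement is unaffected by
    system clock adjustments during the call.
    """
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

result, ms = timed_call(lambda x: x.upper(), "ok")
```

Averaging such measurements over many requests, and comparing client-side against server-reported inference time, is how the two latency figures in the text would be separated.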

Lower latency also improves the reliability of automated workflows. Because the model can generate accurate function calls quickly, it can retrieve data from enterprise systems and complete tasks without requiring human intervention. That reduces delays in patient interactions and allows support teams to focus on more complex cases. The architecture also demonstrated a new economic model for enterprise AI deployments. “Small models unlock something powerful: the ability to integrate CCS’s proprietary knowledge and workflows into patient support—boosting both speed and reliability,” said Richard Mackey, CTO of CCS. 

The CCS proof of concept highlights how healthcare organizations can begin integrating AI agents into operational workflows while maintaining the responsiveness and reliability required for patient-facing services. By combining Meta’s Llama models with Adaptive ML’s reinforcement learning platform and AWS infrastructure, the collaboration demonstrates a path toward scalable AI support systems designed for real-world enterprise environments. 
