AWS for Industries

From code to chemistry: using Kiro to tackle ADME-Tox, a key drug discovery challenge

We gave ourselves one month, two developers, one scientist, and one rule: no expanding the team when it got hard. Here is what we built, and why the constraints were the point.

Introduction

Building production-ready agentic AI solutions presents significant challenges, from navigating complex architectures to establishing clear development patterns. The life sciences industry adds further complexity: strict regulations, diverse data modalities, and the variety of diseases all require additional time and specialized approaches. Still, opportunities across the biopharma value chain to accelerate molecular search, safety, synthesis, and eventual clinical comparison ultimately benefit patients.

In our first installment, we walked through how a small team used Kiro, an agentic AI-powered Integrated Development Environment (IDE) released by AWS, to go from a natural language specification to a working drug discovery agent in just three weeks. We covered how the Kiro spec-driven development model, Agent Hooks, and MCP server connectivity collapsed what would traditionally be months of architecture design and infrastructure setup into days of focused code iteration. The result was a production-ready target identification agent that synthesizes insights across 30+ fragmented biomedical data sources, identifies potential therapeutic targets with cited evidence, and provides traceable recommendations that scientists can validate and act upon.

At AWS, we focus on removing the undifferentiated heavy lifting so you can concentrate on what makes your science unique. Amazon Bio Discovery, launched in April 2026, is one such solution for accelerating drug discovery, specifically antibody therapeutics, by connecting generative AI models with physical, automated lab testing. It enables researchers to access 40+ AI biology models through a no-code interface to design, predict, and optimize drug candidates. But between a validated target and a clinical trial submission lies a critical evaluation step that eliminates approximately 90% of candidates (Sun et al.): understanding how a molecule behaves in the human body. We set out to close that gap.

Enter: ADME-Tox

In biopharmaceutical discovery, promising hits emerge from screening workflows in monthly to quarterly campaigns. As efficacy, binding, and various phenotypic measurements are captured well-by-well, the next question is not “Is it active?” but rather “What’s the safety profile?” How will the physical and metabolic properties of the hit or lead influence its residence time in the body? Absorption, Distribution, Metabolism, Excretion, and Toxicity – collectively known as ADME-Tox – represent the gauntlet that eliminates approximately 90% of drug candidates before they ever reach a clinical trial (Sun et al.).

In the clinical development phase of therapeutic testing, a candidate passes through Phase I (healthy volunteers), Phase II (small population of patients with disease indication), and Phase III (larger studies). Poor ADME-Tox profiles are among the leading causes of late-stage attrition: 30-45% of candidates fail in Phase II–III clinical trials due to emerging safety concerns (Sun et al.). Without better preclinical risk models, pipeline triage, and a way to feed observed Phase I ADME-Tox findings back into preclinical discussions, more R&D leads will be developed that ultimately fall out of the clinical pipeline, diverting time and money from potentially safer and more targeted patient therapeutics.

The standard battery of in vitro ADME-Tox assays (CYP, solubility, Caco-2, PAMPA, hERG, PPB) has only an estimated 50-70% predictive accuracy for human drug metabolism. How can we make sure the drugs we are considering making have a decent shot at passing preclinical testing with appropriate properties to perform in downstream clinical development? As general LLMs continue to evolve, we wanted to explore whether they are capable enough at reasoning and data analysis to suitably assess safety and toxicity risks of selected approved and known drug-like molecules.

Team and timeline

This is where our second drug discovery agent enters the picture. Building on the architectural patterns, Kiro spec conventions, the newly launched Kiro Autonomous Agent, and Amazon Bedrock integrations established in Part 1, we set out to answer a simple question: Could three people, working part-time over the course of a single month, build a production-ready ADME-Tox prediction agent?

Our trio comprised two developers: a Principal Solutions Architect with a background in bioinformatics and a full-stack developer who had no prior experience with ADME modeling. A career scientist rounded out the group, serving as workflow validator, prompt tester, and data source recommender. We did this work in the in-between: between customer meetings, demos, trainings, and the general demands of our work weeks. Each contributor averaged 5 hours per week on this project across four weeks, approximately 60 combined hours in total.

That constraint was intentional. This blog series posits that the Kiro spec-first, agent-assisted development model is designed precisely for this reality: skilled developers with limited bandwidth who need to move fast without sacrificing production quality. Part 1 validated that thesis for a greenfield discovery workflow. Part 2 stress-tests it against a domain that is simultaneously more scientifically complex and more operationally consequential.

Figure 1: Draft architecture for the ADME-Tox Kiro solution. Multi-agentic frameworks, Kiro powers, and AgentCore primitives contribute.

Building from prompts and open-source data

We utilized Kiro in spec-driven design mode through a series of prompts, working between services and creating code and artifacts.

Step 1: Amazon Quick Research agent prompt

Amazon Quick Research prompt
Research on these 3 topics to build a research agent that provides comprehensive technical specifications and implementation strategies for creating specialized AI agents for drug discovery.

Agent 1: An ADME Prediction Agent should be the technically mature component, leveraging ADMET_AI as its core prediction engine. This ADME Prediction agent should be able to predict absorption, distribution, metabolism, and excretion properties. These are the open-source tools available to build this agent 1/. ADMET_AI: ADMET prediction models 2/ RDKit: Molecular descriptors and property calculations 3/ ChemProp: Graph neural networks for molecular property prediction and 4/DeepChem: Machine learning for drug discovery.

Agent 2: A Toxicity Assessment Agent should be able to assess various toxicity endpoints. Use these Open-Source Tools 1/ NCATS ADME Portal: Free ADME/Tox prediction models 2/ ToxCast/Tox21: EPA's toxicity prediction datasets 3/ QSAR Toolbox: OECD's free toxicity prediction software and 4/ pkCSM: Web-based ADMET prediction to develop the tools for this agent.

Agent 3: The Literature Mining Agent which can search and analyze relevant scientific literature

Use 1/ PubMed API: Access to biomedical literature 2/ ChEMBL API: Bioactivity database 3/ PubChem API: Chemical information database and 4/ DrugBank API: Drug and target information

Our initial analysis reveals that these agents can be successfully built using twelve open-source tools, with specific architectural patterns and integration strategies that maximize their complementary capabilities for pharmaceutical research. Discover public resources and papers that help us validate these requirements and measure Business Impact in terms of Performance and Scalability Advantages for developing this multi-Agent research assistant Amazon Bedrock AgentCore based agent. Include the Critical Implementation Requirements like Data Integration and Standardization in this report.

Finally, Highlight the Research Limitations and Future Considerations for these agents.

Step 2: Generate README.md for empty GitHub repository

Amazon Quick Report prompt

Generated a technical README format document to be used with Project Management tools for software development which will be used with Coding autonomous agents.

Figure 2: Amazon Quick UI, showing prompt and system architecture upon query

Step 3: Leverage Kiro Autonomous Agent

Kiro Autonomous Agent is a frontier agent that represents a new class of AI agents designed to autonomously handle software development work as an asynchronous teammate. Kiro Autonomous Agent can learn from the initial agents and tools built by developers in Kiro IDE and apply the same patterns and codebase to build additional agents and tools without requiring constant human guidance.

Our initial system spec utilized multiple open-source repositories, including APIs to ChEMBL, PubChem, the NIH ADME@NCATS library, RDKit, and multiple other libraries. Using Amazon Bedrock AgentCore Runtime, we set up a model based on our previously communicated biomarker agent, an architectural pattern that gathers data sources and uses supervisor agents to construct the end analysis.
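
The supervisor pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's implementation and not the Amazon Bedrock AgentCore API: the `AgentResult` type, the three stub agents, and the `supervise` function are hypothetical names, and the stubs return placeholder strings rather than calling ChEMBL, PubChem, or RDKit. The SMILES string `CCO` (ethanol) is used only as a stand-in input.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentResult:
    """Output of one sub-agent (hypothetical structure)."""
    agent: str
    summary: str
    sources: List[str]


# Hypothetical stub agents. Real agents would call tools such as
# ADMET_AI, RDKit, or the PubMed API; these only return placeholders.
def adme_agent(smiles: str) -> AgentResult:
    return AgentResult("adme", f"ADME placeholder for {smiles}", ["RDKit"])


def tox_agent(smiles: str) -> AgentResult:
    return AgentResult("tox", f"Toxicity placeholder for {smiles}", ["ADME@NCATS"])


def literature_agent(smiles: str) -> AgentResult:
    return AgentResult("literature", f"Literature placeholder for {smiles}", ["PubMed"])


def supervise(smiles: str, agents: Dict[str, Callable[[str], AgentResult]]) -> dict:
    """Fan the query out to each sub-agent and merge the results into one report."""
    results = [run(smiles) for run in agents.values()]
    return {
        "query": smiles,
        "sections": {r.agent: r.summary for r in results},
        # Keep every source so a scientist can trace each claim.
        "sources": sorted({s for r in results for s in r.sources}),
    }


AGENTS = {"adme": adme_agent, "tox": tox_agent, "literature": literature_agent}
report = supervise("CCO", AGENTS)
print(report["sources"])
```

Keeping the sub-agents behind a common call signature is one way to let a supervisor add or swap data sources without changing how the final analysis is assembled.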

Early results

Disclaimer: This agentic system accesses open-source data and algorithms; it does not constitute medical advice or recommendations. Results should always be reviewed by a human-in-the-loop, as would be expected in preclinical testing at a pharmaceutical company.

We first tested the system on two closely related structural analogs: a marketed COX-2 pain reliever and a 4-isopropyl analog. Specific data points (MW, logP) align with ChEMBL and PubChem; others are calculated.

For the prompt:

Compare these two compounds’ ADME-Tox profiles. Which is best for overall patient safety? C1(=N[N](C(=C1)C2=CC=C(C=C2)C)C3=CC=C(C=C3)[S](N)(=O)=O)C(F)(F)F; C1(=N[N](C(=C1)C2=CC=C(C=C2)C(C)C)C3=CC=C(C=C3)[S](N)(=O)=O)C(F)(F)F;

The output, with the following performance logs:

Model: us.anthropic.claude-sonnet-4-6 | Region: us-west-2

Tokens: 24916 in / 4204 out / 29120 total | Latency: 74001ms | Duration: 57.7s | Cycles: 1

By the reported duration (57.7 s), the system delivered results in under 1 minute.
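
For readers who want to track runs like this one over time, the log line above can be parsed with a few regular expressions. The sketch below is illustrative only: `parse_run_log` is a hypothetical helper, and the field layout is assumed to match the single line shown here. The log reports both a latency and a duration; the sketch keeps them as separate fields and makes no assumption about how they relate.

```python
import re

LOG = ("Tokens: 24916 in / 4204 out / 29120 total | "
       "Latency: 74001ms | Duration: 57.7s | Cycles: 1")


def parse_run_log(line: str) -> dict:
    """Extract token counts and timings from a run-log line in the format above."""
    tokens = re.search(r"Tokens: (\d+) in / (\d+) out / (\d+) total", line)
    latency = re.search(r"Latency: (\d+)ms", line)
    duration = re.search(r"Duration: ([\d.]+)s", line)
    if not (tokens and latency and duration):
        raise ValueError("unrecognized log format")
    tokens_in, tokens_out, total = map(int, tokens.groups())
    return {
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "tokens_total": total,
        # Sanity check: input plus output tokens should equal the reported total.
        "tokens_consistent": tokens_in + tokens_out == total,
        "latency_s": int(latency.group(1)) / 1000,
        "duration_s": float(duration.group(1)),
    }


stats = parse_run_log(LOG)
print(stats)
```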

Figure 3: Head-to-head comparison of iPr analog against marketed drug for base ADME properties

Figure 4: Comparing the head-to-head toxicology exports from ChEMBL, RDKit, PubChem using Anthropic Claude Sonnet 4.6 called through Amazon Bedrock AgentCore

As expected, the approved drug (Compound 1) shows a better overall ADME-Tox profile, with the CYP2C9 liability called out for both compounds. The model reasons further, using known oxidation sites and logP to draw zero-shot inferences and identify comparators. Of note, the system flags potentially harmful COX-2 inhibitors, like Vioxx, that were removed from the market.

We then tested a different molecular class: two generic sulfonamide antibacterials.

Using the prompt:

“Compare sulfadiazine C1=CN=C(N=C1)NS(=O)(=O)C2=CC=C(C=C2)N against sulfisoxazole CC1=C(ON=C1C)NS(=O)(=O)C2=CC=C(C=C2)N for overall safety risk in antibacterial treatment”

The initial ADME comparison shows the molecules have similar chemical and physicochemical properties.
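
One way to spot-check physicochemical values such as MW and logP against PubChem is through its PUG REST interface. The sketch below only builds the request URL for a lookup by compound name and makes no network call. The helper name `pubchem_property_url` is hypothetical, and `XLogP` is PubChem's own computed logP estimate, so it may differ from values calculated with other tools.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, properties=("MolecularWeight", "XLogP")) -> str:
    """Build a PUG REST URL that requests computed properties by compound name."""
    return (
        f"{PUG_REST}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


for drug in ("sulfadiazine", "sulfisoxazole"):
    print(pubchem_property_url(drug))
```

Fetching each URL with any HTTP client would return a JSON document containing the requested properties, which can then be compared against the locally computed values.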

Figure 5: Computed ADME-Tox properties (RDKit)

However, in this example, multiple PubMed studies flag one of the two molecules for kidney injury (nephrotoxicity).

Figure 6: Comparison of sulfadiazine and sulfisoxazole for renal clearance using downstream clinical observations extracted from literature
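
A literature lookup of this kind can be expressed as a PubMed query through the NCBI E-utilities ESearch endpoint. The sketch below is an assumption about how such a query might be formed, not the query the agent actually issued: `pubmed_search_url` is a hypothetical helper, the search term is a simple two-keyword conjunction, and the code only builds the URL without calling the service.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(drug: str, topic: str, retmax: int = 20) -> str:
    """Build an ESearch URL for PubMed records that mention both terms."""
    params = {
        "db": "pubmed",
        "term": f"{drug} AND {topic}",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH}?{urlencode(params)}"


for drug in ("sulfadiazine", "sulfisoxazole"):
    print(pubmed_search_url(drug, "nephrotoxicity"))
```

The response lists matching PubMed IDs; retrieving and summarizing the abstracts behind them would be a separate step.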

The system reasons and provides a comparative risk profile, using its knowledge bases and Anthropic Claude Sonnet 4.6 to predict a “winner” based on classification. In our testing, this view extends to comparing 5 analogs of a specific lead (or 5 separate drugs for a given indication).

Figure 7: Comparative safety scorecard, given current source data sets (see above)
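
The post does not describe how the model arrives at its “winner”, so the following is only one simple way a categorical scorecard could be turned into a ranking across several analogs. Everything here is an assumption for illustration: the `RISK_POINTS` weights, the `rank_by_risk` helper, the compound names, and the risk calls are placeholders, not outputs of the system described in this post.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a categorical risk call to a numeric penalty.
RISK_POINTS = {"low": 0, "medium": 1, "high": 2}


def rank_by_risk(scorecard: Dict[str, Dict[str, str]]) -> List[Tuple[str, int]]:
    """Sum per-endpoint penalties and return compounds from lowest to highest risk."""
    totals = {
        compound: sum(RISK_POINTS[level] for level in endpoints.values())
        for compound, endpoints in scorecard.items()
    }
    # Sort by total penalty, then by name so ties resolve deterministically.
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Placeholder inputs only: these are NOT results from the system described here.
example = {
    "analog_A": {"CYP2C9": "low", "nephrotoxicity": "high", "hERG": "low"},
    "analog_B": {"CYP2C9": "medium", "nephrotoxicity": "low", "hERG": "low"},
    "analog_C": {"CYP2C9": "low", "nephrotoxicity": "low", "hERG": "low"},
}
ranking = rank_by_risk(example)
print(ranking)
```

An unweighted sum treats every endpoint as equally important, which is rarely true in practice; a real scorecard would likely weight endpoints by indication and leave the final call to a scientist.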

In live demos at the April 2026 AWS Life Sciences Symposium, we tested multiple other entities entered as common names, SMILES strings, and trade names. The system returned results in 1–2 minutes and generally matched similar searches across public databases.

Potential future directions

The authors recognize the limitation of accessing open-source models and databases for this study. Ideally, future builds will incorporate fine-tuned multimodal foundation models and will use data sets captured from clinical or academic collaborators. Expanding the agents’ understanding of molecular (or biologic) design to, for example, iterate to improve ADME-Tox safety will be a future area.

We also recognize that a panoply of windows, modals, and systems may exist in a given research computing or R&D IT environment, and that our agentic system may be better served as an integration into another portal: an ELN, a LIMS, or a compound design motif, perhaps even into a safety prediction step of Amazon Bio Discovery.

Conclusion

ADME-Tox assessment informs research, preclinical, and clinical evaluation risk of lead molecules. We built this agent in approximately 60 hours, with two developers and one scientist. It works. Not perfectly, and not to replace in vitro testing, but well enough to complement early-stage decision-making during hit-to-lead and subsequent development. Surfacing actionable results for a bench scientist, project manager, or lab leader in minutes should be an attractive application of agentic systems for life sciences.

Kiro’s spec-driven development and use of autonomous agents allowed us to move from scientific requirements to working agents without losing the thread between what the science needed and what the code did. If you have suitable domain knowledge and access to a Kiro-like environment, you can use the steps and prompts described above to build a similar agent in an even shorter time (sub-3 hours, per recent hackathon attempts!).

AWS has collected multiple agentic approaches to Research and Development-relevant situations, many of which are available in our Sample Agents for Healthcare and Life Sciences on AWS repository. These agents highlight how agentic AI can transform early-stage drug discovery by rapidly analyzing and synthesizing vast amounts of complex biological data and using it to understand the biological profile of the medicines in question.

To learn more about agentic systems and Kiro-driven scientific solutions on AWS, visit our Healthcare and Life Sciences page or reach out to your AWS Life Sciences representative.

Further reading

Atul Chaudhari

Atul Chaudhari is a talented Storage and Machine Learning Solutions Architect working across multiple healthcare and life sciences enterprise accounts. Atul is currently enrolled in the Biomedical Data Science graduate and Drug Development: From Discovery to Commercialization Graduate programs.

Ariella Sasson

Ariella Sasson, Ph.D., is a Principal Specialist Solution Architect for Data & AI/ML focusing on Healthcare and Life Sciences at AWS. She helps healthcare and life science organizations become data-driven — from designing modern data architectures to enabling AI/ML and high-performance computing workloads. Ariella brings over 20 years of experience in high-throughput clinical genomics, oncology, and pharma R&D. She is passionate about leveraging cloud and AI to accelerate life science research, genomics, and personalized medicine.

Mike Tarselli

Mike Tarselli is a Specialist Leader in Healthcare & Life Sciences Data and AI at AWS. He has spent 25+ years in the biopharma industry. As a leader in AI and data strategy, he works with scientific and technical teams to help them realize their vision, while embracing the fast pace and enormity of AI.