Build a biomedical research agent with Biomni tools and Amazon Bedrock AgentCore Gateway

This post is co-authored with the Biomni group from Stanford.

Biomedical researchers spend approximately 90% of their time manually processing massive volumes of scattered information. This is evidenced by Genentech’s challenge of processing 38 million biomedical publications in PubMed, public repositories like the Human Protein Atlas, and their internal repository of hundreds of millions of cells across hundreds of diseases. There is a rapid proliferation of specialized databases and analytical tools across different modalities including genomics, proteomics, and pathology. Researchers must stay current with the large landscape of tools, leaving less time for the hypothesis-driven work that drives breakthrough discoveries.

AI agents powered by foundation models offer a promising solution by autonomously planning, executing, and adapting complex research tasks. Stanford researchers built Biomni that exemplifies this potential. Biomni is a general-purpose biomedical AI agent that integrates 150 specialized tools, 105 software packages, and 59 databases to execute sophisticated analyses such as gene prioritization, drug repurposing, and rare disease diagnosis.

However, deploying such agents in production requires robust infrastructure capable of handling computationally intensive workflows and multiple concurrent users while maintaining security and performance standards. Amazon Bedrock AgentCore is a set of comprehensive services to deploy and operate highly capable agents using any framework or model, with enterprise-grade security and scalability.

In this post, we show you how to implement a research agent using AgentCore with access to over 30 specialized biomedical database tools from Biomni, thereby accelerating scientific discovery while maintaining enterprise-grade security and production scale. The code for this solution is available in the open-source toolkit repository of starter agents for life sciences on Amazon Web Services (AWS). The step by step instruction helps you deploy your own tools and infrastructure, along with AgentCore components, and examples.

Prototype-to-production complexity gap

Moving from a local biomedical research prototype to a production system accessible by multiple research teams requires addressing complex infrastructure challenges.

Agent deployment with enterprise security

Enterprise security challenges include OAuth-based authentication, secure tool sharing through scalable gateways, comprehensive observability for research audit trails, and automatic scaling to handle concurrent research workloads. Many promising prototypes fail to reach production because of the complexity of implementing these enterprise-grade requirements while maintaining the specialized domain expertise needed for accurate biomedical analysis.

Session-aware research context management

Biomedical research workflows often span multiple conversations and require persistent memory of previous analyses, experimental parameters, and research preferences across extended research sessions. Research agents must maintain contextual awareness of ongoing projects, remember specific protein targets, experimental conditions, and analytical preferences. All that must be done while facilitating proper session isolation between different researchers and research projects in a multi-tenant production environment.

Scalable tool gateway

Implementing a reusable tool gateway that can handle concurrent requests from research agent, proper authentication, and consistent performance becomes critical at scale. The gateway must enable agents to discover and use tools through secure endpoints, help agents find the right tools through contextual search capabilities, and manage both inbound authentication (verifying agent identity) and outbound authentication (connecting to external biomedical databases) in a unified service. Without this architecture, research teams face authentication complexity and reliability issues that prevent effective scaling.

Solution overview

We use Strands Agents, an open source agent framework, to build a research agent with local tool implementation for PubMed biomedical literature search. We extended the agent’s capabilities by integrating Biomni database tools, providing access to over 30 specialized biomedical databases.

The overall architecture is shown in the following diagram.

Architecture diagram of a research agent with Biomni Gateway

The AgentCore Gateway service centralizes Biomni database tools as more secure, reusable endpoints with semantic search capabilities. AgentCore Memory service maintains contextual awareness across research sessions using specialized strategies for research context. Security is handled by AgentCore Identity service, which manages authentication for both users and tool access control. Deployment is streamlined with the AgentCore Runtime service, providing scalable, managed deployment with session isolation. Finally, the AgentCore Observability service enables comprehensive monitoring and auditing of research workflows that are critical for scientific reproducibility.

Step 1 – Creating tools such as the Biomni database tools using AgentCore Gateway

In real-world use cases, we need to connect agents to different data sources. Each agent might duplicate the same tools, leading to extensive code, inconsistent behavior, and maintenance nightmares. AgentCore Gateway service streamlines this process by centralizing tools into reusable, secure endpoints that agents can access. Combined with the AgentCore Identity service for authentication, AgentCore Gateway creates an enterprise-grade tool sharing infrastructure. To give more context to the agent with reusable tools, we provided access to over 30 specialized public database APIs through the Biomni tools registered on the gateway. The gateway exposes Biomni’s database tools through the Model Context Protocol (MCP), allowing the research agent to discover and invoke these tools alongside local tools like PubMed. It handles authentication, rate limiting, and error handling, providing a seamless research experience.

def create_gateway(gateway_name: str, api_spec: list) -> dict:
    # JWT authentication with Cognito
    auth_config = {
        "customJWTAuthorizer": {
            "allowedClients": [
                get_ssm_parameter("/app/researchapp/agentcore/machine_client_id")
            ],
            "discoveryUrl": 
                get_ssm_parameter("/app/researchapp/agentcore/cognito_discovery_url"),
        }
    }
    
    # Enable semantic search for BioImm tools
    search_config = {"hcp": {"searchType": "SEMANTIC"}}
    
    # Create the gateway
    gateway = bedrock_agent_client.create_gateway(
        name=gateway_name,
        collectionexecution_role_arn,
        protocolType="MCP",
        authorizerType="CUSTOM_JWT",
        authorizerConfiguration=auth_config,
        protocolConfiguration=search_config,
        description="My App Template AgentCore Gateway",
    )

We use an AWS Lambda function to host the Biomni integration code. The Lambda function is automatically configured as an MCP target in the AgentCore Gateway. The Lambda function exposes its available tools through the API specification ( api_spec.json).

# Gateway Target Configuration
lambda_target_config = {
    "mcp": {
        "lambda": {
            "lambdaArn": get_ssm_parameter("/app/researchapp/agentcore/lambda_arn"),
            "toolSchema": {"inlinePayload": api_spec},
        }
    }
}

# Create the target
create_target_response = gateway_client.create_gateway_target(
    gatewayIdentifier=gateway_id,
    name="LambdaUsingSDK",
    description="Lambda Target using SDK",
    targetConfiguration=lambda_target_config,
    credentialProviderConfigurations=[{
        "credentialProviderType": "GATEWAY_IAM_ROLE"
    }],
)

The full list of Biomni database tools included on the gateway are listed in the following table:

Group	Tool	Description
Protein and structure databases	UniProt	Query the UniProt REST API for comprehensive protein sequence and functional information
	AlphaFold	Query the AlphaFold Database API for AI-predicted protein structure predictions
	InterPro	Query the InterPro REST API for protein domains, families, and functional sites
	PDB (Protein Data Bank)	Query the RCSB PDB database for experimentally determined protein structures
	STRING	Query the STRING protein interaction database for protein-protein interaction networks
	EMDB (Electron Microscopy Data Bank)	Query for 3D macromolecular structures determined by electron microscopy
Genomics and variants	ClinVar	Query NCBI's ClinVar database for clinically relevant genetic variants and their interpretations
	dbSNP	Query the NCBI dbSNP database for single nucleotide polymorphisms and genetic variations
	gnomAD	Query gnomAD for population-scale genetic variant frequencies and annotations
	Ensembl	Query the Ensembl REST API for genome annotations, gene information, and comparative genomics
	UCSC Genome Browser	Query the UCSC Genome Browser API for genomic data and annotations
Expression and omics	GEO (Gene Expression Omnibus)	Query NCBI's GEO for RNA-seq, microarray, and other gene expression datasets
	PRIDE	Query the PRIDE database for proteomics identifications and mass spectrometry data
	Reactome	Query the Reactome database for biological pathways and molecular interactions
Clinical and drug data	cBioPortal	Query the cBioPortal REST API for cancer genomics data and clinical information
	ClinicalTrials.gov	Query ClinicalTrials.gov API for information about clinical studies and trials
	OpenFDA	Query the OpenFDA API for FDA drug, device, and food safety data
	GtoPdb (Guide to PHARMACOLOGY)	Query the Guide to PHARMACOLOGY database for drug targets and pharmacological data
Disease and phenotype	OpenTargets	Query the OpenTargets Platform API for disease-target associations and drug discovery data
	Monarch Initiative	Query the Monarch Initiative API for phenotype and disease information across species
	GWAS Catalog	Query the GWAS Catalog API for genome-wide association study results
	RegulomeDB	Query the RegulomeDB database for regulatory variant annotations and functional predictions
Specialized databases	JASPAR	Query the JASPAR REST API for transcription factor binding site profiles and motifs
	WoRMS (World Register of Marine Species)	Query the WoRMS REST API for marine species taxonomic information
	Paleobiology Database (PBDB)	Query the PBDB API for fossil occurrence and taxonomic data
	MPD (Mouse Phenome Database)	Query the Mouse Phenome Database for mouse strain phenotype data
	Synapse	Query Synapse REST API for biomedical datasets and collaborative research data

The following are examples of how individual tools get triggered through the MCP from our test suite:

# Protein and Structure Analysis
"Use uniprot tool to find information about human insulin protein"
# → Triggers uniprot MCP tool with protein query parameters
"Use alphafold tool for structure predictions for uniprot_id P01308"
# → Triggers alphafold MCP tool for 3D structure prediction
"Use pdb tool to find protein structures for insulin"
# → Triggers pdb MCP tool for crystallographic structures
# Genetic Variation Analysis  
"Use clinvar tool to find pathogenic variants in BRCA1 gene"
# → Triggers clinvar MCP tool with gene variant parameters
"Use gnomad tool to find population frequencies for BRCA2 variants"
# → Triggers gnomad MCP tool for population genetics data

As the tool collection grows, the agent can use built-in semantic search capabilities to discover and select tools based on the task context. This improves agent performance and reducing development complexity at scale. For example, the user asks, “tell me about HER2 variant rs1136201.” Instead of listing all 30 or more tools from the gateway back to the agent, semantic search returns ‘n’ most relevant tools. For example, Ensembl, Gwas catalog, ClinVar, and Dbsnp to the agent. The agent now uses a smaller subset of tools as input to the model to return a more efficient and faster response.

The following graphic illustrates using AgentCore Gateway for tool search.

AgentCore Gateway tool search

You can now test your deployed AgentCore gateway using the following test scripts and compare how semantic search narrows down the list of relevant tools based on the search query.

uv run tests/test_gateway.py --prompt "What tools are available?"
uv run tests/test_gateway.py --prompt "Find information about human insulin protein" --use-search

Step 2- Strands research agent with a local tool

The following code snippet shows model initialization, implementing the PubMed local tool that’s declared using the Strands @tool decorator. We’ve implemented the PubMed tool in research_tools.py that calls PubMed APIs to enable biomedical literature search capabilities within the agent's execution context.

PubMed Tool Creation

from agent.agent_config.tools.PubMed import PubMed

@tool(
    name="Query_pubmed",
    description=(
        "Query PubMed for relevant biomedical literature based on the user's query. "
        "This tool searches PubMed abstracts and returns relevant studies with "
        "titles, links, and summaries."
    ),
)
def query_pubmed(query: str) -> str:
    """
    Query PubMed for relevant biomedical literature based on the user's query.
    
    This tool searches PubMed abstracts and returns relevant studies with 
    titles, links, and summaries.
    
    Args:
        query: The search query for PubMed literature
        
    Returns:
        str: Formatted results from PubMed search
    """
    pubmed = PubMed()
    
    print(f"\nPubMed Query: {query}\n")
    result = pubmed.run(query)
    print(f"\nPubMed Results: {result}\n")
    
    return result

- - - Create the Strands research agent with the local tool and Claude Sonnet 4 Interleaved Thinking.

class ResearchAgent:
    def __init__(
        self,
        bearer_token: str,
        memory_hook: MemoryHook = None,
        session_manager: AgentCoreMemorySessionManager = None,
        bedrock_model_id: str = "us.anthropic.claude-sonnet-4-20250514-v1.0",
        #bedrock_model_id: str = "openai.gpt-oss-120b-1.0",  # Alternative
        system_prompt: str = None,
        tools: List[callable] = None,
    ):
        
        self.model_id = bedrock_model_id
        # For Anthropic Sonnet 4 interleaved thinking
        self.model = BedrockModel(
            model_id=self.model_id,
            additional_request_fields={
                "anthropic_beta": ["interleaved-thinking-2025-05-14"],
                "thinking": {"type": "enabled", "budget_tokens": 8000},
            },
        )
        
        self.system_prompt = (
            system_prompt
            if system_prompt
            else """
You are a **Comprehensive Biomedical Research Agent** specialized in conducting 
systematic literature reviews and multi-database analyses to answer complex biomedical research 
questions. Your primary mission is to synthesize evidence from both published literature 
(PubMed) and real-time database queries to provide comprehensive, evidence-based insights for 
pharmaceutical research, drug discovery, and clinical decision-making.

Your core capabilities include literature analysis and extracting data from 30+ specialized 
biomedical databases** through the Bioimm gateway, enabling comprehensive data analysis. The 
database tool categories include genomics and genetics, protein structure and function, pathways 
and system biology, clinical and pharmacological data, expression and omics data and other 
specialized databases.
"""
        )

In addition, we implemented citations that use a structured system prompt to enforce numbered in-text citations [1], [2], [3] with standardized reference formats for both academic literature and database queries, marking sure every data source is properly attributed. This allows researchers to quickly access and reference the scientific literature that supports their biomedical research queries and findings.

"""
<citation_requirements>
- ALWAYS use numbered in-text citations [1], [2], [3], etc. when referencing any data source
- Provide a numbered "References" section at the end with full source details
- For academic literature: format as "1. Author et al. Title. Journal. Year. ID: [PMID/DOI], available at: [URL]"
- For database sources: format as "1. Database Name (Tool: tool_name), Query: [query_description], Retrieved: [current_date]"
- Use numbered in-text citations throughout your response to support all claims and data points
- Each tool query and each literature source must be cited with its own unique reference number
- When tools return academic papers, cite them using the academic format with full bibliographic details
- Structure: Format each reference on a separate line with proper numbering - NO bullet points
- Present the References section as a clean numbered list, not a confusing paragraph
- Maintain sequential numbering across all reference types in a single "References" section
</citation_requirements>
"""

You can now test your agent locally:

uv run tests/test_agent_locally.py --prompt "Find information about human insulin protein"
uv run tests/test_agent_locally.py --prompt "Find information about human insulin protein" --use-search

Step 3 - Add Persistent Memory for contextual research assistance

The research agent implements the AgentCore Memory service with three strategies: semantic for factual research context, user_preference for research methodologies, and summary for session continuity. The AgentCore Memory session manager is integrated with Strands session management and retrieves relevant context before queries and save interactions after responses. This enables the agent to remember research preferences, ongoing projects, and domain expertise across sessions without manual context re-establishment.

# Test memory functionality with research conversations

python tests/test_memory.py load-conversation<br />python tests/test_memory.py load-prompt "My preferred response format is detailed explanations"

Step 4 - Deploy with AgentCore Runtime

To deploy our agent, we use AgentCore Runtime to configure and launch the research agent as a managed service. The deployment process configures the runtime with the agent's main entrypoint (agent/main.py), assigns an IAM execution role for AWS service access, and supports both OAuth and IAM authentication modes. After deployment, the runtime becomes a scalable, serverless agent that can be invoked using API calls. The agent automatically handles session management, memory persistence, and tool orchestration while providing secure access to the Biomni gateway and local research tools.

agentcore configure --entrypoint agent/main.py -er arn:aws:iam::&lt;Account-Id&gt;:role/&lt;Role&gt; --name researchapp&lt;AgentName&gt;

For more information about deploying with AgentCore Runtime, see Get started with AgentCore Runtime in the Amazon Bedrock AgentCore Developer Guide.

Agents in action

The following are three representative research scenarios that showcase the agent's capabilities across different domains: drug mechanism analysis, genetic variant investigation, and pathway exploration. For each query, the agent autonomously determines which combination of tools to use, formulates appropriate sub-queries, analyzes the returned data, and synthesizes a comprehensive research report with proper citations. The accompanying demo video shows the complete agent workflow, including tools selection, reasoning, and response generation.

Conduct a comprehensive analysis of trastuzumab (Herceptin) mechanism of action and resistance mechanisms you’ll need:
1. HER2 protein structure and binding sites
2. Downstream signaling pathways affected
3. Known resistance mechanisms from clinical data
4. Current clinical trials investigating combination therapies
5. Biomarkers for treatment response predictionQuery relevant databases to provide a comprehensive research report.
Analyze the clinical significance of BRCA1 variants in breast cancer risk and treatment response. Investigate:
1. Population frequencies of pathogenic BRCA1 variants
2. Clinical significance and pathogenicity classifications
3. Associated cancer risks and penetrance estimates
4. Treatment implications (PARP inhibitors, platinum agents)
5. Current clinical trials for BRCA1-positive patients
  Use multiple databases to provide comprehensive evidence

The following video is a demonstration of a biomedical research agent:

Scalability and observability

One of the most critical challenges in deploying sophisticated AI agents is making sure they scale reliably while maintaining comprehensive visibility into their operations. Biomedical research workflows are inherently unpredictable—a single genomic analysis might process thousands of files, while a literature review could span millions of publications. Traditional infrastructure struggles with these dynamic workloads, particularly when handling sensitive research data that requires strict isolation between different research projects.In this deployment, we use Amazon Bedrock AgentCore Observability to visualize each step in the agent workflow. You can use this service to inspect an agent's execution path, audit intermediate outputs, and debug performance bottlenecks and failures. For biomedical research, this level of transparency is not just helpful—it's essential for regulatory compliance and scientific reproducibility.

Sessions, traces, and spans form a three-tiered hierarchical relationship in the observability framework. A session contains multiple traces, with each trace representing a discrete interaction within the broader context of the session. Each trace contains multiple spans that capture fine-grained operations. The following screenshot shoes the usage of one agent: Number of sessions, token usage, and error rate in production

The following screenshot shows the agents in production and their usage (number of Sessions, number of invocations)

The built-in dashboards show performance bottlenecks and identify why certain interactions might fail, enabling continuous improvement and reducing the mean time to detect (MTTD) and mean time to repair (MTTR). For biomedical applications where failed analyses can delay critical research timelines, this rapid issue resolution capability makes sure that research momentum is maintained.

Future direction

While this implementation focuses on only a subset of tools, the AgentCore Gateway architecture is designed for extensibility. Research teams can seamlessly add new tools without requiring code changes by using the MCP protocol. Newly registered tools are automatically discoverable by agents allowing your research infrastructure to evolve alongside the rapidly changing tool sets.

For computational analysis that requires code execution, the AgentCore Code Interpreter service can be integrated into the research workflow. With AgentCore Code Interpreter the research agent can retrieve data and execute Python-based analysis using domain-specific libraries like BioPython, scikit-learn, or custom genomics packages.

Future extensions could support multiple research agents to collaborate on complex projects, with specialized agents for literature review, experimental design, data analysis, and result interpretation working together through multi-agent collaboration. Organizations can also develop specialized research agents tailored to specific therapeutic areas, disease domains, or research methodologies that share the same enterprise infrastructure and tool gateway.

Looking ahead with Biomni

“Biomni today is already useful for academic research and open exploration. But to enable real discovery—like advancing drug development—we need to move beyond prototypes and make the system enterprise-ready. Embedding Biomni into the workflows of biotech and pharma is essential to turn research potential into tangible impact.

That’s why we are excited to integrate the open-source environment with Amazon Bedrock AgentCore, bridging the gap from research to production. Looking ahead, we’re also excited about extending these capabilities with the Biomni A1 agent architecture and the Biomni-R0 model, which will unlock even more sophisticated biomedical reasoning and analysis. At the same time, Biomni will remain a thriving open-source environment, where researchers and industry teams alike can contribute tools, share workflows, and push the frontier of biomedical AI together with AgentCore.”

Conclusion

This implementation demonstrates how organizations can use Amazon Bedrock AgentCore to transform biomedical research prototypes into production-ready systems. By integrating Biomni's comprehensive collection of over 150 specialized tools through the AgentCore Gateway service, we illustrate how teams can create enterprise-grade tool sharing infrastructure that scales across multiple research domains.The combination of Biomni's biomedical tools with the enterprise infrastructure of Bedrock AgentCore organizations can build research agents that maintain scientific rigor while meeting production requirements for security, scalability, and observability. Biomni's diverse tool collection—spanning genomics, proteomics, and clinical databases—exemplifies how specialized research capabilities can be centralized and shared across research teams through a secure gateway architecture.

To begin building your own biomedical research agent with Biomni tools, explore the implementation by visiting our GitHub repository for the complete code and documentation. You can follow the step-by-step implementation guide to set up your research agent with local tools, gateway integration, and Bedorck AgentCore deployment. As your needs evolve, you can extend the system with your organization's proprietary databases and analytical tools. We encourage you to join the growing environment of life sciences AI agents and tools by sharing your extensions and improvements.

About the authors

Hasan Poonawala is a Senior AI/ML Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Hasan helps design, deploy and scale Generative AI and Machine learning applications on AWS. He has over 15 years of combined work experience in machine learning, software development and data science on the cloud. In his spare time, Hasan loves to explore nature and spend time with friends and family.

pidemal Pierre de Malliard is a Senior AI/ML Solutions Architect at Amazon Web Services and supports customers in the Healthcare and Life Sciences Industry. He is currently based in New York City.

Necibe Ahat is a Senior AI/ML Specialist Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Necibe helps customers to advance their generative AI and machine learning journey. She has a background in computer science with 15 years of industry experience helping customers ideate, design, build and deploy solutions at scale. She is a passionate inclusion and diversity advocate.

Kexin Huang is a final-year PhD student in Computer Science at Stanford University, advised by Prof. Jure Leskovec. His research applies AI to enable interpretable and deployable biomedical discoveries, addressing core challenges in multi-modal modeling, uncertainty, and reasoning. His work has appeared in Nature Medicine, Nature Biotechnology, Nature Chemical Biology, Nature Biomedical Engineering and top ML venues (NeurIPS, ICML, ICLR), earning six best paper awards. His research has been highlighted by Forbes, WIRED, and MIT Technology Review, and he has contributed to AI research at Genentech, GSK, Pfizer, IQVIA, Flatiron Health, Dana-Farber, and Rockefeller University.

Artificial Intelligence