Code Ocean and AWS transform reproducible scientific research with agentic AI

By: Ankith Ede, Solutions Architect – AWS
By: Jake Valsamis, Sr Product Manager – Code Ocean
By: Gokhul Srinivasan, Sr Partner Solutions Architect – AWS
By: Simon Adar, CEO – Code Ocean
By: Daniel Koster, VP of Product – Code Ocean

Code Ocean

Introduction

Cloud computing has transformed the life sciences industry, offering scientists access to vast computational resources: petabytes of storage and high performance compute. Yet this digital transformation comes with a paradox. Effectively and securely using the cloud requires engineering and DevOps skills that fall outside of the core expertise of most scientists. This gap has created pressure on research IT teams, who must balance the demands of infrastructure management and compliance while enabling scientists to focus on research.

As the pace of AI innovation accelerates, new demands are being placed on research IT because the safe and compliant use of these technologies hinges on IT-led governance. Generative and agentic AI accelerate scientific discovery by giving researchers the tools to analyze complex datasets and build workflows at scale using natural language. Although scientists are eager to adopt these tools, research IT must ensure secure implementation and meet both organizational and regulatory standards. Without adequate infrastructure, AI implementations risk security, IP exposure, inconsistent reproducibility, and lack of auditability.

In this post, we explore how organizations can adopt Code Ocean to empower research IT and equip scientists with a scalable, reproducible research platform. We’ll also highlight a real-world case study from the Allen Institute and introduce how research IT delivers secure, compliant access to agentic AI with Trusted Agents, which we recently launched in Code Ocean 4.0.

Scaling research IT with a FAIR and reproducible science platform

Code Ocean is a cloud-centered computational research platform that scientists can use to access cloud resources self-sufficiently while maintaining security, compliance, and cost control. Deployed within the customer’s Amazon Virtual Private Cloud (Amazon VPC) endpoint, Code Ocean provides built-in access to data and scalable compute for researchers to run analyses, freeing research IT from day-to-day support. Code Ocean automates the provisioning of cloud computing, which means scientists can focus on discovery rather than managing infrastructure.

At the core of the Code Ocean platform are integrated computational best practices. Code is automatically versioned with Git, so that every change is tracked and reversible. All analyses and pipelines are designed to be immutable and fully reproducible, which guarantees that any result can be recreated exactly as it was originally generated. Data lineage is captured for every result, providing a transparent record of how each output was produced from its source inputs. By working in Code Ocean all data, analyses, and pipelines are automatically findable, accessible, interoperable, and reusable (FAIR). Code Ocean also inherently supports the tools scientists already use, including RStudio, Jupyter, Code Server, Nextflow, and MLflow, which means business users can operate without dependency on research IT. The platform incorporates cost optimizations by default: storage is automatically tiered, idle compute resources are detected and shut down, and high-resolution budget monitoring provides clear visibility into spend. By embedding these capabilities into its platform, Code Ocean reduces the operational burden on research IT while giving scientists the flexibility to innovate and collaborate at scale.

From risk to readiness with Trusted Agents

Organizations are increasingly exploring the power of agentic AI to automate and accelerate the process of discovery. However, organizations must still consider secure, compliant, and documented usage before applying agent workflows in a highly regulated industry such as life sciences. It was these concerns that motivated Code Ocean’s latest release, Trusted Agents in version 4.0, which now enables research IT to bring secure, auditable AI to their organization. With this major release, Code Ocean provisions and manages AI infrastructure securely within the customer’s virtual private cloud (vpc), keeping sensitive data within the customer’s environment.

Code Ocean uses AWS Batch to run workflow pipelines and system jobs and Amazon Elastic Compute Cloud (Amazon EC2) instances for system services and workers. This provides built-in guardrails so scientists can use generative AI confidently within a compliant, controlled framework and work more independently with intelligent agents that guide analysis and assist with troubleshooting.

Secure AI infrastructure

Built on Amazon Bedrock, Code Ocean 4.0 uses large language model (LLMs) such as Amazon Nova Pro and Claude Sonnet by Anthropic in Amazon Bedrock, and supports agents built using agentic frameworks such as Strands Agents. This approach means that developers and scientists can safely use the latest AI capabilities without compromising security or compliance.

The following architecture diagram shows two EC2 instances with Code Ocean Amazon Machine Image (AMI) handling system services and worker tasks. LLM flows go to Amazon Bedrock. AWS Batch manages pipelines and system jobs, interacting with Amazon Elastic Container Service (Amazon ECS). Data is stored across different AWS data storage services.

Figure 1: Code Ocean AWS architecture for scalable workloads

Aqua: Trusted AI for reproducible science

The release introduces Aqua, a natural language AI agent designed specifically for scientists. Purpose-built to work within the context of Code Ocean, Aqua combines the reasoning power of Claude Sonnet models by Anthropic in Amazon Bedrock with the ability to retrieve platform-specific knowledge and execute actions directly. Using the Code Ocean Model Context Protocol (MCP), Aqua is equipped with 18 specialized tools for managing, searching, and executing tasks across platform resources. To ensure compliance and reproducibility, Aqua operates entirely within Code Ocean’s controlled environment, with several factors that underscore Aqua’s ability to generate fully tracked, reproducible, and accountable results:

Data assets accessed or created by Aqua are immutable.
Code generated by Aqua is executed within a Capsule for version control and reproducibility.
Results include full provenance, making the AI-generated output traceable to its underlying code, data, and environment.
Actions that Aqua performs are attributed to the specific user on whose behalf it operates, maintaining a comprehensive, auditable history.

The following screenshot shows the Code Ocean platform UI. On the left, the user asks Aqua the following question about the protein structure displayed on the right:

“What is the solvent accessible surface area of the 1GFL protein I’m viewing?”

With awareness of what is displayed in the platform UI, Aqua has all the necessary context to answer the question.

Figure 2: Aqua, Code Ocean’s AI assistant, embedded in the platform UI

The following screenshot shows the Aqua UI. On the left, an AI generated result of a single cell analysis is displayed. The right side shows the result’s lineage graph, which traces data flow from input, through capsule execution, to final reproducible result.

Figure 3: Aqua’s automated provenance and lineage tracking for AI-generated results

For developers, the release introduces Cline, a secure, context-aware coding agent embedded directly into Visual Studio Code (VSCode). Preconfigured through Amazon Bedrock, Cline provides LLM-powered coding assistance within a familiar development environment and enables integration of custom MCP servers into the Code Ocean platform.

Allen Institute scaled reproducible research with Code Ocean

Driven by the need for a secure, reproducible research platform, the Allen Institute, a Seattle-based research institute, adopted Code Ocean. Code Ocean accelerated Allen Institute’s transition to AWS and provided scientists with access to a scalable computing environment. By using Code Ocean’s integration with AWS services, researchers now consume over 1.75 million CPU hours and generate more than 150 TB of reproducible results each month. Using Code Ocean to manage scalability, cost controls, and compliance, the institute has scaled its data processing from terabytes to petabytes annually.

“Code Ocean makes it easy for our scientists to do their work reproducibly. New users to the platform can get far with just a little support; this gives our engineers time to focus on domain-specific challenges.”

– Dr. David Feng, Senior Director of Scientific Computing, the Allen Institute for Neural Dynamics

Three years into their partnership, the Allen Institute has already generated a data corpus on Code Ocean of over two petabytes, running over 450,000 computations totaling over 1,500,000 hours. They now support over 250 researchers across disciplines with the part-time effort of only five members of the research IT team.

Code Ocean: The trusted foundation for AI-driven science

The convergence of cloud computing and AI presents both a transformative opportunity and a growing challenge for life sciences organizations. As researchers strive to harness the power of these technologies, research IT needs to ensure security, compliance, and reproducibility at scale. Code Ocean bridges this divide, empowering scientists to innovate independently while allowing research IT to maintain full governance and control. The Allen Institute’s experience shows how research IT can use Code Ocean to support hundreds of researchers and petabyte-scale workloads through automation and easy access to scalable infrastructure.

Code Ocean Trusted Agents extends this foundation, providing scientists with secure, auditable access to AI capabilities within their own AWS environment. Aqua intelligent research assistant accelerates discovery with traceable and reproducible agentic analysis. It also acts as a real-time support agent, diagnosing build issues, interpreting AWS errors, and guiding users through complex workflows without IT intervention. Aqua frees IT teams to focus on infrastructure scaling and scientific innovation by cutting operational costs and boosting self-sufficiency.

Code Ocean delivers a future where every scientific breakthrough is reproducible, where researchers spend 100% of their time on discovery, and AI accelerates rather than complicates the scientific process.

Experience the future of secure, scalable science

Discover how Code Ocean empowers research IT and scientists to innovate faster with Trusted Agents. Book a demo today or visit www.codeocean.com to learn more.

Code Ocean – AWS Partner Spotlight

Code Ocean is an AWS Advanced Technology Partner that provides a Computational Science platform for life science Research and Development teams who want a fast and efficient way to start, scale, collaborate, and reproduce computational research

Contact Code Ocean | Partner Overview | AWS Marketplace

AWS Partner Network (APN) Blog

Code Ocean and AWS transform reproducible scientific research with agentic AI

Introduction

Scaling research IT with a FAIR and reproducible science platform

From risk to readiness with Trusted Agents

Secure AI infrastructure

Aqua: Trusted AI for reproducible science

Allen Institute scaled reproducible research with Code Ocean

Code Ocean: The trusted foundation for AI-driven science

Experience the future of secure, scalable science

Code Ocean – AWS Partner Spotlight

Resources

Follow

Learn

Resources

Developers

Help