Overview
*** This offering is only available via private offer - please contact your NVIDIA sales representative to initiate the process ***
NVIDIA Run:ai delivers an enterprise-grade AI workload orchestration platform that maximizes the efficiency and scalability of your AWS GPU infrastructure. Purpose-built for Kubernetes environments and optimized for AI/ML workloads, Run:ai enables AWS customers to achieve greater throughput, improved utilization, and faster model development - all while maintaining tight control over resources and costs.
Run:ai abstracts the complexity of managing GPU resources and accelerates time-to-insight for data science teams, while providing DevOps and IT stakeholders with robust tools for visibility, policy enforcement, and cost optimization. Run:ai deploys rapidly and integrates with AWS-native services such as Amazon EKS, Amazon EC2 GPU instances, and AWS Identity and Access Management (IAM).
Key capabilities:
- Flexible GPU Scaling for AI Workloads: Seamlessly scale GPU resources up or down across AWS environments to match the dynamic needs of training, tuning, and inference.
- Automated GPU Orchestration: Ensure optimal resource allocation and scheduling for multiple workloads using intelligent policies that minimize idle time.
- Team-Based Resource Governance: Use role-based access control and team-level quotas to ensure isolation, compliance, and shared infrastructure visibility across AI teams.
- Integration with AWS Services: Deploy alongside Amazon EKS and integrate with services like Amazon S3, CloudWatch, and IAM for a unified operational experience.
- MLOps Workflow Compatibility: Native support for JupyterHub, Kubeflow, MLflow, and other AWS-hosted tools to support end-to-end machine learning pipelines.

With NVIDIA Run:ai, organizations can rapidly onboard AI teams, democratize access to GPU infrastructure, and accelerate innovation while keeping infrastructure flexible and cost-effective. The solution is ideal for enterprises looking to scale AI initiatives without the burden of managing complex infrastructure manually.
Note: This AMI includes the KAI Scheduler, an open-source GPU workload scheduler from NVIDIA. KAI provides a lightweight preview of a limited subset of the capabilities available in Run:ai's full enterprise platform.
Highlights
- Optimize GPU Usage at Scale: Run:ai eliminates idle GPUs by enabling fractional sharing and dynamic allocation, maximizing hardware efficiency across teams and workloads.
- Purpose-Built AI Scheduling: The Run:ai intelligent scheduler is designed specifically for AI workloads, using techniques like gang scheduling and preemption to efficiently manage complex training and inference jobs.
- Centralized Hybrid Control: Manage all GPU resources - on-prem, cloud, or hybrid - from a single control plane with full visibility, policy enforcement, and multi-tenant support.
Details
Pricing
| Dimension | Description | Cost/unit |
|---|---|---|
| GPU | Run:ai Annual Plan per GPU | $2,600.00 |
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Additional details
Usage instructions
Prerequisites
- An AWS account with permissions to launch Amazon EC2 instances and manage IAM roles.
- Access to supported Amazon EC2 GPU instance types.
- An existing Kubernetes cluster (Amazon EKS).
- Network connectivity that allows the Kubernetes cluster to communicate with the NVIDIA Run:ai control plane.
- kubectl installed on a machine with access to the Kubernetes cluster (a quick connectivity check is sketched below).
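As a sanity check of these prerequisites, something like the following can be run from the kubectl workstation. This is a minimal sketch; the cluster name and region are hypothetical placeholders.

```bash
# Point kubectl at the target EKS cluster (name and region are placeholders)
aws eks update-kubeconfig --region us-east-1 --name my-eks-cluster

# Confirm the cluster is reachable and list nodes with their instance types
kubectl get nodes -L node.kubernetes.io/instance-type
```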
Deployment
- Accept the AWS Marketplace Private Offer for this product.
- Select a supported Amazon EC2 GPU instance type.
- Configure networking, IAM role, and security groups as required.
- Complete the instance launch (an illustrative AWS CLI command follows these steps).
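Steps 2-4 can also be scripted with the AWS CLI. The command below is only an illustrative sketch: the AMI ID, instance type, and network identifiers are hypothetical placeholders, not values taken from this listing.

```bash
# Launch a GPU instance from the Marketplace AMI (all IDs are placeholders)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g5.2xlarge \
  --iam-instance-profile Name=runai-node-profile \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --key-name my-keypair \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=runai-node}]'
```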
Adding a Kubernetes Cluster to Run:ai
Before installing the NVIDIA Run:ai cluster, verify that system and network requirements are met.
- Log in to the NVIDIA Run:ai platform.
- If this is the first cluster, the New Cluster form opens automatically. Otherwise, navigate to Resources and click + New Cluster.
- Enter a unique name for the cluster.
- (Optional) Select the NVIDIA Run:ai cluster version (latest by default).
- Enter the cluster URL (fully qualified domain name).
- Click Continue.
Installing the NVIDIA Run:ai Cluster
- Review the installation instructions presented in the platform.
- Run the provided installation commands on the target Kubernetes cluster (a representative Helm sequence is sketched below).
Detailed installation instructions can be found here: https://run-ai-docs.nvidia.com/self-hosted/getting-started/installation/install-using-helm/helm-install
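The platform generates the exact commands, including the chart repository URL and cluster-specific values, when the cluster is created. The sequence below is only a sketch of the general Helm flow described in the linked documentation; the repository URL, chart name, and --set keys are hypothetical placeholders.

```bash
# Add the Run:ai chart repository (URL is a placeholder; use the one shown
# in the platform's installation instructions)
helm repo add runai https://charts.example.com/runai
helm repo update

# Install the cluster component with the values issued for your cluster
# (release/chart names and --set keys here are illustrative placeholders)
helm upgrade --install runai-cluster runai/runai-cluster \
  --namespace runai --create-namespace \
  --set controlPlane.url=https://runai.example.com \
  --set cluster.url=https://cluster.example.com

# Verify that the Run:ai pods come up
kubectl get pods -n runai
```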
For details about KAI Scheduler open source: https://github.com/NVIDIA/KAI-Scheduler
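For a taste of the bundled scheduler, a pod can opt in to KAI by setting its schedulerName, as in the sketch below. The queue label and scheduler name follow the examples in the repository above; verify both against the current README, as they may differ by version, and the queue name itself is a placeholder.

```bash
# Submit a one-shot GPU pod scheduled by KAI
# (queue name is a placeholder; the label key and schedulerName should be
# verified against the KAI-Scheduler README)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kai-demo
  labels:
    kai.scheduler/queue: default-queue
spec:
  schedulerName: kai-scheduler
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```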
Support
Vendor support
NVIDIA Run:ai customers receive Enterprise Business Standard Support for this solution.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.