Project Ceiba

Constructing the world's largest AI supercomputer in the cloud

Project Ceiba, a groundbreaking collaboration between AWS and NVIDIA, aims to push the boundaries of artificial intelligence (AI) by constructing the largest AI supercomputer in the cloud. Hosted exclusively on AWS, this cutting-edge supercomputer will power NVIDIA's research and development efforts in AI.

Drive cutting-edge innovation

NVIDIA research and development teams will harness the immense power of Project Ceiba to drive advancements in a wide range of cutting-edge fields, including large language models (LLMs), graphics (images, videos, and 3D generation), simulation, digital biology, robotics, autonomous vehicles, climate prediction with NVIDIA Earth-2, and more. This groundbreaking initiative will propel NVIDIA’s work to advance generative AI, shaping the future of artificial intelligence and its applications across diverse domains.

Scalable AI infrastructure

Project Ceiba will be available via the NVIDIA DGX Cloud architecture. DGX Cloud is an end-to-end, scalable AI platform for developers, offering scalable capacity built on the latest NVIDIA architecture and co-engineered at every layer with AWS. DGX Cloud will be available on AWS later this year, and AWS will be the first Cloud Service Provider to offer NVIDIA Blackwell architecture-based DGX Cloud with GB200s. Project Ceiba is built upon AWS's purpose-built AI infrastructure, engineered to deliver the immense scale, enhanced security, and unparalleled performance necessary for a supercomputer of this magnitude.

414 exaflops of AI processed, around 375 times more powerful than the current world's fastest supercomputer, Frontier

1,600 Gbps per superchip, enabling lightning-fast data transfer and processing

20,736 NVIDIA GB200 Grace Blackwell Superchips, the first-of-its-kind supercomputer

Features

This joint project has set several industry-defining milestones:
Project Ceiba's configuration includes 20,736 NVIDIA GB200 Grace Blackwell Superchips. This first-of-its-kind supercomputer is built using NVIDIA’s latest GB200 NVL72, a liquid-cooled, rack-scale system featuring fifth-generation NVLink, that scales to 20,736 Blackwell GPUs connected to 10,368 NVIDIA Grace CPUs. This supercomputer is capable of processing a massive 414 exaflops of AI, that’s around 375 times more powerful than the current world's fastest supercomputer Frontier. If the entire world's current supercomputing capacity was combined, it wouldn't reach 1% of the computing power represented by 414 exaflops. To put this into perspective, it is equivalent to having over 6 billion of the world's most advanced laptop computers working in tandem. To put this further into perspective, if every human on Earth performed one calculation per second, it would take them over 1,660 years to match what Project Ceiba can achieve in just one second.

Project Ceiba is the first system to leverage the massive scale-out capabilities enabled by fourth-generation AWS Elastic Fabric Adapter (EFA) networking, providing an unprecedented 1,600 Gbps per superchip of low-latency, high-bandwidth networking throughput, enabling lightning-fast data transfer and processing. 
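To make the per-superchip figure concrete, here is a minimal sketch of what 1,600 Gbps of networking throughput means for data movement; the 1 TB checkpoint size is an illustrative assumption, not a figure from the project:

```python
# What 1,600 Gbps per superchip means in practice (illustrative numbers).
GIGABIT = 1e9  # bits

link_bits_per_sec = 1600 * GIGABIT
link_bytes_per_sec = link_bits_per_sec / 8  # 200 GB/s

checkpoint_bytes = 1e12  # assumption: a 1 TB model checkpoint
transfer_seconds = checkpoint_bytes / link_bytes_per_sec

print(f"{link_bytes_per_sec / 1e9:.0f} GB/s -> 1 TB in {transfer_seconds:.0f} s")
# 200 GB/s -> 1 TB in 5 s
```

At that rate a single superchip's link can, in theory, move a terabyte-scale checkpoint in seconds; real-world throughput depends on protocol overhead and congestion.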

Liquid cooling itself is not new; gamers have used it in personal gaming computers for years. AWS had previously chosen air cooling over liquid cooling because it was more cost-effective. To address the power density challenges of Project Ceiba and deliver this unparalleled computing power, AWS has pioneered the use of liquid cooling at scale in data centers, enabling more efficient and sustainable high-performance computing.

Project Ceiba will incorporate industry-leading security features designed to protect even the most sensitive AI data. NVIDIA's Blackwell GPU architecture provides secure communication between GPUs and, integrated with the AWS Nitro System and EFA technologies, enables end-to-end encryption of data for generative AI workloads. This joint solution decrypts and loads sensitive AI data into the GPUs while maintaining complete isolation from the infrastructure operators, and it verifies the authenticity of the applications used to process the data. Using the Nitro System, customers can cryptographically validate their applications to AWS Key Management Service (KMS) and decrypt data only when the necessary checks pass, ensuring end-to-end encryption for their data as it flows through generative AI workloads. Read this blog and visit the secure AI webpage to learn more.
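The attestation-gated decryption pattern described above can be sketched in outline. This is not the actual Nitro or KMS API; it is a hypothetical illustration of the control flow, in which a data key is released only when a workload's measured digest matches an expected measurement:

```python
import hashlib
import hmac
import os

# Hypothetical sketch of attestation-gated key release (NOT the real
# Nitro/KMS API): the key service hands out a data key only when the
# workload's measured digest matches the expected measurement.
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-inference-app-v1").hexdigest()

def release_data_key(measured_digest: str):
    """Return a fresh data key only if the attestation check passes."""
    if hmac.compare_digest(measured_digest, EXPECTED_MEASUREMENT):
        return os.urandom(32)  # fresh 256-bit data key
    return None                # check failed: no key, data stays encrypted

# A trusted workload obtains a key; a tampered one does not.
good = release_data_key(hashlib.sha256(b"trusted-inference-app-v1").hexdigest())
bad = release_data_key(hashlib.sha256(b"tampered-app").hexdigest())
```

The constant-time comparison (`hmac.compare_digest`) mirrors the general principle that attestation checks must not leak information through timing; the real system performs this validation inside AWS infrastructure against cryptographically signed attestation documents.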