AWS and Nvidia Build a Supercomputer with 16,384 Superchips, Team Up for Generative AI Infrastructure


Although many companies are developing accelerators for artificial intelligence (AI) workloads, Nvidia's CUDA platform is currently unrivaled in its AI software support. As a result, demand for Nvidia-based AI infrastructure is high. To meet that demand, Amazon Web Services and Nvidia have entered a strategic partnership under which AWS will offer Nvidia-based infrastructure for generative AI. The two companies will partner on several key projects.

"Today, we offer the widest range of Nvidia GPU solutions for workloads including graphics, gaming, high performance computing, machine learning, and now, generative AI," said Adam Selipsky, CEO at AWS. "We continue to innovate with Nvidia to make AWS the best place to run GPUs, combining next-gen Nvidia Grace Hopper Superchips with AWS's powerful EFA networking, EC2 UltraClusters' hyper-scale clustering, and Nitro's advanced virtualization capabilities."

Project Ceiba is a cornerstone of this collaboration, aiming to create the world's fastest GPU-powered AI supercomputer, hosted by AWS and available exclusively to Nvidia. This ambitious project will integrate 16,384 Nvidia GH200 Superchips (using the GH200 NVL32 solution, which packs 32 GH200 Superchips with 19.5 TB of unified memory) and is set to offer a staggering 65 'AI ExaFLOPS' of processing power. Nvidia will use the supercomputer for its own generative AI research and development projects.
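For scale, the headline figures are internally consistent. A quick back-of-the-envelope check (the per-chip result lines up with the roughly 4 PFLOPS of sparse FP8 compute Nvidia quotes for a GH200; the constants below are taken from the numbers above, not from any official bill of materials):

```python
# Sanity-check Project Ceiba's headline numbers.
TOTAL_SUPERCHIPS = 16_384
CHIPS_PER_NVL32 = 32       # one GH200 NVL32 node packs 32 Superchips
TOTAL_AI_EXAFLOPS = 65     # low-precision "AI FLOPS" marketing figure

# Number of NVL32 nodes needed to reach 16,384 Superchips.
nvl32_nodes = TOTAL_SUPERCHIPS // CHIPS_PER_NVL32

# Implied per-Superchip throughput in PFLOPS (1 ExaFLOP = 1000 PFLOPS).
pflops_per_chip = TOTAL_AI_EXAFLOPS * 1000 / TOTAL_SUPERCHIPS

print(nvl32_nodes)                # 512 NVL32 nodes
print(round(pflops_per_chip, 2))  # 3.97 PFLOPS per GH200
```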

The Nvidia DGX Cloud hosted on AWS is another major component of the partnership. This AI-training-as-a-service platform is the first commercially available instance to incorporate the GH200 NVL32 machine with 19.5 TB of unified memory. The platform provides developers with the largest shared memory available in a single instance, significantly accelerating the training process for advanced generative AI and large language models, potentially exceeding 1 trillion parameters.

In addition, AWS will be the first to offer a cloud-based AI supercomputer based on Nvidia's GH200 Grace Hopper Superchips. This unique configuration will connect 32 Grace Hopper Superchips per instance using NVLink, giving each instance 4.5 TB of HBM3e memory. It will scale up to thousands of GH200 Superchips connected with Amazon's EFA networking and supported by advanced virtualization (AWS Nitro System) and hyper-scale clustering (Amazon EC2 UltraClusters).
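The 4.5 TB figure follows directly from the per-chip HBM3e capacity; a minimal sketch, assuming each GH200 in this configuration carries the same 141 GB HBM3e stack as the H200 GPU:

```python
# Aggregate HBM3e per 32-Superchip instance,
# assuming 141 GB of HBM3e per GH200 Superchip.
CHIPS_PER_INSTANCE = 32
HBM3E_PER_CHIP_GB = 141

total_hbm3e_gb = CHIPS_PER_INSTANCE * HBM3E_PER_CHIP_GB
print(total_hbm3e_gb)  # 4512 GB, i.e. ~4.5 TB per instance
```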

The collaboration will also introduce new Nvidia-powered Amazon EC2 instances. These instances will feature H200 Tensor Core GPUs with up to 141 GB of HBM3e memory for large-scale generative AI and high-performance computing (HPC) workloads. Additionally, G6 and G6e instances, equipped with Nvidia L4 and L40S GPUs, respectively, are designed for a wide array of applications ranging from AI fine-tuning to 3D workflow development, and can leverage Nvidia Omniverse for creating AI-enabled 3D applications.

Finally, the collaboration will bring Nvidia's advanced software to AWS to speed up generative AI development. This includes the NeMo LLM framework and NeMo Retriever for creating chatbots and summarization tools, and BioNeMo for accelerating drug discovery.

"Generative AI is transforming cloud workloads and putting accelerated computing at the foundation of diverse content generation," said Jensen Huang, founder and CEO of Nvidia. "Driven by a common mission to deliver cost-effective state-of-the-art generative AI to every customer, Nvidia and AWS are collaborating across the entire computing stack, spanning AI infrastructure, acceleration libraries, foundation models, to generative AI services."