2024

Recursal AI Trains AI Models for 100+ Languages Using Amazon SageMaker HyperPod

Learn how Recursal AI scales foundation model training and inference using Amazon SageMaker HyperPod and AWS Inferentia.

Key Outcomes

1 million

tokens per second processed during a 15-day training run

7,500

tokens per second output achieved during inference

1%

of the traditional Transformer architecture’s footprint during inference

Nearly 1:1

scaling unlocked during training

Overview

Although generative artificial intelligence (generative AI) grabs the headlines, it requires significant budgets for compute time and remains largely out of reach for people who speak languages other than English. That’s why Recursal AI (Recursal) aims to make AI accessible for everyone, regardless of their native language or economic status. The company uses a novel language model architecture—receptance weighted key value (RWKV)—that drastically reduces the compute and energy requirements for text generation while significantly expanding the set of supported languages.

However, training a foundation model is no small feat: beyond expertise in machine learning, it requires enormous effort to prepare datasets and manage infrastructure throughout model training. Using Amazon Web Services (AWS), Recursal trained its recent model, EagleX, with an extremely lean infrastructure team. Now, Recursal has robust infrastructure and a strong foundation for training its next generation of models, making generative AI affordable and energy efficient for its customers.


About Recursal AI

Founded in 2023, Recursal AI is a startup that specializes in open-source artificial intelligence (AI) and has a mission to make AI more affordable and accessible by increasing cost efficiency and reducing language barriers.

Opportunity | Training Foundation Models with a Lean Team Using Amazon SageMaker HyperPod

Recursal grew out of the RWKV community project, Eagle, which was released as an open-source large language model (LLM) with 7.52 billion parameters. The startup was founded in December 2023 to scale its LLM so that it could work in over 100 different languages. Initially, Recursal experimented with different model training services, but it ran into issues with manual management, scalability, and networking efficiency. To address these challenges, Recursal turned to AWS to use its cost-effective and flexible services for large-scale training and inference.

The company first set up training pipelines using Amazon SageMaker HyperPod, which reduces the time to train foundation models by up to 40 percent with a purpose-built infrastructure for distributed training at scale. “The whole process was streamlined,” says Nathan Wilce, infrastructure/data lead at Recursal. “Using SageMaker HyperPod, we can take advantage of cluster resiliency features that identify and automatically recover training jobs from the last saved checkpoint in the event of a hardware failure.”
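The recovery behavior Wilce describes can be sketched as a generic resume-from-checkpoint training loop. This is an illustrative pattern, not SageMaker HyperPod's actual API: the directory layout, file naming, and function names are all hypothetical, and the training step itself is stubbed out.

```python
import glob
import os


def latest_checkpoint(ckpt_dir):
    """Return the most recent checkpoint path in ckpt_dir, or None."""
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "step_*.pt")))
    return ckpts[-1] if ckpts else None


def resume_step(ckpt_dir):
    """Step to resume from: the last saved step, or 0 for a fresh run."""
    ckpt = latest_checkpoint(ckpt_dir)
    if ckpt is None:
        return 0
    name = os.path.basename(ckpt)
    return int(name.removeprefix("step_").removesuffix(".pt"))


def train(ckpt_dir, total_steps, save_every):
    """Run (or resume) a training loop, checkpointing every save_every steps."""
    start = resume_step(ckpt_dir)
    for step in range(start + 1, total_steps + 1):
        # ... forward/backward pass on the cluster would run here ...
        if step % save_every == 0:
            # Writing checkpoints to shared storage (e.g. FSx for Lustre)
            # lets a replacement node pick up where a failed one left off.
            path = os.path.join(ckpt_dir, f"step_{step:07d}.pt")
            open(path, "w").close()  # stand-in for a real state dump
    return start
```

The point of the pattern is that a restarted job loses at most `save_every` steps of work; HyperPod automates the detection, node replacement, and restart around a loop like this one.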

The startup also delivers cost savings to its customers by using Amazon FSx for Lustre, which provides fully managed shared storage with the scalability and performance of the popular Lustre file system. “High-speed storage is not as expensive as GPUs,” says Eugene Cheah, CEO of Recursal. “If your training cluster needs to pause for 10 minutes to save a checkpoint, that comes with a trade-off, and a lot of people don’t see that.” Using Amazon FSx with SageMaker HyperPod for model training, Recursal drives down the cost of its model training by reducing idle GPU time, thereby delivering cost-effective generative AI to its customers.
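The trade-off Cheah points to can be made concrete with back-of-envelope arithmetic: a 10-minute checkpoint pause on a cluster sustaining 1 million tokens per second forgoes 600 million tokens of training progress, plus the idle spend on every instance. The hourly rate below is a hypothetical placeholder for illustration only, not an actual AWS price.

```python
pause_seconds = 10 * 60            # the 10-minute pause from the quote
cluster_tokens_per_sec = 1_000_000 # sustained training throughput
instances = 16                     # ml.p5.48xlarge instances in the cluster
instance_hourly_usd = 55.0         # HYPOTHETICAL rate, for illustration only

# Training progress forgone while the cluster waits on checkpoint I/O.
tokens_forgone = pause_seconds * cluster_tokens_per_sec  # 600M tokens

# Idle spend for one pause across the whole cluster.
idle_cost_usd = instances * instance_hourly_usd * pause_seconds / 3600
```

Fast shared storage shrinks `pause_seconds`, which is why Recursal pairs FSx for Lustre with HyperPod: checkpoint I/O that stalls 128 GPUs is far more expensive than the storage itself.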

Using SageMaker HyperPod, the company increased both the speed and efficiency of model training. Using the nearly one-to-one scaling from SageMaker HyperPod networking, Recursal’s lean team processed 1 million tokens per second on a cluster of 16 ml.p5.48xlarge instances (eight NVIDIA H100 GPUs each) during a 15-day training run. “By the time we compiled one training dataset, the model had already run through the last one,” says Harrison Vanderbyl, chief technology officer at Recursal.
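The headline figures break down cleanly per accelerator. Assuming the throughput was sustained for the full run, the arithmetic from the numbers in this article works out as follows:

```python
instances = 16
gpus_per_instance = 8              # ml.p5.48xlarge: 8x NVIDIA H100
cluster_tokens_per_sec = 1_000_000
run_days = 15

total_gpus = instances * gpus_per_instance                  # 128 GPUs
per_gpu = cluster_tokens_per_sec / total_gpus               # ~7,800 tokens/s/GPU
total_tokens = cluster_tokens_per_sec * run_days * 86_400   # ~1.3 trillion tokens
```

Roughly 7,800 tokens per second per GPU, and on the order of 1.3 trillion tokens over the 15-day run, is what "nearly one-to-one scaling" buys: per-GPU throughput barely degrades as the cluster grows.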

Also, using SageMaker HyperPod, the company benefited from automated hardware recovery. “The real win for us was efficiency: GPUs fail under the heavy compute load of training, and hardware failure means restarting the training run from the last checkpoint. A bigger cluster means more tokens per second but also more hardware, hence more failures and recoveries throughout the run,” says Wilce. “Using SageMaker HyperPod, we changed the training run from a stressful operation needing 24/7 human oversight to a progress bar that we could casually monitor while attending to other parts of the business.”

Recursal’s EagleX model was launched globally on AWS Marketplace, where companies can discover, deploy, and manage software that runs on AWS. With the launch of this 8-billion-parameter model, Recursal plans to expand its model line to larger sizes. “We’re scaling our model architecture to compete with the high-performing open-source models,” says Cheah.

Solution | Running AI Workloads on AWS at 1% of the Traditional Transformer Architecture’s Footprint

Though Recursal’s EagleX model is extremely efficient in its compute demands, the choice of inference hardware can either compound or erode those gains. So Recursal opted to use AWS Inferentia, which delivers high performance at low cost in Amazon EC2 for deep learning and generative AI inference, as its main inference hardware.

“We saw AWS Inferentia as an excellent match for the RWKV architecture,” says Vanderbyl. “It unlocks greater memory usage per core, taking advantage of the incredible parallelism of the EagleX model.” By using AWS Inferentia, the Recursal team has achieved a throughput of 7,500 tokens per second—and it expects to be able to triple or quadruple that number as it scales.
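The memory argument behind this fit can be sketched. A traditional Transformer caches keys and values for every past token, so its memory footprint grows linearly with context length; RWKV instead carries a fixed-size recurrent state per layer, regardless of how many tokens have been processed. The dimensions below are illustrative 7B-class values, not EagleX’s actual configuration, and the RWKV state size is a rough approximation.

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, bytes_per=2):
    # Transformer: keys AND values cached for every past token (fp16).
    return 2 * layers * heads * head_dim * seq_len * bytes_per


def rwkv_state_bytes(layers, d_model, bytes_per=2):
    # RWKV: a fixed-size recurrent state per layer, independent of seq_len.
    # (Rough sketch: the state holds a handful of d_model-sized vectors.)
    state_vectors = 4
    return layers * state_vectors * d_model * bytes_per


# Illustrative 7B-class dimensions (assumed, not EagleX's real config).
L, H, D = 32, 32, 128
d_model = H * D

short = kv_cache_bytes(L, H, D, seq_len=128)     # KV cache at 128 tokens
long = kv_cache_bytes(L, H, D, seq_len=32_768)   # KV cache at 32k tokens
fixed = rwkv_state_bytes(L, d_model)             # same at any context length
```

Growing the context from 128 to 32,768 tokens inflates the Transformer cache 256-fold, while the RWKV state stays constant: this is the property that lets RWKV serve long contexts in a small fraction of a Transformer’s inference footprint and pack more concurrent requests per accelerator.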

Moreover, by using 1 percent of the infrastructure footprint of the traditional Transformer architecture, the startup can offer low costs and significant energy-efficiency gains. “Our models now use significantly fewer resources per request, which helped us create a green model,” says Cheah.

Outcome | Making Generative AI More Accessible Using AWS

By delivering generative AI capabilities on AWS in over 100 different languages, Recursal is increasing the accessibility of technical innovation across the globe. And by lowering the cost per token, it is empowering innovators while saving energy.

Moving forward, Recursal aims to simplify generative AI adoption for nontechnical users. “It takes effort to adapt a model to a use case, whether that’s prompt engineering or model fine-tuning. We’re building tools that make it extremely simple—almost automatic—to fine-tune our models and make generative AI adoption more intuitive for everyone,” says Wilce.

Though Recursal’s team is small, the startup is already serving global clients, and it plans to increase the capabilities of its models using AWS. “We are working alongside AWS to bring our cutting-edge solutions, products, and tools to market,” says Cheah.

Architecture Map

Using SageMaker HyperPod, we can take advantage of cluster resiliency features.

Nathan Wilce

Infrastructure/Data Lead, Recursal AI

Get Started

Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.
