
AI21 Labs Trains 178-Billion-Parameter Language Model Using Amazon EC2 P4d Instances, PyTorch


AI21 Labs uses machine learning to develop language models focused on understanding meaning, and in 2021 it set out to train Jurassic-1 Jumbo, an autoregressive language model with 178 billion parameters. Developers who register for the beta get access to Jurassic-1 Jumbo and can immediately start customizing the model for their use cases. The software startup wanted to train the model efficiently, so it looked to Amazon Web Services (AWS) and built a solution using Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides secure, resizable compute capacity in the cloud. Choosing Amazon EC2 gave the company control over the training process, including node allocation.

For powerful compute and networking functionality, the company selected Amazon EC2 P4d Instances, which deliver high throughput and low-latency networking for machine learning training and high-performance computing applications in the cloud. Using Amazon EC2 P4d Instances, AI21 Labs gained the required performance and memory by distributing model training across hundreds of GPUs to deliver natural language processing as a service through its Jurassic-1 Jumbo model. Because the company now trains and controls its own large-scale model, it can work toward developing new models at the same scale and innovate with greater ease.

Members of the AI21 team gather in their open office for a meeting

“Amazon EC2 P4d Instances offer 400 Gbps high-performance networking on EFA. The GPU-to-GPU networking speed directly impacts the ability to scale efficiently and remain cost effective when scaling to hundreds of GPUs.” 

Opher Lieber
Technical Lead for Jurassic, AI21 Labs

Powering Language Model Training at Scale

Founded in 2017, AI21 Labs pursues a hybrid mission: conducting natural language processing research and developing artificial intelligence–powered products for reading and writing. Its flagship product, Wordtune, is an intelligent writing and editing assistant that launched in October 2020 and has grown to support nearly one million users. Its other primary product, AI21 Studio, offers API access to the company’s Jurassic-1 language models as well as custom model development. “We are part of a small cohort of companies that are offering language models as a service, empowering anyone from independent developers to multinational enterprises to build apps and services on top of advanced natural language processing technology,” says Yoav Shoham, cofounder and co-CEO at AI21 Labs. “Additionally, we’re pursuing scientific innovations and tackling software engineering challenges posed by models of this size and complexity.”

To train its first deep learning megamodel efficiently and support the model’s high scaling and performance needs, AI21 Labs needed powerful compute, efficient networking speed, and access to technical support and guidance. For these reasons, in early 2021 the company began implementing a solution on AWS, opting to train the model using Amazon EC2 P4d Instances. These instances are deployed in hyperscale clusters called Amazon EC2 UltraClusters, providing more than 4,000 NVIDIA A100 GPUs, petabit-scale nonblocking networking infrastructure, and high throughput, low-latency storage. 

The company further optimized its approach using low-latency, high-bandwidth GPUDirect RDMA over Elastic Fabric Adapter (EFA), a network interface for Amazon EC2 instances that lets customers run applications requiring high levels of internode communications at scale on AWS. Because of the size of the model, the team needed parallel processing to achieve an efficient training time, so it relied on the networking capabilities of AWS to support its distributed training and model parallelism. “Amazon EC2 P4d Instances offer 400 Gbps high-performance networking on EFA,” says Opher Lieber, Jurassic technical lead at AI21 Labs. “The GPU-to-GPU networking speed directly impacts the ability to scale efficiently and remain cost effective when scaling to hundreds of GPUs.”
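To see why a model of this size must be sharded across many GPUs, a back-of-the-envelope calculation helps. The sketch below is illustrative, not AI21 Labs’ actual configuration: it assumes fp16 weights (2 bytes per parameter) and the 40 GB memory of the A100 GPUs in P4d instances, and it bounds only the weights; optimizer state, gradients, and activations push the real GPU count far higher.

```python
import math

PARAMS = 178e9          # parameters in Jurassic-1 Jumbo
BYTES_PER_PARAM = 2     # fp16 weights (assumption for illustration)
GPU_MEMORY_GB = 40      # NVIDIA A100 (40 GB variant) in P4d instances

def min_gpus_for_weights(params, bytes_per_param, gpu_mem_gb):
    """Lower bound on GPUs needed just to hold the model weights.

    Optimizer state, gradients, and activations are ignored here,
    which is why real large-scale training uses hundreds of GPUs.
    """
    total_gb = params * bytes_per_param / 1e9
    return math.ceil(total_gb / gpu_mem_gb)

# 178B fp16 parameters occupy ~356 GB, so even the weights alone
# cannot fit on a single 40 GB GPU:
print(min_gpus_for_weights(PARAMS, BYTES_PER_PARAM, GPU_MEMORY_GB))  # 9
```

Once the weights are split across GPUs, every forward and backward pass requires cross-GPU communication, which is why the 400 Gbps EFA networking Lieber describes matters so much for scaling efficiency.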

Hitting Key Training Milestones on AWS

AI21 Labs began by bringing up its code base on Amazon EC2 P4d Instances activated for EFA. Then it tested and verified the performance and efficient scaling of its multinode training approach. Next the team launched a quick training of the full-size model—which uses hundreds of GPUs—to verify function and performance. From there it was able to begin training its Jurassic-1 Jumbo model on AWS. For orchestration, the company chose an in-house solution that allocates instances using an AWS software development kit—the AWS SDK for Python (Boto3), which makes it easy to integrate a customer’s Python application, library, or script with various AWS services.
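AI21 Labs’ in-house orchestrator is not public, but allocating EFA-enabled P4d instances with the AWS SDK for Python (Boto3) might look roughly like the following minimal sketch. The AMI, subnet, security group, and region values are placeholders, not details from the source.

```python
def p4d_launch_params(count, subnet_id, security_group_id):
    """Build a run_instances request for EFA-enabled P4d nodes.

    Attaching the network interface with InterfaceType "efa" gives the
    instance the Elastic Fabric Adapter used for GPU-to-GPU traffic.
    """
    return {
        "InstanceType": "p4d.24xlarge",
        "MinCount": count,
        "MaxCount": count,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "InterfaceType": "efa",
            "SubnetId": subnet_id,
            "Groups": [security_group_id],
        }],
    }

if __name__ == "__main__":
    import boto3  # AWS SDK for Python

    # Placeholder IDs; a real orchestrator would supply its own.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.run_instances(
        ImageId="ami-placeholder",
        **p4d_launch_params(2, "subnet-placeholder", "sg-placeholder"),
    )
```

Keeping the request construction in a pure function, as above, lets an orchestrator test its allocation logic without touching live AWS resources.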

For storage, AI21 Labs chose Amazon Simple Storage Service (Amazon S3), which offers industry-leading scalability, data availability, security, and performance. “We were able to reach very good performance on Amazon S3 using help from the AWS team—so it was an easy choice because of both performance and price,” says Lieber. The team uses Amazon S3 buckets to store and load checkpoints efficiently and in a distributed way. To log training progress and events, the team uses Amazon CloudWatch, a monitoring and observability service. 
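One common pattern for the distributed checkpointing described above is for each rank to write its own shard to Amazon S3, so uploads proceed in parallel rather than funneling through one node. The sketch below illustrates that pattern under assumptions of our own; the bucket name and key layout are invented, not AI21 Labs’ actual scheme.

```python
def shard_key(step, rank, world_size):
    """S3 object key for one rank's checkpoint shard.

    Zero-padded fields keep keys sortable, e.g.
    checkpoints/step-0001000/shard-0000-of-0256.pt
    """
    return f"checkpoints/step-{step:07d}/shard-{rank:04d}-of-{world_size:04d}.pt"

def save_shard(s3, bucket, step, rank, world_size, payload):
    """Upload one rank's serialized checkpoint shard to S3."""
    s3.put_object(
        Bucket=bucket,
        Key=shard_key(step, rank, world_size),
        Body=payload,
    )

if __name__ == "__main__":
    import boto3

    # Each rank calls this with its own rank number, so the hundreds of
    # GPUs involved in training can checkpoint concurrently.
    s3 = boto3.client("s3")
    save_shard(s3, "example-training-bucket",
               step=1000, rank=0, world_size=256, payload=b"...")
```

On restore, each rank downloads only its own shard, keeping checkpoint load times roughly constant as the GPU count grows.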

While implementing its solution, AI21 Labs took advantage of support from AWS. Its team consulted AWS specialists who provided guidance on service-level, architectural, and hardware-related questions and concerns. Moreover, the company improved the performance of Jurassic-1 Jumbo using PyTorch on AWS, an open-source deep learning framework that makes it easy to develop machine learning models and deploy them to production. 

AI21 Labs completed training over the course of several months, concluding in June 2021. The new megamodel, an autoregressive language model, has 178 billion parameters, comparable in size to the largest competing models. It also offers a differentiated 256,000-item vocabulary that provides expanded text representation capabilities as well as support for named entities. The company now offers Jurassic-1 Jumbo (along with its counterpart, Jurassic-1 Large, which has 7 billion parameters) in open beta through the company’s AI21 Studio offering. Using the service, a wide range of developers can build products on the Jurassic-1 Jumbo model, and AI21 Labs has already seen adoption across many industries, including marketing, content creation, gaming, medical research, automotive, telecommunications, and finance.

Using Its Model to Innovate with Agility

Because AI21 Labs owns and has direct access to its model, it can adapt and innovate without depending on third parties and can explore ongoing innovation goals, which are a key part of its mission. AI21 Labs is currently prototyping additional models, which it also plans to train at scale. “Training and owning our own megamodels will continue to be a critical differentiating factor in both our Wordtune and AI21 Studio offerings,” says Shoham.

About AI21 Labs

Headquartered in Tel Aviv, Israel, AI21 Labs develops large-scale language models focused on understanding semantics and context and delivers artificial intelligence–based writing assistance through its flagship product, Wordtune, and reading assistance through its AI-powered reading tool, Wordtune Read.

Benefits of AWS

  • Scaled to hundreds of GPUs efficiently and cost effectively
  • Supported distributed training and model parallelism on PyTorch
  • Built knowledge for developing models at scale
  • Trained its own model, supporting innovation and agility
  • Developed a language model with 178 billion parameters and a 256,000-item vocabulary
  • Supports application development using its model

AWS Services Used

Amazon EC2 P4d Instances

Amazon EC2 P4d instances deliver the highest performance for machine learning (ML) training and high-performance computing (HPC) applications in the cloud. P4d instances are powered by the latest NVIDIA A100 Tensor Core GPUs and deliver industry-leading high-throughput and low-latency networking. 

Learn more »

Elastic Fabric Adapter

Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale on AWS. Its custom-built operating system (OS) bypass hardware interface enhances the performance of inter-instance communications, which is critical to scaling these applications. 

Learn more »

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. 

Learn more »

Get Started

Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.