
2024

Accelerating Innovation for AI/ML Using AWS Services with Arcee AI

Learn how software startup Arcee AI accelerated innovation using Amazon EC2 Capacity Blocks for ML, AWS Inferentia2, and AWS Trainium.

97.94% reduction

in training cost using AWS Trainium

17 hours to 1.6 hours

reduction in training time using AWS Trainium

Overview

Artificial intelligence (AI) is rapidly advancing, compelling many startups to develop sophisticated models to tackle a variety of use cases. The larger and more complex the model, the more GPUs are required to train it. Arcee AI develops domain-adapted small language models (SLMs) to help enterprises perform specialized tasks, such as analyzing legal documents, and it needs reliable access to GPUs to manage its compute-intensive workloads.

When Arcee AI went to market, it needed a large amount of accelerated compute to train the many SLMs that it continually develops. But because of ongoing GPU shortages, the startup lacked the compute required to train its models effectively and bring them into production. To navigate this obstacle, Arcee AI implemented a system for flexible compute reservation on Amazon Web Services (AWS) and adopted AI chips from AWS to optimize price performance.


Opportunity | Using AWS Services to Train Language Models for Arcee AI

Founded in 2023, Arcee AI is a seed-stage startup that develops specialized and scalable machine learning (ML) models with advanced security features. Its solutions are primarily designed for enterprises with large proprietary bodies of text, such as financial institutions or tech corporations. Arcee AI’s SLMs act as advanced advisers that operate within each company’s cloud environment, summarizing and interpreting volumes of text to aid decision-making and streamline operations.

Arcee AI trains its models so that they can be adapted to the specific needs and data environments of its customers. Training language models, even ones as comparatively compact as Arcee AI’s SLMs, requires substantial computational power. However, limited GPU capacity affected the startup’s ability to scale operations and deliver timely AI solutions to its customers, so Arcee AI needed to break its computational tasks into smaller, less complex routines.

As the startup prepared to go to market, it needed a way to reserve compute as required while optimizing price performance. Seeking a solution, Arcee AI turned to the AWS team for guidance.


“If you want to quickly develop an application that needs to be capable of different things in the future, AWS is a great place to build it.”

Jacob Solawetz
Cofounder and Chief Technology Officer, Arcee AI

Solution | Reducing the Training Time of ML Models from 17 to 1.6 Hours

To address the challenges it was facing, Arcee AI took a two-pronged approach. First, the startup adopted Amazon Elastic Compute Cloud (Amazon EC2) Capacity Blocks for ML, which organizations use to reserve GPU instances to run ML workloads. Using this service, Arcee AI built a system for flexible compute reservation that helps developers reliably secure GPU capacity in advance and at a reasonable price. This has facilitated the timely deployment of Arcee AI’s specialized models.

To help developers manage GPU reservations, the company built a web application backed by an API running on Amazon Elastic Container Service (Amazon ECS), a fully managed container orchestration service. When developers need GPUs, they can use the application’s front-end dashboard to search for unused capacity blocks and reserve what they need in seconds.
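The story doesn’t describe the internals of Arcee AI’s reservation API, but a minimal sketch of the underlying AWS calls such a dashboard backend might make, using the boto3 SDK, could look like the following. The instance type, count, duration, and region are illustrative assumptions, not details from the story.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# List available Capacity Block offerings for one p5 instance for 24 hours.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",
    InstanceCount=1,
    CapacityDurationHours=24,
)["CapacityBlockOfferings"]

# Pick the offering with the lowest upfront fee.
cheapest = min(offerings, key=lambda o: float(o["UpfrontFee"]))

# Purchase the block; the instances become available at its start date.
reservation = ec2.purchase_capacity_block(
    CapacityBlockOfferingId=cheapest["CapacityBlockOfferingId"],
    InstancePlatform="Linux/UNIX",
)
print(reservation["CapacityReservation"]["CapacityReservationId"])
```

A dashboard like the one described would presumably wrap calls of this shape behind its search and reserve endpoints.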

“The idea that we could reserve these instances gave us peace of mind,” says Jacob Solawetz, cofounder and chief technology officer of Arcee AI. “We knew how hard GPUs were to get, and the use of Amazon EC2 Capacity Blocks for ML was the path forward for us and our customers.”

Second, to improve price performance, Arcee AI explored the breadth and depth of the accelerated computing portfolio of Amazon EC2 instances and selected two instance types. One is Amazon EC2 Trn1 Instances, which are powered by AWS Trainium accelerators—the ML chips that AWS purpose built for deep learning. The other is Amazon EC2 Inf2 Instances, which are powered by AWS Inferentia2—the second-generation AWS Inferentia accelerator. Arcee AI uses Trn1 Instances to continually pretrain its SLMs on domain-specific datasets so that it can fine-tune the models to address the nuances of specialized fields. Using Inf2 Instances, the startup can cost-effectively run inference workloads and serve its adapted models to customers. Taking advantage of AWS Trainium and AWS Inferentia2, the company reduced its dependence on GPUs while optimizing price performance.
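As an illustration of what serving a model on AWS Inferentia2 involves, here is a minimal sketch using the torch-neuronx tracer from the AWS Neuron SDK. The model name is a public placeholder, not one of Arcee AI’s SLMs, and the code assumes it runs on an Inf2 instance with the Neuron SDK installed.

```python
import torch_neuronx
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder model; Arcee AI's actual SLMs are not part of this story.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)
model.eval()

inputs = tokenizer("Arcee AI builds small language models.", return_tensors="pt")
example = (inputs["input_ids"], inputs["attention_mask"])

# Compile the model ahead of time for the NeuronCores on the Inf2 instance.
neuron_model = torch_neuronx.trace(model, example)

# Inference now runs on the Inferentia2 accelerator.
logits = neuron_model(*example)
```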

In addition to having access to a variety of compute options and consumption models, Arcee AI can seamlessly incorporate other AWS services. By default, Arcee AI’s customers run Inf2 Instances on Amazon SageMaker, a service that is used to build, train, and deploy ML models for virtually any use case. But they can also choose among different AWS compute resources as needed. To quickly surface the information that they need to perform analytics, Arcee AI and its customers use a vector database that is powered by Amazon OpenSearch Service, which unlocks near real-time search, monitoring, and analysis of business and operational data.
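The story doesn’t specify Arcee AI’s index schema, but vector search on Amazon OpenSearch Service generally follows the pattern in this sketch, using the opensearch-py client. The domain endpoint, credentials, index name, and embedding dimension are all assumptions for illustration.

```python
from opensearchpy import OpenSearch

# Endpoint and credentials are placeholders for an OpenSearch Service domain.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

# Create a k-NN index; 768 is an assumed embedding dimension.
client.indices.create(
    index="documents",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 768},
                "text": {"type": "text"},
            }
        },
    },
)

# Retrieve the five documents nearest to a query embedding.
query_vector = [0.0] * 768  # stand-in for a real embedding
results = client.search(
    index="documents",
    body={"size": 5, "query": {"knn": {"embedding": {"vector": query_vector, "k": 5}}}},
)
```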

Arcee AI’s customers also have the option to use over 1 million ML models from Hugging Face through the company’s SLM adaptation system, which facilitates the incorporation and customization of SLMs to match domain-specific business needs. They can also merge established models with their own trained models to enhance their AI capabilities even further.
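The story doesn’t name the tooling behind this merging capability. One widely used open-source option for merging language model weights is the mergekit toolkit, which Arcee AI maintains; the sketch below shows a simple linear weight-average merge with it, using placeholder model names.

```python
import pathlib
import subprocess
import textwrap

# Illustrative merge config: a linear weight average of a public base model
# and a hypothetical domain-tuned model. Both model names are placeholders.
config = textwrap.dedent("""\
    models:
      - model: mistralai/Mistral-7B-v0.1
        parameters:
          weight: 0.5
      - model: example-org/domain-tuned-7b
        parameters:
          weight: 0.5
    merge_method: linear
    dtype: float16
""")
pathlib.Path("merge.yml").write_text(config)

# mergekit-yaml is the CLI entry point installed by `pip install mergekit`.
subprocess.run(["mergekit-yaml", "merge.yml", "./merged-model"], check=True)
```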

Using AWS, Arcee AI has delivered successful outcomes for its customers using SLMs. For example, it takes a 7-billion-parameter model about 20–30 seconds to complete an inference, helping customers gain faster insights and responses. Additionally, Arcee AI has reduced its ML model training time from 17 to 1.6 hours using AWS Trainium, cutting the training cost by 97.94 percent. Using high-performance compute instances on AWS, Arcee AI has been able to efficiently train models with extensive datasets, including one PubMed dataset with 88 billion tokens.

Outcome | Driving Business Growth and AI Innovation

Using AWS, Arcee AI is empowering enterprises to harness the full potential of domain-adapted language models and drive transformative innovation. In the future, the startup plans to add more features to its SLM adaptation system to help customers rapidly adapt and scale their ML needs in line with their broader business goals.

“If you want to quickly develop an application that needs to be capable of different things in the future, AWS is a great place to build it,” says Solawetz. “Many aspects of AWS are hyperoptimized for ML workloads, from Amazon ECS GPU runtime agents to AWS Trainium chips. AWS is an excellent cloud environment to run ML and AI.”

About Arcee AI

Arcee AI develops domain-adapted small language models to help enterprises build, deploy, and manage generative artificial intelligence models. These specialized models operate entirely within a client’s virtual private cloud.

AWS Services Used

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 750 instance types and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.


Amazon EC2 Capacity Blocks for ML

With Amazon EC2 Capacity Blocks for ML, easily reserve Amazon EC2 P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, and Amazon EC2 P4d instances, powered by NVIDIA A100 Tensor Core GPUs, for a future start date.


Amazon EC2 Trn1 Instances

Amazon EC2 Trn1 instances, powered by AWS Trainium chips, are purpose built for high-performance deep learning (DL) training of generative AI models, including large language models (LLMs) and latent diffusion models.


Amazon EC2 Inf2 Instances

Amazon EC2 Inf2 instances are purpose built for deep learning (DL) inference. They deliver high performance at the lowest cost in Amazon EC2 for generative artificial intelligence (AI) models, including large language models (LLMs) and vision transformers.



Get Started

Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.