Partner Success with AWS / Software & Internet / United States

May 2024
Fireworks AI
NVIDIA

Fireworks AI Delivers Blazing Fast Generative AI with NVIDIA and AWS

Gained

access to the most powerful NVIDIA GPUs with Amazon EC2 instances

20X

higher performance over other generative AI providers

Delivered

up to 4X lower latency for Fireworks AI customers

Overview

Fireworks AI delivers a fast, affordable, and customizable platform for developers to run and fine-tune generative artificial intelligence (AI) models at scale. To provide the most performant inference service for ultra-low-latency use cases, Fireworks AI elected to run on NVIDIA H100 and A100 Tensor Core GPUs through Amazon EC2 P4 and P5 instances. This enabled Fireworks AI to deliver up to 4X lower latency than previous solutions with zero compromise on model quality.

Providing Performance and Quality for Every Generative AI Workload

The emergence of generative AI gives businesses a wealth of new opportunities. For example, generative AI can transform customer experiences by generating striking images or engaging in complex conversations. However, the generative AI models that power these experiences are extremely large, and it's difficult for businesses to serve and scale them, especially given high user expectations for latency and quality. Waiting several seconds for an image to generate or a chatbot to respond makes for a frustrating user experience that's untenable for many use cases.

The founding team at Fireworks AI noticed these challenges through their work with PyTorch, the deep learning framework on which the latest generative AI models are developed. Drawing on their experience bringing PyTorch to life, the Fireworks AI team developed software that provides an easy-to-use API to run customized models with the best performance. However, Fireworks AI needed to ensure that the hardware it used would support exceptionally fast inference.

We’re excited about the latest generation of GPUs from NVIDIA and AWS because of the higher memory bandwidth and computational power they provide.”

Lin Qiao
CEO and Co-founder, Fireworks AI

Taking Off with NVIDIA Chips

AWS Partner NVIDIA delivered the powerful GPUs that Fireworks AI needed to take off. “NVIDIA is the best GPU and high-performance kernel provider in the world,” said Lin Qiao, chief executive officer and co-founder at Fireworks AI. “Access to advanced GPUs through Amazon EC2 has been fantastic. We reliably get accelerated computing that helps us stay ahead of the game.” Fireworks AI serves models on both NVIDIA A100 and H100 Tensor Core GPUs and has built its own kernels on top of NVIDIA’s libraries.

The platform also uses Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Simple Storage Service (Amazon S3). Amazon EKS offers an optimized image that includes preconfigured NVIDIA drivers for GPU-enabled Amazon Elastic Compute Cloud (Amazon EC2) instances, making it easy to run GPU-powered workloads. For Fireworks AI, the Kubernetes tier lets the team orchestrate services across many machines. “Because Amazon Web Services (AWS) has battle-tested Amazon EKS, we can focus on our product development,” said Dmytro Dzhulgakov, chief technology officer at Fireworks AI.
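As a rough illustration of how such GPU workloads are scheduled on Kubernetes, the sketch below builds a minimal pod manifest that requests a GPU. It assumes the NVIDIA device plugin (bundled with the EKS GPU-optimized AMI) exposes GPUs as the schedulable `nvidia.com/gpu` resource; the pod name and container image are illustrative placeholders, not Fireworks AI's actual configuration.

```python
# Minimal sketch: a Kubernetes Pod manifest requesting one NVIDIA GPU.
# On the EKS GPU-optimized AMI, the NVIDIA device plugin advertises GPUs
# as the extended resource "nvidia.com/gpu", which the scheduler uses to
# place the pod on a GPU-enabled EC2 node (e.g., a P4/P5 instance).
# Names and the image below are hypothetical placeholders.

gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-server"},
    "spec": {
        "containers": [
            {
                "name": "model-server",
                "image": "example-registry/inference:latest",  # placeholder
                "resources": {
                    # Extended resources like GPUs are requested via limits;
                    # the request is implicitly set equal to the limit.
                    "limits": {"nvidia.com/gpu": 1},
                },
            }
        ],
    },
}

print(gpu_pod["spec"]["containers"][0]["resources"]["limits"])
```

In practice, a manifest like this would be serialized to YAML and applied with `kubectl apply -f`, or submitted through a Kubernetes client library; EKS then handles scheduling it onto a node with a free GPU.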

Sparking Insights with High-Performance Inference

With NVIDIA GPUs running on AWS, Fireworks AI can deliver customers a high-performance inference service. In fact, the H100 GPU provides up to 20X higher performance than the prior generation. It can also be partitioned into as many as seven GPU instances using NVIDIA Multi-Instance GPU technology, so capacity can adjust dynamically to shifting demand. “As we continue to optimize for performance, NVIDIA H100s are key because they accelerate serving speed greatly,” Qiao said.

Lowering Latency by 4X

In addition to high-quality inference, Fireworks AI delivers up to four times lower latency than other popular open-source large language model (LLM) engines such as vLLM. “Fireworks AI works through the entire stack—from inference serving orchestration, to PyTorch runtime optimization and low-level kernel optimization, to device, CPU, and memory bandwidth optimization,” Qiao said. The result is a generative AI platform that enables both fast and high-quality inference, so users can have the best possible experience with new generative AI products.

Building on a New Generation of GPUs

The Fireworks AI team continues to expand its partnership with NVIDIA to build out the next evolution of its serving tier. “We’re excited about the latest generation of GPUs from NVIDIA and AWS because of the higher memory bandwidth and computational power they provide,” Qiao said. Advancements in chip technology will directly impact the performance that Fireworks AI delivers to its customers.

About Fireworks AI

Fireworks AI offers a generative AI platform that enables product developers to run state-of-the-art, open-source models with the best speed, quality, and scalability.

About AWS Partner NVIDIA

Since its founding in 1993, NVIDIA has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI, and is fueling industrial digitalization across markets. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

AWS Services Used

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 750 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.

Learn more »

Amazon EKS

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service to run Kubernetes in AWS and on-premises data centers. In the cloud, Amazon EKS automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks. 

Learn more »

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.

Learn more »

More Software & Internet Success Stories

  • Software & Internet

    Palo Alto Networks Boosts 2,000 Developers’ Productivity Using AI Solutions from AWS, Anthropic, and Sourcegraph

    Palo Alto Networks, a leading cybersecurity company, sought to boost developer productivity using generative artificial intelligence (AI) technology. The goal was to create a custom solution that would enhance the speed and quality of coding while maintaining strict security standards. By leveraging Amazon Web Services (AWS), Claude 3.5 Sonnet and Claude 3 Haiku from AWS Partner Anthropic, and Cody from AWS Partner Sourcegraph, Palo Alto Networks developed a secure AI tool for generating, optimizing, and troubleshooting code. Within three months, Palo Alto Networks onboarded 2,000 developers and increased productivity up to 40 percent, with an average of 25 percent. This custom AI solution has empowered both senior and junior developers, and the company expects further improvements in code quality and efficiency.

    2024
  • Software & Internet

    IBM Reduces the Co-Selling Lifecycle by 90% and Boosts Sales Opportunities with AWS by 117% with ACE CRM Integration Using Labra Platform

    IBM, a global technology enterprise, wanted to simplify the process of creating and sharing AWS co-selling opportunities from Salesforce. IBM deployed an ACE CRM integration from AWS Partner Labra, a provider of software as a service (SaaS) solutions. The integration helps IBM sales and marketing teams move campaign responses and sales opportunities from within Salesforce directly into ACE. With Labra’s co-sell automation, IBM has cut co-sell time by 90 percent, increased co-sell opportunities by 117 percent, increased revenue, and created a custom integration that streamlines marketing nurture tools.

    2024
  • Software & Internet

    Starburst Accelerates AWS Co-Selling with Tackle ACE CRM Integration

    Starburst, which provides an open data lakehouse platform for global customers, sought to reduce the manual effort required to collaborate with its strategic cloud partner, AWS. Starburst uses Salesforce as its CRM system and needed a solution to replicate relevant opportunity data to the APN Customer Engagements (ACE) pipeline manager. Starburst worked with AWS Partner Tackle and implemented the Tackle ACE CRM integration, which allows Starburst to enter, manage, and monitor sales activity to enable sophisticated co-sell activity within ACE, the AWS Partner Network (APN) sales collaboration tool. As a result, Starburst cut opportunity sharing time by up to 50 percent, reduced opportunity rejections, enhanced opportunity data quality, and enabled teams to focus on other joint go-to-market (GTM) opportunities.

    2024
  • Software & Internet

    ERIN’s Cloud Transformation Cuts Costs, Promotes Growth, and Enables New Features

    ERIN, an innovative employee referral platform operating with a team of 25 people, faced challenges keeping up with rapid growth. A collaboration with AWS Partner SourceFuse improved ERIN’s cloud setup to use its resources more effectively and eliminate unnecessary services. The outcome was transformative: ERIN cut hosting costs by 40 percent, found the flexibility it needed to meet demand, and advanced its feature roadmap. The solution helped ERIN achieve near triple-digit growth in revenue year over year, broaden its reach, and deliver new offerings to customers.

    2024

Get Started

Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.