Optimizing Costs and Performance for Generative AI Using Amazon SageMaker with Forethought Technologies

Learn how Forethought Technologies, a provider of generative AI solutions for customer service, reduced costs by up to 80 percent using Amazon SageMaker.

80% cost reduction using Amazon SageMaker Serverless Inference

66% cost reduction using Amazon SageMaker multi-model endpoints

Improved resource efficiency and availability

Improved customer response times and hyperpersonalization


Forethought Technologies (Forethought), a customer service software provider, wanted to reduce its machine learning (ML) costs and improve availability as it gained new customers. The company was already using Amazon Web Services (AWS) for ML model training and inference and wanted to become more efficient and scalable with its small cloud infrastructure team.

To achieve its goals, Forethought migrated the inference and hosting of ML models to Amazon SageMaker, which is used to build, train, and deploy ML models for virtually any use case with fully managed infrastructure, tools, and workflows. Using Amazon SageMaker, Forethought improved availability and customer response times and reduced its ML costs by up to 80 percent.

Opportunity | Using Amazon SageMaker to Support More Customers at Lower Cost for Forethought

Forethought’s suite of customer service solutions is powered by generative AI, a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. At the center of Forethought’s product is its SupportGPT technology, which uses large language models and information retrieval systems to power over 30 million customer interactions each year. Through automation, the company reduces the load on customer support teams by assisting users with conversational AI. Many of Forethought’s customers use its product during busy periods, such as holidays or tax season, to handle more customer issues with fewer customer support agents. Forethought offers hyperpersonalized ML models for its customers, often training multiple models per customer to meet individual use cases.

Forethought was founded in 2017 in the United States and initially used multiple cloud providers to host its products, using Amazon SageMaker for training ML models. In its first 2 years, the company built a solution for its ML inference using Amazon Elastic Kubernetes Service (Amazon EKS), a managed Kubernetes service to run Kubernetes on the AWS Cloud and on premises. As the company continued to grow and gain new customers, it wanted to improve the availability of its solution and reduce costs.

To meet its scalability, availability, and cost-optimization needs, Forethought chose to migrate its ML inference to Amazon SageMaker, and the company began using additional features of Amazon SageMaker to improve its products. In this process, Forethought architected its pipeline to benefit from the latency and availability improvements that it could achieve using Amazon SageMaker. “From the Amazon SageMaker team and across the board, for anything that we need, they connect us with the right people so that we can be successful using AWS,” says Jad Chamoun, director of core engineering at Forethought.


“By migrating to Amazon SageMaker multi-model endpoints, we reduced our costs by up to 66% while providing better latency and better response times for customers.”

Jad Chamoun
Director of Core Engineering, Forethought Technologies

Solution | Reducing Costs and Improving Availability Using Amazon SageMaker Inference

Forethought migrated its ML inference from Amazon EKS to Amazon SageMaker Model Deployment multi-model endpoints, a scalable and cost-effective way to deploy large numbers of models. With multi-model endpoints, the company runs multiple ML models on a single inference endpoint, which improves the scalability and efficiency of hardware resources such as GPUs. One example in Forethought’s solution is autocompleting the next words in a sentence as a user types. The migration also reduced costs. “Using Amazon SageMaker, we can support customers at a lower cost per customer,” says Chamoun. “By migrating to Amazon SageMaker multi-model endpoints, we reduced our costs by up to 66 percent while providing better latency and better response times for customers.”
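To illustrate how several models can share one endpoint, here is a minimal Python sketch of a request to a multi-model endpoint. It is not Forethought’s implementation: the endpoint name, the per-customer artifact naming scheme, and the JSON payload shape are all hypothetical. The one SageMaker-specific detail it relies on is the `TargetModel` parameter of the runtime `invoke_endpoint` call, which selects the model artifact that should serve the request.

```python
import json


def build_invoke_request(endpoint_name: str, customer_id: str, text: str) -> dict:
    """Build keyword arguments for the SageMaker runtime `invoke_endpoint` call.

    On a multi-model endpoint, `TargetModel` names the model artifact
    (relative to the endpoint's S3 model prefix) that should handle the request.
    """
    return {
        "EndpointName": endpoint_name,
        # Hypothetical naming scheme: one autocomplete artifact per customer.
        "TargetModel": f"{customer_id}/autocomplete.tar.gz",
        "ContentType": "application/json",
        # Hypothetical payload shape; the real schema depends on the model container.
        "Body": json.dumps({"text": text}),
    }


request = build_invoke_request("shared-autocomplete", "customer-a", "How do I reset")

# With AWS credentials and a deployed endpoint, the request could be sent as:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**request)
```

Because only `TargetModel` changes between customers, adding a model means uploading one more artifact under the shared prefix; no additional endpoint is created.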

Forethought also uses Amazon SageMaker Serverless Inference, a purpose-built inference option, to deploy and scale ML models without configuring or managing any of the underlying infrastructure. Forethought’s use of Amazon SageMaker Serverless Inference revolves around small models and classifiers that are fine-tuned to each customer use case, such as automatically determining the priority of a support ticket. By migrating some of its classifiers to Amazon SageMaker Serverless Inference, Forethought saved around 80 percent on related cloud costs.
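As a rough sketch of what “without configuring or managing any of the underlying infrastructure” looks like in practice, a serverless endpoint configuration specifies only a memory size and a concurrency limit rather than an instance type and count. The Python example below builds such a configuration for the `create_endpoint_config` call. The configuration and model names, the default values, and the list of allowed memory sizes are assumptions for illustration; check the current SageMaker documentation for the actual limits.

```python
# Assumed set of memory sizes accepted by Serverless Inference; verify against
# the current SageMaker documentation before relying on it.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,      # illustrative default, not a recommendation
    max_concurrency: int = 5,   # illustrative default, not a recommendation
) -> dict:
    """Build keyword arguments for the SageMaker `create_endpoint_config` call."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No instance type or instance count: capacity is managed by the service.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# Hypothetical names for a per-customer ticket-priority classifier.
config = build_serverless_endpoint_config(
    "ticket-priority-config", "ticket-priority-classifier"
)

# With AWS credentials, the configuration could be created as:
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**config)
```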

Forethought’s cloud infrastructure team consists of three people. Running and managing all the ML models and Kubernetes clusters created too much overhead for such a small team. Using Amazon SageMaker, the company can scale as much as it needs with the people it has. “We run multiple instances within Amazon SageMaker multi-model endpoints,” says Chamoun. “We are able to share resources more efficiently while providing better availability than we did in the past.”

Using Amazon SageMaker, the Forethought team no longer has to worry about memory exceptions or availability, issues that the three engineers otherwise would have spent considerable time working on. Because the company set up automated pipelines for language models using Amazon SageMaker, teams at Forethought and its customers can work directly with the data that they want to train on and submit it. “Not having to be involved as things are being trained, deployed, and scaled was key for us to work on other things that are more impactful for the company,” says Chamoun. Forethought now runs over 80 percent of its GPU inference on Amazon SageMaker between Amazon SageMaker multi-model endpoints and Amazon SageMaker Serverless Inference.

Outcome | Continuing to Provide Hyperpersonalization Using AWS

Forethought is continuing to grow and provide hyperpersonalized ML models for more customers. The company is still engaging AWS to improve its infrastructure and innovate on its product. Forethought is part of the AWS Global Startup Program, an invite-only, go-to-market program supporting mid-to-late-stage startups that have raised institutional funding, achieved product-market fit, and are ready to scale. The company is getting the word out about its product, which is now on AWS Marketplace.

“Whether it’s our search services, our inference for specific ML models, or chatting with our customer support bots, everything we have uses Amazon SageMaker,” says Chamoun.

About Forethought Technologies

Forethought Technologies is a startup in the United States providing a generative AI suite for customer service that uses machine learning to transform the customer support life cycle. The company powers over 30 million customer interactions a year.

AWS Services Used

Amazon SageMaker

Build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows

AWS Global Startup Program

The AWS Global Startup Program is an invite-only, go-to-market program supporting mid-to-late-stage startups that have raised institutional funding, achieved product-market fit, and are ready to scale.
