
2024

Accelerating LLMs Using Amazon SageMaker with Cisco

Learn how Cisco improved efficiency and optimized inference costs using Amazon SageMaker and NVIDIA Triton Inference Server.

Improved development and deployment cycle time

Simplified engineering and delivery

Reduced costs

Overview

To keep up with fast-paced advancements in the technology industry, hardware and software company Cisco uses Amazon Web Services (AWS) to power its innovation. When its growing artificial intelligence (AI) and machine learning (ML) models needed more resources, Cisco began hosting them on a managed, purpose-built AWS service so that they could scale separately from its applications. Now, it has simplified its engineering and unlocked greater efficiency.


Opportunity | Using Amazon SageMaker to Optimize Resources for Cisco

Founded in 1984, Cisco is a global hardware, software, and services company that seeks to help enterprises, commercial businesses, and consumers make powerful connections through advanced technology and support. Over the years, Cisco has expanded significantly and now employs over 71,000 people worldwide. One way Cisco grows is by acquiring other companies, including Webex, which develops telecommunications applications for web conferencing and videoconferencing. Cisco's Webex team has branched out into natural language AI, building a comprehensive set of AI and ML features for multiple use cases, such as background noise removal, chatbots, and speech recognition. To build new features into its Webex suite, Cisco has begun using large language models (LLMs), which can reach hundreds of gigabytes in size.

The Webex team operates multiple applications that offer AI and ML features. Most of these applications are hosted on Amazon Elastic Kubernetes Service (Amazon EKS), a managed Kubernetes service for starting, running, and scaling Kubernetes on AWS and in on-premises data centers. The team originally embedded its ML models in the container images of the applications running on Amazon EKS. However, operating ML models through applications in this way requires significant resources. As the team developed larger and more sophisticated models, it began to run into efficiency and cost issues: the LLMs slowed resource allocation and application startup.

In 2022, the team started to look for an alternative approach. It decided to separate the embedded ML models from the applications and migrate the models to Amazon SageMaker, which developers use to build, train, and deploy ML models for virtually any use case with fully managed infrastructure, tools, and workflows. This way, applications can scale separately from the models, which increases speed and saves resources.
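To illustrate the pattern, here is a minimal sketch of hosting a model on a SageMaker real-time endpoint with the SageMaker Python SDK. The framework (PyTorch), S3 path, IAM role, instance type, and endpoint name are hypothetical placeholders, not details from Cisco's deployment.

```python
# Minimal sketch: hosting a trained model on a SageMaker real-time
# endpoint so applications on Amazon EKS can call it over HTTPS.
# All names, paths, and the instance type are illustrative placeholders.
import sagemaker
from sagemaker.pytorch import PyTorchModel

session = sagemaker.Session()

model = PyTorchModel(
    model_data="s3://example-bucket/models/asr-model.tar.gz",       # hypothetical artifact
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",     # hypothetical role
    framework_version="2.1",
    py_version="py310",
    entry_point="inference.py",  # hypothetical custom load/predict handlers
    sagemaker_session=session,
)

# SageMaker provisions the instances, loads the model, and exposes an
# invokable endpoint; the application no longer bundles the model.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="webex-example-endpoint",  # hypothetical name
)
```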


“AWS services are reliable and cost effective. We have a lot of options for using our resources efficiently.”

Travis Mehlinger
Principal Engineer, Cisco

Solution | Improving Efficiency and Reducing Costs by Migrating to Amazon SageMaker

After testing Amazon SageMaker in multiple environments, Cisco’s Webex team had immediate success. It quickly migrated its large models, spanning three environments and 10 different applications that each require at least one model, to Amazon SageMaker while continuing to host the applications on Amazon EKS. Cisco deployed dozens of models on Amazon SageMaker endpoints. It also used NVIDIA Triton Inference Server, which supports running multiple models concurrently, and scaled its deployment globally across AWS data centers.
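As a rough sketch of what serving a model through Triton on SageMaker can look like with boto3: the model name, S3 path, role, and default model are assumptions, and the Triton container registry and tag vary by AWS Region and release.

```python
# Sketch: deploying a Triton model repository behind a SageMaker endpoint.
# Names, paths, and the image tag below are illustrative assumptions.
import boto3

sm = boto3.client("sagemaker")

# SageMaker-provided Triton image (registry account and tag vary by
# Region and release; check the current list for your Region).
triton_image = (
    "785573368785.dkr.ecr.us-east-1.amazonaws.com/"
    "sagemaker-tritonserver:23.12-py3"
)

sm.create_model(
    ModelName="webex-triton-model",  # hypothetical
    ExecutionRoleArn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    PrimaryContainer={
        "Image": triton_image,
        # Tarball containing a Triton model repository (config.pbtxt + weights).
        "ModelDataUrl": "s3://example-bucket/triton/model.tar.gz",
        "Environment": {
            # Which model in the repository Triton serves by default.
            "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "asr",
        },
    },
)

sm.create_endpoint_config(
    EndpointConfigName="webex-triton-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "webex-triton-model",
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="webex-triton-endpoint",
    EndpointConfigName="webex-triton-config",
)
```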

The migration made it simpler for Cisco’s engineers to deliver applications, creating a clean break between the relatively lean applications and the underlying AI and ML models, which require more resources. “The applications and the models work and scale fundamentally differently, with entirely different cost considerations,” says Travis Mehlinger, principal engineer at Cisco. “By separating them rather than lumping them together, it’s much simpler to solve issues independently.”

By migrating its models to Amazon SageMaker, the team also significantly improved development and deployment cycle time. With the models available through Amazon SageMaker endpoints, developers no longer need to keep the models in workstation memory to make changes to applications. As a result, application startup time decreased and experiments sped up. Now, the team has the resources available to fix bugs, perform tests, and add features to applications in its development, integration, and production environments in much less time than before. “On AWS, we have more time to plan enhancements to an application,” says Mehlinger.
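In this arrangement, an application calls the hosted model over the endpoint API instead of loading it locally. A minimal boto3 sketch follows; the endpoint name and payload shape are illustrative assumptions.

```python
# Sketch: an application invoking a hosted model rather than keeping
# it in its own memory. Endpoint name and payload are illustrative.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="webex-example-endpoint",  # hypothetical
    ContentType="application/json",
    Body=json.dumps({"text": "Summarize the last meeting."}),
)

result = json.loads(response["Body"].read())
print(result)
```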

Cisco also achieved cost savings by migrating to Amazon SageMaker. For example, Cisco’s applications need to keep running even during off-peak hours for reliability and speed, but the models do not need to be available all the time. Using Amazon SageMaker endpoints, Cisco can use asynchronous inference to reduce costs. The team can configure the endpoints to scale up to match demand as requests for inference come in, and then scale back down to zero when the work is done, all without impacting the application. By taking the resources offline when they are not needed, Cisco saves money without sacrificing speed or availability.
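A sketch of that pattern, assuming SageMaker asynchronous inference paired with Application Auto Scaling: an async endpoint config plus a target-tracking policy whose minimum capacity is zero, so instances are released when the request backlog drains. All names, S3 paths, and thresholds are illustrative.

```python
# Sketch: asynchronous inference endpoint that can scale in to zero
# instances when idle. Names, paths, and thresholds are illustrative.
import boto3

sm = boto3.client("sagemaker")
aas = boto3.client("application-autoscaling")

sm.create_endpoint_config(
    EndpointConfigName="webex-async-config",    # hypothetical
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "webex-example-model",     # hypothetical, created earlier
        "InstanceType": "ml.g5.xlarge",
        "InitialInstanceCount": 1,
    }],
    # Marks the endpoint as asynchronous; results are written to S3.
    AsyncInferenceConfig={
        "OutputConfig": {"S3OutputPath": "s3://example-bucket/async-results/"},
    },
)
sm.create_endpoint(
    EndpointName="webex-async-endpoint",
    EndpointConfigName="webex-async-config",
)

# Once the endpoint is in service, register it with Application Auto
# Scaling. MinCapacity=0 is what permits scaling down to zero.
resource_id = "endpoint/webex-async-endpoint/variant/AllTraffic"
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=4,
)

# Track the per-instance request backlog; while scaled to zero, new
# requests queue until an instance starts and works them off.
aas.put_scaling_policy(
    PolicyName="backlog-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "webex-async-endpoint"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```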

Moreover, because Amazon SageMaker is a managed service, Cisco’s team does not have to invest in the infrastructure, hosting, or scaling. “On AWS, we can focus on putting the pieces together,” says Mehlinger. “We can focus on the work that we’re good at instead of the issues that experts have already resolved.”

In November 2023, Cisco adopted an Amazon SageMaker inference feature that made it possible to deploy multiple models behind a single endpoint. This capability further improved the efficiency of Cisco’s large-scale compute resources for near-real-time inference, resulting in faster scaling, better response times, and additional cost savings.
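The article does not name the feature, but this matches the SageMaker inference components capability announced in November 2023. Assuming that is the feature in question, a sketch of packing two models onto one shared endpoint looks like this; all names and resource figures are hypothetical.

```python
# Sketch: two models sharing one endpoint's compute via inference
# components. Each component declares its resource needs and SageMaker
# packs them onto the endpoint's instances. Names/sizes are hypothetical.
import boto3

sm = boto3.client("sagemaker")

for name, model, gpus, min_mem_mb in [
    ("ic-noise-removal", "webex-noise-model", 1, 4096),
    ("ic-transcription", "webex-asr-model", 1, 8192),
]:
    sm.create_inference_component(
        InferenceComponentName=name,
        EndpointName="webex-shared-endpoint",  # hypothetical existing endpoint
        VariantName="AllTraffic",
        Specification={
            "ModelName": model,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": gpus,
                "MinMemoryRequiredInMb": min_mem_mb,
            },
        },
        RuntimeConfig={"CopyCount": 1},  # copies scale independently
    )

# Callers then target a specific component on the shared endpoint.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="webex-shared-endpoint",
    InferenceComponentName="ic-transcription",
    ContentType="application/json",
    Body=b'{"audio_uri": "s3://example-bucket/audio/clip.wav"}',
)
```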

Outcome | Building New AI and ML Features for Cisco

Cisco unlocked a host of benefits by migrating to Amazon SageMaker, but it is not slowing down. As it continues its migration, Cisco’s Webex team is actively working on several features that harness AI, ML, and LLMs. Running these enormous models requires more sophisticated solutions. So, as the next part of its strategy, the team is looking into Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models for building generative AI applications. It is also evaluating multi-model endpoints to further improve price performance.
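For context, invoking a foundation model through Bedrock's single API is a short call. The sketch below uses a publicly documented Anthropic model ID; the model choice and prompt are illustrative, not details of Cisco's evaluation.

```python
# Sketch: calling a foundation model through Amazon Bedrock's API.
# Model ID and prompt are illustrative examples only.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Draft meeting notes."}],
    }),
)

print(json.loads(response["body"].read()))
```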

“AWS services are reliable and cost effective,” says Mehlinger. “We have a lot of options for using our resources efficiently. Plus, we get an immense amount of support that makes it simple for us to understand an issue and how to solve it and then fix it and move on to the next issue.”

About Cisco

Founded in 1984, Cisco is a hardware and software company that specializes in developing networking technology. It offers a range of technologies as well as technical support and other advanced services to enterprises, businesses, and consumers.

AWS Services Used

Amazon SageMaker

Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost machine learning (ML) for any use case.


Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.


Amazon EKS

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers.



Get Started

Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.