Ram Vegiraju | Artificial Intelligence

Run ML inference on unplanned and spiky traffic using Amazon SageMaker multi-model endpoints

Amazon SageMaker multi-model endpoints (MMEs) are a fully managed capability of SageMaker inference that allows you to deploy thousands of models on a single endpoint. Previously, MMEs pre-determinedly allocated CPU computing power to models statically regardless the model traffic load, using Multi Model Server (MMS) as its model server. In this post, we discuss a […]

Evaluate large language models for quality and responsibility

The risks associated with generative AI have been well-publicized. Toxicity, bias, escaped PII, and hallucinations negatively impact an organization’s reputation and damage customer trust. Research shows that not only do risks for bias and toxicity transfer from pre-trained foundation models (FM) to task-specific generative AI services, but that tuning an FM for specific tasks, on […]

Deploying ML models using SageMaker Serverless Inference

Amazon SageMaker Serverless Inference was recently announced at re:Invent 2021 as a new model hosting feature that lets customers serve model predictions without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. Serverless Inference is a new deployment capability that complements SageMaker’s existing options for deployment that include: SageMaker Real-Time […]

Artificial Intelligence

Author: Ram Vegiraju

Run ML inference on unplanned and spiky traffic using Amazon SageMaker multi-model endpoints

Evaluate large language models for quality and responsibility

Deploying ML models using SageMaker Serverless Inference

Learn

Resources

Developers

Help