Artificial Intelligence

Author: Vivek Gangasani

Vivek is a Senior Machine Learning Solutions Architect at Amazon Web Services. He works with Machine Learning startups to build and deploy AI/ML applications on AWS. He is currently focused on delivering solutions for MLOps, ML Inference and low-code ML. He has worked on projects in different domains, including Natural Language Processing and Computer Vision.

Introducing Disaggregated Inference on AWS powered by llm-d

In this blog post, we introduce the concepts behind next-generation inference capabilities, including disaggregated serving, intelligent request scheduling, and expert parallelism. We discuss their benefits and walk through how you can implement them on Amazon SageMaker HyperPod EKS to achieve significant improvements in inference performance, resource utilization, and operational efficiency.

Build Agentic Workflows with OpenAI GPT OSS on Amazon SageMaker AI and Amazon Bedrock AgentCore

In this post, we show how to deploy the gpt-oss-20b model to SageMaker managed endpoints and demonstrate a practical stock analyzer agent assistant example with LangGraph, a powerful graph-based framework that handles state management, coordinated workflows, and persistent memory systems.

Introducing auto scaling on Amazon SageMaker HyperPod

In this post, we announce that Amazon SageMaker HyperPod now supports managed node automatic scaling with Karpenter, enabling efficient scaling of SageMaker HyperPod clusters to meet inference and training demands. We dive into the benefits of Karpenter and provide details on enabling and configuring Karpenter in SageMaker HyperPod EKS clusters.

Amazon SageMaker HyperPod launches model deployments to accelerate the generative AI model development lifecycle

In this post, we announce Amazon SageMaker HyperPod support for deploying foundation models from SageMaker JumpStart, as well as custom or fine-tuned models from Amazon S3 or Amazon FSx. This new capability allows customers to train, fine-tune, and deploy models on the same HyperPod compute resources, maximizing resource utilization across the entire model lifecycle.

Optimize RAG in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service

In this post, we show how to use Amazon OpenSearch Service as a vector store to build an efficient RAG application.

Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15

Today, we’re excited to announce the launch of Amazon SageMaker Large Model Inference (LMI) container v15, powered by vLLM 0.8.4 with support for the vLLM V1 engine. This release introduces significant performance improvements and expanded model compatibility with multimodality (that is, the ability to understand and analyze text-to-text, images-to-text, and text-to-images data), and provides built-in integration with vLLM to help you seamlessly deploy and serve large language models (LLMs) with the highest performance at scale.

DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

DeepSeek-R1 is an advanced large language model that combines reinforcement learning, chain-of-thought reasoning, and a Mixture of Experts architecture to deliver efficient, interpretable responses while maintaining safety through Amazon Bedrock Guardrails integration.

Use Amazon Bedrock tooling with Amazon SageMaker JumpStart models

In this post, we explore how to deploy AI models from SageMaker JumpStart and use them with Amazon Bedrock’s powerful features. Users can combine SageMaker JumpStart’s model hosting with Bedrock’s security and monitoring tools. We demonstrate this using the Gemma 2 9B Instruct model as an example, showing how to deploy it and use Bedrock’s advanced capabilities.

Amazon SageMaker Inference now supports G6e instances

G6e instances on SageMaker unlock the ability to deploy a wide variety of open source models cost-effectively. With superior memory capacity, enhanced performance, and cost-effectiveness, these instances represent a compelling solution for organizations looking to deploy and scale their AI applications. The ability to handle larger models, support longer context lengths, and maintain high throughput makes G6e instances particularly valuable for modern AI applications.

Maximize Stable Diffusion performance and lower inference costs with AWS Inferentia2

Generative AI models have been experiencing rapid growth in recent months due to their impressive capabilities in creating realistic text, images, code, and audio. Among these models, Stable Diffusion models stand out for their unique strength in creating high-quality images based on text prompts. Stable Diffusion can generate a wide variety of high-quality images, including […]
