AWS Deep Learning AMIs | Artificial Intelligence

Reducing container cold start times using SOCI index on DLAMI and DLC

In this post, we look at how to use Seekable OCI (SOCI) indexes with the publicly available AWS Deep Learning AMIs (DLAMI) and Deep Learning Containers (DLC), when to use each of the SOCI modes the tool provides, and how to adopt it in your workloads today.

Configure and verify a distributed training cluster with AWS Deep Learning Containers on Amazon EKS

Misconfiguration issues in distributed training on Amazon EKS can be prevented by following a systematic approach to launching the required components and verifying their configuration. This post walks through the steps to set up and verify an EKS cluster for training large models using AWS Deep Learning Containers (DLCs).

Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

In this post, we demonstrate how to deploy the DeepSeek-R1-Distill-Qwen-32B model on Amazon EKS using AWS DLCs for vLLM, showing how these purpose-built containers simplify deployment of this inference engine. This solution can help you address the infrastructure challenges of deploying LLMs while maintaining performance and cost-efficiency.

Streamline deep learning environments with Amazon Q Developer and MCP

In this post, we explore how to use Amazon Q Developer and Model Context Protocol (MCP) servers to streamline DLC workflows, automating the creation, execution, and customization of Deep Learning Containers.

Build high-performance ML models using PyTorch 2.0 on AWS – Part 1

PyTorch is a machine learning (ML) framework that is widely used by AWS customers for a variety of applications, such as computer vision, natural language processing, content creation, and more. With the recent PyTorch 2.0 release, AWS customers can now do the same things they could with PyTorch 1.x, but faster and at scale, with […]

Optimized PyTorch 2.0 inference with AWS Graviton processors

New generations of CPUs offer a significant performance improvement in machine learning (ML) inference due to specialized built-in instructions. Combined with their flexibility, high speed of development, and low operating cost, these general-purpose processors offer an alternative to other existing hardware solutions. AWS, Arm, Meta and others helped optimize the performance of PyTorch 2.0 inference […]

Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 2

This blog post is co-written with Chaoyang He and Salman Avestimehr from FedML. Analyzing real-world healthcare and life sciences (HCLS) data poses several practical challenges, such as distributed data silos, lack of sufficient data at a single site for rare events, regulatory guidelines that prohibit data sharing, infrastructure requirements, and the cost incurred in creating a […]

Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 1

This blog post is co-written with Chaoyang He and Salman Avestimehr from FedML. Analyzing real-world healthcare and life sciences (HCLS) data poses several practical challenges, such as distributed data silos, lack of sufficient data at any single site for rare events, regulatory guidelines that prohibit data sharing, infrastructure requirements, and the cost incurred in creating a […]

Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker

Today, we are launching Amazon SageMaker inference on AWS Graviton to enable you to take advantage of the price, performance, and efficiency benefits that come from Graviton chips. Graviton-based instances are available for model inference in SageMaker. This post helps you migrate and deploy a machine learning (ML) inference workload from x86 to Graviton-based instances […]

Model hosting patterns in Amazon SageMaker, Part 7: Run ensemble ML models on Amazon SageMaker

Model deployment in machine learning (ML) is becoming increasingly complex. You want to deploy not just one ML model but large groups of ML models represented as ensemble workflows. These workflows are composed of multiple ML models. Productionizing these ML models is challenging because you need to meet various performance and latency requirements. Amazon […]
