AWS Machine Learning Blog

Category: Compute

Accelerate NLP inference with ONNX Runtime on AWS Graviton processors

ONNX is an open source machine learning (ML) framework that provides interoperability across a wide range of frameworks, operating systems, and hardware platforms. ONNX Runtime is the runtime engine used for model inference and training with ONNX. AWS Graviton3 processors are optimized for ML workloads, including support for bfloat16, Scalable Vector Extension (SVE), and Matrix […]

How LotteON built dynamic A/B testing for their personalized recommendation system

This post is co-written with HyeKyung Yang, Jieun Lim, and SeungBum Shim from LotteON. LotteON is transforming itself into an online shopping platform that provides customers with an unprecedented shopping experience based on its in-store and online shopping expertise. Rather than simply selling the product, they create and let customers experience the product through their […]

Scale AI training and inference for drug discovery through Amazon EKS and Karpenter

This is a guest post co-written with the leadership team of Iambic Therapeutics. Iambic Therapeutics is a drug discovery startup with a mission to create innovative AI-driven technologies to bring better medicines to cancer patients, faster. Our advanced generative and predictive artificial intelligence (AI) tools enable us to search the vast space of possible drug […]

Generate customized, compliant application IaC scripts for AWS Landing Zone using Amazon Bedrock

As you navigate the complexities of cloud migration, the need for a structured, secure, and compliant environment is paramount. AWS Landing Zone addresses this need by offering a standardized approach to deploying AWS resources. This makes sure your cloud foundation is built according to AWS best practices from the start. With AWS Landing Zone, you eliminate the guesswork in security configurations, resource provisioning, and account management. It’s particularly beneficial for organizations looking to scale without compromising on governance or control, providing a clear path to a robust and efficient cloud setup. In this post, we show you how to generate customized, compliant IaC scripts for AWS Landing Zone using Amazon Bedrock.

Open source observability for AWS Inferentia nodes within Amazon EKS clusters

This post walks you through the Open Source Observability pattern for AWS Inferentia, which shows you how to monitor the performance of ML chips, used in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster, with data plane nodes based on Amazon Elastic Compute Cloud (Amazon EC2) instances of type Inf1 and Inf2.

Scale LLMs with PyTorch 2.0 FSDP on Amazon EKS – Part 2

This is a guest post co-written with Meta’s PyTorch team and is a continuation of Part 1 of this series, where we demonstrate the performance and ease of running PyTorch 2.0 on AWS. Machine learning (ML) research has proven that large language models (LLMs) trained with significantly large datasets result in better model quality. In […]

Transform one-on-one customer interactions: Build speech-capable order processing agents with AWS and generative AI

In today’s landscape of one-on-one customer interactions for placing orders, the prevailing practice continues to rely on human attendants, even in settings like drive-thru coffee shops and fast-food establishments. This traditional approach poses several challenges: it heavily depends on manual processes, struggles to efficiently scale with increasing customer demands, introduces the potential for human errors, […]

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

This post is co-written with Chaoyang He, Al Nevarez and Salman Avestimehr from FedML. Many organizations are implementing machine learning (ML) to enhance their business decision-making through automation and the use of large distributed datasets. With increased access to data, ML has the potential to provide unparalleled business insights and opportunities. However, the sharing of […]

Enhance code review and approval efficiency with generative AI using Amazon Bedrock

In the world of software development, code review and approval are important processes for ensuring the quality, security, and functionality of the software being developed. However, managers tasked with overseeing these critical processes often face numerous challenges, such as the following: Lack of technical expertise – Managers may not have an in-depth technical understanding of […]

Large language model inference over confidential data using AWS Nitro Enclaves

This post discusses how Nitro Enclaves can help protect LLM model deployments, specifically those that use personally identifiable information (PII) or protected health information (PHI). This post is for educational purposes only and should not be used in production environments without additional controls.