AWS Machine Learning Blog

Category: Amazon Elastic Kubernetes Service

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

This is a guest post co-written with Fred Wu from Sportradar. Sportradar is the world’s leading sports technology company, at the intersection between sports, media, and betting. More than 1,700 sports federations, media outlets, betting operators, and consumer platforms across 120 countries rely on Sportradar knowhow and technology to boost their business. Sportradar uses data […]

Accelerate hyperparameter grid search for sentiment analysis with BERT models using Weights & Biases, Amazon EKS, and TorchElastic

Financial market participants are faced with an overload of information that influences their decisions, and sentiment analysis stands out as a useful tool to help separate out the relevant and meaningful facts and figures. However, the same piece of news can have a positive or negative impact on stock prices, which presents a challenge for […]

Scaling distributed training with AWS Trainium and Amazon EKS

Recent developments in deep learning have led to increasingly large models such as GPT-3, BLOOM, and OPT, some of which are already in excess of 100 billion parameters. Although larger models tend to be more powerful, training such models requires significant computational resources. Even with the use of advanced distributed training libraries like FSDP and […]

Run inference at scale for OpenFold, a PyTorch-based protein folding ML model, using Amazon EKS

This post was co-written with Sachin Kadyan, a leading developer of OpenFold. In drug discovery, understanding the 3D structure of proteins is key to assessing the ability of a drug to bind to it, directly impacting its efficacy. Predicting the 3D protein form, however, is very complex, challenging, expensive, and time consuming, and can take […]

Solution overview

Build flexible and scalable distributed training architectures using Kubeflow on AWS and Amazon SageMaker

In this post, we demonstrate how Kubeflow on AWS (an AWS-specific distribution of Kubeflow) used with AWS Deep Learning Containers and Amazon Elastic File System (Amazon EFS) simplifies collaboration and provides flexibility in training deep learning models at scale on both Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon SageMaker utilizing a hybrid architecture approach. […]

Distributed training with Amazon EKS and Torch Distributed Elastic

Distributed deep learning model training is becoming increasingly important as data sizes are growing in many industries. Many applications in computer vision and natural language processing now require training of deep learning models, which are growing exponentially in complexity and are often trained with hundreds of terabytes of data. It then becomes important to use […]

The Intel®3D Athlete Tracking (3DAT) scalable architecture deploys pose estimation models using Amazon Kinesis Data Streams and Amazon EKS

This blog post is co-written by Jonathan Lee, Nelson Leung, Paul Min, and Troy Squillaci from Intel.  In Part 1 of this post, we discussed how Intel®3DAT collaborated with AWS Machine Learning Professional Services (MLPS) to build a scalable AI SaaS application. 3DAT uses computer vision and AI to recognize, track, and analyze over 1,000 […]

Build and deploy a scalable machine learning system on Kubernetes with Kubeflow on AWS

In this post, we demonstrate Kubeflow on AWS (an AWS-specific distribution of Kubeflow) and the value it adds over open-source Kubeflow through the integration of highly optimized, cloud-native, enterprise-ready AWS services. Kubeflow is the open-source machine learning (ML) platform dedicated to making deployments of ML workflows on Kubernetes simple, portable and scalable. Kubeflow provides many […]

Evolution of Cresta’s machine learning architecture: Migration to AWS and PyTorch

Cresta Intelligence, a California-based AI startup, makes businesses radically more productive by using Expertise AI to help sales and service teams unlock their full potential. Cresta is bringing together world-renowned AI thought-leaders, engineers, and investors to create a real-time coaching and management solution that transforms sales and increases service productivity, weeks after application deployment. Cresta […]

Serve 3,000 deep learning models on Amazon EKS with AWS Inferentia for under $50 an hour

October 2023: This post was reviewed and updated to include support for Graviton and Inf2 instances. More customers are finding the need to build larger, scalable, and more cost-effective machine learning (ML) inference pipelines in the cloud. Outside of these base prerequisites, the requirements of ML inference pipelines in production vary based on the business […]