AWS Compute Blog

Scalable deep learning training using multi-node parallel jobs with AWS Batch and Amazon FSx for Lustre

Contributed by Amr Ragab, HPC Application Consultant, AWS Professional Services

How easy is it to take an AWS reference architecture and implement a production solution? At re:Invent 2018, Toyota Research Institute presented their production DL HPC architecture. This was based on a reference architecture for a scalable, deep learning, high performance computing solution, released earlier […]

Automatically update instances in an Amazon ECS cluster using the AMI ID parameter

This post is contributed by Adam McLean – Solutions Developer at AWS and Chirill Cucereavii – Application Architect at AWS

In this post, we show you how to automatically refresh the container instances in an active Amazon Elastic Container Service (ECS) cluster with instances built from a newly released AMI. The Amazon ECS-optimized AMI comes prepackaged with the […]

Scheduling GPUs for deep learning tasks on Amazon ECS

This post is contributed by Brent Langston – Sr. Developer Advocate, Amazon Container Services

Last week, AWS announced enhanced Amazon Elastic Container Service (Amazon ECS) support for GPU-enabled EC2 instances. This means that GPUs are now first-class resources that can be requested in your task definition and scheduled on your cluster by ECS. Previously, […]

Deploying a personalized API Gateway serverless developer portal

This post is courtesy of Drew Dresser, Application Architect – AWS Professional Services

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. Customers of these APIs often want a website to learn and discover APIs that are available to […]

Working with AWS Lambda and Lambda Layers in AWS SAM

The introduction of serverless technology has enabled developers to shed the burden of managing infrastructure and concentrate on their application code. AWS Lambda has taken on that management by providing isolated, event-driven compute environments for the execution of application code. To use a Lambda function, a developer just needs to package their code and any […]

Recovering files from an Amazon EBS volume backup

Contributed by Jeff Bartley, Storage Solutions Architect, AWS

Amazon Elastic Block Store (Amazon EBS) enables you to back up volumes at any time using EBS snapshots. Volume backups can be triggered manually or they can be scheduled using Amazon Data Lifecycle Manager (Amazon DLM) or AWS Backup. Each backup creates a unique EBS snapshot. The […]

Setting up AWS PrivateLink for Amazon ECS, and Amazon ECR

Amazon ECS and Amazon ECR now have support for AWS PrivateLink. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner. It keeps all the network traffic within the AWS network. When you create AWS PrivateLink endpoints for ECR and ECS, these service endpoints appear […]

Learn about hourly-replication in Server Migration Service and the ability to migrate large data volumes

This post is courtesy of Shane Baldacchino, AWS Solutions Architect

AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you […]

ICYMI: Serverless Q4 2018

This post is courtesy of Eric Johnson, Senior Developer Advocate – AWS Serverless

Welcome to the fourth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you […]

Migrate Wildfly Cluster to Amazon ECS using Service Discovery

This post is courtesy of Vidya Narasimhan, AWS Solutions Architect

1. Overview

Java Enterprise Edition has been an important server-side platform for over a decade for developing mission-critical & large-scale applications amongst enterprises. High-availability & fault tolerance for such applications is typically achieved through built-in JEE clustering provided by the platform. JEE clustering represents a […]
