Artificial Intelligence
Enable pod-based GPU metrics in Amazon CloudWatch
This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.
Multiregion serverless distributed training with AWS Batch and Amazon SageMaker
Creating a global footprint and access to scale are one of the many best practices at AWS. By creating architectures that take advantage of that scale and also efficient data utilization (in both performance and cost), you can start to see how important access is at scale. For example, within autonomous vehicles (AV) development, data is geographically […]
Scalable multi-node deep learning training using GPUs in the AWS Cloud
A key barrier to the wider adoption of deep neural networks on industrial-size datasets is the time and resources required to train them. AlexNet, which won the 2012 ImageNet Large Scale Visual Recognition Competition (ILSVRC) and kicked off the current boom in deep neural networks, took nearly a week to train across the 1.2-million-image, 1000-category […]
Build an online compound solubility prediction workflow with AWS Batch and Amazon SageMaker
Machine learning (ML) methods for the field of computational chemistry are growing at an accelerated rate. Easy access to open-source solvers (such as TensorFlow and Apache MXNet), toolkits (such as RDKit cheminformatics software), and open-scientific initiatives (such as DeepChem) makes it easy to use these frameworks in daily research. In the field of chemical informatics, many […]



