AWS Machine Learning Blog

Category: Tensorflow on AWS

Creating a complete TensorFlow 2 workflow in Amazon SageMaker

Managing the complete lifecycle of a deep learning project can be challenging, especially if you use multiple separate tools and services. For example, you may use different tools for data preprocessing, prototyping training and inference code, full-scale model training and tuning, model deployments, and workflow automation to orchestrate all of the above for production. Friction […]

Read More

Running distributed TensorFlow training with Amazon SageMaker

TensorFlow is an open-source machine learning (ML) library widely used to develop heavy-weight deep neural networks (DNNs) that require distributed training using multiple GPUs across multiple hosts. Amazon SageMaker is a managed service that simplifies the ML workflow, starting with labeling data using active learning, hyperparameter tuning, distributed training of models, monitoring of training progression, […]

Read More

Launching TensorFlow distributed training easily with Horovod or Parameter Servers in Amazon SageMaker

Amazon SageMaker supports all the popular deep learning frameworks, including TensorFlow. Over 85% of TensorFlow projects in the cloud run on AWS. Many of these projects already run in Amazon SageMaker. This is due to the many conveniences Amazon SageMaker provides for TensorFlow model hosting and training, including fully managed distributed training with Horovod and […]

Read More