Posted On: Dec 1, 2021
Today, we are excited to announce Amazon SageMaker Training Compiler, a new feature of SageMaker that can accelerate the training of deep learning (DL) models by up to 50% through more efficient use of GPU instances.
State-of-the-art DL models for natural language processing (NLP) and computer vision (CV) tasks are complex multi-layered neural networks with billions of parameters that can take thousands of GPU hours to train. Even fine-tuning these models can sometimes take days, incurring high costs and slowing down innovation. To accelerate this process, you can now use SageMaker Training Compiler with minimal changes to your existing training script. SageMaker Training Compiler is integrated into the latest versions of PyTorch and TensorFlow in SageMaker and works under the hood of these frameworks so that no other changes to your workflow are required when it is enabled.
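As a minimal sketch of what "minimal changes" looks like, the SageMaker Python SDK's Hugging Face estimator accepts a `TrainingCompilerConfig` object; the entry-point script name, IAM role ARN, bucket, and framework version strings below are placeholders, not values from this announcement:

```python
# Sketch: enabling SageMaker Training Compiler through the SageMaker Python SDK.
# All names below (train.py, the role ARN, versions) are illustrative placeholders.
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",          # your existing training script, unchanged
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_type="ml.p3.2xlarge",   # GPU instance
    instance_count=1,
    transformers_version="4.11",
    pytorch_version="1.9",
    py_version="py38",
    # The one addition: turn on SageMaker Training Compiler.
    compiler_config=TrainingCompilerConfig(),
)

# estimator.fit({"train": "s3://your-bucket/train"})
```

Everything else in the job definition stays as it was; the compiler works inside the framework containers once this configuration is passed.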
SageMaker Training Compiler accelerates training by converting DL models from their high-level language representation to hardware-optimized instructions. More specifically, SageMaker Training Compiler applies graph-level optimizations (operator fusion, memory planning, and algebraic simplification), data-flow-level optimizations (layout transformation, common sub-expression elimination), and back-end optimizations (memory latency hiding, loop-oriented optimizations) to use hardware resources more efficiently and, as a result, train the model faster. The model artifact returned from this accelerated training process is the same as it would be without these training optimizations enabled.
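To give a flavor of one of these passes, the toy sketch below illustrates the idea behind operator fusion in plain Python (the real compiler fuses GPU kernels, not Python loops): two separate elementwise passes are combined into one, eliminating the intermediate buffer between them.

```python
# Toy illustration of operator fusion: two elementwise operations
# (scale, then shift) executed as separate passes vs. one fused pass.

def unfused(xs):
    scaled = [x * 2.0 for x in xs]       # pass 1 allocates an intermediate list
    return [y + 1.0 for y in scaled]     # pass 2 reads it back

def fused(xs):
    # One pass, no intermediate buffer -- the essence of operator fusion.
    return [x * 2.0 + 1.0 for x in xs]

# Both produce identical results; the fused version does less memory traffic.
print(fused([0.0, 1.0, 2.0]))
```

The compiled model behaves the same way: the optimizations change how the computation is executed, not what it computes, which is why the trained artifact is identical.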
SageMaker Training Compiler is tested on the most popular NLP DL models from Hugging Face, including bert-base-cased, bert-base-uncased, distilbert-base-uncased, distilbert-base-uncased-finetuned-sst-2-english, gpt2, roberta-base, roberta-large, bert-base-chinese, and xlm-roberta-base. These models train up to 50% faster with SageMaker Training Compiler.