Posted On: Nov 11, 2022
Amazon SageMaker training jobs now support ml.trn1 instances, powered by AWS Trainium chips, which are purpose-built for high-performance ML training applications in the cloud. You can use ml.trn1 instances on SageMaker to train natural language processing (NLP), computer vision, and recommender models across a broad set of applications, such as speech recognition, recommendation, fraud detection, image and video classification, and forecasting.
ml.trn1 instances feature up to 16 AWS Trainium chips, the second-generation ML chip built by AWS after AWS Inferentia. ml.trn1 instances are the first EC2 instances with up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth. For efficient data and model parallelism, each ml.trn1.32xlarge instance has 512 GB of high-bandwidth memory, delivers up to 3.4 petaflops of FP16/BF16 compute power, and features NeuronLink, a high-bandwidth, nonblocking intra-instance interconnect.
ml.trn1 instances are available in two sizes: ml.trn1.2xlarge, for experimenting with a single accelerator and training small models cost-effectively, and ml.trn1.32xlarge, for training large-scale models. SageMaker Model Training supports ml.trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions, as shown in the sketch below.
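As a minimal sketch (not part of this announcement), the snippet below shows how an ml.trn1 instance type might be requested for a SageMaker training job using the SageMaker Python SDK's PyTorch estimator. The entry point script, S3 input path, IAM role ARN, and the framework_version/py_version values are placeholder assumptions; use values valid for your account and a Trainium-compatible (Neuron) training container.

```python
# Hypothetical example: launching a SageMaker training job on an ml.trn1 instance.
# All names marked as placeholders are assumptions, not values from the announcement.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"  # placeholder IAM role ARN

estimator = PyTorch(
    entry_point="train.py",            # placeholder training script
    source_dir="src",                  # placeholder source directory
    role=role,
    sagemaker_session=session,
    instance_count=1,
    instance_type="ml.trn1.32xlarge",  # or "ml.trn1.2xlarge" for single-accelerator experiments
    framework_version="1.11.0",        # assumed version; confirm Trainium (Neuron) support
    py_version="py38",                 # assumed Python version for that container
)

# Placeholder S3 location for the training channel.
estimator.fit({"training": "s3://my-bucket/training-data/"})
```

Aside from the instance_type value, the job definition follows the same pattern as any other SageMaker training job, so existing training scripts and estimator configurations can be pointed at ml.trn1 with minimal changes.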
To read more about ml.trn1 instances, visit the AWS News Blog or the Trn1 instance page. To get started using ml.trn1 instances, sign in to the Amazon SageMaker console. To learn more about Amazon SageMaker Model Training, visit our web page.