AWS Machine Learning Infrastructure

High-performance, cost-effective, and energy-efficient infrastructure for ML applications

From Fortune 500 companies to startups, organizations across industries are adopting machine learning (ML) for a wide range of use cases, including natural language processing (NLP), computer vision, voice assistants, fraud detection, and recommendation engines. In addition, large language models (LLMs) with hundreds of billions of parameters are unlocking new generative AI use cases, such as image and text generation. As ML applications grow, so do the usage, management overhead, and cost of compute, storage, and networking resources. Choosing the right compute infrastructure is essential for reducing power consumption, controlling costs, and avoiding complexity when training ML models and deploying them to production. To help you accelerate your ML innovation, AWS offers a combination of high-performance, cost-effective, and energy-efficient tools and accelerators purpose-built for ML applications.

Benefits

Easy to use

Access purpose-built ML accelerators such as AWS Trainium and AWS Inferentia to train and deploy foundation models (FMs) and integrate them into your applications using AWS managed services such as Amazon SageMaker and Amazon Bedrock. SageMaker provides data scientists and ML developers with pretrained foundation models that can be fully customized for your specific use case and data, and deployed into production. Bedrock provides customers with a serverless experience for building generative AI applications using FMs through an API.
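
To make the Bedrock path concrete, here is a minimal sketch that invokes a foundation model through the Bedrock API, assuming boto3 is configured with credentials and your account has model access in the chosen Region; the model ID and prompt are illustrative:

```python
# Minimal sketch: call a Bedrock foundation model through the API.
# Assumes boto3 credentials and access to the (illustrative) Amazon
# Titan text model in us-east-1.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # example model ID
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Explain ML accelerators in one sentence."}),
)

# Response shape varies by model; print the parsed JSON body
print(json.loads(response["body"].read()))
```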

High performance

You can power your ML applications with the highest-performing ML infrastructure from AWS. Amazon EC2 P4d and Amazon EC2 Trn1 instances are ideal for high-performance ML training. For inference, Amazon EC2 Inf2 instances, powered by AWS Inferentia2, the second-generation Inferentia chip, offer 4x higher throughput and up to 10x lower latency than previous-generation Inferentia-based instances.
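
As a rough illustration of targeting Inferentia2, the sketch below compiles a PyTorch model with the AWS Neuron SDK's torch-neuronx package, assuming it runs on an Inf2 instance with the Neuron SDK installed; ResNet-50 stands in for your own model:

```python
# Minimal sketch: compile a PyTorch model for AWS Inferentia2 using
# torch-neuronx from the AWS Neuron SDK (assumed installed on an
# Inf2 instance).
import torch
import torch_neuronx
from torchvision.models import resnet50

model = resnet50(weights=None).eval()   # example model, untrained weights
example = torch.rand(1, 3, 224, 224)    # example input tensor

# Ahead-of-time compilation into a Neuron-optimized graph
neuron_model = torch_neuronx.trace(model, example)

# Inference now runs on the Inferentia2 NeuronCores
output = neuron_model(example)
print(output.shape)
```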

Cost effective

With a broad choice of infrastructure services, you can choose the right infrastructure for your budget. AWS Trainium-based Amazon EC2 Trn1 instances deliver up to 50% savings on training costs, and AWS Inferentia2-based Amazon EC2 Inf2 instances deliver up to 40% better price performance, compared with other Amazon EC2 instances. You can reinvest these cost savings to accelerate innovation and grow your business.

Sustainable

AWS is committed to achieving Amazon's goal of net-zero carbon by 2040. Amazon SageMaker, a fully managed ML service, offers ML accelerators optimized for energy efficiency and reduced power consumption while training and deploying ML models in production. Amazon EC2 instances powered by ML accelerators, such as AWS Trainium and AWS Inferentia2, offer up to 50% better performance per watt than other comparable Amazon EC2 instances.

Scalable

AWS customers have access to virtually unlimited compute, networking, and storage, so they can scale as their needs grow. You can scale up or down as needed, from a single GPU or ML accelerator to thousands, and from terabytes to petabytes of storage. With the cloud, you don't need to invest up front in infrastructure you might never use. Instead, take advantage of elastic compute, storage, and networking.
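
As one illustration of this elasticity, the following sketch uses the SageMaker Python SDK to scale a training job across Trainium-based instances by changing a single parameter; the entry-point script, role ARN, framework version, and S3 path are placeholders:

```python
# Minimal sketch: the same SageMaker training job scales from one
# Trainium-based instance to many via instance_count. The script,
# role ARN, version strings, and S3 path below are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                               # hypothetical script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    framework_version="2.1",           # illustrative framework version
    py_version="py310",
    instance_type="ml.trn1.32xlarge",  # AWS Trainium-based instance
    instance_count=4,                  # raise or lower to scale out or in
)

estimator.fit({"training": "s3://example-bucket/train/"})  # placeholder path
```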

Support for ML frameworks

AWS compute instances support major ML frameworks such as TensorFlow and PyTorch, as well as model libraries and toolkits such as Hugging Face, for a broad range of ML use cases. The AWS Deep Learning AMIs (AWS DLAMIs) and AWS Deep Learning Containers (AWS DLCs) come preinstalled with ML frameworks and toolkits optimized to accelerate deep learning in the cloud.
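
As a small example of what these preinstalled stacks enable, the sketch below assumes a DLAMI or PyTorch DLC environment where the Hugging Face transformers library is already available; the task name pulls a default public model on first use:

```python
# Minimal sketch: on a DLAMI or Deep Learning Container the framework
# stack is preinstalled, so model code can start immediately.
# pipeline() downloads a default public sentiment model on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Purpose-built accelerators speed up model training."))
```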

Solutions

ML Infrastructure

Note: Depending on your inference requirements, you can explore other Amazon EC2 instance types for CPU-based inference.

Success stories

  • Pepperstone

    Pepperstone uses AWS ML infrastructure to deliver a seamless global trading experience for more than 40,000 unique visitors a month. They use Amazon SageMaker to automate the creation and deployment of ML models. By switching to SageMaker, they were able to reduce friction between DevOps and data science teams and lower ML model training time from 180 hours to 4.3 hours.

  • Finch Computing

    Finch Computing uses AWS Inferentia with PyTorch on AWS to build ML models that perform NLP tasks such as language translation and entity disambiguation, reducing its inference costs by over 80% compared to GPUs.

  • Amazon Robotics

    Amazon Robotics used Amazon SageMaker to develop a sophisticated machine learning model that replaced manual scanning in Amazon fulfillment centers. Using Amazon SageMaker and AWS Inferentia, Amazon Robotics reduced inference costs by nearly 50 percent.

  • Money Forward

    Money Forward launched their large-scale AI chatbot service on Amazon EC2 Inf1 instances and reduced their inference latency by 97% over comparable GPU-based instances while also reducing costs. Based on their successful migration to Inf1 instances, they are also evaluating AWS Trainium-based EC2 Trn1 instances to improve end-to-end ML performance and cost.

  • Rad AI

    Rad AI uses AI to automate radiology workflows and help streamline radiology reporting. With the new Amazon EC2 P4d instances, Rad AI sees faster inference and the ability to train models 2.4x faster and with higher accuracy.

  • Amazon Alexa
    "Amazon Alexa’s AI and ML-based intelligence, powered by Amazon Web Services, is available on more than 100 million devices today – and our promise to customers is that Alexa is always becoming smarter, more conversational, more proactive, and even more delightful. Delivering on that promise requires continuous improvements in response times and machine learning infrastructure costs, which is why we are excited to use Amazon EC2 Inf1 to lower inference latency and cost-per-inference on Alexa text-to-speech. With Amazon EC2 Inf1, we’ll be able to make the service even better for the tens of millions of customers who use Alexa each month."

    Tom Taylor, Senior Vice President – Amazon Alexa

  • Autodesk
    "Autodesk is advancing the cognitive technology of our AI-powered virtual assistant, Autodesk Virtual Agent (AVA) by using Inferentia. AVA answers over 100,000 customer questions per month by applying natural language understanding (NLU) and deep learning techniques to extract the context, intent, and meaning behind inquiries. Piloting Inferentia, we are able to obtain a 4.9x higher throughput over G4dn for our NLU models, and look forward to running more workloads on the Inferentia-based Inf1 instances."

    Binghui Ouyang, Sr Data Scientist – Autodesk

  • Sprinklr
    "Sprinklr provides a unified customer experience management (Unified-CXM) platform that combines different applications for marketing, advertising, research, customer care, sales, and social media engagement. The goal is always to have lower latency, which means a better customer experience. Using Amazon EC2 Inf1 Instances, we are able to achieve that."

    Jamal Mazhar, Vice President of Infrastructure and DevOps – Sprinklr

Optimize your ML infrastructure with AWS