Important Update
Thank you for your interest in Amazon Elastic Inference. Amazon Elastic Inference is no longer available to new customers. You can get better performance at lower cost for your machine learning inference workloads by using other hardware acceleration options such as AWS Inferentia. If you are currently using Amazon Elastic Inference, please consider migrating your workload to these alternatives. To learn more, visit the AWS Machine Learning Infrastructure page.
General
Q: Why is Amazon encouraging customers to move workloads from Amazon Elastic Inference (EI) to newer hardware acceleration options such as AWS Inferentia?
Customers get better performance at a much better price than Amazon EI with new hardware accelerator options such as AWS Inferentia for their inference workloads. AWS Inferentia is designed to provide high performance inference in the cloud, to drive down the total cost of inference, and to make it easy for developers to integrate machine learning into their business applications. To enable customers to benefit from such newer generation hardware accelerators, we will not onboard new customers to Amazon EI after April 15, 2023.
Q: Which AWS services are impacted by the move to stop onboarding new customers to Amazon Elastic Inference (EI)?
This announcement affects Amazon EI accelerators attached to any Amazon EC2 instance, Amazon SageMaker instance, or Amazon Elastic Container Service (ECS) task. In Amazon SageMaker, this applies to both endpoints and notebook kernels using Amazon EI accelerators.
Q: Will I be able to create a new Amazon Elastic Inference (EI) accelerator after April 15, 2023?
No. If you are a new customer and have not used Amazon EI in the past 30 days, you will not be able to create a new Amazon EI accelerator in your AWS account after April 15, 2023. However, if you have used an Amazon EI accelerator at least once in the past 30 days, you can attach a new Amazon EI accelerator to your instance.
Q: We currently use Amazon Elastic Inference (EI) accelerators. Will we be able to continue using them after April 15, 2023?
Yes, you will be able to use Amazon EI accelerators. We recommend that you migrate your current ML inference workloads running on Amazon EI to other hardware accelerator options at your earliest convenience.
Q: How do I evaluate alternative instance options for my current Amazon SageMaker Inference Endpoints?
Amazon SageMaker Inference Recommender can help you identify cost-effective deployments when migrating existing workloads from Amazon Elastic Inference (EI) to an appropriate ML instance type supported by SageMaker.
Q: How do I change the instance type for my existing endpoint in Amazon SageMaker?
- First, create a new EndpointConfig that uses the new instance type. If you have an autoscaling policy, delete the existing autoscaling policy.
- Call UpdateEndpoint while specifying your newly created EndpointConfig.
- Wait for your endpoint to change status to InService. This will take approximately 10-15 minutes.
- Finally, if you need autoscaling for your new endpoint, create a new autoscaling policy for this new endpoint and ProductionVariant.
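The steps above can be sketched with the AWS SDK for Python. This is a minimal sketch, not a definitive procedure: it assumes `sm` is a boto3 SageMaker client (`boto3.client("sagemaker")`), the helper names `build_migrated_config` and `migrate_endpoint` are hypothetical, and it leaves out the autoscaling steps, which go through Application Auto Scaling and have to be handled separately. It also assumes the existing model can run unchanged on the new instance type, which is not the case for AWS Inferentia, where the model has to be compiled first.

```python
import copy


def build_migrated_config(old_config: dict, new_name: str, new_instance_type: str) -> dict:
    """Build CreateEndpointConfig arguments from an existing config, without EI.

    `old_config` is the response of DescribeEndpointConfig. The input is not
    modified; each production variant is copied, its EI accelerator is
    dropped, and its instance type is replaced.
    """
    variants = copy.deepcopy(old_config["ProductionVariants"])
    for variant in variants:
        variant.pop("AcceleratorType", None)  # remove the EI accelerator, if any
        variant["InstanceType"] = new_instance_type
    return {"EndpointConfigName": new_name, "ProductionVariants": variants}


def migrate_endpoint(sm, endpoint_name: str, new_config_name: str, new_instance_type: str) -> None:
    """Create a new EndpointConfig, call UpdateEndpoint, then wait for InService.

    `sm` is assumed to be a boto3 SageMaker client. Autoscaling policies are
    not touched here: delete them before calling this and recreate them after.
    """
    endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    old_config = sm.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    sm.create_endpoint_config(
        **build_migrated_config(old_config, new_config_name, new_instance_type)
    )
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_config_name)
    # Blocks until the endpoint reports InService (or the waiter gives up).
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
```

A call might look like `migrate_endpoint(boto3.client("sagemaker"), "my-endpoint", "my-endpoint-config-v2", "ml.c5.2xlarge")`, where the endpoint name, config name, and instance type are placeholders.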
Q: How do I change the instance type for my existing Amazon SageMaker Notebook Instance using Amazon Elastic Inference (EI)?
In the console, click “Notebook instances”, then click the Notebook Instance you want to update. Make sure the Notebook Instance has a “Stopped” status. Then click “Edit” and change your instance type. When your Notebook Instance starts up again, make sure you select the right kernel for your new instance.
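The same change can also be scripted. The following is a minimal sketch, assuming `sm` is a boto3 SageMaker client; the function name `change_notebook_instance_type` is hypothetical. It stops the notebook instance if it is running, then updates the instance type and detaches any attached EI accelerator. Kernel selection still happens in the notebook after it restarts.

```python
def change_notebook_instance_type(sm, name: str, new_instance_type: str) -> None:
    """Stop a notebook instance if needed, then change its instance type.

    `sm` is assumed to be a boto3 SageMaker client. The instance must be in
    the Stopped state before UpdateNotebookInstance is accepted.
    """
    status = sm.describe_notebook_instance(NotebookInstanceName=name)[
        "NotebookInstanceStatus"
    ]
    if status == "InService":
        sm.stop_notebook_instance(NotebookInstanceName=name)
        sm.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)
    elif status != "Stopped":
        # Pending, Stopping, Updating, and so on: let the caller retry later.
        raise RuntimeError(f"Notebook instance is {status}; try again once it settles")

    sm.update_notebook_instance(
        NotebookInstanceName=name,
        InstanceType=new_instance_type,
        DisassociateAcceleratorTypes=True,  # detach the EI accelerator, if any
    )
```

The instance is left stopped after the update so that you can start it and pick the kernel yourself.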
Q: Is there a specific instance type which is a good alternative to Amazon Elastic Inference (EI)?
Every machine learning workload is unique. We recommend using Amazon SageMaker Inference Recommender to help you identify the right instance type for your ML workload, performance requirements, and budget. AWS Inferentia, specifically inf1.xlarge, is the best high-performance, low-cost alternative for Amazon EI customers. The table below compares price per hour and performance for the different Amazon EI accelerator options on SageMaker against Inferentia. Assuming a c5.xlarge host instance, Inferentia is cheaper per hour than every Amazon EI configuration and delivers higher performance. Note that models must first be compiled before they can be deployed on AWS Inferentia. SageMaker customers can compile their models with SageMaker Neo by setting “ml_inf1” as the TargetDevice. If you are not using SageMaker, use the AWS Neuron compiler directly.
Prices below assume the us-east-2 Region.
Instance type + Elastic Inference accelerator | EI price per hour | Total cost per hour (instance + EI) | Premium compared to AWS Inferentia | Cost savings of Inferentia compared to EI | Performance (FP16 TFLOPS) | Performance improvement with Inferentia |
ml.c5.xlarge + ml.eia2.medium | $0.17 | $0.37 | $0.07 | 19% | 8 | 8x |
ml.c5.xlarge + ml.eia1.medium | $0.18 | $0.39 | $0.09 | 23% | 8 | 8x |
ml.c5.xlarge + ml.eia2.large | $0.34 | $0.54 | $0.24 | 44% | 16 | 4x |
ml.c5.xlarge + ml.eia1.large | $0.36 | $0.57 | $0.27 | 47% | 16 | 4x |
ml.c5.xlarge + ml.eia2.xlarge | $0.48 | $0.68 | $0.38 | 56% | 32 | 2x |
ml.c5.xlarge + ml.eia1.xlarge | $0.73 | $0.93 | $0.63 | 68% | 32 | 2x |
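The savings and improvement columns can be reproduced from the other columns. The sketch below is a worked check rather than official pricing logic: it assumes the savings percentage is the premium divided by the total hourly cost, rounded to the nearest whole percent. The implied Inferentia hourly price and the implied Inferentia FP16 throughput are derived values (total minus premium, and EI performance times the improvement factor); the table does not state them directly.

```python
# (configuration, total cost per hour, premium over Inferentia,
#  EI performance in FP16 TFLOPS, improvement factor with Inferentia)
ROWS = [
    ("ml.c5.xlarge + ml.eia2.medium", 0.37, 0.07, 8, 8),
    ("ml.c5.xlarge + ml.eia1.medium", 0.39, 0.09, 8, 8),
    ("ml.c5.xlarge + ml.eia2.large", 0.54, 0.24, 16, 4),
    ("ml.c5.xlarge + ml.eia1.large", 0.57, 0.27, 16, 4),
    ("ml.c5.xlarge + ml.eia2.xlarge", 0.68, 0.38, 32, 2),
    ("ml.c5.xlarge + ml.eia1.xlarge", 0.93, 0.63, 32, 2),
]


def savings_percent(total: float, premium: float) -> int:
    """Share of the EI configuration's hourly cost saved by moving to Inferentia."""
    return round(100 * premium / total)


def implied_inferentia_price(total: float, premium: float) -> float:
    """Hourly Inferentia price implied by a row (total minus premium)."""
    return round(total - premium, 2)


if __name__ == "__main__":
    for name, total, premium, tflops, factor in ROWS:
        print(
            f"{name}: saves {savings_percent(total, premium)}%, "
            f"implied Inferentia price ${implied_inferentia_price(total, premium):.2f}/h, "
            f"implied Inferentia performance {tflops * factor} TFLOPS"
        )
```

Every row implies the same Inferentia price and the same Inferentia throughput, so the columns are internally consistent.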
Q: What is Amazon Elastic Inference?
A: Amazon Elastic Inference (Amazon EI) is an accelerated compute service that allows you to attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 or Amazon SageMaker instance type or Amazon ECS task. This means you can now choose the instance type that is best suited to the overall compute, memory, and storage needs of your application, and then separately configure the amount of inference acceleration that you need.
Q: What are Amazon Elastic Inference accelerators?
A: Amazon Elastic Inference accelerators are GPU-powered hardware devices that are designed to work with any EC2 instance, SageMaker instance, or ECS task to accelerate deep learning inference workloads at a low cost. When you launch an EC2 instance or an ECS task with Amazon Elastic Inference, an accelerator is provisioned and attached to the instance over the network. Deep learning tools and frameworks that are enabled for Amazon Elastic Inference, such as TensorFlow Serving, Apache MXNet, and PyTorch, can automatically detect the attached accelerator and offload model computation to it.
Q: What is the difference between the Amazon Elastic Inference accelerator family types?
A: The EIA2 accelerators have twice the GPU memory of equivalent EIA1 accelerators. You can determine your GPU memory needs based on your model and tensor input sizes and choose the right accelerator family and type for your needs.
Configuring
Q: How do I provision Amazon Elastic Inference accelerators?
A: You can configure Amazon SageMaker endpoints, Amazon EC2 instances, or Amazon ECS tasks with Amazon Elastic Inference accelerators using the AWS Management Console, the AWS Command Line Interface (CLI), or the AWS SDK. There are two requirements for launching EC2 instances with accelerators. First, you need to provision an AWS PrivateLink VPC endpoint for the subnets where you plan to launch accelerators. Second, when you launch an instance, you need to provide an instance role with a policy that allows users accessing the instance to connect to accelerators. When you configure an instance to launch with Amazon EI, an accelerator is provisioned in the same Availability Zone behind the VPC endpoint.
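For the EC2 case, the two requirements can be illustrated with the AWS SDK for Python. This is a rough sketch under stated assumptions, not a complete launch procedure: the helper names are hypothetical, the policy assumes that `elastic-inference:Connect` is the action an instance role needs in order to reach an accelerator, and the launch arguments assume the `ElasticInferenceAccelerators` parameter of the EC2 RunInstances API. The PrivateLink VPC endpoint for the subnet is assumed to exist already, and all IDs are placeholders you would supply.

```python
import json


def ei_connect_policy() -> str:
    """IAM policy document (JSON) letting an instance role connect to accelerators.

    Assumes `elastic-inference:Connect` is the required action.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["elastic-inference:Connect"],
                "Resource": "*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


def run_instances_args(
    ami_id: str,
    instance_type: str,
    subnet_id: str,
    instance_profile_name: str,
    accelerator_type: str = "eia2.medium",
) -> dict:
    """Keyword arguments for an EC2 RunInstances call that attaches one accelerator.

    The subnet is assumed to already have a VPC endpoint for Elastic Inference,
    and the instance profile is assumed to carry the policy above.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "SubnetId": subnet_id,
        "IamInstanceProfile": {"Name": instance_profile_name},
        "ElasticInferenceAccelerators": [{"Type": accelerator_type, "Count": 1}],
        "MinCount": 1,
        "MaxCount": 1,
    }
```

With a boto3 EC2 client `ec2`, the launch would then be `ec2.run_instances(**run_instances_args(...))`.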
Q: What model formats does Amazon Elastic Inference support?
A: Amazon Elastic Inference supports models trained using TensorFlow, Apache MXNet, and PyTorch, as well as models in ONNX format.
Q: Can I deploy models on Amazon Elastic Inference using TensorFlow, Apache MXNet or PyTorch frameworks?
A: Yes, you can use AWS-enhanced TensorFlow Serving, Apache MXNet and PyTorch libraries to deploy models and make inference calls.
Q: How do I get access to AWS optimized frameworks?
A: The AWS Deep Learning AMIs include the latest releases of TensorFlow Serving, Apache MXNet and PyTorch that are optimized for use with Amazon Elastic Inference accelerators. You can also obtain the libraries via Amazon S3 to build your own AMIs or container images. Please see our documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-inference.html) for more information.
Q: Can I use CUDA with Amazon Elastic Inference accelerators?
A: No. You can use only the AWS-enhanced TensorFlow Serving, Apache MXNet, or PyTorch libraries as an interface to Amazon Elastic Inference accelerators.
Pricing and billing
Q: How am I charged for Amazon Elastic Inference?
A: You pay only for the Amazon Elastic Inference accelerator hours you use. For more details, see the pricing page.
Q: Will I incur charges for AWS PrivateLink VPC Endpoints for the Amazon Elastic Inference service?
A: No. You will not incur charges for VPC Endpoints to the Amazon Elastic Inference service, as long as you have at least one instance configured with an accelerator, running in an Availability Zone where a VPC endpoint is provisioned.