AWS Inferentia | AWS Partner Network (APN) Blog


Reducing Inference Times by 87% for Darwinbox’s Talent Search Engine Using AWS Inferentia

Darwinbox wanted to cut the time its PyTorch models take to run inference on resumes against job descriptions. AWS Premier Partner Minfy helped the company deploy those models on AWS Inferentia through Amazon SageMaker, reducing inference time by 87% without retraining. The key steps were compiling the models with the Neuron SDK, extending the SageMaker containers, using Inference Recommender to optimize the deployment configuration, and sending requests in mini-batches.
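The mini-batching step can be illustrated with a minimal Python sketch. It is not Darwinbox's or Minfy's code: the names `mini_batches`, `score_in_batches`, and `invoke` are hypothetical, and `invoke` stands in for whatever call sends one batch to the deployed endpoint and returns one score per input. The sketch only shows the general pattern of splitting a list of resumes into fixed-size groups and collecting the results in order.

```python
from typing import Callable, Iterator, List, Sequence


def mini_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def score_in_batches(
    resumes: Sequence[str],
    invoke: Callable[[List[str]], List[float]],
    batch_size: int = 4,
) -> List[float]:
    """Send resumes to `invoke` one mini-batch at a time; return scores in input order.

    `invoke` is a hypothetical stand-in for the endpoint call.
    """
    scores: List[float] = []
    for batch in mini_batches(resumes, batch_size):
        result = invoke(batch)
        # Guard against a response that does not line up with the inputs.
        if len(result) != len(batch):
            raise RuntimeError("received a different number of scores than inputs")
        scores.extend(result)
    return scores


if __name__ == "__main__":
    # Placeholder for a real endpoint call: scores each text by its length.
    def fake_invoke(batch: List[str]) -> List[float]:
        return [float(len(text)) for text in batch]

    print(score_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_invoke, batch_size=2))
```

The final batch may be smaller than `batch_size`. If the compiled model expects a fixed batch shape, that last batch would need padding before it is sent; the sketch leaves this out.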