How AMPLYFI Manages Variable Traffic Machine Learning Workloads on AWS Lambda
Guest post by Tom Crawford, Lead DevOps Engineer, Lorenzo Bongiovanni, Principal Machine Learning Engineer, and Stephen Hall, Architect, AMPLYFI
Founded in 2015, AMPLYFI has developed an insight automation platform that helps organizations make better decisions and change with conviction. AMPLYFI specializes in developing artificial intelligence-driven solutions that unlock and analyze the vast amounts of unstructured data on the internet, in internal company datasets, and in industry databases, allowing customers to generate key decision-driving insights. Its products are used by some of the world’s largest organizations to enhance their existing business intelligence and market research capabilities by spotting early warning signals of future disruption, improving risk management, and deepening customer relationships.
AMPLYFI deploys a range of artificial intelligence techniques, including machine learning, natural language processing, pattern recognition, and unsupervised learning, to locate and interpret unstructured content. AMPLYFI then transforms this content into machine-curated structured datasets by automatically correcting, expanding, refreshing, and generating new insights. AMPLYFI has chosen to partner with AWS because of the diverse services offered, which were a critical factor in developing and expanding our multifaceted data processing pipelines at a fraction of the cost of traditional infrastructure.
Many of AMPLYFI’s machine learning tools leverage the latest generation of neural language models. Due to recent improvements in neural architectures, these models can train hundreds of millions of parameters efficiently in order to learn deep patterns in language, resulting in a dramatic advance in performance on many natural language understanding tasks. However, the size of these models makes them non-trivial to deploy at scale.
Due to the requirements of the data processing pipeline, these machine learning activities could be described as ‘bursty’, with highly variable levels of traffic. Provisioning dedicated Amazon EC2 instances for these activities is extremely inefficient, even with auto scaling. Initially, we experimented with deploying a scalable machine learning pipeline utilizing batch processing, running data batches periodically. This worked effectively and produced the desired output. The one major drawback was end-to-end processing time.
At AMPLYFI, we love AWS Lambda. We use Lambda in many situations: document processing pipelines, customer-facing microservice APIs, asynchronous stream events, and general housekeeping. We enjoy the flexibility, horizontal scalability, ease of deployment, and performance of Lambda. Most importantly, using Lambda and other serverless infrastructure allows us to scale cost with usage and to deploy development and test infrastructure with minimal idle costs.
AWS Lambda has limitations on internal file storage. The deployment package limit is 250 MB (unzipped, including layers), and AWS Lambda imposes a limit of 512 MB on the /tmp directory. Common Python dependencies such as PyTorch consume more than 500 MB alone, and that’s without considering the size of the machine learning models.
We tried, without success, to build smaller versions of these libraries with reduced dependencies and include them as Lambda layers. Although we were able to deploy to Lambda, we could not invoke the model within the limitations of the runtime storage.
Recently, we were excited to learn that AWS had released support for Lambda container images. We realized that this capability would enable us to move our dependency-heavy machine learning inference processes to sit alongside our existing Lambda based data processing pipeline.
Why Lambda with Container Images?
Our use case demanded that we perform an analysis and deliver the results to our users in near real-time. We considered a number of options, but ultimately, any non-serverless option meant that we had to choose between permanent inference infrastructure and waiting for batches to complete. Even if we had configured the batch process to run more frequently, the delay between different stages of the pipeline would still remain. It would also be more complex to deliver a multi-stage model, or a model that depended on an earlier model completing.
This is where Lambda with containers really delivered. We can now perform analysis on documents in an event-driven manner – in near real-time, as documents are passed into the pipeline. We now have the benefit of a self-scaling machine learning process that caters to the volume of data received, all with no significant increase in processing costs.
Benefits of using AWS Lambda with container images:
– Significantly reduced total processing time
– Lower costs due to no idle resources when there is no traffic
– Fully-automated scaling and elasticity
In order to test the approach, we employed one of our existing models – sentiment analysis.
Sentiment analysis is performed by a supervised neural language model designed to analyze unstructured data and determine whether, in a given sentence, a positive or negative sentiment is expressed towards the referenced organization or person. This analysis is performed as part of a wider, multi-stage Lambda-based document processing pipeline.
As a robust AMPLYFI model that was already deployed in production, this would act as a reliable baseline for the comparison of each technique.
The following describes the basic steps required to build and deploy a Python Lambda with a dependency on a pre-trained model. The same process could be used to deliver any Lambda with dependencies that are larger than the standard Lambda size limits.
1. Create a model with a standardized JSON input/output format.
2. Create an entry point for the model as a standard Lambda handler in python:
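A minimal sketch of such a handler follows; the model loader and field names are illustrative, since the real pipeline loads a pre-trained neural sentiment model:

```python
import json

# Module-level cache: the model is loaded once per container and then
# reused across warm invocations, avoiding repeated initialization cost.
MODEL = None


def load_model():
    """Placeholder for loading the pre-trained model bundled in the image.

    In practice this would deserialize the neural sentiment model; here it
    returns a stub so the handler's shape is clear.
    """
    return lambda text: {"label": "positive", "score": 0.99}


def handler(event, context):
    global MODEL
    if MODEL is None:
        MODEL = load_model()

    # Standardized JSON input, e.g. {"sentence": "...", "entity": "..."}
    body = event if isinstance(event, dict) else json.loads(event)
    result = MODEL(body["sentence"])

    # Standardized JSON output
    return {
        "statusCode": 200,
        "body": json.dumps(result),
    }
```

Keeping the input and output as plain JSON means the same function can be invoked from other pipeline stages, queues, or local tests without change.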
3. Create a Dockerfile. Note that each container layer must be smaller than 10 GB to fit within the Amazon ECR size restrictions. Reducing the size of this container will also optimize the initialization time of the Lambda.
4. In the Dockerfile, install the requirements, copy the function code, set any required permissions and define the Lambda entry point:
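A sketch of such a Dockerfile, assuming the handler lives in app.py alongside a model/ directory (file names and Python version are illustrative):

```dockerfile
# AWS-provided Python base image for Lambda
FROM public.ecr.aws/lambda/python:3.9

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install -r requirements.txt --no-cache-dir

# Copy the function code and the pre-trained model into the image
COPY app.py ${LAMBDA_TASK_ROOT}/
COPY model/ ${LAMBDA_TASK_ROOT}/model/

# Point Lambda at the handler, in module.function form
CMD ["app.handler"]
```

Installing requirements before copying the code keeps the heavyweight dependency layer stable, so routine code changes rebuild quickly.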
5. Define the serverless file:
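With the Serverless Framework, a sketch of the serverless.yml could look like the following (service name, region, and image name are illustrative):

```yaml
service: sentiment-analysis

provider:
  name: aws
  region: eu-west-1
  # Generous memory while the model is new; tuned down later
  memorySize: 3008
  timeout: 60
  ecr:
    # The framework builds this image from the local Dockerfile
    images:
      sentiment:
        path: ./

functions:
  sentiment:
    image:
      name: sentiment
```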
6. Deploy the function:
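With the Serverless Framework, deployment is a single command (the stage name is illustrative):

```shell
# Builds the container image, pushes it to Amazon ECR,
# and creates or updates the Lambda function
serverless deploy --stage dev
```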
This builds the image, pushes it to Amazon Elastic Container Registry (Amazon ECR), and deploys the Lambda function.
Whilst we are in the early stages of using a new model in production, we are generous with the configured memory of the Lambda. As our confidence and understanding in the process improves, we are able to reduce costs by reducing the memory capacity to more closely reflect the specific requirements of each model. As always with Lambda, be mindful that reducing the memory will reduce the CPU capacity in a linear fashion, so the execution time of Lambda may increase as a result.
In our experience, when running a Lambda with a container image, the initial start time of the Lambda will increase depending on the size of the model. This is due to the image being copied from the registry and the Lambda cold start initialization period. Subsequent invocations, however, are much faster – less than a second in many cases.
Using the Docker runtime for the Lambda makes it very easy to invoke and test the function locally.
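For instance, the AWS base images bundle the Lambda Runtime Interface Emulator, so the function can be exercised locally (the image tag and payload are illustrative):

```shell
# Run the image locally; port 9000 maps to the emulator's port 8080
docker run -p 9000:8080 sentiment-analysis:latest

# In another terminal, invoke the handler with a sample payload
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"sentence": "Great results this quarter", "entity": "AMPLYFI"}'
```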
With a serverless infrastructure, we are able to manage throughput on a continuous basis. Previous implementations had relied on batch processing, which was not run continuously, leading to a much longer document completion time, and increased complexity of processing, especially where models had dependencies.
The ability of Lambda to scale horizontally means that we do not need to predict capacity or provision infrastructure. With this implementation, the lead time from a document being added to the pipeline to being customer visible has reduced significantly to a matter of minutes, even when processing thousands of documents concurrently.
With a memory allocation of 3008 MB and an average running time of approximately 2 seconds, the Lambda cost for this model is approximately $10 per 100,000 invocations (outside of the free tier). The actual cost might be slightly different depending on the region and other factors. It is likely that we can optimize this further to reduce the processing time, memory, and cost. Cost is also minimized in development and test environments, where throughput is much lower, as there is no requirement for permanently enabled infrastructure.
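As a rough sanity check on that figure, Lambda's duration-based pricing works out as follows (the per-GB-second and per-request rates assumed here are the published us-east-1 rates at the time, and vary by region):

```python
# Rough Lambda cost estimate for 100,000 invocations.
# Assumed rates (region-dependent): $0.0000166667 per GB-second of
# compute and $0.20 per million requests.
GB_SECOND_PRICE = 0.0000166667
REQUEST_PRICE_PER_MILLION = 0.20

memory_gb = 3008 / 1024        # configured memory
duration_s = 2.0               # average running time
invocations = 100_000

compute_cost = memory_gb * duration_s * invocations * GB_SECOND_PRICE
request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
total = compute_cost + request_cost
print(round(total, 2))  # prints 9.81 – in line with the ~$10 figure
```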
This first iteration of Lambda with containers has been so successful for us that we have built much of our document processing pipeline with the same basic infrastructure. We believe that we have established a pattern that allows us to be flexible, scalable and agile in the implementation of new models.
This has been a hugely significant change for AMPLYFI. Our machine learning team is able to create cutting-edge models that can be deployed to production with much greater efficiency and we have barely scratched the surface of the possibilities here.
Please get in touch if you would like to know more about the AMPLYFI product suite or our engineering team.
Stephen Hall is an Architect at AMPLYFI. He enjoys solving serverless software engineering and data storage problems, and the accelerated agile development of low-cost, scalable systems.
Tom Crawford is the Lead DevOps Engineer at AMPLYFI. He loves the challenge of merging AMPLYFI’s cutting edge business intelligence capabilities with AWS services to enable AMPLYFI to build secure, cost effective platforms that are available to users quickly by utilizing CI/CD solutions.
Lorenzo Bongiovanni is Principal Machine Learning Engineer at AMPLYFI. His focus is to leverage and develop the state-of-the-art in Natural Language Understanding to boost AMPLYFI’s Information Extraction capabilities.