Lessons Learned from Building and Testing a Data Ingest Workflow at Scale
Dean Kleissas, Research Engineer working on the IARPA MICrONS Project at the Johns Hopkins University Applied Physics Laboratory (JHU/APL), discussed the project, what makes it unique, and how the team leverages serverless technology in this blog post.
Here are some quick tips we learned while taking our ingest service from 0 to 4-5 Gbps, and from a single on-premises user to the system hosted in AWS.
When using AWS Lambda:
- Remember to optimize your Lambda function memory allocation settings to balance performance and cost. Increasing the memory allocation can reduce execution time, because a larger allocation also brings additional CPU and network. The smallest memory setting may not always be the cheapest for your task. Also, Lambda functions that finish faster free up their slots sooner, which gives you more effective Lambda capacity.
- When running Lambda in a VPC, remember that each subnet the Lambda function connects to will use an Elastic Network Interface (ENI). Be sure to have enough ENIs available in your account based on your subnet configuration and enough IPs allocated to reach your Lambda limit.
- If your Lambda functions are I/O heavy, make sure that your network architecture can handle the load as more functions start to spin up in parallel. If it can't, an unforeseen hiccup that increases latency can drive up your execution time and cause Lambda throttling.
- Circuit breakers and other resilient design patterns are often worth building into your architecture, to avoid cascading failures if your Lambda functions start to throttle because you have hit your Lambda limit.
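As an illustration of the circuit breaker idea mentioned above, here is a minimal sketch in Python. It is not the MICrONS team's implementation; the class name, the thresholds, and the injectable `clock` parameter are all assumptions made for the example. After a set number of consecutive failures the breaker "opens" and rejects calls immediately, then lets a single trial call through once a timeout has passed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (illustrative only)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of adding load to a struggling dependency.
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap whatever function invokes the downstream service, for example `breaker.call(submit_tile, tile)` where `submit_tile` is a hypothetical function of your own, and treat `CircuitOpenError` as a signal to back off or queue the work rather than retry immediately.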
When using Amazon DynamoDB:
- Item size drives the capacity requirements, meaning that as an item’s size increases it can consume additional capacity units. Capacity is consumed in 4 KB units for reads and 1 KB units for writes. The capacity units are metered at a “per second rate,” so a short burst of large writes can cause your tables to throttle unexpectedly.
- Be sure to avoid a “hot partition” due to heavy I/O to a small region of a table’s key space. When this occurs, you’ll see throttle exceptions at throughput levels lower than the total available capacity for the table. Try prepending a hash or other random data to your keys to spread objects across a table’s partitions.
- Finally, the table capacity is actually allocated at the partition level. If you have a hot partition and can’t fix things at the key level, you will most likely need to double your capacity to split the partition.
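The capacity arithmetic and the key-spreading tip above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's code: the function names are hypothetical, 1 KB is taken to be 1024 bytes, the read calculation assumes strongly consistent reads, and the `&` separator and SHA-256 prefix are arbitrary choices for the example.

```python
import hashlib
import math

KB = 1024  # assumption: 1 KB is counted as 1024 bytes


def write_capacity_units(item_size_bytes):
    """Units consumed by one write: item size rounded up to the next 1 KB."""
    return max(1, math.ceil(item_size_bytes / KB))


def read_capacity_units(item_size_bytes):
    """Units consumed by one strongly consistent read: rounded up to the next 4 KB."""
    return max(1, math.ceil(item_size_bytes / (4 * KB)))


def salted_key(natural_key, prefix_len=4):
    """Prepend a short deterministic hash so that similar keys sort far apart."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}&{natural_key}"
```

Under these assumptions a 3,500-byte item costs 4 write units, so a burst of 500 such writes in one second needs 2,000 write units for that second. Because the prefix is derived from the key itself, a reader can recompute it to fetch an item directly; the trade-off is that range queries over the natural key order no longer work on the salted key.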