Nielsen: Processing 55TB of Data Per Day with AWS Lambda
In preparation for the recording and during my initial conversations with Opher, I realized that there is an amazing story here that can help a lot of developers and architects who are building, or thinking of building, serverless architectures.
Many of these builders aren’t sure if serverless can withstand the load they need, what the costs are going to look like, or how complicated and different it is from what they are familiar with—and they have countless more questions. Nielsen’s story is a great example of how you can take this technology, which I like so much, to the extreme.
Serverless Big Data?
There are many use cases for serverless architectures, such as web systems, use of IoT, connection to microservices via API, automation, and more. Using serverless for data processing is one of the earliest architectures, and serverless Big Data, whether it is for processing large-volume data streams or file processing, is a very common practice.
What surprised me about Nielsen’s story was the size and complexity of the solution. Nielsen’s system, called “DataOut,” is processing 250 billion events-per-day, which translates to 55 terabytes (TB) of data. The system can automatically scale up and down, thanks to the capabilities of serverless architectures, processing from 1TB to 6TB of data per hour, and costing “only” $1,000 per day. (If this sounds like a lot to you, think again about how much computing power it takes to process this amount of data and how much workforce time the system saves.)
What did we learn?
More and more organizations (small and large) are adopting serverless architecture-based solutions for their core systems. The technology has the ability to scale up (and down), reach a very large scale, and provide flexibility at a reasonable price. This is what makes this architecture, even in the world of Big Data, something worth considering when planning a new system or improving an existing one.
True, there are still challenges to be solved and knowledge to be gained as more and more developers use these solutions, but Nielsen’s story is just one of a series of solutions built for better scalability and mostly simpler to operate and maintain.
Spend 10 minutes with Opher and me in the below video to learn how Nielsen built a data processing machine without machines (kind of).
For more content like this, subscribe to our YouTube channels This is My Architecture, This is My Code, and This is My Model, or visit the This is My Architecture on AWS, which has search functionality and the ability to filter by industry, language, and service.