AWS Database Blog
Nine Entertainment’s journey: Achieving 98% cost savings with Amazon ElastiCache Serverless for Valkey
This post is co-written with Michael Lorant, Principal Platform Engineer at Nine Entertainment, in partnership with AWS.
In this post, we demonstrate how Nine Entertainment achieved a 98% cost reduction by migrating to Amazon ElastiCache Serverless for Valkey while improving scalability and eliminating manual intervention during peak events.
Managing costs while maintaining performance is a critical challenge for video streaming platforms. Nine Entertainment, one of Australia’s leading media companies, discovered that the caching infrastructure for the APIs supporting their streaming service was costing more than the underlying compute platform.
The challenge: When caching costs more than compute
Nine Entertainment operates two major streaming platforms in Australia: 9Now (a free-to-air BVOD service) and Stan (an entertainment and sports subscription service). These platforms serve hundreds of thousands of concurrent users, delivering a user experience using backend APIs that provide authentication, content browsing, user preferences, and features such as resume from last playback position.
Nine Entertainment deployed Amazon ElastiCache for Redis OSS to keep their 9Now streaming platform APIs responsive under load. ElastiCache delivers sub-millisecond latency for read-intensive workloads by caching frequently accessed data in memory, boosting overall application performance and throughput.
Nine chose Amazon ElastiCache Serverless, which automatically scales compute, memory, and network resources vertically and horizontally to match application demand. The alternative, a node-based ElastiCache cluster, provisions a static number of nodes. Given the variable load seen on streaming platforms, engineering teams would have needed to continually monitor and adjust the node count to avoid running short of capacity during sudden surges in demand or paying for over-provisioned capacity during quieter periods.
Amazon ElastiCache Serverless frees engineering teams to focus on innovation rather than infrastructure management. The automated alignment of usage with demand can result in Amazon ElastiCache Serverless being 30% cheaper than node-based ElastiCache clusters for applications with fluctuating traffic patterns. However, when analyzing their AWS bill, the Platform Engineering team made a surprising discovery: Amazon ElastiCache costs exceeded those for the load balancers and Amazon Elastic Compute Cloud (Amazon EC2) instances combined. The caching layer, designed to reduce load on downstream systems, was costing more than the systems it was meant to protect.
This cost anomaly demanded immediate attention. The team needed to understand not just what was happening, but why their caching infrastructure had become a significant cost driver.
Understanding the cost drivers
A critical insight came from understanding how Amazon ElastiCache Serverless pricing works. Rather than the per-instance pricing model applied to node-based clusters, Amazon ElastiCache Serverless charges for data stored and for ElastiCache Processing Units (ECPUs) consumed, with one ECPU consumed for each kilobyte of data transferred.
Armed with this knowledge, the team analyzed ElastiCache metrics for the previous year and discovered that approximately 20 terabytes of traffic per day was passing through their caching layer, equivalent to roughly 250 megabytes per second. Because of these large traffic volumes, ECPUs were the main cost driver for their caching infrastructure under the serverless deployment model.
This was surprising given the nature of the data being cached. Nine wasn’t storing video files or images in ElastiCache; they were caching JSON objects containing user preferences, content metadata, and API responses.
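As a rough sanity check of these figures, the following sketch converts the daily traffic volume into an average throughput and an ECPU count. It assumes that traffic is spread evenly across the day and that data transfer alone consumes one ECPU per kilobyte. The function names are illustrative, and because the post does not say whether "terabytes" is decimal or binary, both interpretations are shown (about 231 MB/s and about 255 MB/s, which bracket the quoted figure).

```python
# Back-of-the-envelope check of the traffic figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_mb_per_s(bytes_per_day: float) -> float:
    """Average throughput in megabytes (10**6 bytes) per second."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


def ecpus_per_day(bytes_per_day: float, bytes_per_ecpu: int = 1000) -> float:
    """ECPUs for data transfer alone, assuming 1 ECPU per kilobyte transferred."""
    return bytes_per_day / bytes_per_ecpu


decimal_tb = 20 * 10**12  # 20 TB in decimal units (assumption)
binary_tib = 20 * 2**40   # 20 TiB in binary units (assumption)

print(f"decimal: {throughput_mb_per_s(decimal_tb):.0f} MB/s")
print(f"binary:  {throughput_mb_per_s(binary_tib):.0f} MB/s")
print(f"ECPUs per day (decimal): {ecpus_per_day(decimal_tb):.1e}")
```

Under these assumptions, the workload consumes on the order of 20 billion ECPUs per day from data transfer alone, which is consistent with ECPUs dominating the bill.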
However, these JSON payloads were being transferred and stored without any compression, resulting in the massive data transfer volumes that were driving high ECPU consumption.
The solution: Compression and Valkey Serverless
The engineering team implemented two key changes that would transform their cost structure:
Payload compression
They wrapped their existing key-value store interface with a lightweight compression module developed using Zstandard (zstd), an open source compression algorithm originally developed by Facebook. Zstandard offers excellent compression ratios for JSON data while maintaining fast compression and decompression speeds, which is critical for maintaining low latency in a high-throughput caching layer.
The compression module acted as a transparent wrapper around their existing Redis client, requiring minimal changes to the application code. The implementation required only 46 lines of code for the compression wrapper and five lines to integrate it into their existing Node.js applications.
This minimal code change delivered maximum impact by dramatically reducing the amount of data transferred through ElastiCache to under 3 TB per day.
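The general shape of such a transparent wrapper can be sketched as follows. This is a minimal illustration under stated assumptions, not Nine's actual module: their implementation was written for Node.js and used Zstandard, whereas this sketch is in Python and uses the standard library's `zlib` as a stand-in codec so that it runs without extra dependencies. The `KeyValueStore` interface, `InMemoryStore`, `CompressedStore`, and the example key are all hypothetical names.

```python
import json
import zlib
from typing import Any, Callable, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface the wrapper expects (hypothetical)."""

    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for a Redis or Valkey client so the sketch runs without a server."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value


class CompressedStore:
    """Compresses JSON values on write and decompresses them on read."""

    def __init__(
        self,
        inner: KeyValueStore,
        compress: Callable[[bytes], bytes] = zlib.compress,      # stand-in for zstd
        decompress: Callable[[bytes], bytes] = zlib.decompress,  # stand-in for zstd
    ) -> None:
        self._inner = inner
        self._compress = compress
        self._decompress = decompress

    def set_json(self, key: str, value: Any) -> None:
        raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
        self._inner.set(key, self._compress(raw))

    def get_json(self, key: str) -> Optional[Any]:
        blob = self._inner.get(key)
        if blob is None:
            return None  # cache miss
        return json.loads(self._decompress(blob))


# Hypothetical payload resembling repetitive content metadata.
payload = {
    "items": [
        {"id": i, "title": f"Episode {i}", "genre": "drama", "resumePosition": 0}
        for i in range(200)
    ]
}
store = InMemoryStore()
cache = CompressedStore(store)
cache.set_json("user:123:home", payload)

raw_size = len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
stored_size = len(store.data["user:123:home"])
print(f"raw: {raw_size} bytes, stored: {stored_size} bytes")
```

Because the codec is injected, a Zstandard binding's compress and decompress functions could be passed in instead of `zlib` without changing the callers. Only compressed bytes cross the network, so the ECPU charge for data transferred shrinks in proportion to the compression ratio.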
Migration to Amazon ElastiCache Serverless for Valkey
Nine Entertainment switched from the Redis OSS engine to the Valkey engine for ElastiCache, motivated by three key factors. ElastiCache Serverless for Valkey:
- Is priced 33% lower than other supported engines.
- Qualifies for Database Savings Plans, providing a further 30% discount over on-demand pricing.
- Offers substantially faster scaling capabilities, reducing the time taken to double the requests served from 10-12 minutes down to 2-3 minutes.
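To see how the two pricing factors in this list compound, consider the following illustrative calculation. It assumes the Savings Plan discount applies on top of the lower Valkey price and covers all usage, and it combines the result with the traffic reduction from roughly 20 TB to under 3 TB per day reported in this post. It is an estimate for the data transfer portion of the bill only and does not attempt to reproduce the overall 98% figure, which the post does not break down.

```python
def combined_multiplier(*discounts: float) -> float:
    """Fraction of the original price left after applying each discount in turn."""
    remaining = 1.0
    for discount in discounts:
        remaining *= 1.0 - discount
    return remaining


engine_discount = 0.33        # Valkey vs. other supported engines
savings_plan_discount = 0.30  # Database Savings Plans vs. on-demand

# Assumption: both discounts stack multiplicatively on the same charges.
price_factor = combined_multiplier(engine_discount, savings_plan_discount)

# Upper bound on remaining traffic: "under 3 TB" out of roughly 20 TB per day.
data_factor = 3 / 20

transfer_cost_factor = price_factor * data_factor
print(f"price factor:         {price_factor:.3f}")
print(f"data factor:          {data_factor:.3f}")
print(f"transfer cost factor: {transfer_cost_factor:.3f}")
```

Under these assumptions, the unit price falls to about 47% of the original, and data transfer charges fall to roughly 7% of the original once compression is included.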
Faster scaling was particularly important for Nine Entertainment because streaming traffic patterns present unique challenges. While the large influx of viewers seen prior to popular shows or sporting events triggers a large surge in API requests, an even bigger surge is seen at the conclusion when hundreds of thousands of viewers exit the content playback at the same time and return to the main browsing interface.
Although Nine Entertainment was already using ElastiCache Serverless with the Redis OSS engine, the 10-12 minutes it needed to double the requests per second it could serve meant that major events still required careful capacity planning and manual pre-scaling. Engineers maintained runbooks detailing how to estimate and set minimum capacity levels to scale clusters before sporting events, season finales, and other high-viewership occasions. Forgetting to manually scale up, or not scaling enough, resulted in performance degradation during an event. Failing to revert those changes after an event generated unnecessary costs from unused capacity.
With Valkey Serverless, the system now handles these spikes automatically without the need for manual intervention.
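The difference in doubling time compounds quickly for large surges. The following sketch estimates how long it takes to absorb a surge if capacity keeps doubling at a constant interval. The 8x surge factor is a hypothetical example, and constant-rate doubling is a simplifying assumption rather than a documented scaling guarantee.

```python
import math


def minutes_to_scale(surge_factor: float, doubling_minutes: float) -> float:
    """Time to grow capacity by surge_factor if it doubles every doubling_minutes."""
    return math.log2(surge_factor) * doubling_minutes


surge = 8  # hypothetical: traffic jumps to 8x the baseline

# Doubling-time ranges (in minutes) quoted in this post.
doubling_ranges = {"Redis OSS": (10, 12), "Valkey": (2, 3)}

for engine, (fastest, slowest) in doubling_ranges.items():
    low = minutes_to_scale(surge, fastest)
    high = minutes_to_scale(surge, slowest)
    print(f"{engine}: {low:.0f}-{high:.0f} minutes")
```

For this hypothetical 8x surge, three doublings take 30-36 minutes at the Redis OSS rate versus 6-9 minutes at the Valkey rate, which illustrates why pre-scaling was previously unavoidable for sudden end-of-event spikes.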
Migration approach
The migration process was designed for zero downtime and consisted of the following steps:
- Provisioning a new ElastiCache Serverless for Valkey cluster in parallel with the existing ElastiCache for Redis OSS cluster
- Updating the application endpoint configuration to point to the new Valkey cluster
- Monitoring performance and validating functionality
- Decommissioning the previous Redis cluster once confident in the new setup
Some applications required updates to use clustered mode, but the overall migration was completed with minimal engineering effort, taking approximately one week of actual development time with additional time allocated for monitoring and validation.
The parallel deployment strategy removed risk. By running both clusters simultaneously during the transition, the engineering team could quickly roll back if any issues emerged. In practice, the migration proceeded smoothly, with the team gaining confidence over several days of monitoring before decommissioning the original Redis clusters.
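One way to keep the endpoint switch and rollback cheap is to drive the choice of cluster from configuration. The following sketch is an assumption about how that could look, not a description of Nine's setup: the `CACHE_TARGET` setting, the `CacheEndpoints` type, and the hostnames are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CacheEndpoints:
    redis: str   # existing ElastiCache for Redis OSS endpoint (hypothetical)
    valkey: str  # new ElastiCache Serverless for Valkey endpoint (hypothetical)


def select_endpoint(endpoints: CacheEndpoints, env: Mapping[str, str]) -> str:
    """Pick the cache endpoint from an environment-style mapping.

    Defaults to the existing Redis endpoint, so an unset or unrecognized
    value behaves as a rollback.
    """
    target = env.get("CACHE_TARGET", "redis").strip().lower()
    return endpoints.valkey if target == "valkey" else endpoints.redis


endpoints = CacheEndpoints(
    redis="redis.example.internal:6379",
    valkey="valkey.example.internal:6379",
)

print(select_endpoint(endpoints, {"CACHE_TARGET": "valkey"}))  # cut over
print(select_endpoint(endpoints, {}))                          # roll back
```

While both clusters are running, rolling back is a configuration change rather than a redeployment of infrastructure.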
Results: Transformational cost savings
The results exceeded all expectations:
- Annual costs decreased by 98%.
- No performance degradation occurred; in fact, performance improvements were observed.
- Operational efficiency improved because manual cluster management before major events was no longer needed.
- The risk of degradation during unexpected traffic spikes, such as those caused by breaking news, was reduced.
For Nine Entertainment’s engineering team, the cost savings represented approximately 20% of their annual cost optimization objective, something they achieved with just two engineers. The return on investment was immediate and substantial, freeing budget for other strategic initiatives.
Breaking the serverless pricing perception
Nine Entertainment’s experience challenges a common perception about serverless offerings. Many organizations assume that serverless services cost more than provisioned alternatives, particularly when used for high-throughput workloads. However, ElastiCache Serverless for Valkey broke this assumption at Nine Entertainment.
The combination of pay-per-use pricing, faster scaling, and the ability to drop minimum ECPU allocations made serverless genuinely more cost-effective than provisioned instances. The faster scaling time meant Nine Entertainment could run with much lower baseline capacity, confident that the system would scale quickly when needed.
This represents a significant shift in the value proposition for serverless. For workloads with variable traffic patterns, ElastiCache Serverless for Valkey can deliver both operational simplicity and significant cost savings.
Key takeaways
Nine Entertainment’s experience offers valuable lessons for organizations running high-throughput caching workloads:
- Understand your pricing model – ElastiCache Serverless pricing is based on data stored and on ECPUs consumed, which scale with the amount of data transferred, rather than on per-instance compute. High-throughput workloads with uncompressed data can lead to unexpectedly high costs.
- Start with data analysis – Understanding your actual data transfer patterns is essential before optimizing costs. Nine’s analysis of the previous year’s usage revealed the true nature of their cost drivers.
- Compression delivers immediate ROI – A straightforward compression implementation can dramatically reduce costs with minimal code changes. For JSON payloads, compression ratios can be substantial.
- Faster scaling enables lower baseline costs – With Valkey’s improved scaling time, Nine Entertainment could remove their minimum ECPU allocation, knowing the system would scale rapidly enough to handle traffic spikes.
- Remove operational overhead – The migration removed the need for manual pre-scaling, runbooks, and constant capacity monitoring, freeing the team to focus on higher-value work.
- Serverless isn’t always more expensive – ElastiCache Serverless for Valkey proved to be more cost-effective than provisioned instances for Nine’s workload, challenging the common perception that serverless offerings cost more.
Conclusion
Nine Entertainment’s migration to Amazon ElastiCache Serverless for Valkey demonstrates that significant cost reduction is possible without sacrificing performance or reliability. By understanding the pricing model, implementing compression, and using Valkey’s improved scaling capabilities, Nine Entertainment reduced their caching costs by 98% while improving operational efficiency.
The combination of payload compression and serverless architecture delivered benefits beyond cost savings. Engineers no longer need to maintain scaling runbooks or monitor clusters before major events. The system automatically handles traffic spikes during major sporting events and popular show finales, providing both cost efficiency and operational peace of mind.
If you’re running high-throughput caching workloads, consider analyzing your data transfer patterns and evaluating whether compression and ElastiCache Serverless for Valkey can deliver similar benefits for your organization. The investment in understanding your workload characteristics and implementing compression can yield substantial returns.
For more information about Amazon ElastiCache Serverless for Valkey, refer to the Amazon ElastiCache documentation. To learn more about cost optimization strategies, explore the AWS Cost Optimization Hub.