AWS Startups Blog
Use Spot Instance Pricing for Your Video Encoding Workflows with Bitmovin
Guest Post by Daniel Hoelbling-Inzko, Solution Architect, Bitmovin
Video encoding can be a time-consuming proposition. Every creative professional who has ever fallen asleep while exporting their Premiere or Final Cut master knows this. When encoding a video for adaptive bitrate streaming, most people do essentially the same thing: start ffmpeg and wait for it to finish churning through the bits.
Depending on your cost consciousness (and patience), you can choose to use your own laptop or something more appropriate, like a really beefy 64-core server that makes your infrastructure folks salivate and your finance department ask pointed questions.
To avoid the latter, the obvious solution is to move to the cloud and take advantage of the AWS infrastructure without any upfront capital investment. Offloading the capital investment to AWS, though, doesn’t come free. AWS needs to charge for running and maintaining the infrastructure so you don’t have to do it.
What if there were a way to cut these AWS costs by as much as 80% for video encoding workflows? Sounds too good to be true?
To deliver on their promise of almost infinite scalability, AWS needs to have insane amounts of idle resources available around the globe, just in case someone decides to call them up and spin up a few thousand instances. To offset some of this operational cost, AWS pioneered the so-called Spot Instance market.
Spot Instance market
Amazon EC2 Spot Instances allow you to bid on unused EC2 compute capacity, often at a significant discount to On-Demand Instance pricing.
Each customer can put in a request for compute time at the hourly rate that they are willing to pay. If your bid is higher than the competition’s, you’ll win the bid and get a compute instance at the current Spot price. This can be as much as 90% below the On-Demand price that you’d otherwise pay for the same instance.
So why isn’t everyone running Spot Instances? The drawback is that AWS doesn’t guarantee that you can keep the Spot Instance. If the market price for Spot Instances rises above your current bid, AWS will evict you and reassign the instance to someone else. You’ll get a two-minute notice, but after that the instance is gone and all the encoding progress that your ffmpeg has been working on for hours is also lost. You’ll end up having to acquire another instance and start from scratch.
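To make the two-minute notice concrete, here is a minimal Python sketch of how a worker might work out how much time it has left. It assumes the notice arrives as the small JSON document that EC2 exposes through the instance metadata item `spot/instance-action` (an `action` and a UTC `time`). The function name and the sample payload are illustrative, and fetching the document from the metadata service is deliberately left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json, now):
    """Return the seconds left before the instance is reclaimed.

    `notice_json` is assumed to look like
    {"action": "terminate", "time": "2017-09-18T08:22:00Z"}.
    """
    notice = json.loads(notice_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    # Never report a negative remaining time if the deadline already passed.
    return max(0.0, (deadline - now).total_seconds())


# Illustrative payload, not captured from a real instance.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample, now)
print(remaining)
```

Two minutes is enough to hand back a small unit of work, but not to rescue a linear encode that has been running for hours.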
AWS has a handy Spot Bid Advisor that tells you how likely you are to be outbid at certain price points and how long you can expect your instance to stay up. But even with the Advisor, using a Spot Instance can be a risky business if the workloads you run aren’t purpose-built for this environment.
Pitfalls of linear encoding tools
If you decide to bid aggressively close to the current market price (to save costs), your instances are very likely to get terminated close to, or sometimes even before, the end of the one-hour billing cycle. Linear encoding tools like ffmpeg, which need the instance to stay up until the entire job has finished, are therefore at a disadvantage in many ways.
In our tests with ffmpeg, we saw real-time factors of around 0.7x to 1x for a 1.5-hour video. If a video takes 90 minutes to encode, the decision to reuse that same Spot Instance for another encoding job gets difficult. At the end of the second one-hour billing cycle, the instance might be terminated. Depending on how the Spot Instance market develops, the next 30 minutes of encoding progress might be lost. If delivery deadlines are critical, reusing that instance is probably not an option, so 30 of the 120 minutes you paid for go unused, forcing you to waste 25% of your instance time.
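The 25% figure follows from rounding the job up to whole billing cycles. The short Python sketch below reproduces that arithmetic; the function name is made up for illustration, and it assumes the one-hour billing cycle described in this post.

```python
import math


def billed_and_wasted(job_minutes, billing_cycle_minutes=60):
    """Return (billed_minutes, wasted_fraction) for one job on one instance."""
    # A partially used cycle is billed in full.
    cycles = math.ceil(job_minutes / billing_cycle_minutes)
    billed = cycles * billing_cycle_minutes
    wasted = (billed - job_minutes) / billed
    return billed, wasted


billed, wasted = billed_and_wasted(90)
print(billed, wasted)
```

A 90-minute job is billed as 120 minutes, so a quarter of the paid time is idle unless a second job can safely reuse the instance.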
Also, because an instance has to remain online until at least one job has finished, you’ll have to bid quite high to make sure that a reasonable number of encoding jobs have a chance to finish before the instance is reclaimed.
Cloud native encoding tools
You can avoid these headaches simply by using encoding tools that are natively built for the cloud and can cope with dynamic infrastructure changes.
Our Bitmovin encoding solution has been purpose-built to leverage the horizontal scalability that the cloud has to offer, while also being resilient to node failures. We do so by taking a radically different approach to encoding. Instead of running on beefy machines for a long time, we leverage a fleet of small machines where each instance gets to do a small chunk of the job. This enables us to achieve encoding speeds of up to 100x real-time!
We can reach these encoding speeds by having a cluster of machines working together on encoding a single asset. Each cluster is managed by a so-called coordinator node that takes the input file and distributes parts of it to the associated worker nodes in the cluster. This design allows us to scale almost infinitely with any given workload. If customers want to encode multiple codecs like h264, VP9, HEVC, or AV1, we can just add more workers to the fleet to handle the load without having to take a hit in overall processing time or reduce quality due to hardware constraints. The following illustration shows such an example cluster with one coordinator that utilizes 5 workers to encode a video on AWS.
Such a horizontally scalable, chunked encoding solution will obviously see a much higher chance of one worker failing, so we had to build the system to be fault-tolerant and reliable, even when adding or removing worker nodes on the fly. Having the coordinator reschedule failed chunks on other workers was key to making this technology viable.
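The rescheduling idea can be illustrated with a toy simulation. This is a simplified sketch and not Bitmovin’s actual coordinator: the function, the round-robin assignment, and the `evicted` set (workers that disappear the first time they are handed a chunk) are all assumptions made for the example.

```python
from collections import deque


def encode_with_rescheduling(chunks, workers, evicted):
    """Assign chunks round-robin and requeue any chunk whose worker vanished.

    `evicted` is a set of worker names that fail on their first chunk and
    then leave the pool. Returns a dict mapping chunk -> finishing worker.
    """
    pool = list(workers)
    queue = deque(chunks)
    done = {}
    turn = 0
    while queue:
        if not pool:
            raise RuntimeError("no workers left to finish the job")
        chunk = queue.popleft()
        worker = pool[turn % len(pool)]
        if worker in evicted:
            pool.remove(worker)   # the node is gone
            queue.append(chunk)   # reschedule its chunk on another worker
            continue
        done[chunk] = worker
        turn += 1
    return done


done = encode_with_rescheduling(range(6), ["w1", "w2", "w3"], evicted={"w2"})
print(done)
```

Losing `w2` costs only the one chunk it was holding, which is simply encoded again by a surviving worker. The job as a whole still completes.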
This resilience enables us to run on ephemeral resources and is coincidentally a perfect fit for the AWS Spot Instance market. We’ve been doing this for quite some time with our Bitmovin Cloud Encoding service. But now with the introduction of our newest product, Bitmovin Containerized Encoding, we enable our customers to run the same technology within their own AWS accounts, all while leveraging the cost advantages of AWS Spot Instance pricing!
Example pricing calculation for 24/7 workloads
So, let’s get into some numbers! How much does infrastructure really cost in the cloud? Let’s assume that we have a cluster of 1024 cores to encode videos 24/7. If we were to utilize 8-core (c4.2xlarge) instances, we’d need 128 instances to achieve an encoding throughput of around 40 hours of source video per hour with our sample bitrate ladder. (Keep in mind that with video, the throughput depends in large part on the encoding settings, profiles, and more.) This would get us 28,800 hours of encoded source video per month when utilizing the instances 24/7. We could also just quadruple the number of instances and run through the same source files in about a week, but for simplicity we’ll do the following example calculations with one month of encoding on 1024 cores.
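The sizing above is simple arithmetic, sketched here in Python so the inputs are easy to change. The core count, instance size, and throughput come from the text. The 30-day month is an assumption.

```python
CORES_TOTAL = 1024
CORES_PER_INSTANCE = 8        # c4.2xlarge, as stated in the text
SOURCE_HOURS_PER_HOUR = 40    # cluster throughput quoted in the text
HOURS_PER_MONTH = 24 * 30     # assumption: a 30-day month

instances = CORES_TOTAL // CORES_PER_INSTANCE
monthly_source_hours = SOURCE_HOURS_PER_HOUR * HOURS_PER_MONTH

# Same backlog on four times the instances, assuming throughput scales linearly.
days_with_4x = monthly_source_hours / (4 * SOURCE_HOURS_PER_HOUR) / 24

print(instances, monthly_source_hours, days_with_4x)
```

Under these assumptions the quadrupled cluster clears the same backlog in 7.5 days, which is the rough "week" mentioned above.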
The following table shows the sample bitrate ladder that we used for this evaluation.
Codec | Resolution | Bitrate |
h264 | 1080p | 4.8 Mbit/s |
h264 | 720p | 2.4 Mbit/s |
h264 | 480p | 1.2 Mbit/s |
h264 | 360p | 0.8 Mbit/s |
h264 | 240p | 0.4 Mbit/s |
If we used only AWS On-Demand Instances at $0.398 per instance-hour, renting this 128-instance cluster would cost us about $51 per hour. That doesn’t look too bad at first glance, but running it for a whole month quickly adds up to around $36,720.
Because our Bitmovin Containerized Encoding is resilient to worker failures, we can leverage the AWS Spot Instance market to drive this cost down. The only thing that can really fail an encoding with our system is removing the encoding coordinator, so we can opt to use On-Demand Instances for the encoding coordinator and run a big fleet of Spot Instances for the worker nodes.
This guarantees that all encodings go through and leaves us with about $13 per hour, roughly a quarter of the On-Demand price for exactly the same video queue. This assumes that 1% of the instances run as coordinators on the more expensive On-Demand Instances at $0.398 per hour, while the remainder use a Spot Instance price of $0.097 per hour. This brings the total infrastructure cost for our 24/7 encoding infrastructure down to $9,280 per month.
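As a cross-check, the following Python sketch recomputes both scenarios from the unit prices quoted in this post. The 30-day month and the treatment of the 1% coordinator share as a fraction of the 128 instances are assumptions. The exact result shifts a little with rounding and with how many coordinators actually run.

```python
INSTANCES = 128
ON_DEMAND_PRICE = 0.398     # USD per instance-hour, from the text
SPOT_PRICE = 0.097          # USD per instance-hour, from the text
COORDINATOR_SHARE = 0.01    # 1% of instances run On-Demand, from the text
HOURS_PER_MONTH = 24 * 30   # assumption: a 30-day month

on_demand_hourly = INSTANCES * ON_DEMAND_PRICE
mixed_hourly = INSTANCES * (
    COORDINATOR_SHARE * ON_DEMAND_PRICE
    + (1 - COORDINATOR_SHARE) * SPOT_PRICE
)

on_demand_monthly = on_demand_hourly * HOURS_PER_MONTH
mixed_monthly = mixed_hourly * HOURS_PER_MONTH

print(round(on_demand_hourly, 2), round(on_demand_monthly))
print(round(mixed_hourly, 2), round(mixed_monthly))
```

With these inputs the mixed fleet works out to about $12.80 per hour and roughly $9,200 per month, close to the monthly figure quoted above and about a quarter of the all On-Demand cost.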
Our flexible encoding system also enables us to mix different instance types. Because the AWS Spot Instance market price for a particular instance type changes with current supply and demand, we might find a particular instance type to be too expensive for a period of time. For example, if c4.2xlarge instances are in short supply on the Spot Instance market, we could start the coordinator on a c4.xlarge instance and the workers on the much beefier c4.4xlarge instance type (running fewer instances in turn). If we see the Spot Instance price drop for another instance type (c3.2xlarge, for example), we can switch to these and decommission the more expensive nodes we’ve been using so far.
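One simple way to compare instance types is by Spot price per core. The Python sketch below does that for the three types mentioned. The Spot prices are hypothetical placeholders rather than market data, and the selection rule is only an illustration, not a description of how Bitmovin chooses instances.

```python
import math


def cheapest_per_core(spot_prices, cores):
    """Return (instance_type, price_per_core_hour) with the lowest cost per core."""
    per_core = {t: spot_prices[t] / cores[t] for t in spot_prices}
    best = min(per_core, key=per_core.get)
    return best, per_core[best]


CORES = {"c4.2xlarge": 8, "c4.4xlarge": 16, "c3.2xlarge": 8}
# Hypothetical Spot prices in USD per instance-hour, for illustration only.
SPOT = {"c4.2xlarge": 0.160, "c4.4xlarge": 0.200, "c3.2xlarge": 0.110}

best, price_per_core = cheapest_per_core(SPOT, CORES)
instances_needed = math.ceil(1024 / CORES[best])
print(best, price_per_core, instances_needed)
```

With these made-up prices the larger c4.4xlarge wins on cost per core, so the same 1024 cores are covered by 64 instances instead of 128.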
If you’re in an AWS Region where the Spot Instance market is quite stable, you can even get away with running the encoding coordinator on a Spot Instance to save cost. If the Spot Instance price goes up, it can always be switched back to On-Demand Instances between encodings.
The possibilities are endless when it comes to creating an optimized workflow for each of our customers. Get in touch with us today, and we can develop an optimal strategy for your video delivery workflow!