The Quortex approach to live streaming with Amazon EC2 Spot Instances
This blog was co-authored by Zavisa Bjelogrlic, Senior Partner Solution Architect, AWS; Jérôme Viéron, CTO, Quortex; Marc Baillavoine, CEO, Quortex; and Vincent Marguerie, R&D Manager, Synamedia Quortex.
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity at a discount of up to 90% compared to on-demand pricing. Spot Instances can be reclaimed with a two-minute notification, which makes them an excellent option for fault-tolerant, flexible applications that can manage interruptions, such as big data analytics, pools of web servers, high performance computing (HPC), test and development workloads, CI/CD, and containerized applications.
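The two-minute notification is exposed to the instance itself through the instance metadata service, under `spot/instance-action`. As a minimal sketch, assuming the notice has the documented JSON shape with an `action` and a UTC `time` field, the helper below works out how much time is left before the instance is reclaimed. The function name is hypothetical, and fetching the document from the metadata service is deliberately left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json, now):
    """Return (action, seconds left) for a Spot instance-action notice.

    `notice_json` is assumed to look like the document served at
    /latest/meta-data/spot/instance-action, for example
    {"action": "terminate", "time": "2017-09-18T08:22:00Z"}.
    `now` must be a timezone-aware datetime.
    """
    notice = json.loads(notice_json)
    # The timestamp is UTC in ISO 8601 form with a trailing "Z".
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


if __name__ == "__main__":
    sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
    now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
    print(seconds_until_interruption(sample, now))
```

Anything that has to happen before the instance disappears, such as draining pods, needs to fit inside the number of seconds this returns.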
Video streaming, and especially live streaming, has rigorous requirements: high availability, flawless delivery performance, 24/7 operations, and flexibility to scale with audience size, from low loads during quiet periods of the day to very high loads during prime time or popular sporting events.
Amazon Web Services (AWS) offers several options to optimize operations costs with EC2 and other services, such as Reserved Instances and Savings Plans. Both options require a one- or three-year commitment. They are valid alternatives for long-term engagements and for predictable usage profiles. Spot Instances, on the other hand, require no commitment or upfront payment and therefore offer more flexibility, allowing for savings on short events or when audience attendance is unpredictable. Spot Instances complement the commitment options from AWS.
Quortex, a French company recently acquired by video software provider and AWS Partner Synamedia, has developed a containerized video streaming architecture that leverages EC2 Spot Instances for cost optimization. This is a new way of offering video streaming services on AWS.
The Quortex solution is based on a containerized architecture. Microservices in the video processing chain are stateless and designed to work with segments shorter than 10 seconds. Input streams are broken into segments and distributed among clusters of jobs deployed in containers and dedicated to transcoding, packaging, encryption, and delivery. The solution is resilient and can reconfigure its clusters when Spot Instances are terminated.
Quortex uses four principles to build a resilient, Spot-based architecture:
- An over-provisioned cluster configuration. Extra nodes run low-priority pods and are reused for immediate reconfiguration of clusters in case of Spot interruptions.
- On-demand instances are requested if Spot is not available. Provisioning of Spot Instances continues until the required capacity is reached. At this point, clusters are rebalanced to use a maximum of Spot Instances. Allocated on-demand instances are terminated.
- Provisioning of EC2 instances relies on diversification. Leveraging different instance families, instance generations, and instance sizes provides the provisioner with several alternatives to acquire and provision EC2 capacity. Diversification is defined at initial deployment and maintained dynamically over the life of the service. Diversification makes it easier to meet target capacity and reduce the cost of operations.
- Critical microservices are designed to work in degraded mode and provide a best-effort service if any resource problem occurs. This allows a smooth transition when cluster reconfiguration runs into issues.
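The second principle in the list above, falling back to on-demand instances and releasing them once Spot capacity is complete, can be sketched as a small planning function. This is an illustration under stated assumptions, not the Quortex implementation: the names `Plan` and `plan_capacity` are hypothetical, capacity is counted in whole nodes, and spare nodes are ignored.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    on_demand_to_request: int
    on_demand_to_terminate: int


def plan_capacity(required, spot_running, on_demand_running):
    """Decide how many on-demand nodes to add or release.

    While Spot has not reached `required`, on-demand nodes cover the gap.
    Once Spot alone meets `required`, every on-demand node is surplus.
    """
    if spot_running >= required:
        # Spot capacity is complete: rebalance and release on-demand nodes.
        return Plan(0, on_demand_running)
    gap = required - spot_running
    # Request only what the existing on-demand nodes do not already cover.
    return Plan(max(gap - on_demand_running, 0), 0)


if __name__ == "__main__":
    print(plan_capacity(required=10, spot_running=6, on_demand_running=0))
    print(plan_capacity(required=10, spot_running=10, on_demand_running=4))
```

Calling the function repeatedly as Spot nodes arrive gives the behavior described in the list: on-demand nodes fill the gap first and are released only when the Spot target is met.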
The Quortex solution is built on Kubernetes (K8s). The Kubernetes configuration includes clusters for EC2 Spot Instances and clusters for on-demand instances. Clusters include over-provisioned nodes to which low-priority “dummy” pods are assigned by affinity rules. The Kubernetes Cluster Autoscaler on AWS adjusts the size of the cluster as the load changes with audience fluctuations.
The architecture is presented in the following figure.
Every node in the Spot cluster includes an AWS Node Termination Handler. This handler is used to manage events that can cause an EC2 Spot Instance to become unavailable (instance interruptions, maintenance events, or capacity rebalancing). The handler ensures that the control plane responds appropriately to such events.
When the handler catches the termination event, it cordons and then drains the node. The pods running on the interrupted instance are evicted and rescheduled onto the spare nodes, where they take the place of the lower-priority dummy pods. The dummy pods move to a pending state, and the cluster autoscaler deploys a new node to host them.
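The effect of draining a node onto spares that hold dummy pods can be illustrated with a toy model. The sketch below is not Kubernetes code: it assumes each node has a fixed number of pod slots, that a pod may displace any pod of strictly lower priority, and that the priority values are arbitrary. `Node`, `drain`, `DUMMY`, and `WORKLOAD` are hypothetical names.

```python
from dataclasses import dataclass, field

DUMMY = -1      # assumed priority of placeholder pods
WORKLOAD = 100  # assumed priority of streaming pods


@dataclass
class Node:
    name: str
    slots: int                                # pods the node can hold
    pods: list = field(default_factory=list)  # (pod name, priority) pairs


def drain(interrupted, spares):
    """Empty `interrupted` and return the names of pods left pending."""
    pending = []
    for name, priority in interrupted.pods:
        for node in spares:
            if len(node.pods) < node.slots:
                node.pods.append((name, priority))
                break
            lower = [p for p in node.pods if p[1] < priority]
            if lower:
                # Displace the lowest-priority pod to make room.
                victim = min(lower, key=lambda p: p[1])
                node.pods.remove(victim)
                pending.append(victim[0])
                node.pods.append((name, priority))
                break
        else:
            pending.append(name)  # no spare could take this pod
    interrupted.pods = []
    return pending


if __name__ == "__main__":
    spot = Node("spot-a", 2, [("transcoder", WORKLOAD), ("packager", WORKLOAD)])
    spare = Node("spare-a", 2, [("dummy-1", DUMMY), ("dummy-2", DUMMY)])
    print(drain(spot, [spare]))
    print(spare.pods)
```

In this model the streaming pods land on the spare node immediately, and only the dummy pods wait for new capacity, which is why the replacement node is off the critical path.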
Kubestitute, a Quortex module implemented as a K8s operator, is used when EC2 Spot Instances cannot be deployed. In this case, kubestitute provisions on-demand instances while waiting for Spot capacity. When the desired Spot capacity is reached, pods are rebalanced from on-demand nodes to Spot nodes. Kubestitute works with a variety of instance families and sizes to maximize the availability of the new nodes required. For example, a typical Quortex configuration will mix c5, c5a, c6, c6g, c6a, c7g, and c4 families in multiple sizes. Kubestitute keeps Spot Instances diversified at runtime to limit the impact of any large-scale reclaim of Spot Instances.
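Why diversification limits the impact of a large reclaim can be shown with a small calculation. The sketch below assumes nodes are spread round-robin over capacity pools and that a reclaim removes one whole pool at once. The family labels are copied verbatim from the text and stand in for pools. The functions `spread` and `worst_case_loss` are hypothetical and do not describe how kubestitute actually selects instances.

```python
from collections import Counter
from itertools import cycle

# Family labels as listed in the text, used here only as pool names.
FAMILIES = ["c5", "c5a", "c6", "c6g", "c6a", "c7g", "c4"]


def spread(node_count, pools):
    """Assign nodes to pools round-robin and return nodes per pool."""
    counts = Counter()
    for _, pool in zip(range(node_count), cycle(pools)):
        counts[pool] += 1
    return counts


def worst_case_loss(counts):
    """Largest share of nodes lost if one whole pool is reclaimed."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


if __name__ == "__main__":
    print(worst_case_loss(spread(14, ["c5"])))
    print(worst_case_loss(spread(14, FAMILIES)))
```

With a single pool, losing that pool removes every node. With seven pools, the same event removes one seventh of the fleet, which is small enough for the spare nodes described earlier to absorb.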
With this architecture, the Quortex solution can reduce the cost of streaming video by 60% to 80% compared to on-demand EC2 instances. Results vary with the type of offering and audience variability, but the savings are realized without long-term commitments, and the solution copes with variable audiences. These results were achieved while maintaining high availability and without jeopardizing streaming quality.
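To see how a raw Spot discount turns into a net saving once spare nodes and on-demand fallback are counted, the arithmetic can be written down as follows. This is a back-of-the-envelope model with hypothetical inputs, not Quortex's figures or method. It assumes a single on-demand price normalized to 1.0 and expresses over-provisioning as a fraction of extra node-hours.

```python
def blended_saving(spot_share, spot_discount, overprovision):
    """Fractional saving versus an all on-demand fleet with no spares.

    spot_share:    fraction of node-hours served by Spot (0..1)
    spot_discount: Spot discount versus the on-demand price (0..1)
    overprovision: extra node-hours kept as spares, as a fraction (e.g. 0.1)
    """
    # Relative price of one node-hour in the mixed fleet (on-demand = 1.0).
    unit_cost = spot_share * (1 - spot_discount) + (1 - spot_share)
    # Spares increase the number of node-hours that are paid for.
    return 1 - unit_cost * (1 + overprovision)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula.
    print(blended_saving(spot_share=0.95, spot_discount=0.75, overprovision=0.10))
```

The model shows that the net saving is always somewhat below the headline Spot discount, because spares and any hours spent on on-demand fallback are paid for at their own rates.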
The solution is used by Red Bee Media and other customers for streaming channels serving millions of viewers in Europe.
“It was obvious to select Synamedia Quortex as a key component of our disaster recovery for streaming services. Synamedia Quortex’s just-in-time processing model and their smart usage of the Spot Instances meet our needs for a cost-effective, yet immediately available, disaster recovery environment,” declared Steve Russell, Chief Product Officer at Red Bee Media.
Quortex built the original solution on self-managed Kubernetes before Amazon Elastic Kubernetes Service (EKS) was available. Quortex is now optimizing the deployment on AWS by using Amazon EKS managed node groups to handle the on-demand and Spot clusters. Managed node groups with Spot capacity implement Spot best practices and build in the Capacity Rebalancing feature of Amazon EC2 Auto Scaling, which helps proactively replace an instance at elevated risk of interruption with a healthy one.
The kubestitute operator is being upgraded to use features available in EKS managed node groups with Spot capacity to optimize the use of mixed Spot and on-demand instances.
Quortex demonstrates that EC2 Spot Instances can be used for time-critical services like video streaming. This creates an interesting new option for events and for streaming with unpredictable and highly variable loads.
The solution is complementary to other AWS options and can be combined with Savings Plans or Reserved Instances to reduce the cost of on-demand pools. Quortex is upgrading the solution to make better use of Amazon Elastic Kubernetes Service (EKS) and its managed node groups with Spot capacity.
Founded in 2018, Quortex changes the live streaming paradigm by introducing “Just-In-Time Everything”, a technology that builds the workflow based on user demand, not from the content origination. This keeps infrastructure and network costs to a minimum, while dynamically adapting to audience variability. With Quortex I/O, the benefits of that technology are now available as a SaaS service to make live streaming simpler than ever. Quortex was acquired in 2022 by video software provider Synamedia.
Learn more at https://www.quortex.io