How DAZN uses AWS Step Functions to orchestrate event-based video streaming at scale
This blog is co-authored by Russ Johnson at DAZN, and Corneliu Croitoru and Chris Fane at AWS.
In this blog post, we explain how DAZN, a global sports entertainment platform, used AWS Step Functions from Amazon Web Services (AWS) to build a lightweight, modular, and extensible orchestrator to automate its live sports streaming events. This architecture lets DAZN deploy just-in-time media workflows based on an event’s configured classification and resiliency requirements.
DAZN’s over-the-top (OTT) streaming platform broadcasts top tier sports to millions of customers around the world every week. To deliver this experience to its subscribers, the company manages the deployment of complex video workflows while also coordinating with backend application services that comprise its streaming platform.
Historically, broadcasters have used dedicated on-premises equipment provisioned for peak audience demand, so these resources are under-utilized much of the time. In addition, on-premises systems can be inflexible, and require pre-configuration for either North American or European standards. For international broadcasters, this means that equipment for one standard sits idle, while demand for another standard may outstrip supply. This can result in events dropping from the broadcast schedule, leaving subscribers disappointed.
DAZN is an event-based business, with the majority of those events taking place during evenings and weekends. By moving to flexible cloud-based solutions, customers like DAZN can scale to meet peak demand while only paying for the resources when they’re in use.
Moving media workloads to the cloud lets customers leverage the cloud’s elasticity, but deploying these workloads manually would require many steps. DAZN delivers events across the world and can have tens to hundreds of events running at any time. Automatically deploying workloads on cloud infrastructure shortly before an event starts allows DAZN to optimize its costs and minimize operational overhead. Some of this infrastructure takes time to launch and configure, which makes automation critical for avoiding human error. The following diagram illustrates the high-level operations that take place around a media event running from 7 PM until 9 PM.
Figure 1: Basic event broadcast timeline
Before a live event’s starting point at 7 PM, there is a period of time we refer to as the event’s PreRoll. The PreRoll should allow sufficient time for all deployment steps to complete and for the operations team to validate the event infrastructure, plus a safety margin for any last-minute alterations. The stakes are high in live sport, and any delay in provisioning or configuring infrastructure can mean viewers missing potentially critical minutes of the game.
DAZN needed a dependable but lightweight application to automatically orchestrate scheduled deployment, management, and tear-down of event infrastructure to allow its operations team to focus on the finer details of bringing live sporting events to millions of people’s homes and mobile devices.
The following timeline illustrates end-to-end operations that take place around a live sports event running from 7 PM until 9 PM with DAZN’s new solution.
Figure 2: End-to-end timeline for an event
To achieve these operations, DAZN implemented a serverless solution built on AWS Step Functions and deployed via AWS CDK.
The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework used to define cloud resources as code using a familiar programming language. With AWS CDK, common architectural patterns can be grouped together into single constructs that can then be reused multiple times. The CDK codebase can be versioned and tested via typical CI/CD practices to provide a consistent and reliable way to deploy infrastructure.
AWS Step Functions is a low-code, visual workflow service that customers can use to build distributed applications and orchestrate media workflows and business processes. AWS Step Functions provides native integrations with a number of AWS services that customers can use to build robust workflows, allowing developers to focus on higher-value business logic. AWS Step Functions provides state tracking and visualization to help operators understand the current status of complex event lifecycles.
Figure 3: High-level architecture of the video streaming orchestration workflow
An Amazon DynamoDB (4) table holds information about scheduled events, fed from an upstream scheduling system. This includes the event’s unique ID, along with event configuration data such as the event’s PreRoll timestamp and the technical details that define the resilience and redundancy requirements for the event.
Once a day, the ‘Fetch Events’ AWS Lambda function (3) is triggered by Amazon CloudWatch (1) and Amazon EventBridge (2). This Lambda function queries the DynamoDB table (4) for all the events in the next 24 hours and queues them into an Amazon SQS queue (5) for processing. The ‘Event Scheduler’ Lambda function (6) consumes events from the queue and creates a Step Functions execution in the Event state machine (7) for each event.
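The selection and queueing logic of the ‘Fetch Events’ step can be sketched in plain Python. This is a minimal illustration, not DAZN’s implementation: the attribute names `eventId` and `preRoll`, the choice of the PreRoll timestamp as the field that defines "the next 24 hours", and both function names are assumptions. The only external fact relied on is that an SQS batch send accepts at most 10 entries per call.

```python
import json
from datetime import datetime, timedelta, timezone


def events_due(events, now, window=timedelta(hours=24)):
    """Return the events whose PreRoll falls within [now, now + window)."""
    cutoff = now + window
    return [
        event for event in events
        # "preRoll" is a hypothetical attribute holding an ISO 8601 timestamp.
        if now <= datetime.fromisoformat(event["preRoll"]) < cutoff
    ]


def to_sqs_batches(events, batch_size=10):
    """Group events into entry lists sized for an SQS batch send (max 10)."""
    entries = [
        {"Id": str(index), "MessageBody": json.dumps(event)}
        for index, event in enumerate(events)
    ]
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    schedule = [
        {"eventId": "match-1", "preRoll": "2024-01-01T18:00:00+00:00"},
        {"eventId": "match-2", "preRoll": "2024-01-03T18:00:00+00:00"},
    ]
    for batch in to_sqs_batches(events_due(schedule, now)):
        print(batch)
```

In a real Lambda function, the in-memory `schedule` list would be replaced by a DynamoDB query and each batch would be handed to the SQS client; those calls are omitted so the sketch stays self-contained.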
The Step Functions state machine (7) in Figure 3 represents progress through the event infrastructure lifecycle, and can be used to get an “at a glance” understanding of the event’s deployment status. Each of the four steps represents another Step Functions state machine. These nested state machines encapsulate all the elements of a stage in the event’s lifecycle.
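One way to express a parent state machine whose steps are themselves state machines is the Step Functions nested-execution integration. The sketch below builds such a definition in Amazon States Language as a Python dictionary. The stage names and ARNs are placeholders invented for illustration; the source does not name the four stages.

```python
import json


def nested_stage(state_machine_arn, next_state=None):
    """Task state that runs a child state machine and waits for it to finish."""
    state = {
        "Type": "Task",
        # ".sync:2" makes the parent wait until the child execution completes.
        "Resource": "arn:aws:states:::states:startExecution.sync:2",
        "Parameters": {"StateMachineArn": state_machine_arn, "Input.$": "$"},
        # Discard the child's output so every stage sees the original input.
        "ResultPath": None,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_event_state_machine(stages):
    """Chain (name, arn) pairs into a linear parent state machine definition."""
    names = [name for name, _ in stages]
    states = {}
    for index, (name, arn) in enumerate(stages):
        following = names[index + 1] if index + 1 < len(names) else None
        states[name] = nested_stage(arn, following)
    return {"StartAt": names[0], "States": states}


if __name__ == "__main__":
    # Hypothetical stage names and ARNs, for illustration only.
    prefix = "arn:aws:states:eu-west-1:123456789012:stateMachine:"
    stages = [
        ("Schedule", prefix + "schedule"),
        ("Deploy", prefix + "deploy"),
        ("Monitor", prefix + "monitor"),
        ("TearDown", prefix + "teardown"),
    ]
    print(json.dumps(build_event_state_machine(stages), indent=2))
```

Keeping each stage in its own state machine means the parent stays a short, linear chain, which is what makes the "at a glance" status view readable.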
The diagram in Figure 4 depicts the details of each step in the main Step Function, and the encapsulated Step Functions.
Figure 4: Detailed definition of the main Step Function
The main Step Function receives JSON text as input and passes that input to each inner Step Function. The input includes the latest PreRoll, event start, and event end timestamps. The PreRoll timestamp is then used via a “Timestamp Wait” to function as a configurable delay ahead of infrastructure deployment: the execution waits until the broadcast event’s scheduled PreRoll time before resuming the orchestration. Once execution resumes, the state machine reads the event configuration from the DynamoDB table, ensuring it has the latest event requirements. This gives DAZN the flexibility to make changes up until the moment the infrastructure is deployed.
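The “Timestamp Wait” followed by a fresh configuration read can be sketched as two Amazon States Language states, built here as a Python dictionary. The input field names (`preRoll`, `eventId`), the state names, and the table name are assumptions; the `TimestampPath` field of the Wait state and the `dynamodb:getItem` service integration are standard Step Functions features.

```python
import json


def wait_then_read_config(table_name):
    """Wait until the event's PreRoll time, then re-read its configuration."""
    return {
        "StartAt": "WaitUntilPreRoll",
        "States": {
            "WaitUntilPreRoll": {
                "Type": "Wait",
                # Resume at the ISO 8601 timestamp found in the execution input.
                "TimestampPath": "$.preRoll",
                "Next": "ReadLatestConfig",
            },
            "ReadLatestConfig": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:getItem",
                "Parameters": {
                    "TableName": table_name,
                    # "eventId" is a hypothetical partition key name.
                    "Key": {"eventId": {"S.$": "$.eventId"}},
                },
                # Keep the original input and attach the item alongside it.
                "ResultPath": "$.latestConfig",
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(wait_then_read_config("ScheduledEvents"), indent=2))
```

Reading the table after the wait, rather than trusting the input captured when the execution was created, is what allows schedule changes made hours after queueing to still take effect.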
This simple state machine is highly configurable, and can be expanded to include the orchestration of third-party resources, the implementation of automated infrastructure tests, and calls to external systems to trigger downstream actions and notifications.
An AWS CodeBuild job is used to both build and dismantle the video streaming infrastructure. AWS CodeBuild runs build commands and potentially long-running stack deployments before handing back to the state machine flow.
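A state that hands off to CodeBuild and resumes only when the build finishes could look like the sketch below, which uses the synchronous `codebuild:startBuild.sync` integration. The project name, the `ACTION` and `EVENT_ID` environment variables, and the idea of switching between deploy and destroy through a variable are assumptions made for illustration.

```python
import json


def codebuild_task(project_name, action, next_state=None):
    """Task state that runs a CodeBuild project and waits for it to finish."""
    if action not in ("deploy", "destroy"):
        raise ValueError("action must be 'deploy' or 'destroy'")
    state = {
        "Type": "Task",
        # ".sync" pauses the state machine until the build completes.
        "Resource": "arn:aws:states:::codebuild:startBuild.sync",
        "Parameters": {
            "ProjectName": project_name,
            "EnvironmentVariablesOverride": [
                # Hypothetical variables read by the build commands.
                {"Name": "ACTION", "Type": "PLAINTEXT", "Value": action},
                {"Name": "EVENT_ID", "Type": "PLAINTEXT", "Value.$": "$.eventId"},
            ],
        },
        "ResultPath": "$.build",
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


if __name__ == "__main__":
    print(json.dumps(codebuild_task("event-infrastructure", "deploy"), indent=2))
```

Using one project for both directions keeps the build environment identical for deployment and tear-down, so only the command selected by the variable differs.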
A small number of these ‘deploy workflow’ step functions are used to define the various event deployment permutations and their corresponding tear-down processes. These define resilience requirements, such as the number of AWS Availability Zones the event is deployed across, and broadcast requirements, such as whether graphics and/or ad insertion infrastructure is required. This infrastructure can be defined via modular AWS CDK components, which can be easily reused between deployment permutations.
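The mapping from an event’s requirements to a set of reusable modules can be illustrated with a small pure function. Everything named here is hypothetical: the configuration keys (`availabilityZones`, `graphics`, `adInsertion`) and the module names are stand-ins for whatever the real CDK components are called.

```python
def select_modules(config):
    """Return the infrastructure modules an event needs, in deployment order."""
    zones = config.get("availabilityZones", 1)
    if zones < 1:
        raise ValueError("an event needs at least one Availability Zone")

    # One encoding pipeline per Availability Zone covers resilience.
    modules = [f"encoding-pipeline-az{index + 1}" for index in range(zones)]

    # Optional broadcast features are added only when the event asks for them.
    if config.get("graphics", False):
        modules.append("graphics")
    if config.get("adInsertion", False):
        modules.append("ad-insertion")
    return modules


if __name__ == "__main__":
    top_tier = {"availabilityZones": 2, "graphics": True, "adInsertion": True}
    lower_tier = {"availabilityZones": 1}
    print(select_modules(top_tier))
    print(select_modules(lower_tier))
```

Tear-down can reuse the same list in reverse order, so each deployment permutation gets a matching removal sequence without being defined twice.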
While an event is in progress, the state machine waits until the scheduled event’s end time. Because live sports often overrun, tear-down does not start automatically at that point and instead waits for confirmation from a human operator. Once the event end time has been reached, the state machine polls the DynamoDB table awaiting confirmation that the tear-down should continue. When confirmation arrives, the remainder of the execution workflow runs to destroy the resources for the event and signal completion to the parent state machine.
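The wait-then-confirm behaviour can be sketched as a small loop in Amazon States Language, again built as a Python dictionary. The attribute `tearDownConfirmed`, the input fields `eventEnd` and `eventId`, the state names, and the 60-second recheck interval are assumptions; the final state is a placeholder for the real tear-down steps.

```python
import json


def confirmation_loop(table_name, recheck_seconds=60):
    """Wait for the event end, then poll until an operator confirms tear-down."""
    flag = "$.confirmation.Item.tearDownConfirmed.BOOL"  # hypothetical attribute
    return {
        "StartAt": "WaitUntilEventEnd",
        "States": {
            "WaitUntilEventEnd": {
                "Type": "Wait",
                "TimestampPath": "$.eventEnd",
                "Next": "CheckConfirmation",
            },
            "CheckConfirmation": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:getItem",
                "Parameters": {
                    "TableName": table_name,
                    "Key": {"eventId": {"S.$": "$.eventId"}},
                },
                "ResultPath": "$.confirmation",
                "Next": "IsConfirmed",
            },
            "IsConfirmed": {
                "Type": "Choice",
                "Choices": [{
                    # Guard with IsPresent so a missing attribute means "not yet".
                    "And": [
                        {"Variable": flag, "IsPresent": True},
                        {"Variable": flag, "BooleanEquals": True},
                    ],
                    "Next": "TearDown",
                }],
                "Default": "PauseBeforeRecheck",
            },
            "PauseBeforeRecheck": {
                "Type": "Wait",
                "Seconds": recheck_seconds,
                "Next": "CheckConfirmation",
            },
            # Placeholder for the real tear-down steps.
            "TearDown": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(confirmation_loop("ScheduledEvents"), indent=2))
```

Because the loop is built from Wait and Task states, no compute runs between checks; the recheck interval trades operator responsiveness against the number of state transitions.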
Deployment state machines are easily adapted as business needs and technology stacks evolve, and provide a flexible way to define and orchestrate event resource lifecycle. Importantly, they also provide a means to define bespoke retry and recovery logic, allowing DAZN to build robust and reliable deployment workflows.
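Retry and recovery logic in Step Functions is declared on a state through its `Retry` and `Catch` fields. The sketch below attaches both to an arbitrary Task state and computes the wait before each retry from the retrier's `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. The specific values, the `$.error` result path, and the fallback state name are illustrative assumptions rather than DAZN’s settings.

```python
import copy


def with_error_handling(state, retrier, fallback_state):
    """Return a copy of a Task state with Retry and Catch rules attached."""
    handled = copy.deepcopy(state)
    handled["Retry"] = [retrier]
    handled["Catch"] = [{
        # After retries are exhausted, route any error to a recovery state.
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": fallback_state,
    }]
    return handled


def retry_delays(retrier):
    """Seconds waited before each retry: interval * backoff_rate ** attempt."""
    interval = retrier.get("IntervalSeconds", 1)
    attempts = retrier.get("MaxAttempts", 3)
    rate = retrier.get("BackoffRate", 2.0)
    return [interval * rate ** attempt for attempt in range(attempts)]


if __name__ == "__main__":
    retrier = {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 30,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }
    task = {"Type": "Task", "Resource": "arn:aws:states:::codebuild:startBuild.sync",
            "Next": "Validate"}
    print(with_error_handling(task, retrier, "NotifyOperations"))
    print(retry_delays(retrier))
```

Working out the delays up front is useful when sizing the PreRoll: the total retry time has to fit inside the window before the event starts.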
DAZN was able to build a working prototype of an event-based orchestration system with multiple workflow types in a matter of weeks, thanks to the simplicity of AWS Step Functions and the re-usability of AWS CDK. With a base workflow established, it was easy to visualize the steps required to orchestrate relevant media components.
The modularity of AWS CDK meant that it was easy to create and adapt variations of media workflows based on the tiering and redundancy requirements for an event. AWS CDK was integrated into CI/CD pipelines, which allows the service to deploy in a safe and reliable way across multiple environments.
With the flexibility of AWS Step Functions, DAZN could coordinate both synchronous and asynchronous tasks in a series of steps within a single state machine where cost is only incurred during state transitions. No expensive polling mechanism is required, and the system only moves on to the next step when it is ready.
DAZN’s creation of an event-based orchestrator for its media workflows provides the foundations for significant future growth. Crucially, the company’s adoption of on-demand cloud-based workflows strongly aligns DAZN’s costs with its revenue-generating activities.
Code samples available to bootstrap a similar architecture
A smart Cron Job App by leveraging Step Function ability to wait until a timestamp
Modular building system using AWS Step Functions
Serverless Workflows Collection
Live Streaming on AWS from SRT/Zixi/RIST/RTP FEC input sources
Live streaming from RTP/RTMP sources
Live Streaming content at scale using Amazon CloudFront