Kayo Sports builds real-time view of the customer on AWS
Kayo Sports is Australia’s dedicated multi-sports streaming service offering more than 50 live and on-demand sports streamed instantly. Kayo Sports was looking to create a unified database integrating internal and external sources of data, including customer behavior, preferences, profile information, and other interactions to provide a better experience across customer touchpoints. The company decided to build a cloud-native platform on AWS to collect, process, and manage customer engagement data in real time. This unified data platform has become a hub for machine learning and enables departments to manage their own reporting and analytics.
In this interview with Sajid Moinuddin and Narendra Bharani Keerthiseelan of Kayo Sports, we learned what the team built, how they built it, and why they chose to build their single-customer-view platform on AWS.
Sajid Moinuddin, Head of Platform Development & Architecture Technology at Kayo Sports, said:
“AWS’ analytics stack with native offerings like EKS, EMR, Athena, Glue, Redshift, Kinesis, and S3 helped us focus solely on the business problem while shielding us from the trivial integration and performance challenges that are typical of any on-premises Big Data ecosystem. Many operational requirements came ready out of the box with AWS’ managed services, so our small engineering team was able to focus on the business use cases without incurring much administrative overhead, and to bring those use cases into our production environment in three months.”
Q&A with Narendra Bharani Keerthiseelan, Principal Data Engineer at Kayo Sports
Tell us a bit about the real-time streaming analytics and data lake you built on AWS. Broadly: What was your approach, how did you build it, and what are you using it for?
At Kayo Sports, we built a Customer Data Platform that provides a unified database for all customer behavior, preferences, profiles, and other interaction data, from internal and external sources. The platform stitches profiles across data sets and integrates with different vendors, enriching the data and providing a better customer experience across all channels and customer touchpoints.
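Profile stitching can be implemented in several ways, and the interview does not say which approach Kayo uses. As one illustration, a common technique is to treat any shared identifier (the same email address, device ID, or vendor ID) as a link between records, then group linked records into connected components with a union-find structure. The Python sketch that follows assumes hypothetical record fields (`email`, `device_id`, `vendor_id`) and a hypothetical `stitch_profiles` helper; a production system would also need rules for conflicting or low-confidence identifiers.

```python
from collections import defaultdict


def stitch_profiles(records: list[dict[str, str]]) -> list[set[int]]:
    """Group record indices into profiles (hypothetical helper).

    Records that share any identifier value, for example the same email
    or device ID, end up in the same profile, transitively.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first record that carried each (field, value) identifier.
    first_seen: dict[tuple[str, str], int] = {}
    for idx, record in enumerate(records):
        for field, value in record.items():
            if not value:
                continue  # skip missing identifiers
            ident = (field, value)
            if ident in first_seen:
                union(first_seen[ident], idx)
            else:
                first_seen[ident] = idx

    # Collect record indices under their union-find root.
    groups: dict[int, set[int]] = defaultdict(set)
    for idx in range(len(records)):
        groups[find(idx)].add(idx)
    return list(groups.values())


# Hypothetical records from different sources.
records = [
    {"email": "a@example.com", "device_id": "d1"},  # 0: app sign-up
    {"device_id": "d1", "vendor_id": "v9"},         # 1: partner feed
    {"email": "b@example.com"},                      # 2: unrelated customer
    {"vendor_id": "v9"},                             # 3: another vendor event
]
profiles = stitch_profiles(records)
print(profiles)  # prints [{0, 1, 3}, {2}]
```

Records 0 and 3 never share an identifier directly, but both link to record 1, so they are stitched into the same profile.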
Our approach was to build a cloud-native platform that collects, processes, and manages customer data and engagements across vendors and partners in real time. This unified data platform was designed to be a data hub for machine learning and integrations, and to serve the reporting and analytics needs of every department.
Our strategy was to identify the key AWS managed services around which we could build our architecture, so that we could focus solely on delivering business value. This also reduced the time and money we would otherwise have spent integrating vendor-based solutions. On AWS, we achieved this very quickly by choosing S3 as the storage layer for the data lake. The other AWS services integrate seamlessly with S3, which let us build our key components on top of it: EKS and EMR for compute, Glue for the data catalogue, Athena for data exploration, and Redshift and Redshift Spectrum for the data warehouse.
What was the key problem you were looking to solve by implementing the streaming analytics and data lake?
Our primary objective in building the data lake was to create a platform for data-driven decision-making at Kayo Sports, with a key focus on quickly deriving value from our many data sources. Because this was a greenfield solution, we wanted to avoid any irreversible architectural decisions and to let the platform grow organically through repeated build-measure-learn cycles.
Some of the key problems addressed by the platform include:
- Eliminating data silos
- Stitching data together on the data platform instead of through system-to-system integrations
- Democratizing access to data via a single, unified view across the organization
- Storing high volumes of raw and transformed data in the data lake at low cost
- Accommodating high-velocity data
- Securing data with governance
- Providing a centralized catalogue
- Enabling advanced analytics
What have AWS services allowed you to accomplish that you couldn’t have done without them? Put another way, why did you decide to build your data lake and streaming analytics on AWS?
Kayo was expected to grow rapidly because of the wide variety of partner and affiliate integrations it launched with. We therefore wanted a platform that could keep pace with that growth without significant spikes in infrastructure budget or engineering effort as we faced the four V’s of Big Data (volume, variety, velocity, and veracity).
The AWS analytics stack, with native offerings like EKS, EMR, Athena, Glue, Redshift, Kinesis, and S3, let us focus solely on the business problem while shielding us from the trivial integration and performance challenges that are typical of any on-premises Big Data ecosystem. Moreover, by combining AWS’ full-stack support for infrastructure as code with the GitOps process we adopted across our engineering teams, we were able to build a scalable, agile platform.
What are the top benefits Kayo Sports has realized by building its data lake and streaming analytics platform on AWS?
Kayo is the first streaming service of its kind in Australia, and we had a very aggressive product launch roadmap. Many of the data sources and analytical use cases were discovered only in the final months before launch. AWS services gave us the stack we needed to bring those use cases into production within three months with a small engineering team. In production today, we run 7,000 data pipelines daily, with 1,500 Spark jobs and 30,000 Python jobs replenishing our data lake in near real time from 125 different sources.
To make the data lake production-ready, we also had to address many operational aspects of the platform, such as security, alerting, monitoring, and availability. Because we used mostly AWS managed services, many of these operational requirements came ready out of the box, so the team was able to focus solely on the business use cases without incurring much administrative overhead.
With EMR and Redshift Spectrum, we can scale up capacity on demand by treating our compute nodes like “cattle”: interchangeable resources that are added, removed, or replaced as needed rather than individually maintained servers. Combined with the dependable SLAs of the AWS services we use, this allowed us to achieve 99.9% platform availability. Similarly, for security and compliance, we followed the AWS shared responsibility model and built a federated security infrastructure for Kayo that integrates seamlessly with the analytics product suite.
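To put the 99.9% figure in perspective: at 99.9% availability, downtime may be at most 0.1% of the measurement period. The short Python sketch that follows computes that budget, assuming a 365-day year and a 30-day month. `downtime_budget` is a hypothetical helper, and the calculation says nothing about how Kayo actually measures availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60       # 525,600 minutes in a non-leap year
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget(availability: float, period_minutes: float) -> float:
    """Return the maximum minutes of downtime allowed over a period
    at the given availability (for example 0.999 for 99.9%)."""
    return (1.0 - availability) * period_minutes


yearly_hours = downtime_budget(0.999, MINUTES_PER_YEAR) / 60
monthly_minutes = downtime_budget(0.999, MINUTES_PER_30_DAY_MONTH)
print(f"Yearly: {yearly_hours:.2f} h, monthly: {monthly_minutes:.1f} min")
# prints "Yearly: 8.76 h, monthly: 43.2 min"
```

In other words, a 99.9% target leaves roughly 8.76 hours of downtime per year, or about 43 minutes per 30-day month.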