OneFootball Built a Data Lake in Days Using AWS Lake Formation to Serve 70 Million Fans
From its humble beginnings as one of the first 1,000 applications in the Apple App Store, OneFootball has grown to become one of the world’s most popular digital media platforms for soccer (football) enthusiasts. The company reaches 70 million fans a month with news, scores, statistics, livestreams, and highlights from soccer games around the world. To successfully serve those users, OneFootball’s various teams needed easy access to its backend databases to make informed business decisions and build and test machine learning models with the goal of improving the customer experience.
But to grant the teams’ request for data insights, the company’s lean analytics team of six staff had to run and manage various extract, transform, load (ETL) workloads for independent data silos across the company. With that complex, time-consuming process, the task of extracting data and converting it into timely, actionable information for sales and marketing teams, business analysts, news editors, and data scientists took 4–6 weeks. To better use data for the benefit of the company and soccer fans alike, OneFootball sought a nimbler solution on Amazon Web Services (AWS).
The company used AWS Lake Formation, a service launched in 2019, to easily set up a secure cloud-based data lake in days. Since integrating data from its backend databases into that data lake, OneFootball has simplified data ingestion into its centralized data lake and eliminated legacy ETL workloads. Now the task of receiving a request, extracting data, and delivering insights takes less than two days. The increased availability of data and the enabled self-service analytics have provided internal teams and end users with richer information in shorter amounts of time. The new infrastructure has also reduced technical work and has optimized staff productivity for the company of 220 people spread across five countries, enabling it to focus on core business.
“AWS Lake Formation enabled us to use Amazon S3 as a storage layer on top of a compute layer and seamlessly integrate it into our existing infrastructure.”
Stephan Durry, Head of Data and Insights, OneFootball
Feeding the World’s Appetite for Soccer
Founded in 2008, OneFootball is a media platform for soccer fans. Every month, it funnels more than 180,000 articles from 3,500 active content providers (independent content creators, clubs, federations, players, and broadcasters) to its users through its website and native iPhone and Android apps, which operate in 12 languages. The company first used AWS in 2014 to improve the scalability, reliability, and efficiency of its workloads as its customer base grew dramatically. Over the years, OneFootball has transitioned its entire platform to AWS.
To make backend data more available to stakeholders, OneFootball decided to build a data lake. The company already used Amazon Redshift, the most popular and fastest cloud data warehouse available. But to quickly get up and running, it decided to create a data extraction system on its own, using existing frameworks. All backend data exposed through APIs was extracted through scripts that would comb through data and drop it into Amazon Redshift every night. The OneFootball team decided to manage ETL frameworks individually using different blueprints. This ultimately increased technical debt and the amount of maintenance the team had to manage. “It was a mess,” says Stephan Durry, head of data and insights at OneFootball. “Alerting and monitoring were handled differently for each service: sometimes extractions would fail without us noticing immediately, causing missing data for our business users.” That’s when the team turned to AWS Lake Formation.
“It’s not about just extracting the data,” explains Rodrigo Del Monte, data engineer for OneFootball. “You need to compress and partition the data, which is where AWS Lake Formation shines.” Using the prefabricated blueprints in AWS Lake Formation, OneFootball could put the data in the right shape to be consumed by Amazon Redshift with very low overhead. Then various company stakeholders could ingest the information they need on the fly and handpick the tables they want to replicate in the data lake, making data more accessible throughout the company and giving OneFootball’s data engineers more time to innovate.
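To make the point about compressing and partitioning concrete, the following is a minimal Python sketch of the general idea, not OneFootball's pipeline or what an AWS Lake Formation blueprint does internally. The function name, the `created_at` field, the `articles` table, and the `dt=YYYY-MM-DD` key layout are all assumptions chosen for illustration. It groups records by day and gzips each group, so a query engine could later read only the days it needs.

```python
import gzip
import json
from collections import defaultdict


def partition_and_compress(records, table, date_field="created_at"):
    """Group records by day and gzip each group as newline-delimited JSON.

    Returns a dict mapping a Hive-style object key, for example
    'articles/dt=2021-01-05/part-0000.json.gz', to the compressed bytes.
    The key layout is a common convention, assumed here for illustration.
    """
    groups = defaultdict(list)
    for record in records:
        # Take the 'YYYY-MM-DD' prefix of an ISO 8601 timestamp string.
        day = record[date_field][:10]
        groups[day].append(record)

    objects = {}
    for day, rows in sorted(groups.items()):
        body = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
        key = f"{table}/dt={day}/part-0000.json.gz"
        objects[key] = gzip.compress(body.encode("utf-8"))
    return objects


# Hypothetical sample rows; real backend data would look different.
sample = [
    {"id": 1, "created_at": "2021-01-05T09:30:00Z", "title": "Match report"},
    {"id": 2, "created_at": "2021-01-05T18:00:00Z", "title": "Transfer news"},
    {"id": 3, "created_at": "2021-01-06T07:15:00Z", "title": "Preview"},
]
objects = partition_and_compress(sample, table="articles")
```

Each key in `objects` could then be uploaded as one object in Amazon S3. A production pipeline would more likely write a columnar format such as Parquet, which this stdlib-only sketch does not attempt.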
Seamless Integration for Self-Service Analytics Using a Data Lake on AWS
OneFootball’s data lake comprises all sets of backend databases needed to perform analytics on Amazon Simple Storage Service (Amazon S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Every day, OneFootball uses AWS Lake Formation to extract data from the data lake and bring it to the data-insights team site. The data is loaded into Amazon S3, and then Amazon Redshift can run queries against petabytes of data in Amazon S3 using Amazon Redshift Spectrum without having to load or transform any data. “AWS Lake Formation enabled us to use Amazon S3 as a storage layer on top of a compute layer right out of the box and seamlessly integrate it into our existing infrastructure,” says Durry. “Building something like this ourselves would have cost us time and caused headaches. If the team needs to ingest new data, instead of creating a complex project, we set up a blueprint and schedule that data to be available daily in the data lake.”
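As a rough illustration of how Amazon Redshift Spectrum can query data in place, the sketch below builds two SQL statements as Python strings: one that maps a data catalog database to an external schema, and one that queries a table through that schema. The schema, database, table, and role names are hypothetical, as is the `dt` partition column; none of them come from OneFootball's setup.

```python
def spectrum_statements(schema, catalog_database, iam_role_arn, table, day):
    """Build SQL for exposing a data catalog database to Amazon Redshift
    and querying one day of an external table through Redshift Spectrum.

    The arguments are interpolated directly, so they must be trusted
    constants, never user input.
    """
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_database}' "
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # Filtering on the partition column (assumed to be named dt) lets
    # Spectrum skip objects that belong to other days.
    query = f"SELECT COUNT(*) FROM {schema}.{table} WHERE dt = '{day}';"
    return create_schema, query


# All identifiers below are placeholders for illustration only.
create_schema, query = spectrum_statements(
    schema="lake",
    catalog_database="backend_lake",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    table="articles",
    day="2021-01-05",
)
```

The statements would be run from any SQL client connected to the Amazon Redshift cluster. The data itself stays in Amazon S3, and only the query results come back through the warehouse.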
In the next phase, OneFootball uses an extract, load, transform (ELT) system to refresh analytics data daily or to create datasets used to build machine learning models. As an interface for its business users, the team maintains Metabase, an open-source business insights tool that lets users consume all the data that was stored in Amazon S3 by AWS Lake Formation.
Since implementing AWS Lake Formation, OneFootball has cut lead time for loading data from operational databases to the centralized data lake to 3–5 days. The coverage of relevant backend services as part of its data lake has gone up from 30 to 60 percent. This ultimately helped the team see a substantial growth in weekly active analytics users—the team’s internal key performance indicator—increasing usage of the analytics platform by 40 percent.
The capability for self-service analytics lets internal stakeholders consume analytics on demand and more quickly iterate and curate datasets for reporting and performance measurement. This drastically increased the amount of time data analysts could spend on explorative analysis and mining insights instead of running analytics queries; the time needed for the process of requesting and receiving data insights was cut from an average of 4–6 weeks to a maximum of 2 days. “Ultimately, we’re a small data team servicing over 220 people across OneFootball, but now we can spend more time understanding business problems rather than maintaining different types of database extractions,” says Durry. “Seeing more and more people across the organization make use of analytics on a daily basis is a great achievement. Having all relevant data sources reliably integrated was a prerequisite.”
Further Enriching Data Analytics Using More AWS Services
OneFootball plans to boost its data-analytics system using Amazon Kinesis Data Streams, a massively scalable and durable real-time data streaming service. “Using Amazon Kinesis Data Streams, we can load data into a data lake for analysts and machine learning models or we can have backend applications consume data in near real-time rather than waiting for daily ETL jobs to run,” explains Del Monte. “And the time to market is much faster.”
Currently, OneFootball is working on streaming events into its data lake infrastructure so that it can offer data in near real-time. Amazon Kinesis Data Streams loads data into Amazon Elasticsearch Service, so end users can find and see the information they are interested in almost immediately.
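For readers unfamiliar with streaming ingestion, here is a small Python sketch of what a single event written to Amazon Kinesis Data Streams might look like. The stream name, the event fields, and the choice of `user_id` as partition key are assumptions for illustration and are not taken from OneFootball's implementation. The helper only builds the arguments, so it runs without AWS credentials.

```python
import json


def build_put_record_args(stream_name, event):
    """Return keyword arguments for a Kinesis Data Streams PutRecord call.

    The partition key decides which shard receives the record; using the
    (hypothetical) user_id keeps one user's events in order on one shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(event["user_id"]),
    }


# Placeholder stream name and event, for illustration only.
args = build_put_record_args("app-events", {"user_id": 42, "type": "article_view"})

# With boto3 installed and AWS credentials configured, the record could be
# sent with: boto3.client("kinesis").put_record(**args)
```

Consumers can read such records within moments of their arrival, which is what makes near real-time use cases possible without waiting for a daily batch job.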
Using AWS Lake Formation, OneFootball built a data lake and data analytics system that has proved to be a huge score for the company. Teams can use self-service analytics to quickly drive data insights and then focus on turning those insights into smart business decisions. “Everything is nicely managed now in terms of how many queries are run against our data lake,” says Durry. “By opening up the data lake and data warehouse, we put destiny into people’s hands.”
OneFootball is the world’s most popular digital media platform for soccer enthusiasts, reaching 85 million monthly fans in 15 languages with 24/7 news, livestreams, scores, stats, and highlights on more than 200 leagues and competitions worldwide. Following the acquisition of Dugout in December 2020, OneFootball welcomed Arsenal, Barcelona, Bayern Munich, Chelsea, Juventus, Liverpool, Manchester City, Paris Saint-Germain, Real Madrid and Olympique de Marseille as new shareholders.
Benefits of AWS
- Increased data coverage from relevant backend databases from 30% to 60%
- Increased usage of analytics platform by 40% for weekly active end users
- Cut time needed to request and receive data from 4–6 weeks to two days
- Reduced lead time for loading data from operational databases to the data lake to 3–5 days
- Set up a data lake in days versus months
- Enabled staff to more quickly iterate and curate datasets for explorative work
AWS Services Used
AWS Lake Formation
AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. Creating a data lake with Lake Formation is as simple as defining data sources and what data access and security policies you want to apply.
Amazon Redshift
Amazon Redshift is the world’s fastest cloud data warehouse and gets faster every year. Redshift powers analytical workloads for Fortune 500 companies, startups, and everything in between.
Amazon Simple Storage Service (Amazon S3)
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.
Amazon Kinesis Data Streams (KDS)
Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.