Bundesliga Match Fact Win Probability: Quantifying the effect of in-game events on winning chances using machine learning on AWS
Ten years from now, the technological fitness of clubs will be a key contributor towards their success. Today we’re already witnessing the potential of technology to revolutionize the understanding of football. xGoals quantifies and allows comparison of goal scoring potential of any shooting situation, while xThreat and EPV models predict the value of any in-game moment. Ultimately, these and other advanced statistics serve one purpose: to improve the understanding of who will win and why. Enter the new Bundesliga Match Fact: Win Probability.
In Bayern’s second match against Bochum last season, the tables turned unexpectedly. Early in the match, Lewandowski scores 1:0 after just 9 minutes. The “Grey Mouse” of the league is instantly reminded of their 7:0 disaster when facing Bayern for the first time that season. But not this time: Christopher Antwi-Adjei scores his first goal for the club just 5 minutes later. After conceding a penalty goal in the 38th minute, the team from Monaco di Bavaria seems paralyzed and things begin to erupt: Gamboa nutmegs Coman and finishes with an absolute corker of a goal, and Holtmann makes it 4:1 close to halftime with a dipper from the left. Bayern hadn’t conceded this many goals in the first half since 1975, and was barely able to walk away with a 4:2 result. Who could have guessed that? Both teams played without their first keepers, which for Bayern meant missing out on their captain Manuel Neuer. Could his presence have saved them from this unexpected result?
Similarly, Cologne pulled off two extraordinary zingers in the 2020/2021 season. When they faced Dortmund, they had gone 18 matches without a win, while BVB’s Haaland was providing a master class in scoring goals that season (23 in 22 matches). The role of the favorite was clear, yet Cologne took an early lead with just 9 minutes on the clock. At the beginning of the second half, Skhiri scored a carbon copy of his first goal: 0:2. Dortmund subbed in attacking strength, created big chances, and scored 1:2. Of all players, Haaland missed a sitter 5 minutes into stoppage time and crowned Cologne with the first 3 points in Dortmund after almost 30 years.
Later in that season, Cologne, last in the home table, surprised RB Leipzig, who had all the motivation to close in on the championship leader Bayern. Leipzig pressured the “Billy Goats” with a team season record of 13 shots at goal in the first half, increasing their already high chances of a win. Ironically, Cologne scored the 1:0 with their first shot at goal in minute 46. After the “Red Bulls” scored a well-deserved equalizer, they slept on a throw-in just 80 seconds later, leading to Jonas Hector scoring for Cologne again. Just like Dortmund, Leipzig now put all energy into offense, but the best they managed to achieve was hitting the post in stoppage time.
For all of these matches, experts and novices alike would have wrongly guessed the winner, even well into the match. But what are the events that led to these surprising in-game swings of win probability? At what minute did the underdog’s chance of winning overtake the favorite’s as they ran out of time? Bundesliga and AWS have worked together to compute and illustrate the live development of winning chances throughout matches, enabling fans to see key moments of probability swings. The result is the new machine learning (ML)-powered Bundesliga Match Fact: Win Probability.
How does it work?
The new Bundesliga Match Fact Win Probability was developed by building ML models that analyzed over 1,000 historical games. The live model takes the pre-match estimates and adjusts them according to the match proceedings based on features that affect the outcome, including the following:
- Red cards
- Time passed
- Goal scoring chances created
- Set-piece situations
The live model is trained using a neural network architecture and uses a Poisson distribution approach to predict a goals-per-minute-rate r for each team, as described in the following equation:
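In textbook form, and assuming the rate $r$ stays constant over a window of $t$ minutes, a Poisson model assigns the following probability to a team scoring exactly $k$ goals in that window. The symbols $k$ and $t$ are introduced here for illustration and are not taken from the model itself:

```latex
\[
P(\text{$k$ goals in $t$ minutes}) = \frac{(r\,t)^{k}\, e^{-r t}}{k!}, \qquad k = 0, 1, 2, \dots
\]
```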
Those rates can be viewed as an estimation of a team’s strength and are computed using a series of dense layers based on the inputs. Based on these rates and the difference between the opponents, the probabilities of a win and a draw are computed in real time.
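To make the step from rates to probabilities concrete, the following is a minimal sketch, not the production model. It assumes that both teams score independently, that each rate stays constant for the rest of the match, and that goal counts can be truncated at a maximum. The function names, the `max_goals` cutoff, and the example rates are all hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(rate_home, rate_away, goal_diff, minutes_left, max_goals=15):
    """Return (home win, draw, away win) probabilities.

    rate_home, rate_away: assumed goals-per-minute rates for each team.
    goal_diff: current home goals minus away goals.
    minutes_left: remaining playtime in minutes.
    max_goals: truncation point for additional goals (an assumption of this sketch).
    """
    mean_home = rate_home * minutes_left
    mean_away = rate_away * minutes_left
    p_home = [poisson_pmf(k, mean_home) for k in range(max_goals + 1)]
    p_away = [poisson_pmf(k, mean_away) for k in range(max_goals + 1)]

    win = draw = loss = 0.0
    # Enumerate every combination of additional goals for both teams.
    for h, ph in enumerate(p_home):
        for a, pa in enumerate(p_away):
            final_diff = goal_diff + h - a
            if final_diff > 0:
                win += ph * pa
            elif final_diff == 0:
                draw += ph * pa
            else:
                loss += ph * pa

    # Renormalize to compensate for the truncated tail.
    total = win + draw + loss
    return win / total, draw / total, loss / total


if __name__ == "__main__":
    # Hypothetical rates: level score at halftime, home side slightly stronger.
    print(outcome_probabilities(0.02, 0.01, goal_diff=0, minutes_left=45))
```

With no time left, the function returns certainty for whichever side is ahead, which matches the intuition that win probability converges to the final result as the clock runs down.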
The input to the model is a 3-tuple of input features, current goal difference, and remaining playtime in minutes.
The first component of the three input dimensions consists of a feature set that describes the current game action in real time for both teams in performance metrics. These include various aggregated team-based xG values, with particular attention to the shots taken in the last 15 minutes before the prediction. We also process red cards, penalties, corner kicks, and the number of dangerous free kicks. A dangerous free kick is classified as a free kick closer than 25 m to the opponent’s goal. During the development of the model, besides the influence of the existing Bundesliga Match Fact xGoals, we also evaluated the impact of Bundesliga Match Fact Skill in the model. This means that the model reacts to the substitution of top players, that is, players with badges in the skills Finisher, Initiator, or Ball winner.
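As an illustration of how such inputs could be assembled, here is a small sketch. Only the 25 m threshold, the 15-minute window, and the three-part input come from the description above. The `Shot` type, the function names, the feature dictionary, and the 90-minute match length are assumptions made for this example.

```python
from dataclasses import dataclass

DANGEROUS_FREE_KICK_MAX_DISTANCE_M = 25.0  # threshold stated in the text
RECENT_WINDOW_MINUTES = 15.0               # window stated in the text


@dataclass(frozen=True)
class Shot:
    """Hypothetical shot record: match minute and its xG value."""
    minute: float
    xg: float


def is_dangerous_free_kick(distance_to_goal_m: float) -> bool:
    """A free kick counts as dangerous when it is closer than 25 m to the goal."""
    return distance_to_goal_m < DANGEROUS_FREE_KICK_MAX_DISTANCE_M


def recent_xg(shots, current_minute, window=RECENT_WINDOW_MINUTES):
    """Sum the xG of shots taken within the last `window` minutes."""
    return sum(
        s.xg for s in shots
        if current_minute - window <= s.minute <= current_minute
    )


def build_model_input(features, goals_for, goals_against, current_minute,
                      match_length=90.0):
    """Assemble the 3-tuple: (features, goal difference, remaining minutes).

    match_length ignores stoppage time, which is a simplification of this sketch.
    """
    remaining = max(match_length - current_minute, 0.0)
    return (features, goals_for - goals_against, remaining)


if __name__ == "__main__":
    shots = [Shot(10, 0.1), Shot(40, 0.3), Shot(50, 0.2)]
    features = {"recent_xg": recent_xg(shots, current_minute=52)}
    print(build_model_input(features, goals_for=2, goals_against=1, current_minute=60))
```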
Win Probability example
Let’s look at a match from the current season (2022/2023). The following graph shows the win probability for the Bayern Munich and Stuttgart match from matchday 6.
The pre-match model calculated a win probability of 67% for Bayern, 14% for Stuttgart, and 19% for a draw. When we look at the course of the match, we see a large impact of the goals scored in minutes 36, 57, and 60. Until the first minute of stoppage time, the score was 2:1 for Bayern. Only a successful penalty shot by S. Guirassy in minute 90+2 secured a draw. The Win Probability Live Model therefore corrected the draw forecast from 5% to over 90%. The result is an unexpected late swing, with Bayern’s win probability decreasing from 90% to 8% in minute 90+2. The graph is representative of the swing in atmosphere in the Allianz Arena that day.
How is it implemented?
Win Probability consumes event data from an ongoing match (goal events, fouls, red cards, and more) as well as data produced by other Match Facts, such as xGoals. For real-time updates of probabilities, we use Amazon Managed Streaming for Apache Kafka (Amazon MSK) as a central data streaming and messaging solution. This way, event data, positions data, and outputs of different Bundesliga Match Facts can be communicated between containers in real time.
The following diagram illustrates the end-to-end workflow for Win Probability.
Gathered match-related data is ingested through an external provider (DataHub). Metadata of the match is ingested and processed in an AWS Lambda function. Positions and events data are ingested through an AWS Fargate container (MatchLink). All ingested data is then published for consumption in respective MSK topics. The heart of the Win Probability Match Fact sits in a dedicated Fargate container (BMF WinProbability), which runs for the duration of the respective match and consumes all required data obtained through Amazon MSK. The ML models (live and pre-match) are deployed on Amazon SageMaker Serverless Inference endpoints. Serverless endpoints automatically launch compute resources and scale those compute resources depending on incoming traffic, eliminating the need to choose instance types or manage scaling policies. With this pay-per-use model, Serverless Inference is ideal for workloads that have idle periods between traffic spurts. When there are no Bundesliga matches, there is no cost for idle resources.
Shortly before kick-off, we generate our initial set of features and calculate the pre-match win probabilities by calling the PreMatch SageMaker endpoint. With those PreMatch probabilities, we then initialize the live model, which reacts in real time to relevant in-game events and is continuously queried to receive current win probabilities.
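Querying an endpoint like this could look roughly as follows. This is a sketch under stated assumptions: the JSON request and response schemas, the endpoint name, and the helper name `query_win_probability` are hypothetical. The only real API used is the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client, which is passed in so the function can also be exercised with a stand-in client.

```python
import json


def query_win_probability(runtime_client, endpoint_name, payload):
    """Send a JSON payload to a SageMaker endpoint and return the parsed reply.

    runtime_client: a boto3 "sagemaker-runtime" client (or a compatible stand-in).
    endpoint_name: name of the deployed endpoint (hypothetical in this sketch).
    payload: JSON-serializable model input; the schema is an assumption.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully and parse the JSON document.
    return json.loads(response["Body"].read())
```

In a real deployment, the client would be created once with `boto3.client("sagemaker-runtime")` and reused for every query during the match, so that each in-game event only costs one endpoint invocation.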
The calculated probabilities are then sent back to DataHub to be provided to other MatchFacts consumers. Probabilities are also sent to the MSK cluster to a dedicated topic, to be consumed by other Bundesliga Match Facts. A Lambda function consumes all probabilities from the respective Kafka topic, and writes them to an Amazon Aurora database. This data is then used for interactive near-real-time visualizations using Amazon QuickSight.
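The Lambda step could be sketched as below. Lambda delivers Amazon MSK records grouped by topic and partition under `records`, with each `value` base64-encoded. Everything else here is an assumption: that messages are JSON documents, the field names in the example, and the `write_rows` callback, which stands in for the actual Aurora write.

```python
import base64
import json


def decode_msk_records(event):
    """Decode all Kafka record values in a Lambda MSK event into Python objects.

    Assumes every record value is a base64-encoded JSON document.
    """
    rows = []
    for records in event.get("records", {}).values():
        for record in records:
            rows.append(json.loads(base64.b64decode(record["value"])))
    return rows


def handler(event, context, write_rows=print):
    """Lambda entry point: decode the probabilities and hand them to a writer.

    write_rows is a placeholder for the database insert (hypothetical).
    """
    rows = decode_msk_records(event)
    write_rows(rows)
    return {"written": len(rows)}
```

Keeping the decoding separate from the write makes the handler easy to test locally with a hand-built event before it is connected to a database.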
In this post, we demonstrated how the new Bundesliga Match Fact Win Probability shows the impact of in-game events on the chances of a team winning or losing a match. To do so, we build on and combine previously published Bundesliga Match Facts in real time. This allows commentators and fans to uncover moments of probability swings and more during live matches.
The new Bundesliga Match Fact is the result of an in-depth analysis by the Bundesliga’s football experts and AWS data scientists. Win probabilities are shown in the live ticker of the respective matches in the official Bundesliga app. During a broadcast, win probabilities are provided to commentators through the data story finder and visually shown to fans at key moments, such as when the underdog takes the lead and is now most likely to win the game.
We hope that you enjoy this brand-new Bundesliga Match Fact and that it provides you with new insights into the game. To learn more about the partnership between AWS and Bundesliga, visit Bundesliga on AWS!
We’re excited to learn what patterns you will uncover. Share your insights with us: @AWScloud on Twitter, with the hashtag #BundesligaMatchFacts.
About the Authors
Simon Rolfes played 288 Bundesliga games as a central midfielder, scored 41 goals, and won 26 caps for Germany. Currently, Rolfes serves as Managing Director Sport at Bayer 04 Leverkusen, where he oversees and develops the pro player roster, the scouting department, and the club’s youth development. Simon also writes weekly columns on Bundesliga.com about the latest Bundesliga Match Facts powered by AWS. There he offers his expertise as a former player, captain, and TV analyst to highlight the impact of advanced statistics and machine learning on the world of football.
Tareq Haschemi is a consultant within AWS Professional Services. His skills and areas of expertise include application development, data science, machine learning, and big data. He supports customers in developing data-driven applications within the cloud. Prior to joining AWS, he was also a consultant in various industries such as aviation and telecommunications. He is passionate about enabling customers on their data/AI journey to the cloud.
Javier Poveda-Panter is a Data Scientist for EMEA sports customers within the AWS Professional Services team. He enables customers in the area of spectator sports to innovate and capitalize on their data, delivering high-quality user and fan experiences through machine learning and data science. He follows his passion for a broad range of sports, music and AI in his spare time.
Luuk Figdor is a Sports Technology Advisor in the AWS Professional Services team. He works with players, clubs, leagues, and media companies such as the Bundesliga and Formula 1 to help them tell stories with data using machine learning. In his spare time, he likes to learn all about the mind and the intersection between psychology, economics, and AI.
Gabriel Zylka is a Machine Learning Engineer within AWS Professional Services. He works closely with customers to accelerate their cloud adoption journey. Specialized in the MLOps domain, he focuses on productionizing machine learning workloads by automating end-to-end machine learning lifecycles and helping achieve desired business outcomes.
Jakub Michalczyk is a Data Scientist at Sportec Solutions AG. Several years ago, he chose math studies over playing football, as he came to the conclusion that he wasn’t good enough at the latter. Now he combines both these passions in his professional career by applying machine learning methods to gain a better insight into this beautiful game. In his spare time, he still enjoys playing seven-a-side football, watching crime movies, and listening to film music.