AWS Machine Learning Blog

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

The Bundesliga is renowned for its exceptional goalkeepers, which arguably makes it the most prominent of Europe's top five leagues in this regard. Apart from the widely recognized Manuel Neuer, the Bundesliga has produced remarkable goalkeepers who have excelled in other leagues, such as Marc-André ter Stegen, who is a superstar at Barcelona. With such steep competition, people are split on the question of who the most remarkable goalkeeper in the German top league is. As Yann Sommer demonstrated with his stunning, Bundesliga-record 19 saves against Bayern Munich last summer, which helped his former club Mönchengladbach hold the Bavarians to a draw, this league's keepers are fiercely vying for the top spot.

We have witnessed time and time again that a keeper can make or break a win, yet it remains challenging to objectively quantify their effect on a team's success. Who is the most efficient goalkeeper in the Bundesliga? Who prevents more goals than the average? How can we even compare keepers with different playing styles? It's about time to shed some light on our guardians' achievements. Enter the brand-new Bundesliga Match Fact: Keeper Efficiency.

When talking about the best of the best shot-stoppers in the Bundesliga, the list is long and rarely complete. In recent years, one name has been especially dominant: Kevin Trapp. For years, Trapp has been regarded as one of the finest goalies in the Bundesliga. Not only was he widely considered the top-rated goalkeeper in the league during the 2021/22 season, but he also held that title back in 2018/19 when Eintracht Frankfurt reached the Europa League semifinals. Similar to Yann Sommer, Trapp often delivered his best performances on nights when his team was up against the Bavarians.

Many football enthusiasts would argue that Yann Sommer is the best keeper in Germany's top league, despite also being the smallest. Sommer is highly skilled with the ball at his feet and has demonstrated his ability to produce jaw-dropping saves on par with the world elite. Although Sommer can genuinely match any goalkeeper's level on his best days, those days have not come frequently enough in the past. He has improved his consistency over time, but he still makes occasional errors that can frustrate fans. Sommer has been Switzerland's well-deserved #1 since 2016; time will tell whether he pushes Manuel Neuer off the throne in Munich.

And let's not forget about Gregor Kobel. Since joining Borussia Dortmund, Kobel, who previously played for Hoffenheim, Augsburg, and VfB Stuttgart, has been a remarkable signing for the club. Although Jude Bellingham has possibly overtaken him as the team's highest valued player, there is still a valid argument that Kobel is the most important player for Dortmund. At only 25 years old, Kobel is among the most promising young goalkeepers globally, with the ability to make quality saves while facing a significant number of shots in the Bundesliga. The pressure to perform at Dortmund is immense, second only to their fierce rivals Bayern Munich (at the time of this writing), and Kobel doesn't have the same defensive protection as any Bayern keeper would. So far in 2022/23, he has kept a clean sheet in almost every other match for Die Schwarzgelben, despite the team's inconsistency and often poor midfield performance.

As these examples show, the ways in which keepers shine and compete are manifold. Therefore, it's no surprise that determining how proficient goalkeepers are at keeping the ball out of the net is considered one of the most difficult tasks in football data analysis. Bundesliga and AWS have collaborated on an in-depth study of how to quantify the achievements of the Bundesliga's keepers. The result is a machine learning (ML)-powered insight that allows fans to easily evaluate and compare the goalkeepers' proficiencies. We're excited to announce the new Bundesliga Match Fact: Keeper Efficiency.

How it works

The new Bundesliga Match Fact Keeper Efficiency allows fans to evaluate the proficiency of goalkeepers in terms of their ability to prevent shooters from scoring. Although tallying the total number of saves a goalkeeper makes during a match can be informative, it doesn't account for variations in the difficulty of the shots faced. To avoid treating a routine catch of a 30-meter shot aimed directly at the goalkeeper as equivalent to an exceptional save of a shot taken from 5 meters, we assign each shot a value known as xSaves, which measures the probability that the shot will be saved by a keeper. In other words, a shot with an xSaves value of 0.9 would be saved 9 out of 10 times.
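Reading xSaves as a probability only makes sense if the values are calibrated: among all shots rated around 0.9, roughly 9 in 10 should actually be saved. The following is a minimal sketch of such a reliability check. It assumes you have a list of predicted xSaves values and the matching outcomes; the function name and the choice of 10 equal-width bins are hypothetical and not part of the system described here.

```python
from collections import defaultdict


def reliability_table(predictions, outcomes, n_bins=10):
    """Bucket shots by predicted xSaves and compare with the observed save rate.

    Returns {bin_index: (mean_predicted_xsaves, observed_save_rate, shot_count)}.
    """
    # bin index -> [shot count, save count, sum of predicted probabilities]
    bins = defaultdict(lambda: [0, 0, 0.0])
    for p, saved in zip(predictions, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        bins[b][0] += 1
        bins[b][1] += int(saved)
        bins[b][2] += p
    return {
        b: (total_p / n, saves / n, n)
        for b, (n, saves, total_p) in sorted(bins.items())
    }
```

For a well-calibrated model, the first two numbers in each bucket are close. For example, ten shots rated 0.9, nine of which were saved, all land in bin 9, with a mean prediction of 0.9 and an observed save rate of 0.9.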

An ML model is trained with Amazon SageMaker on data from four seasons of the first and second Bundesliga, covering all shots that landed on target (resulting in either a goal or a save). Using derived characteristics of a shot, the model estimates the probability that the goalkeeper will save the shot. Some of the factors the model considers are distance to goal, distance to goalkeeper, shot angle, number of players between the shot location and the goal, goalkeeper positioning, and predicted shot trajectory. We use an additional model to predict the shot's trajectory from the first few observed frames of the shot. With the predicted trajectory of the shot and the goalkeeper's position, the xSaves model can evaluate the probability of the goalkeeper saving the ball.
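Several of the geometric factors listed above can be sketched in a few lines. This is a minimal illustration, not the production feature pipeline. It assumes pitch coordinates in meters with the origin at the center spot and the attacked goal on the line x = 52.5 (a 105 m pitch). The helper names are hypothetical. The 7.32 m goal width is the standard size. Here, shot angle means the angle between the two posts as seen from the shot location, and "players between" counts defenders inside the triangle formed by the shot location and the two posts.

```python
import math
from dataclasses import dataclass
from typing import Sequence

GOAL_X = 52.5           # assumed goal-line x-coordinate (105 m pitch, origin at centre spot)
GOAL_HALF_WIDTH = 3.66  # half of the standard 7.32 m goal width


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def distance(a: Point, b: Point) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def shot_angle(shot: Point) -> float:
    """Angle (radians) between the two goalposts as seen from the shot location."""
    left = math.atan2(GOAL_HALF_WIDTH - shot.y, GOAL_X - shot.x)
    right = math.atan2(-GOAL_HALF_WIDTH - shot.y, GOAL_X - shot.x)
    return abs(left - right)


def _cross(p: Point, a: Point, b: Point) -> float:
    # The sign tells on which side of the line a-b the point p lies
    return (p.x - b.x) * (a.y - b.y) - (a.x - b.x) * (p.y - b.y)


def in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    d1, d2, d3 = _cross(p, a, b), _cross(p, b, c), _cross(p, c, a)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)  # inside the triangle or on an edge


def shot_features(shot: Point, keeper: Point, defenders: Sequence[Point]) -> dict:
    left_post = Point(GOAL_X, GOAL_HALF_WIDTH)
    right_post = Point(GOAL_X, -GOAL_HALF_WIDTH)
    return {
        "distance_to_goal": distance(shot, Point(GOAL_X, 0.0)),
        "distance_to_keeper": distance(shot, keeper),
        "shot_angle": shot_angle(shot),
        "players_between": sum(
            in_triangle(d, shot, left_post, right_post) for d in defenders
        ),
    }
```

Consider an 11 m shot straight in front of goal: `shot_features(Point(41.5, 0.0), Point(51.0, 0.0), [Point(45.0, 0.5), Point(40.0, 10.0)])` reports a distance to goal of 11.0 m, a distance to the keeper of 9.5 m, and one player between ball and goal. The second defender is outside the shooting lane, so it is not counted.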

Adding up the xSaves values of all saved and conceded shots a goalkeeper faces yields the number of saves the goalkeeper is expected to make during a match or season. Comparing that against the actual number of saves yields the Keeper Efficiency. A positive Keeper Efficiency therefore indicates that the goalkeeper has saved more shots than expected.
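The aggregation can be sketched directly. This assumes that "comparing" means a simple difference, actual saves minus expected saves, which matches the statement that a positive value means more saves than expected. The `Shot` record and its field names are hypothetical, and the xSaves values would come from the model.

```python
from typing import Iterable, NamedTuple


class Shot(NamedTuple):
    x_saves: float  # model's probability that this on-target shot is saved
    saved: bool     # True if the keeper saved it, False if it was a goal


def expected_saves(shots: Iterable[Shot]) -> float:
    return sum(s.x_saves for s in shots)


def keeper_efficiency(shots: Iterable[Shot]) -> float:
    """Actual saves minus expected saves over a match or season."""
    shots = list(shots)  # materialize so the shots can be traversed twice
    actual = sum(1 for s in shots if s.saved)
    return actual - expected_saves(shots)
```

Take a keeper who faces three shots rated 0.9, 0.5, and 0.1 and saves the first two. He is expected to make 1.5 saves but makes 2, so his Keeper Efficiency for that match is about +0.5.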

Keeper Efficiency in action

The following are a few estimates to showcase the Keeper Efficiency.

Example 1

Given the large distance to the goal and the many defenders covering it from relatively close range, the probability that the shot will result in a goal is low. Because the goalkeeper saved this likely-to-be-saved shot, he receives only a small increase in his Keeper Efficiency rating.

Example 2

In this example, the striker is much closer to the goal, with only one defender between him and the goalkeeper, resulting in a lower save probability.

Example 3

In this example, the speed of the ball is much higher and the ball is higher off the ground, resulting in a very low probability that the ball will be saved. The goal was conceded, and therefore the goalkeeper will see a small decrease in his Keeper Efficiency statistic.
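Splitting Keeper Efficiency into per-shot contributions shows why Examples 1 and 3 move the statistic only slightly. This assumes, as above, that Keeper Efficiency is the difference between actual and expected saves. With $s_i = 1$ for a save and $s_i = 0$ for a conceded goal, each shot contributes its outcome minus its xSaves value. The xSaves values 0.9 and 0.1 below are illustrative only. The article does not give the model's values for these shots.

```latex
\begin{aligned}
\mathrm{KE} &= \sum_i s_i - \sum_i \mathrm{xSaves}_i = \sum_i \Delta_i,
  \qquad \Delta_i = s_i - \mathrm{xSaves}_i \\
\text{saved, } \mathrm{xSaves}_i = 0.9: \quad \Delta_i &= 1 - 0.9 = +0.1 \\
\text{conceded, } \mathrm{xSaves}_i = 0.1: \quad \Delta_i &= 0 - 0.1 = -0.1
\end{aligned}
```

By the same formula, saving a shot rated 0.1 would add $1 - 0.1 = +0.9$, and conceding a shot rated 0.9 would subtract 0.9. Large swings come only from outcomes the model considered unlikely.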

What makes a good save

The preceding video shows a medium-difficulty shot with roughly a 50/50 chance of being saved, meaning that half the keepers in the league would save it and the other half would concede the goal. What makes this save remarkable is the goalkeeper's positioning, instinct, and reflexes. The goalkeeper stays focused on the ball even when the defenders obstruct his vision, and he adjusts his position multiple times according to where he thinks the biggest opening lies. Frame by frame, you can see that as soon as the attacking player winds up to take the shot, the goalkeeper makes a short hop backwards to better position himself for the jump. His timing is perfect: he lands precisely at the moment the striker kicks the ball. Had he landed later, he would have been in mid-air as the ball flew toward the goal, wasting precious time. With both feet planted on the grass, he makes a strong jump and saves the shot.

How Keeper Efficiency is implemented

This Bundesliga Match Fact consumes both event and positional data. Positional data is information gathered by cameras on the positions of the players and the ball (x-y coordinates) at every moment of the match, arriving at 25 Hz. Event data consists of hand-labeled event descriptions with useful attributes, such as shot on target. When a shot-on-target event (a goal or a save) is received, the BMF queries the stored positional data and finds a sync frame: a frame in which the timing and position of the ball match the event. This frame is used to synchronize the event data with the positional data. Once the two are synchronized, the subsequent frames that track the ball's trajectory are used to predict where the ball will enter the goal. Additionally, the goalkeeper's position at the time of the shot is considered, as well as a number of other features, such as the number of defenders between the ball and the goalpost and the speed of the ball. All this data is then passed to an ML model (XGBoost), which is deployed on Amazon SageMaker Serverless Inference to predict the probability that the shot will be saved.
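The two steps before the model call can be sketched as follows: locating the sync frame, then extrapolating the ball to the goal line. This is an illustrative stand-in, not the deployed logic. The article says a separate ML model predicts the trajectory, whereas this sketch swaps in a simple physics fit, with constant horizontal velocity, gravity on the vertical axis, and no drag or spin. The sketch assumes each frame carries a timestamp in seconds and ball coordinates in meters, including a height `z`, with the attacked goal at x = 52.5. All names and the 1-second search window are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

G = 9.81  # gravitational acceleration in m/s^2


@dataclass(frozen=True)
class Frame:
    t: float  # timestamp in seconds
    x: float  # ball position in metres
    y: float
    z: float  # ball height (assumed to be available)


def find_sync_frame(frames: Sequence[Frame], event_t: float, event_x: float,
                    event_y: float, window: float = 1.0) -> Optional[int]:
    """Index of the frame near the event time whose ball position best matches the event."""
    candidates = [i for i, f in enumerate(frames) if abs(f.t - event_t) <= window]
    if not candidates:
        return None
    return min(candidates,
               key=lambda i: math.hypot(frames[i].x - event_x, frames[i].y - event_y))


def _fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of v = a + b*t; returns (a, b)."""
    n = len(ts)
    mean_t, mean_v = sum(ts) / n, sum(vs) / n
    b = (sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs))
         / sum((t - mean_t) ** 2 for t in ts))
    return mean_v - b * mean_t, b


def predict_goal_entry(frames: Sequence[Frame], goal_x: float = 52.5
                       ) -> Optional[Tuple[float, float, float]]:
    """Extrapolate the first observed frames to the goal line.

    Returns (seconds after the first frame, y, z) at x == goal_x, or None if the
    ball is not moving toward the goal. Needs at least two distinct timestamps.
    """
    t0 = frames[0].t
    ts = [f.t - t0 for f in frames]
    x0, vx = _fit_line(ts, [f.x for f in frames])
    y0, vy = _fit_line(ts, [f.y for f in frames])
    # Adding g*t^2/2 cancels gravity, so the remaining height is linear in t
    z0, vz = _fit_line(ts, [f.z + 0.5 * G * t * t for f, t in zip(frames, ts)])
    if vx <= 0:
        return None
    t_hit = (goal_x - x0) / vx
    return t_hit, y0 + vy * t_hit, z0 + vz * t_hit - 0.5 * G * t_hit ** 2
```

In this sketch, the sync frame's index would mark where to start the slice of frames passed to `predict_goal_entry`. A pure physics fit ignores swerve, deflections, and bounces, which may be one motivation for using a learned trajectory model instead.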

The BMF logic itself (except for the ML model) runs in a container on AWS Fargate. For every xSaves prediction, it produces a message with the prediction as its payload, which is then distributed by a central message broker running on Amazon Managed Streaming for Apache Kafka (Amazon MSK). The information is also stored in a data lake for future auditing and model improvements. An AWS Lambda function then writes the contents of the Kafka messages to an Amazon Aurora Serverless database, whose data is presented in an Amazon QuickSight dashboard. The following diagram illustrates this architecture.
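The hop from Kafka to the database can be sketched as a Lambda handler. When an Amazon MSK event source triggers Lambda, the function receives a batch of messages under the `records` key, grouped by topic-partition, with each message value base64-encoded. The JSON payload fields (`match_id`, `shot_id`, `x_saves`, `saved`) are hypothetical, since the article does not describe the message schema, and the actual write to Aurora Serverless is left as a comment.

```python
import base64
import json


def build_prediction_message(match_id: str, shot_id: int, x_saves: float, saved: bool) -> bytes:
    """Hypothetical payload the BMF container could publish for each xSaves prediction."""
    return json.dumps({"match_id": match_id, "shot_id": shot_id,
                       "x_saves": x_saves, "saved": saved}).encode("utf-8")


def decode_msk_records(event: dict) -> list:
    """Decode all Kafka messages in a Lambda event delivered by an Amazon MSK trigger."""
    rows = []
    # event["records"] maps "topic-partition" keys to lists of records
    for records in event.get("records", {}).values():
        for record in records:
            rows.append(json.loads(base64.b64decode(record["value"])))
    return rows


def lambda_handler(event, context):
    rows = decode_msk_records(event)
    # Writing the rows to Aurora Serverless (for example, through the RDS Data API
    # using boto3's "rds-data" client) is omitted from this sketch.
    return {"received": len(rows)}
```

Keeping the decoding separate from the handler makes it easy to test locally with a hand-built event, without a Kafka cluster or database.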


The new Bundesliga Match Fact Keeper Efficiency measures the shot-stopping skills of the Bundesliga's goalies, who are considered to be among the finest in the world. This gives fans and commentators the unique opportunity to understand quantitatively how much a goalkeeper's performance has contributed to a team's match result or seasonal achievements.

This Bundesliga Match Fact was developed by a team of Bundesliga and AWS experts. Noteworthy goalkeeper performances are pushed into the Bundesliga live ticker in the mobile app and on the webpage. Match commentators can observe exceptional Keeper Efficiency through the data story finder, and visuals are presented to the fans as part of broadcasting streams.

We hope that you enjoy this brand-new Bundesliga Match Fact and that it provides you with new insights into the game. To learn more about the partnership between AWS and Bundesliga, visit Bundesliga on AWS!

We’re excited to learn what patterns you will uncover. Share your insights with us: @AWScloud on Twitter, with the hashtag #BundesligaMatchFacts.

About the Authors

Javier Poveda-Panter is a Senior Data Scientist for EMEA sports customers within the AWS Professional Services team. He enables customers in the area of spectator sports to innovate and capitalize on their data, delivering high-quality user and fan experiences through machine learning and data science. He follows his passion for a broad range of sports, music, and AI in his spare time.

Tareq Haschemi is a consultant within AWS Professional Services. His skills and areas of expertise include application development, data science, machine learning, and big data. He supports customers in developing data-driven applications within the cloud. Prior to joining AWS, he was also a consultant in various industries such as aviation and telecommunications. He is passionate about enabling customers on their data/AI journey to the cloud.

Jean-Michel Lourier is a Senior Data Scientist within AWS Professional Services. He leads teams implementing data-driven applications side by side with AWS customers to generate business value out of their data. He’s passionate about diving into tech and learning about AI, machine learning, and their business applications. He is also an enthusiastic cyclist, taking long bike-packing trips.

Fotinos Kyriakides is an ML Engineer with AWS Professional Services. He focuses on machine learning, MLOps, and application development, supporting customers in developing cloud applications that leverage and innovate on insights generated from data. In his spare time, he likes to run and explore nature.

Uwe Dick is a Data Scientist at Sportec Solutions AG. He works to enable Bundesliga clubs and media to optimize their performance using advanced stats and data—before, after, and during matches. In his spare time, he settles for less and just tries to last the full 90 minutes for his recreational football team.

Luuk Figdor is a Principal Sports Technology Advisor in the AWS Professional Services team. He works with players, clubs, leagues, and media companies such as the Bundesliga and Formula 1 to help them tell stories with data using machine learning. In his spare time, he likes to learn all about the mind and the intersection between psychology, economics, and AI.