Predicting qualification ranking based on practice session performance for Formula 1 Grand Prix
If you’re a Formula 1 (F1) fan, have you ever wondered why F1 teams perform so differently in qualifying and practice sessions? Why do they have multiple practice sessions in the first place? Can practice session results actually tell us something about the upcoming qualifying session? In this post, we answer these questions and more. We show you how we can predict qualifying results based on practice session performances by harnessing the power of data and machine learning (ML). These predictions are being integrated into the new “Qualifying Pace” insight for each F1 Grand Prix (GP). This work is part of the continuous collaboration between F1 and the Amazon ML Solutions Lab to generate new F1 Insights powered by AWS.
Each F1 GP consists of several stages. The event starts with three practice sessions (P1, P2, and P3), followed by a qualifying (Q) session, and then the final race. Teams approach practice and qualifying sessions differently because these sessions serve different purposes. The practice sessions are the teams’ opportunities to test out strategies and tire compounds to gather critical data in preparation for the final race. They observe the car’s performance with different strategies and tire compounds, and use this to determine their overall race strategy.
In contrast, qualifying sessions determine the starting position of each driver on race day. Teams focus solely on obtaining the fastest lap time. Because of this shift in tactics, Friday and Saturday practice session results often fail to accurately predict the qualifying order.
In this post, we introduce deterministic and probabilistic methods to model the time difference between the fastest lap time in a practice session (tp) and the fastest lap time in the qualifying session (tq), defined as ∆t = tq - tp. The goal is to more accurately predict the upcoming qualifying standings based on the practice sessions.
Error sources of ∆t
The delta of the fastest lap time between practice and qualifying sessions (∆t) comes primarily from variations in fuel level and tire grip.
A higher fuel level adds weight to the car and reduces the speed of the car. For practice sessions, teams vary the fuel level as they please. For the second practice session (P2), it’s common to begin with a low fuel level and run with more fuel in the latter part of the session. During qualifying, teams use minimal fuel levels in order to record the fastest lap time. The impact of fuel on lap time varies from circuit to circuit, depending on how many straights the circuit has and how long these straights are.
Tires also play a significant role in an F1 car’s performance. During each GP event, the tire supplier brings various tire types with varying compounds suitable for different racing conditions. Two of these are for wet circuit conditions: intermediate tires for light standing water and wet tires for heavy standing water. The remaining dry running tires can be categorized into three compound types: hard, medium, and soft. These tire compounds provide different levels of grip on the circuit surface. The more grip the tire provides, the faster the car can run.
Past racing results showed that car performance dropped significantly when wet tires were used. For example, in the 2018 Italian GP, because the P1 session was wet and the qualifying session was dry, the fastest lap time in P1 was more than 10 seconds slower than the fastest lap time in the qualifying session.
Among the dry running types, the hard tire provides the least grip but is the most durable, whereas the soft tire has the most grip but is the least durable. Tires degrade over the course of a race, which reduces the tire grip and slows down the car. Track temperature and moisture affect the progression of degradation, which in turn changes the tire grip. As with fuel level, the impact of tires on lap time changes from circuit to circuit.
Data and attempted approaches
Given this understanding of factors that can impact lap time, we can use fuel level and tire grip data to estimate the final qualifying lap time based on known practice session performance. However, as of this writing, data records to directly infer fuel level and tire grip during the race are not available. Therefore, we take an alternative approach with data we can currently obtain.
The data we used for modeling consisted of records of fastest lap times for each GP since 1950 and partial years of weather data for the corresponding sessions. The lap time data included the fastest lap time for each session (P1, P2, P3, and Q) of each GP, along with the driver, car and team, and circuit name (publicly available on F1’s website). Track wetness and temperature for each corresponding session were available in the weather data.
We explored two implicit methods with the following model inputs: the team and driver name, and the circuit name. Method one was a rule-based empirical model that attributed observed ∆t to circuits and teams. We estimated the latent parameter values (fuel level and tire grip differences specific to each team and circuit) based on their known lap time sensitivities. These sensitivities were provided by F1 and calculated through simulation runs on each circuit track. Method two was a regression model with driver and circuit indicators. The regression model learned the sensitivity of ∆t for each driver on each circuit without explicitly knowing the fuel level and tire grip exerted. We developed and compared deterministic models using XGBoost and AutoGluon, and probabilistic models using PyMC3.
We built models using race data from 2014 to 2019, and tested against race data from 2020. We excluded data from before 2014 because there were significant car development and regulation changes over the years. We removed races in which either the practice or qualifying session was wet because the ∆t values for those sessions were considered outliers.
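The filtering and split described above can be sketched in a few lines. This is a minimal illustration, not the code used for the project: the `SessionRecord` fields and the sample rows are hypothetical, and only the year boundaries and the wet-session rule come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    # Hypothetical record layout; field names are assumptions for illustration.
    year: int
    circuit: str
    driver: str
    practice_wet: bool
    qualifying_wet: bool
    delta_t: float  # t_q - t_p in seconds


def split_train_test(records, first_train_year=2014, last_train_year=2019, test_year=2020):
    """Drop wet sessions, then split the remaining records by season."""
    dry = [r for r in records if not (r.practice_wet or r.qualifying_wet)]
    train = [r for r in dry if first_train_year <= r.year <= last_train_year]
    test = [r for r in dry if r.year == test_year]
    return train, test


# Made-up rows, only to exercise the function.
records = [
    SessionRecord(2013, "circuit_a", "driver_a", False, False, -1.2),  # before 2014: excluded
    SessionRecord(2016, "circuit_a", "driver_a", False, False, -1.0),  # training
    SessionRecord(2018, "circuit_b", "driver_b", True, False, -9.5),   # wet practice: excluded
    SessionRecord(2019, "circuit_b", "driver_b", False, False, -0.8),  # training
    SessionRecord(2020, "circuit_a", "driver_a", False, False, -1.1),  # test
    SessionRecord(2020, "circuit_b", "driver_b", False, True, 3.0),    # wet qualifying: excluded
]

train, test = split_train_test(records)
print(len(train), len(test))
```

Splitting by season rather than at random keeps the 2020 races fully unseen during training, which matches how the models are evaluated later in the post.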
Managed model training with Amazon SageMaker
We trained our regression models on Amazon SageMaker.
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. Specifically for model training, it provides many features to assist with the process.
For our use case, we explored multiple iterations on the choices of model feature sets and hyperparameters. Recording and comparing the model metrics of interest was critical to choosing the most suitable model. The Amazon SageMaker API allowed customized metrics definition prior to launching a model training job, and easy retrieval after the training job was complete. Using the automatic model tuning feature reduced the mean squared error (MSE) metric on the test data by 45% compared to the default hyperparameter choice.
We trained an XGBoost model using Amazon SageMaker’s built-in implementation, which allowed us to run model training through a general estimator interface. This approach provided better logging, superior hyperparameter validation, and a larger set of metrics than the original implementation.
In the rule-based approach, we reason that the differences of lap times ∆t primarily come from systematic variations of tire grip and compound for each circuit and fuel level for each team between practice and qualifying sessions. After accounting for these known variations, we assume residuals are random small numbers with a mean of zero. ∆t can be modeled with the following equation:
Here, t is the team, c is the circuit, and s is the sector number within a lap. ∆tg(c) and ∆tf(c,s) are the known lap time sensitivities to tire grip and fuel mass, respectively, and ε is the residual. A hierarchy exists among the factors contained in the equation. We assume grip variations for each circuit (g(c)) are at the top level. Under each circuit, there are variations of fuel level across teams in each sector (f(t,c,s)) and the delta in sector time between the soft compound and the actual compound used (∆p(s,c)).
To further simplify the model, we neglect ε because we assume it is small. We further assume fuel variation for each team across all circuits is the same in each sector (i.e., f(t,c,s) = f(t)). We can simplify the model to the following:
Because ∆tf(c,s) and ∆tg(c) are known, we can estimate team fuel variations (f(t)) and tire grip variations (g(c)) from the data, after adjusting for the tire compound effect (∆p(s,c)).
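The rule-based model can be written out explicitly. The following is a sketch reconstructed from the symbol definitions in this section. The exact arrangement of terms (in particular, summing the fuel and compound terms over sectors and multiplying each latent variation by its sensitivity) is an assumption rather than a confirmed form.

```latex
\begin{align}
% Full model for team t on circuit c, summed over sectors s
\Delta t(t,c) &= \sum_{s} \Big[ \Delta t_f(c,s)\, f(t,c,s) + \Delta p(s,c) \Big]
                 + \Delta t_g(c)\, g(c) + \varepsilon \\
% Simplified model: \varepsilon neglected and f(t,c,s) = f(t)
\Delta t(t,c) &\approx f(t) \sum_{s} \Delta t_f(c,s)
                 + \sum_{s} \Delta p(s,c)
                 + \Delta t_g(c)\, g(c)
\end{align}
```

In the simplified form, once the compound term \sum_s \Delta p(s,c) is subtracted from the observed ∆t, the remainder is linear in the unknowns f(t) and g(c), which is what makes it possible to estimate them from the data.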
The differences in the sensitivities depend on the characteristics of each circuit. From the following track maps, we can observe that the Italian GP circuit has fewer corners and longer straight sections than the Singapore GP circuit. Additional tire grip therefore gives a larger advantage at the Singapore GP circuit.
ML regression model
For the ML regression method, we don’t directly model the relation between ∆t and fuel level and grip variations. Instead, we fit the following regression model with just the circuit, team, and driver indicator variables:
Ic, It, and Id represent the indicator variables for circuits, teams, and drivers.
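To make the indicator variables concrete, here is a minimal sketch of one-hot encoding the circuit, team, and driver of each observation. The helper names and the sample rows are hypothetical, and the encoding shown is a generic one rather than the feature pipeline used in this work.

```python
def build_vocab(rows, key):
    """Map each distinct category value to a column index (sorted for stability)."""
    return {value: i for i, value in enumerate(sorted({row[key] for row in rows}))}


def one_hot(rows, keys=("circuit", "team", "driver")):
    """Return (column_names, matrix) with one 0/1 indicator column per category value."""
    vocabs = {key: build_vocab(rows, key) for key in keys}
    names = [f"{key}={value}" for key in keys for value in vocabs[key]]
    matrix = []
    for row in rows:
        vector = []
        for key in keys:
            block = [0] * len(vocabs[key])
            block[vocabs[key][row[key]]] = 1  # switch on the matching indicator
            vector.extend(block)
        matrix.append(vector)
    return names, matrix


# Made-up observations, only to show the shape of the output.
rows = [
    {"circuit": "Austria", "team": "team_a", "driver": "AAA"},
    {"circuit": "Hungary", "team": "team_a", "driver": "BBB"},
    {"circuit": "Austria", "team": "team_b", "driver": "CCC"},
]

names, matrix = one_hot(rows)
for name_row in zip(names, *matrix):
    print(name_row)
```

Each row of `matrix` has exactly one active indicator per group. A matrix of this kind, paired with the observed ∆t values as the target, is the sort of input a regressor such as XGBoost can be trained on.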
Hierarchical Bayesian model
Another challenge in modeling the pace was the noise in lap time measurements. The magnitude of the random effect (ε) in ∆t could be non-negligible. Such randomness might come from drivers accidentally drifting from their usual line through the turns, or from random variations in drivers’ efforts during practice sessions. Deterministic approaches didn’t capture this random effect appropriately. Ideally, we wanted a model that could quantify the uncertainty of its predictions. Therefore, we explored Bayesian sampling methods.
With a hierarchical Bayesian model, we account for the hierarchical structure of the error sources. As with the rule-based model, we assume grip variations for each circuit (g(c)) are at the top level. The additional benefit of a hierarchical Bayesian model is that it incorporates individual-level variations when estimating group-level coefficients. It’s a middle ground between two extreme views of data. One extreme is to pool data for every group (circuit and driver) without considering the intrinsic variations among groups. The other extreme is to train a separate regression model for each circuit or driver. With 21 circuits, this amounts to 21 regression models. With a hierarchical model, we have a single model that considers the variations simultaneously at the group and individual level.
We can mathematically describe the underlying statistical model for the hierarchical Bayesian approach as the following varying intercepts model:
Here, i represents the index of each data observation, j represents the index of each driver, and k represents the index of each circuit. μjk represents the varying intercept for each driver under each circuit, and θk represents the varying intercept for each circuit. wp and wq represent the wetness level of the track during practice and qualifying sessions, and ∆T represents the track temperature difference.
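A varying intercepts model consistent with these definitions can be sketched as follows. This is a reconstruction for illustration only: the normal likelihood and priors, the regression coefficients \beta_p, \beta_q, and \beta_T, the hyperparameter \theta_0, and the variance terms are all assumptions that the text does not specify.

```latex
\begin{align}
% Observation i belongs to driver j[i] on circuit k[i]
\Delta t_i &\sim \mathcal{N}\!\left( \mu_{j[i]\,k[i]} + \beta_p\, w_{p,i} + \beta_q\, w_{q,i}
              + \beta_T\, \Delta T_i,\; \sigma^2 \right) \\
% Driver-within-circuit intercepts are drawn around the circuit intercept
\mu_{jk} &\sim \mathcal{N}\!\left( \theta_k,\; \sigma_\mu^2 \right) \\
% Circuit intercepts share a common prior
\theta_k &\sim \mathcal{N}\!\left( \theta_0,\; \sigma_\theta^2 \right)
\end{align}
```

The second and third lines are what make the model hierarchical: a driver with few observations at a circuit is pulled toward that circuit’s intercept \theta_k, and a circuit with little data is pulled toward the overall level, instead of each being estimated in isolation.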
Test models in the 2020 races
After predicting ∆t, we added it into the practice lap times to generate predictions of qualifying lap times. We determined the final ranking based on the predicted qualifying lap times. Finally, we compared predicted lap times and rankings with the actual results.
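The prediction step follows directly from the definition ∆t = tq - tp: the predicted qualifying time is the practice time plus the predicted ∆t, and the ranking is the ascending order of those times. The sketch below uses hypothetical driver codes and made-up lap times.

```python
def predict_qualifying(practice_times, predicted_delta):
    """Return {driver: predicted qualifying lap time} using t_q = t_p + delta_t."""
    return {d: practice_times[d] + predicted_delta[d] for d in practice_times}


def rank_drivers(lap_times):
    """Return drivers ordered from fastest (lowest lap time) to slowest."""
    return sorted(lap_times, key=lap_times.get)


# Made-up fastest practice laps (seconds) and predicted deltas (seconds).
practice_times = {"AAA": 65.2, "BBB": 64.9, "CCC": 65.0}
predicted_delta = {"AAA": -1.5, "BBB": -0.8, "CCC": -1.0}

predicted_q = predict_qualifying(practice_times, predicted_delta)
print(rank_drivers(practice_times))  # order implied by raw practice times
print(rank_drivers(predicted_q))     # order after the pace correction
```

In this toy example the correction reverses the order: the driver who was slowest in practice has the largest predicted gain and ends up fastest in the predicted qualifying ranking.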
The following figure compares the predicted rankings and the actual rankings for all three practice sessions for the Austria, Hungary, and Great Britain GPs in 2020 (we exclude P2 for the Hungary GP because the session was wet).
For the Bayesian model, we generated predictions with an uncertainty range based on the posterior samples. This enabled us to rank the drivers by their median predicted lap times while accounting for unexpected outcomes in the drivers’ performances.
The following figure shows an example of predicted qualifying lap times (in seconds) with an uncertainty range for selected drivers at the Austria GP. If two drivers’ prediction profiles are very close (such as MAG and GIO), it’s not surprising that either driver might be the faster one in the upcoming qualifying session.
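One way to turn posterior samples into a median, an uncertainty range, and a head-to-head comparison is sketched below. The samples here are synthetic draws from a normal distribution, standing in for real posterior samples, and the 5th to 95th percentile range is an assumed choice rather than the one used in the figure.

```python
import random
import statistics


def summarize(samples, lower=0.05, upper=0.95):
    """Return (median, lower bound, upper bound) from a list of posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    low = ordered[int(lower * (n - 1))]
    high = ordered[int(upper * (n - 1))]
    return statistics.median(ordered), low, high


def prob_faster(samples_a, samples_b):
    """Fraction of paired draws in which driver A's lap time is lower than driver B's."""
    pairs = list(zip(samples_a, samples_b))
    return sum(a < b for a, b in pairs) / len(pairs)


# Synthetic stand-ins for posterior samples of two closely matched drivers.
rng = random.Random(0)
driver_a = [rng.gauss(64.80, 0.15) for _ in range(2000)]
driver_b = [rng.gauss(64.85, 0.15) for _ in range(2000)]

print(summarize(driver_a))
print(summarize(driver_b))
print(prob_faster(driver_a, driver_b))
```

When two drivers’ ranges overlap heavily, `prob_faster` stays near 0.5, which is the numerical counterpart of saying that either driver might be the faster one.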
Metrics on model performance
To compare the models, we used mean squared error (MSE) and mean absolute error (MAE) for lap time errors. For ranking errors, we used rank discounted cumulative gain (RDCG). Because only the top 10 drivers gain points during a race, we used RDCG to apply more weight to errors in the higher rankings. For the Bayesian model output, we used the median posterior value to generate the metrics.
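The two lap time metrics are straightforward to compute. The sketch below uses made-up lap times. It leaves out the ranking metric because its exact formula isn't given here.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences; penalizes large misses more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences, in the same units as the lap times."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up qualifying lap times in seconds.
actual = [63.0, 64.0, 65.0]
predicted = [63.5, 64.0, 64.0]

print(mean_squared_error(actual, predicted))
print(mean_absolute_error(actual, predicted))
```

Note that MAE is expressed in seconds, whereas MSE is expressed in squared seconds, so the two values aren't directly comparable in magnitude.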
The following table shows the resulting metrics of each modeling approach for the test P2 and P3 sessions. The best model by each metric for each session is highlighted.
All models reduced the qualifying lap time prediction errors significantly compared to directly using the practice session results. Using practice lap times directly without any pace correction, the MSE on the predicted qualifying lap time was up to 2.8 seconds. With machine learning methods that automatically learned pace variation patterns for teams and drivers on different circuits, we brought the MSE down to less than half a second. The resulting prediction was a more accurate representation of the pace in the qualifying session. In addition, the models improved the prediction of rankings by a small margin. However, no single approach outperformed all others. This observation highlighted the effect of random errors in the underlying data.
In this post, we described a new Insight developed by the Amazon ML Solutions Lab in collaboration with Formula 1 (F1).
This work is part of the six new F1 Insights powered by AWS that are being released in 2020, as F1 continues to use AWS for advanced data processing and ML modeling. Fans can expect to see this new Insight unveiled at the 2020 Turkish GP, where it will provide predictions for the upcoming qualifying session during the practice sessions.
If you’d like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab.
About the Author
Guang Yang is a data scientist at the Amazon ML Solutions Lab where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art ML/AI solutions.