This post is authored by Devin Pleuler, Senior Director of R&D at MLSE, and Ari Entin, Principal of AWS sports marketing communications. The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
In the high-stakes realm of the National Basketball Association (NBA), every team is on a relentless pursuit of excellence. For the Toronto Raptors, a crucial element of this quest is underpinned by data-driven insights that form the bedrock of strategic decision-making. One fundamental part of this puzzle involves maintaining and regularly updating player performance models throughout the season. With new players entering the league and current ones continually evolving, these models must be kept up to date and accurate.
Keith Boyarsky, VP, Basketball Strategy and Research at the Raptors, puts it succinctly, "Our player performance models are key to the decision-making process across various aspects of team operations - from scouting to coaching, and health management. It is imperative that these models are accurate and current to reflect the evolving dynamics of the league."
From On-Premises to the Cloud
A few years ago, the Toronto Raptors started utilizing an on-premises GPU for this crucial retraining process. This choice was primarily influenced by factors such as model size, security, and the volume of inbound content for retraining. However, the computational demands of the deep learning models quickly outpaced the capacity of the on-premises resources, resulting in retraining times that could span several days. As a result, the heaviest models could only be retrained on a weekly or monthly basis, leading to potential periods of stale models, especially when new players entered the league.
These deep learning models produce statistical player representations that are fed downstream into various other models that are used across the Toronto Raptors organization. The more data available on a player, the more accurate that player representation will be. Therefore, in low-sample-size situations, it is critical to regularly retrain these models so their outputs accurately represent current player behavior and tendencies.
Boyarsky recalls, "We had to find a solution that would enable us to retrain our models more frequently and efficiently, to keep up with the rapid inflow of new data in an NBA season, as well as accelerate our ability to iterate on the model architectures themselves."
Model Retraining - An Imperative for Accurate Insights
Retraining multiple ML models is no small task. Retraining could mean training on entirely new data, adding more recent data to the existing data set, or updating certain components of the model, such as feature weights or parameters.
To provide a sense of the size and scale of the data involved, the Raptors leverage roughly 10 years of full player tracking data, and each season has 1,230 games. Each of these games represents about 100-200 MB of data in its raw form, and that can grow considerably when additional metadata is calculated and layered on top. In practice, this means the models process 1-2 terabytes of data during the training process.
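The arithmetic behind that estimate can be checked with a short back-of-the-envelope sketch. It uses only the figures quoted above (10 seasons, 1,230 games per season, 100-200 MB per game) and assumes decimal units (1 TB = 1,000,000 MB). The function and constant names are illustrative, and the result covers raw tracking data only, before any derived metadata is layered on.

```python
# Rough estimate of raw tracking-data volume from the figures quoted in the post.
# Names are illustrative; units are decimal (1 TB = 1,000,000 MB).

GAMES_PER_SEASON = 1_230
SEASONS = 10
RAW_MB_PER_GAME = (100, 200)  # low and high end of the quoted per-game range


def raw_volume_tb(mb_per_game, games=GAMES_PER_SEASON, seasons=SEASONS):
    """Return the total raw data volume in terabytes."""
    total_mb = mb_per_game * games * seasons
    return total_mb / 1_000_000


low_tb, high_tb = (raw_volume_tb(mb) for mb in RAW_MB_PER_GAME)
print(f"Raw tracking data: {low_tb:.2f}-{high_tb:.2f} TB")
```

The low end works out to about 1.23 TB and the high end to about 2.46 TB, which is in line with the 1-2 terabyte figure.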
However, constantly retraining the models is essential to address variables like model drift and to ensure player performance models stay relevant, reflect the latest performance levels, and adapt to strategic or style changes across the NBA. Model drift is a phenomenon where the predictive performance of a model decreases over time because the underlying relationships between inputs and outputs, originally captured during model training, change or 'drift' away from their original state. This is particularly pertinent in a dynamic landscape like professional basketball, where player conditions, team strategies, and rules are in a state of flux.
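One simple way to make the idea of drift concrete is to compare a model's error on recent games against the error it had when it was trained. The sketch below is a generic illustration, not the Raptors' method: the function names, the choice of mean absolute error, and the 25% tolerance are all assumptions made for the example.

```python
# Minimal drift check: flag a model when its recent error exceeds its
# training-time error by more than a chosen tolerance.
# All names and the tolerance value are hypothetical.
from statistics import mean


def mean_absolute_error(predictions, actuals):
    """Average absolute difference between predicted and observed values."""
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


def has_drifted(baseline_error, recent_predictions, recent_actuals, tolerance=0.25):
    """Return True if recent error is more than `tolerance` above the baseline."""
    recent_error = mean_absolute_error(recent_predictions, recent_actuals)
    return recent_error > baseline_error * (1 + tolerance)


# Made-up numbers: the model averaged an error of 2.0 at training time.
stable = has_drifted(2.0, [10, 12, 8], [11, 13, 8])    # recent error is about 0.67
drifted = has_drifted(2.0, [10, 12, 8], [14, 9, 12])   # recent error is about 3.67
print(stable, drifted)
```

A check like this only signals that retraining may be due. How often to retrain, and on which data, remains a separate decision.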
Unleashing New Possibilities with AWS
That is where Amazon Web Services (AWS) came in. By harnessing the power of AWS, the Toronto Raptors have revolutionized their ML model retraining process. AWS's robust suite of services allows the team to retrain models with unprecedented speed and efficiency, delivering accurate insights promptly to coaching and scouting departments, as well as the player health management team. This scalability will be even more critical in the upcoming season and beyond as the Raptors will start to incorporate full-body pose tracking data, which represents 4 GB of data per game, into their player performance models.
"Our migration to AWS marked a major turning point in our data analytics journey. The enhanced speed and efficiency brought about by AWS's suite of services allowed us to deliver data-driven insights faster and more accurately," remarks Boyarsky. "This has empowered our team to make swift, informed decisions that are crucial in the fast-paced NBA environment."
The shift to AWS began with a strategic migration to Amazon SageMaker, an industry-leading platform for machine learning that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon EC2 instances, optimized for GPU-intensive workloads, further expedited model retraining.
The Toronto Raptors' journey with AWS has not only transformed the way they utilize data, but it has also created an environment conducive to rapid, data-driven decision-making. By transitioning from an on-premises GPU to AWS, they’ve streamlined operations and significantly increased their capacity for agility and precision in decision-making. AWS's scalable and adaptable solutions have equipped them with the tools to keep pace with the ever-changing landscape of the NBA.
Boyarsky concludes, "Our partnership with AWS has been a transformative journey. And we believe this is just the beginning. The future of basketball is data-driven, and we're excited to be at the forefront of this revolution, thanks to AWS."