Powered by AWS
Inviting engineers, data scientists, students and other data analytics enthusiasts with no sports experience required to get involved in football analytics.
Q&A with NFL's Big Data Bowl Co-Founder
The NFL is constantly evolving, both on and off the field. One of the biggest shifts in recent years is that, as the partnership with AWS has deepened, so too has the league’s commitment to using data and technology to improve the sport.
This includes the introduction of the Big Data Bowl, which was created 5 years ago to develop new insights and analytical approaches using the NFL's Next Gen Stats data. The goal was to create an open platform for engineers, data scientists, students and other data analytics enthusiasts from all over the world, with no sports experience required, to get involved in football analytics.
As the presenting sponsor of the event, AWS spoke with Mike Lopez, Sr. Director of Data and Analytics at the NFL and co-creator of the Big Data Bowl, for an exclusive look at the competition’s roots, how it’s evolved over the years, and the impact it’s having across the league.
Q. This is the 5th anniversary of the Big Data Bowl, which is a big milestone. But a lot of people don’t know the genesis of this competition. Can you give us a peek behind the curtain at how this got started and its progression over the years?
Interestingly enough, the foundational roots for what’s now the Big Data Bowl started well before I took a job with the NFL. I was an undergrad student in the early 2000s and, even though I played football and my dad was a high school football coach, I did my college thesis in baseball. Part of the reason I chose baseball is that it was one of the only sports that had publicly available data. If you wanted NFL play-by-play data basically anytime during the decade of the 2000s, you might have gotten lucky and found data on the website of Brian Burke, one of the field’s early adopters of football analytics. But generally, there just wasn't a lot of data out there for folks in the public to analyze.
There’s a line from the movie Field of Dreams, ‘If you build it, they will come,’ and for football analytics it’s ‘If you share the data, they will come.’ That's kind of been our motto with the Big Data Bowl: we know there are talented analysts who want to get into football, who want to innovate, and who want to help change the sport.
Q. You joined the league in 2018. How did you get the Big Data Bowl off the ground and how has it expanded since that initial competition?
As someone with a passion for football and data science, I found the lack of available data personally annoying and something that held data-driven innovation in football back for years. And so, when I started in the league office, in the back of my mind, I knew I’d always want to push this topic forward.
One of the stories that I like to tell is that our first year, we had no idea if it was going to work. I had pitched this idea for a data science competition and had support from my boss at the time, Damani Leech, who’s now the President of the Denver Broncos. But as of three hours before the submission deadline, I think we only had a few papers submitted. It was quite humbling to think, ‘Hey, I promised this and we're only going to get five or ten submissions.’ We had an NFL account set up on Kaggle and I thought the account was broken. I kept thinking something's going wrong here because we've only got a few submissions, but I knew all these people who said they were going to submit. As it turns out, everyone was just trying to squeeze in as much as they could before the deadline. It was due at midnight on a weekday, and I’m normally in bed at that point but I’m up, just waiting for the submissions to come in. We ended up getting around 60 submissions in the last 90 minutes, which was equal parts thrilling and a relief.
I want to say our first year we had over 100 participants that were part of 75 submissions. This year, we had over 400 participants produce 230 submissions. And over the history of the competition, we’ve had participants from over 75 countries too.
Q. One of the more interesting aspects here is that it’s an open, global competition that doesn’t require sports experience. How has that influenced the growth and types of submissions you’ve gotten in the last 5 years?
With NFL data only recently being publicly available, so many of the ideas are fundamentally new. Many of the best and most creative ones come from non-football experts or people who know the game well, but they also understand an adjacent field like physics and are able to port that knowledge over to football.
That’s one thing that makes this competition so challenging. There is no textbook on how to analyze or handle the data so when you’re filtering it or looking to apply it in a specific way, you're probably the first person in the world to ever do that with this data. And that complexity is compounded if you're trying to apply something from a different field. That's a humbling and scary experience but we've constantly been impressed with both the participant effort and solution creativity.
Q. How have you evolved the Big Data Bowl to encourage a more diverse pool of participants?
We had such a big focus early on to try and attract folks that don’t have a traditional sports analytics background. But it can be hard to jump into the Big Data Bowl competition without any experience or connections. So early on in the competition’s history, we started a mentorship program that connects experienced NFL analytics experts with interested beginners. They get a chance to work one-on-one and get feedback like, ‘I really like this, tell me why you emphasized this aspect’ or ‘have you tried approaching the problem this way instead of that way’. This year alone, we had more than 400 applicants for that program. And it's humbling to know that so many people are interested in football data and just need a little bit of support.
We also really encourage our participants to partner with domain experts, like a coach or former player, to better understand if their ideas are practical and useful on the field. I remember back in 2021 when the competition focused on special teams, there were a couple of students from the University of Pittsburgh and they actually asked that college team’s punter about his strategy: ‘Why do you punt to the left or right? What are the gunners’ roles on different kinds of punts?’ And that type of collaboration and line of questioning is really informative because you're hearing directly about team strategy and performance.
Q. You’ve laid the groundwork for people to get involved in the football analytics community but how has this contributed to participants getting jobs in this field?
In the NFL, teams are always looking for a competitive edge. And as more football data has become available, they’re recognizing that the ability to analyze and interpret this data could help them win more games.
It’s been really cool and rewarding to see teams across the league as well as other sports leagues and industries outside of football take notice of the Big Data Bowl. In the first 4 years of the competition, we’ve seen more than 50 Big Data Bowl participants get hired to work in professional sports, including over 30 in football.
Q. If you look at the prior Big Data Bowls, the challenges have covered so many different aspects of the game. How do you decide what to focus on each year?
Every year, we start the process around April, when we try to get a lot of different perspectives. We talk to league officials, different clubs, our network partners, and the Next Gen Stats team. We try to find out what folks are interested in, what analysis gaps might exist, what feedback they’re hearing from fans, and how that overlaps with what we think would be interesting from a data standpoint.
In the end, our goal is to try and cover a lot of the various positions in the NFL. We’ve done different aspects of offense, defense, and special teams. And this year we’re looking at an area that hasn’t gotten a lot of measurement or analysis, offensive and defensive linemen, and I’m excited to see the different proposals.
Q. Can you tell us what the collaboration is like with the Next Gen Stats team?
They’re an integral part of the Big Data Bowl and it’s been great to see how we impact each other and the different ways we work together.
On one hand, their ability to capture so many different kinds of information allows us to have a large volume of clean, labeled, high-quality data that sets the Big Data Bowl apart from so many other sports analytics competitions. And over the last few years, we’ve seen a number of Big Data Bowl submissions go from research projects to productionized Next Gen Stats we see on television. There are a lot of steps to get from point A to point B, some of which have included my team building stats alongside them and others where the AWS AI and machine learning teams are providing expertise.
But the potential to see your work show up on such a big stage is certainly a motivating factor for people to participate. I remember watching the 2021 NFC Championship game and the expected yards stat flashed across the screen. And I sat there thinking how cool it was that it came from a Big Data Bowl competition and, in less than a year, it was being shown on television.
Q. Tell us more about this stacking effect of how, even with different topics each year, participants are able to build off prior Big Data Bowl submissions?
One of the really nice aspects of the competition each year is seeing the new solutions for a totally different theme be based on the solutions to the previous year's theme. And it started back in our very first competition.
A couple of solutions used a fairly well-known but complex algorithm from basketball and soccer called space allocation, which measures how much space certain players own on the field. For example, it can tell that a player like Lionel Messi will own a lot of space despite maybe not necessarily taking up that space physically. That concept was a foundation for a couple of winning papers in the 2019 Big Data Bowl.
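To make the idea of "owning" space concrete, here is a minimal sketch in Python. It is not the method used in any Big Data Bowl submission; it assumes the simplest possible rule, where each patch of the field belongs to whichever player is closest to it. Published space-ownership models typically also account for things like player speed and direction. The function name `space_owned`, the field dimensions, and the player positions are all hypothetical, chosen only for illustration.

```python
from math import hypot


def space_owned(players, length=100, width=50, cell=1.0):
    """Return each player's share of the field under a nearest-player rule.

    players: dict mapping a player label to an (x, y) position.
    length, width, cell: hypothetical field size and grid resolution.
    """
    counts = {name: 0 for name in players}
    nx, ny = int(length / cell), int(width / cell)
    for i in range(nx):
        for j in range(ny):
            # Centre of the current grid cell.
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            # The cell is assigned to the closest player (an assumed,
            # simplified ownership rule).
            owner = min(
                players,
                key=lambda n: hypot(players[n][0] - cx, players[n][1] - cy),
            )
            counts[owner] += 1
    total = nx * ny
    return {name: count / total for name, count in counts.items()}


# Two hypothetical players placed symmetrically on the field.
positions = {"A": (25.0, 25.0), "B": (75.0, 25.0)}
shares = space_owned(positions)
```

Under this rule, a player's share depends only on where everyone is standing, not on physical size, which is the intuition behind the Messi example: a player standing away from the crowd can own a large region without occupying it.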
Fast forward a little bit and you see a couple of similar aspects of expected yards being built into pass rush metrics now. Or there was a paper on the way linemen turn the players they're going against, which was based on a blocking metric that somebody came up with in last year’s Big Data Bowl around special teams. So those are the fun types of ways that submissions can build on one another. And I do think it adds to the community nature of the competition, where the goal isn't just to win the competition. I think a lot of people do want to win, but they also want to learn. They want to say, hey, somebody built this, here's how I can expand it or apply it somewhere else.
Q. What’s the benefit of having a trusted technology partner like AWS involved in the Big Data Bowl and across the league?
Similar to how Big Data Bowl participants don’t need sports experience to participate, AWS is able to bring a deeper level of AI and machine learning expertise to the table. In practice, this means we’re able to leverage their knowledge and techniques that haven’t traditionally been used in sports, which has led directly to the development of new stats like QB Passing Score.
Also, beyond stats, one of the realities of dealing with the player tracking data is that it's not spreadsheet friendly. And that does a couple of things. One is that it forces participants to grow from Excel into, usually, either R or Python, or something else that lets them use a notebook to analyze the data. The other is on the back end: when we have to ingest this data ourselves, or share it as a league office or as an extra stats team, we need cloud-based software to help us do that. And so that's where AWS has been really nice. Each time that our Next Gen Stats team takes in these algorithms and produces a winning solution that we see on air, that's all done by AWS. Even our team internally, we're using AWS tools right now to store data on the cloud and to be able to write reproducible code. And that’s important because it allows anyone to come in and immediately import this code and run everything from scratch. At the league office we’re doing our best to modernize and keep up with the technology, and AWS helps us do that.