The Big Data Bowl competition: Q&A with the NFL

The National Football League (NFL) continues to evolve on and off the field. One of the biggest shifts in recent years is the league’s commitment to using data and technology to improve the sport. Amazon Web Services (AWS) supports this commitment as the NFL’s partnership with AWS continues to deepen.

The Big Data Bowl exemplifies this commitment. Created five years ago, the Big Data Bowl is an annual sports analytics contest that offers new insights and analytical approaches using NFL Next Gen Stats data. The Big Data Bowl provides an open platform for engineers, data scientists, students, and other data analytics enthusiasts all over the world (no sports experience required) to get involved in football analytics.

As the presenting sponsor of the NFL’s Big Data Bowl event, AWS spoke with Mike Lopez, Sr. Director of Data and Analytics at the NFL and co-creator of the Big Data Bowl, to provide an exclusive look at the competition’s roots, its evolution over the years, and its impact across the league.

This is the fifth anniversary of the Big Data Bowl, which is a big milestone. But a lot of people don’t know the genesis of this competition. Can you give us a peek behind the curtain at how this got started and its progression over the years?

Interestingly enough, the foundational roots for what’s now the Big Data Bowl started well before I took a job with the NFL. I was an undergrad student in the early 2000s and, even though I played football and my dad was a high school football coach, I did my college thesis on baseball. Part of the reason I chose baseball is that it was one of the only sports that had publicly available data. If you wanted NFL play-by-play data basically anytime during the 2000s, you may have gotten lucky finding it on the website of Brian Burke, one of the early adopters of football analytics. But generally, there just wasn’t a lot of data out there for folks to analyze in the public domain.

There’s a line from the movie Field of Dreams, ‘If you build it, they will come,’ and for football analytics it’s ‘If you share the data, they will come.’ That’s kind of been our motto with the Big Data Bowl, where we know there are talented analysts who want to get into football, who want to innovate, and who want to help change the sport.

You joined the league in 2018. How did you get the Big Data Bowl off the ground, and how has it expanded since that initial competition?

As someone with a passion for football and data science, I found the lack of available data personally annoying and something that held data-driven innovation in football back for years. And so, when I started in the league office, in the back of my mind, I knew I’d always want to push this topic forward.

One of the stories that I like to tell is that in our first year, we had no idea if it was going to work. I pitched this idea for a data science competition and had support from my boss at the time, Damani Leech, who is now the President of the Denver Broncos. But as of three hours before the submission deadline, I think we only had a few papers submitted. It was quite humbling to think, ‘Hey, I promised this and we’re only going to get five or ten submissions.’ We had an NFL account set up on Kaggle and I thought the account was broken. I kept thinking something’s going wrong here because we’ve only got a few submissions but I knew all these people who said they were going to submit. As it turns out, everyone was just trying to squeeze in as much as they could before deadline. It was due at midnight on a weekday, and I’m normally in bed at that point but I’m up, just waiting for the submissions to come in. We ended up getting around 60 submissions in the last 90 minutes, which was equal parts thrilling and a relief.

I want to say our first year we had over 100 participants as part of 75 submissions. This year, we had over 400 participants produce 230 submissions. And over the history of the competition, we’ve had participants from over 75 countries too.

One of the more interesting aspects here is that it’s an open, global competition that doesn’t require sports experience. How has that influenced the growth and types of submissions you’ve gotten in the last 5 years?

With NFL data only recently becoming publicly available, so many of the ideas are fundamentally new. Many of the best and most creative ones come from non-football experts or people who know the game well, but they also understand an adjacent field like physics and are able to port that knowledge to football.

That’s one thing that makes this competition so challenging. There is no textbook on how to analyze or handle the data, so when you’re filtering it or looking to apply it in a specific way, you’re probably the first person in the world to ever do so with this data. And that complexity is compounded if you’re trying to apply something from a different field. That’s a humbling and scary experience, but we’ve constantly been impressed with both the participant effort and solution creativity.

How have you evolved the Big Data Bowl to encourage a more diverse pool of participants?

We had such a big focus early on to try and attract folks that don’t have a traditional sports analytics background. But it can be hard to jump into the Big Data Bowl competition without any experience or connections. So early on in the competition’s history, we started a mentorship program that connects experienced NFL analytics experts with interested beginners. They get a chance to work one-on-one and get feedback like, ‘I really like this, tell me why you emphasized this aspect’ or ‘have you tried approaching the problem this way instead of that way’. This year alone, we had more than 400 applicants for that program. And it’s humbling to know that so many people are interested in football data and just need a little bit of support.

We also really encourage our participants to partner with domain experts, like a coach or former player, to better understand if their ideas are practical and useful on the field. I remember back in 2021, when the competition focused on special teams, there were a couple of students from the University of Pittsburgh and they actually asked that college team’s punter about his strategy: ‘Why do you punt to the left or right, what are the gunner’s roles on different kinds of punts?’ And that type of collaboration and line of questioning is really informative because you’re hearing directly about team strategy and performance.

You’ve laid the groundwork for people to get involved in the football analytics community, but how has this contributed to participants getting jobs in this field?

In the NFL, teams are always looking for a competitive edge. And as more football data has become available, they’re recognizing that the ability to analyze and interpret this data could help them win more games.

It’s been really cool and rewarding to see teams across the league as well as other sports leagues and industries outside of football take notice of the Big Data Bowl. In the first four years of the competition, we’ve seen more than 50 Big Data Bowl participants get hired to work in professional sports, including over 30 in football.

If you look at the prior Big Data Bowls, the challenges have covered so many different aspects of the game. How do you decide what to focus on each year?

Every year, we start the process around April where we try to get a lot of different perspectives. We talk to league officials, different clubs, our network partners, and the Next Gen Stats team. We try to find out what folks are interested in, what analysis gaps might exist, what feedback they’re hearing from fans, and how that overlaps with what we think would be interesting from a data standpoint.

In the end, our goal is to try and cover a lot of the positions in the NFL. We’ve done different aspects of offense, defense, and special teams. And this year we’re looking at an area that hasn’t gotten a lot of measurement: analysis of offensive and defensive linemen. I’m excited to see the different proposals.

Can you tell us what the collaboration is like with the Next Gen Stats team?

They’re an integral part of the Big Data Bowl and it’s been great to see how we impact each other and the different ways we work together.

On one hand, their ability to capture so many different kinds of information allows us to have a large volume of clean, labeled, high-quality data that sets the Big Data Bowl apart from so many other sports analytics competitions. And over the last few years, we’ve seen a number of Big Data Bowl submissions go from research projects to the productionized Next Gen Stats we see on television. There are a lot of steps from point A to B, some of which have included my team building stats alongside them and others where the AWS AI and machine learning teams are providing expertise.

But the potential to see your work show up on such a big stage is certainly a motivating factor for people to participate. I remember watching the 2021 NFC Championship game and the expected yards stat flashed across the screen. And I sat there thinking how cool it was that it came from a Big Data Bowl competition and, in less than a year, it was being shown on television.

Tell us more about this stacking effect of how, even with different topics each year, participants are able to build off prior Big Data Bowl submissions?

One of the really nice aspects of the competition each year is seeing that the new solutions for a totally different theme are based on solutions to the previous year’s theme. And it started back in our very first competition.

A couple of solutions used a fairly well-known but complex algorithm from basketball and soccer called space allocation, which measures how much space certain players own on the field. For example, it can tell that a player like Lionel Messi will own a lot of space despite not necessarily taking up that space physically. That concept was a foundation for a couple of winning papers in the 2019 Big Data Bowl.
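To make the idea of "owning" space concrete, here is a minimal sketch in Python. It is not the algorithm from any Big Data Bowl submission: it simply assigns each small patch of the field to whichever player is closest (a Voronoi-style simplification), whereas published space-ownership models typically also account for how players are moving. The function name `space_owned`, the grid resolution, and the player coordinates are hypothetical and chosen only for illustration; the field dimensions are the standard 120 by 53.3 yards, end zones included.

```python
from math import hypot

FIELD_LENGTH = 120.0  # yards, including both end zones
FIELD_WIDTH = 53.3    # yards


def space_owned(players, nx=120, ny=53):
    """Approximate the area (square yards) closest to each player.

    players: dict mapping a player label to an (x, y) position in yards.
    nx, ny:  number of grid cells along the length and width of the field.

    Simplifying assumption: a cell belongs to the nearest player.
    Speed and direction are ignored.
    """
    dx = FIELD_LENGTH / nx
    dy = FIELD_WIDTH / ny
    area = {name: 0.0 for name in players}
    for i in range(nx):
        for j in range(ny):
            # Centre of the current grid cell.
            cx = (i + 0.5) * dx
            cy = (j + 0.5) * dy
            # The nearest player "owns" this cell (ties go to the first label).
            owner = min(
                players,
                key=lambda name: hypot(players[name][0] - cx,
                                       players[name][1] - cy),
            )
            area[owner] += dx * dy
    return area


if __name__ == "__main__":
    # Hypothetical positions: a receiver in open space and two defenders
    # standing close together.
    positions = {
        "receiver": (40.0, 10.0),
        "defender_1": (80.0, 40.0),
        "defender_2": (82.0, 42.0),
    }
    for name, yards in space_owned(positions).items():
        print(f"{name}: {yards:.0f} square yards")
```

Even this crude version shows the point of the Messi example: the space credited to a player depends on where everyone else is standing, not on how much room the player physically takes up.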

Fast forward a little bit and then you see a couple of similar aspects of expected yards being built into pass rush metrics now, or there was a paper on the way linemen turn players that they’re going against that was based on a blocking metric that somebody came up with in last year’s Big Data Bowl around special teams. So those are the fun types of ways that submissions can build on one another. And I do think it adds to the community nature of the competition, where the goal isn’t just to win the competition. I think people want to win, but they also want to learn. They want to say, ‘Hey, somebody built this, here’s how I can expand it or apply it somewhere else.’

What’s the benefit of having a trusted technology partner like AWS involved in the Big Data Bowl and across the league?

Similar to how Big Data Bowl participants don’t need sports experience to participate, AWS is able to bring a deeper level of AI and machine learning expertise to the table. In practice, this means we’re able to leverage knowledge and techniques that haven’t traditionally been used in sports to directly lead to new stats like QB Passing Score being developed.

Also, beyond stats, one of the realities of dealing with player tracking data is that it’s not spreadsheet friendly. And that does a couple of things. One is it forces participants to grow from Excel into usually either R or Python or something that’s going to allow you to use a notebook to analyze the data. The other is on the back end: when we have to ingest this data ourselves and share it, as a league office or as an extra stats team, we need cloud-based software to help us do that. And that’s where AWS has been really nice. Each time that our Next Gen Stats team takes in these algorithms and produces a winning solution that we see on air, that’s all done by AWS. Even our team internally, we’re using AWS tools right now to store data in the cloud and to be able to write reproducible code. And that’s important because it allows anyone to come in and immediately import this code and run everything from scratch. At the league office, we’re doing our best to modernize and keep up with the technology and AWS helps us do that.

AWS for Industries
