AWS for M&E Blog
Under the hood: How the Seattle Seahawks apply data-driven insights across the franchise using AWS
This post is authored by Patrick Ward, Head of Research and Analytics, Seattle Seahawks, and Dan Skutt, Database Administrator, Seattle Seahawks. The content and opinions in this post are those of the third-party authors and AWS is not responsible for the content or accuracy of this post.
Introduction
The National Football League (NFL) is the highest level of competitive football in the world. Each year, 32 teams compete in a 17-game regular season, attempting to qualify for the playoffs and make a run at being crowned Super Bowl champion. Unlike baseball, relatively little research exists on NFL player performance and each player's corresponding contribution on the field. This gap centers on player interaction, which ultimately determines whether the offense or defense wins the down. While simple count statistics (yards gained, touchdowns, tackles, etc.) have traditionally been collected and reported, newer data streams, largely driven by advances in technology, have given teams the ability to begin untangling such relationships between players and to better understand performance.
At the Seattle Seahawks, we create methods for acquiring and storing data and then help direct the research and analysis efforts within football operations. These efforts fall broadly within three domains: (1) player health and performance; (2) player acquisition; and (3) team and opponent scouting. Across these three domains, our research and analytics department's primary objective is to acquire, store, and clean data, and then conduct analysis specific to the needs of the decision-makers within those departments. As data has grown from simple count statistics into large sets of player tracking data across NCAA and NFL competitions, there is a need, now more than ever, to ensure an infrastructure is in place to help the team derive meaningful insights from this information. Our partnership with the engineers at Amazon Web Services (AWS) has been central to taking our data pipeline to the next level, allowing for faster sharing of information across football operations.
Data utilization within football
Currently, we ingest data relating to players in the NFL and NCAA, which includes not only basic stats, but advanced stats provided by various companies that manually code events within the game, and large amounts of player tracking data. In addition, in-house data collection pertaining to player health, such as inertial sensor data worn during practice and force plate data collected during jumping activities, is also pertinent to the weekly workflow of our football team.
While teams have traditionally functioned by storing data in dispersed data stores and spreadsheets, performing analysis as questions arise, such a strategy is no longer adequate given the volume of data we now have access to. Utilizing AWS best practices for ETL and data management to retrieve the data, and then organizing and storing it within our Amazon Redshift database, has allowed us to provide rapid insights to those in football operations. Such speed is made possible by leveraging tools including Amazon S3, AWS Step Functions, AWS Lambda, AWS Fargate, Amazon EventBridge, AWS Database Migration Service (DMS), Amazon DynamoDB, and AWS Glue jobs to ingest the raw data and create aggregated tables that combine the raw data in meaningful ways for analysts and developers. We use AWS Step Functions as the core serverless automation piece, performing the ETL that pulls data from our many data sources into Amazon S3. Within those Step Functions workflows, we invoke Lambda functions, DMS, or Fargate tasks depending on the work required for the necessary extract, transform, and load processes. Each of these tools scales as needed for its specific tasks. Settings and configuration information are stored in DynamoDB, with scheduling handled by EventBridge. Glue jobs move data from Amazon S3 into Redshift.
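To make the orchestration pattern concrete, the sketch below shows what a Step Functions state machine following this extract-transform-load shape could look like in Amazon States Language. This is an illustrative fragment only: the state names, function name, cluster, task definition, and Glue job name are placeholders, not the Seahawks' actual resources.

```json
{
  "Comment": "Illustrative ETL flow: extract via Lambda, heavier transforms on Fargate, load to Redshift via a Glue job.",
  "StartAt": "ExtractFromSourceApi",
  "States": {
    "ExtractFromSourceApi": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "extract-to-s3",
        "Payload": { "source.$": "$.source", "date.$": "$.date" }
      },
      "Next": "TransformOnFargate"
    },
    "TransformOnFargate": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "LaunchType": "FARGATE",
        "Cluster": "etl-cluster",
        "TaskDefinition": "transform-raw-data"
      },
      "Next": "LoadIntoRedshift"
    },
    "LoadIntoRedshift": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": { "JobName": "s3-to-redshift" },
      "End": true
    }
  }
}
```

The `.sync` service integrations let the workflow wait for the Fargate task and Glue job to finish before moving on, which keeps each run's extract, transform, and load steps ordered without any servers to manage.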
Figure 1
Following data ingestion and aggregation, analysis is performed, and the result is shared within our in-house application and Shiny web applications. This allows the relevant users to visualize data in informative ways and drive additional questions. The following example is motivated by such a workflow.
The combination of inertial sensors, in-game radio frequency data generated by player-worn sensors (NextGen), and jump data captured weekly via force plate is used to inform the team's performance staff about each player's readiness to tolerate the upcoming week of preparation leading into the next game. With a game played every seventh day, ensuring that players recover optimally and "win the week" with respect to physical, mental, and tactical preparation is imperative to high levels of success on game day. Previously, without AWS, we would store such information across multiple CSV files and join it together to produce daily and weekly reports. This approach is not sustainable: it requires an analyst to always be present to run the reports and to make sure the data is properly stored and coded within the CSV sheets. Moreover, these reports were static, meaning that performance staff members had to parse through them to find information relevant to a player or problem they were attempting to solve.
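The core of those joined reports is aligning several per-player daily feeds on a common key. The following is a minimal sketch of that idea, assuming each feed arrives as records keyed by player and date; the feed names and metric fields are illustrative, not our actual schema.

```python
from collections import defaultdict

def merge_daily_feeds(*feeds):
    """Merge per-player daily metric feeds into one row per (player_id, date).

    Each feed is a list of dicts containing at least 'player_id' and 'date';
    the remaining keys are metrics. Later feeds win on duplicate metric names.
    """
    merged = defaultdict(dict)
    for feed in feeds:
        for record in feed:
            key = (record["player_id"], record["date"])
            merged[key].update(record)
    # Return rows sorted by player, then date, for stable reporting.
    return [merged[k] for k in sorted(merged)]

# Illustrative inputs: one day of data for one player from three sources.
inertial = [{"player_id": 12, "date": "2023-10-02", "player_load": 412.0}]
tracking = [{"player_id": 12, "date": "2023-10-02", "top_speed_mph": 19.8}]
force_plate = [{"player_id": 12, "date": "2023-10-02", "jump_height_cm": 41.2}]

rows = merge_daily_feeds(inertial, tracking, force_plate)
```

In the warehouse this join happens in SQL against Redshift tables rather than in application code, but the keying logic is the same.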
Using AWS, the data is ingested automatically via API into our Redshift database using Step Functions and the associated tools shown in Figure 1. Queries are then run against these data sources, producing a data set that can be analyzed for daily and weekly insights regarding player health and well-being. The analysis is run within AWS, generating a table of outputs for the performance staff. These outputs are shared across the department using a Shiny web app, offering a user interface that allows performance staff members to query the data in various ways (e.g., by position, by player, by date range, by training day or training week), depending on the question they are attempting to answer. The analysis is discussed by the staff and used to help plan training for the week based on each player's needs. When a player falls outside a normal range or starts a potentially unwanted trend, requiring an intervention to mitigate unintended consequences to health or performance, additional analysis is conducted. This analysis ingests the training plan for the upcoming week and provides an expected workload for each upcoming day, conditional on how the plan has been constructed (Figure 2). Collectively, this information is used to assist the Performance Director in developing a bespoke training plan for each athlete leading into the next game.
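One simple way to flag a player who is "outside of a normal range" is to compare each day's load against a rolling baseline of that player's own recent history. The sketch below shows that pattern under stated assumptions: the 21-day window, the z-score threshold, and the metric name are illustrative choices, not the model the performance staff actually uses.

```python
from statistics import mean, stdev

def flag_outliers(daily_loads, window=21, z_threshold=2.0):
    """Flag days where a player's load falls outside a rolling baseline.

    daily_loads: chronological list of (date, load) tuples for one player.
    Returns a list of (date, load, z) for days whose z-score against the
    trailing `window` days exceeds the threshold in either direction.
    """
    flags = []
    for i in range(window, len(daily_loads)):
        baseline = [load for _, load in daily_loads[i - window:i]]
        mu, sigma = mean(baseline), stdev(baseline)
        date, load = daily_loads[i]
        if sigma > 0:
            z = (load - mu) / sigma
            if abs(z) >= z_threshold:
                flags.append((date, load, round(z, 2)))
    return flags

# Synthetic example: three steady weeks of training load, then a sharp spike.
history = [(f"day-{i}", 400.0 + (i % 3)) for i in range(21)]
history.append(("day-21", 700.0))
flags = flag_outliers(history)
```

A flagged day is a prompt for conversation, not an automatic decision; the staff still reviews the context (travel, game schedule, training plan) before intervening.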
Figure 2
Conclusion
The development of technology infrastructure using AWS represents a path forward for our football operations department. The construction of our data lake allows us to interact with diverse data sets and to frame unique football questions that help the team prepare for upcoming opponents, evaluate player performance, and manage player health. Integrating AWS infrastructure with other AWS Partners, such as Posit's Shiny web apps, allows us to build models and share their outputs with decision-makers across the organization.
Learn more about how the Seattle Seahawks use AWS: https://aws.amazon.com/sports/nfl/seattle-seahawks/