2024 Big Data Bowl Final Project: Attack Zones

For the second time, I tried to enter the Big Data Bowl competition, but a frustrating combination of procrastination and inexperience with Kaggle's notebook posting procedure meant I did not upload my notebook correctly or on time, so it did not count. However, I'm proud of the work I did and want to share it, so below is an explanation of my 2024 project (see this link for an explanation of my 2022 project).

Introduction

Officially, football games are won and lost by the points a team scores. Unofficially, football is a battle for distance, measured in yards from the opponent’s endzone. While a defender's primary responsibility is to prevent points, their main objective after an offensive player gets the ball is to minimize the yards gained by the ball carrier.

Before getting into the data preparation, let me introduce the main metric used to evaluate the yards given up by the tackler after initial contact: "Yards After Contact." Given that the endzones in our tracking data are positioned at the far ends of the X-values, "Yards After Contact" focuses solely on the X-direction and is calculated as the difference in the ball carrier's X-value between the "first_contact" event and the "tackle" event. A higher X-distance between these events indicates a poorer performance by the tackler, as it means they allowed more yards after initial contact before completing the tackle. This metric is added to the tracking data and labeled as "x_totaldis_fc_t" or sometimes "x_dist."
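To make the definition concrete, here is a minimal sketch of the "Yards After Contact" calculation. It assumes the ball carrier's tracking rows are available as dictionaries with hypothetical "event" and "x" keys and that the play direction has already been standardized to the right; the function name and the toy frames are for illustration only and are not taken from the notebook.

```python
def yards_after_contact(carrier_frames):
    """Return the ball carrier's X-distance between first_contact and tackle.

    carrier_frames: list of dicts with 'event' and 'x' keys, in frame order
    (a hypothetical layout, not the notebook's actual data structure).
    Returns None when either event is missing from the play.
    """
    x_at = {}
    for frame in carrier_frames:
        event = frame["event"]
        # Keep the X-value of the first frame tagged with each event.
        if event in ("first_contact", "tackle") and event not in x_at:
            x_at[event] = frame["x"]
    if "first_contact" not in x_at or "tackle" not in x_at:
        return None
    return x_at["tackle"] - x_at["first_contact"]


# Toy frames with made-up positions, for illustration only.
frames = [
    {"frameId": 1, "event": None, "x": 40.0},
    {"frameId": 2, "event": "first_contact", "x": 42.5},
    {"frameId": 3, "event": None, "x": 43.5},
    {"frameId": 4, "event": "tackle", "x": 45.0},
]
x_dist = yards_after_contact(frames)
```

In this toy play the value stored in x_dist would play the role of the "x_totaldis_fc_t" column; a play without both events yields None and would be excluded.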

This post investigates which attack angles by a tackler (referred to as Attack Zones) lead to the fewest yards gained after the tackler has made first contact with the ball carrier. The analysis focuses specifically on solo tackles and standardizes the directions of motion for both the ball carrier and the tackler. Although this project falls under the competition's metric category, the insights should be useful for coaches as well.


Data Preparation

The primary data sources for this notebook are the Big Data Bowl (BDB) data and the R package "nflverse." The BDB data includes game, play, player, tackle, and tracking data for Weeks 1-9 of the 2022 NFL season. I merged the "gameId" and "playId" columns into a single unique "game_play_Id" column, which I then matched with the corresponding "nflverse" game and play data. This integration added context, such as solo tackles, penalties, and expected points added, for each "game_play_Id," covering over 50,000 plays in total.
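The key-building and matching step can be sketched as follows. The underscore separator, the helper name make_game_play_id, the field names on the "nflverse" side, and the toy rows are all assumptions for illustration; the notebook may have built and joined the key differently.

```python
def make_game_play_id(game_id, play_id):
    """Combine gameId and playId into one key (the separator is an assumption)."""
    return f"{game_id}_{play_id}"


# Toy rows with made-up values, for illustration only.
bdb_plays = [
    {"gameId": 2022090800, "playId": 56},
    {"gameId": 2022090800, "playId": 80},
]
nflverse_plays = [
    {"game_play_Id": "2022090800_56", "solo_tackle": 1, "penalty": 0, "epa": -0.4},
]

# Index the context rows by key, then attach them to the matching BDB plays.
context_by_key = {row["game_play_Id"]: row for row in nflverse_plays}
merged = []
for play in bdb_plays:
    key = make_game_play_id(play["gameId"], play["playId"])
    context = context_by_key.get(key)
    if context is not None:  # keep only plays found in both sources
        merged.append({**play, **context})
```

Keeping only plays present in both sources corresponds to an inner join; whether the notebook dropped or kept unmatched plays is not stated here.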

To filter the data, I isolated every "game_play_Id" that met the following criteria:

  • Involved a single solo tackler
  • Did not involve a penalty
  • Did not involve a turnover
  • Did not end out of bounds
  • Was not a special teams play
  • Was not a sack

This filtering process left approximately 5,450 plays within the Weeks 1-9 tracking data. I then used the BDB data to identify the unique ball carrier and tackler "nflId" for each "game_play_Id" and isolated these two "nflId"s (along with the football) in the tracking data for all solo tackles.

A big part of the data preparation involved standardizing the ball carrier’s direction of motion (and the play direction) to the right and rotating the tackler's direction by the same amount. To do this, I first converted all direction angles from the tracking data's convention, in which angles increase clockwise from 0 degrees, to the more standard convention in which angles increase counterclockwise from 0 degrees. I then rotated both the ball carrier's and the tackler’s angles so that the ball carrier’s final direction was always 0 degrees, pointing to the right.
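The two angle steps can be sketched as below. The sketch assumes that 0 degrees in the raw tracking data points along the positive Y-axis (so 90 degrees points along positive X); that assumption and both function names are mine, and the notebook's exact angle-of-attack definition may differ.

```python
def to_math_angle(dir_deg):
    """Convert a clockwise angle measured from +Y (assumed raw convention)
    to a counterclockwise angle measured from +X."""
    return (90.0 - dir_deg) % 360.0


def rotate_to_carrier(carrier_deg, tackler_deg):
    """Rotate both directions so the ball carrier points at 0 degrees.

    Subtracting the same offset from both angles leaves the angle
    between the two players unchanged.
    """
    offset = carrier_deg
    return (carrier_deg - offset) % 360.0, (tackler_deg - offset) % 360.0


# Toy example: raw directions of 0 (carrier) and 90 (tackler) degrees.
carrier = to_math_angle(0.0)    # 90.0 in the counterclockwise convention
tackler = to_math_angle(90.0)   # 0.0 in the counterclockwise convention
carrier_std, tackler_std = rotate_to_carrier(carrier, tackler)
```

After the rotation the ball carrier sits at 0 degrees and the tackler's angle carries all of the relative information, which is what lets a single number describe each tackle.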

It's important to note that the angle between the ball carrier and tackler (angle of attack) remained unchanged; only their directions of motion were standardized. While this adjustment altered the position data, once I had the "Yards After Contact" X-distance for each solo tackle, I no longer needed the position data, as my focus shifted to the angle of attack. This standardization enabled a consistent comparison of each tackle's angle of attack.


Exploratory Analysis

For context, the chart below illustrates the correlation of expected points added (EPA) per play with total yards gained per play (left) and with "Yards After Contact" (right). As anticipated, both total yards gained and "Yards After Contact" are positively correlated with EPA per play. This indicates that a defender should aim to limit both metrics as much as possible.

In my exploratory analysis, I first examined the time duration between the "first_contact" and "tackle" events in the tracking data. As shown in the chart below, the vast majority of solo tackles occur within 2 seconds of first contact, with 80% happening in under 1.8 seconds.
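As a rough sketch of how the durations could be derived, the snippet below converts the frame gap between the two events into seconds, assuming the tracking data is sampled at 10 frames per second. The function names and the toy frame pairs are made up for illustration and do not reproduce the figures quoted above.

```python
FRAMES_PER_SECOND = 10  # assumption: 10 tracking frames per second


def contact_to_tackle_seconds(first_contact_frame, tackle_frame):
    """Time between the first_contact and tackle frames, in seconds."""
    return (tackle_frame - first_contact_frame) / FRAMES_PER_SECOND


def share_within(durations, limit_seconds):
    """Fraction of tackles completed in under limit_seconds."""
    return sum(d < limit_seconds for d in durations) / len(durations)


# Toy (first_contact frame, tackle frame) pairs, for illustration only.
toy_frame_pairs = [(20, 25), (31, 40), (12, 31), (50, 58), (44, 70)]
durations = [contact_to_tackle_seconds(a, b) for a, b in toy_frame_pairs]
share_under_1_8 = share_within(durations, 1.8)
```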

To analyze the plays and tackles themselves, I isolated the tracking data for just the ball carrier and tackler for each solo tackle. I began by creating basic animated GIFs of each play. A sample of these animations is shown below, displaying the players' movements along the field using their X and Y position data, with a black arrow indicating their orientation (the direction they were facing) and a green arrow showing their direction of movement. This visualization provided important context on the players' movements that the raw data alone couldn’t.


In-Depth Analysis

After investigating whether a player's momentum and force (calculated by multiplying their weight by their speed and acceleration, respectively) influenced the "Yards After Contact" given up on a play, I decided to focus on the player's attack angle toward the ball carrier.
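The momentum and force proxies described above amount to two products. The sketch below mirrors that description directly, using weight rather than mass and leaving the units as they come; the function names and the sample numbers are illustrative assumptions.

```python
def momentum_proxy(weight, speed):
    """Momentum proxy: weight multiplied by speed."""
    return weight * speed


def force_proxy(weight, acceleration):
    """Force proxy: weight multiplied by acceleration."""
    return weight * acceleration


# Toy players with made-up values, for illustration only.
carrier_momentum = momentum_proxy(220.0, 6.0)
tackler_momentum = momentum_proxy(200.0, 4.5)
carrier_force = force_proxy(220.0, 2.0)
momentum_gap = carrier_momentum - tackler_momentum
```

A positive momentum_gap in this toy case means the ball carrier arrives with more momentum than the tackler.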

Tacklers are traditionally taught to square up to the ball carrier and meet them head-on (moving in opposite directions) to drive the ball carrier backward and minimize "Yards After Contact." While this makes sense theoretically, in practice, most ball carriers have a significant momentum advantage when they meet a tackler, often driving the tackler backward from the point of first contact.

By standardizing the ball carrier’s direction of motion to 0 degrees right and rotating the tackler’s direction accordingly, I derived a single angle measurement (tackler’s angle of attack) instead of dealing with multiple directions. Due to the variety of angles, I grouped them into 12 groups of 30 degrees each, forming 12 Attack Zones.

Zone 1 ranges from 0 degrees right counterclockwise to 30 degrees, continuing in this manner until Zone 12, which covers 330 degrees to 360 degrees (0) right. This grouping allowed for a clearer understanding of which of the 12 Attack Zones were more successful in limiting the "Yards After Contact" compared to examining 360 individual degrees. Below are the descriptive statistics for the Attack Zones, along with an overall chart showing the median values and counts for each Attack Zone.
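The zone assignment and the per-zone summary can be sketched as follows. The treatment of boundary angles (each zone includes its lower edge, so exactly 30 degrees falls in Zone 2) is an assumption, as are the function names and the toy data.

```python
from collections import defaultdict
from statistics import median


def attack_zone(angle_deg):
    """Map an angle in degrees to an Attack Zone from 1 to 12.

    Zone 1 covers [0, 30), Zone 2 covers [30, 60), ..., Zone 12 covers [330, 360).
    """
    return int((angle_deg % 360.0) // 30) + 1


def zone_summary(tackles):
    """Return the count and median x_dist per zone for (angle, x_dist) pairs."""
    by_zone = defaultdict(list)
    for angle, x_dist in tackles:
        by_zone[attack_zone(angle)].append(x_dist)
    return {
        zone: {"count": len(values), "median": median(values)}
        for zone, values in sorted(by_zone.items())
    }


# Toy (angle of attack, Yards After Contact) pairs, for illustration only.
toy_tackles = [(10.0, 2.0), (25.0, 4.0), (95.0, 1.0),
               (100.0, 3.0), (110.0, 2.0), (345.0, 5.0)]
summary = zone_summary(toy_tackles)
```

The median is used here because the text reports median values per zone; it is also less sensitive than the mean to a handful of long gains.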

Given the directional nature of the data, I created the radar charts at the top of the post. One chart displays the direction and median "Yards After Contact" for each Attack Zone, and a similar radar chart shows the EPA/play for each Attack Zone. Lower values in both metrics are better for a defender, as they indicate fewer "Yards After Contact" and fewer expected points given up when attacking from that zone.


Results and Discussion

Contrary to conventional tackling wisdom, meeting the ball carrier head-on (Attack Zones 1 and 12) does not result in the lowest median "Yards After Contact" or the lowest EPA/play. Surprisingly, these zones exhibit some of the highest values. This may be because defenders are stationary while waiting for the ball carrier, which makes them easier to overpower.

Interestingly, the best Attack Zones from which a tackler should engage a ball carrier are from the side and at an angle from behind. Tackling from the side makes sense because the ball carrier may overcome a head-on momentum transfer and continue moving, whereas momentum transferred from the side is harder for them to counter. Even if a tackler has less momentum than the ball carrier, a smaller force from the side can bring the ball carrier down more effectively than a larger force head-on.

The same principle applies to tackling from behind. The additional momentum from a different angle is more challenging for the ball carrier to overcome than a direct head-on confrontation. While it may seem counterintuitive to tackle from behind, as it might push the ball carrier forward, this forward push is often less than the "Yards After Contact" gained by meeting the ball carrier head-on with the same momentum.

As for the limitations of these results, it's important to note that this data only considers solo tackles that include both the "first_contact" and "tackle" events. Tackles lacking a "first_contact" event are not included in the "Yards After Contact" calculations, as the final metric cannot be determined without the initial contact point. Further research is needed to fully understand the distinction between these events.

Additionally, this analysis did not explore how Attack Zone knowledge could assist tacklers in collaborative efforts, such as helping from the side or back to complete the tackle faster. Nor did it examine "missed tackles" and which Attack Zones result in the most missed tackles. Both assisted tackles and missed tackles should be investigated further, as this notebook focuses solely on completed solo tackles that have both "first_contact" and "tackle" events.